Joachim Breitner's Homepage
A Candid explainer: Opt is special
This is the third post in a series about the interface description language Candid.
The record extension problem
Initially, the upgrade rules of Candid were a straight-forward application of the canonical subtyping rules. This worked and was sound, but it forbid one very commonly requested use case: Extending records in argument position.
The subtyping rule for records say that
record { old_field : nat; new_field : int }
<: record { old_field : nat }
or, in words, that subtypes can have more field than supertypes. This makes intuitive sense: A user of a such a record value that only expects old_field
to be there is not bothered if suddenly a new_field
appears. But a user who expects new_field
to be there is thrown off if it suddenly isn’t anymore. Therefore it is ok to extend the records returned by a service with new fields, but not to extend the record in your methods’s argument types.
But users want to extend their record types over time, also in argument position!
In fact, they often want to extend them in both positions. Consider a service with the following interface, where the CUser
record appears both in argument and result position:
type CUser = record { login : text; name : text };
service C : {
register_user : (CUser) -> ();
get_user_data : (text) -> (CUser);
}
It seems quite natural to want to extend the record with a new field (say, the age
of the user). But simply changing the definition of CUser
to
type CUser = record { login : text; name : text; age : nat }
is not ok, because now register_user
requires the age
field, but old clients don’t provide it.
So how can we allow developers to make changes like this (because they really really want that), while keeping the soundness guarantees made by Candid (because we really really want that)? This question has bothered us for over two years, and we even created a meta issue that records the dozen approached we considered, discussed and eventually ditched.
Dynamic subtyping checks in opt
I will spare you the history lesson, though, and explain the solution we eventually came up with.
In the example above it seems unreasonable to let the developer add a field age
of type nat
. Since there may be old clients around, the service unavoidably has to deal with records that don’t have an age
field. If the code expects an age
value, what should it do in that case?
But one can argue that changing the record type to
type CUser = record { login : text; name : text; age : opt nat }
could work: If the age
field is present, use the value. If the field is absent, inject a null
value during decoding.
In the first-order case, this rather obvious solution would work just fine, and we’d be done. But Candid aims to solve the higher order case, and I said things get tricky here, didn’t I?
Consider another, independent service D
with the following interface:
type DUser = record { login : text; name : text };
service D : {
on_user_added : (func (DUser) -> ()) -> ()
}
This service has a method that takes a method reference, presumably with the intention of calling it whenever a new user was added to service D
. And because the types line up nicely, we can compose these two services, by once calling D.on_user_added(C.register_user)
, so that from now on D
calls C.register_user(…)
. These kind of service mesh-ups are central to the vision of the Internet Computer!
But what happens if the services now evolve their types in different ways, e.g. towards
type CUser = record { login : text; name : text; age : opt nat }
and
type DUser = record { login : text; name : text; age : opt variant { under_age; adult }}
Individually, these are type changes that we want to allow. But now the call from D
to C
may transmit a value of record { …; age = opt variant { under_age } }
when C
expects a natural number! This is precisely what we want to prevent by introducing types, and it seems we have lost soundness.
The best solution we could find is to make the opt
type somewhat special, and apply these extra rules when decoding at an expected type of opt t
.
If this was a record field, and the provided record value doesn’t even have a field of that name, don’t fail but instead treat this as
null
. This handles the first-order example above.If there is a value, look at its type (which, in Candid, is part of the message).
If it is
opt t'
andt' <: t
, then decode as normal.If it is
opt t'
butt' <: t
does not hold, then treat that asnull
.This should only happen in these relatively obscure higher-order cases where services evolved differently and incompatibly, and makes sure that the calls that worked before the upgrades will continue to work afterwards.
It is not advisable to actually use that rule when changing your service’s interface. Tools that assist the developer with an upgrade check should prevent or warn about the use of this rule.
Not strictly required here, but since we are making
opt
special anyways:If its type
t'
is not of the formopt …
, pretend it wasopt t'
, and also pretend that the given value was wrapped inopt
.This allows services to make record field in arguments that were required in the old version optional in a new version, e.g. as a way to deprecated them. So it is mildly useful, although I can report that it makes the meta-theory and implementation rather complex, in particular together with equirecursion and beasts like
type O = opt O
. See this discussion for a glimpse of that.
In the above I stress that we look at the type of the provided value, and not just the value. For example, if the sender sends the value opt vec {}
at type opt vec nat
, and the recipient expects opt vec bool
, then this will decode as null
, even though one could argue that the value opt vec {}
could easily be understood at type opt vec bool
. We initially had that, but then noticed that this still breaks soundness when there are references inside, and we have to do a full subtyping check in the decoder. This is very unfortunate, because it makes writing a Candid decoder a noticeably harder task that requires complicated graph algorithms with memoization (which I must admit Motoko has not implemented yet), but it’s the least bad solution we could find so far.
Isn’t that just dynamic typing?
You might have noticed that with these rules, t <: opt t'
holds for all types t
and t'
. In other words, every opt …
is a top type (like reserved
), thus all optional types are equivalent. One could argue that we are throwing all of the benefits of static typing over board this way. But it’s not quite as bad: It’s true that decoding Candid values now involves a dynamic check inserting null
values in certain situations, but as long as everyone plays by the rules (i.e. upgrades their services according to the Candid safe upgrade rules, and heeds the warning alluded to above), these certain situations will only happen in the obscurest of cases.
It is, by the way, not material to this solution that the special subtyping behavior is applied to “the” opt
type, and a neighboring point in the design space would be a dedicated upgraded t
type constructor. That would allow developers to use the canonical opt
type with the usual subtyping rules and communicate their intent (“clients should consider this field optional” vs. “old clients don’t know about this field, but new clients should use it”) more cleanly – at the expense of having more “non-canonical” types in the type system.
To see why it is convenient if Candid has mostly just the “normal” types, read the next post, which will describe how Candid can be integrated into a host language.
Have something to say? You can post a comment by sending an e-Mail to me at <mail@joachim-breitner.de>, and I will include it here.