Joachim Breitner's Homepage
How to audit an Internet Computer canister
I was recently called upon by Origyn to audit the source code of some of their Internet Computer canisters (“canisters” are services or smart contracts on the Internet Computer), which were written in the Motoko programming language. Both the application model of the Internet Computer as well as Motoko bring with them their own particular pitfalls and possible sources for bugs. So given that I was involved in the creation of both, they reached out to me.
In the course of that audit work I collected a list of things to watch out for, and general advice around them. Origyn generously allowed me to share that list here, in the hope that it will be helpful to the wider community.
Inter-canister calls
The Internet Computer system provides inter-canister communication that follows the actor model: Inter-canister calls are implemented via two asynchronous messages, one to initiate the call, and one to return the response. Canisters process messages atomically (and roll back upon certain error conditions), but not complete calls. This makes programming with inter-canister calls error-prone. Possible common sources for bugs, vulnerabilities or simply unexpected behavior are:
- Reading global state before issuing an inter-canister call, and assuming it to still hold when the call comes back.
- Changing global state before issuing an inter-canister call, changing it again in the response handler, but assuming nothing else changes the state in between (reentrancy).
- Changing global state before issuing an inter-canister call, and not handling failures correctly, e.g. when the code handling the callback rolls back.
If you find such patterns in your code, you should analyze whether a malicious party can trigger them, and assess the severity of the effect.
These issues apply to all canisters, and are not Motoko-specific.
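The patterns above can be illustrated with a minimal sketch. Everything named here is an assumption for illustration only: the Vault actor, the payout method of a hypothetical ledger canister, and the placeholder principal. It shows one way to guard against reentrancy and to undo a state change when the callee rejects; it does not cover a trap in the callback itself.

```motoko
import Error "mo:base/Error";

actor Vault {
  // Hypothetical ledger interface; "aaaaa-aa" is only a placeholder principal.
  let ledger : actor { payout : shared (Principal, Nat) -> async () } = actor "aaaaa-aa";

  var balance : Nat = 100;
  var payoutInFlight : Bool = false;

  public shared ({ caller }) func withdraw(amount : Nat) : async () {
    // Other messages may be processed while we await, so refuse to interleave.
    if (payoutInFlight) throw Error.reject("another withdrawal is in progress");
    if (amount > balance) throw Error.reject("insufficient balance");

    // These changes are committed as soon as the call below is issued.
    payoutInFlight := true;
    balance -= amount;

    try {
      await ledger.payout(caller, amount);
    } catch (e) {
      // The callee rejected: explicitly undo what was committed before the call.
      balance += amount;
      payoutInFlight := false;
      throw e;
    };
    payoutInFlight := false;
  };
}
```

Note that if the code after the await traps, only the callback's own changes are rolled back, so payoutInFlight would stay set. Whether that is acceptable depends on the application.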
Rollbacks
Even in the absence of inter-canister calls the behavior of rollbacks can be surprising. In particular, rejecting (i.e. throw) does not roll back state changes done before, while trapping (e.g. Debug.trap, assert …, out-of-cycles conditions) does.
Therefore, one should check all public update call entry points for unwanted state changes or unwanted rollbacks. In particular, look for methods (or rather, messages, i.e. the code between commit points) where a state change is followed by a throw.
These issues apply to all canisters, and are not Motoko-specific, although other CDKs may not turn exceptions into rejects (which don’t roll back).
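A small sketch of the difference, assuming a hypothetical Counter actor with made-up method names: after calling bumpThenThrow the counter has increased even though the caller saw an error, while after bumpThenTrap it is unchanged.

```motoko
import Debug "mo:base/Debug";
import Error "mo:base/Error";

actor Counter {
  var count : Nat = 0;

  // Rejecting: the increment is kept, although the caller receives an error.
  public func bumpThenThrow() : async () {
    count += 1;
    throw Error.reject("rejected, but count was still incremented");
  };

  // Trapping: the whole message is rolled back, including the increment.
  public func bumpThenTrap() : async () {
    count += 1;
    Debug.trap("trapped, so count is unchanged");
  };

  public query func get() : async Nat { count };
}
```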
Talking to malicious canisters
Talking to untrustworthy canisters can be risky, for the following (likely incomplete) reasons:
- The other canister can withhold a response. Although the bidirectional messaging paradigm of the Internet Computer was designed to guarantee a response eventually, the other party can busy-loop for as long as they are willing to pay for before responding. Worse, there are ways to deadlock a canister.
- The other canister can respond with invalidly encoded Candid. This will cause a Motoko-implemented canister to trap in the reply handler, with no easy way to recover. Other CDKs may give you better ways to handle invalid Candid, but even then you will have to worry about Candid cycle bombs that will cause your reply handler to trap.
Many canisters do not even do inter-canister calls, or only call other trustworthy canisters. For the others, the impact of this needs to be carefully assessed.
Canister upgrade: overview
For most services it is crucial that canisters can be upgraded reliably. This can be broken down into the following aspects:
- Can the canister be upgraded at all?
- Will the canister upgrade retain all data?
- Can the canister be upgraded promptly?
- Is there a recovery plan for when upgrading is not possible?
Canister upgradeability
A canister that traps, for whatever reason, in its canister_pre_upgrade system method is no longer upgradeable. This is a major risk. The canister_pre_upgrade method of a Motoko canister consists of the developer-written code in any system func preupgrade() block, followed by the system-generated code that serializes the content of any stable var into a binary format, and then copies that to stable memory.
Since the Motoko-internal serialization code will first serialize into a scratch space in the main heap, and then copy that to stable memory, canisters with more than 2GB of live data will likely be unupgradeable. But this is unlikely to be the first limit that is hit:
The system imposes an instruction limit on upgrading a canister (spanning both canister_pre_upgrade and canister_post_upgrade). This limit is a subnet configuration value, separate from (and likely higher than) the normal per-message limit, and not easily determined. If the canister’s live data becomes too large to be serialized within this limit, the canister becomes non-upgradeable.
This risk cannot be eliminated completely, as long as Motoko and Stable Variables are used. It can be mitigated by appropriate load testing:
Install a canister, fill it up with live data, and exercise the upgrade. If this succeeds with a live data set exceeding the expected amount of data by a margin, this risk is probably acceptable. Bonus points for adding functionality that will prevent the canister’s live data from growing above a certain size.
If this testing is to be done on a local replica, extra care needs to be taken to make sure the local replica actually performs instruction counting and has the same resource limits as the production subnet.
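As a rough sketch of such a size limit, assuming a hypothetical Notes actor and made-up limits (in practice they would come from the data sizes that survived load testing):

```motoko
import Error "mo:base/Error";
import List "mo:base/List";

actor Notes {
  // Hypothetical limits, chosen for illustration only.
  let maxNotes : Nat = 10_000;
  let maxNoteChars : Nat = 1_000;

  stable var notes : List.List<Text> = List.nil<Text>();
  stable var noteCount : Nat = 0;

  public func add(note : Text) : async () {
    // Refuse new data once the tested capacity is reached.
    if (noteCount >= maxNotes) throw Error.reject("storage limit reached");
    // Text.size counts characters, not bytes, so this is only an approximation.
    if (note.size() > maxNoteChars) throw Error.reject("note too long");
    notes := List.push(note, notes);
    noteCount += 1;
  };

  public query func count() : async Nat { noteCount };
}
```

Both checks happen before any state change, so a throw here leaves nothing behind.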
An alternative mitigation is to avoid canister_pre_upgrade as much as possible. This means no use of stable var (or restricted to small, fixed-size configuration data). All other data could be
- mirrored off the canister (possibly off chain), and manually re-hydrated after an upgrade.
- stored in stable memory manually, during each update call, using the ExperimentalStableMemory API. While this matches what high-assurance Rust canisters (e.g. the Internet Identity) do, this requires manual binary encoding of the data, and is marked experimental, so I cannot recommend this at the moment.
- not put into a Motoko canister until Motoko has a scalable solution for stable variables (for example keeping them in stable memory permanently, with smart caching in main memory, and thus obviating the need for pre-upgrade code).
Data retention on upgrades
Obviously, all live data ought to be retained during upgrades. Motoko automatically ensures this for stable var data. But often canisters want to work with their data in a different format (e.g. in objects that are not shared and thus cannot be put in stable vars, such as HashMap or Buffer objects), and thus may follow the following idiom:
stable var fooStable = …;
var foo = fooFromStable(fooStable);
system func preupgrade() { fooStable := fooToStable(foo); };
system func postupgrade() { fooStable := (empty); };
In this case, it is important to check that
- All non-stable global vars, or global lets with mutable values, have a stable companion.
- The assignments to foo and fooStable are not forgotten.
- The fooToStable and fooFromStable form bijections.
An example would be HashMaps stored as arrays via Iter.toArray(….entries()) and HashMap.fromIter(….vals()).
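Spelled out for a HashMap, the idiom might look like the following sketch. The Registry actor, its field names, and the initial capacity of 16 are assumptions for illustration.

```motoko
import HashMap "mo:base/HashMap";
import Iter "mo:base/Iter";
import Text "mo:base/Text";

actor Registry {
  // Stable companion: only holds data during an upgrade.
  stable var entriesStable : [(Text, Nat)] = [];

  // Working copy, rebuilt from the stable companion when the actor is (re)initialized.
  var entries = HashMap.fromIter<Text, Nat>(
    entriesStable.vals(), 16, Text.equal, Text.hash
  );

  public func put(key : Text, value : Nat) : async () {
    entries.put(key, value);
  };

  public query func get(key : Text) : async ?Nat {
    entries.get(key);
  };

  // Copy the working data into the stable companion right before the upgrade.
  system func preupgrade() {
    entriesStable := Iter.toArray(entries.entries());
  };

  // Free the now redundant copy after the upgrade.
  system func postupgrade() {
    entriesStable := [];
  };
}
```

Here Iter.toArray and HashMap.fromIter are inverse to each other up to the order of entries, which is what the bijection check asks for.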
It is worth pointing out that a code review will only look at a single version of the code, and cannot check whether code changes will preserve data on upgrade. This can easily go wrong if the names and types of stable variables are changed in an incompatible way. The upgrade may fail loudly in these cases, but in bad cases, the upgrade may even succeed, losing data along the way. This risk needs to be mitigated by thorough testing, and possibly backups (see below).
Prompt upgrades
Motoko and Rust canisters cannot be safely upgraded when they are still waiting for responses to inter-canister calls (the callback would eventually reach the new instance, and because of infelicities of the IC’s System API, could possibly call arbitrary internal functions). Therefore, the canister needs to be stopped before upgrading, and started again afterwards. If the inter-canister calls take a long time, this means that upgrading may take a long time, which may be undesirable. Again, this risk is reduced if all calls are made to trustworthy canisters, and elevated when possibly untrustworthy canisters are called, directly or indirectly.
Backup and recovery
Because of the above risks around upgrades it is advisable to have a disaster recovery strategy. This could involve off-chain backups of all relevant data, so that it is possible to reinstall (not upgrade) the canister and re-upload all data.
Note that reinstall has the same issue as upgrade described above in “Prompt upgrades”: the canister ought to be stopped first to be safe.
Note that the instruction limit for messages, as well as the message size limit, bound the amount of data that a query method used for backups can return in one call. If the canister needs to hold more data than that, the backup query method might have to return chunks or deltas, with all the extra complexity that entails, e.g. state changes between downloading chunks.
If large data load testing is performed (as I recommend anyway to test upgradeability), one can test whether the backup query method works within the resource limits.
Time is not strictly monotonic
The timestamps for “current time” that the Internet Computer provides to its canisters are guaranteed to be monotonic, but not strictly monotonic. The system can return the same value more than once, even for different messages, as long as they are processed in the same block. Timestamps should therefore not be used to detect “happens-before” relations.
Instead of using and comparing time stamps to check whether Y has been performed after X happened last, introduce an explicit var y_done : Bool state, which is set to false by X and then to true by Y. When things become more complex, it will be easier to model that state via an enumeration with speaking tag names, and update this “state machine” along the way.
Another solution to this problem is to introduce a var v : Nat counter that you bump in every update method, and after each await. Now v is your canister’s state counter, and can be used like a timestamp in many ways.
While we are talking about time: The system time (typically) changes across an await. So if you do let now = Time.now() and then await, the value in now may no longer be what you want.
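Both suggestions can be combined in a short sketch. The Workflow actor, the phase names, and the method names are hypothetical; the point is only that ordering is tracked by explicit state rather than by comparing timestamps.

```motoko
import Error "mo:base/Error";

actor Workflow {
  // An enumeration with speaking tag names instead of a bare Bool.
  type Phase = { #idle; #requested; #fulfilled };

  var phase : Phase = #idle;
  // State counter, bumped in every update method.
  var version : Nat = 0;

  // "X": start a new round.
  public func request() : async () {
    version += 1;
    phase := #requested;
  };

  // "Y": only valid after X happened last.
  public func fulfil() : async () {
    version += 1;
    switch (phase) {
      case (#requested) { phase := #fulfilled };
      case (_) { throw Error.reject("nothing to fulfil") };
    };
  };

  public query func status() : async (Phase, Nat) { (phase, version) };
}
```

Because throw does not roll back, the version bump in fulfil persists even when the call is rejected, which is harmless for a counter that only needs to increase.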
Wrapping arithmetic
The Nat64 data type, and the other fixed-width numeric types, provide opt-in wrapping arithmetic (e.g. +%, fromIntWrap). Unless explicitly required by the current application, this should be avoided, as usually a too large or negative value is a serious, unrecoverable logic error, and trapping is the best one can do.
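A minimal sketch of the difference, written as a standalone Motoko program rather than a canister:

```motoko
import Nat64 "mo:base/Nat64";

let max : Nat64 = 0xFFFFFFFFFFFFFFFF;

// Opt-in wrapping addition silently wraps around to 0.
let wrapped : Nat64 = max +% 1;
assert (wrapped == (0 : Nat64));

// Wrapping conversion turns -1 into the largest Nat64 value.
assert (Nat64.fromIntWrap(-1) == max);

// The default operator traps on overflow instead, which in a canister
// rolls back the current message:
// let overflow : Nat64 = max + 1;
```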
Cycle balance drain attacks
Because of the IC’s “canister pays” model, all canisters are prone to DoS attacks by draining their cycle balance, and this risk needs to be taken into account.
The most elementary mitigation strategy is to monitor the cycle balance of canisters and keep it far from the (configurable) freezing threshold.
On the raw IC-level, further mitigation strategies are possible:
- If all update calls are authenticated, perform this authentication as quickly as possible, possibly before decoding the caller’s argument. This way, a cycle drain attack by an unauthenticated attacker is less effective (but still possible).
- Additionally, implementing the canister_inspect_message system method allows the above checks to be performed before the message even is accepted by the Internet Computer. But it does not defend against inter-canister messages and is therefore not a complete solution.
- If an attack from an authenticated user (e.g. a stakeholder) is to be expected, the above methods are not effective, and an effective defense might require relatively involved additional program logic (e.g. per-caller statistics) to detect such an attack, and react (e.g. rate-limiting).
Such defenses are pointless if there is only a single method where they do not apply (e.g. an unauthenticated user registration method). If the application is inherently attackable this way, it is not worth the bother to raise defenses for other methods.
(Related: a justification of why the Internet Identity does not use canister_inspect_message.)
A Motoko-implemented canister currently cannot perform most of these defenses: Argument decoding happens unconditionally before any user code that may reject a message based on the caller, and canister_inspect_message is not supported. Furthermore, Candid decoding is not very cycle defensive, and one should assume that it is possible to construct Candid messages that require many instructions to decode, even for “simple” argument type signatures.
The conclusion for the audited canisters is to rely on monitoring to keep the cycle balance up, even during an attack, if the expense can be borne, and maybe pray for IC-level DoS protections to kick in.
Large data attacks
Another DoS attack vector exists if public methods allow untrustworthy users to send data of unlimited size that is persisted in the canister memory. Because of the translation of async-await code into multiple message handlers, this applies not only to data that is obviously stored in global state, but also local data that is live across an await point.
The effectiveness of such attacks is limited by the Internet Computer’s message size limit, which is in the order of a few megabytes, but many of those also add up.
The problem becomes much worse if a method has an argument type that allows a Candid space bomb: It is possible to encode very large vectors with all values null in Candid, so if any method has an argument of type [Null] or [?t], a small message will expand to a large value in the Motoko heap.
Other types to watch out for:
- Nat and Int: These are unbounded numbers, and thus can be arbitrarily large. The Motoko representation will however not be much larger than the Candid encoding (so this does not qualify as a space bomb). It is still advisable to check if the number is reasonable in size before storing it or doing an await. For example, when it denotes an index in an array, throw early if it exceeds the size of the array; if it denotes a token amount to transfer, check it against the available balance; if it denotes time, check it against reasonable bounds.
- Principal: A Principal is effectively a Blob. The Interface specification says that principals are at most 29 bytes in length, but the Motoko Candid decoder does not check that currently (fixed in the next version of Motoko). Until then, a Principal passed as an argument can be large (the principal in msg.caller is system-provided and thus safe). If you cannot wait for the fix to reach you, manually check the size of the principal (via Principal.toBlob) before doing the await.
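The checks described above could look roughly like this. The Token actor, its single global balance, and the elided inter-canister call are assumptions for illustration; the 29-byte bound is the one quoted from the Interface specification.

```motoko
import Error "mo:base/Error";
import Principal "mo:base/Principal";

actor Token {
  // Hypothetical single balance, for illustration only.
  var balance : Nat = 1_000;

  public shared func transfer(to : Principal, amount : Nat) : async () {
    // Reject oversized principals before they are stored or kept across an await.
    if (Principal.toBlob(to).size() > 29) throw Error.reject("principal too long");

    // Reject unreasonably large amounts early.
    if (amount > balance) throw Error.reject("amount exceeds balance");

    balance -= amount;
    // Any await would go here, with `to` and `amount` already validated.
  };
}
```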
Shadowing of msg or caller
Don’t use the same name for the “message context” of the enclosing actor and the methods of the canister: It is dangerous to write shared(msg) actor, because now msg is in scope across all public methods. As long as these also use public shared(msg) func …, and thus shadow the outer msg, it is correct. But if one accidentally omits or mistypes the msg, no compiler error would occur, and msg.caller would now be the original controller, likely defeating an important authorization step.
Instead, write shared(init_msg) actor or shared({caller = controller}) actor to avoid using msg.
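A sketch of the recommended pattern, written as an actor class with a hypothetical Admin name and pause method:

```motoko
import Error "mo:base/Error";

// Bind the installing principal under a distinct name instead of a generic `msg`.
shared ({ caller = controller }) actor class Admin() {
  var paused : Bool = false;

  // If `({ caller })` were forgotten here, `caller` would be unbound and the
  // compiler would complain, instead of silently using the outer context.
  public shared ({ caller }) func pause() : async () {
    if (caller != controller) throw Error.reject("not authorized");
    paused := true;
  };

  public query func isPaused() : async Bool { paused };
}
```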
Conclusion
If you write a “serious” canister, whether in Motoko or not, it is worth going through the code and watching out for these patterns. Or better, have someone else review your code, as it may be hard to spot issues in your own code.
Unfortunately, such a list is never complete, and there are surely more ways to screw up your code – in addition to all the non-IC-specific ways in which code can be wrong. Still, things get done out there, so best of luck!