Ethereum Virtual Machine Internals – Part 2
This is the second part of our series on Ethereum Virtual Machine (EVM) internals. Part 1, which can be found here, introduced the EVM call context and its architecture, followed by a deep dive into the non-persistent Memory section, function selection and visibility, and how contract control flow can be bypassed at the bytecode level.
This installment will focus on the EVM’s persistent storage region and explore how various Solidity types and data structures are represented and referenced in contract Storage.
EVM Storage and Contract States
In part 1, we dove into the memory section of the call context, used to hold temporary, mutable data necessary for inter-function logic. As a decentralized state machine however, the EVM needs a dedicated mechanism to keep track of persistent data that represents a given contract’s overall state between function calls. This is the job of the EVM’s Storage region.
EVM implementations treat the storage region as a dictionary of 2256 32-byte keys and values. These keys are considered to be logically initialized to zero by default. However, like all data structures that constitute the EVM, it is up to specific implementations to define exactly how the storage region is implemented. Most implementations, including the widely used Go-Ethereum (geth
) client stays true to the overall idea of the Storage region by implementing storage as hash maps or key-value pairs1, as does the C++-based EMVC implementation of the EVM, which uses the C++ standard library std::map
object2.
The relevant opcodes for operating on storage are the SSTORE
and SLOAD
opcodes, which allow writing and reading 32 bytes to and from storage respectively. It should be noted that EVM operations that work on the Storage region incur some of the highest gas fees relative to other opcodes3, therefore a sufficient understanding of this region is especially important for writing gas-efficient contracts.
Storage Slots
EVM implementations use the concept of storage slots, which are discrete sections within the storage space in which state variables are stored.
Within the wider Ethereum network, storage slots are also sometimes referenced using the keccak256
hash of the hexadecimal slot number, left padded to 32 bytes.
keccak256(0x0000…0000) = 0x290decd9548b62a8d60345a988386fc84ba6bc95484008f6362f93160ef3e563
keccak256(0x0000…0001) = 0xb10e2d527612073b26eecdfd717e6a320cf44b4afac2b0732d9fcbe2b7fa0cf6
keccak256(0x0000…0002) = 0x405787fa12a823e0f2b7631cc41b3ba8828b3321ca811111fa75cd3aa3bb5ace
Below is an example of a simple storage layout as shown in Remix of three 32-byte uint256
values, each in their own storage slot:
It is important to note that within the EVM execution context, slot keys are usually referenced via their literal hexadecimal values, most evident when storage slots are referenced by the commonly used SSTORE4
and SSLOAD5
EVM opcodes. However, hashed slot indexes shown by Remix and similar tools play an important role in the wider Ethereum ecosystem.
Merkel Patricia Tries
The reason why slots are sometimes also displayed via a hash of the slot numbers themselves stems from a fundamental data structure used by the wider Ethereum network for storing account data and contract storage, called Merkle Patricia Tries (MPTs)6. In an MPT, each key-value pair is hashed using the keccak256
hash function, and the resulting hash is used as the key for the next level of the trie.
This creates a hierarchical structure where each level of the trie corresponds to a different prefix of the key, and each leaf node in the trie corresponds to a complete key-value pair. MPTs are used in various contexts throughout the Ethereum ecosystem, including for storing account data, contract storage, and transaction receipts.
At a high level, an MPT is a modified version of the standard trie data structure. A trie7 is a generic tree-based data structure used for storing and retrieving key-value pairs. It allows for efficient prefix-based searches by organizing keys based on their common prefixes.
MPTs incorporate elements of Merkle and Patricia tries, which are derivative trie structures. Merkle tries, also known as hash trees, use hashed representations of keys and values to save on search and storage overhead8. Patricia tries are optimized for memory usage by compressing and merging multiple nodes within a single child node, further reducing storage requirements9.
The Bitcoin network uses Merkle tries to archive and verify the integrity of transactions that have been confirmed with enough blocks10, whereas the Ethereum network requires MPTs to facilitate smart contract and account state storage. Outside of blockchain systems, Patricia tries and the more general Radix tries are commonly used in applications that require fast prefix-based searches, such as IP routing tables and dictionary implementations.
Storage slots are referenced by their keccak256
hashes in the MPT when contract balances and other state information is read and written, largely because MPTs need a standardized method of referencing variable-length key values. However, the EVM references storage slots via their explicit storage key values. These are not always hexadecimal values; for more complicated data structures, additional keccak256
hashing must be done to derive slot keys, as will be demonstrated later.
MPTs and the data structures used to maintain the wider Ethereum network are out of scope for this article. The rest of this article will focus on EVM storage. The EVM has a peculiar way of storing state variables; necessitated in part by the EVM’s implementation as a register-less stack machine, and as a means of minimizing the gas costs of storing smart contract states and data.
For the remainder of this article however, we will omit slot indexes when displaying key-value sets in storage for clarity.
Static Variable Storage
Simple Variable Storage & Packing
Consider the following Solidity contract which demonstrates the concepts of simple variable storage and static variable packing. The modify()
function assigns values to the state variables, which have been implemented as unsigned integers of varying sizes. The integers are represented in hex format for clarity. You may follow along with this contract via Remix here.
pragma solidity >=0.7.0 <0.9.0; contract VarPacking { uint256 slot_0; uint128 slot_1; uint64 still_slot_1; uint64 slot_1_again; uint128 slot_2; function modify() public { slot_1 = 0xAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA; // slot 1 slot_0 = 0xBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB; // slot 0 still_slot_1 = 0xCCCCCCCCCCCCCCCC; // still in slot 1 slot_1_again = 0xDDDDDDDDDDDDDDDD; // still in slot 1 slot_2 = 0xEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE; // slot 2 } }
The order in which state variables are assigned values does not affect which slot they will occupy. This is because the storage slots are determined depending on these factors:
- The order in which state variables are declared.
- The type of the state variable.
- The size of the state variable.
- Whether the variable is fixed-size (simple unsigned integers, fixed-size arrays), or dynamically sized (mappings and unbounded arrays).
Until variables are assigned values, no data will be written to the storage slot, and no storage fees are incurred until storage is interacted with. Pay attention to the state variable names, which have been named according to how they will occupy storage slots.
We will now go through the execution of the modify()
function and show how each storage slot is populated.
Note that Solidity compiler optimizations have been turned off for all examples shown in this post, as compiler optimizations can affect the order in which slots are populated. These optimizations do not affect the values stored in the slots, however. As of writing, the main factor that affects which slot a value is stored in is still the order in which the variable was declared.
Integers
unit
and uint256
types are 32 bytes large. When state variables of these types are assigned values, the next available storage slot is used in its entirety. This can be confirmed by deploying the VarPacking
contract, placing a breakpoint on line 12, and debugging the call to modify()
. After the first SSTORE operation, storage will resemble the following:
"key": "0x0000 . . . 0001",
"value": "0x00000000000000000000000000000000aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
The 128-bit value 0xAA..AA
has been written to the latter half of slot 1. Continuing past the next SSTORE
operation will result in the 256-bit value 0xBB…BB
being written to slot 0, as the slot_1
state variable was assigned a value before slot_0
.
"key": "0x0000 . . . 0001", // slot 0x00…01
"value": "0x00000000000000000000000000000000aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
"key": "0x0000 . . . 0000", // slot 0x00…00
"value": "0xbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"
Stepping through until the still_slot_1
value is assigned to storage (just before line 14) results in slightly different storage behavior however:
"key": "0x0000 . . . 0001", // slot 0x00…01
"value": "0x0000000000000000ccccccccccccccccaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
"key": "0x0000 . . . 0000", // slot 0x00…00
"value": "0xbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"
Variable packing:
Unlike slot_1
, the value for still_slot_1
was stored after the 8th byte of storage slot 1. This is because the consecutively declared slot_1
and still_slot_1
are of types uint128
and uint64
, and are 16 and 8 bytes large respectively. If we continue execution until the end of the modify()
function, we will notice that the values for slot_1
, still_slot_1
, slot_1_again
, and slot_2
are also not all stored in their own storage slots.
To save on storage fees, the EVM has attempted to “pack” these values together such that more than one value can occupy a single storage slot. This is called “variable packing”, and is subject to the following rules11:
- The first item in a storage slot is stored lower-order aligned.
- Value types use only as many bytes as are necessary to store them.
- If a value type does not fit the remaining part of a storage slot, it is stored in the next storage slot.
- Structs and array data always start a new slot and their items are packed tightly according to these rules.
- Items following struct or array data always start a new storage slot.
The storage layout will present the following values after the call has completed:
"key": "0x0000 . . . 0001", // slot 0x00…01
"value": "0xddddddddddddddddccccccccccccccccaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
"key": "0x0000 . . . 0000", // slot 0x00…00
"value": "0xbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb",
"key": "0x0000 . . . 0002", // slot 0x00…02
"value": "0x00000000000000000000000000000000eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee"
Take a moment to reflect on how the previously mentioned variable packing rules are at play. Stepping through the bytecode executed between lines 14 and 15 will show these rules being applied, shifting and rearranging the values as necessary until the SSTORE
opcode is used to preserve the values in the correct slot. Other static types are stored differently; however the same overall goal of maximizing slot usage is maintained.
Compiler Optimizations:
Because opcodes like SSTORE
and SLOAD
are gas intensive, with Solc compiler optimizations turned on, optimized bytecode will be generated which attempts to limit the number of storage operations. This can be observed using the previous VarPacking contract as an example, using this Remix link. Here, the “Enable optimization” flag has been selected under Advanced Configurations in the Solidity Compiler tab. Optimizations are enabled by default in Remix and most other development frameworks.
With optimizations, even though the slot_0
variable was assigned after slot_1
, setting a breakpoint on line 14 shows that storage slot 0 was written to first. This is then followed by a single SSTORE
operation to store the values of slot_1
, still_slot_1
and slot_1_again
to slot 1 within a single operation.
This is because the contract’s bytecode was optimized to pre-arrange the values to be stored in a particular slot before any storage operations are performed. Thus, the previously described variable packing rules are still in effect, however they are instead performed in the Memory region and stack instead of successive storage read/writes.
For simplicity, compiler optimization will be disabled for the remaining contract links in this article.
Fixed-Size Arrays:
Elements in fixed-size arrays are stored as they otherwise would be if the elements were declared sequentially in singular storage variables. After executing modify()
and debugging the transaction for the following contract, each element of the array will occupy its own storage slots. This is only because each element is 32 bytes large. As with singular state variables, array state variable elements can share the same slot, subject to the same packing rules discussed before.
pragma solidity >=0.7.0 <0.9.0; contract FixedArray { uint256[3] arr; function modify() public { // slots 0 and 1 assigned values of 1, 2. // slot 2 will still contain zeros. arr = [1,2]; // slot 2 assigned value of 3 arr[2] = 3; } }
When assigning values for up to n-1 elements in an array, the Solidity compiler will explicitly store a value of 0 for the unassigned values. This can be seen by placing a breakpoint at the first assignment in modify()
and stepping through execution until a value of 0 is pushed to the stack, to eventually be stored at slot 2.
After executing the SSTORE
instruction, slot 2 contains a value of 0.
This is later overwritten by the value of 3 after the final element has been assigned. These variable packing rules suffice for simple unsigned integer types. However, the EVM must perform additional storage packing logic to store more complex data types.
String and byte types:
Strings and byte variations are stored according to their own packing heuristics. Each character in a Solidity string takes up 1 byte. The following contract demonstrates string storage:
pragma solidity >=0.7.0 <0.9.0; contract StringStorage { //same applies to byte types string short_string; string long_string; function modify() public { short_string = "ABCD"; long_string = "ABCDABCDABCDABCDABCDABCDABCDABCDABCDABCDABCDABCDABCDABCDABCDABCDABCDABCDABCDABCDABCD"; //84 characters long } }
Strings and byte values that are less than or equal to 31 bytes in length are indexed in storage in the following way, where n
represents the next available storage slot for 0 ≤ n
≤ 2256-1 slots:
String/Byte Storage ( of size 1 ≤ 31 bytes) | |
---|---|
Slot Key | Slot Value |
n | String data up to 31 bytes in higher-order bytes, and length of string * 2 in lowest-order byte |
Placing a breakpoint at line 11, after the assignment of short_string
, and debugging the call to modify()
will show the string ABCD (0x41424344)
stored in slot 0, aligned to the higher order of the storage slot. Double the length of the string, 0x08
, is stored at the lower order of the slot (shown in red):
"key": "0x0000 . . . 0000", // slot 0x00…00
"value": "0x414243440000 . . . 0008",
. . .
A different method is used to index and store strings and byte values over 32 bytes in length, however it still retains the concept of storing string/byte data first, followed by the length of the data. Instead of referencing slot keys via their literal values, the keccak256
hash of the previous slot key number is used as the current slot key. This hash is then incremented for every remaining 32 byte chunk of string data. This is summarized in the table below, for n
storage slots, storing i
32 byte chunks of string data, for 0 ≤ n
, i
≤ 2256-1:
String/Byte Storage ( of size 1 ≤ 32 bytes) | |
---|---|
Slot Key | Slot Value |
. . . | . . . |
keccak256(n)+(i-1 ) | i-1th 32 byte chunk of string data. |
keccak256(n)+i | ith 32 byte chunk of string data, left-aligned with 0 if i ≤ 32 bytes (e.g., 0x41424344 . . . 000000) |
n+1 | (Length of string/bytes * 2) + 1 in lower-order bytes |
After long_string
has been assigned its value, we see several more storage keys being used:
. . .
"key": "0xb10e . . . 0cf6", // slot keccak256(0x00…00)
"value": "x4142434441424344414243444142434441424344414243444142434441424344",
"key": "0xb10e . . . 0cf7", // slot keccak256(0x00…00) + 0x00…01
"value": "0x4142434441424344414243444142434441424344414243444142434441424344",
"key": "0xb10e . . . 0cf8", // slot keccak256(0x00…00) + 0x00…02
"value": "0x4142434441424344414243444142434441424344000000000000000000000000",
"key": "0x0000. . . 0001", // slot 0x00…01
"value": "0x0000 . . . 00a9"
In this case, the keccak256
hash of the storage identifier for slot 1 (0xb10e…0cf6
) is used as the first slot key for the first 32-byte chunk of the string. This slot key is incremented by one for all remaining 32-byte chunks, with the last chunk being padded with zeros if it is less than 32 bytes in length.
Finally, the length of the string is stored at the storage slot corresponding to when the long string was declared. We can see that this mirrors the way shorter string storage variables are stored, with the data taking up higher order bytes, and the length of the string in the lower order bytes.
Enums
The storage of the enum
type is not currently covered in the Solidity documentation12. Testing shows that enum
elements are stored as lower-aligned single-byte values within a single slot for enums
up to 32 elements in size.
This is evident with the following contract, with enum E
containing 35 elements. Note that all elements of enum E
and their assignments in the modify()
function have been omitted for brevity in the example below:
pragma solidity >=0.7.0 <0.9.0; contract EnumStorage { enum E {t1, . . . ,t35} E e1; . . . E e35; function modify() public { e1 = E.t1; . . . e35 = E.t35; } }
For enums
of more than 32 elements, the next available slot n
is used to store the ith
group of up to 32 elements, for 0 ≤ n
,i
≤ 2256-1:
Enums | |
---|---|
Slot Key | Slot Value |
. . . | . . . |
n+(i-1) | i-1th set of 32 elements, each represented as a lower-ordered single byte. |
n+i | ith set of up to 32 elements, each represented as a lower-ordered single byte, padded with zeros if ≤ 31 elements. |
Storage will resemble the following after executing modify()
in this contract.
"key": "0x00 . . . 00", // slot 0x00…00
"value": "0x1f1e1d1c1b1a191817161514131211100f0e0d0c0b0a09080706050403020100",
"key": "0x00 . . . 01", // slot 0x00…01
"value": "0x0000000000000000000000000000000000000000000000000000000000222120"
Structs
In Solidity, structs
can include elements of different data types. Like fixed-size arrays, the elements of struct state variables are allocated slots sequentially based on declaration order and type. Contract developers can save considerably on gas fees by declaring the elements of state variable structs to maximize the amount of data stored per slot13. This is demonstrated in the following contract:
pragma solidity >=0.7.0 <0.9.0; contract StructStorage { struct S1 { uint128 a; uint256 b; uint128 c; } struct S2 { uint128 d; uint128 e; uint256 f; } S1 expensive_struct; S2 cheaper_struct; function modify() public { expensive_struct.a = 1; expensive_struct.b = 1; expensive_struct.c = 1; } function modify2() public { cheaper_struct.d = 1; cheaper_struct.e = 1; cheaper_struct.f = 1; } }
EIP-2929 and Access Sets
According to Remix, the cost of executing modify()
was reported as 66588 gas, whereas modify2()
incurred only 44760 gas. This is because assigning values to all expensive_struct
’s members requires 3 storage slots, while cheaper_struct
only requires two.
Struct | Execution Cost (gas) | Storage Changes After Execution |
expensive_struct | 66588 |
"key": "0x00…00", |
cheaper_struct | 44760 |
"key": "0x00…03", |
This is because the first two elements of struct S2
, which are of type uint128
and thus 16 bytes each, can be packed into the same slot, whereas the second element of struct S1
is a full 32 bytes, and therefore requires that each element of this type is stored in its own slot.
Writing non-zero values to “untouched” storage slots is more expensive than writing to a slot that has already been written to. This can be confirmed by deploying the StructStorage
contract and executing the modify2()
function. Debug the transaction, and step through until the first call to SSTORE
. Observe that 20000 gas is required to write to slot 0 in the Step Details window.
Stepping through to the next SSTORE
operation, which operates on the same storage slot, will instead require only 100 gas.
This is because the Berlin hard fork of the Ethereum network in April 2021 brought various changes to the network. One such change, EIP-292914, increased the gas cost of accessing “cold” (not already accessed) storage slots and account addresses relative to “warm” (previously accessed) ones. To differentiate between cold slots/addresses and ones that have been “warmed up” by means of prior access, the concept of Access Sets15 were introduced.
The motivation behind this was that storage-accessing opcodes were considered underpriced in terms of gas execution fees. The 2016 Shanghai DDOS attack against the Ethereum network exploited this, as the low gas fee of the EXTCODESIZE
opcode was abused16 to force nodes and miners to read state information from disk, thus congesting the network. The specific calculations for warm or cold reads and writes are best covered in the most recent version of the Ethereum Yellow Paper17.
Appendix G lays out the gas costs for common operations, including accessing warm and cold slots/addresses via Gwarmaccess and Gsset respectively. Note that writing zero values to any storage is still treated as if accessing a warm storage slot, as all storage slots are considered to be logically zero-initialized by default18.
Dynamic Variable Storage
The state variables demonstrated so far were all fixed-size types. With fixed-size state variables, the EVM can know ahead of time where and how to best store them according to packing rules. This is not possible for dynamic sized types such as mappings and dynamic-sized arrays. Storage of dynamic-sized array state variables are demonstrated in the following contract.
Dynamic arrays
pragma solidity >=0.7.0 <0.9.0; contract DynamicArray { uint256[] ints; uint256[][] int_ints; function modify() public { ints.push(0xaa . . . aa); ints.push(0xbb . . . bb); int_ints.push(ints); int_ints.push(ints); int_ints.push(ints); } }
Dynamic arrays are stored using a similar hash-of-hashes trick as with string and byte types. They are primarily referenced by a “base” slot identifier that is continually updated with the number of elements currently assigned to a dynamic array. The slot corresponding to the keccak256
hash of this base slot identifier then holds the first element of the array. For all subsequent elements, the inner hash is incremented by one, and a hash of that value is taken.
This process is summarized below, storing the number of elements in a one-dimensional dynamic array at the n
th storage slot, with the i
th element in the array being stored at the keccak256(n)
th storage slot, for 0 ≤ n
,i
≤ 2256-1:
One-Dimensional Dynamic Array Storage | |
---|---|
Slot Key | Slot Value |
n | The number of elements in the array so far |
. . . | . . . |
keccak256(n)+(i-1) | The i-1th element in the array |
keccak256(n)+i | The ith element in the array |
In this simple example, storage will resemble the following once modify()
has been executed:
"key": "0x0000. . . 0000", // slot 0x00…00
"value": "0x0000. . . 0002",
"key": "0x290d . . . e563", // slot keccak256(0x00…00)
"value": "0xaaaa . . . aaaa",
"key": "0x290d . . . e564", // slot keccak256(0x00…00) + 0x00…01
"value": "0xbbbb . . . bbbb",
. . .
Multidimensional dynamic arrays are stored via the same rule applied recursively. For a two-dimensional dynamic array, the nth storage slot contains the number of sub-arrays in the two-dimensional array stored so far. The index of the i
th sub-array is stored at the keccak256(n)+i
th storage slot, and the j
th element of the i
th sub-array is stored at the keccak256(keccak256(n)+i)+j
th storage slot, for 0 ≤ n
,i
,j
≤ 2256-1:
Two-Dimensional Dynamic Array Storage | |
---|---|
Slot Key | Slot Value |
n | Number of elements in the outermost array so far |
. . . | . . . |
keccak256(n)+(i-1) | i-1th sub-array |
. . . | . . . |
keccak256(keccak256(n)+(i-1))+(j-1) | j-1th element in the i-1th sub-array |
keccak256(keccak256(n)+(i-1))+j | jth element in the i-1th sub-array |
keccak256(n)+i |
ith sub-array |
. . . | . . . |
keccak256(keccak256(n)+i)+(j-1) | j-1th element in the ith sub-array |
keccak256(keccak256(n)+i)+j |
jth element in the ith sub-array |
This is repeated for the second and third arrays within int_ints
, demonstrated in the storage representation below:
. . .
"key": "0x00 . . . 01", // slot 0x00…01
"value": "0x00 . . . 03"
"key": "0xb10e . . . 0cf6", // slot keccak256(0x00…01)
"value": "0x00 . . . 02"
"key": "0xb5d9 . . . 7d22", // slot keccak256(keccak256(0x00…01))
"value": "0xaa . . . aa"
"key": "0xb5d9 . . . 7d23", // slot keccak256(keccak256(0x00…01)) + 0x00…01
"value": "0xbb . . . bb"
"key": "0xb1e2 . . . 0cf7", // slot keccak256(0x00…01) + 0x00…01
"value": "0x00 . . . 02",
"key": "0xea78 . . . bd31", // slot keccak256(keccak256(0x00…01) + 0x00…01)
"value": "0xaa . . . aa",
"key": "0xea78 . . . bd32", // slot keccak256(keccak256(0x00…01) + 0x00…01) + 0x00…01
"value": "0xbb . . . bb",
"key": "0xb1e2 . . . 0cf8", // slot keccak256(0x00…01) + 0x00…02
"value": "0x00 . . . 02",
"key": "0xb327 . . . e02b", // slot keccak256(keccak256(0x00…01) + 0x00…02)
"value": "0xaa . . . aa",
"key": "0xb327 . . . e02c", // slot keccak256(keccak256(0x00…01) + 0x00…02) + 0x00…01
"value": "0xbb . . . bb"
Aside – Deterministic Pitfalls – Storing Private Data
The Storage region provides a deterministic way of storing state data in the EVM. For instance, the exact storage slot of an m-dimensional array can be predicted before a contract implementing the array is even deployed. For an m-dimensional array, storage logic can be generalized by applying the storage heuristic of a two-dimensional array recursively. This is shown via the following expression for the j
th element in the i
th index of the m-dimensional array, declared starting at slot n
, where 0 ≤ n
,m
,i
,j
≤ 2256-1:
keccak256(keccak256(...(keccak256(keccak256(n)+i[1])+i[2])...+i[m-1])+i[m])+j
For a sufficiently-populated, 4-dimensional dynamic array arr[][][][]
, the base of which is stored at slot 3, the slot key for the element at arr[1][0][8][1]
can therefore be calculated as follows, and confirmed by viewing the fourth-to-last slot key written after executing the modify()
function of this contract:
keccak256(keccak256(keccak256((keccak256(0x00…03)+ 0x00…01))+ 0x00…00)+ 0x00…08)+ 0x00…01 = 0xb8928d09db2f3fc6a2c8bd4dafbdf7cd5aa6c337f2c2fad8d85a5e908c8ddf49
Any data in a contract’s Storage region is publicly available using block explorers or web3 clients. For example, the contents of storage slot 0 of the live Wrapped Ethereum (WETH) contract19 can be quickly queried using the cast
20 web3 CLI client from the Foundry development suite, given a free-tier Infura21 API access or another RPC endpoint:
$ cast storage 0xC02aaA39b223FE8D0A0e5C4F27eAD9083C756Cc2 0 --rpc-url 'https://mainnet.infura.io/v3/<API-KEY>'
0x577261707065642045746865720000000000000000000000000000000000001a
While this data is simply the string Wrapped Ether
, subject to the previously described string storage logic, this is why storing sensitive data within a contract’s Storage region constitutes a security risk as contract storage is publicly available to all.
Mappings
Mappings in Solidity are similar to hash tables or dictionaries in other languages, the storage of which is demonstrated by this contract:
pragma solidity >=0.7.0 <0.9.0; contract Mappings { struct S { uint256 a; uint256 b; } mapping(uint256 => uint256 ) simple_map; mapping(uint256 => S) struct_map; mapping(uint256 => mapping(uint256 => S)) nested_map; function modify() public { simple_map[0] = 0x11 . . . 11; simple_map[1] = 0x22 . . . 22; struct_map[0] = S(0xaa . . . aa, 0xbb . . . bb); struct_map[1] = S(0xcc . . . cc, 0xdd . . . dd); nested_map[0][1] = S(0xee . . . ee, 0xff . . . ff); } }
Mapping values are also initialized to zero by default, meaning that mappings are assigned values during declaration. They may only be stored in storage, and the keccak256
hash of keys are stored in place of the literal keys. This is why it is not possible to directly query the length of a mapping in Solidity 22.
Mapping slot keys are derived by concatenating the storage slot key of the mapping itself (n)
and the index of the key where the element is to be stored (i)
, for all 0 ≤ n
,i
≤ 2256-1:
Mapping Storage | |
---|---|
Slot Key | Slot Value |
keccak256(i-1 . n) | ith element of mapping at storage slot n |
This can be confirmed by debugging the call to modify()
and observing the storage after simple_map[0]
is assigned a value of 0x11..11
. This value will be stored at storage slot keccak256(0x00..00 . 00..00) = 0xad32 . . . 5fb5
.
"key": "0xad32 . . . 5fb5", // slot keccak256(0x00…00 . 00…00)
"value": "0x1111 . . . 1111", // value of simple_map[0]
"key": "0xada5 . . . 9e7d", // slot keccak256(0x00…01 . 00…00)
"value": "0x2222 . . . 2222", // value of simple_map[1]
. . .
Mapping values like structs or fixed-length arrays combines the storage heuristic for structs and mappings. State variables of this particular type are stored by incrementing this slot key for each element in the value, and storing element values at these incremented slot keys. This can be observed by stepping through line 17, where the struct S(0xa…,0xb…)
is assigned as the first value in struct_map
:
. . .
"key": "0xa6ee . . . cb49", // slot keccak256(0x00…00 . 00…01)
"value": "0xaaaa . . . aaaa", // value of struct_map[0].a
"key": "0xa6ee . . . cb4a", // slot keccak256(0x00…00 . 00…01) + 0x00…01
"value": "0xbbbb . . . bbbb", // value of struct_map[0].b
. . .
Like dynamic arrays, nested mappings are stored recursively. However, since mapping keys are never stored, a somewhat simpler mapping heuristic is used. This involves taking the hash of the outermost mapping, and then taking a hash of this value concatenated with the index of the innermost mapping.
This is summarized below, where:
i
is the index of within the outermost mapping.j
is the index in the innermost mapping.n
is the next available storage slot.- Since
nested_map
storesstructs
, the value ofm
is added to the hash as an index for the element within the struct.
Nested Struct Mapping Storage | |
---|---|
Contents of Slot Key | Value |
keccak256(j . keccak256(i . n)) + k | Value at struct element k at innermost mapping i, at outermost mapping j, stored at slot n |
nested_map
is takes up slot 2, and stores an instance of struct S(0xe…,0xf…)
at index 1 of its innermost mapping. Therefore, the first element in the struct will be stored at key keccak256(0x…01 . keccak256(0x0…0 .0x0…2)) = 0x79c0…d89d
, and the second will be stored at this value incremented by one.
. . .
"key": "0x79c0 . . . e89d", // slot keccak256(0x0…01 . keccak256(0x00…00 .0x00…02))
"value": "0xeeee . . . eeee", // value of nested_map[0][1].a
"key": "0x79c0 . . . e89e", // slot keccak256(0x0…01 . keccak256(0x00…00 .0x00…02)) + 0x00…01
"value": "0xffff . . . ffff" // value of nested_map[0][1].b
Data Location Keywords and Reference Types
Solidity supports three data location keywords: storage
, memory
, and calldata
, which can be used to explicitly provide the data location within the EVM execution context, where certain variables are stored within functions. In all examples thus far, variables have been declared outside of functions and are thus state variables stored in the Storage region.
Whether or not variables require being prefixed with data location keywords depends on whether the variable’s type is considered a reference or value type23. Value types are types that do not store any data other than their own, while reference types store the data of other variables.
Structs, arrays, and mappings are therefore reference types, and almost all others are considered value types. Strings are also considered reference types in most situations. Declaring state variables is the only place where explicitly specifying data location can be omitted. Reference types declared inside functions must be prefixed with a data location keyword. It is currently not possible to declare data locations for non-state variable that are of value types within functions.
The storage
keyword essentially produces a pointer to an already-declared state variable, referred to as a storage pointer. It is not possible to declare new instances of state variables within a function; the storage keyword cannot be used unless a state variable has already been declared. When updating the storage pointer, the original state variable value will also be updated. Therefore, the storage keyword state variable allows for variables to be passed by reference if used in function arguments.
The memory
keyword can be used to specify that reference types declared within functions are to be stored in memory. When used in a function argument, it results in an in-memory copy of the passed argument being created, which will be discarded at the end of the function call if not returned or persisted elsewhere.
One use case for the memory keyword is to save on gas by limiting the amount of storage read/write operations. For example, by declaring a reference type as a state variable, initializing it, and then creating an in-memory copy of this state reference type. The in-memory copy can be manipulated with little to no SSTORE
/SSLOAD
operations. After the variable has been processed, it can be assigned back to the original state variable. This is particularly economical with multiple iterations of the same processing. Like “warm” storage, it is cheaper to write to already-allocated memory.
The calldata
keyword specifies that function arguments are to be stored in a temporary yet immutable region. It can only be specified for function arguments. It is only available for external, read-only function call parameters, however. This is primarily used for data that is sent with the call to the contract and doesn’t need to be changed during the execution of the function. Using calldata
is cheaper than both memory
and storage
. Check the official Solidity documentation for more information regarding data locations and how best to use them to write gas-efficient contracts.
References:
- https://docs.soliditylang.org/en/
- https://ethereum.org/en/developers/
- https://ethereum.github.io/yellowpaper/paper.pdf
- https://www.evm.codes/?fork=shanghai
Citations
- https://github.com/ethereum/go-ethereum/blob/41f89ca94423fad523a4427c9af2c327fe344fe2/core/state/state_object.go#L40
- https://github.com/ethereum/evmc/blob/58e16bf6a1be9a5f445b2dd443f018e55218e777/examples/example_host.cpp#L26https://github.com/ethereum/evmc/blob/58e16bf6a1be9a5f445b2dd443f018e55218e777/examples/example_host.cpp#L26
- https://www.evm.codes/#54
- https://www.evm.codes/#55
- https://www.evm.codes/#54
- https://ethereum.org/en/developers/docs/data-structures-and-encoding/patricia-merkle-trie/
- https://bioinformatics.cvr.ac.uk/trie-data-structure/
- https://www.investopedia.com/terms/m/merkle-tree.asp
- https://www.allisons.org/ll/AlgDS/Tree/PATRICIA/
- https://www.coinbase.com/bitcoin.pdf
- https://docs.soliditylang.org/en/latest/internals/layout_in_storage.html#layout-of-state-variables-in-storage
- https://docs.soliditylang.org/en/v0.8.20/internals/layout_in_storage.html#layout-of-state-variables-in-storage
- https://docs.soliditylang.org/en/v0.8.20/internals/layout_in_storage.html#layout-of-state-variables-in-storage
- https://eips.ethereum.org/EIPS/eip-2929
- https://www.evm.codes/about#accesssets
- https://blog.ethereum.org/2016/09/22/ethereum-network-currently-undergoing-dos-attack
- https://ethereum.github.io/yellowpaper/paper.pdf
- https://ethereum.org/en/developers/tutorials/yellow-paper-evm/#91-basics
- https://etherscan.io/address/0xC02aaA39b223FE8D0A0e5C4F27eAD9083C756Cc2#code
- https://book.getfoundry.sh/cast/
- https://www.infura.io/pricing
- https://docs.soliditylang.org/en/latest/types.html#mapping-types
- https://docs.soliditylang.org/en/v0.8.20/types.html#reference-types
Explore more blog posts
CTEM Defined: The Fundamentals of Continuous Threat Exposure Management
Learn how continuous threat exposure management (CTEM) boosts cybersecurity with proactive strategies to assess, manage, and reduce risks.
Balancing Security and Usability of Large Language Models: An LLM Benchmarking Framework
Explore the integration of Large Language Models (LLMs) in critical systems and the balance between security and usability with a new LLM benchmarking framework.
From Informational to Critical: Chaining & Elevating Web Vulnerabilities
Learn about administrative access and Remote Code Execution (RCE) exploitation from a recent Web Application Pentest.