Blockchain Data - S3
We offer Amazon S3 bulk downloads to retrieve massive historical datasets for select data types, delivered in Apache Parquet or CSV format.
Blockchain Datasets
Data Type | Blockchain | Sample Files |
---|---|---|
Mempool* | Ethereum | Download |
Mempool* | Bitcoin | Download |
Mempool* | Litecoin | Download |
Transactions | Bitcoin | Download |
Transactions | Ethereum | Download |
Transactions | Arbitrum | Download |
Transactions | BNB | Download |
Transactions | Polygon | Download |
Transaction Logs | Ethereum | Download |
Transaction Logs | Arbitrum | Download |
Transaction Logs | BNB | Download |
Transaction Logs | Polygon | Download |
Account Balances | Ethereum | Download |
Account Balances | Arbitrum | Download |
Account Balances | BNB | Download |
Account Balances | Polygon | Download |
Blocks | Ethereum | Download |
Blocks | Arbitrum | Download |
Blocks | BNB | Download |
Blocks | Polygon | Download |
*Mempool collection is no longer supported after 2024-07-16. Historical mempool data is still available.
Blockchain Data Fields & Descriptions
Mempool
Field | Description |
---|---|
blockchainId | Amberdata Unique Identifier for the blockchain network |
hash | The transaction hash. |
transactionIndex | The transaction index in block. Will be 0 for pending transactions. |
blockHash | The block hash. |
blockNumber | The block number of the block in which the specified transaction is contained. |
createdAt | The timestamp of when the transaction was seen in the mempool. |
createdAtNanoseconds | The timestamp for createdAt with nanoseconds. |
numLogs | The number of logs in the transaction. |
transactionTypeId | The transaction type: EOA_EOA, EOA_Contract, Contract_contract |
contractAddress | The address of the contract if this transaction created a contract and null otherwise. |
isCoinbase | (UTXO Chains) - first transaction in a block |
fee | The transaction fee. |
from | Contains data about the transaction sender. |
cumulativeGasUsed | The total gas used up to and including the transaction. |
gas | |
gasPrice | The value equal to the number of computation units (in Ethereum Wei) to be paid per unit of gas for all computation costs incurred as a result of the execution of this transaction. |
gasUsed | The value equal to the total amount of gas used by the transaction. |
maxFeePerGas | The maximum total amount per unit of gas (base fee plus priority fee) that the sender is willing to pay to have this transaction validated and included in the blockchain. |
maxPriorityFeePerGas | The maximum priority fee (tip) per unit of gas that the sender is willing to pay on top of the base fee. |
input | The input data to the function. |
nonce | The value equal to the number of transactions sent by the address. |
status | The status of the transaction (successful, failed, etc.) |
timestamp | The value equal to the reasonable output of Unix’s time() at this transaction's confirmation. |
timestampNanoseconds | The transaction timestamp with nanoseconds. |
to | Contains objects that hold data about the recipient address(es). |
tos | |
type | The transaction type: 0 - legacy, 2 - EIP-1559 |
value | The scalar value equal to the number of units (in Ethereum Wei) to be transferred to the message call’s recipient or, in the case of contract creation, as an endowment to the newly created contract. |
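accessList | A feature introduced as part of EIP-2930, used to specify which addresses and storage slots a transaction intends to access. |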
logsBloom | The Bloom filter composed from indexable information (logger address and log topics) contained in each log entry from the receipt of each transaction in the transactions list. |
r | Value corresponding to the signature of the transaction and used to determine the sender of the transaction. |
s | Value corresponding to the signature of the transaction and used to determine the sender of the transaction. |
v | Value corresponding to the signature of the transaction and used to determine the sender of the transaction. |
opProInputValue | BTC & LTC only. Value of the transaction input. |
opProLockTime | BTC & LTC only. The earliest block height at which the transaction can be confirmed. Optional. |
opProOutputValue | BTC & LTC only. Value of the transaction output. |
opProSize | BTC & LTC only. The number of bytes that the transaction takes up on the blockchain |
opProVersion | BTC & LTC only. Version of the transaction format. |
opProVirtualSize | BTC & LTC only. A measure of the complexity of the transaction, and it is used to calculate the fee. |
opProInputs | BTC & LTC only. Input address(es) |
opProOutputs | BTC & LTC only. Output address(es) |
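The relationship between maxFeePerGas and maxPriorityFeePerGas for type 2 (EIP-1559) transactions can be sketched as follows. This is an illustrative calculation, not part of the dataset: the function name is hypothetical, and the block's base fee is an assumed input that is not listed among the mempool fields above. All amounts are in Wei.

```python
GWEI = 10**9  # 1 Gwei expressed in Wei


def effective_gas_price(max_fee_per_gas: int,
                        max_priority_fee_per_gas: int,
                        base_fee_per_gas: int) -> int:
    """Return the price per unit of gas (in Wei) paid by an EIP-1559 transaction.

    base_fee_per_gas is an assumed input taken from the including block;
    it is not one of the mempool fields documented above.
    """
    if max_fee_per_gas < base_fee_per_gas:
        raise ValueError("maxFeePerGas is below the base fee")
    # The tip is capped by whatever remains of maxFeePerGas after the base fee.
    priority_fee = min(max_priority_fee_per_gas,
                       max_fee_per_gas - base_fee_per_gas)
    return base_fee_per_gas + priority_fee


# Hypothetical values: 50 Gwei cap, 2 Gwei tip, 30 Gwei base fee
price = effective_gas_price(50 * GWEI, 2 * GWEI, 30 * GWEI)
print(price // GWEI)  # 32
```

When the cap leaves less room than the requested tip, the tip shrinks so that the total never exceeds maxFeePerGas.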
Transactions
Field | Bitcoin - Description | Ethereum - Description |
---|---|---|
hash | The transaction hash. | The transaction hash. |
blockNumber | The block number of the block in which the specified transaction is contained. | The block number of the block in which the specified transaction is contained. |
contractAddress | N/A | The address of the contract if this transaction created a contract and null otherwise. |
cumulativeGasUsed | N/A | The total gas used up to and including the transaction. |
from | Contains data about the transaction sender. | Contains data about the transaction sender. |
gas | N/A | Gas used for this transaction. |
gasPrice | N/A | The value equal to the number of computation units (in Ethereum Wei) to be paid per unit of gas for all computation costs incurred as a result of the execution of this transaction. |
gasUsed | N/A | The value equal to the total amount of gas used by the transaction. |
index | The transaction index in block. Will be 0 for pending transactions. | The transaction index in block. Will be 0 for pending transactions. |
input | N/A | The input data to the function. |
logsBloom | N/A | The Bloom filter composed from indexable information (logger address and log topics) contained in each log entry from the receipt of each transaction in the transactions list. |
nonce | N/A | The value equal to the number of transactions sent by the address. |
numFunctionCalls | N/A | The number of function calls made during the execution of a smart contract as part of this transaction. |
publicKey | N/A | The public key associated with the sender of the transaction. |
raw | N/A | The raw hexadecimal representation of the entire transaction. |
root | N/A | The state root of the Ethereum world state after the transaction has been processed. Relevant only for pre-Byzantium (pre-EIP-658) transactions; later transactions report a status instead. |
status | The status of the transaction (0x1 if successful) | The status of the transaction (0x1 if successful) |
timestamp | The value equal to the reasonable output of Unix’s time() at this transaction's confirmation. | The value equal to the reasonable output of Unix’s time() at this transaction's confirmation. |
to | Contains objects that hold data about the recipient address(es). | Contains objects that hold data about the recipient address(es). |
value | Total scalar transaction (UTXO measurement) input value minus transaction fee | The scalar value equal to the number of units (in Ethereum Wei) to be transferred to the message call’s recipient or, in the case of contract creation, as an endowment to the newly created contract. |
meta | Metadata of the UTXO transaction, such as size of transaction, weight, and block version | Contains auxiliary information about the transaction that is not directly related to the execution or processing of the transaction on the Ethereum blockchain. |
r | N/A | Value corresponding to the signature of the transaction and used to determine the sender of the transaction. |
s | N/A | Value corresponding to the signature of the transaction and used to determine the sender of the transaction. |
v | N/A | Value corresponding to the signature of the transaction and used to determine the sender of the transaction. |
fees | The transaction fee (in UTXO scalar value) | The transaction fee. |
isCoinbase | Only for UTXO Chains - first transaction in a block otherwise known as a coinbase transaction. Not to be confused with Coinbase the CEX. | N/A |
blockHash | The block hash. | The block hash. |
type | N/A | The transaction type: 0 - legacy, 2 - EIP-1559 |
adjustedValue | Total amount of value the recipient address received | An interpretation or adjustment of the transaction value to provide more context or clarity. |
maxFeePerGas | N/A | The maximum total amount per unit of gas (base fee plus priority fee) that the sender is willing to pay to have this transaction validated and included in the blockchain. |
maxPriorityFeePerGas | N/A | The maximum priority fee (tip) per unit of gas that the sender is willing to pay on top of the base fee. |
accessList | N/A | A feature introduced as part of EIP-2930, used to specify which addresses and storage slots a transaction intends to access. |
transactionInputs | The Transaction Input Vector (raw input transaction information) | The data provided as input to a transaction, especially when the transaction involves interaction with a smart contract. |
transactionOutputs | The Transaction Output Vector (raw output transaction information) | The outputs generated by executing a transaction on the Ethereum blockchain. |
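As a worked example of the Ethereum gas fields, the sketch below multiplies gasUsed by gasPrice to get a fee in Wei and converts it to Ether. It assumes the values arrive as integers or numeric strings; the helper names are hypothetical, and whether the result matches the fees column for every transaction is an assumption you should check against your own data.

```python
from decimal import Decimal

WEI_PER_ETHER = Decimal(10) ** 18  # 1 Ether = 10^18 Wei


def transaction_fee_wei(gas_used, gas_price_wei) -> int:
    """Fee in Wei, computed as gasUsed * gasPrice (inputs may be ints or numeric strings)."""
    return int(gas_used) * int(gas_price_wei)


def wei_to_ether(wei) -> Decimal:
    """Convert a Wei amount to Ether without floating-point rounding."""
    return Decimal(int(wei)) / WEI_PER_ETHER


# Hypothetical transaction: 21,000 gas used at a gas price of 20 Gwei
fee = transaction_fee_wei(21000, 20 * 10**9)
print(fee)                # 420000000000000
print(wei_to_ether(fee))  # 0.00042
```

Decimal is used instead of float because Wei amounts routinely exceed the precision a 64-bit float can represent exactly.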
FAQs
Why is delivery via S3 important?
- With data in S3, our customers can bulk download historical data in an analytics-friendly format. They can then dig deep into the data, perform their own proprietary research, and test trading strategies without being limited by our REST API throughput.
How does a customer get access to these datasets?
- Customers need their own AWS credentials, which we will provision for S3 access. If you are interested in downloading data via S3, please contact your Account Executive.
Why Parquet format instead of a GZIP compressed JSON file?
- Parquet is a columnar storage format for structured data that is optimized for querying and analysis. In Parquet format, data is stored in columns rather than rows, allowing for more efficient compression and encoding of data. This can result in significant performance improvements for analytical workloads that involve large datasets and complex queries. Parquet is widely used in big data environments for data warehousing, analytics, and machine learning applications and can be easily integrated into existing data pipelines.
How do I download a parquet sample file and open it to see which fields are returned?
- If you only want to see the fields, download the sample Parquet file, load it as a pandas DataFrame in Python, and print df.dtypes for a quick listing of each field and its type. Here is code you can try:
```python
# Import the pandas library
import pandas as pd
# Replace 'your_parquet_file.parquet' with the path to your Parquet file
parquet_file = 'your_parquet_file.parquet'
# Load the Parquet file as a pandas DataFrame
df = pd.read_parquet(parquet_file)
# Display the data types of the DataFrame
print(df.dtypes)
```
- To read the actual data once you've downloaded the sample Parquet file, run the following Python code:
```python
# Import the pandas library
import pandas as pd
# Replace 'your_parquet_file.parquet' with the path to your Parquet file
parquet_file = 'your_parquet_file.parquet'
# Load the file with the pyarrow engine and print the first rows
df = pd.read_parquet(parquet_file, engine='pyarrow')
print(df.head())
```