Blockchain Data - S3

We offer Amazon S3 bulk downloads to retrieve massive historical datasets for select data types, delivered in Apache Parquet or CSV format.
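Because files arrive as either Parquet or CSV, a small loader that picks the pandas reader based on the file extension can keep downstream analysis code format-agnostic. The sketch below is illustrative only. It assumes the extension reliably identifies the format (".parquet", or ".csv" possibly followed by a compression suffix such as ".gz"), since the exact file naming of deliveries is not documented here. The helper name load_dataset is hypothetical.

```python
from pathlib import Path

import pandas as pd


def load_dataset(path):
    """Load a downloaded Parquet or CSV file into a pandas DataFrame.

    Assumption: the file extension identifies the format, ".parquet" for
    Parquet and ".csv" (optionally followed by ".gz" etc.) for CSV.
    """
    suffixes = [s.lower() for s in Path(path).suffixes]
    if ".parquet" in suffixes:
        # Requires a Parquet engine such as pyarrow or fastparquet
        return pd.read_parquet(path)
    if ".csv" in suffixes:
        # compression="infer" (the default) decompresses .gz, .bz2, .zip, etc.
        return pd.read_csv(path)
    raise ValueError(f"Unrecognized file type: {path}")
```

Dispatching on the extension keeps the choice of reader in one place, so the rest of a pipeline only ever sees a DataFrame.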

Blockchain Datasets

Dataset | Blockchain | Sample Files
Mempool* | Ethereum | Download
Mempool* | Bitcoin | Download
Mempool* | Litecoin | Download
Transactions | Bitcoin | Download
Transactions | Ethereum | Download
Transactions | Arbitrum | Download
Transactions | BNB | Download
Transactions | Polygon | Download
Transaction Logs | Ethereum | Download
Transaction Logs | Arbitrum | Download
Transaction Logs | BNB | Download
Transaction Logs | Polygon | Download
Account Balances | Ethereum | Download
Account Balances | Arbitrum | Download
Account Balances | BNB | Download
Account Balances | Polygon | Download
Blocks | Ethereum | Download
Blocks | Arbitrum | Download
Blocks | BNB | Download
Blocks | Polygon | Download

*Mempool collection is no longer supported after 2024-07-16. Historical mempool data is still available.


Blockchain Data Fields & Descriptions

Mempool

Field | Description
blockchainId | Amberdata's unique identifier for the blockchain network.
hash | The transaction hash.
transactionIndex | The index of the transaction within its block. Will be 0 for pending transactions.
blockHash | The block hash.
blockNumber | The number of the block that contains the transaction.
createdAt | The timestamp at which the transaction was seen in the mempool.
createdAtNanoseconds | The createdAt timestamp with nanosecond precision.
numLogs | The number of logs emitted by the transaction.
transactionTypeId | The transaction type (EOA = externally owned account): EOA_EOA, EOA_Contract, or Contract_contract.
contractAddress | The address of the contract created by this transaction, or null if the transaction did not create a contract.
isCoinbase | UTXO chains only. Whether the transaction is the first transaction in its block (the coinbase transaction).
fee | The transaction fee.
from | Data about the transaction sender.
cumulativeGasUsed | The total gas used in the block up to and including this transaction.
gas | The gas limit: the maximum amount of gas the sender provided for this transaction.
gasPrice | The amount of Wei to be paid per unit of gas for all computation costs incurred by executing this transaction.
gasUsed | The total amount of gas used by the transaction.
maxFeePerGas | EIP-1559. The maximum total fee per unit of gas, in Wei, that the sender is willing to pay to have the transaction included in a block.
maxPriorityFeePerGas | EIP-1559. The maximum priority fee (tip) per unit of gas, in Wei, that the sender is willing to pay to the block producer.
input | The input data sent with the transaction (for contract calls, the encoded function call).
nonce | The number of transactions sent from the sender's address before this one.
status | The status of the transaction (successful, failed, etc.).
timestamp | The Unix timestamp (approximately the output of Unix time()) at which the transaction was confirmed.
timestampNanoseconds | The transaction timestamp with nanosecond precision.
to | Data about the recipient address(es).
tos |
type | The transaction type: 0 = legacy, 2 = EIP-1559.
value | The amount, in Wei, transferred to the message call's recipient or, for contract creation, endowed to the newly created contract.
accessList | EIP-2930. The addresses and storage slots the transaction intends to access.
logsBloom | The Bloom filter built from the indexable information (logger address and log topics) in each log entry of the transaction's receipt.
r | Component of the transaction's signature, used to determine the sender of the transaction.
s | Component of the transaction's signature, used to determine the sender of the transaction.
v | Component of the transaction's signature, used to determine the sender of the transaction.
opProInputValue | BTC & LTC only. Value of the transaction input.
opProLockTime | BTC & LTC only. The earliest block height (or time) at which the transaction can be added to a block. Optional.
opProOutputValue | BTC & LTC only. Value of the transaction output.
opProSize | BTC & LTC only. The size of the transaction on the blockchain, in bytes.
opProVersion | BTC & LTC only. The version of the transaction format.
opProVirtualSize | BTC & LTC only. The transaction's virtual size (its weight divided by 4, in virtual bytes), which is used to calculate the fee.
opProInputs | BTC & LTC only. Input address(es).
opProOutputs | BTC & LTC only. Output address(es).
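The EVM fee fields in this table fit together as follows: legacy transactions pay gasPrice per unit of gas, while an EIP-1559 (type 2) transaction pays the block's base fee plus a priority fee, capped so the total per unit of gas never exceeds maxFeePerGas. The total fee is that effective price times gasUsed. The sketch below assumes integer Wei amounts and a base fee per gas obtained separately from block data, because the base fee is not one of the fields listed here. The helper names are hypothetical. Amounts are converted to ETH with Decimal (1 ETH = 10^18 Wei) to avoid floating-point rounding.

```python
from decimal import Decimal

WEI_PER_ETH = Decimal(10**18)  # 1 ETH = 10^18 Wei


def effective_gas_price(tx_type, base_fee_per_gas=None, gas_price=None,
                        max_fee_per_gas=None, max_priority_fee_per_gas=None):
    """Per-unit gas price (in Wei) the sender actually paid.

    Types 0 (legacy) and 1 (EIP-2930) pay gasPrice directly.
    Type 2 (EIP-1559) pays base fee + tip, capped at maxFeePerGas.
    """
    if tx_type in (0, 1):
        return gas_price
    if tx_type == 2:
        # Assumption: base_fee_per_gas comes from block data obtained
        # separately; it is not a field in the mempool table.
        return min(max_fee_per_gas, base_fee_per_gas + max_priority_fee_per_gas)
    raise ValueError(f"transaction type {tx_type} not handled in this sketch")


def transaction_fee_wei(gas_used, price_per_gas):
    """Total fee in Wei: gas actually used times the effective price per gas."""
    return gas_used * price_per_gas


def wei_to_eth(wei):
    """Convert an integer (or decimal-string) Wei amount to ETH exactly."""
    return Decimal(wei) / WEI_PER_ETH


# Hypothetical example values, not taken from a real transaction
GWEI = 10**9
price = effective_gas_price(2, base_fee_per_gas=30 * GWEI,
                            max_fee_per_gas=50 * GWEI,
                            max_priority_fee_per_gas=2 * GWEI)
print(price // GWEI)                                   # 32
print(wei_to_eth(transaction_fee_wei(21_000, price)))  # 0.000672
```

Here the tip is paid in full because 30 + 2 Gwei stays under the 50 Gwei cap; with a 49 Gwei base fee the same transaction would pay exactly maxFeePerGas.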

Transactions

Field | Bitcoin Description | Ethereum Description
hash | The transaction hash. | The transaction hash.
blockNumber | The number of the block that contains the transaction. | The number of the block that contains the transaction.
contractAddress | N/A | The address of the contract created by this transaction, or null if the transaction did not create a contract.
cumulativeGasUsed | N/A | The total gas used in the block up to and including this transaction.
from | Data about the transaction sender. | Data about the transaction sender.
gas | N/A | The gas limit: the maximum amount of gas the sender provided for this transaction.
gasPrice | N/A | The amount of Wei to be paid per unit of gas for all computation costs incurred by executing this transaction.
gasUsed | N/A | The total amount of gas used by the transaction.
index | The index of the transaction within its block. Will be 0 for pending transactions. | The index of the transaction within its block. Will be 0 for pending transactions.
input | N/A | The input data sent with the transaction (for contract calls, the encoded function call).
logsBloom | N/A | The Bloom filter built from the indexable information (logger address and log topics) in each log entry of the transaction's receipt.
nonce | N/A | The number of transactions sent from the sender's address before this one.
numFunctionCalls | N/A | The number of function calls made during smart contract execution as part of this transaction.
publicKey | N/A | The public key associated with the sender of the transaction.
raw | N/A | The raw hexadecimal representation of the entire transaction.
root | N/A | The state root of the world state after the transaction was processed. Relevant mainly for pre-Byzantium transactions, whose receipts carried this field before EIP-658 replaced it with status.
status | The status of the transaction (0x1 if successful). | The status of the transaction (0x1 if successful).
timestamp | The Unix timestamp (approximately the output of Unix time()) at which the transaction was confirmed. | The Unix timestamp (approximately the output of Unix time()) at which the transaction was confirmed.
to | Data about the recipient address(es). | Data about the recipient address(es).
value | The total input value of the transaction minus the transaction fee (in UTXO scalar units). | The amount, in Wei, transferred to the message call's recipient or, for contract creation, endowed to the newly created contract.
meta | Metadata of the UTXO transaction, such as transaction size, weight, and block version. | Auxiliary information about the transaction that is not directly related to its execution or processing on the Ethereum blockchain.
r | N/A | Component of the transaction's signature, used to determine the sender of the transaction.
s | N/A | Component of the transaction's signature, used to determine the sender of the transaction.
v | N/A | Component of the transaction's signature, used to determine the sender of the transaction.
fees | The transaction fee (in UTXO scalar units). | The transaction fee.
isCoinbase | UTXO chains only. Whether the transaction is the first transaction in its block, known as the coinbase transaction (not to be confused with the Coinbase exchange). | N/A
blockHash | The block hash. | The block hash.
type | N/A | The transaction type: 0 = legacy, 2 = EIP-1559.
adjustedValue | The total amount of value the recipient address received. | An interpretation or adjustment of the transaction value that provides more context or clarity.
maxFeePerGas | N/A | EIP-1559. The maximum total fee per unit of gas, in Wei, that the sender is willing to pay to have the transaction included in a block.
maxPriorityFeePerGas | N/A | EIP-1559. The maximum priority fee (tip) per unit of gas, in Wei, that the sender is willing to pay to the block producer.
accessList | N/A | EIP-2930. The addresses and storage slots the transaction intends to access.
transactionInputs | The transaction input vector (raw input transaction information). | The data provided as input to the transaction, especially when it interacts with a smart contract.
transactionOutputs | The transaction output vector (raw output transaction information). | The outputs generated by executing the transaction on the Ethereum blockchain.
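For Bitcoin rows, the fee paid per unit of block space (the fee rate) is often more useful than the absolute fee. A minimal sketch follows. It assumes fees is expressed in satoshis and that you have extracted the transaction weight (in weight units) from meta. The exact key inside meta is not documented here, so the functions take plain numbers. Per BIP 141, virtual size is the weight divided by 4, rounded up. For mempool rows, opProVirtualSize may already provide this value. The helper names are hypothetical.

```python
def virtual_size(weight: int) -> int:
    """BIP 141 virtual size in vbytes: weight / 4, rounded up to an integer."""
    return (weight + 3) // 4


def fee_rate_sat_per_vbyte(fee_sats: int, weight: int) -> float:
    """Fee rate in satoshis per virtual byte (assumes `fees` is in satoshis)."""
    return fee_sats / virtual_size(weight)


def total_input_value(value: int, fees: int) -> int:
    """Bitcoin `value` is total input value minus the fee (see the table),
    so adding the fee back recovers the total input value, in the same units."""
    return value + fees


# Hypothetical example values, not taken from a real transaction
print(virtual_size(561))                  # 141
print(fee_rate_sat_per_vbyte(2820, 561))  # 20.0
```

Integer arithmetic for the rounding avoids float edge cases. The fee rate itself is a ratio, so returning a float is fine there.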



FAQs

Why is delivery via S3 important?

  • With data in S3, our customers can bulk download historical data in an analytics-friendly format. This lets them dig deep into the data, perform their own proprietary research, and test trading strategies without being limited by our REST API throughput.

How does a customer get access to these datasets?

  • Customers need their own AWS credentials, to which we grant S3 access. If you are interested in downloading data via S3, please contact your Account Executive.
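Once S3 access has been granted to your credentials, you can download the files under a dataset's S3 prefix in bulk. The following is a minimal sketch using boto3, the AWS SDK for Python. It assumes boto3 is installed and your credentials are configured in one of the standard AWS locations (environment variables, ~/.aws/credentials, or a named profile). The bucket name and key prefix are placeholders for the values supplied when access is set up. The helper name download_prefix is hypothetical.

```python
from pathlib import Path


def download_prefix(bucket, prefix, dest_dir, s3_client=None):
    """Download every object under s3://bucket/prefix into dest_dir.

    bucket and prefix are placeholders for the values provided when your
    S3 access is provisioned. Local paths mirror the S3 keys.
    """
    if s3_client is None:
        import boto3  # AWS SDK for Python; uses your configured AWS credentials
        s3_client = boto3.client("s3")

    dest = Path(dest_dir).resolve()
    downloaded = []
    # list_objects_v2 returns at most 1,000 keys per call, so page through results
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith("/"):
                continue  # skip zero-byte "folder" placeholder objects
            local_path = (dest / key).resolve()
            if dest not in local_path.parents:
                raise ValueError(f"Refusing to write outside {dest}: {key}")
            local_path.parent.mkdir(parents=True, exist_ok=True)
            s3_client.download_file(bucket, key, str(local_path))
            downloaded.append(local_path)
    return downloaded
```

To use a specific named AWS profile, pass boto3.Session(profile_name="your-profile").client("s3") as s3_client. Accepting the client as a parameter also makes the function easy to test without network access.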

Why Parquet format instead of a GZIP-compressed JSON file?

  • Parquet is a columnar storage format for structured data that is optimized for querying and analysis. In Parquet format, data is stored in columns rather than rows, allowing for more efficient compression and encoding of data. This can result in significant performance improvements for analytical workloads that involve large datasets and complex queries. Parquet is widely used in big data environments for data warehousing, analytics, and machine learning applications and can be easily integrated into existing data pipelines.

How do I download a Parquet sample file and open it to see which fields are returned?

  • If you only want to see the fields, download the sample Parquet file, load it into a pandas DataFrame in Python, and print DataFrame.dtypes for a quick list of the fields and their types (reading Parquet with pandas requires pyarrow or fastparquet to be installed). Here is code you can try:
```python
# Import the pandas library (reading Parquet also needs pyarrow or fastparquet)
import pandas as pd

# Replace 'your_parquet_file.parquet' with the path to your Parquet file
parquet_file = 'your_parquet_file.parquet'

# Load the Parquet file as a pandas DataFrame
df = pd.read_parquet(parquet_file)

# Display the data types of the DataFrame
print(df.dtypes)
```
  • To read the Parquet data itself, download the sample Parquet file and run the following Python code:
```python
# Import the pandas library
import pandas as pd

# Replace 'your_parquet_file.parquet' with the path to your Parquet file
parquet_file = 'your_parquet_file.parquet'

# Read the Parquet file with the pyarrow engine and keep the result
df = pd.read_parquet(parquet_file, engine='pyarrow')

# Show the first few rows
print(df.head())
```