Blockchain Data - S3

We offer bulk downloads from Amazon S3 for retrieving large historical datasets for select data types, delivered in Apache Parquet or CSV format.

Note: Our data buckets are configured as Requester Pays buckets, meaning your company is responsible for any Amazon data transfer fees incurred during downloads. To access the data, every request must include either the header x-amz-request-payer: requester or, for AWS CLI requests, the parameter --request-payer requester.

Ensure this setting is included in all requests to avoid access issues.
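
As a rough illustration of the Requester Pays setting in code, the sketch below uses boto3 (the AWS SDK for Python), which exposes the setting as a `RequestPayer` argument. The helper names, bucket name, object key, and local filename are hypothetical placeholders, not real Amberdata paths; substitute the values provided with your access.

```python
def requester_pays_args():
    """Extra arguments that mark an S3 request as Requester Pays."""
    return {"RequestPayer": "requester"}


def download_object(bucket, key, dest):
    """Download one object from a Requester Pays bucket to a local path."""
    import boto3  # third-party package: pip install boto3

    s3 = boto3.client("s3")  # picks up your configured AWS credentials
    s3.download_file(bucket, key, dest, ExtraArgs=requester_pays_args())


# Example call (hypothetical bucket and key, shown only for the call shape):
# download_object("example-bucket", "path/to/file.parquet", "file.parquet")
```

Without the `RequestPayer` argument, S3 rejects requests to a Requester Pays bucket with an access-denied error, which is the access issue the note above refers to.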

Blockchain Datasets

| Market Type | Feature Type | Sample Files |
| --- | --- | --- |
| Mempool* | Ethereum | |
| Mempool* | Bitcoin | |
| Mempool* | Litecoin | |
| Transactions | Bitcoin | |
| Transactions | Ethereum | |
| Transactions | Arbitrum | |
| Transactions | BNB | |
| Transactions | Polygon | |
| Transaction Logs | Ethereum | |
| Transaction Logs | Arbitrum | |
| Transaction Logs | BNB | |
| Transaction Logs | Polygon | |
| Account Balances | Ethereum | |
| Account Balances | Arbitrum | |
| Account Balances | BNB | |
| Account Balances | Polygon | |
| Blocks | Ethereum | |
| Blocks | Arbitrum | |
| Blocks | BNB | |
| Blocks | Polygon | |

*Mempool collection is no longer supported after 2024-07-16. Historical mempool data is still available.


Blockchain Data Fields & Descriptions

Mempool

| Field | Description |
| --- | --- |
| blockchainId | Amberdata unique identifier for the blockchain network. |
| hash | The transaction hash. |
| transactionIndex | The transaction index in the block. Will be 0 for pending transactions. |
| blockHash | The block hash. |
| blockNumber | The block number of the block in which the specified transaction is contained. |
| createdAt | The timestamp of when the transaction was seen in the mempool. |
| createdAtNanoseconds | The timestamp for createdAt with nanoseconds. |
| numLogs | The number of logs in the transaction. |
| transactionTypeId | The transaction type: EOA_EOA, EOA_Contract, Contract_contract |
| contractAddress | The address of the contract if this transaction created a contract, and null otherwise. |
| isCoinbase | UTXO chains only. Indicates the first transaction in a block. |
| fee | The transaction fee. |
| from | Contains data about the transaction sender. |
| cumulativeGasUsed | The total gas used up to and including the transaction. |
| gas | |
| gasPrice | The value equal to the number of Wei to be paid per unit of gas for all computation costs incurred as a result of the execution of this transaction. |
| gasUsed | The value equal to the total amount of gas used by the transaction. |
| maxFeePerGas | The maximum total fee per unit of gas (base fee plus priority fee) that the sender is willing to pay to have this transaction included in the blockchain. |
| maxPriorityFeePerGas | The maximum priority fee (tip) per unit of gas that the sender is willing to pay on top of the base fee. |
| input | The input data to the function. |
| nonce | The value equal to the number of transactions sent by the address. |
| status | The status of the transaction (successful, failed, etc.). |
| timestamp | The value equal to the reasonable output of Unix’s time() at this transaction’s confirmation. |
| timestampNanoseconds | The transaction timestamp with nanoseconds. |
| to | Contains objects that hold data about the recipient address(es). |
| tos | |
| type | The transaction type: 0 - legacy; 2 - EIP-1559 |
| value | The scalar value equal to the number of units (in Ethereum Wei) to be transferred to the message call’s recipient or, in the case of contract creation, as an endowment to the newly created contract. |
| accessList | |
| logsBloom | The Bloom filter composed from indexable information (logger address and log topics) contained in each log entry from the receipt of each transaction in the transactions list. |
| r | Value corresponding to the signature of the transaction and used to determine the sender of the transaction. |
| s | Value corresponding to the signature of the transaction and used to determine the sender of the transaction. |
| v | Value corresponding to the signature of the transaction and used to determine the sender of the transaction. |
| opProInputValue | BTC & LTC only. Value of the transaction input. |
| opProLockTime | BTC & LTC only. The earliest block height at which the transaction can be confirmed. Optional. |
| opProOutputValue | BTC & LTC only. Value of the transaction output. |
| opProSize | BTC & LTC only. The number of bytes that the transaction takes up on the blockchain. |
| opProVersion | BTC & LTC only. Version of the transaction format. |
| opProVirtualSize | BTC & LTC only. A measure of the complexity of the transaction, used to calculate the fee. |
| opProInputs | BTC & LTC only. Input address(es). |
| opProOutputs | BTC & LTC only. Output address(es). |
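
For type 2 (EIP-1559) transactions, maxFeePerGas caps the total price per unit of gas, while maxPriorityFeePerGas caps the tip paid on top of the block's base fee. The sketch below shows how the two combine under the EIP-1559 rules. It assumes all inputs are integers in Wei; the base fee is not one of the mempool fields above and would have to come from block data, and the function name is illustrative only.

```python
GWEI = 10**9  # 1 gwei expressed in Wei


def effective_gas_price(base_fee_per_gas, max_fee_per_gas, max_priority_fee_per_gas):
    """Return (price per gas, tip per gas) in Wei under EIP-1559 rules."""
    if max_fee_per_gas < base_fee_per_gas:
        # The sender's cap does not cover the base fee, so the
        # transaction cannot be included at this base fee.
        raise ValueError("maxFeePerGas is below the base fee")
    # The tip is limited both by the sender's tip cap and by whatever
    # room is left under the total fee cap.
    tip = min(max_priority_fee_per_gas, max_fee_per_gas - base_fee_per_gas)
    return base_fee_per_gas + tip, tip


price, tip = effective_gas_price(30 * GWEI, 40 * GWEI, 2 * GWEI)
print(price // GWEI, tip // GWEI)  # 32 2
```

When the base fee rises close to maxFeePerGas, the tip shrinks: with a base fee of 39 gwei and the same caps, the sender pays 40 gwei per gas, of which only 1 gwei is tip.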


Transactions

| Field | Bitcoin - Description | Ethereum - Description |
| --- | --- | --- |
| hash | The transaction hash. | The transaction hash. |
| blockNumber | The block number of the block in which the specified transaction is contained. | The block number of the block in which the specified transaction is contained. |
| contractAddress | N/A | The address of the contract if this transaction created a contract, and null otherwise. |
| cumulativeGasUsed | N/A | The total gas used up to and including the transaction. |
| from | Contains data about the transaction sender. | Contains data about the transaction sender. |
| gas | N/A | Gas used for this transaction. |
| gasPrice | N/A | The value equal to the number of Wei to be paid per unit of gas for all computation costs incurred as a result of the execution of this transaction. |
| gasUsed | N/A | The value equal to the total amount of gas used by the transaction. |
| index | The transaction index in the block. Will be 0 for pending transactions. | The transaction index in the block. Will be 0 for pending transactions. |
| input | N/A | The input data to the function. |
| logsBloom | N/A | The Bloom filter composed from indexable information (logger address and log topics) contained in each log entry from the receipt of each transaction in the transactions list. |
| nonce | N/A | The value equal to the number of transactions sent by the address. |
| numFunctionCalls | N/A | The number of function calls made during the execution of a smart contract as part of this transaction. |
| publicKey | N/A | The public key associated with the sender of the transaction. |
| raw | N/A | The raw hexadecimal representation of the entire transaction. |
| root | N/A | The state root of the Ethereum world state after the transaction has been processed. Relevant mainly for pre-Byzantium transactions, whose receipts carry a state root instead of a status code (EIP-658). |
| status | The status of the transaction (0x1 if successful). | The status of the transaction (0x1 if successful). |
| timestamp | The value equal to the reasonable output of Unix’s time() at this transaction’s confirmation. | The value equal to the reasonable output of Unix’s time() at this transaction’s confirmation. |
| to | Contains objects that hold data about the recipient address(es). | Contains objects that hold data about the recipient address(es). |
| value | Total scalar transaction (UTXO measurement) input value minus transaction fee. | The scalar value equal to the number of units (in Ethereum Wei) to be transferred to the message call’s recipient or, in the case of contract creation, as an endowment to the newly created contract. |
| meta | Metadata of the UTXO transaction, such as size of transaction, weight, and block version. | Contains auxiliary information about the transaction that is not directly related to the execution or processing of the transaction on the Ethereum blockchain. |
| r | N/A | Value corresponding to the signature of the transaction and used to determine the sender of the transaction. |
| s | N/A | Value corresponding to the signature of the transaction and used to determine the sender of the transaction. |
| v | N/A | Value corresponding to the signature of the transaction and used to determine the sender of the transaction. |
| fees | The transaction fee (in UTXO scalar value). | The transaction fee. |
| isCoinbase | UTXO chains only. Indicates the first transaction in a block, otherwise known as a coinbase transaction. Not to be confused with Coinbase the CEX. | N/A |
| blockHash | The block hash. | The block hash. |
| type | N/A | The transaction type: 0 - legacy; 2 - EIP-1559 |
| adjustedValue | Total amount of value the recipient address received. | An interpretation or adjustment of the transaction value to provide more context or clarity. |
| maxFeePerGas | N/A | The maximum total fee per unit of gas (base fee plus priority fee) that the sender is willing to pay to have this transaction included in the blockchain. |
| maxPriorityFeePerGas | N/A | The maximum priority fee (tip) per unit of gas that the sender is willing to pay on top of the base fee. |
| accessList | N/A | A feature introduced as part of EIP-2930, used to specify which addresses and storage slots a transaction intends to access. |
| transactionInputs | The Transaction Input Vector (raw input transaction information). | The data provided as input to a transaction, especially when the transaction involves interaction with a smart contract. |
| transactionOutputs | The Transaction Output Vector (raw output transaction information). | The outputs generated by executing a transaction on the Ethereum blockchain. |
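
Ethereum amounts such as value and gasPrice are denominated in Wei, where 1 ETH equals 10^18 Wei. The sketch below converts Wei to ETH and recomputes a fee as gasUsed multiplied by gasPrice. It is a minimal illustration that assumes the columns hold integers or numeric strings, which may not match how the files encode them; the dataset's own fees column should be treated as the authoritative figure. The helper names are hypothetical.

```python
from decimal import Decimal

WEI_PER_ETH = Decimal(10) ** 18  # 1 ETH = 10^18 Wei


def wei_to_eth(wei):
    """Convert an amount in Wei (int or numeric string) to ETH."""
    # Decimal avoids the rounding errors floats would introduce at 18 decimals.
    return Decimal(str(wei)) / WEI_PER_ETH


def fee_in_wei(gas_used, gas_price):
    """Fee paid in Wei: gas consumed times the price paid per unit of gas."""
    return int(gas_used) * int(gas_price)


# A plain ETH transfer uses 21000 gas; assume a gas price of 50 gwei.
fee = fee_in_wei(21000, 50 * 10**9)
print(fee)              # 1050000000000000
print(wei_to_eth(fee))  # 0.00105
```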




FAQs

Why is delivery via S3 important?

  • With data in S3, our customers can bulk download historical data in an analytics-friendly format. This lets them dig deep into the data, perform their own proprietary research, and test trading strategies without being limited by our REST API throughput.

How does a customer get access to these datasets?

  • Customers need their own AWS credentials, which we then provision for S3 access. If you are interested in downloading data via S3, please contact your Account Executive.

Why Parquet format instead of a GZIP compressed JSON file?

  • Parquet is a columnar storage format for structured data that is optimized for querying and analysis. In Parquet format, data is stored in columns rather than rows, allowing for more efficient compression and encoding of data. This can result in significant performance improvements for analytical workloads that involve large datasets and complex queries. Parquet is widely used in big data environments for data warehousing, analytics, and machine learning applications and can be easily integrated into existing data pipelines.
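
One practical consequence of the columnar layout is that a reader can load just the fields it needs instead of the whole file. The sketch below does this with the `columns` argument of pandas' `read_parquet`. The function name and the field list in the example call are illustrative, and the optional `reader` argument is an addition made here only so the function can be exercised without pandas installed.

```python
def read_fields(path, fields, reader=None):
    """Load only the named columns from a Parquet file."""
    if reader is None:
        # Third-party package; also requires pyarrow or fastparquet.
        import pandas as pd

        reader = pd.read_parquet
    # Only the requested columns are read from disk.
    return reader(path, columns=list(fields))


# Example call (hypothetical file name and field selection):
# df = read_fields("your_parquet_file.parquet", ["hash", "blockNumber", "value"])
```

For wide tables such as the transaction datasets, selecting a handful of columns this way typically reduces both load time and memory use.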

How do I download a Parquet sample file and open it to see which fields are returned?

  • If you only want to see the fields, download the sample Parquet file, load it as a pandas DataFrame in Python, and inspect its dtypes attribute, which lists each field with its type. You can try this with the following code:
# Import the pandas library
import pandas as pd

# Replace 'your_parquet_file.parquet' with the path to your Parquet file
parquet_file = 'your_parquet_file.parquet'

# Load the Parquet file as a pandas DataFrame
df = pd.read_parquet(parquet_file)

# Display the data types of the DataFrame
print(df.dtypes)
  • To read the data itself, run the following Python code once you have downloaded the sample Parquet file:
# Import the pandas library
import pandas as pd

# Replace 'your_parquet_file.parquet' with the path to your Parquet file
parquet_file = 'your_parquet_file.parquet'

# Load the Parquet file with the pyarrow engine (requires the pyarrow package)
df = pd.read_parquet(parquet_file, engine='pyarrow')

# Show the first few rows
print(df.head())