S3 Bulk Downloads
We offer Amazon S3 bulk downloads to retrieve massive historical datasets for select data types, delivered in Apache Parquet format.
Market Data - S3 Datasets
Dataset Type | Dataset Name | Sample Files |
---|---|---|
Spot | Order Book Snapshots | Download |
Spot | Order Book Events | Download |
Spot | Tickers | Download |
Spot | Trades | Download |
Futures | Funding Rates | Download |
Futures | Insurance Funds | Download |
Futures | Liquidations | Download |
Futures | Long/Short Ratio | Download (minutely) Download (hourly) Download (daily) |
Futures | Order Book Updates | Download |
Futures | Order Book Snapshots | Download |
Futures | Open Interest | Download |
Futures | Tickers | Download |
Futures | Trades | Download |
Options | Liquidations | Download |
Options | Open Interest | Download |
Options | Order Book Snapshots | Download |
Options | Order Book Updates | Download |
Options | Tickers | Download |
Options | Trades | Download |
DeFi Data - S3 Datasets
Blockchain Data - S3 Datasets
CME Only - S3 Datasets
Dataset Type | Dataset Name | Sample Files |
---|---|---|
Futures | Open Interest | Download (CME only) |
Futures | Tickers | Download (CME only) |
Futures | Trades | Download (CME only) |
Options | Open Interest | Download (CME only) |
Options | Tickers | Download (CME only) |
Options | Trades | Download (CME only) |
Fields & Descriptions
Market Data
Spot - Order Book Snapshots
Field | Description |
---|---|
exchange | The name of the exchange. |
pair | The name of the asset pair. |
exchangeTimestamp | The time at which the order book snapshot took place. |
exchangeTimestampNanoseconds | The nanosecond part of the exchangeTimestamp where applicable. |
isBid | Indicates if the order is a bid or ask: true for a bid and false for an ask. |
timestamp | The time at which the order book snapshot took place. |
receivedTimestamp | Timestamp when Amberdata received the order book snapshot. |
receivedTimestampNanoseconds | The nanosecond part of the receivedTimestamp. |
metadata | The metadata associated with the order book data. |
sequence | The sequence number provided by the exchange (equal to null if it is not provided by the exchange). |
data | The order book data corresponding to the columns fields. |
maxPrice | The maximum price for the asset pair. Any buy orders you submit above this price will be clamped to this maximum. |
minPrice | The minimum price for the asset pair. Any sell orders you submit lower than this price will be clamped to this minimum. |
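The clamping described for maxPrice and minPrice can be pictured with a small sketch. The function name `clamp_order_price` and its parameters are hypothetical, and the exact rule each exchange applies may differ; this only mirrors the two field descriptions above.

```python
def clamp_order_price(price, is_buy, min_price=None, max_price=None):
    """Return the price an order would be placed at after clamping.

    Mirrors the field descriptions: buy orders above max_price are clamped
    down to max_price, and sell orders below min_price are clamped up to
    min_price. A missing bound (None) leaves the price unchanged.
    """
    if is_buy and max_price is not None and price > max_price:
        return max_price
    if not is_buy and min_price is not None and price < min_price:
        return min_price
    return price


# A buy above maxPrice is clamped; a sell at the same price is untouched.
print(clamp_order_price(105.0, is_buy=True, min_price=90.0, max_price=100.0))
print(clamp_order_price(105.0, is_buy=False, min_price=90.0, max_price=100.0))
```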
Spot - Order Book Events
Field | Description |
---|---|
exchange | The name of the exchange. |
pair | The name of the asset pair. |
exchangeTimestamp | The time at which the order book event took place. |
exchangeTimestampNanoseconds | The nanosecond part of the exchangeTimestamp where applicable. |
isBid | Indicates if the order is a bid or ask: true for a bid and false for an ask. |
receivedTimestamp | Timestamp when Amberdata received the order book event. |
receivedTimestampNanoseconds | The nanosecond part of the receivedTimestamp. |
timestamp | Timestamp when Amberdata received the order book event. |
metadata | The metadata associated with the order book data. |
sequence | The sequence number provided by the exchange (equal to null if it is not provided by the exchange). |
data | The order book data corresponding to the columns fields. |
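Because sequence can be null when the exchange does not provide it, any completeness check on order book events has to tolerate missing values. The following is a minimal sketch, assuming that an exchange which does supply sequence numbers increments them by one per event; that convention varies by exchange and is not stated in the table above. The function name `find_sequence_gaps` is hypothetical.

```python
def find_sequence_gaps(sequences):
    """Return (previous, current) pairs where sequence numbers skip ahead.

    None values (no sequence provided by the exchange) are ignored, since
    no gap can be inferred from them.
    ASSUMPTION: consecutive events differ by exactly 1 when nothing is missing.
    """
    gaps = []
    previous = None
    for seq in sequences:
        if seq is None:
            continue
        if previous is not None and seq > previous + 1:
            gaps.append((previous, seq))
        previous = seq
    return gaps


# The jump from 102 to 105 is reported; the None entry is skipped.
print(find_sequence_gaps([100, 101, None, 102, 105, 106]))
```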
Spot - Tickers
Field | Description |
---|---|
exchange | The name of the exchange. |
pair | The name of the asset pair. |
timestamp | The time at which the event occurred. |
timestampNanoseconds | The nanosecond part of the timestamp where applicable. |
exchangeTimestamp | The time at which the event occurred. |
exchangeTimestampNanoseconds | The nanosecond part of the exchangeTimestamp where applicable. |
receivedTimestamp | Timestamp when Amberdata received the ticker data. |
receivedTimestampNanoseconds | The nanosecond part of the receivedTimestamp. |
metadata | The metadata associated with the ticker data. |
sequence | The sequence number provided by the exchange (equal to null if it is not provided by the exchange). |
ask | The ask price of the market pair. |
askVolume | The requested order size of all best asks. |
bid | The bid price of the market pair. |
bidVolume | The requested order size of all best bids. |
last | The last price of the market pair. |
mid | The mid price of the market pair. |
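As a sanity check on ticker rows, the mid field can be compared with a value derived from bid and ask. The sketch below assumes mid is the arithmetic midpoint of the best bid and best ask, which is the usual definition but is not stated in the table above. The helper names are hypothetical.

```python
def mid_price(bid, ask):
    """Arithmetic midpoint of bid and ask, or None if either is missing."""
    if bid is None or ask is None:
        return None
    return (bid + ask) / 2


def spread_bps(bid, ask):
    """Bid-ask spread in basis points of the midpoint, or None if undefined."""
    mid = mid_price(bid, ask)
    if mid is None or mid == 0:
        return None
    return (ask - bid) / mid * 10_000


print(mid_price(99.0, 101.0))
print(spread_bps(99.0, 101.0))
```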
Spot - Trades
Field | Description |
---|---|
exchange | The name of the exchange. |
pair | The name of the asset pair. |
exchangeTimestamp | The time at which the trade took place. |
exchangeTimestampNanoseconds | The nanosecond part of the exchangeTimestamp where applicable. NOTE: This is an optional field whose value may or may not be provided by the exchanges. |
tradeId | The exchange provided id of the trade. |
receivedTimestamp | The time Amberdata received the trade data. |
receivedTimestampNanoseconds | The nanosecond part of the receivedTimestamp. |
metadata | The metadata associated with the trade data. |
isBuySide | Indicates if the trade is a buy or sell: true for a buy and false for a sell. |
price | The price at which the asset was traded. |
quoteSize | The quote size at the moment of the trade. NOTE: This is an optional field whose value may or may not be provided by the exchanges. |
size | The total amount of that asset that was traded. |
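Since quoteSize is optional, code that needs the quote-currency value of a trade usually needs a fallback. The following sketch uses quoteSize when present and otherwise multiplies price by size. It assumes quoteSize is expressed in quote-asset units, which the table does not confirm, and the function names are hypothetical.

```python
def trade_notional(price, size, quote_size=None):
    """Quote-currency value of a trade.

    Uses quoteSize when the exchange supplied it; otherwise falls back to
    price * size.
    ASSUMPTION: quoteSize is expressed in quote-asset units.
    """
    if quote_size is not None:
        return quote_size
    return price * size


def signed_size(size, is_buy_side):
    """Positive size for buys and negative size for sells, per isBuySide."""
    return size if is_buy_side else -size


print(trade_notional(100.0, 2.0))
print(signed_size(2.0, is_buy_side=False))
```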
Futures - Funding Rates
Field | Description |
---|---|
exchange | The name of the exchange. |
instrument | The name of the instrument. |
exchangeTimestamp | The time at which the event occurred. |
exchangeTimestampNanoseconds | The nanosecond part of the exchangeTimestamp where applicable. |
receivedTimestamp | Timestamp when Amberdata received the data. |
receivedTimestampNanoseconds | The nanosecond part of the receivedTimestamp. |
fundingInterval | The funding interval for which data is available. |
fundingRate | The funding rate for which data is available. |
nextFundingRate | The next funding rate for which data is available. |
nextFundingTime | The next funding time for which data is available. |
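Each timestamp field in these tables is paired with a Nanoseconds field. The sketch below shows one way to merge such a pair into a single integer. It assumes the timestamp is stored as epoch milliseconds and that the nanosecond part holds only the sub-millisecond remainder; neither is confirmed by the tables, so check a sample file with dataframe.dtypes before relying on it. The function names are hypothetical.

```python
from datetime import datetime, timezone


def to_epoch_nanoseconds(timestamp_ms, nanos_part=None):
    """Combine a millisecond epoch timestamp with its nanosecond part.

    ASSUMPTION (not confirmed by the field tables): timestamp_ms is epoch
    milliseconds and nanos_part is the sub-millisecond remainder in
    nanoseconds. A missing nanos_part (None) is treated as zero.
    """
    return timestamp_ms * 1_000_000 + (nanos_part or 0)


def to_utc_datetime(timestamp_ms):
    """Millisecond epoch timestamp as a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)


print(to_epoch_nanoseconds(1_700_000_000_123, 456_789))
print(to_utc_datetime(0))
```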
Futures - Insurance Funds
Field | Description |
---|---|
exchange | The name of the exchange. |
instrument | The name of the instrument. |
exchangeTimestamp | The time at which the event occurred. |
exchangeTimestampNanoseconds | The nanosecond part of the exchangeTimestamp where applicable. |
receivedTimestamp | Timestamp when Amberdata received the data. |
receivedTimestampNanoseconds | The nanosecond part of the receivedTimestamp. |
fund | The balance of the insurance fund. |
underlying | The underlying asset for the instrument. |
Futures - Liquidations
Field | Description |
---|---|
exchange | The name of the exchange. |
instrument | The name of the instrument. |
exchangeTimestamp | The time at which the event occurred. |
exchangeTimestampNanoseconds | The nanosecond part of the exchangeTimestamp where applicable. |
id | |
receivedTimestamp | Timestamp when Amberdata received the data. |
receivedTimestampNanoseconds | The nanosecond part of the receivedTimestamp. |
timestamp | The time at which the liquidation occurred. |
action | |
orderId | |
price | The price of the instrument at the time of the liquidation. |
side | The direction of the trade. |
status | The status of the liquidation. |
timeInForce | How long the order is to remain active before it is executed or expires, for example IOC (immediate-or-cancel), FOK (fill-or-kill), or GTC (good-'til-canceled). |
type | The type of liquidation. |
volume | |
Futures - Long/Short Ratio
Field | Description |
---|---|
exchange | The name of the exchange. |
instrument | The name of the instrument. |
exchangeTimestamp | The time at which the event occurred. |
exchangeTimestampNanoseconds | The nanosecond part of the exchangeTimestamp where applicable. |
receivedTimestamp | Timestamp when Amberdata received the data. |
receivedTimestampNanoseconds | The nanosecond part of the receivedTimestamp. |
period | The number of the period. |
longAccount | The long account number ratio of all traders. |
ratio | The long/short account number ratio of all traders. |
shortAccount | The short account number ratio of all traders. |
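A quick consistency check on these three fields is sketched below. It assumes ratio is longAccount divided by shortAccount, which is a common exchange convention but is not stated in the table above. The function name and tolerance are hypothetical.

```python
import math


def ratio_is_consistent(long_account, short_account, ratio, rel_tol=1e-3):
    """Check whether ratio is approximately long_account / short_account.

    ASSUMPTION: ratio = longAccount / shortAccount. Returns False when
    short_account is zero, since the quotient is undefined.
    """
    if short_account == 0:
        return False
    return math.isclose(long_account / short_account, ratio, rel_tol=rel_tol)


print(ratio_is_consistent(0.6, 0.4, 1.5))
```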
Futures - Order Book Updates
Field | Description |
---|---|
exchange | The name of the exchange. |
instrument | The name of the instrument. |
exchangeTimestamp | The time at which the event occurred. |
exchangeTimestampNanoseconds | The nanosecond part of the exchangeTimestamp where applicable. |
isBid | true if the order is a bid, false otherwise. |
receivedTimestamp | Timestamp when Amberdata received the data. |
receivedTimestampNanoseconds | The nanosecond part of the receivedTimestamp. |
timestamp | The time at which the event occurred. |
sequence | The sequence number (equal to null if it is not provided by the exchange). |
data | The order book data corresponding to the columns fields, aggregated by exchange. |
status | |
Futures - Open Interest
Field | Description |
---|---|
exchange | The name of the exchange. |
instrument | The name of the instrument. |
exchangeTimestamp | The time at which the event occurred. |
exchangeTimestampNanoseconds | The nanosecond part of the exchangeTimestamp where applicable. |
receivedTimestamp | Timestamp when Amberdata received the data. |
receivedTimestampNanoseconds | The nanosecond part of the receivedTimestamp. |
type | The type of instrument. |
value | The total outstanding number of contracts. |
Futures - Tickers
Field | Description |
---|---|
exchange | The name of the exchange. |
instrument | The name of the instrument. |
timestamp | Timestamp when Amberdata received the data. |
timestampNanoseconds | The nanosecond part of the timestamp. |
exchangeTimestamp | The time at which the event occurred. |
exchangeTimestampNanoseconds | The nanosecond part of the exchangeTimestamp where applicable. |
receivedTimestamp | Timestamp when Amberdata received the data. |
receivedTimestampNanoseconds | The nanosecond part of the receivedTimestamp. |
sequence | The sequence number (equal to null if it is not provided by the exchange). |
ask | The ask price for the instrument. |
askVolume | The requested order size of all best asks. |
baseVolume | |
bid | The bid price for the instrument. |
bidVolume | The requested order size of all best bids. |
last | The last price for the instrument. |
mid | The mid price for the instrument. |
quoteVolume | |
markPrice | |
lastVolume | |
Futures - Trades
Field | Description |
---|---|
exchange | The name of the exchange. |
instrument | The name of the instrument. |
exchangeTimestamp | The time at which the event occurred. |
exchangeTimestampNanoseconds | The nanosecond part of the exchangeTimestamp where applicable. |
tradeId | The exchange provided id of the trade. |
receivedTimestamp | Timestamp when Amberdata received the data. |
receivedTimestampNanoseconds | The nanosecond part of the receivedTimestamp. |
timestamp | The time at which the event occurred. |
isBuySide | true if the trade is a buy, false otherwise. |
price | The price at which the asset was traded. |
quoteVolume | |
volume | The total amount of that asset that was traded. |
sequence | |
isLiquidation | |
Options - Liquidations
Field | Description |
---|---|
exchange | |
instrument | |
exchangeTimestamp | |
exchangeTimestampNanoseconds | |
id | |
volume | |
sequence | |
receivedTimestamp | |
receivedTimestampNanoseconds | |
metadata | |
action | |
orderId | |
price | |
side | |
status | |
timeInForce | |
type | |
Options - Open Interest
Field | Description |
---|---|
exchange | |
instrument | |
exchangeTimestamp | |
exchangeTimestampNanoseconds | |
receivedTimestamp | |
receivedTimestampNanoseconds | |
type | |
value | |
Options - Order Book Snapshots
Field | Description |
---|---|
exchange | |
instrument | |
exchangeTimestamp | |
exchangeTimestampNanoseconds | |
sequence | |
receivedTimestamp | |
receivedTimestampNanoseconds | |
metadata | |
stats | |
state | |
asks | |
askIv | |
bids | |
bidIv | |
bestAskAmount | |
bestAskPrice | |
bestBidAmount | |
bestBidPrice | |
estimatedDeliveryPrice | |
greeks | |
indexPrice | |
interestRate | |
lastPrice | |
markIv | |
markPrice | |
maxPrice | |
minPrice | |
openInterest | |
underlyingIndex | |
underlyingPrice | |
Options - Order Book Updates
Field | Description |
---|---|
exchange | |
instrument | |
exchangeTimestamp | |
exchangeTimestampNanoseconds | |
isBid | |
receivedTimestamp | |
receivedTimestampNanoseconds | |
timestamp | |
metadata | |
sequence | |
data | |
status | |
Options - Tickers
Field | Description |
---|---|
exchange | |
instrument | |
exchangeTimestamp | |
exchangeTimestampNanoseconds | |
sequence | |
receivedTimestamp | |
receivedTimestampNanoseconds | |
stats | |
state | |
ask | |
askIv | |
askVolume | |
baseVolume | |
bid | |
bidIv | |
bidVolume | |
quoteVolume | |
lastVolume | |
estimatedDeliveryPrice | |
greeks | |
indexPrice | |
interestRate | |
mid | |
last | |
markIv | |
markPrice | |
maxPrice | |
minPrice | |
openInterest | |
settlementPrice | |
underlyingIndex | |
underlyingPrice | |
open24H | |
high24H | |
low24H | |
Options - Trades
Field | Description |
---|---|
exchange | |
instrument | |
exchangeTimestamp | |
exchangeTimestampNanoseconds | |
tradeId | |
isBuySide | |
receivedTimestamp | |
receivedTimestampNanoseconds | |
metadata | |
sequence | |
price | |
quoteSize | |
size | |
tickDirection | |
markPrice | |
iv | |
indexPrice | |
CME Only
Futures - Open Interest (CME only)
Field | Description |
---|---|
exchange | The name of the exchange. |
symbol | Instrument Name or Symbol. |
exchangeTimestamp | Timestamp provided by CME. |
exchangeTimestampNanoseconds | The nanosecond part of the exchangeTimestamp where applicable. |
receivedTimestamp | The time Amberdata received the data. |
receivedTimestampNanoseconds | The nanosecond part of the receivedTimestamp. |
sentTime | Time CME MDP gateway sends the message (UTC). |
tradeStatistics | Object - view more details here |
instrument | Object - view more details here |
Futures - Tickers (CME only)
Field | Description |
---|---|
exchange | The name of the exchange. |
symbol | Instrument Name or Symbol. |
exchangeTimestamp | Timestamp provided by CME. |
exchangeTimestampNanoseconds | The nanosecond part of the exchangeTimestamp where applicable. |
receivedTimestamp | The time Amberdata received the data. |
receivedTimestampNanoseconds | The nanosecond part of the receivedTimestamp. |
sentTime | Time CME MDP gateway sends the message (UTC). |
tradingStatus | Identifies the trading status applicable to the instrument or product group. |
instrument | Object - view more details here |
askLevel | Object - view more details here |
bidLevel | Object - view more details here |
Futures - Trades (CME only)
Field | Description |
---|---|
exchange | The name of the exchange. |
symbol | Instrument Name or Symbol. |
exchangeTimestamp | Timestamp provided by CME. |
exchangeTimestampNanoseconds | The nanosecond part of the exchangeTimestamp where applicable. |
receivedTimestamp | The time Amberdata received the data. |
receivedTimestampNanoseconds | The nanosecond part of the receivedTimestamp. |
sentTime | Time CME MDP gateway sends the message (UTC). |
tradeSummary | Object - view more details here |
instrument | Object - view more details here |
Options - Open Interest (CME only)
Field | Description |
---|---|
exchange | The name of the exchange. |
symbol | Instrument Name or Symbol. |
exchangeTimestamp | Timestamp provided by CME. |
exchangeTimestampNanoseconds | The nanosecond part of the exchangeTimestamp where applicable. |
receivedTimestamp | The time Amberdata received the data. |
receivedTimestampNanoseconds | The nanosecond part of the receivedTimestamp. |
sentTime | Time CME MDP gateway sends the message (UTC). |
tradeStatistics | Object - view more details here |
instrument | Object - view more details here |
Options - Tickers (CME only)
Field | Description |
---|---|
exchange | The name of the exchange. |
symbol | Instrument Name or Symbol. |
exchangeTimestamp | Timestamp provided by CME. |
exchangeTimestampNanoseconds | The nanosecond part of the exchangeTimestamp where applicable. |
receivedTimestamp | The time Amberdata received the data. |
receivedTimestampNanoseconds | The nanosecond part of the receivedTimestamp. |
sentTime | Time CME MDP gateway sends the message (UTC). |
tradingStatus | Identifies the trading status applicable to the instrument or product group. |
instrument | Object - view more details here |
askLevel | Object - view more details here |
bidLevel | Object - view more details here |
Options - Trades (CME only)
Field | Description |
---|---|
exchange | The name of the exchange. |
symbol | Instrument Name or Symbol. |
exchangeTimestamp | Timestamp provided by CME. |
exchangeTimestampNanoseconds | The nanosecond part of the exchangeTimestamp where applicable. |
receivedTimestamp | The time Amberdata received the data. |
receivedTimestampNanoseconds | The nanosecond part of the receivedTimestamp. |
sentTime | Time CME MDP gateway sends the message (UTC). |
tradeSummary | Object - view more details here |
instrument | Object - view more details here |
Blockchain Data
Mempool
Field | Description |
---|---|
blockchainId | Amberdata's unique identifier for the blockchain network. |
hash | The transaction hash. |
transactionIndex | The transaction index in block. Will be 0 for pending transactions. |
blockHash | The block hash. |
blockNumber | The number of the block in which the specified transaction is contained. |
createdAt | The timestamp of when the transaction was seen in the mempool. |
createdAtNanoseconds | The timestamp for createdAt with nanoseconds. |
numLogs | The number of logs in the transaction. |
transactionTypeId | The transaction type: EOA_EOA, EOA_Contract, Contract_contract |
contractAddress | The address of the contract if this transaction created a contract and null otherwise. |
isCoinbase | (UTXO Chains) - first transaction in a block |
fee | The transaction fee. |
from | Contains data about the transaction sender. |
cumulativeGasUsed | The total gas used up to and including the transaction. |
gas | |
gasPrice | The number of Wei to be paid per unit of gas for all computation costs incurred as a result of the execution of this transaction. |
gasUsed | The value equal to the total amount of gas used by the transaction. |
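maxFeePerGas | The maximum total fee per unit of gas (base fee plus priority fee) that the sender is willing to pay for this transaction to be included in the blockchain. |
maxPriorityFeePerGas | The maximum priority fee (tip) per unit of gas that the sender is willing to pay to the block producer. |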
input | The input data to the function. |
nonce | The value equal to the number of transactions sent by the address. |
status | The status of the transaction (successful, failed, etc.) |
timestamp | The value equal to the reasonable output of Unix’s time() at this transaction’s confirmation. |
timestampNanoseconds | The transaction timestamp with nanoseconds. |
to | Contains objects that hold data about the recipient address(es). |
tos | |
type | The transaction type: 0 - legacy, 2 - EIP-1559. |
value | The scalar value equal to the number of units (in Ethereum Wei) to be transferred to the message call’s recipient or, in the case of contract creation, as an endowment to the newly created contract. |
accessList | |
logsBloom | The Bloom filter composed from indexable information (logger address and log topics) contained in each log entry from the receipt of each transaction in the transactions list. |
r | Value corresponding to the signature of the transaction and used to determine the sender of the transaction. |
s | Value corresponding to the signature of the transaction and used to determine the sender of the transaction. |
v | Value corresponding to the signature of the transaction and used to determine the sender of the transaction. |
opProInputValue | BTC & LTC only. Value of the transaction input. |
opProLockTime | BTC & LTC only. The block height at which the transaction can be confirmed. Optional. |
opProOutputValue | BTC & LTC only. Value of the transaction output. |
opProSize | BTC & LTC only. The number of bytes that the transaction takes up on the blockchain. |
opProVersion | BTC & LTC only. Version of the transaction format. |
opProVirtualSize | BTC & LTC only. A measure of the transaction's complexity, used to calculate the fee. |
opProInputs | BTC & LTC only. Input address(es) |
opProOutputs | BTC & LTC only. Output address(es) |
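To make the type and value fields easier to read for Ethereum rows, the sketch below maps the numeric type to the labels given in the table and converts value from Wei to Ether (1 Ether = 10^18 Wei). It assumes value arrives as an integer or an integer string; the function and constant names are hypothetical.

```python
from decimal import Decimal

WEI_PER_ETHER = Decimal(10) ** 18  # 1 Ether = 10^18 Wei

# Labels taken from the `type` row above; other values are left unmapped.
TX_TYPE_LABELS = {0: "legacy", 2: "EIP-1559"}


def describe_transaction(tx_type, value_wei):
    """Return (type label, value in Ether) for one mempool row.

    ASSUMPTION: value_wei is an int or an integer string denominated in Wei.
    """
    label = TX_TYPE_LABELS.get(int(tx_type), "unknown")
    value_ether = Decimal(int(value_wei)) / WEI_PER_ETHER
    return label, value_ether


print(describe_transaction(2, "1500000000000000000"))
```

Decimal is used rather than float so that large Wei amounts are not rounded during conversion.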
DeFi Data
Ethereum - Protocol Events (Aave v2 & v3)
Field | Description |
---|---|
account | The EOA that triggered this event |
action | The event that the EOA triggered in the smart contract |
amountNative | The amount of the asset in native units, normalized with the asset's decimals |
amountUSD | The amount of the asset in US dollars |
assetId | The smart contract address of the asset |
assetSymbol | The human readable, abbreviated name of the asset |
blockNumber | The integer value identifying the block |
borrowRate | The interest rate for borrowing the asset |
borrowRateMode | Indicates whether the borrowRate is stable or variable |
caller | |
collateralAmountNative | The amount of the asset in native units, normalized with the asset's decimals |
collateralAmountUSD | The amount of the asset in US dollars |
collateralAssetId | The smart contract address of the asset. This is the asset that was used as backing for the borrowing activity of liquidatee |
collateralAssetSymbol | The human readable, abbreviated name of the asset |
initiator | |
liquidatee | The EOA being liquidated because they are under-collateralized on their borrowed amount |
liquidator | The EOA that is triggering the liquidation on liquidatee |
logIndex | |
market | |
marketId | |
principalAmountNative | The amount of the asset in native units, normalized with the asset's decimals |
principalAmountUSD | The amount of the asset in US dollars |
principalAssetId | The smart contract address of the asset. This is the asset that was borrowed by the liquidatee |
principalAssetSymbol | The human readable, abbreviated name of the asset |
profitUSD | The amount in US dollars that the liquidator earned from triggering a liquidation |
repayer | The EOA that repaid the loan |
reserveAsCollateralEnabled | Indicates if assets deposited into the smart contract can be used as collateral |
target | |
timestamp | Indicates the datetime or epoch milliseconds of when the event took place |
totalFee | The fee for the FlashLoan |
transactionHash | The unique identifier of the transaction indicating that the transaction was validated and added to the block |
Ethereum - Protocol Events (Compound v2 & MakerDAO)
Field | Description |
---|---|
account | The EOA that triggered this event |
action | The human readable name of the event that the EOA triggered in the smart contract |
amountNative | The amount of the asset in native units, normalized with the asset's decimals |
amountUSD | The amount of the asset in US dollars |
assetId | The smart contract address of the asset |
assetSymbol | The human readable, abbreviated name of the asset |
blockNumber | The integer value identifying the block |
borrowRate | The interest rate for borrowing the asset |
borrowRateMode | Indicates whether the borrowRate is stable or variable |
collateralAmountNative | The amount of the asset in native units, normalized with the asset's decimals |
collateralAmountUSD | The amount of the asset in US dollars |
collateralAssetId | The smart contract address of the asset. This is the asset that was used as backing for the borrowing activity of liquidatee |
collateralAssetSymbol | The human readable, abbreviated name of the asset |
liquidatee | The EOA being liquidated because they are under-collateralized on their borrowed amount |
liquidator | The EOA that is triggering the liquidation on liquidatee |
logIndex | |
market | |
marketId | |
principalAmountNative | The amount of the asset in native units, normalized with the asset's decimals |
principalAmountUSD | The amount of the asset in US dollars |
principalAssetId | The smart contract address of the asset. This is the asset that was borrowed by the liquidatee |
principalAssetSymbol | The human readable, abbreviated name of the asset |
profitUSD | The amount in US dollars that the liquidator earned from triggering a liquidation |
timestamp | Indicates the datetime or epoch milliseconds of when the event took place |
totalFee | The fee for the FlashLoan |
transactionHash | The unique identifier of the transaction indicating that the transaction was validated and added to the block |
DEX - Trades
Field | Description |
---|---|
exchange | The name of the exchange. |
timestamp | Timestamp when Amberdata received the data. |
timestampNanoseconds | The nanosecond part of the timestamp where applicable. |
isBuy | Indicates the direction of the trade: true means buy the base and sell the quote; false means sell the base and buy the quote. |
price | The actual price at which the asset was traded (including slippage, but not fees) |
volume | The total amount of that asset that was traded. |
tradeId | The exchange provided id of the trade. |
logIndex | The index of the log within the transaction which included this trade event. |
pairAddress | The address of the pair. |
amountInBase | The amount of the Base asset accepted in the trade. |
amountInQuote | The amount of the Quote asset accepted in the trade. |
amountOutBase | The amount of the Base asset returned in the trade. |
amountOutQuote | The amount of the Quote asset returned in the trade. |
fromAddress | The address which started the trade, i.e., the sender. |
toAddress | The recipient of the trade, i.e., the receiver. |
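The isBuy flag and the four amount fields can be combined into an implied execution price. The sketch below assumes the amountIn fields are what the pool accepts and the amountOut fields are what the pool returns, so a buy of the base asset pairs amountInQuote with amountOutBase. Whether the result matches the price column exactly is not guaranteed, and the function name is hypothetical.

```python
def implied_price(is_buy, amount_in_base, amount_in_quote,
                  amount_out_base, amount_out_quote):
    """Quote units per base unit implied by the swap amounts, or None.

    Per the isBuy row: True means the trader buys base and sells quote, so
    the pool accepts quote (amountInQuote) and returns base (amountOutBase).
    False is the mirror image.
    ASSUMPTION: "in" and "out" are from the pool's perspective.
    """
    if is_buy:
        quote, base = amount_in_quote, amount_out_base
    else:
        quote, base = amount_out_quote, amount_in_base
    if not base:
        return None
    return quote / base


print(implied_price(True, 0.0, 3000.0, 1.5, 0.0))
```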
DEX - Liquidity
Field | Description |
---|---|
exchangeName | The name of the exchange. |
exchangeId | The address of the exchange/pool. |
pair | The common name for the pair - name is not unique, use pairAddress instead. |
pairNormalized | The internal name for the pair - name is not unique, use pairAddress instead. |
pairAddress | The address of the pair. |
baseAddress | The address of the first underlying asset behind the pair. |
quoteAddress | The address of the last underlying asset behind the pair. |
address | The address of the asset to which this liquidity event applies (either the base or the quote address). |
timestamp | The timestamp associated with this record. |
transactionHash | The hash of the transaction which included this liquidity event. |
transactionIndex | The index of the transaction which included this liquidity event. |
logIndex | The index of the log within the transaction which included this liquidity event. |
amount | The new amount of the underlying asset after this liquidity event. |
liquidityPrice | The new price of the underlying asset after this liquidity event. |
timestampNanoseconds | The nanosecond part of the timestamp where applicable. |
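Because amount holds the new amount of the asset after each liquidity event, the current state of a pool can be read off the most recent event per asset. The sketch below works on plain dictionaries keyed by the field names above. It assumes that sorting by timestamp, then transactionIndex, then logIndex reproduces on-chain order; the function name is hypothetical.

```python
def latest_amounts(events):
    """Return {(pairAddress, address): amount} from the most recent event.

    events is an iterable of dicts keyed by the field names in the table.
    ASSUMPTION: (timestamp, transactionIndex, logIndex) reflects on-chain
    ordering of the liquidity events.
    """
    ordered = sorted(
        events,
        key=lambda e: (e["timestamp"], e["transactionIndex"], e["logIndex"]),
    )
    latest = {}
    for event in ordered:
        # Later events overwrite earlier ones for the same pair and asset.
        latest[(event["pairAddress"], event["address"])] = event["amount"]
    return latest


sample = [
    {"pairAddress": "0xpair", "address": "0xbase", "timestamp": 2,
     "transactionIndex": 0, "logIndex": 1, "amount": 120.0},
    {"pairAddress": "0xpair", "address": "0xbase", "timestamp": 1,
     "transactionIndex": 5, "logIndex": 3, "amount": 100.0},
]
print(latest_amounts(sample))
```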
FAQs
Why is delivery via S3 important?
- With data in S3, our customers can bulk download historical data in an analytics-friendly format. This allows them to dig deep into the data, perform their own proprietary research, and test trading strategies without being limited by our REST API throughput.
How does a customer get access to these datasets?
- Customers need their own AWS credentials, which we will provision for S3 access. If you are interested in downloading data via S3, please contact your Account Executive.
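Once access has been provisioned, a bulk download could look roughly like the sketch below. It uses the standard boto3 S3 client calls `get_paginator("list_objects_v2")` and `download_file`. The function name is hypothetical, and the bucket name and prefix are placeholders: the real values are not given in this document and come from your Account Executive.

```python
import os


def download_prefix(s3_client, bucket, prefix, dest_dir):
    """Download every .parquet object under prefix into dest_dir.

    s3_client is a boto3 S3 client, e.g. boto3.client("s3"), created with
    the AWS credentials that were provisioned for access.
    Returns the list of local file paths that were written.
    """
    os.makedirs(dest_dir, exist_ok=True)
    downloaded = []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages with no matching objects have no "Contents" key.
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if not key.endswith(".parquet"):
                continue
            local_path = os.path.join(dest_dir, os.path.basename(key))
            s3_client.download_file(bucket, key, local_path)
            downloaded.append(local_path)
    return downloaded


# Hypothetical usage; the bucket and prefix below are placeholders:
#   import boto3
#   download_prefix(boto3.client("s3"), "example-bucket", "example/prefix/", "data")
```

Files are saved under their base names, so objects with the same file name under different prefixes would overwrite each other; keep the full key in the local path if that matters for your layout.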
Why Parquet format instead of a GZIP compressed JSON file?
- Parquet is a columnar storage format for structured data that is optimized for querying and analysis. In Parquet format, data is stored in columns rather than rows, allowing for more efficient compression and encoding of data. This can result in significant performance improvements for analytical workloads that involve large datasets and complex queries. Parquet is widely used in big data environments for data warehousing, analytics, and machine learning applications and can be easily integrated into existing data pipelines.
How do I download a Parquet sample file and open it to see which fields are returned?
- If you only want to see the fields, download the sample Parquet file, load it as a pandas DataFrame in Python, and inspect dataframe.dtypes, which gives a quick listing of each field and its type. Here is code you can try:
# Import the pandas library
import pandas as pd
# Replace 'your_parquet_file.parquet' with the path to your Parquet file
parquet_file = 'your_parquet_file.parquet'
# Load the Parquet file as a pandas DataFrame
df = pd.read_parquet(parquet_file)
# Display the data types of the DataFrame
print(df.dtypes)
- To read the Parquet data itself after downloading the sample file, run the following Python code:
# Import the pandas library
import pandas as pd
# Replace 'your_parquet_file.parquet' with the path to your Parquet file
parquet_file = 'your_parquet_file.parquet'
# Load the Parquet file with the pyarrow engine and print the first rows
df = pd.read_parquet(parquet_file, engine='pyarrow')
print(df.head())