Market Data - S3

We offer Amazon S3 bulk downloads to retrieve massive historical datasets for select data types, delivered in Apache Parquet format.

Market Datasets

Market Type | Feature Type | Sample Files
Spot | Order Book Snapshots | Download
Spot | Order Book Events | Download
Spot | Tickers | Download
Spot | Trades | Download
Spot | OHLCV | Download (minutely)
Futures | Funding Rates | Download
Futures | Insurance Funds | Download
Futures | Liquidations | Download
Futures | Long/Short Ratio | Download (minutely), Download (hourly), Download (daily)
Futures | Order Book Snapshots | Download
Futures | Order Book Events | Download
Futures | Open Interest | Download
Futures | Tickers | Download
Futures | Trades | Download
Options | Liquidations | Download
Options | Open Interest | Download
Options | Order Book Snapshots | Download
Options | Order Book Updates | Download
Options | Tickers | Download
Options | Trades | Download


Market Data Fields & Descriptions

Spot - Order Book Snapshots

Field | Description
exchange | The name of the exchange.
pair | The name of the asset pair.
exchangeTimestamp | The time at which the order book snapshot took place.
exchangeTimestampNanoseconds | The nanosecond part of the exchangeTimestamp where applicable.
isBid | Indicates if the order is a bid or ask: true for a bid and false for an ask.
timestamp | The time at which the order book snapshot took place.
receivedTimestamp | Timestamp when Amberdata received the order book snapshot.
receivedTimestampNanoseconds | The nanosecond part of the receivedTimestamp.
metadata | The metadata associated with the order book data.
sequence | The sequence number provided by the exchange (equal to null if it is not provided by the exchange).
data | The order book data corresponding to the columns fields.
maxPrice | The maximum price for the asset pair. Any buy orders you submit above this price will be clamped to this maximum.
minPrice | The minimum price for the asset pair. Any sell orders you submit below this price will be clamped to this minimum.
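
Several tables on this page pair a timestamp with a separate nanosecond field. As a minimal sketch, the two can be merged into a single integer for sorting or for measuring receive latency. It assumes, without confirmation from this page, that exchangeTimestamp and receivedTimestamp are epoch milliseconds and that the matching Nanoseconds fields hold the sub-millisecond remainder in nanoseconds; check a sample file before relying on it. The helper names and the sample row are hypothetical.

```python
from typing import Optional

NANOS_PER_MILLI = 1_000_000


def to_epoch_ns(timestamp_ms: int, nanos_part: Optional[int]) -> int:
    """Merge a millisecond timestamp and its nanosecond part into epoch nanoseconds.

    ASSUMPTION: timestamp_ms is epoch milliseconds and nanos_part is the
    sub-millisecond remainder in nanoseconds (None when not applicable).
    """
    return timestamp_ms * NANOS_PER_MILLI + (nanos_part or 0)


def receive_latency_ns(row: dict) -> int:
    """Nanoseconds between the exchange event and Amberdata receiving it."""
    exchange_ns = to_epoch_ns(
        row["exchangeTimestamp"], row.get("exchangeTimestampNanoseconds")
    )
    received_ns = to_epoch_ns(
        row["receivedTimestamp"], row.get("receivedTimestampNanoseconds")
    )
    return received_ns - exchange_ns


# Hypothetical row; the values are invented for illustration only.
sample_row = {
    "exchangeTimestamp": 1_700_000_000_123,
    "exchangeTimestampNanoseconds": 456_789,
    "receivedTimestamp": 1_700_000_000_130,
    "receivedTimestampNanoseconds": 0,
}
print(receive_latency_ns(sample_row))
```

Keeping the result as an integer avoids the precision loss a float would introduce at nanosecond scale.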

Spot - Order Book Events

Field | Description
exchange | The name of the exchange.
pair | The name of the asset pair.
exchangeTimestamp | The time at which the order book event took place.
exchangeTimestampNanoseconds | The nanosecond part of the exchangeTimestamp where applicable.
isBid | Indicates if the order is a bid or ask: true for a bid and false for an ask.
receivedTimestamp | Timestamp when Amberdata received the order book event.
receivedTimestampNanoseconds | The nanosecond part of the receivedTimestamp.
timestamp | Timestamp when Amberdata received the order book event.
metadata | The metadata associated with the order book data.
sequence | The sequence number provided by the exchange (equal to null if it is not provided by the exchange).
data | The order book data corresponding to the columns fields.
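
The data field is described as corresponding to the columns fields. A minimal sketch of pairing the two follows. It assumes, and this page does not confirm, that a list of column names is available (for example inside metadata) and that data is a list of rows whose values follow the same order. The function name and the column names in the usage line are invented for illustration.

```python
from typing import Any, Dict, List, Sequence


def label_rows(
    columns: Sequence[str], data: Sequence[Sequence[Any]]
) -> List[Dict[str, Any]]:
    """Turn positional order book rows into dictionaries keyed by column name.

    ASSUMPTION: every row in `data` has one value per entry in `columns`,
    in the same order.
    """
    labelled = []
    for row in data:
        if len(row) != len(columns):
            raise ValueError(f"row has {len(row)} values, expected {len(columns)}")
        labelled.append(dict(zip(columns, row)))
    return labelled


# Hypothetical column names and values.
print(label_rows(["price", "volume"], [[100.0, 2.5], [100.5, 1.0]]))
```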

Spot - Tickers

Field | Description
exchange | The name of the exchange.
pair | The name of the asset pair.
timestamp | The time at which the event occurred.
timestampNanoseconds | The nanosecond part of the timestamp where applicable.
exchangeTimestamp | The time at which the event occurred.
exchangeTimestampNanoseconds | The nanosecond part of the exchangeTimestamp where applicable.
receivedTimestamp | Timestamp when Amberdata received the ticker data.
receivedTimestampNanoseconds | The nanosecond part of the receivedTimestamp.
metadata | The metadata associated with the ticker data.
sequence | The sequence number provided by the exchange (equal to null if it is not provided by the exchange).
ask | The ask price of the market pair.
askVolume | The requested order size of all best asks.
bid | The bid price of the market pair.
bidVolume | The requested order size of all best bids.
last | The last price of the market pair.
mid | The mid price of the market pair.
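
The ticker fields bid, ask, and mid are related. As a rough sketch, a mid price and a quoted spread can be recomputed from bid and ask, which is useful as a sanity check on loaded data. This assumes mid is the arithmetic mean of bid and ask; the page does not state how the delivered mid is derived, and the function name is hypothetical.

```python
from decimal import Decimal
from typing import Tuple


def mid_and_spread(bid: Decimal, ask: Decimal) -> Tuple[Decimal, Decimal]:
    """Return (mid, spread) for one quote.

    ASSUMPTION: mid is the simple average of bid and ask.
    """
    if ask < bid:
        # A crossed quote usually signals bad or stale data; surface it.
        raise ValueError("crossed quote: ask is below bid")
    return (bid + ask) / 2, ask - bid


mid, spread = mid_and_spread(Decimal("100.0"), Decimal("100.5"))
print(mid, spread)
```

Decimal is used here so that prices read from text keep their exact value; plain floats work too if exactness is not a concern.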

Spot - Trades

Field | Description
exchange | The name of the exchange.
pair | The name of the asset pair.
exchangeTimestamp | The time at which the trade took place.
exchangeTimestampNanoseconds | The nanosecond part of the exchangeTimestamp where applicable. NOTE: This is an optional field whose value may or may not be provided by the exchanges.
tradeId | The exchange-provided ID of the trade.
receivedTimestamp | The time Amberdata received the trade data.
receivedTimestampNanoseconds | The nanosecond part of the receivedTimestamp.
metadata | The metadata associated with the trade data.
isBuySide | Indicates if the trade is a buy or sell: true for a buy and false for a sell.
price | The price at which the asset was traded.
quoteSize | Quote size at the moment of the trade. NOTE: This is an optional field whose value may or may not be provided by the exchanges.
size | The total amount of that asset that was traded.
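
As an illustration of how the trade fields combine, the sketch below computes a volume-weighted average price and a net signed size from price, size, and isBuySide. It assumes each trade is available as a mapping with those three keys (for example, one row of a loaded file converted to a dict); the function name and sample values are hypothetical.

```python
from typing import Iterable, Mapping, Tuple


def vwap_and_net_size(trades: Iterable[Mapping]) -> Tuple[float, float]:
    """Return (VWAP, net size) where buys add to net size and sells subtract."""
    notional = 0.0
    total_size = 0.0
    net_size = 0.0
    for trade in trades:
        price = float(trade["price"])
        size = float(trade["size"])
        notional += price * size
        total_size += size
        net_size += size if trade["isBuySide"] else -size
    if total_size == 0:
        raise ValueError("no traded size to average over")
    return notional / total_size, net_size


# Invented sample trades.
sample_trades = [
    {"isBuySide": True, "price": 100.0, "size": 2.0},
    {"isBuySide": False, "price": 101.0, "size": 1.0},
]
print(vwap_and_net_size(sample_trades))
```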

Futures - Funding Rates

Field | Description
exchange | The name of the exchange.
instrument | The name of the instrument.
exchangeTimestamp | The time at which the event occurred.
exchangeTimestampNanoseconds | The nanosecond part of the exchangeTimestamp where applicable.
receivedTimestamp | Timestamp when Amberdata received the data.
receivedTimestampNanoseconds | The nanosecond part of the receivedTimestamp.
fundingInterval | The funding interval for which data is available.
fundingRate | The funding rate for which data is available.
nextFundingRate | The next funding rate for which data is available.
nextFundingTime | The next funding time for which data is available.
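
A common use of fundingRate together with fundingInterval is a simple (non-compounded) annualization. The sketch below is only an illustration: this page does not state the units of either field, so it assumes fundingRate is a decimal fraction paid once per interval and leaves it to the caller to express the interval in hours. The function name is hypothetical.

```python
HOURS_PER_YEAR = 24 * 365


def annualized_funding_rate(funding_rate: float, interval_hours: float) -> float:
    """Scale a per-interval funding rate to a simple yearly rate.

    ASSUMPTION: funding_rate is a fraction (0.0001 means 0.01%) charged once
    every interval_hours hours, with no compounding.
    """
    if interval_hours <= 0:
        raise ValueError("interval_hours must be positive")
    periods_per_year = HOURS_PER_YEAR / interval_hours
    return funding_rate * periods_per_year


# An 8-hour interval gives 1095 funding periods per year.
print(annualized_funding_rate(0.0001, 8))
```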

Futures - Insurance Funds

Field | Description
exchange | The name of the exchange.
instrument | The name of the instrument.
exchangeTimestamp | The time at which the event occurred.
exchangeTimestampNanoseconds | The nanosecond part of the exchangeTimestamp where applicable.
receivedTimestamp | Timestamp when Amberdata received the data.
receivedTimestampNanoseconds | The nanosecond part of the receivedTimestamp.
fund | The insurance fund amount.
underlying | The underlying asset for the instrument.

Futures - Liquidations

Field | Description
exchange | The name of the exchange.
instrument | The name of the instrument.
exchangeTimestamp | The time at which the event occurred.
exchangeTimestampNanoseconds | The nanosecond part of the exchangeTimestamp where applicable.
id |
receivedTimestamp | Timestamp when Amberdata received the data.
receivedTimestampNanoseconds | The nanosecond part of the receivedTimestamp.
timestamp | The time at which the liquidation occurred.
action |
orderId |
price | The price of the instrument at the time of the liquidation.
side | The direction of the trade.
status | The status of the liquidation.
timeInForce | How long the order is to remain active before it is executed or expires, for example: IOC (immediate-or-cancel), FOK (fill-or-kill), GTC (good-'til-canceled), etc.
type | The type of liquidation.
volume |

Futures - Long/Short Ratio

Field | Description
exchange | The name of the exchange.
instrument | The name of the instrument.
exchangeTimestamp | The time at which the event occurred.
exchangeTimestampNanoseconds | The nanosecond part of the exchangeTimestamp where applicable.
receivedTimestamp | Timestamp when Amberdata received the data.
receivedTimestampNanoseconds | The nanosecond part of the receivedTimestamp.
period | The number of the period.
longAccount | The long account number ratio of all traders.
ratio | The long/short account number ratio of all traders.
shortAccount | The short account number ratio of all traders.
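
The three ratio fields can be cross-checked against each other after loading. The sketch below tests whether ratio matches longAccount divided by shortAccount within a tolerance. That relationship is an assumption drawn from the field descriptions, not something this page states, and the function name and sample values are hypothetical.

```python
import math
from typing import Mapping


def ratio_is_consistent(row: Mapping, rel_tol: float = 1e-3) -> bool:
    """Check that ratio is approximately longAccount / shortAccount.

    ASSUMPTION: ratio is defined as longAccount divided by shortAccount.
    """
    short = row["shortAccount"]
    if short == 0:
        # The quotient is undefined, so the row cannot be verified.
        return False
    return math.isclose(row["longAccount"] / short, row["ratio"], rel_tol=rel_tol)


# Invented sample row.
print(ratio_is_consistent({"longAccount": 0.6, "shortAccount": 0.4, "ratio": 1.5}))
```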

Futures - Order Book Updates

Field | Description
exchange | The name of the exchange.
instrument | The name of the instrument.
exchangeTimestamp | The time at which the event occurred.
exchangeTimestampNanoseconds | The nanosecond part of the exchangeTimestamp where applicable.
isBid | true if the order is a bid, false otherwise.
receivedTimestamp | Timestamp when Amberdata received the data.
receivedTimestampNanoseconds | The nanosecond part of the receivedTimestamp.
timestamp | The time at which the event occurred.
sequence | The sequence number (equal to null if it is not provided by the exchange).
data | The order book data corresponding to the columns fields, aggregated by exchange.
status |

Futures - Open Interest

Field | Description
exchange | The name of the exchange.
instrument | The name of the instrument.
exchangeTimestamp | The time at which the event occurred.
exchangeTimestampNanoseconds | The nanosecond part of the exchangeTimestamp where applicable.
receivedTimestamp | Timestamp when Amberdata received the data.
receivedTimestampNanoseconds | The nanosecond part of the receivedTimestamp.
type | The type of instrument.
value | The total outstanding number of contracts.

Futures - Tickers

Field | Description
exchange | The name of the exchange.
instrument | The name of the instrument.
timestamp | Timestamp when Amberdata received the data.
timestampNanoseconds | The nanosecond part of the timestamp.
exchangeTimestamp | The time at which the event occurred.
exchangeTimestampNanoseconds | The nanosecond part of the exchangeTimestamp where applicable.
receivedTimestamp | Timestamp when Amberdata received the data.
receivedTimestampNanoseconds | The nanosecond part of the receivedTimestamp.
sequence | The sequence number (equal to null if it is not provided by the exchange).
ask | The ask price for the instrument.
askVolume | The requested order size of all best asks.
baseVolume |
bid | The bid price for the instrument.
bidVolume | The requested order size of all best bids.
last | The last price for the instrument.
mid | The mid price for the instrument.
quoteVolume |
markPrice |
lastVolume |

Futures - Trades

Field | Description
exchange | The name of the exchange.
instrument | The name of the instrument.
exchangeTimestamp | The time at which the event occurred.
exchangeTimestampNanoseconds | The nanosecond part of the exchangeTimestamp where applicable.
tradeId | The exchange-provided ID of the trade.
receivedTimestamp | Timestamp when Amberdata received the data.
receivedTimestampNanoseconds | The nanosecond part of the receivedTimestamp.
timestamp | The time at which the event occurred.
isBuySide | true if the trade is a buy, false otherwise.
price | The price at which the asset was traded.
quoteVolume |
volume | The total amount of that asset that was traded.
sequence |
isLiquidation |

Options - Liquidations

Field | Description
exchange
instrument
exchangeTimestamp
exchangeTimestampNanoseconds
id
volume
sequence
receivedTimestamp
receivedTimestampNanoseconds
metadata
action
orderId
price
side
status
timeInForce
type

Options - Open Interest

Field | Description
exchange
instrument
exchangeTimestamp
exchangeTimestampNanoseconds
receivedTimestamp
receivedTimestampNanoseconds
type
value

Options - Order Book Snapshots

Field | Description
exchange
instrument
exchangeTimestamp
exchangeTimestampNanoseconds
sequence
receivedTimestamp
receivedTimestampNanoseconds
metadata
stats
state
asks
askIv
bids
bidIv
bestAskAmount
bestAskPrice
bestBidAmount
bestBidPrice
estimatedDeliveryPrice
greeks
indexPrice
interestRate
lastPrice
markIv
markPrice
maxPrice
minPrice
openInterest
underlyingIndex
underlyingPrice

Options - Order Book Updates

Field | Description
exchange
instrument
exchangeTimestamp
exchangeTimestampNanoseconds
isBid
receivedTimestamp
receivedTimestampNanoseconds
timestamp
metadata
sequence
data
status

Options - Tickers

Field | Description
exchange
instrument
exchangeTimestamp
exchangeTimestampNanoseconds
sequence
receivedTimestamp
receivedTimestampNanoseconds
stats
state
ask
askIv
askVolume
baseVolume
bid
bidIv
bidVolume
quoteVolume
lastVolume
estimatedDeliveryPrice
greeks
indexPrice
interestRate
mid
last
markIv
markPrice
maxPrice
minPrice
openInterest
settlementPrice
underlyingIndex
underlyingPrice
open24H
high24H
low24H

Options - Trades

Field | Description
exchange
instrument
exchangeTimestamp
exchangeTimestampNanoseconds
tradeId
isBuySide
receivedTimestamp
receivedTimestampNanoseconds
metadata
sequence
price
quoteSize
size
tickDirection
markPrice
iv
indexPrice

CME Only

Futures - Open Interest (CME only)

Field | Description
exchange | The name of the exchange.
symbol | Instrument Name or Symbol.
exchangeTimestamp | Timestamp provided by CME.
exchangeTimestampNanoseconds | The nanosecond part of the exchangeTimestamp where applicable.
receivedTimestamp | The time Amberdata received the data.
receivedTimestampNanoseconds | The nanosecond part of the receivedTimestamp.
sentTime | Time the CME MDP gateway sends the message (UTC).
tradeStatistics | Object (sub-fields listed below).
tradeStatistics.openInterest | The total open interest for the market at the close of the prior trading session.
tradeStatistics.clearedVolume | Cleared volume quantity.
tradeStatistics.settlementFinal | Final settlement price.
tradeStatistics.settlementPrice | Settlement price.
tradeStatistics.openInterestDate | Open interest trade date. Format: "YYYY-MM-DD"
tradeStatistics.settlementActual | Actual settlement price.
tradeStatistics.clearedVolumeDate | Cleared volume date.
tradeStatistics.settlementRounded | Rounded settlement price.
tradeStatistics.settlementPriceDate | Date of trade session corresponding to a statistic entry. Format: "YYYY-MM-DD"
tradeStatistics.openInterestTimestamp | Open interest update time.
tradeStatistics.clearedVolumeTimestamp | Cleared volume time.
tradeStatistics.settlementPriceTimestamp | Time of trade session corresponding to a statistic entry. Format: "yyyy-MM-dd'T'HH:mm:ss.n"
instrument | Object (sub-fields listed below).
instrument.id | Unique instrument ID as qualified by the exchange per market segment. The unique instrument ID value will not be reused until the next trade date following an instrument expiration or deletion.
instrument.symbol | Instrument Name or Symbol.
instrument.periodCode | The calendar month reflected in the instrument symbol. Format: YYYYMM (e.g., 201912). For futures spreads, this field contains the first leg's calendar month reflected in the instrument symbol.
instrument.exchangeMic | Exchange used to identify a security.
instrument.productCode | String field that indicates the underlying asset code (Product Code). Example: SR1 (SOFR), ES (E-Minis).
instrument.productType | Product Type.
instrument.productGroup | Product Group Code.
instrument.marketSegmentId | Identifies the market segment. Populated for all CME Globex instruments.
instrument.definitionSource | Identifies user-defined instruments. If the tag is not present, the instrument is not user-defined.
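
The CME tables give explicit formats for periodCode (YYYYMM) and for the date fields ("YYYY-MM-DD"). A minimal sketch of parsing both with the Python standard library follows. Whether periodCode arrives as a string or an integer is not stated here, so the helper accepts either; the function names are hypothetical.

```python
from datetime import date, datetime
from typing import Union


def parse_period_code(period_code: Union[str, int]) -> date:
    """Parse a YYYYMM period code into the first day of that month."""
    text = str(period_code)
    if len(text) != 6 or not text.isdigit():
        raise ValueError(f"expected YYYYMM, got {period_code!r}")
    # strptime rejects impossible months such as 13.
    return datetime.strptime(text, "%Y%m").date()


def parse_trade_date(value: str) -> date:
    """Parse a "YYYY-MM-DD" date such as openInterestDate."""
    return date.fromisoformat(value)


print(parse_period_code("201912"), parse_trade_date("2019-12-13"))
```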

Futures - Tickers (CME only)

Field | Description
exchange | The name of the exchange.
symbol | Instrument Name or Symbol.
exchangeTimestamp | Timestamp provided by CME.
exchangeTimestampNanoseconds | The nanosecond part of the exchangeTimestamp where applicable.
receivedTimestamp | The time Amberdata received the data.
receivedTimestampNanoseconds | The nanosecond part of the receivedTimestamp.
sentTime | Time the CME MDP gateway sends the message (UTC).
tradingStatus | Identifies the trading status applicable to the instrument or product group.
instrument | Object (sub-fields listed below).
instrument.id | Unique instrument ID as qualified by the exchange per market segment. The unique instrument ID value will not be reused until the next trade date following an instrument expiration or deletion.
instrument.symbol | Instrument Name or Symbol.
instrument.periodCode | The calendar month reflected in the instrument symbol. Format: YYYYMM (e.g., 201912). For futures spreads, this field contains the first leg's calendar month reflected in the instrument symbol.
instrument.exchangeMic | Exchange used to identify a security.
instrument.productCode | String field that indicates the underlying asset code (Product Code). Example: SR1 (SOFR), ES (E-Minis).
instrument.productType | Product Type.
instrument.productGroup | Product Group Code.
instrument.marketSegmentId | Identifies the market segment. Populated for all CME Globex instruments.
instrument.definitionSource | Identifies user-defined instruments. If the tag is not present, the instrument is not user-defined.
askLevel | Object (sub-fields listed below).
askLevel.qty | Order quantity.
askLevel.price | Price of the MD Entry.
askLevel.orderCnt | Aggregate number of orders at the given price level.
askLevel.lastUpdateTime | Last update time for ask price. Format: "yyyy-MM-dd'T'HH:mm:ss.n"
bidLevel | Object (sub-fields listed below).
bidLevel.qty | Order quantity.
bidLevel.price | Price of the MD Entry.
bidLevel.orderCnt | Aggregate number of orders at the given price level.
bidLevel.lastUpdateTime | Last update time for bid price. Format: "yyyy-MM-dd'T'HH:mm:ss.n"

Futures - Trades (CME only)

Field | Description
exchange | The name of the exchange.
symbol | Instrument Name or Symbol.
exchangeTimestamp | Timestamp provided by CME.
exchangeTimestampNanoseconds | The nanosecond part of the exchangeTimestamp where applicable.
receivedTimestamp | The time Amberdata received the data.
receivedTimestampNanoseconds | The nanosecond part of the receivedTimestamp.
sentTime | Time the CME MDP gateway sends the message (UTC).
tradeSummary | Object (sub-fields listed below).
tradeSummary.orderQty | Object (sub-fields listed below).
tradeSummary.orderQty.orderId | Unique ID assigned by CME Globex to identify orders.
tradeSummary.orderQty.lastOrdQty | Quantity of trade.
tradeSummary.tradeQty | Total traded quantity.
tradeSummary.tradePrice | Trade price.
tradeSummary.aggressorSide | Indicates which side is the aggressor of the trade. A zero value means there is no aggressor. Trades without aggressors occur: at Market Open; after a Pre-Open or after a Pause; when the event includes customer order participation in a trade with a CME Globex-generated implied bid or offer.
tradeSummary.mdTradeEntryId | Common Trade ID that links each trade execution.
tradeSummary.tradeOrderCount | Identifies the total number of non-implied orders per instrument that participated in a match event.
tradeSummary.tradeUpdateAction | Trade market data update action.
instrument | Object (sub-fields listed below).
instrument.id | Unique instrument ID as qualified by the exchange per market segment. The unique instrument ID value will not be reused until the next trade date following an instrument expiration or deletion.
instrument.symbol | Instrument Name or Symbol.
instrument.periodCode | The calendar month reflected in the instrument symbol. Format: YYYYMM (e.g., 201912). For futures spreads, this field contains the first leg's calendar month reflected in the instrument symbol.
instrument.exchangeMic | Exchange used to identify a security.
instrument.productCode | String field that indicates the underlying asset code (Product Code). Example: SR1 (SOFR), ES (E-Minis).
instrument.productType | Product Type.
instrument.productGroup | Product Group Code.
instrument.marketSegmentId | Identifies the market segment. Populated for all CME Globex instruments.
instrument.definitionSource | Identifies user-defined instruments. If the tag is not present, the instrument is not user-defined.
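
The aggressorSide description only spells out the zero value. The sketch below turns the numeric code into a label. The meanings of 1 (buy) and 2 (sell) are an assumption taken from the usual CME MDP 3.0 AggressorSide convention and are not stated on this page, so verify them against a sample file. The function name is hypothetical.

```python
from typing import Optional, Union

# ASSUMPTION: 1 = buy and 2 = sell, following the CME MDP 3.0 AggressorSide
# convention. Only the meaning of 0 (no aggressor) is stated in this document.
_AGGRESSOR_LABELS = {0: None, 1: "buy", 2: "sell"}


def aggressor_label(aggressor_side: Union[int, str]) -> Optional[str]:
    """Map an aggressorSide code to "buy", "sell", or None for no aggressor."""
    try:
        return _AGGRESSOR_LABELS[int(aggressor_side)]
    except KeyError:
        raise ValueError(
            f"unexpected aggressorSide value: {aggressor_side!r}"
        ) from None


print(aggressor_label(0), aggressor_label(1), aggressor_label(2))
```

Raising on unknown codes, rather than returning None, keeps "no aggressor" distinguishable from "value not understood".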

Options - Open Interest (CME only)

Field | Description
exchange | The name of the exchange.
symbol | Instrument Name or Symbol.
exchangeTimestamp | Timestamp provided by CME.
exchangeTimestampNanoseconds | The nanosecond part of the exchangeTimestamp where applicable.
receivedTimestamp | The time Amberdata received the data.
receivedTimestampNanoseconds | The nanosecond part of the receivedTimestamp.
sentTime | Time the CME MDP gateway sends the message (UTC).
tradeStatistics | Object (sub-fields listed below).
tradeStatistics.openInterest | The total open interest for the market at the close of the prior trading session.
tradeStatistics.clearedVolume | Cleared volume quantity.
tradeStatistics.settlementFinal | Final settlement price.
tradeStatistics.settlementPrice | Settlement price.
tradeStatistics.openInterestDate | Open interest trade date. Format: "YYYY-MM-DD"
tradeStatistics.settlementActual | Actual settlement price.
tradeStatistics.clearedVolumeDate | Cleared volume date.
tradeStatistics.settlementRounded | Rounded settlement price.
tradeStatistics.settlementPriceDate | Date of trade session corresponding to a statistic entry. Format: "YYYY-MM-DD"
tradeStatistics.openInterestTimestamp | Open interest update time.
tradeStatistics.clearedVolumeTimestamp | Cleared volume time.
tradeStatistics.settlementPriceTimestamp | Time of trade session corresponding to a statistic entry. Format: "yyyy-MM-dd'T'HH:mm:ss.n"
instrument | Object (sub-fields listed below).
instrument.id | Unique instrument ID as qualified by the exchange per market segment. The unique instrument ID value will not be reused until the next trade date following an instrument expiration or deletion.
instrument.symbol | Instrument Name or Symbol.
instrument.periodCode | The calendar month reflected in the instrument symbol. Format: YYYYMM (e.g., 201912). For futures spreads, this field contains the first leg's calendar month reflected in the instrument symbol.
instrument.exchangeMic | Exchange used to identify a security.
instrument.productCode | String field that indicates the underlying asset code (Product Code). Example: SR1 (SOFR), ES (E-Minis).
instrument.productType | Product Type.
instrument.productGroup | Product Group Code.
instrument.marketSegmentId | Identifies the market segment. Populated for all CME Globex instruments.
instrument.definitionSource | Identifies user-defined instruments. If the tag is not present, the instrument is not user-defined.

Options - Tickers (CME only)

Field | Description
exchange | The name of the exchange.
symbol | Instrument Name or Symbol.
exchangeTimestamp | Timestamp provided by CME.
exchangeTimestampNanoseconds | The nanosecond part of the exchangeTimestamp where applicable.
receivedTimestamp | The time Amberdata received the data.
receivedTimestampNanoseconds | The nanosecond part of the receivedTimestamp.
sentTime | Time the CME MDP gateway sends the message (UTC).
tradingStatus | Identifies the trading status applicable to the instrument or product group.
instrument | Object (sub-fields listed below).
instrument.id | Unique instrument ID as qualified by the exchange per market segment. The unique instrument ID value will not be reused until the next trade date following an instrument expiration or deletion.
instrument.symbol | Instrument Name or Symbol.
instrument.putOrCall | Indicates whether an option instrument is a put or call.
instrument.periodCode | The calendar month reflected in the instrument symbol. Format: YYYYMM (e.g., 201912). For futures spreads, this field contains the first leg's calendar month reflected in the instrument symbol.
instrument.exchangeMic | Exchange used to identify a security.
instrument.productCode | An exchange-specific code assigned to a group of related securities, which are concurrently affected by market events.
instrument.productType | Product Type.
instrument.strikePrice | The strike price.
instrument.productGroup | Product Group Code.
instrument.marketSegmentId | Identifies the market segment. Populated for all CME Globex instruments.
instrument.definitionSource | Identifies user-defined instruments. If the tag is not present, the instrument is not user-defined.
instrument.underlyingSymbol | Underlying Instrument Symbol (Contract Name). This value will be the same as that contained in the Leg Instrument's Security Definition Tag 55-Symbol.
askLevel | Object (sub-fields listed below).
askLevel.qty | Order quantity.
askLevel.price | Price of the MD Entry.
askLevel.orderCnt | Aggregate number of orders at the given price level.
askLevel.lastUpdateTime | Last update time for ask price. Format: "yyyy-MM-dd'T'HH:mm:ss.n"
bidLevel | Object (sub-fields listed below).
bidLevel.qty | Order quantity.
bidLevel.price | Price of the MD Entry.
bidLevel.orderCnt | Aggregate number of orders at the given price level.
bidLevel.lastUpdateTime | Last update time for bid price. Format: "yyyy-MM-dd'T'HH:mm:ss.n"

Options - Trades (CME only)

Field | Description
exchange | The name of the exchange.
symbol | Instrument Name or Symbol.
exchangeTimestamp | Timestamp provided by CME.
exchangeTimestampNanoseconds | The nanosecond part of the exchangeTimestamp where applicable.
receivedTimestamp | The time Amberdata received the data.
receivedTimestampNanoseconds | The nanosecond part of the receivedTimestamp.
sentTime | Time the CME MDP gateway sends the message (UTC).
tradeSummary | Object (sub-fields listed below).
tradeSummary.orderQty | Object (sub-fields listed below).
tradeSummary.orderQty.orderId | Unique ID assigned by CME Globex to identify orders.
tradeSummary.orderQty.lastOrdQty | Quantity of trade.
tradeSummary.tradeQty | Total traded quantity.
tradeSummary.tradePrice | Trade price.
tradeSummary.aggressorSide | Indicates which side is the aggressor of the trade. A zero value means there is no aggressor. Trades without aggressors occur: at Market Open; after a Pre-Open or after a Pause; when the event includes customer order participation in a trade with a CME Globex-generated implied bid or offer.
tradeSummary.mdTradeEntryId | Common Trade ID that links each trade execution.
tradeSummary.tradeOrderCount | Identifies the total number of non-implied orders per instrument that participated in a match event.
tradeSummary.tradeUpdateAction | Trade market data update action.
instrument | Object (sub-fields listed below).
instrument.id | Unique instrument ID as qualified by the exchange per market segment. The unique instrument ID value will not be reused until the next trade date following an instrument expiration or deletion.
instrument.symbol | Instrument Name or Symbol.
instrument.putOrCall | Indicates whether an option instrument is a put or call.
instrument.periodCode | The calendar month reflected in the instrument symbol. For futures spreads, this field contains the first leg's calendar month reflected in the instrument symbol.
instrument.exchangeMic | Exchange used to identify a security.
instrument.productCode | String field that indicates the underlying asset code (Product Code). Example: SR1 (SOFR), ES (E-Minis).
instrument.productType | Product Type.
instrument.strikePrice | The strike price.
instrument.productGroup | Product Group Code.
instrument.marketSegmentId | Identifies the market segment. Populated for all CME Globex instruments.
instrument.definitionSource | Identifies user-defined instruments. If the tag is not present, the instrument is not user-defined.
instrument.underlyingSymbol | Underlying Instrument Symbol (Contract Name). This value will be the same as that contained in the Leg Instrument's Security Definition Tag 55-Symbol.
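
The CME tables name nested fields with dotted paths such as tradeSummary.orderQty.orderId. As a small sketch, a dotted name can be resolved against a record once it has been loaded into nested Python dictionaries. That representation is an assumption: depending on how the file is read, nested objects may instead appear as structs, lists, or flattened columns. The function name and sample record are hypothetical.

```python
from collections.abc import Mapping
from typing import Any


def get_path(record: Mapping, dotted_name: str, default: Any = None) -> Any:
    """Follow a dotted field name through nested mappings.

    Returns `default` as soon as any step along the path is missing or is
    not a mapping.
    """
    current: Any = record
    for key in dotted_name.split("."):
        if not isinstance(current, Mapping) or key not in current:
            return default
        current = current[key]
    return current


# Invented record for illustration.
sample_record = {"instrument": {"putOrCall": "call", "strikePrice": 100}}
print(get_path(sample_record, "instrument.strikePrice"))
print(get_path(sample_record, "tradeSummary.orderQty.orderId"))
```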


FAQs

Why is delivery via S3 important?

  • With data in S3, our customers can bulk download historical data in an analytics-friendly format. This lets them dig deep into the data, perform their own proprietary research, and test trading strategies without being limited by our REST API throughput.

How does a customer get access to these datasets?

  • Customers need their own AWS credentials, which we will provision for S3 access. If you are interested in downloading data via S3, please contact your Account Executive.

Why Parquet format instead of a GZIP compressed JSON file?

  • Parquet is a columnar storage format for structured data that is optimized for querying and analysis. In Parquet format, data is stored in columns rather than rows, allowing for more efficient compression and encoding of data. This can result in significant performance improvements for analytical workloads that involve large datasets and complex queries. Parquet is widely used in big data environments for data warehousing, analytics, and machine learning applications and can be easily integrated into existing data pipelines.

How do I download a parquet sample file and open it to see which fields are returned?

  • If you only want to see the fields, download the sample Parquet file, load it as a pandas DataFrame in Python, and print its dtypes attribute for a quick listing of the fields and their types. Here is code you can try:
# Import the pandas library
import pandas as pd

# Replace 'your_parquet_file.parquet' with the path to your Parquet file
parquet_file = 'your_parquet_file.parquet'

# Load the Parquet file as a pandas DataFrame
df = pd.read_parquet(parquet_file)

# Display the data types of the DataFrame
print(df.dtypes)
  • To read the Parquet data itself, once you've downloaded the sample Parquet file, you can run the following Python code:
# Import the pandas library
import pandas as pd

# Replace 'your_parquet_file.parquet' with the path to your Parquet file
parquet_file = 'your_parquet_file.parquet'

# Load the file with the pyarrow engine and print the first rows
df = pd.read_parquet(parquet_file, engine='pyarrow')
print(df.head())