CloudSync Overview - Amberdata Docs

Amberdata provides multiple options for accessing large historical datasets to power advanced analytics, research, and trading strategy development. Our CloudSync solutions are designed to overcome the throughput limitations of REST APIs, enabling efficient large-scale data analysis.

Benefits

Object Storage Advantages

Large Historical Data – Access large datasets in analytics-ready formats.
Research Flexibility – Run proprietary analyses and test strategies without API rate constraints.
Cost Efficiency – Avoid repeated API calls when retrieving extensive historical data.
Pipeline Integration – Easily connect to existing ETL, ELT, and analytics workflows.

Cloud Data Warehouse Advantages

High-Performance Queries – Optimized for complex, large-scale analytical workloads.
Seamless Data Integration – Integrate with diverse data sources and workflows.
Cloud-Native Scalability – Scale storage and compute as your data needs grow.
Enterprise-Grade Security – Advanced compliance and security features.
User-Friendly SQL Access – Intuitive querying for analysts and engineers.
Built-In Transformation – Native tools for processing, cleansing, and enriching data.

Delivery Methods

Amazon S3 — Parquet

Retrieve large historical datasets from Amazon S3 in Apache Parquet, a file format optimized for performance and compatibility with analytics tools. Parquet offers several key advantages:

Columnar Storage – Stores data by column instead of row, enabling highly efficient compression and encoding.
High Performance – Queries read only the columns they need, which speeds up processing of large datasets and complex analytical queries.
Efficient Compression – Achieves better compression ratios than row-based text formats such as CSV and JSON.
Analytics-Optimized – Per-column statistics let query engines skip data that cannot match a filter.
Seamless Integration – Fits easily into existing data pipelines and big data ecosystems.
Broad Compatibility – Supported across major data warehousing, analytics, and machine learning platforms.

Snowflake Data Warehouse

Most datasets are available in Snowflake, providing scalable, cloud-native data warehousing with efficient storage, fast retrieval, and powerful SQL-based analysis.

Getting Started

Working with Parquet Files

To inspect the available fields, download a sample Parquet file, load it into a pandas DataFrame, and print the column data types:

# Import the pandas library
import pandas as pd

# Replace 'your_parquet_file.parquet' with the path to your Parquet file
parquet_file = 'your_parquet_file.parquet'

# Load the Parquet file as a pandas DataFrame
df = pd.read_parquet(parquet_file)

# Display the data types of the DataFrame
print(df.dtypes)

To read the data itself and preview the first rows:

# Import the pandas library
import pandas as pd

# Replace 'your_parquet_file.parquet' with the path to your Parquet file
parquet_file = 'your_parquet_file.parquet'

# Read and display the data
df = pd.read_parquet(parquet_file, engine='pyarrow')
print(df.head())

Access and Provisioning

Amazon S3 Access

You need your own AWS credentials before S3 access can be provisioned. Contact your Account Executive if you’re interested in downloading data via S3.

Important Access Requirements

Note: Our S3 data buckets are configured as Requester Pays buckets, meaning your company is responsible for the Amazon S3 request and data transfer fees incurred during downloads. To access the data, include one of the following with every request:

Header: x-amz-request-payer: requester (for REST API and SDK requests)

Parameter: --request-payer requester (for AWS CLI requests)

Include this setting in every request to avoid access errors.
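The following sketch lists and downloads Parquet objects with boto3, passing the Requester Pays flag on every call. boto3 sends this flag as the x-amz-request-payer header. The sketch assumes boto3 is installed and your AWS credentials are configured. BUCKET, PREFIX, and all function names are hypothetical placeholders. Use the bucket and prefix provided by your Account Executive.

```python
from pathlib import Path

BUCKET = "example-amberdata-bucket"  # hypothetical bucket name
PREFIX = "example/dataset/"  # hypothetical key prefix

# Required on every request to a Requester Pays bucket.
REQUEST_PAYER = "requester"


def list_parquet_keys(s3, bucket, prefix):
    """Return every .parquet key under prefix, following pagination."""
    paginator = s3.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix, RequestPayer=REQUEST_PAYER):
        for obj in page.get("Contents", []):
            if obj["Key"].endswith(".parquet"):
                keys.append(obj["Key"])
    return keys


def download_keys(s3, bucket, keys, dest_dir):
    """Download each key into dest_dir and return the local paths.

    Keys are flattened to their file names; adjust this if names
    collide across prefixes.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    paths = []
    for key in keys:
        local_path = dest / Path(key).name
        s3.download_file(bucket, key, str(local_path), ExtraArgs={"RequestPayer": REQUEST_PAYER})
        paths.append(local_path)
    return paths


def download_dataset(bucket=BUCKET, prefix=PREFIX, dest_dir="downloads"):
    """List and download every Parquet object under prefix using boto3."""
    import boto3  # imported here so the helpers above work without boto3

    s3 = boto3.client("s3")
    return download_keys(s3, bucket, list_parquet_keys(s3, bucket, prefix), dest_dir)
```

The helpers take the S3 client as an argument, so you can reuse them with a client configured for a specific profile or region.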

Snowflake Access

You need your own Snowflake account to receive data through Snowflake data sharing. Visit the Snowflake Marketplace to explore sample data, or contact us for full access.
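Once a share is available in your account, you can query it from Python. The following is a minimal sketch assuming snowflake-connector-python is installed. The table name, connection values, and function names are hypothetical placeholders. The preview helper uses only standard DB-API 2.0 calls, so it works with any compliant connection.

```python
# Hypothetical fully qualified table name in the shared database.
TABLE = "AMBERDATA_SHARE.EXAMPLE_SCHEMA.EXAMPLE_TABLE"


def preview(conn, table, limit=10):
    """Run a small SELECT through a DB-API 2.0 connection and return the rows.

    table is interpolated directly into the SQL, so pass only trusted identifiers.
    """
    cur = conn.cursor()
    try:
        cur.execute(f"SELECT * FROM {table} LIMIT {int(limit)}")
        return cur.fetchall()
    finally:
        cur.close()


def connect_to_snowflake():
    """Open a Snowflake connection; replace the placeholders with your details."""
    import snowflake.connector  # requires snowflake-connector-python

    return snowflake.connector.connect(
        account="your_account_identifier",
        user="your_user",
        password="your_password",
        warehouse="your_warehouse",
    )
```

For example, preview(connect_to_snowflake(), TABLE) returns up to ten rows from the shared table.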

Next Steps

Choose the delivery method that best fits your infrastructure and analytical needs:

Amazon S3: Ideal for downloading and storing large historical datasets for offline analysis
Snowflake: Ideal for interactive querying and advanced analytics with SQL

Contact your Account Executive or reach out to us to discuss which option best suits your requirements.
