
# DataStore: Pandas-compatible API with SQL optimization

> DataStore provides a pandas-compatible API with SQL optimization for high-performance data analysis

export const Image = ({img, alt, size}) => {
  return <Frame>
      <img src={img} alt={alt} />
    </Frame>;
};

DataStore is chDB's pandas-compatible API. It combines the familiar pandas DataFrame interface with SQL query optimization, so you can write pandas-style code while getting ClickHouse performance.

<h2 id="key-features">
  Key features
</h2>

* **Pandas Compatibility**: 209 pandas DataFrame methods, 56 `.str` methods, 42+ `.dt` methods
* **SQL Optimization**: Operations automatically compile to optimized SQL queries
* **Lazy Evaluation**: Operations are deferred until results are needed
* **630+ API Methods**: Comprehensive API surface for data manipulation
* **ClickHouse Extensions**: Additional accessors (`.arr`, `.json`, `.url`, `.ip`, `.geo`) not available in pandas

<h2 id="architecture">
  Architecture
</h2>

<Image size="md" img="https://mintcdn.com/private-7c7dfe99-home-button/iOJBcv5hu2da8lsi/images/chdb/datastore_architecture.png?fit=max&auto=format&n=iOJBcv5hu2da8lsi&q=85&s=e7c3409dd7781cd6ad0b6028ca3d7d79" alt="DataStore Architecture" width="2816" height="1536" data-path="images/chdb/datastore_architecture.png" />

DataStore uses **lazy evaluation** with **dual-engine execution**:

1. **Lazy Operation Chain**: Operations are recorded, not executed immediately
2. **Smart Engine Selection**: The QueryPlanner routes each segment to the optimal engine (chDB for SQL, pandas for complex operations)
3. **Intermediate Caching**: Results are cached at each step for fast iterative exploration

See [Execution Model](/products/chdb/datastore/execution-model) for details.
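The three stages above can be illustrated with a small, self-contained toy model in plain Python. This is a conceptual sketch only, not DataStore's implementation: `LazyChain`, `SQL_OPS`, and the prefix-keyed cache are hypothetical names chosen for illustration. Method calls only record steps, `plan()` groups consecutive steps by the engine that could run them, and `collect()` executes the chain while reusing any prefix that was already computed.

```python
from typing import Any, Callable, Optional

Row = dict[str, Any]
Op = tuple[str, Callable[[list[Row]], list[Row]]]

# Assumption for this toy model: these step names are "SQL-translatable";
# every other step is routed to a Python (pandas-like) engine.
SQL_OPS = {"filter", "sort", "limit"}


class LazyChain:
    """Toy lazy operation chain with prefix caching (not chDB's implementation)."""

    def __init__(self, rows: list[Row], ops: tuple[Op, ...] = (),
                 cache: Optional[dict] = None, stats: Optional[dict] = None):
        self._rows = rows
        self._ops = ops  # recorded steps; nothing is executed here
        self._cache = {} if cache is None else cache
        self.stats = {"executed": 0} if stats is None else stats

    def _then(self, name: str, fn: Callable[[list[Row]], list[Row]]) -> "LazyChain":
        # Derived chains share the source rows, the cache, and the counters.
        return LazyChain(self._rows, self._ops + ((name, fn),), self._cache, self.stats)

    def filter(self, pred: Callable[[Row], bool]) -> "LazyChain":
        return self._then("filter", lambda rs: [r for r in rs if pred(r)])

    def sort(self, key: str, reverse: bool = False) -> "LazyChain":
        return self._then("sort", lambda rs: sorted(rs, key=lambda r: r[key], reverse=reverse))

    def limit(self, n: int) -> "LazyChain":
        return self._then("limit", lambda rs: rs[:n])

    def apply(self, fn: Callable[[Row], Row]) -> "LazyChain":
        return self._then("apply", lambda rs: [fn(r) for r in rs])

    def plan(self) -> list[tuple[str, list[str]]]:
        """Group consecutive steps into (engine, [step names]) segments."""
        segments: list[tuple[str, list[str]]] = []
        for name, _ in self._ops:
            engine = "sql" if name in SQL_OPS else "python"
            if segments and segments[-1][0] == engine:
                segments[-1][1].append(name)
            else:
                segments.append((engine, [name]))
        return segments

    def collect(self) -> list[Row]:
        """Run the recorded steps, reusing any cached prefix of the chain."""
        rows = self._rows
        for i in range(len(self._ops)):
            prefix = self._ops[: i + 1]
            if prefix not in self._cache:
                self._cache[prefix] = self._ops[i][1](rows)
                self.stats["executed"] += 1
            rows = self._cache[prefix]
        return list(rows)


if __name__ == "__main__":
    people = [{"name": "a", "age": 30}, {"name": "b", "age": 20}, {"name": "c", "age": 40}]
    adults = LazyChain(people).filter(lambda r: r["age"] > 25)  # recorded, not run
    top = adults.sort("age", reverse=True).limit(1)
    print(top.plan())     # [('sql', ['filter', 'sort', 'limit'])]
    print(top.collect())  # [{'name': 'c', 'age': 40}]
```

Because derived chains share one cache keyed by the recorded prefix, collecting both `adults.sort(...)` and `adults.apply(...)` runs the shared filter step only once, which is the kind of reuse that makes iterative exploration cheap.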

<h2 id="migration">
  One-line migration from pandas
</h2>

```python theme={null}
# Before (pandas)
import pandas as pd
df = pd.read_csv("data.csv")
result = df[df['age'] > 25].groupby('city')['salary'].mean()

# After (DataStore) - just change the import!
from chdb import datastore as pd
df = pd.read_csv("data.csv")
result = df[df['age'] > 25].groupby('city')['salary'].mean()
```

Your existing pandas code works unchanged, but now runs on the ClickHouse engine.

<h2 id="performance">
  Performance comparison
</h2>

DataStore delivers significant performance improvements over pandas, especially for aggregation and complex pipelines:

| Operation        | Pandas  | DataStore | Speedup    |
| ---------------- | ------- | --------- | ---------- |
| GroupBy count    | 347ms   | 17ms      | **19.93x** |
| Complex pipeline | 2,047ms | 380ms     | **5.39x**  |
| Filter+Sort+Head | 1,537ms | 350ms     | **4.40x**  |
| GroupBy agg      | 406ms   | 141ms     | **2.88x**  |

*Benchmark on 10M rows. See [benchmark script](https://github.com/chdb-io/chdb/blob/main/refs/benchmark_datastore_vs_pandas.py) and [Performance Guide](/products/chdb/guides/pandas-performance) for details.*
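To run a similar comparison on your own data, a minimal timing harness can help. This is a generic sketch, not the linked benchmark script; `median_ms` and `speedup` are hypothetical helpers. Taking the median of several runs after a warm-up reduces noise, and the speedup is the baseline time divided by the candidate time, which is consistent with the table (for the complex pipeline row, 2,047 ms / 380 ms ≈ 5.39).

```python
import statistics
import time
from typing import Callable


def median_ms(fn: Callable[[], object], repeat: int = 5, warmup: int = 1) -> float:
    """Median wall-clock time of fn() in milliseconds, after warm-up runs."""
    for _ in range(warmup):
        fn()  # warm caches, imports, JIT-like effects before measuring
    samples = []
    for _ in range(repeat):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """How many times faster the candidate is (baseline / candidate)."""
    if candidate_ms <= 0:
        raise ValueError("candidate time must be positive")
    return baseline_ms / candidate_ms


if __name__ == "__main__":
    print(round(speedup(2047, 380), 2))  # 5.39
```

Pass zero-argument callables: one wrapping the pandas pipeline, and one wrapping the DataStore pipeline followed by `.to_df()`. Materializing the result matters, because with lazy evaluation an unexecuted chain would only measure the cost of recording operations.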

<h2 id="when-to-use">
  When to use DataStore
</h2>

**Use DataStore when:**

* Working with large datasets (millions of rows)
* Performing aggregations and groupby operations
* Querying data from files, databases, or cloud storage
* Building complex data pipelines
* You want the pandas API with better performance

**Use raw SQL API when:**

* You prefer writing SQL directly
* You need fine-grained control over query execution
* Working with ClickHouse-specific features not exposed in the pandas API

<h2 id="comparison">
  Feature comparison
</h2>

| Feature                   | Pandas | Polars  | DuckDB | DataStore        |
| ------------------------- | ------ | ------- | ------ | ---------------- |
| Pandas API compatible     | -      | Partial | No     | **Full**         |
| Lazy evaluation           | No     | Yes     | Yes    | **Yes**          |
| SQL query support         | No     | Yes     | Yes    | **Yes**          |
| ClickHouse functions      | No     | No      | No     | **Yes**          |
| String/DateTime accessors | Yes    | Yes     | No     | **Yes + extras** |
| Array/JSON/URL/IP/Geo     | No     | Partial | No     | **Yes**          |
| Direct file queries       | No     | Yes     | Yes    | **Yes**          |
| Cloud storage support     | No     | Limited | Yes    | **Yes**          |

<h2 id="api-stats">
  API statistics
</h2>

| Category              | Count    | Coverage                           |
| --------------------- | -------- | ---------------------------------- |
| DataFrame methods     | 209      | 100% of pandas                     |
| Series.str accessor   | 56       | 100% of pandas                     |
| Series.dt accessor    | 42+      | 100%+ (includes ClickHouse extras) |
| Series.arr accessor   | 37       | ClickHouse-specific                |
| Series.json accessor  | 13       | ClickHouse-specific                |
| Series.url accessor   | 15       | ClickHouse-specific                |
| Series.ip accessor    | 9        | ClickHouse-specific                |
| Series.geo accessor   | 14       | ClickHouse-specific                |
| **Total API methods** | **630+** | -                                  |

<h2 id="navigation">
  Documentation navigation
</h2>

<h3 id="getting-started">
  Getting started
</h3>

* [Quickstart](/products/chdb/datastore/quickstart) - Installation and basic usage
* [Migration from Pandas](/products/chdb/guides/migration-from-pandas) - Step-by-step migration guide

<h3 id="api-reference">
  API reference
</h3>

* [Factory Methods](/products/chdb/datastore/factory-methods) - Creating DataStore from various sources
* [Query Building](/products/chdb/datastore/query-building) - SQL-style query operations
* [Pandas Compatibility](/products/chdb/datastore/pandas-compat) - All 209 pandas-compatible methods
* [Accessors](/products/chdb/datastore/accessors) - String, DateTime, Array, JSON, URL, IP, Geo accessors
* [Aggregation](/products/chdb/datastore/aggregation) - Aggregate and window functions
* [I/O Operations](/products/chdb/datastore/io) - Reading and writing data

<h3 id="advanced-topics">
  Advanced topics
</h3>

* [Execution Model](/products/chdb/datastore/execution-model) - Lazy evaluation and caching
* [Class Reference](/products/chdb/datastore/class-reference) - Complete API reference

<h3 id="configuration-debugging">
  Configuration & debugging
</h3>

* [Configuration](/products/chdb/configuration) - All configuration options
* [Performance Mode](/products/chdb/configuration/performance-mode) - SQL-first mode for maximum throughput
* [Debugging](/products/chdb/debugging) - Explain, profiling, and logging

<h3 id="pandas-user-guides">
  Pandas user guides
</h3>

* [Pandas Cookbook](/products/chdb/guides/pandas-cookbook) - Common patterns
* [Key Differences](/products/chdb/guides/pandas-differences) - Important differences from pandas
* [Performance Guide](/products/chdb/guides/pandas-performance) - Optimization tips
* [SQL for Pandas Users](/products/chdb/guides/pandas-to-sql) - Understanding the SQL behind pandas operations

<h2 id="quick-example">
  Quick example
</h2>

```python theme={null}
from chdb import datastore as pd

# Read data from various sources
ds = pd.read_csv("sales.csv")
# or: ds = pd.DataStore.uri("s3://bucket/sales.parquet")
# or: ds = pd.DataStore.from_mysql("mysql://user:pass@host/db/table")

# Familiar pandas operations - automatically optimized to SQL
result = (ds
    .filter(ds['amount'] > 1000)           # WHERE amount > 1000
    .groupby('region')                      # GROUP BY region
    .agg({'amount': ['sum', 'mean']})       # SUM(amount), AVG(amount)
    .sort_values('sum', ascending=False)    # ORDER BY sum DESC
    .head(10)                               # LIMIT 10
)

# View the generated SQL
print(result.to_sql())

# Execute and get results
df = result.to_df()  # Returns pandas DataFrame
```
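The comments in the example pair each call with a SQL clause. The toy builder below composes those clauses in the same way to make the mapping concrete. It is purely illustrative: `QuerySketch`, its string-based conditions, and the `amount_sum`/`amount_mean` alias convention are assumptions, and the SQL it emits is not necessarily what `to_sql()` returns. Use `result.to_sql()` to see DataStore's actual query.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class QuerySketch:
    """Hypothetical builder mapping pandas-style calls to SQL clauses."""
    table: str
    where: tuple[str, ...] = ()
    group_by: tuple[str, ...] = ()
    aggs: tuple[str, ...] = ()
    order_by: Optional[str] = None
    limit: Optional[int] = None

    def filter(self, condition: str) -> "QuerySketch":
        return replace(self, where=self.where + (condition,))  # WHERE ...

    def groupby(self, *cols: str) -> "QuerySketch":
        return replace(self, group_by=self.group_by + cols)  # GROUP BY ...

    def agg(self, spec: dict[str, list[str]]) -> "QuerySketch":
        # {'amount': ['sum', 'mean']} -> sum(amount) AS amount_sum, avg(amount) AS amount_mean
        # Unsupported function names raise KeyError in this sketch.
        fn_map = {"sum": "sum", "mean": "avg", "count": "count", "min": "min", "max": "max"}
        exprs = tuple(f"{fn_map[f]}({col}) AS {col}_{f}" for col, fs in spec.items() for f in fs)
        return replace(self, aggs=self.aggs + exprs)

    def sort_values(self, col: str, ascending: bool = True) -> "QuerySketch":
        return replace(self, order_by=f"{col} {'ASC' if ascending else 'DESC'}")  # ORDER BY

    def head(self, n: int = 5) -> "QuerySketch":
        return replace(self, limit=n)  # LIMIT n

    def to_sql(self) -> str:
        cols = list(self.group_by) + list(self.aggs)
        parts = [f"SELECT {', '.join(cols) if cols else '*'}", f"FROM {self.table}"]
        if self.where:
            parts.append("WHERE " + " AND ".join(self.where))
        if self.group_by:
            parts.append("GROUP BY " + ", ".join(self.group_by))
        if self.order_by:
            parts.append("ORDER BY " + self.order_by)
        if self.limit is not None:
            parts.append(f"LIMIT {self.limit}")
        return "\n".join(parts)


if __name__ == "__main__":
    q = (QuerySketch("sales")
         .filter("amount > 1000")
         .groupby("region")
         .agg({"amount": ["sum", "mean"]})
         .sort_values("amount_sum", ascending=False)
         .head(10))
    print(q.to_sql())
```

Each call returns a new immutable object instead of mutating the builder, which is one simple way to keep intermediate steps reusable, much like a lazy chain whose steps are recorded rather than executed.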

<h2 id="next-steps">
  Next steps
</h2>

* **New to DataStore?** Start with the [Quickstart Guide](/products/chdb/datastore/quickstart)
* **Coming from pandas?** Read the [Migration Guide](/products/chdb/guides/migration-from-pandas)
* **Want to learn more?** Explore the [API Reference](/products/chdb/datastore/class-reference)
