# Execution Engine Configuration

> Configure the DataStore execution engine: auto, chdb, or pandas

DataStore can execute operations on different backends. This guide explains how to configure the engine and tune engine selection.

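## Available Engines

| Engine   | Description                                                       | Best for                                        |
| -------- | ----------------------------------------------------------------- | ----------------------------------------------- |
| `auto`   | Automatically selects the most suitable engine for each operation | General use (default)                           |
| `chdb`   | Forces all operations through ClickHouse SQL                      | Large datasets, aggregations                    |
| `pandas` | Forces all operations through pandas                              | Compatibility testing, pandas-specific features |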

## Configuring the Engine

### Global Configuration

```python theme={null}
from chdb.datastore.config import config

# Option 1: use the set method
config.set_execution_engine('auto')    # default
config.set_execution_engine('chdb')    # force ClickHouse
config.set_execution_engine('pandas')  # force pandas

# Option 2: use the shortcut methods
config.use_auto()    # automatic selection
config.use_chdb()    # force ClickHouse
config.use_pandas()  # force pandas
```
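Because the engine setting is global, it can be convenient to switch it for one block of work and then restore the previous value. The sketch below is one possible way to do that. It relies only on `config.set_execution_engine()` and the readable `config.execution_engine` attribute described in this guide; the helper `temporary_engine` and the `_StandInConfig` class are hypothetical names introduced for illustration and are not part of chDB.

```python
from contextlib import contextmanager


@contextmanager
def temporary_engine(config, engine):
    """Switch the execution engine inside a with-block, then restore it."""
    previous = config.execution_engine         # engine currently in effect
    config.set_execution_engine(engine)
    try:
        yield config
    finally:
        config.set_execution_engine(previous)  # restore even if the block raises


class _StandInConfig:
    """Hypothetical stand-in mimicking only the two members used above."""

    def __init__(self):
        self.execution_engine = "auto"

    def set_execution_engine(self, engine):
        self.execution_engine = engine


demo = _StandInConfig()
with temporary_engine(demo, "pandas"):
    inside = demo.execution_engine   # engine seen inside the block
after = demo.execution_engine        # engine after the block exits
print(inside, after)
```

With the real library, the same helper would be called as `with temporary_engine(config, 'chdb'): ...`, assuming `config.execution_engine` returns a value that `set_execution_engine()` accepts. If DataStore evaluates operations lazily, results should be materialized inside the block (for example with `to_df()`), otherwise the work may run after the engine has been restored.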

### Checking the Current Engine

```python theme={null}
print(config.execution_engine)  # 'auto', 'chdb', or 'pandas'
```

***

## Auto Mode

In `auto` mode (the default), DataStore chooses the best engine for each operation:

### Operations Executed in chDB

* SQL-compatible filtering (`filter()`, `where()`)
* Column selection (`select()`)
* Sorting (`sort()`, `orderby()`)
* Grouping and aggregation (`groupby().agg()`)
* Joins (`join()`, `merge()`)
* Deduplication (`distinct()`, `drop_duplicates()`)
* Limiting the number of returned rows (`limit()`, `head()`, `tail()`)

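### Operations Executed in pandas

* Custom apply functions (`apply(custom_func)`)
* Complex pivot tables with custom aggregations
* Operations that cannot be expressed in SQL
* Operations whose input is already a pandas DataFrame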

### Example

```python theme={null}
from chdb import datastore as pd
from chdb.datastore.config import config

config.use_auto()  # default

ds = pd.read_csv("data.csv")

# Runs in chDB (SQL)
result = (ds
    .filter(ds['amount'] > 100)   # SQL: WHERE
    .groupby('region')            # SQL: GROUP BY
    .agg({'amount': 'sum'})       # SQL: SUM()
)

# Runs in pandas (custom function)
# complex_calculation stands for any user-defined row-wise function
result = ds.apply(lambda row: complex_calculation(row), axis=1)
```

***

## chDB Mode

Force all operations through ClickHouse SQL:

```python theme={null}
config.use_chdb()
```

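### When to Use

* Working with large datasets (millions of rows)
* Aggregation-heavy workloads
* When you need maximum SQL optimization
* When you need consistent behavior across all operations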

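### Performance

| Operation type                 | Performance                  |
| ------------------------------ | ---------------------------- |
| GroupBy/aggregation            | Excellent (up to 20x faster) |
| Complex filtering              | Excellent                    |
| Sorting                        | Very good                    |
| Simple single-condition filter | Good (slight overhead)       |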

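### Limitations

* Custom Python functions may not be supported
* Some pandas-specific features require converting to a pandas DataFrame first (for example with `to_df()`)

***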

## pandas Mode

Force all operations through pandas:

```python theme={null}
config.use_pandas()
```

### When to Use

* Testing compatibility with pandas
* Using pandas-specific features
* Debugging pandas-related issues
* When the data is already in pandas format

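### Performance Characteristics

| Operation type           | Performance       |
| ------------------------ | ----------------- |
| Simple single operations | Good              |
| Custom functions         | Excellent         |
| Complex aggregations     | Slower than chDB  |
| Large datasets           | High memory usage |

***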

## Cross-DataStore Engine

Configure the engine used for operations that combine columns from different DataStores:

```python theme={null}
# Set the cross-DataStore engine
config.set_cross_datastore_engine('auto')
config.set_cross_datastore_engine('chdb')
config.set_cross_datastore_engine('pandas')
```

### Example

```python theme={null}
from chdb import datastore as pd

ds1 = pd.read_csv("sales.csv")
ds2 = pd.read_csv("inventory.csv")

# This operation involves two DataStores
result = ds1.join(ds2, on='product_id')  # uses the cross_datastore_engine setting
```

***

## Engine Selection Logic

### Auto Mode Decision Tree

```text theme={null}
Operation requested
│
├─ Can be expressed in SQL?
│  │
│  ├─ Yes → Use chDB
│  │
│  └─ No → Use pandas
│
└─ Cross-DataStore operation?
   │
   └─ Use cross_datastore_engine setting
```
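The decision tree can be restated as a small function. This is only an illustrative sketch of the rules as described in this guide, not the library's actual implementation. The function name `choose_engine` and its parameters are hypothetical, and two details are assumptions: that a forced engine always wins over the SQL check, and that a cross-DataStore setting of `auto` falls back to the same SQL check.

```python
def choose_engine(sql_expressible, cross_datastore=False,
                  execution_engine="auto", cross_datastore_engine="auto"):
    """Return 'chdb' or 'pandas' for a single operation (illustrative only)."""
    # Cross-DataStore operations follow their own setting.
    setting = cross_datastore_engine if cross_datastore else execution_engine
    if setting not in ("auto", "chdb", "pandas"):
        raise ValueError(f"unknown engine setting: {setting!r}")
    if setting != "auto":
        return setting  # assumption: a forced engine always wins
    # 'auto': SQL-expressible operations go to chDB, everything else to pandas.
    return "chdb" if sql_expressible else "pandas"


print(choose_engine(sql_expressible=True))
print(choose_engine(sql_expressible=False))
print(choose_engine(sql_expressible=True, execution_engine="pandas"))
print(choose_engine(sql_expressible=True, cross_datastore=True,
                    cross_datastore_engine="pandas"))
```

In this sketch, a filter or group-by (SQL-expressible) resolves to `chdb` under `auto`, a custom `apply` resolves to `pandas`, and a join across two DataStores resolves to whatever `cross_datastore_engine` specifies.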

### Function-Level Overrides

Some functions can have their engine specified explicitly:

```python theme={null}
from chdb.datastore.config import function_config

# Force specific functions to use a specific engine
function_config.use_chdb('length', 'substring')
function_config.use_pandas('upper', 'lower')
```

See [Function Configuration](/zh/products/chdb/configuration/function-config) for details.

***

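## Performance Comparison

Benchmark results on 10 million rows:

| Operation        | pandas (ms) | chdb (ms) | Speedup |
| ---------------- | ----------- | --------- | ------- |
| GroupBy count    | 347         | 17        | 19.93x  |
| Combined ops     | 1,535       | 234       | 6.56x   |
| Complex pipeline | 2,047       | 380       | 5.39x   |
| Filter+Sort+Head | 1,537       | 350       | 4.40x   |
| GroupBy agg      | 406         | 141       | 2.88x   |
| Single filter    | 276         | 526       | 0.52x   |

**Key takeaways:**

* chDB performs especially well on aggregations and complex pipelines
* pandas is faster for simple single-step operations (roughly 1.9x on the single-filter benchmark)
* `auto` mode lets you benefit from the strengths of both

***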
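The speedup column in the benchmark table appears to be the pandas time divided by the chdb time, so a value below 1 means pandas was faster. The sketch below recomputes the ratio from the rounded millisecond values in the table. A couple of rows (GroupBy count and Filter+Sort+Head) come out slightly different from the published figures, presumably because those were derived from unrounded timings; that explanation is an inference and is not stated in the source.

```python
# (pandas ms, chdb ms, published speedup), copied from the benchmark table
benchmarks = {
    "GroupBy count":    (347, 17, 19.93),
    "Combined ops":     (1535, 234, 6.56),
    "Complex pipeline": (2047, 380, 5.39),
    "Filter+Sort+Head": (1537, 350, 4.40),
    "GroupBy agg":      (406, 141, 2.88),
    "Single filter":    (276, 526, 0.52),
}


def speedup(pandas_ms, chdb_ms):
    """Speedup of chdb over pandas; values below 1 mean pandas was faster."""
    return pandas_ms / chdb_ms


recomputed = {name: round(speedup(p, c), 2)
              for name, (p, c, _) in benchmarks.items()}

for name, (_, _, published) in benchmarks.items():
    print(f"{name:<18} published {published:>5.2f}x"
          f"  recomputed {recomputed[name]:>5.2f}x")
```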

## Best Practices

### 1. Prefer Auto Mode

```python theme={null}
config.use_auto()  # let DataStore decide
```

### 2. Profile Before Forcing an Engine

```python theme={null}
config.enable_profiling()
# Run your workload
# Review the profiler report to see where time is spent
```
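As a complement to the built-in profiler, the same workload can be timed directly under each engine before committing to one. The following is a minimal sketch using only the standard library. The helper `time_under_engines` is a hypothetical name, and the demo uses stand-in switch functions so that it runs without chDB installed.

```python
import time


def time_under_engines(workload, engine_switches, repeats=3):
    """Run `workload` under each engine; return best wall-clock time in ms."""
    results = {}
    for name, switch in engine_switches.items():
        switch()                                  # e.g. config.use_chdb
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            workload()
            timings.append((time.perf_counter() - start) * 1000)
        results[name] = min(timings)              # best-of-N reduces noise
    return results


# Stand-in demo: the "switches" only record which engine would be active.
active = {"engine": None}
results = time_under_engines(
    workload=lambda: sum(range(100_000)),
    engine_switches={
        "chdb": lambda: active.update(engine="chdb"),
        "pandas": lambda: active.update(engine="pandas"),
    },
)
print(sorted(results))
```

With the real library, the switches would be `{"chdb": config.use_chdb, "pandas": config.use_pandas}` and the workload should materialize its result (for example by ending in `.to_df()`), so that the measured time includes execution. Call `config.use_auto()` afterwards to return to the default.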

### 3. Force an Engine for Specific Workloads

```python theme={null}
# For aggregation-heavy workloads
config.use_chdb()

# For pandas compatibility testing
config.use_pandas()
```

### 4. Use explain() to Understand Execution

```python theme={null}
from chdb import datastore as pd

ds = pd.read_csv("data.csv")
query = ds.filter(ds['age'] > 25).groupby('city').agg({'salary': 'sum'})

# View the SQL that will be generated
query.explain()
```

***

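## Troubleshooting

### Issue: Operations Are Slower Than Expected

```python theme={null}
# Check the current engine
print(config.execution_engine)

# Enable debugging to see what is happening
config.enable_debug()

# Try forcing a specific engine
config.use_chdb()  # or config.use_pandas()
```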

### Issue: Operation Not Supported in chdb Mode

```python theme={null}
# Some pandas operations are not supported in SQL
# Solution: use auto mode
config.use_auto()

# Or convert to pandas explicitly first
df = ds.to_df()
result = df.some_pandas_specific_operation()
```

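### Issue: Memory Problems with Large Data

```python theme={null}
# Use the chdb engine to avoid loading all data into memory
config.use_chdb()

# Filter early to reduce the data volume
result = ds.filter(ds['date'] >= '2024-01-01').to_df()

# For maximum throughput on large datasets, use performance mode,
# which enables parallel Parquet reads and single-SQL aggregation
config.use_performance_mode()
```

**Performance mode**

If you are running aggregation-heavy workloads and do not need output that is fully compatible with pandas (row order, MultiIndex, dtype corrections), consider using [performance mode](/zh/products/chdb/configuration/performance-mode). It automatically sets the engine to `chdb` and removes all pandas-compatibility overhead.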