# Performance Testing Guide

This document describes how to run performance tests and benchmarks against the RAG system.

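## 📊 Performance Targets

### Response Time Targets

| Endpoint type | Target response time | Notes |
|---------------|----------------------|-------|
| Health check | < 50 ms | `/health`, `/metrics` |
| Simple query | < 200 ms | Document CRUD operations |
| Search query | < 500 ms | Vector search, hybrid search |
| Batch operation | < 2 s | Bulk import, bulk update |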

### Throughput Targets

- **Concurrent requests**: Support 100+ concurrent requests
- **Requests per second (RPS)**:
  - Health check: 1000+ RPS
  - Simple query: 500+ RPS
  - Search query: 100+ RPS
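
The RPS and latency targets can be cross-checked against the concurrency target with Little's Law (average requests in flight = arrival rate × time in system). The sketch below is only a rough sanity check and is not part of the project: `TARGETS` and `in_flight_requests` are illustrative names, and it assumes every request takes the full target latency.

```python
# Rough sanity check using Little's Law: L = lambda * W.
# TARGETS copies the RPS and latency targets from this guide;
# the names here are illustrative and not part of the project.
TARGETS = {
    "health check": {"rps": 1000, "latency_s": 0.050},
    "simple query": {"rps": 500, "latency_s": 0.200},
    "search query": {"rps": 100, "latency_s": 0.500},
}


def in_flight_requests(rps: float, latency_s: float) -> float:
    """Average number of requests in flight at a given rate and latency."""
    return rps * latency_s


if __name__ == "__main__":
    for name, t in TARGETS.items():
        n = in_flight_requests(t["rps"], t["latency_s"])
        print(f"{name}: about {n:.0f} requests in flight at target load")
```

Under these assumptions each endpoint class on its own needs roughly 50 to 100 requests in flight, which is in line with the 100+ concurrency target.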

### Resource Usage Targets

- **CPU usage**: < 70% (normal load)
- **Memory usage**: < 2 GB (normal load)
- **Database connections**: < 50 active connections

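## 🧪 Performance Testing Tools

### 1. Built-in Benchmark Scripts

The project ships two benchmark scripts:

#### Simulated test (quick validation)

```bash
# Run the simulated test (no running server required)
python scripts/benchmark_simple.py
```

This script:
- Simulates API calls
- Generates performance statistics
- Validates the test framework itself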

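#### Real test (requires a running server)

```bash
# 1. Start the application
python main.py

# 2. Run the benchmark in another terminal
python scripts/benchmark.py
```

This script:
- Sends requests to the real API endpoints
- Measures actual response times
- Generates a detailed performance report
- Saves the results to `benchmark_results.json`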
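
The source of `scripts/benchmark.py` is not shown in this guide. As a rough illustration of the measure-and-report loop described above, the following sketch times an arbitrary callable and can write a JSON report. The function names and the report layout (`results[].endpoint`, `results[].response_time.mean`) are assumptions, chosen to match the `jq` query used later in this guide; the real script may differ.

```python
import json
import statistics
import time
from typing import Callable


def measure(call: Callable[[], object], requests: int = 50) -> dict:
    """Time `call` repeatedly and summarize latency in milliseconds."""
    samples_ms = []
    failures = 0
    for _ in range(requests):
        start = time.perf_counter()
        try:
            call()
        except Exception:
            failures += 1  # count the failure but keep its timing
        samples_ms.append((time.perf_counter() - start) * 1000)
    return {
        "requests": requests,
        "failures": failures,
        "response_time": {
            "mean": statistics.mean(samples_ms),
            "median": statistics.median(samples_ms),
            "min": min(samples_ms),
            "max": max(samples_ms),
        },
    }


def write_report(results: list, path: str = "benchmark_results.json") -> None:
    """Save results in the assumed {"results": [...]} layout."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump({"results": results}, fh, indent=2)


if __name__ == "__main__":
    # Stand-in workload; a real run would issue an HTTP request here,
    # for example urllib.request.urlopen("http://localhost:8000/health").
    result = measure(lambda: time.sleep(0.001), requests=20)
    result["endpoint"] = "/health"
    print(json.dumps({"results": [result]}, indent=2))
```

Calling `write_report([result])` instead of printing would produce a file that the later analysis steps can read.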

### 2. Using Apache Bench (ab)

```bash
# Test the health check endpoint
ab -n 1000 -c 10 http://localhost:8000/health

# Test the search endpoint (POST request)
ab -n 100 -c 5 -p search_payload.json -T application/json \
   http://localhost:8000/api/v1/documents/search
```
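
`ab -p` reads the POST body from a file, but `search_payload.json` itself is not shown in this guide. A minimal way to generate it, assuming the search endpoint accepts the same `query_text` and `top_k` fields as the Locust example in this guide (`write_search_payload` is an illustrative name):

```python
import json


def write_search_payload(path: str = "search_payload.json") -> dict:
    """Write the POST body that `ab -p` will send and return it."""
    # Field names are assumed from the Locust example in this guide.
    payload = {"query_text": "test query", "top_k": 10}
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(payload, fh)
    return payload
```

Calling `write_search_payload()` from the directory where `ab` runs creates the file with the default name.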

### 3. Using wrk

```bash
# Install wrk
# Ubuntu: sudo apt-get install wrk
# macOS: brew install wrk

# Run a load test
wrk -t4 -c100 -d30s http://localhost:8000/health

# Use a Lua script to test POST requests
wrk -t4 -c100 -d30s -s search.lua http://localhost:8000/api/v1/documents/search
```
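
`search.lua` is referenced above but not shown. wrk scripts describe the request through the global `wrk` table (`wrk.method`, `wrk.body`, `wrk.headers`). The sketch below generates such a script from Python so the payload can be kept in sync with the other tools; the payload fields are assumed from the Locust example in this guide, and `build_wrk_script` is an illustrative name.

```python
import json

LUA_TEMPLATE = """\
-- Generated file: POST request definition for wrk
wrk.method = "POST"
wrk.headers["Content-Type"] = "application/json"
wrk.body = [[{body}]]
"""


def build_wrk_script(payload: dict) -> str:
    """Return Lua source that makes wrk send `payload` as a JSON POST."""
    body = json.dumps(payload)
    # Lua long strings end at "]]", so refuse payloads containing it.
    if "]]" in body:
        raise ValueError("payload contains ']]' and cannot be embedded")
    return LUA_TEMPLATE.format(body=body)


if __name__ == "__main__":
    # Assumed payload shape; adjust to the real search API schema.
    print(build_wrk_script({"query_text": "test query", "top_k": 10}))
```

Redirecting the printed output to `search.lua` gives a script that the `wrk -s search.lua` command can load.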

### 4. Using Locust

Create `locustfile.py`:

```python
from locust import HttpUser, task, between


class RAGSystemUser(HttpUser):
    # Each simulated user waits 1 to 3 seconds between tasks
    wait_time = between(1, 3)

    # Task weights (3:2:1) set how often each task is picked
    @task(3)
    def health_check(self):
        self.client.get("/health")

    @task(2)
    def search_documents(self):
        self.client.post("/api/v1/documents/search", json={
            "query_text": "test query",
            "top_k": 10
        })

    @task(1)
    def get_metrics(self):
        self.client.get("/metrics")

Run Locust:

```bash
# Install Locust
pip install locust

# Start the Locust web UI
locust -f locustfile.py

# Or run headless: 100 users, spawned at 10 per second, for 60 seconds
locust -f locustfile.py --headless -u 100 -r 10 -t 60s
```

## 📈 Performance Test Scenarios

### Scenario 1: Basic Health Check

**Goal**: Verify basic system availability

```bash
python scripts/benchmark.py
```

**Expected results**:
- Health check < 50 ms
- 100% success rate

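### Scenario 2: Document Search Performance

**Goal**: Test the core search functionality

**Steps**:
1. Prepare test data (1000+ documents)
2. Run the search benchmark
3. Analyze the response time distribution

**Expected results**:
- Mean response time < 500 ms
- P95 < 800 ms
- P99 < 1000 ms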
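
Step 3 comes down to computing the mean and tail percentiles from the collected samples. A minimal sketch, assuming latencies are available as a list of millisecond values; it uses the nearest-rank percentile definition, which is one common choice, and the function names are illustrative:

```python
import math
import statistics


def percentile(samples: list, p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def check_search_latency(samples_ms: list) -> dict:
    """Compare latency samples (ms) with the Scenario 2 thresholds."""
    summary = {
        "mean": statistics.mean(samples_ms),
        "p95": percentile(samples_ms, 95),
        "p99": percentile(samples_ms, 99),
    }
    summary["passed"] = (
        summary["mean"] < 500 and summary["p95"] < 800 and summary["p99"] < 1000
    )
    return summary


if __name__ == "__main__":
    # Synthetic samples for illustration only, not measured data.
    print(check_search_latency([120, 180, 210, 250, 300, 340, 420, 610, 790, 950]))
```

With only a handful of samples the P95 and P99 values collapse onto the largest observations, so collect enough requests (hundreds or more) before reading the tail.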

### Scenario 3: Concurrent Load Test

**Goal**: Test how the system behaves under high concurrency

```bash
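# 30-second stress test with wrk. wrk sends GET by default, so the POST
# request for the search endpoint is defined in the search.lua script.
wrk -t8 -c200 -d30s -s search.lua http://localhost:8000/api/v1/documents/search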
```

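**Expected results**:
- The system remains stable
- Error rate < 1%
- Response times do not increase significantly compared with the low-load baseline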
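
The last two expectations can be checked by running the same call at low and high concurrency and comparing the results. The sketch below does this with a thread pool against an arbitrary callable. It is an illustration under simple assumptions (Python threads are adequate only for I/O-bound calls, and `run_concurrent` is an illustrative name); wrk and Locust remain the better tools for generating real load.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_concurrent(call: Callable[[], object], total: int, workers: int) -> dict:
    """Run `call` `total` times across `workers` threads; report errors and latency."""
    if total <= 0:
        raise ValueError("total must be positive")

    def one_request() -> tuple:
        start = time.perf_counter()
        try:
            call()
            ok = True
        except Exception:
            ok = False
        return ok, (time.perf_counter() - start) * 1000

    with ThreadPoolExecutor(max_workers=workers) as pool:
        outcomes = list(pool.map(lambda _: one_request(), range(total)))

    errors = sum(1 for ok, _ in outcomes if not ok)
    latencies = [ms for _, ms in outcomes]
    return {
        "error_rate": errors / total,
        "mean_ms": sum(latencies) / total,
        "max_ms": max(latencies),
    }


if __name__ == "__main__":
    # Stand-in for an HTTP call; replace with a real request to the search endpoint.
    baseline = run_concurrent(lambda: time.sleep(0.002), total=50, workers=1)
    loaded = run_concurrent(lambda: time.sleep(0.002), total=50, workers=10)
    print("error rate under load:", loaded["error_rate"])
    print("mean latency ratio:", loaded["mean_ms"] / baseline["mean_ms"])
```

A latency ratio close to 1 and an error rate below 0.01 would match the expectations listed above.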

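### Scenario 4: Long-Running Stability Test

**Goal**: Verify that the system stays stable over a long run

```bash
# Run a 1-hour test with Locust
locust -f locustfile.py --headless -u 50 -r 5 -t 1h
```

**Metrics to monitor**:
- Whether memory usage stays stable (no memory leak)
- Whether CPU usage stays within the normal range
- Whether database connections are released properly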
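
"Stable memory" is easier to judge as a trend than by eye. The sketch below fits a least-squares line through memory samples taken during the run and reports the growth per hour. How the samples are collected is left open (for example from a monitoring system); the function names and the 50 MB/hour limit are illustrative assumptions, not project values.

```python
def growth_per_hour(samples: list) -> float:
    """Least-squares slope of (seconds, megabytes) samples, in MB per hour."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one timestamp")
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    return cov / var_t * 3600


def looks_like_leak(samples: list, limit_mb_per_hour: float = 50.0) -> bool:
    """Flag a run whose memory trend exceeds an illustrative limit."""
    return growth_per_hour(samples) > limit_mb_per_hour


if __name__ == "__main__":
    # Synthetic samples (seconds since start, resident memory in MB), one per 10 minutes.
    flat = [(0, 800), (600, 812), (1200, 805), (1800, 810),
            (2400, 803), (3000, 808), (3600, 806)]
    rising = [(t, 800 + t * 0.05) for t in range(0, 3601, 600)]
    print(looks_like_leak(flat), looks_like_leak(rising))
```

A slope near zero after warm-up suggests stable memory; a steady positive slope over the full hour is worth investigating with a profiler.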

## 🔍 Performance Analysis

### 1. Response Time Analysis

View the benchmark results:

```bash
jq '.results[] | {endpoint, mean: .response_time.mean}' benchmark_results.json
```
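
The same report can be checked against the response time targets automatically. The sketch below assumes the layout implied by the `jq` query above (`results[].endpoint`, `results[].response_time.mean`) and assumes that the mean is recorded in milliseconds; `TARGETS_MS` and `find_slow_endpoints` are illustrative names.

```python
import json

# Mean response time targets in milliseconds, keyed by endpoint path.
# The paths and the report layout are assumptions based on the jq query above.
TARGETS_MS = {
    "/health": 50,
    "/metrics": 50,
    "/api/v1/documents/search": 500,
}


def find_slow_endpoints(report: dict, targets_ms: dict = TARGETS_MS) -> list:
    """Return (endpoint, mean_ms, target_ms) for every endpoint at or over its target."""
    slow = []
    for result in report.get("results", []):
        endpoint = result.get("endpoint")
        target = targets_ms.get(endpoint)
        if target is None:
            continue  # no target defined for this endpoint
        mean_ms = result["response_time"]["mean"]
        if mean_ms >= target:
            slow.append((endpoint, mean_ms, target))
    return slow


if __name__ == "__main__":
    try:
        with open("benchmark_results.json", encoding="utf-8") as fh:
            report = json.load(fh)
    except FileNotFoundError:
        print("benchmark_results.json not found; run the benchmark first")
    else:
        for endpoint, mean_ms, target in find_slow_endpoints(report):
            print(f"{endpoint}: mean {mean_ms:.1f} ms misses target {target} ms")
```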

### 2. Database Query Analysis

Enable SQLAlchemy query logging:

```python
# In config/settings.py
DB_ECHO = True  # Print every SQL statement
```

Use `EXPLAIN ANALYZE` to analyze slow queries:

```sql
EXPLAIN ANALYZE
SELECT * FROM documents WHERE ...;
```

### 3. Application Profiling

Use the Python profiler:

```python
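import cProfile
import pstats


def your_function():
    # Placeholder workload; replace with the code path you want to profile
    return sum(i * i for i in range(100_000))


# Profile a specific function and write the raw stats to a file
cProfile.run('your_function()', 'profile_stats')

# Load the stats and print the 20 entries with the highest cumulative time
stats = pstats.Stats('profile_stats')
stats.sort_stats('cumulative')
stats.print_stats(20)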
```

Use `py-spy` for live profiling:

```bash
# Install py-spy
pip install py-spy

# Inspect a running process
py-spy top --pid <PID>

# Generate a flame graph
py-spy record -o profile.svg --pid <PID>
```

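## 🚀 Performance Optimization Recommendations

### 1. Database Optimization

- **Add indexes**: Index the fields used in frequent queries
- **Connection pool**: Configure an appropriate connection pool size
- **Query optimization**: Avoid N+1 queries; use JOINs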

```python
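# Configure the connection pool.
# Assuming these map to SQLAlchemy's pool_size and max_overflow, one process
# can open up to 20 + 40 = 60 connections, which is above the < 50
# active-connection target; tune the values accordingly.
SQLALCHEMY_POOL_SIZE = 20
SQLALCHEMY_MAX_OVERFLOW = 40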
```

### 2. Caching Strategy

- **Redis cache**: Cache hot data
- **Application-level cache**: Use `functools.lru_cache`
- **HTTP caching**: Set appropriate `Cache-Control` headers

```python
from functools import lru_cache

@lru_cache(maxsize=1000)
def get_document(doc_id: str):
    # Cache document lookup results
    pass
```
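
To see whether an `lru_cache` is actually helping, inspect its hit and miss counters. The following sketch uses a hypothetical `get_document_cached` function whose body only simulates a lookup; `LOOKUPS` is an illustrative counter that stands in for database traffic.

```python
from functools import lru_cache

# Counts how often the slow path runs; stands in for a database query.
LOOKUPS = {"count": 0}


@lru_cache(maxsize=1000)
def get_document_cached(doc_id: str) -> dict:
    """Hypothetical lookup; the body simulates fetching a document."""
    LOOKUPS["count"] += 1
    return {"id": doc_id, "title": f"Document {doc_id}"}


if __name__ == "__main__":
    for doc_id in ["a", "b", "a", "a", "b"]:
        get_document_cached(doc_id)
    # cache_info() reports hits, misses, maxsize, and current size.
    print(get_document_cached.cache_info())
    # After a document changes, drop cached entries so readers see fresh data.
    get_document_cached.cache_clear()
```

Two caveats apply: the cache lives inside one process and never expires on its own, and every caller receives the same cached object, so callers should not mutate the returned value. Applying `lru_cache` directly to an `async def` function caches the coroutine object rather than its result, so async code paths need a different approach.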

### 3. Asynchronous Processing

- **Async I/O**: Use `async/await` for I/O operations
- **Background tasks**: Use Celery for long-running tasks
- **Batching**: Insert and update data in batches
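
The benefit of async I/O comes from overlapping waits. The sketch below contrasts awaiting calls one by one with starting them together through `asyncio.gather`; `fetch` is a stand-in that only sleeps, and all function names are illustrative.

```python
import asyncio
import time


async def fetch(doc_id: int, delay_s: float = 0.05) -> int:
    """Stand-in for an I/O-bound call such as a database or HTTP request."""
    await asyncio.sleep(delay_s)
    return doc_id


async def fetch_sequential(ids: list) -> list:
    """Await each call in turn; total time is roughly the sum of the delays."""
    return [await fetch(i) for i in ids]


async def fetch_concurrent(ids: list) -> list:
    """Start all calls together; total time is roughly the longest delay."""
    return list(await asyncio.gather(*(fetch(i) for i in ids)))


async def timed(coro) -> tuple:
    """Await `coro` and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = await coro
    return result, time.perf_counter() - start


if __name__ == "__main__":
    ids = list(range(10))
    _, seq_s = asyncio.run(timed(fetch_sequential(ids)))
    _, con_s = asyncio.run(timed(fetch_concurrent(ids)))
    print(f"sequential: {seq_s:.2f}s, concurrent: {con_s:.2f}s")
```

`asyncio.gather` returns results in the order the awaitables were passed, so the concurrent version is a drop-in replacement when the calls are independent.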

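### 4. Vector Database Optimization

- **Index tuning**: Choose a suitable index type (HNSW, IVF)
- **Dimensionality reduction**: Use PCA to reduce vector dimensions
- **Batch queries**: Process vector search requests in batches

### 5. Application-Level Optimization

- **Reduce serialization overhead**: Use `orjson` instead of the standard `json` module
- **Connection reuse**: Reuse HTTP connections
- **Response compression**: Enable gzip compression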

```python
# Use orjson for response serialization (requires the orjson package)
from fastapi import FastAPI
from fastapi.responses import ORJSONResponse

app = FastAPI(default_response_class=ORJSONResponse)
```

## 📊 Performance Monitoring

### 1. Application Metrics

Use Prometheus + Grafana:

```python
from prometheus_client import Counter, Histogram

# Define metrics
request_count = Counter('http_requests_total', 'Total HTTP requests')
request_duration = Histogram('http_request_duration_seconds', 'HTTP request duration')
```

### 2. System Metrics

Monitor system resources:

```bash
# CPU and memory
htop

# Network
iftop

# Disk I/O
iotop
```

### 3. Log Analysis

Analyze the access log:

```bash
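# List the 100 largest response times
# (assumes field 10 of each log line holds the response time)
awk '{print $10}' access.log | sort -n | tail -100

# Show the 20 slowest requests
sort -k10,10n access.log | tail -20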
```
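
The shell commands above rank individual requests. To rank endpoints instead, group by path and average. The sketch below assumes whitespace-separated log lines with the request path in field 7 and the response time in field 10; both positions are assumptions, so adjust them to the actual log format. `slowest_endpoints` is an illustrative name.

```python
from collections import defaultdict


def slowest_endpoints(lines, path_field: int = 7, time_field: int = 10,
                      top: int = 20) -> list:
    """Return (path, mean_time, count) sorted by mean response time, slowest first."""
    totals = defaultdict(lambda: [0.0, 0])  # path -> [sum of times, request count]
    for line in lines:
        fields = line.split()
        try:
            path = fields[path_field - 1]
            elapsed = float(fields[time_field - 1])
        except (IndexError, ValueError):
            continue  # skip lines that do not match the assumed layout
        totals[path][0] += elapsed
        totals[path][1] += 1
    ranked = [(path, total / count, count) for path, (total, count) in totals.items()]
    return sorted(ranked, key=lambda row: row[1], reverse=True)[:top]


if __name__ == "__main__":
    # Synthetic lines in the assumed layout, for illustration only.
    sample = [
        '10.0.0.1 - - [01/Jan/2025:00:00:01 +0000] "GET /health HTTP/1.1" 200 0.004',
        '10.0.0.1 - - [01/Jan/2025:00:00:02 +0000] "POST /api/v1/documents/search HTTP/1.1" 200 0.312',
        '10.0.0.1 - - [01/Jan/2025:00:00:03 +0000] "POST /api/v1/documents/search HTTP/1.1" 200 0.288',
    ]
    for path, mean_time, count in slowest_endpoints(sample):
        print(f"{path}: mean {mean_time:.3f} over {count} requests")
```

For a real log, pass an open file object, for example `slowest_endpoints(open("access.log"))`.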

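## 🎯 Performance Testing Checklist

Before deploying, make sure the following performance tests are complete:

- [ ] Basic health check test passes
- [ ] All API endpoints meet their response time targets
- [ ] Concurrent load test passes (100+ concurrent requests)
- [ ] Long-running stability test passes (1+ hour)
- [ ] Memory usage is stable (no leaks)
- [ ] Database queries are optimized
- [ ] Slow queries are identified and optimized
- [ ] Performance monitoring is configured
- [ ] A performance baseline is established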

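## 📝 Performance Test Report Template

```markdown
# Performance Test Report

## Test Environment
- Date: YYYY-MM-DD
- Version: v1.0.0
- Hardware: CPU, RAM, Disk
- Data volume: number of documents

## Test Results

### Response Time
| Endpoint | Mean | P95 | P99 | Target | Status |
|----------|------|-----|-----|--------|--------|
| /health | 10ms | 15ms | 20ms | <50ms | ✓ |
| /search | 150ms | 300ms | 450ms | <500ms | ✓ |

### Throughput
- Health check: 1200 RPS
- Search query: 150 RPS

### Resource Usage
- CPU: 45%
- Memory: 1.2GB
- Database connections: 25

## Issues and Recommendations
1. Search query response time increases under high concurrency
2. Consider adding a cache layer

## Conclusion
System performance meets the expected targets.
```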

## 🔗 Related Resources

- [FastAPI performance optimization](https://fastapi.tiangolo.com/deployment/concepts/)
- [SQLAlchemy performance optimization](https://docs.sqlalchemy.org/en/14/faq/performance.html)
- [Python profiling](https://docs.python.org/3/library/profile.html)
- [Locust documentation](https://docs.locust.io/)

---

**Note**: Run performance tests in an environment that resembles production so that the results are representative.