# CI/CD Pipeline Guide

## Overview

The RAG System uses GitHub Actions for continuous integration and continuous deployment. This guide explains how the CI/CD pipeline works and how to use it effectively.

## Table of Contents

- [Pipeline Architecture](#pipeline-architecture)
- [Workflow Jobs](#workflow-jobs)
- [Test Execution](#test-execution)
- [Coverage Reporting](#coverage-reporting)
- [Setting Up Codecov](#setting-up-codecov)
- [Running Tests Locally](#running-tests-locally)
- [Troubleshooting](#troubleshooting)
- [Best Practices](#best-practices)
- [Continuous Improvement](#continuous-improvement)
- [References](#references)

## Pipeline Architecture

The CI/CD pipeline consists of four main jobs. The test, lint, and security jobs run in parallel; the build status job then aggregates their results:

```
┌─────────────────────────────────────────────────────────┐
│                     GitHub Actions                      │
├─────────────────────────────────────────────────────────┤
│                                                         │
│   ┌──────────┐     ┌──────────┐     ┌──────────┐        │
│   │   Test   │     │   Lint   │     │ Security │        │
│   │   Job    │     │   Job    │     │   Job    │        │
│   └────┬─────┘     └────┬─────┘     └────┬─────┘        │
│        │                │                │              │
│        └────────────────┴────────────────┘              │
│                         │                               │
│                  ┌──────▼──────┐                        │
│                  │Build Status │                        │
│                  │     Job     │                        │
│                  └─────────────┘                        │
│                                                         │
└─────────────────────────────────────────────────────────┘
```

## Workflow Jobs

### 1. Test Job

Runs the complete test suite with coverage measurement.

**Matrix Strategy**: Tests run on Python 3.11 and 3.12.

**Steps**:

1. Checkout code
2. Set up Python environment
3. Install dependencies
4. Create necessary directories
5. Run unit tests with coverage
6. Run integration tests with coverage
7. Run end-to-end tests with coverage
8. Generate coverage report
9. Upload coverage to Codecov
10. Upload artifacts (coverage reports and test logs)

**Test Execution Order**:

```
Unit Tests → Integration Tests → E2E Tests
```

**Coverage Requirements**:

- Minimum: 80%
- Domain Layer: 90%
- Application Layer: 85%
- Infrastructure Layer: 70%
- Presentation Layer: 75%

### 2. Lint Job

Performs code quality checks.

**Tools**:

- **flake8**: Python linting for code style and errors
- **black**: Code formatting verification
- **isort**: Import statement sorting verification
- **mypy**: Static type checking

**Configuration**:

- Max line length: 127 characters
- Max complexity: 10
- Type checking: Ignore missing imports

### 3. Security Job

Scans for security vulnerabilities.

**Tools**:

- **safety**: Checks dependencies for known vulnerabilities
- **bandit**: Scans code for common security issues

**Reports**:

- `bandit-report.json`: Detailed security scan results
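To see what a bandit finding looks like in practice, here is a hypothetical sketch (not taken from this codebase) of a pattern the scanner flags, next to a safer rewrite:

```python
import subprocess

# Flagged by bandit (B602): shell=True lets attacker-controlled input
# reach a shell, which enables command injection.
def risky_grep(filename: str) -> bytes:
    return subprocess.check_output(f"grep TODO {filename}", shell=True)

# Safer: pass an argument list so no shell ever parses the input.
def safe_grep(filename: str) -> bytes:
    return subprocess.check_output(["grep", "TODO", filename])
```

You can reproduce the CI report locally with `bandit -r src -f json -o bandit-report.json`.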
### 4. Build Status Job

Aggregates results from all jobs and determines overall build status.

**Behavior**:

- ✅ Passes if test job succeeds
- ❌ Fails if test job fails
- Runs regardless of lint and security job results

## Test Execution

### Test Types

#### Unit Tests

- **Purpose**: Test individual components in isolation
- **Location**: `tests/unit/`
- **Marker**: `@pytest.mark.unit`
- **Speed**: Fast (< 100ms per test)
- **Dependencies**: Mocked

```bash
pytest tests/unit -v --cov=src --cov-report=term -m unit
```

#### Integration Tests

- **Purpose**: Test component interactions
- **Location**: `tests/integration/`
- **Marker**: `@pytest.mark.integration`
- **Speed**: Medium (< 1s per test)
- **Dependencies**: Test databases/services

```bash
pytest tests/integration -v --cov=src --cov-report=term -m integration
```

#### End-to-End Tests

- **Purpose**: Test complete user workflows
- **Location**: `tests/e2e/`
- **Marker**: `@pytest.mark.e2e`
- **Speed**: Slow (< 5s per test)
- **Dependencies**: Full system

```bash
pytest tests/e2e -v --cov=src --cov-report=term -m e2e
```

### Test Markers

Use markers to organize and selectively run tests:

```python
import pytest

@pytest.mark.unit
def test_vector_creation():
    """Unit test for vector creation"""
    pass

@pytest.mark.integration
@pytest.mark.requires_db
def test_document_repository():
    """Integration test requiring database"""
    pass

@pytest.mark.e2e
@pytest.mark.slow
def test_complete_workflow():
    """End-to-end test of complete workflow"""
    pass

@pytest.mark.property
def test_vector_properties():
    """Property-based test"""
    pass
```

### Running Specific Tests

```bash
# Run only unit tests
pytest -m unit

# Run only integration tests
pytest -m integration

# Run only e2e tests
pytest -m e2e

# Exclude slow tests
pytest -m "not slow"

# Run tests requiring database
pytest -m requires_db

# Run property-based tests
pytest -m property

# Combine markers
pytest -m "unit and not slow"
```
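The `property` marker is backed by Hypothesis, which is installed with the test dependencies listed under [Running Tests Locally](#running-tests-locally). As a minimal sketch, assuming the `Vector` entity used throughout this guide lives in `src/domain/vector_search/entities.py` and exposes `dimension_count`, a property-based test might look like this:

```python
import pytest
from hypothesis import given, strategies as st

# Assumed import path; adjust to wherever Vector is actually defined.
from src.domain.vector_search.entities import Vector

@pytest.mark.property
@given(st.lists(st.floats(allow_nan=False, allow_infinity=False), min_size=1))
def test_dimension_count_matches_input_length(dimensions):
    # Hypothesis generates many lists of finite floats; the invariant
    # must hold for every one of them.
    assert Vector(dimensions).dimension_count == len(dimensions)
```

Run it with `pytest -m property`, exactly as above.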
Click "New repository secret" 4. Name: `CODECOV_TOKEN` 5. Value: Paste the token from Codecov 6. Click "Add secret" ### Step 4: Verify Integration 1. Push a commit to trigger the workflow 2. Check GitHub Actions tab for workflow run 3. Verify coverage upload in Codecov dashboard ### Step 5: Add Coverage Badge Add to your `README.md`: ```markdown [![codecov](https://codecov.io/gh/YOUR_USERNAME/YOUR_REPO/branch/main/graph/badge.svg?token=YOUR_TOKEN)](https://codecov.io/gh/YOUR_USERNAME/YOUR_REPO) ``` Replace `YOUR_USERNAME`, `YOUR_REPO`, and `YOUR_TOKEN` with your values. ## Running Tests Locally ### Prerequisites ```bash # Install test dependencies pip install pytest pytest-asyncio pytest-cov hypothesis httpx ``` ### Basic Test Execution ```bash # Run all tests pytest # Run with verbose output pytest -v # Run with coverage pytest --cov=src # Run specific test file pytest tests/unit/domain/test_entities.py # Run specific test function pytest tests/unit/domain/test_entities.py::test_vector_creation ``` ### Advanced Options ```bash # Run tests in parallel (requires pytest-xdist) pytest -n auto # Stop on first failure pytest -x # Show local variables in tracebacks pytest --showlocals # Run only failed tests from last run pytest --lf # Run failed tests first, then others pytest --ff # Generate all report types pytest --cov=src --cov-report=html --cov-report=term --cov-report=xml ``` ### Debugging Tests ```bash # Run with Python debugger pytest --pdb # Drop into debugger on failure pytest --pdb --maxfail=1 # Print output (disable capture) pytest -s # Show extra test summary info pytest -ra ``` ## Troubleshooting ### Common Issues #### 1. Tests Pass Locally but Fail in CI **Possible Causes**: - Python version differences - Missing dependencies - Environment-specific configuration - Timing issues in async tests **Solutions**: ```bash # Test with specific Python version pyenv install 3.11 pyenv local 3.11 pytest # Check for missing dependencies pip freeze > current-deps.txt diff requirements.txt current-deps.txt # Run tests with same settings as CI pytest -v --cov=src --cov-report=term ``` #### 2. Coverage Not Uploading to Codecov **Possible Causes**: - Missing or incorrect `CODECOV_TOKEN` - Coverage file not generated - Network issues **Solutions**: 1. Verify token in GitHub Secrets 2. Check if `coverage.xml` exists after test run 3. Review Codecov upload logs in workflow 4. Try manual upload: ```bash bash <(curl -s https://codecov.io/bash) -t YOUR_TOKEN ``` #### 3. Slow Test Execution **Possible Causes**: - Too many integration/e2e tests - Inefficient test setup/teardown - External service calls **Solutions**: ```bash # Identify slow tests pytest --durations=10 # Run only fast tests pytest -m "not slow" # Use parallel execution pytest -n auto # Profile test execution pytest --profile ``` #### 4. Import Errors **Possible Causes**: - Missing `__init__.py` files - Incorrect PYTHONPATH - Circular imports **Solutions**: ```bash # Check Python path python -c "import sys; print('\n'.join(sys.path))" # Run tests with explicit path PYTHONPATH=. pytest # Check for circular imports pytest --collect-only ``` ### Viewing Artifacts 1. Go to GitHub Actions tab 2. Click on a workflow run 3. Scroll to "Artifacts" section 4. Download: - Coverage reports - Test logs - Security reports ## Best Practices ### 1. 
### Viewing Artifacts

1. Go to the GitHub Actions tab
2. Click on a workflow run
3. Scroll to the "Artifacts" section
4. Download:
   - Coverage reports
   - Test logs
   - Security reports

## Best Practices

### 1. Write Fast Tests

```python
import time

# Good: Fast unit test
@pytest.mark.unit
def test_vector_dimension_count():
    vector = Vector([1.0, 2.0, 3.0])
    assert vector.dimension_count == 3

# Avoid: Slow test with unnecessary delays
def test_slow():
    time.sleep(5)  # Don't do this!
    assert True
```

### 2. Use Appropriate Markers

```python
# Mark tests appropriately
@pytest.mark.unit
@pytest.mark.fast
def test_value_object():
    pass

@pytest.mark.integration
@pytest.mark.requires_db
def test_repository():
    pass

@pytest.mark.e2e
@pytest.mark.slow
def test_workflow():
    pass
```

### 3. Mock External Dependencies

```python
import requests

# Good: Mock external service
@pytest.mark.unit
def test_document_handler(mock_repository):
    handler = CreateDocumentHandler(mock_repository)
    result = handler.handle(command)
    assert result is not None

# Avoid: Real external calls in unit tests
def test_with_real_api():
    response = requests.get("https://api.example.com")  # Don't do this!
    assert response.status_code == 200
```

### 4. Maintain High Coverage

```python
# Cover edge cases
def test_vector_empty_dimensions():
    with pytest.raises(ValueError):
        Vector([])

def test_vector_invalid_dimensions():
    with pytest.raises(ValueError):
        Vector([1, "invalid", 3])

def test_vector_normal_case():
    vector = Vector([1.0, 2.0])
    assert vector.dimension_count == 2
```

### 5. Keep Tests Independent

```python
# Good: Independent test
@pytest.fixture
def clean_database():
    db = create_test_db()
    yield db
    db.cleanup()

def test_create_document(clean_database):
    # Test uses a fresh database
    pass

# Avoid: Tests depending on each other
def test_step_1():
    global shared_state
    shared_state = "value"

def test_step_2():
    # Depends on test_step_1 running first
    assert shared_state == "value"
```

### 6. Use Descriptive Test Names

```python
# Good: Descriptive names
def test_vector_creation_with_valid_dimensions_succeeds():
    pass

def test_vector_creation_with_empty_dimensions_raises_value_error():
    pass

# Avoid: Vague names
def test_vector_1():
    pass

def test_vector_2():
    pass
```

### 7. Document Complex Tests

```python
def test_hybrid_search_score_combination():
    """
    Test that hybrid search correctly combines vector and text search scores.

    Given:
    - Vector search results with scores [0.9, 0.7, 0.5]
    - Text search results with scores [0.8, 0.6, 0.4]
    - Weight configuration: vector=0.7, text=0.3

    When:
    - Scores are combined using weighted average

    Then:
    - Combined scores should be [0.87, 0.67, 0.47]
    - Results should be sorted by combined score
    """
    # Test implementation
    pass
```

## Continuous Improvement

### Monitoring

- Review coverage trends in Codecov
- Monitor test execution time
- Track flaky tests
- Review security scan results

### Optimization

- Refactor slow tests
- Add more unit tests
- Reduce integration test dependencies
- Parallelize test execution

### Documentation

- Keep this guide updated
- Document new test patterns
- Share troubleshooting solutions
- Update best practices

## References

- [GitHub Actions Documentation](https://docs.github.com/en/actions)
- [pytest Documentation](https://docs.pytest.org/)
- [pytest-cov Documentation](https://pytest-cov.readthedocs.io/)
- [Codecov Documentation](https://docs.codecov.com/)
- [Coverage.py Documentation](https://coverage.readthedocs.io/)
- [Hypothesis Documentation](https://hypothesis.readthedocs.io/)