# RAG System

A production-ready Retrieval-Augmented Generation (RAG) system built with FastAPI, featuring clean architecture, comprehensive testing, and CI/CD integration.

## 🚀 Features

- **Clean Architecture**: Domain-driven design with clear separation of concerns
- **Multiple Vector Databases**: Support for Infinity and Elasticsearch
- **Document Processing**: PDF, image, and text document parsing
- **Hybrid Search**: Combined vector and full-text search capabilities
- **Knowledge Base Management**: Organize documents into knowledge bases
- **Comprehensive Testing**: Unit, integration, and end-to-end tests
- **CI/CD Pipeline**: Automated testing and deployment with GitHub Actions
- **High Test Coverage**: Maintained at 80%+ coverage
- **Structured Logging**: JSON-formatted logs with request tracking
- **Type Safety**: Full type hints and mypy validation

## 📋 Table of Contents

- [Quick Start](#quick-start)
- [Installation](#installation)
- [Configuration](#configuration)
- [Running Tests](#running-tests)
- [CI/CD Pipeline](#cicd-pipeline)
- [Architecture](#architecture)
- [Documentation](#documentation)
- [Contributing](#contributing)

## 🏃 Quick Start

```bash
# Clone the repository
git clone https://github.com/YOUR_USERNAME/rag-system.git
cd rag-system

# Install dependencies
pip install -r requirements.txt

# Set up environment variables
cp .env.example .env
# Edit .env with your configuration

# Run tests
pytest

# Start the application
python main.py
```

## 📦 Installation

### Prerequisites

- Python 3.11 or 3.12
- pip (Python package manager)
- Git

### Development Setup

1. **Create a virtual environment**:

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

2. **Install dependencies**:

   ```bash
   pip install -r requirements.txt
   ```

3. **Install development dependencies**:

   ```bash
   pip install pytest pytest-asyncio pytest-cov hypothesis httpx
   pip install flake8 black isort mypy  # For linting
   pip install safety bandit            # For security scanning
   ```

4. **Set up pre-commit hooks** (optional):

   ```bash
   pip install pre-commit
   pre-commit install
   ```

## ⚙️ Configuration

Configuration is managed through environment variables and the `.env` file.

### Environment Variables

Copy `.env.example` to `.env` and configure:

```bash
# Application
APP_NAME=RAG System
DEBUG=false

# Database
DB_HOST=localhost
DB_PORT=5432
DB_DATABASE=rag_system
DB_USERNAME=your_username
DB_PASSWORD=your_password

# Vector Database (infinity or elasticsearch)
VECTOR_DB_TYPE=infinity

# Infinity Configuration
INFINITY_HOST=localhost
INFINITY_PORT=23817

# Elasticsearch Configuration
ES_HOST=localhost
ES_PORT=9200
```

See [Configuration Guide](docs/configuration.md) for detailed configuration options.

## 🧪 Running Tests

### Quick Test Commands

```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=src --cov-report=html

# Run specific test types
pytest -m unit         # Unit tests only
pytest -m integration  # Integration tests only
pytest -m e2e          # End-to-end tests only

# Run fast tests only
pytest -m "not slow"
```

### Using the Test Runner Script

```bash
# Run all tests with CI configuration
python scripts/run_tests.py

# Run with coverage report
python scripts/run_tests.py --coverage --html

# Run all checks (tests + lint + security)
python scripts/run_tests.py --all

# Run in parallel
python scripts/run_tests.py --parallel
```

### Performance Testing

```bash
# Run simulated performance test (quick)
python scripts/benchmark_simple.py

# Run real performance test (requires running server)
python main.py               # In one terminal
python scripts/benchmark.py  # In another terminal
```

See [Performance Guide](docs/performance.md) for detailed performance testing documentation.
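As a rough illustration of what a simulated performance test might measure, the sketch below times repeated calls to a stand-in function and summarizes latency. It is not the contents of `scripts/benchmark_simple.py`; the names `simulated_search` and `benchmark` are hypothetical, and the workload is a placeholder rather than a real retrieval call.

```python
import math
import statistics
import time
from typing import Callable


def simulated_search(query: str) -> list[str]:
    """Hypothetical stand-in for a search call; does a little CPU work."""
    return sorted(set(query.lower().split()))


def benchmark(fn: Callable[[], object], iterations: int = 200) -> dict[str, float]:
    """Call `fn` repeatedly and summarize per-call latency in milliseconds."""
    samples_ms: list[float] = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        samples_ms.append((time.perf_counter() - start) * 1000.0)

    samples_ms.sort()
    # Nearest-rank 95th percentile on the sorted samples.
    p95_index = max(0, math.ceil(0.95 * len(samples_ms)) - 1)
    return {
        "mean_ms": statistics.fmean(samples_ms),
        "median_ms": statistics.median(samples_ms),
        "p95_ms": samples_ms[p95_index],
        "max_ms": samples_ms[-1],
    }


if __name__ == "__main__":
    report = benchmark(lambda: simulated_search("hybrid vector and full-text search"))
    for name, value in report.items():
        print(f"{name}: {value:.4f}")
```

Reporting the median and a high percentile alongside the mean helps show tail latency, which a mean alone can hide. A benchmark against the running server would replace the stand-in function with real HTTP requests.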
### Test Organization

Tests are organized by type:

- `tests/unit/`: Fast, isolated unit tests
- `tests/integration/`: Component interaction tests
- `tests/e2e/`: Complete workflow tests

See [Testing Guide](.github/TESTING.md) for detailed testing documentation.

## 🔄 CI/CD Pipeline

The project uses GitHub Actions for continuous integration and deployment.

### Pipeline Jobs

1. **Test**: Runs all tests with coverage measurement
2. **Lint**: Code quality checks (flake8, black, isort, mypy)
3. **Security**: Security scans (safety, bandit)
4. **Build Status**: Aggregates results

### Coverage Reporting

Coverage reports are automatically uploaded to Codecov on every push.

[![codecov](https://codecov.io/gh/YOUR_USERNAME/YOUR_REPO/branch/main/graph/badge.svg)](https://codecov.io/gh/YOUR_USERNAME/YOUR_REPO)

### Setting Up CI/CD

1. **Enable GitHub Actions**: Already configured in `.github/workflows/test.yml`
2. **Set up Codecov**:
   - Sign up at [codecov.io](https://codecov.io)
   - Add your repository
   - Add `CODECOV_TOKEN` to GitHub Secrets

See [CI/CD Guide](docs/ci-cd.md) for detailed pipeline documentation.
## 🏗️ Architecture

The system follows a clean, layered architecture:

```
┌─────────────────────────────────────┐
│         Presentation Layer          │  FastAPI routes, middleware
├─────────────────────────────────────┤
│          Application Layer          │  Use cases, handlers, DTOs
├─────────────────────────────────────┤
│            Domain Layer             │  Entities, value objects, services
├─────────────────────────────────────┤
│        Infrastructure Layer         │  Databases, external services
└─────────────────────────────────────┘
```

### Directory Structure

```
rag_system/
├── src/
│   ├── domain/           # Domain layer (business logic)
│   ├── application/      # Application layer (use cases)
│   ├── infrastructure/   # Infrastructure layer (databases, APIs)
│   ├── presentation/     # Presentation layer (API routes)
│   ├── config/           # Configuration management
│   └── shared/           # Shared utilities
├── tests/
│   ├── unit/             # Unit tests
│   ├── integration/      # Integration tests
│   └── e2e/              # End-to-end tests
├── docs/                 # Documentation
├── scripts/              # Utility scripts
└── docker/               # Docker configuration
```

See [Architecture Documentation](docs/architecture.md) for detailed design information.
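To make the layering concrete, here is a minimal sketch of how the layers might depend on one another. It assumes a repository-style contract defined in the domain layer and implemented in the infrastructure layer. Every class and function name here (`Document`, `DocumentRepository`, `InMemoryDocumentRepository`, `SearchKnowledgeBase`, `SearchResultDTO`, `search_endpoint`) is hypothetical and not taken from the project's source. A plain function stands in for a FastAPI route so the example runs with the standard library alone.

```python
from dataclasses import dataclass
from typing import Protocol


# --- Domain layer: entities and contracts, with no I/O ---
@dataclass(frozen=True)
class Document:
    doc_id: str
    knowledge_base: str
    text: str


class DocumentRepository(Protocol):
    def add(self, document: Document) -> None: ...

    def find_by_knowledge_base(self, knowledge_base: str) -> list[Document]: ...


# --- Infrastructure layer: a concrete adapter (in-memory stand-in for a database) ---
class InMemoryDocumentRepository:
    def __init__(self) -> None:
        self._documents: dict[str, Document] = {}

    def add(self, document: Document) -> None:
        self._documents[document.doc_id] = document

    def find_by_knowledge_base(self, knowledge_base: str) -> list[Document]:
        return [d for d in self._documents.values() if d.knowledge_base == knowledge_base]


# --- Application layer: a use case that depends only on the domain contract ---
@dataclass(frozen=True)
class SearchResultDTO:
    doc_id: str
    snippet: str


class SearchKnowledgeBase:
    def __init__(self, repository: DocumentRepository) -> None:
        self._repository = repository

    def execute(self, knowledge_base: str, query: str) -> list[SearchResultDTO]:
        # Naive substring match; a real system would use vector or hybrid search.
        needle = query.lower()
        return [
            SearchResultDTO(doc_id=d.doc_id, snippet=d.text[:40])
            for d in self._repository.find_by_knowledge_base(knowledge_base)
            if needle in d.text.lower()
        ]


# --- Presentation layer: plain-function stand-in for an API route ---
def search_endpoint(use_case: SearchKnowledgeBase, knowledge_base: str, query: str) -> dict:
    results = use_case.execute(knowledge_base, query)
    return {"count": len(results), "results": [r.doc_id for r in results]}


if __name__ == "__main__":
    repo = InMemoryDocumentRepository()
    repo.add(Document("d1", "kb1", "Vector search with Infinity"))
    repo.add(Document("d2", "kb1", "Full-text search with Elasticsearch"))
    print(search_endpoint(SearchKnowledgeBase(repo), "kb1", "vector"))
```

Because the use case accepts anything that satisfies the `DocumentRepository` protocol, a storage backend can be swapped without touching the application or domain code, which is the main benefit the layered design is meant to provide.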
## 📚 Documentation

- [Architecture Documentation](docs/architecture.md) - System design and architecture
- [Configuration Guide](docs/configuration.md) - Environment and configuration setup
- [Directory Structure](docs/directory-structure.md) - Project organization
- [Logging Guide](docs/logging.md) - Logging configuration and usage
- [CI/CD Guide](docs/ci-cd.md) - Pipeline and deployment
- [Testing Guide](.github/TESTING.md) - Testing practices and commands
- [Performance Guide](docs/performance.md) - Performance testing and optimization

## 🤝 Contributing

We welcome contributions! Please follow these guidelines:

### Development Workflow

1. **Fork the repository**
2. **Create a feature branch**: `git checkout -b feature/your-feature`
3. **Write tests**: Ensure your code is well-tested
4. **Run tests**: `python scripts/run_tests.py --all`
5. **Commit changes**: Use clear, descriptive commit messages
6. **Push to your fork**: `git push origin feature/your-feature`
7. **Create a Pull Request**

### Code Standards

- **Style**: Follow PEP 8 (enforced by flake8 and black)
- **Type Hints**: Use type hints for all functions
- **Documentation**: Add docstrings to all public APIs
- **Testing**: Maintain 80%+ test coverage
- **Commits**: Use conventional commit messages

### Running Quality Checks

```bash
# Format code
black src tests
isort src tests

# Lint code
flake8 src tests

# Type check
mypy src

# Run all checks
python scripts/run_tests.py --all
```

## 📊 Test Coverage

Current coverage by layer:

| Layer                | Target  | Current |
|----------------------|---------|---------|
| Domain Layer         | 90%     | TBD     |
| Application Layer    | 85%     | TBD     |
| Infrastructure Layer | 70%     | TBD     |
| Presentation Layer   | 75%     | TBD     |
| **Overall**          | **80%** | **TBD** |

## 🔒 Security

- **Dependency Scanning**: Automated with `safety`
- **Code Scanning**: Automated with `bandit`
- **Security Reports**: Available in CI/CD artifacts

Report security vulnerabilities to: security@example.com

## 📝 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 🙏 Acknowledgments

- FastAPI for the excellent web framework
- pytest for the comprehensive testing framework
- Codecov for coverage reporting
- GitHub Actions for CI/CD automation

## 📞 Support

- **Documentation**: [docs/](docs/)
- **Issues**: [GitHub Issues](https://github.com/YOUR_USERNAME/YOUR_REPO/issues)
- **Discussions**: [GitHub Discussions](https://github.com/YOUR_USERNAME/YOUR_REPO/discussions)

## 🗺️ Roadmap

- [x] Phase 1: Infrastructure setup
- [x] Phase 2: Domain layer refactoring
- [x] Phase 3: Application layer refactoring
- [x] Phase 4: Infrastructure layer migration
- [x] Phase 5: Presentation layer migration
- [x] Phase 6: Documentation and cleanup

See [Implementation Plan](.kiro/specs/rag-system-refactoring/tasks.md) for detailed roadmap.

---

**Built with ❤️ using Clean Architecture principles**