This view is limited to 50 files because the change set is too large; consult the raw diff for the complete change list.
Files changed (50)
  1. .cursorrules +0 -240
  2. .env.example +97 -80
  3. .github/README.md +11 -2
  4. .github/scripts/deploy_to_hf_space.py +0 -391
  5. .github/workflows/ci.yml +23 -70
  6. .github/workflows/deploy-hf-space.yml +0 -47
  7. .github/workflows/docs.yml +61 -0
  8. .gitignore +5 -6
  9. .pre-commit-config.yaml +11 -21
  10. =0.22.0 +0 -0
  11. =0.22.0, +0 -0
  12. AGENTS.txt +0 -236
  13. LICENSE.md +0 -25
  14. Makefile +51 -0
  15. README.md +86 -26
  16. deployments/README.md +0 -46
  17. deployments/modal_tts.py +0 -97
  18. dev/Makefile +51 -0
  19. docs/api/agents.md +103 -48
  20. docs/api/models.md +110 -57
  21. docs/api/orchestrators.md +86 -44
  22. docs/api/services.md +41 -123
  23. docs/api/tools.md +29 -57
  24. docs/architecture/agents.md +18 -123
  25. docs/architecture/graph-orchestration.md +152 -0
  26. docs/architecture/graph_orchestration.md +42 -185
  27. docs/architecture/middleware.md +37 -45
  28. docs/architecture/orchestrators.md +55 -58
  29. docs/architecture/services.md +28 -36
  30. docs/architecture/tools.md +33 -29
  31. docs/architecture/workflow-diagrams.md +20 -5
  32. docs/architecture/workflows.md +662 -0
  33. docs/configuration/CONFIGURATION.md +743 -0
  34. docs/configuration/index.md +260 -78
  35. CONTRIBUTING.md → docs/contributing.md +66 -132
  36. docs/contributing/code-quality.md +30 -73
  37. docs/contributing/code-style.md +16 -42
  38. docs/contributing/error-handling.md +15 -4
  39. docs/contributing/implementation-patterns.md +20 -7
  40. docs/contributing/index.md +26 -121
  41. docs/contributing/prompt-engineering.md +10 -0
  42. docs/contributing/testing.md +12 -66
  43. docs/getting-started/examples.md +31 -24
  44. docs/getting-started/installation.md +10 -18
  45. docs/getting-started/mcp-integration.md +14 -6
  46. docs/getting-started/quick-start.md +14 -41
  47. docs/index.md +9 -28
  48. docs/{LICENSE.md → license.md} +0 -0
  49. docs/overview/architecture.md +14 -16
  50. docs/overview/features.md +19 -44
.cursorrules DELETED
@@ -1,240 +0,0 @@
- # DeepCritical Project - Cursor Rules
-
- ## Project-Wide Rules
-
- **Architecture**: Multi-agent research system using Pydantic AI for agent orchestration, supporting iterative and deep research patterns. Uses middleware for state management, budget tracking, and workflow coordination.
-
- **Type Safety**: ALWAYS use complete type hints. All functions must have parameter and return type annotations. Use `mypy --strict` compliance. Use `TYPE_CHECKING` imports for circular dependencies: `from typing import TYPE_CHECKING; if TYPE_CHECKING: from src.services.embeddings import EmbeddingService`
-
- **Async Patterns**: ALL I/O operations must be async (`async def`, `await`). Use `asyncio.gather()` for parallel operations. CPU-bound work must use `run_in_executor()`: `loop = asyncio.get_running_loop(); result = await loop.run_in_executor(None, cpu_bound_function, args)`. Never block the event loop.
-
- **Error Handling**: Use custom exceptions from `src/utils/exceptions.py`: `DeepCriticalError`, `SearchError`, `RateLimitError`, `JudgeError`, `ConfigurationError`. Always chain exceptions: `raise SearchError(...) from e`. Log with structlog: `logger.error("Operation failed", error=str(e), context=value)`.
-
- **Logging**: Use `structlog` for ALL logging (NOT `print` or `logging`). Import: `import structlog; logger = structlog.get_logger()`. Log with structured data: `logger.info("event", key=value)`. Use appropriate levels: DEBUG, INFO, WARNING, ERROR.
-
- **Pydantic Models**: All data exchange uses Pydantic models from `src/utils/models.py`. Models are frozen (`model_config = {"frozen": True}`) for immutability. Use `Field()` with descriptions. Validate with `ge=`, `le=`, `min_length=`, `max_length=` constraints.
-
- **Code Style**: Ruff with 100-char line length. Ignore rules: `PLR0913` (too many arguments), `PLR0912` (too many branches), `PLR0911` (too many returns), `PLR2004` (magic values), `PLW0603` (global statement), `PLC0415` (lazy imports).
-
- **Docstrings**: Google-style docstrings for all public functions. Include Args, Returns, Raises sections. Use type hints in docstrings only if needed for clarity.
-
- **Testing**: Unit tests in `tests/unit/` (mocked, fast). Integration tests in `tests/integration/` (real APIs, marked `@pytest.mark.integration`). Use `respx` for httpx mocking, `pytest-mock` for general mocking.
-
- **State Management**: Use `ContextVar` in middleware for thread-safe isolation. Never use global mutable state (except singletons via `@lru_cache`). Use `WorkflowState` from `src/middleware/state_machine.py` for workflow state.
-
- **Citation Validation**: ALWAYS validate references before returning reports. Use `validate_references()` from `src/utils/citation_validator.py`. Remove hallucinated citations. Log warnings for removed citations.
-
- ---
-
- ## src/agents/ - Agent Implementation Rules
-
- **Pattern**: All agents use Pydantic AI `Agent` class. Agents have structured output types (Pydantic models) or return strings. Use factory functions in `src/agent_factory/agents.py` for creation.
-
- **Agent Structure**:
- - System prompt as module-level constant (with date injection: `datetime.now().strftime("%Y-%m-%d")`)
- - Agent class with `__init__(model: Any | None = None)`
- - Main method (e.g., `async def evaluate()`, `async def write_report()`)
- - Factory function: `def create_agent_name(model: Any | None = None) -> AgentName`
-
- **Model Initialization**: Use `get_model()` from `src/agent_factory/judges.py` if no model provided. Support OpenAI/Anthropic/HF Inference via settings.
-
- **Error Handling**: Return fallback values (e.g., `KnowledgeGapOutput(research_complete=False, outstanding_gaps=[...])`) on failure. Log errors with context. Use retry logic (3 retries) in Pydantic AI Agent initialization.
-
- **Input Validation**: Validate query/inputs are not empty. Truncate very long inputs with warnings. Handle None values gracefully.
-
- **Output Types**: Use structured output types from `src/utils/models.py` (e.g., `KnowledgeGapOutput`, `AgentSelectionPlan`, `ReportDraft`). For text output (writer agents), return `str` directly.
-
- **Agent-Specific Rules**:
- - `knowledge_gap.py`: Outputs `KnowledgeGapOutput`. Evaluates research completeness.
- - `tool_selector.py`: Outputs `AgentSelectionPlan`. Selects tools (RAG/web/database).
- - `writer.py`: Returns markdown string. Includes citations in numbered format.
- - `long_writer.py`: Uses `ReportDraft` input/output. Handles section-by-section writing.
- - `proofreader.py`: Takes `ReportDraft`, returns polished markdown.
- - `thinking.py`: Returns observation string from conversation history.
- - `input_parser.py`: Outputs `ParsedQuery` with research mode detection.
-
- ---
-
- ## src/tools/ - Search Tool Rules
-
- **Protocol**: All tools implement `SearchTool` protocol from `src/tools/base.py`: `name` property and `async def search(query, max_results) -> list[Evidence]`.
-
- **Rate Limiting**: Use `@retry` decorator from tenacity: `@retry(stop=stop_after_attempt(3), wait=wait_exponential(...))`. Implement `_rate_limit()` method for APIs with limits. Use shared rate limiters from `src/tools/rate_limiter.py`.
-
- **Error Handling**: Raise `SearchError` or `RateLimitError` on failures. Handle HTTP errors (429, 500, timeout). Return empty list on non-critical errors (log warning).
-
- **Query Preprocessing**: Use `preprocess_query()` from `src/tools/query_utils.py` to remove noise and expand synonyms.
-
- **Evidence Conversion**: Convert API responses to `Evidence` objects with `Citation`. Extract metadata (title, url, date, authors). Set relevance scores (0.0-1.0). Handle missing fields gracefully.
-
- **Tool-Specific Rules**:
- - `pubmed.py`: Use NCBI E-utilities (ESearch → EFetch). Rate limit: 0.34s between requests. Parse XML with `xmltodict`. Handle single vs. multiple articles.
- - `clinicaltrials.py`: Use `requests` library (NOT httpx - WAF blocks httpx). Run in thread pool: `await asyncio.to_thread(requests.get, ...)`. Filter: Only interventional studies, active/completed.
- - `europepmc.py`: Handle preprint markers: `[PREPRINT - Not peer-reviewed]`. Build URLs from DOI or PMID.
- - `rag_tool.py`: Wraps `LlamaIndexRAGService`. Returns Evidence from RAG results. Handles ingestion.
- - `search_handler.py`: Orchestrates parallel searches across multiple tools. Uses `asyncio.gather()` with `return_exceptions=True`. Aggregates results into `SearchResult`.
-
- ---
-
- ## src/middleware/ - Middleware Rules
-
- **State Management**: Use `ContextVar` for thread-safe isolation. `WorkflowState` uses `ContextVar[WorkflowState | None]`. Initialize with `init_workflow_state(embedding_service)`. Access with `get_workflow_state()` (auto-initializes if missing).
-
- **WorkflowState**: Tracks `evidence: list[Evidence]`, `conversation: Conversation`, `embedding_service: Any`. Methods: `add_evidence()` (deduplicates by URL), `async search_related()` (semantic search).
-
- **WorkflowManager**: Manages parallel research loops. Methods: `add_loop()`, `run_loops_parallel()`, `update_loop_status()`, `sync_loop_evidence_to_state()`. Uses `asyncio.gather()` for parallel execution. Handles errors per loop (don't fail all if one fails).
-
- **BudgetTracker**: Tracks tokens, time, iterations per loop and globally. Methods: `create_budget()`, `add_tokens()`, `start_timer()`, `update_timer()`, `increment_iteration()`, `check_budget()`, `can_continue()`. Token estimation: `estimate_tokens(text)` (~4 chars per token), `estimate_llm_call_tokens(prompt, response)`.
-
- **Models**: All middleware models in `src/utils/models.py`. `IterationData`, `Conversation`, `ResearchLoop`, `BudgetStatus` are used by middleware.
-
- ---
-
- ## src/orchestrator/ - Orchestration Rules
-
- **Research Flows**: Two patterns: `IterativeResearchFlow` (single loop) and `DeepResearchFlow` (plan → parallel loops → synthesis). Both support agent chains (`use_graph=False`) and graph execution (`use_graph=True`).
-
- **IterativeResearchFlow**: Pattern: Generate observations → Evaluate gaps → Select tools → Execute → Judge → Continue/Complete. Uses `KnowledgeGapAgent`, `ToolSelectorAgent`, `ThinkingAgent`, `WriterAgent`, `JudgeHandler`. Tracks iterations, time, budget.
-
- **DeepResearchFlow**: Pattern: Planner → Parallel iterative loops per section → Synthesizer. Uses `PlannerAgent`, `IterativeResearchFlow` (per section), `LongWriterAgent` or `ProofreaderAgent`. Uses `WorkflowManager` for parallel execution.
-
- **Graph Orchestrator**: Uses Pydantic AI Graphs (when available) or agent chains (fallback). Routes based on research mode (iterative/deep/auto). Streams `AgentEvent` objects for UI.
-
- **State Initialization**: Always call `init_workflow_state()` before running flows. Initialize `BudgetTracker` per loop. Use `WorkflowManager` for parallel coordination.
-
- **Event Streaming**: Yield `AgentEvent` objects during execution. Event types: "started", "search_complete", "judge_complete", "hypothesizing", "synthesizing", "complete", "error". Include iteration numbers and data payloads.
-
- ---
-
- ## src/services/ - Service Rules
-
- **EmbeddingService**: Local sentence-transformers (NO API key required). All operations async-safe via `run_in_executor()`. ChromaDB for vector storage. Deduplication threshold: 0.85 (85% similarity = duplicate).
-
- **LlamaIndexRAGService**: Uses OpenAI embeddings (requires `OPENAI_API_KEY`). Methods: `ingest_evidence()`, `retrieve()`, `query()`. Returns documents with metadata (source, title, url, date, authors). Lazy initialization with graceful fallback.
-
- **StatisticalAnalyzer**: Generates Python code via LLM. Executes in Modal sandbox (secure, isolated). Library versions pinned in `SANDBOX_LIBRARIES` dict. Returns `AnalysisResult` with verdict (SUPPORTED/REFUTED/INCONCLUSIVE).
-
- **Singleton Pattern**: Use `@lru_cache(maxsize=1)` for singletons: `@lru_cache(maxsize=1); def get_service() -> Service: return Service()`. Lazy initialization to avoid requiring dependencies at import time.
-
- ---
-
- ## src/utils/ - Utility Rules
-
- **Models**: All Pydantic models in `src/utils/models.py`. Use frozen models (`model_config = {"frozen": True}`) except where mutation needed. Use `Field()` with descriptions. Validate with constraints.
-
- **Config**: Settings via Pydantic Settings (`src/utils/config.py`). Load from `.env` automatically. Use `settings` singleton: `from src.utils.config import settings`. Validate API keys with properties: `has_openai_key`, `has_anthropic_key`.
-
- **Exceptions**: Custom exception hierarchy in `src/utils/exceptions.py`. Base: `DeepCriticalError`. Specific: `SearchError`, `RateLimitError`, `JudgeError`, `ConfigurationError`. Always chain exceptions.
-
- **LLM Factory**: Centralized LLM model creation in `src/utils/llm_factory.py`. Supports OpenAI, Anthropic, HF Inference. Use `get_model()` or factory functions. Check requirements before initialization.
-
- **Citation Validator**: Use `validate_references()` from `src/utils/citation_validator.py`. Removes hallucinated citations (URLs not in evidence). Logs warnings. Returns validated report string.
-
- ---
-
- ## src/orchestrator_factory.py Rules
-
- **Purpose**: Factory for creating orchestrators. Supports "simple" (legacy) and "advanced" (magentic) modes. Auto-detects mode based on API key availability.
-
- **Pattern**: Lazy import for optional dependencies (`_get_magentic_orchestrator_class()`). Handles `ImportError` gracefully with clear error messages.
-
- **Mode Detection**: `_determine_mode()` checks explicit mode or auto-detects: "advanced" if `settings.has_openai_key`, else "simple". Maps "magentic" → "advanced".
-
- **Function Signature**: `create_orchestrator(search_handler, judge_handler, config, mode) -> Any`. Simple mode requires handlers. Advanced mode uses MagenticOrchestrator.
-
- **Error Handling**: Raise `ValueError` with clear messages if requirements not met. Log mode selection with structlog.
-
- ---
-
- ## src/orchestrator_hierarchical.py Rules
-
- **Purpose**: Hierarchical orchestrator using middleware and sub-teams. Adapts Magentic ChatAgent to SubIterationTeam protocol.
-
- **Pattern**: Uses `SubIterationMiddleware` with `ResearchTeam` and `LLMSubIterationJudge`. Event-driven via callback queue.
-
- **State Initialization**: Initialize embedding service with graceful fallback. Use `init_magentic_state()` (deprecated, but kept for compatibility).
-
- **Event Streaming**: Uses `asyncio.Queue` for event coordination. Yields `AgentEvent` objects. Handles event callback pattern with `asyncio.wait()`.
-
- **Error Handling**: Log errors with context. Yield error events. Process remaining events after task completion.
-
- ---
-
- ## src/orchestrator_magentic.py Rules
-
- **Purpose**: Magentic-based orchestrator using ChatAgent pattern. Each agent has internal LLM. Manager orchestrates agents.
-
- **Pattern**: Uses `MagenticBuilder` with participants (searcher, hypothesizer, judge, reporter). Manager uses `OpenAIChatClient`. Workflow built in `_build_workflow()`.
-
- **Event Processing**: `_process_event()` converts Magentic events to `AgentEvent`. Handles: `MagenticOrchestratorMessageEvent`, `MagenticAgentMessageEvent`, `MagenticFinalResultEvent`, `MagenticAgentDeltaEvent`, `WorkflowOutputEvent`.
-
- **Text Extraction**: `_extract_text()` defensively extracts text from messages. Priority: `.content` → `.text` → `str(message)`. Handles buggy message objects.
-
- **State Initialization**: Initialize embedding service with graceful fallback. Use `init_magentic_state()` (deprecated).
-
- **Requirements**: Must call `check_magentic_requirements()` in `__init__`. Requires `agent-framework-core` and OpenAI API key.
-
- **Event Types**: Maps agent names to event types: "search" → "search_complete", "judge" → "judge_complete", "hypothes" → "hypothesizing", "report" → "synthesizing".
-
- ---
-
- ## src/agent_factory/ - Factory Rules
-
- **Pattern**: Factory functions for creating agents and handlers. Lazy initialization for optional dependencies. Support OpenAI/Anthropic/HF Inference.
-
- **Judges**: `create_judge_handler()` creates `JudgeHandler` with structured output (`JudgeAssessment`). Supports `MockJudgeHandler`, `HFInferenceJudgeHandler` as fallbacks.
-
- **Agents**: Factory functions in `agents.py` for all Pydantic AI agents. Pattern: `create_agent_name(model: Any | None = None) -> AgentName`. Use `get_model()` if model not provided.
-
- **Graph Builder**: `graph_builder.py` contains utilities for building research graphs. Supports iterative and deep research graph construction.
-
- **Error Handling**: Raise `ConfigurationError` if required API keys missing. Log agent creation. Handle import errors gracefully.
-
- ---
-
- ## src/prompts/ - Prompt Rules
-
- **Pattern**: System prompts stored as module-level constants. Include date injection: `datetime.now().strftime("%Y-%m-%d")`. Format evidence with truncation (1500 chars per item).
-
- **Judge Prompts**: In `judge.py`. Handle empty evidence case separately. Always request structured JSON output.
-
- **Hypothesis Prompts**: In `hypothesis.py`. Use diverse evidence selection (MMR algorithm). Sentence-aware truncation.
-
- **Report Prompts**: In `report.py`. Include full citation details. Use diverse evidence selection (n=20). Emphasize citation validation rules.
-
- ---
-
- ## Testing Rules
-
- **Structure**: Unit tests in `tests/unit/` (mocked, fast). Integration tests in `tests/integration/` (real APIs, marked `@pytest.mark.integration`).
-
- **Mocking**: Use `respx` for httpx mocking. Use `pytest-mock` for general mocking. Mock LLM calls in unit tests (use `MockJudgeHandler`).
-
- **Fixtures**: Common fixtures in `tests/conftest.py`: `mock_httpx_client`, `mock_llm_response`.
-
- **Coverage**: Aim for >80% coverage. Test error handling, edge cases, and integration paths.
-
- ---
-
- ## File-Specific Agent Rules
-
- **knowledge_gap.py**: Outputs `KnowledgeGapOutput`. System prompt evaluates research completeness. Handles conversation history. Returns fallback on error.
-
- **writer.py**: Returns markdown string. System prompt includes citation format examples. Validates inputs. Truncates long findings. Retry logic for transient failures.
-
- **long_writer.py**: Uses `ReportDraft` input/output. Writes sections iteratively. Reformats references (deduplicates, renumbers). Reformats section headings.
-
- **proofreader.py**: Takes `ReportDraft`, returns polished markdown. Removes duplicates. Adds summary. Preserves references.
-
- **tool_selector.py**: Outputs `AgentSelectionPlan`. System prompt lists available agents (WebSearchAgent, SiteCrawlerAgent, RAGAgent). Guidelines for when to use each.
-
- **thinking.py**: Returns observation string. Generates observations from conversation history. Uses query and background context.
-
- **input_parser.py**: Outputs `ParsedQuery`. Detects research mode (iterative/deep). Extracts entities and research questions. Improves/refines query.
-
-
-
-
-
-
-
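
The deleted rules above prescribe several interlocking conventions: structlog for logging, chained custom exceptions, and executor offloading for CPU-bound work. A minimal sketch of how they compose, with `SearchError` defined locally as a stand-in for the class in `src/utils/exceptions.py` (illustrative only, not code from this repository):

```python
import asyncio

import structlog

logger = structlog.get_logger()


class SearchError(Exception):
    """Local stand-in for src.utils.exceptions.SearchError named in the rules."""


def _parse_response(raw: str) -> dict[str, int]:
    """CPU-bound parsing; runs in an executor so the event loop is never blocked."""
    return {"length": len(raw)}


async def fetch_and_parse(fetch, url: str) -> dict[str, int]:
    """Async I/O, a chained custom exception, and structured logging, per the rules."""
    try:
        raw = await fetch(url)
    except Exception as e:
        logger.error("Search request failed", url=url, error=str(e))
        raise SearchError(f"Failed to fetch {url}") from e
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, _parse_response, raw)


async def main() -> None:
    async def fake_fetch(url: str) -> str:  # placeholder for a real httpx call
        return f"<html>{url}</html>"

    # asyncio.gather() for parallel operations, as the async rules require.
    results = await asyncio.gather(*(fetch_and_parse(fake_fetch, u) for u in ("a", "b")))
    logger.info("Search complete", count=len(results))


if __name__ == "__main__":
    asyncio.run(main())
```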
.env.example CHANGED
@@ -1,83 +1,63 @@
- # HuggingFace
- HF_TOKEN=your_huggingface_token_here
 
- # OpenAI (optional)
- OPENAI_API_KEY=your_openai_key_here
 
- # Anthropic (optional)
- ANTHROPIC_API_KEY=your_anthropic_key_here
 
  # Model names (optional - sensible defaults set in config.py)
- # ANTHROPIC_MODEL=claude-sonnet-4-5-20250929
  # OPENAI_MODEL=gpt-5.1
 
 
- # ============================================
- # Audio Processing Configuration (TTS)
- # ============================================
- # Kokoro TTS Model Configuration
- TTS_MODEL=hexgrad/Kokoro-82M
- TTS_VOICE=af_heart
- TTS_SPEED=1.0
- TTS_GPU=T4
- TTS_TIMEOUT=60
-
- # Available TTS Voices:
- # American English Female: af_heart, af_bella, af_nicole, af_aoede, af_kore, af_sarah, af_nova, af_sky, af_alloy, af_jessica, af_river
- # American English Male: am_michael, am_fenrir, am_puck, am_echo, am_eric, am_liam, am_onyx, am_santa, am_adam
-
- # Available GPU Types (Modal):
- # T4 - Cheapest, good for testing (default)
- # A10 - Good balance of cost/performance
- # A100 - Fastest, most expensive
- # L4 - NVIDIA L4 GPU
- # L40S - NVIDIA L40S GPU
- # Note: GPU type is set at function definition time. Changes require app restart.
-
- # ============================================
- # Audio Processing Configuration (STT)
- # ============================================
- # Speech-to-Text API Configuration
- STT_API_URL=nvidia/canary-1b-v2
- STT_SOURCE_LANG=English
- STT_TARGET_LANG=English
-
- # Available STT Languages:
- # English, Bulgarian, Croatian, Czech, Danish, Dutch, Estonian, Finnish, French, German, Greek, Hungarian, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovenian, Spanish, Swedish, Russian, Ukrainian
-
- # ============================================
- # Audio Feature Flags
- # ============================================
- ENABLE_AUDIO_INPUT=true
- ENABLE_AUDIO_OUTPUT=true
-
- # ============================================
- # Image OCR Configuration
- # ============================================
- OCR_API_URL=prithivMLmods/Multimodal-OCR3
- ENABLE_IMAGE_INPUT=true
-
- # ============== EMBEDDINGS ==============
-
- # OpenAI Embedding Model (used if LLM_PROVIDER is openai and performing RAG/Embeddings)
- OPENAI_EMBEDDING_MODEL=text-embedding-3-small
-
- # Local Embedding Model (used for local/offline embeddings)
- LOCAL_EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2
-
- # ============== HUGGINGFACE (FREE TIER) ==============
-
- # HuggingFace Token - enables Llama 3.1 (best quality free model)
  # Get yours at: https://huggingface.co/settings/tokens
- #
- # WITHOUT HF_TOKEN: Falls back to ungated models (zephyr-7b-beta)
- # WITH HF_TOKEN: Uses Llama 3.1 8B Instruct (requires accepting license)
  #
  # For HuggingFace Spaces deployment:
  # Set this as a "Secret" in Space Settings -> Variables and secrets
  # Users/judges don't need their own token - the Space secret is used
  #
  HF_TOKEN=hf_your-token-here
 
  # ============== AGENT CONFIGURATION ==============
 
@@ -85,23 +65,60 @@ MAX_ITERATIONS=10
  SEARCH_TIMEOUT=30
  LOG_LEVEL=INFO
 
- # ============================================
- # Modal Configuration (Required for TTS)
- # ============================================
- # Modal credentials are required for TTS (Text-to-Speech) functionality
- # Get your credentials from: https://modal.com/
- MODAL_TOKEN_ID=your_modal_token_id_here
- MODAL_TOKEN_SECRET=your_modal_token_secret_here
 
  # ============== EXTERNAL SERVICES ==============
 
- # PubMed (optional - higher rate limits)
  NCBI_API_KEY=your-ncbi-key-here
 
- # Vector Database (optional - for LlamaIndex RAG)
  CHROMA_DB_PATH=./chroma_db
- # Neo4j Knowledge Graph
- NEO4J_URI=bolt://localhost:7687
- NEO4J_USER=neo4j
- NEO4J_PASSWORD=your_neo4j_password_here
- NEO4J_DATABASE=your_database_name
+ # ============== LLM CONFIGURATION ==============
+
+ # Provider: "openai", "anthropic", or "huggingface"
+ LLM_PROVIDER=openai
+
+ # API Keys (at least one required for full LLM analysis)
+ OPENAI_API_KEY=sk-your-key-here
+ ANTHROPIC_API_KEY=sk-ant-your-key-here
+
  # Model names (optional - sensible defaults set in config.py)
  # OPENAI_MODEL=gpt-5.1
+ # ANTHROPIC_MODEL=claude-sonnet-4-5-20250929
+
+ # ============== HUGGINGFACE CONFIGURATION ==============
+
+ # HuggingFace Token - enables gated models and higher rate limits
  # Get yours at: https://huggingface.co/settings/tokens
+ #
+ # WITHOUT HF_TOKEN: Falls back to ungated models (zephyr-7b-beta, Qwen2-7B)
+ # WITH HF_TOKEN: Uses gated models (Llama 3.1, Gemma-2) via inference providers
  #
  # For HuggingFace Spaces deployment:
  # Set this as a "Secret" in Space Settings -> Variables and secrets
  # Users/judges don't need their own token - the Space secret is used
  #
  HF_TOKEN=hf_your-token-here
+ # Alternative: HUGGINGFACE_API_KEY (same as HF_TOKEN)
+
+ # Default HuggingFace model for inference (gated, requires auth)
+ # Can be overridden in UI dropdown
+ # Latest reasoning models: Qwen3-Next-80B-A3B-Thinking, Qwen3-Next-80B-A3B-Instruct, Llama-3.3-70B-Instruct
+ HUGGINGFACE_MODEL=Qwen/Qwen3-Next-80B-A3B-Thinking
+
+ # Fallback models for HuggingFace Inference API (comma-separated)
+ # Models are tried in order until one succeeds
+ # Format: model1,model2,model3
+ # Latest reasoning models first, then reliable fallbacks
+ # Reasoning models: Qwen3-Next (thinking/instruct), Llama-3.3-70B, Qwen3-235B
+ # Fallbacks: Llama-3.1-8B, Zephyr-7B (ungated), Qwen2-7B (ungated)
+ HF_FALLBACK_MODELS=Qwen/Qwen3-Next-80B-A3B-Thinking,Qwen/Qwen3-Next-80B-A3B-Instruct,meta-llama/Llama-3.3-70B-Instruct,meta-llama/Llama-3.1-8B-Instruct,HuggingFaceH4/zephyr-7b-beta,Qwen/Qwen2-7B-Instruct
+
+ # Override model/provider selection (optional, usually set via UI)
+ # HF_MODEL=Qwen/Qwen3-Next-80B-A3B-Thinking
+ # HF_PROVIDER=hyperbolic
+
+ # ============== EMBEDDING CONFIGURATION ==============
+
+ # Embedding Provider: "openai", "local", or "huggingface"
+ # Default: "local" (no API key required)
+ EMBEDDING_PROVIDER=local
+
+ # OpenAI Embedding Model (used if EMBEDDING_PROVIDER=openai)
+ OPENAI_EMBEDDING_MODEL=text-embedding-3-small
+
+ # Local Embedding Model (sentence-transformers, used if EMBEDDING_PROVIDER=local)
+ # BAAI/bge-small-en-v1.5 is newer, faster, and better than all-MiniLM-L6-v2
+ LOCAL_EMBEDDING_MODEL=BAAI/bge-small-en-v1.5
+
+ # HuggingFace Embedding Model (used if EMBEDDING_PROVIDER=huggingface)
+ HUGGINGFACE_EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2
 
  # ============== AGENT CONFIGURATION ==============
 
  SEARCH_TIMEOUT=30
  LOG_LEVEL=INFO
 
+ # Graph-based execution (experimental)
+ # USE_GRAPH_EXECUTION=false
+
+ # Budget & Rate Limiting
+ # DEFAULT_TOKEN_LIMIT=100000
+ # DEFAULT_TIME_LIMIT_MINUTES=10
+ # DEFAULT_ITERATIONS_LIMIT=10
+
+ # ============== WEB SEARCH CONFIGURATION ==============
+
+ # Web Search Provider: "serper", "searchxng", "brave", "tavily", or "duckduckgo"
+ # Default: "duckduckgo" (no API key required)
+ WEB_SEARCH_PROVIDER=duckduckgo
+
+ # Serper API Key (for Google search via Serper)
+ # SERPER_API_KEY=your-serper-key-here
+
+ # SearchXNG Host URL (for self-hosted search)
+ # SEARCHXNG_HOST=http://localhost:8080
+
+ # Brave Search API Key
+ # BRAVE_API_KEY=your-brave-key-here
+
+ # Tavily API Key
+ # TAVILY_API_KEY=your-tavily-key-here
 
  # ============== EXTERNAL SERVICES ==============
 
+ # PubMed (optional - higher rate limits: 10 req/sec vs 3 req/sec)
  NCBI_API_KEY=your-ncbi-key-here
 
+ # Modal (optional - for secure code execution sandbox)
+ # MODAL_TOKEN_ID=your-modal-token-id
+ # MODAL_TOKEN_SECRET=your-modal-token-secret
+
+ # ============== VECTOR DATABASE (ChromaDB) ==============
+
+ # ChromaDB storage path
  CHROMA_DB_PATH=./chroma_db
+
+ # Persist ChromaDB to disk (default: true)
+ # CHROMA_DB_PERSIST=true
+
+ # Remote ChromaDB server (optional)
+ # CHROMA_DB_HOST=localhost
+ # CHROMA_DB_PORT=8000
+
+ # ============== RAG SERVICE CONFIGURATION ==============
+
+ # ChromaDB collection name for RAG
+ # RAG_COLLECTION_NAME=deepcritical_evidence
+
+ # Number of top results to retrieve from RAG
+ # RAG_SIMILARITY_TOP_K=5
+
+ # Automatically ingest evidence into RAG
+ # RAG_AUTO_INGEST=true
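
These variables are presumably consumed through the Pydantic Settings class the project rules point at (`src/utils/config.py`). A minimal sketch of how a few of the new keys could map onto such a class; the field names here are assumptions mirroring the variable names, not the repository's actual settings code:

```python
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    """Hypothetical subset of the settings implied by the new .env.example."""

    model_config = SettingsConfigDict(env_file=".env", extra="ignore")

    llm_provider: str = "openai"
    embedding_provider: str = "local"
    web_search_provider: str = "duckduckgo"
    hf_fallback_models: str = ""  # comma-separated list, split at the use site

    @property
    def fallback_model_list(self) -> list[str]:
        return [m.strip() for m in self.hf_fallback_models.split(",") if m.strip()]


settings = Settings()  # values load from .env automatically
print(settings.fallback_model_list)
```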
.github/README.md CHANGED
@@ -3,7 +3,8 @@
  > **You are reading the Github README!**
  >
  > - 📚 **Documentation**: See our [technical documentation](https://deepcritical.github.io/GradioDemo/) for detailed information
- > - 📖 **Demo README**: Check out the [Demo README](..README.md) for more information > - 🏆 **Demo**: Kindly consider using our [Free Demo](https://hf.co/DataQuests/GradioDemo)
 
 
  <div align="center">
@@ -37,7 +38,15 @@ gradio run "src/app.py"
 
  Open your browser to `http://localhost:7860`.
 
- ### 3. Connect via MCP
 
  This application exposes a Model Context Protocol (MCP) server, allowing you to use its search tools directly from Claude Desktop or other MCP clients.
 
  > **You are reading the Github README!**
  >
  > - 📚 **Documentation**: See our [technical documentation](https://deepcritical.github.io/GradioDemo/) for detailed information
+ > - 📖 **Demo README**: Check out the [Demo README](..README.md) for setup, configuration, and contribution guidelines
+ > - 🏆 **Hackathon Submission**: Keep reading below for more information about our MCP Hackathon submission
 
 
  <div align="center">
 
  Open your browser to `http://localhost:7860`.
 
+ ### 3. Authentication (Optional)
+
+ **HuggingFace OAuth Login**:
+ - Click the "Sign in with HuggingFace" button at the top of the app
+ - Your HuggingFace API token will be automatically used for AI inference
+ - No need to manually enter API keys when logged in
+ - OAuth token is used only for the current session and never stored
+
+ ### 4. Connect via MCP
 
  This application exposes a Model Context Protocol (MCP) server, allowing you to use its search tools directly from Claude Desktop or other MCP clients.
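
The OAuth flow added above is typically wired up with Gradio's built-in login components. A sketch of the pattern, assuming Gradio's `LoginButton`/`OAuthToken` API and a hypothetical `run_research` handler (not this repository's actual app code; OAuth injection only activates when hosted on a Space):

```python
import gradio as gr


def run_research(query: str, oauth_token: gr.OAuthToken | None) -> str:
    # Gradio injects the session's OAuth token via the type hint; it must NOT
    # be listed in `inputs`. Fall back to server-side keys when logged out.
    token = oauth_token.token if oauth_token is not None else None
    return f"Would research {query!r} (user token available: {token is not None})"


with gr.Blocks() as demo:
    gr.LoginButton()  # renders "Sign in with HuggingFace"
    query = gr.Textbox(label="Research question")
    output = gr.Markdown()
    query.submit(run_research, inputs=[query], outputs=[output])

demo.launch()
```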
.github/scripts/deploy_to_hf_space.py DELETED
@@ -1,391 +0,0 @@
- """Deploy repository to Hugging Face Space, excluding unnecessary files."""
-
- import os
- import shutil
- import subprocess
- import tempfile
- from pathlib import Path
-
- from huggingface_hub import HfApi
-
-
- def get_excluded_dirs() -> set[str]:
-     """Get set of directory names to exclude from deployment."""
-     return {
-         "docs",
-         "dev",
-         "folder",
-         "site",
-         "tests",  # Optional - can be included if desired
-         "examples",  # Optional - can be included if desired
-         ".git",
-         ".github",
-         "__pycache__",
-         ".pytest_cache",
-         ".mypy_cache",
-         ".ruff_cache",
-         ".venv",
-         "venv",
-         "env",
-         "ENV",
-         "node_modules",
-         ".cursor",
-         "reference_repos",
-         "burner_docs",
-         "chroma_db",
-         "logs",
-         "build",
-         "dist",
-         ".eggs",
-         "htmlcov",
-         "hf_space",  # Exclude the cloned HF Space directory itself
-     }
-
-
- def get_excluded_files() -> set[str]:
-     """Get set of file names to exclude from deployment."""
-     return {
-         ".pre-commit-config.yaml",
-         "mkdocs.yml",
-         "uv.lock",
-         "AGENTS.txt",
-         ".env",
-         ".env.local",
-         "*.local",
-         ".DS_Store",
-         "Thumbs.db",
-         "*.log",
-         ".coverage",
-         "coverage.xml",
-     }
-
-
- def should_exclude(path: Path, excluded_dirs: set[str], excluded_files: set[str]) -> bool:
-     """Check if a path should be excluded from deployment."""
-     # Check if any parent directory is excluded
-     for parent in path.parents:
-         if parent.name in excluded_dirs:
-             return True
-
-     # Check if the path itself is a directory that should be excluded
-     if path.is_dir() and path.name in excluded_dirs:
-         return True
-
-     # Check if the file name matches excluded patterns
-     if path.is_file():
-         # Check exact match
-         if path.name in excluded_files:
-             return True
-         # Check pattern matches (simple wildcard support)
-         for pattern in excluded_files:
-             if "*" in pattern:
-                 # Simple pattern matching (e.g., "*.log")
-                 suffix = pattern.replace("*", "")
-                 if path.name.endswith(suffix):
-                     return True
-
-     return False
-
-
- def deploy_to_hf_space() -> None:
-     """Deploy repository to Hugging Face Space.
-
-     Supports both user and organization Spaces:
-     - User Space: username/space-name
-     - Organization Space: organization-name/space-name
-
-     Works with both classic tokens and fine-grained tokens.
-     """
-     # Get configuration from environment variables
-     hf_token = os.getenv("HF_TOKEN")
-     hf_username = os.getenv("HF_USERNAME")  # Can be username or organization name
-     space_name = os.getenv("HF_SPACE_NAME")
-
-     # Check which variables are missing and provide helpful error message
-     missing = []
-     if not hf_token:
-         missing.append("HF_TOKEN (should be in repository secrets)")
-     if not hf_username:
-         missing.append("HF_USERNAME (should be in repository variables)")
-     if not space_name:
-         missing.append("HF_SPACE_NAME (should be in repository variables)")
-
-     if missing:
-         raise ValueError(
-             f"Missing required environment variables: {', '.join(missing)}\n"
-             f"Please configure:\n"
-             f"  - HF_TOKEN in Settings > Secrets and variables > Actions > Secrets\n"
-             f"  - HF_USERNAME in Settings > Secrets and variables > Actions > Variables\n"
-             f"  - HF_SPACE_NAME in Settings > Secrets and variables > Actions > Variables"
-         )
-
-     # HF_USERNAME can be either a username or organization name
-     # Format: {username|organization}/{space_name}
-     repo_id = f"{hf_username}/{space_name}"
-     local_dir = "hf_space"
-
-     print(f"🚀 Deploying to Hugging Face Space: {repo_id}")
-
-     # Initialize HF API
-     api = HfApi(token=hf_token)
-
-     # Create Space if it doesn't exist
-     try:
-         api.repo_info(repo_id=repo_id, repo_type="space", token=hf_token)
-         print(f"✅ Space exists: {repo_id}")
-     except Exception:
-         print(f"⚠️ Space does not exist, creating: {repo_id}")
-         # Create new repository
-         # Note: For organizations, repo_id should be "org/space-name"
-         # For users, repo_id should be "username/space-name"
-         api.create_repo(
-             repo_id=repo_id,  # Full repo_id including owner
-             repo_type="space",
-             space_sdk="gradio",
-             token=hf_token,
-             exist_ok=True,
-         )
-         print(f"✅ Created new Space: {repo_id}")
-
-     # Configure Git credential helper for authentication
-     # This is needed for Git LFS to work properly with fine-grained tokens
-     print("🔐 Configuring Git credentials...")
-
-     # Use Git credential store to store the token
-     # This allows Git LFS to authenticate properly
-     temp_dir = Path(tempfile.gettempdir())
-     credential_store = temp_dir / ".git-credentials-hf"
-
-     # Write credentials in the format: https://username:token@huggingface.co
-     credential_store.write_text(
-         f"https://{hf_username}:{hf_token}@huggingface.co\n", encoding="utf-8"
-     )
-     try:
-         credential_store.chmod(0o600)  # Secure permissions (Unix only)
-     except OSError:
-         # Windows doesn't support chmod, skip
-         pass
-
-     # Configure Git to use the credential store
-     subprocess.run(
-         ["git", "config", "--global", "credential.helper", f"store --file={credential_store}"],
-         check=True,
-         capture_output=True,
-     )
-
-     # Also set environment variable for Git LFS
-     os.environ["GIT_CREDENTIAL_HELPER"] = f"store --file={credential_store}"
-
-     # Clone repository using git
-     # Use the token in the URL for initial clone, but LFS will use credential store
-     space_url = f"https://{hf_username}:{hf_token}@huggingface.co/spaces/{repo_id}"
-
-     if Path(local_dir).exists():
-         print(f"🧹 Removing existing {local_dir} directory...")
-         shutil.rmtree(local_dir)
-
-     print("📥 Cloning Space repository...")
-     try:
-         result = subprocess.run(
-             ["git", "clone", space_url, local_dir],
-             check=True,
-             capture_output=True,
-             text=True,
-         )
-         print("✅ Cloned Space repository")
-
-         # After clone, configure the remote to use credential helper
-         # This ensures future operations (like push) use the credential store
-         os.chdir(local_dir)
-         subprocess.run(
-             ["git", "remote", "set-url", "origin", f"https://huggingface.co/spaces/{repo_id}"],
-             check=True,
-             capture_output=True,
-         )
-         os.chdir("..")
-
-     except subprocess.CalledProcessError as e:
-         error_msg = e.stderr if e.stderr else e.stdout if e.stdout else "Unknown error"
-         print(f"❌ Failed to clone Space repository: {error_msg}")
-
-         # Try alternative: clone with LFS skip, then fetch LFS files separately
-         print("🔄 Trying alternative clone method (skip LFS during clone)...")
-         try:
-             env = os.environ.copy()
-             env["GIT_LFS_SKIP_SMUDGE"] = "1"  # Skip LFS during clone
-
-             subprocess.run(
-                 ["git", "clone", space_url, local_dir],
-                 check=True,
-                 capture_output=True,
-                 text=True,
-                 env=env,
-             )
-             print("✅ Cloned Space repository (LFS skipped)")
-
-             # Configure remote
-             os.chdir(local_dir)
-             subprocess.run(
-                 ["git", "remote", "set-url", "origin", f"https://huggingface.co/spaces/{repo_id}"],
-                 check=True,
-                 capture_output=True,
-             )
-
-             # Try to fetch LFS files with proper authentication
-             print("📥 Fetching LFS files...")
-             subprocess.run(
-                 ["git", "lfs", "pull"],
-                 check=False,  # Don't fail if LFS pull fails - we'll continue without LFS files
-                 capture_output=True,
-                 text=True,
-             )
-             os.chdir("..")
-             print("✅ Repository cloned (LFS files may be incomplete, but deployment can continue)")
-         except subprocess.CalledProcessError as e2:
-             error_msg2 = e2.stderr if e2.stderr else e2.stdout if e2.stdout else "Unknown error"
-             print(f"❌ Alternative clone method also failed: {error_msg2}")
-             raise RuntimeError(f"Git clone failed: {error_msg}") from e
-
-     # Get exclusion sets
-     excluded_dirs = get_excluded_dirs()
-     excluded_files = get_excluded_files()
-
-     # Remove all existing files in HF Space (except .git)
-     print("🧹 Cleaning existing files...")
-     for item in Path(local_dir).iterdir():
-         if item.name == ".git":
-             continue
-         if item.is_dir():
-             shutil.rmtree(item)
-         else:
-             item.unlink()
-
-     # Copy files from repository root
-     print("📦 Copying files...")
-     repo_root = Path(".")
-     files_copied = 0
-     dirs_copied = 0
-
-     for item in repo_root.rglob("*"):
-         # Skip if in .git directory
-         if ".git" in item.parts:
-             continue
-
-         # Skip if in hf_space directory (the cloned Space directory)
-         if "hf_space" in item.parts:
-             continue
-
-         # Skip if should be excluded
-         if should_exclude(item, excluded_dirs, excluded_files):
-             continue
-
-         # Calculate relative path
-         try:
-             rel_path = item.relative_to(repo_root)
-         except ValueError:
-             # Item is outside repo root, skip
-             continue
-
-         # Skip if in excluded directory
-         if any(part in excluded_dirs for part in rel_path.parts):
-             continue
-
-         # Destination path
-         dest_path = Path(local_dir) / rel_path
-
-         # Create parent directories
-         dest_path.parent.mkdir(parents=True, exist_ok=True)
-
-         # Copy file or directory
-         if item.is_file():
-             shutil.copy2(item, dest_path)
-             files_copied += 1
-         elif item.is_dir():
-             # Directory will be created by parent mkdir, but we track it
-             dirs_copied += 1
-
-     print(f"✅ Copied {files_copied} files and {dirs_copied} directories")
-
-     # Commit and push changes using git
-     print("💾 Committing changes...")
-
-     # Change to the Space directory
-     original_cwd = os.getcwd()
-     os.chdir(local_dir)
-
-     try:
-         # Configure git user (required for commit)
-         subprocess.run(
-             ["git", "config", "user.name", "github-actions[bot]"],
-             check=True,
-             capture_output=True,
-         )
-         subprocess.run(
-             ["git", "config", "user.email", "github-actions[bot]@users.noreply.github.com"],
-             check=True,
-             capture_output=True,
-         )
-
-         # Add all files
-         subprocess.run(
-             ["git", "add", "."],
-             check=True,
-             capture_output=True,
-         )
-
-         # Check if there are changes to commit
-         result = subprocess.run(
-             ["git", "status", "--porcelain"],
-             check=False,
-             capture_output=True,
-             text=True,
-         )
-
-         if result.stdout.strip():
-             # There are changes, commit and push
-             subprocess.run(
-                 ["git", "commit", "-m", "Deploy to Hugging Face Space [skip ci]"],
-                 check=True,
-                 capture_output=True,
-             )
-             print("📤 Pushing to Hugging Face Space...")
-             # Ensure remote URL uses credential helper (not token in URL)
-             subprocess.run(
-                 ["git", "remote", "set-url", "origin", f"https://huggingface.co/spaces/{repo_id}"],
-                 check=True,
-                 capture_output=True,
-             )
-             subprocess.run(
-                 ["git", "push"],
-                 check=True,
-                 capture_output=True,
-             )
-             print("✅ Deployment complete!")
-         else:
-             print("ℹ️ No changes to commit (repository is up to date)")
-     except subprocess.CalledProcessError as e:
-         error_msg = e.stderr if e.stderr else (e.stdout if e.stdout else str(e))
-         if isinstance(error_msg, bytes):
-             error_msg = error_msg.decode("utf-8", errors="replace")
-         if "nothing to commit" in error_msg.lower():
-             print("ℹ️ No changes to commit (repository is up to date)")
-         else:
-             print(f"⚠️ Error during git operations: {error_msg}")
-             raise RuntimeError(f"Git operation failed: {error_msg}") from e
-     finally:
-         # Return to original directory
-         os.chdir(original_cwd)
-
-     # Clean up credential store for security
-     try:
-         if credential_store.exists():
-             credential_store.unlink()
-     except Exception:
-         # Ignore cleanup errors
-         pass
-
-     print(f"🎉 Successfully deployed to: https://huggingface.co/spaces/{repo_id}")
-
-
- if __name__ == "__main__":
-     deploy_to_hf_space()
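
The deleted script drove deployment through a local git clone, a credential store, and LFS workarounds. The same end result is usually achievable with `huggingface_hub`'s higher-level upload API; a minimal sketch under the same environment-variable conventions (`HF_TOKEN`, `HF_USERNAME`, `HF_SPACE_NAME`), offered as context rather than a drop-in replacement:

```python
import os

from huggingface_hub import HfApi

api = HfApi(token=os.environ["HF_TOKEN"])
repo_id = f"{os.environ['HF_USERNAME']}/{os.environ['HF_SPACE_NAME']}"

# Idempotent, like the deleted script's repo_info/create_repo dance.
api.create_repo(repo_id=repo_id, repo_type="space", space_sdk="gradio", exist_ok=True)

# upload_folder commits server-side (including LFS), so no git clone,
# credential store, or GIT_LFS_SKIP_SMUDGE fallback is needed.
api.upload_folder(
    folder_path=".",
    repo_id=repo_id,
    repo_type="space",
    commit_message="Deploy to Hugging Face Space [skip ci]",
    ignore_patterns=["docs/**", "tests/**", ".git/**", "*.log", "chroma_db/**"],
)
```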
.github/workflows/ci.yml CHANGED
@@ -2,9 +2,9 @@ name: CI
 
  on:
  push:
- branches: [main, dev, develop]
  pull_request:
- branches: [main, dev, develop]
 
  jobs:
  test:
@@ -16,6 +16,11 @@ jobs:
  steps:
  - uses: actions/checkout@v4
 
  - name: Set up Python ${{ matrix.python-version }}
  uses: actions/setup-python@v5
  with:
@@ -23,105 +28,53 @@ jobs:
 
  - name: Install dependencies
  run: |
- python -m pip install --upgrade pip
- pip install -e ".[dev]"
 
  - name: Lint with ruff
- run: |
- ruff check . --exclude tests
- ruff format --check . --exclude tests
  continue-on-error: true
 
  - name: Type check with mypy
- run: |
- mypy src
  continue-on-error: true
-
- - name: Install embedding dependencies
  run: |
- pip install -e ".[embeddings]"
 
- - name: Run unit tests (excluding OpenAI and embedding providers)
  env:
  HF_TOKEN: ${{ secrets.HF_TOKEN }}
  run: |
- pytest tests/unit/ -v -m "not openai and not embedding_provider" --tb=short -p no:logfire --cov --cov-branch --cov-report=xml --cov-report=term
 
  - name: Run local embeddings tests
  env:
  HF_TOKEN: ${{ secrets.HF_TOKEN }}
  run: |
- pytest tests/ -v -m "local_embeddings" --tb=short -p no:logfire --cov --cov-branch --cov-report=xml --cov-report=term --cov-append || true
  continue-on-error: true # Allow failures if dependencies not available
 
  - name: Run HuggingFace integration tests
  env:
  HF_TOKEN: ${{ secrets.HF_TOKEN }}
  run: |
- pytest tests/integration/ -v -m "huggingface and not embedding_provider" --tb=short -p no:logfire --cov --cov-branch --cov-report=xml --cov-report=term --cov-append || true
  continue-on-error: true # Allow failures if HF_TOKEN not set
 
- - name: Run non-OpenAI integration tests (excluding embedding providers)
  env:
  HF_TOKEN: ${{ secrets.HF_TOKEN }}
  run: |
- pytest tests/integration/ -v -m "integration and not openai and not embedding_provider" --tb=short -p no:logfire --cov --cov-branch --cov-report=xml --cov-report=term --cov-append || true
  continue-on-error: true # Allow failures if dependencies not available
 
  - name: Upload coverage reports to Codecov
  uses: codecov/codecov-action@v5
  with:
  token: ${{ secrets.CODECOV_TOKEN }}
  slug: DeepCritical/GradioDemo
- files: ./coverage.xml
- fail_ci_if_error: false
- continue-on-error: true
-
- docs:
- runs-on: ubuntu-latest
- permissions:
- contents: write
- if: github.event_name == 'push' && (github.ref == 'refs/heads/main' || github.ref == 'refs/heads/dev' || github.ref == 'refs/heads/develop')
- steps:
- - uses: actions/checkout@v4
- with:
- fetch-depth: 0
-
- - name: Set up Python
- uses: actions/setup-python@v5
- with:
- python-version: '3.11'
-
- - name: Install uv
- uses: astral-sh/setup-uv@v5
- with:
- version: "latest"
-
- - name: Install dependencies
- run: |
- uv sync --extra dev
-
- - name: Configure Git
- run: |
- git config user.name "github-actions[bot]"
- git config user.email "github-actions[bot]@users.noreply.github.com"
- git remote set-url origin https://x-access-token:${{ secrets.GITHUB_TOKEN }}@github.com/${{ github.repository }}.git
-
- - name: Deploy to GitHub Pages
- run: |
- # mkdocs gh-deploy automatically creates .nojekyll, but let's verify
- uv run mkdocs gh-deploy --force --message "Deploy docs [skip ci]" --strict
- # Verify .nojekyll was created in gh-pages branch
- git fetch origin gh-pages:gh-pages || true
- git checkout gh-pages || true
- if [ -f .nojekyll ]; then
- echo "✓ .nojekyll file exists"
- else
- echo "⚠ .nojekyll file missing, creating it..."
- touch .nojekyll
- git add .nojekyll
- git commit -m "Add .nojekyll to disable Jekyll [skip ci]" || true
- git push origin gh-pages || true
- fi
- env:
- GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
  on:
  push:
+ branches: [main, dev]
  pull_request:
+ branches: [main, dev]
 
  jobs:
  test:
  steps:
  - uses: actions/checkout@v4
 
+ - name: Install uv
+ uses: astral-sh/setup-uv@v5
+ with:
+ version: "latest"
+
  - name: Set up Python ${{ matrix.python-version }}
  uses: actions/setup-python@v5
  with:
 
  - name: Install dependencies
  run: |
+ uv sync --extra dev
 
  - name: Lint with ruff
  continue-on-error: true
+ run: |
+ uv run ruff check . --exclude tests --exclude reference_repos
+ uv run ruff format --check . --exclude tests --exclude reference_repos
 
  - name: Type check with mypy
  continue-on-error: true
  run: |
+ uv run mypy src --ignore-missing-imports
 
+ - name: Run unit tests (No OpenAI/Anthropic, HuggingFace only)
  env:
  HF_TOKEN: ${{ secrets.HF_TOKEN }}
+ LLM_PROVIDER: huggingface
  run: |
+ uv run pytest tests/unit/ -v -m "not openai and not anthropic and not embedding_provider" --tb=short -p no:logfire --cov --cov-branch --cov-report=xml
 
  - name: Run local embeddings tests
  env:
  HF_TOKEN: ${{ secrets.HF_TOKEN }}
+ LLM_PROVIDER: huggingface
  run: |
+ uv run pytest tests/ -v -m "local_embeddings" --tb=short -p no:logfire --cov --cov-branch --cov-report=xml --cov-append || true
  continue-on-error: true # Allow failures if dependencies not available
 
  - name: Run HuggingFace integration tests
  env:
  HF_TOKEN: ${{ secrets.HF_TOKEN }}
+ LLM_PROVIDER: huggingface
  run: |
+ uv run pytest tests/integration/ -v -m "huggingface and not embedding_provider" --tb=short -p no:logfire --cov --cov-branch --cov-report=xml --cov-append || true
  continue-on-error: true # Allow failures if HF_TOKEN not set
 
+ - name: Run non-OpenAI/Anthropic integration tests (excluding embedding providers)
  env:
  HF_TOKEN: ${{ secrets.HF_TOKEN }}
+ LLM_PROVIDER: huggingface
  run: |
+ uv run pytest tests/integration/ -v -m "integration and not openai and not anthropic and not embedding_provider" --tb=short -p no:logfire --cov --cov-branch --cov-report=xml --cov-append || true
  continue-on-error: true # Allow failures if dependencies not available
 
  - name: Upload coverage reports to Codecov
  uses: codecov/codecov-action@v5
+ continue-on-error: true
  with:
  token: ${{ secrets.CODECOV_TOKEN }}
  slug: DeepCritical/GradioDemo
.github/workflows/deploy-hf-space.yml DELETED
@@ -1,47 +0,0 @@
- name: Deploy to Hugging Face Space
-
- on:
-   push:
-     branches: [main]
-   workflow_dispatch:  # Allow manual triggering
-
- jobs:
-   deploy:
-     runs-on: ubuntu-latest
-     permissions:
-       contents: read
-       # No write permissions needed for GitHub repo (we're pushing to HF Space)
-
-     steps:
-       - name: Checkout Repository
-         uses: actions/checkout@v4
-         with:
-           fetch-depth: 0
-
-       - name: Set up Python
-         uses: actions/setup-python@v5
-         with:
-           python-version: '3.11'
-
-       - name: Install dependencies
-         run: |
-           pip install --upgrade pip
-           pip install huggingface-hub
-
-       - name: Deploy to Hugging Face Space
-         env:
-           # Token from secrets (sensitive data)
-           HF_TOKEN: ${{ secrets.HF_TOKEN }}
-           # Username/Organization from repository variables (non-sensitive)
-           HF_USERNAME: ${{ vars.HF_USERNAME }}
-           # Space name from repository variables (non-sensitive)
-           HF_SPACE_NAME: ${{ vars.HF_SPACE_NAME }}
-         run: |
-           python .github/scripts/deploy_to_hf_space.py
-
-       - name: Verify deployment
-         if: success()
-         run: |
-           echo "✅ Deployment completed successfully!"
-           echo "Space URL: https://huggingface.co/spaces/${{ vars.HF_USERNAME }}/${{ vars.HF_SPACE_NAME }}"
.github/workflows/docs.yml ADDED
@@ -0,0 +1,61 @@
+ name: Documentation
+
+ on:
+   push:
+     branches:
+       - main
+       - dev
+     paths:
+       - 'docs/**'
+       - 'mkdocs.yml'
+       - '.github/workflows/docs.yml'
+   pull_request:
+     branches:
+       - main
+       - dev
+     paths:
+       - 'docs/**'
+       - 'mkdocs.yml'
+       - '.github/workflows/docs.yml'
+   workflow_dispatch:
+
+ permissions:
+   contents: write
+
+ jobs:
+   build:
+     runs-on: ubuntu-latest
+     steps:
+       - uses: actions/checkout@v4
+
+       - name: Set up Python
+         uses: actions/setup-python@v5
+         with:
+           python-version: '3.11'
+
+       - name: Install uv
+         uses: astral-sh/setup-uv@v5
+         with:
+           version: "latest"
+
+       - name: Install dependencies
+         run: |
+           uv sync --extra dev
+
+       - name: Build documentation
+         run: |
+           uv run mkdocs build --strict
+
+       - name: Deploy to GitHub Pages
+         if: (github.ref == 'refs/heads/main' || github.ref == 'refs/heads/dev') && github.event_name == 'push'
+         uses: peaceiris/actions-gh-pages@v3
+         with:
+           github_token: ${{ secrets.GITHUB_TOKEN }}
+           publish_dir: ./site
+           publish_branch: dev
+           cname: false
+           keep_files: true
.gitignore CHANGED
@@ -1,7 +1,10 @@
  folder/
  site/
  .cursor/
  .ruff_cache/
  # Python
  __pycache__/
  *.py[cod]
@@ -57,9 +60,6 @@ reference_repos/DeepCritical/
  # Keep the README in reference_repos
  !reference_repos/README.md
 
- # Development directory
- dev/
-
  # OS
  .DS_Store
  Thumbs.db
@@ -72,13 +72,12 @@ logs/
  .pytest_cache/
  .mypy_cache/
  .coverage
  htmlcov/
- test_output*.txt
 
  # Database files
  chroma_db/
  *.sqlite3
 
-
  # Trigger rebuild Wed Nov 26 17:51:41 EST 2025
- .env
+ =0.22.0
+ =0.22.0,
  folder/
  site/
  .cursor/
  .ruff_cache/
+ docs/contributing/
  # Python
  __pycache__/
  *.py[cod]
  # Keep the README in reference_repos
  !reference_repos/README.md
 
  # OS
  .DS_Store
  Thumbs.db
  .pytest_cache/
  .mypy_cache/
  .coverage
+ .coverage.*
+ coverage.xml
  htmlcov/
 
  # Database files
  chroma_db/
  *.sqlite3
 
  # Trigger rebuild Wed Nov 26 17:51:41 EST 2025
.pre-commit-config.yaml CHANGED
@@ -1,20 +1,20 @@
  repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
- rev: v0.4.4
  hooks:
  - id: ruff
- args: [--fix, --exclude, tests]
  exclude: ^reference_repos/
  - id: ruff-format
- args: [--exclude, tests]
  exclude: ^reference_repos/
 
  - repo: https://github.com/pre-commit/mirrors-mypy
- rev: v1.10.0
  hooks:
  - id: mypy
  files: ^src/
- exclude: ^folder|^src/app.py
  additional_dependencies:
  - pydantic>=2.7
  - pydantic-settings>=2.2
@@ -31,14 +31,9 @@ repos:
  types: [python]
  args: [
  "run",
- "pytest",
- "tests/unit/",
- "-v",
- "-m",
- "not openai and not embedding_provider",
- "--tb=short",
- "-p",
- "no:logfire",
  ]
  pass_filenames: false
  always_run: true
@@ -50,14 +45,9 @@ repos:
  types: [python]
  args: [
  "run",
- "pytest",
- "tests/",
- "-v",
- "-m",
- "local_embeddings",
- "--tb=short",
- "-p",
- "no:logfire",
  ]
  pass_filenames: false
  always_run: true
  repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
+ rev: v0.14.7 # Compatible with ruff>=0.14.6 (matches CI)
  hooks:
  - id: ruff
+ args: [--fix, --exclude, tests, --exclude, reference_repos]
  exclude: ^reference_repos/
  - id: ruff-format
+ args: [--exclude, tests, --exclude, reference_repos]
  exclude: ^reference_repos/
 
  - repo: https://github.com/pre-commit/mirrors-mypy
+ rev: v1.18.2 # Matches CI version mypy>=1.18.2
  hooks:
  - id: mypy
  files: ^src/
+ exclude: ^folder
  additional_dependencies:
  - pydantic>=2.7
  - pydantic-settings>=2.2
 
  types: [python]
  args: [
  "run",
+ "python",
+ ".pre-commit-hooks/run_pytest_with_sync.py",
+ "unit",
  ]
  pass_filenames: false
  always_run: true
 
  types: [python]
  args: [
  "run",
+ "python",
+ ".pre-commit-hooks/run_pytest_with_sync.py",
+ "embeddings",
  ]
  pass_filenames: false
  always_run: true
=0.22.0 ADDED
File without changes
=0.22.0, ADDED
File without changes
AGENTS.txt DELETED
@@ -1,236 +0,0 @@
1
- # DeepCritical Project - Rules
2
-
3
- ## Project-Wide Rules
4
-
5
- **Architecture**: Multi-agent research system using Pydantic AI for agent orchestration, supporting iterative and deep research patterns. Uses middleware for state management, budget tracking, and workflow coordination.
6
-
7
- **Type Safety**: ALWAYS use complete type hints. All functions must have parameter and return type annotations. Use `mypy --strict` compliance. Use `TYPE_CHECKING` imports for circular dependencies: `from typing import TYPE_CHECKING; if TYPE_CHECKING: from src.services.embeddings import EmbeddingService`
8
-
9
- **Async Patterns**: ALL I/O operations must be async (`async def`, `await`). Use `asyncio.gather()` for parallel operations. CPU-bound work must use `run_in_executor()`: `loop = asyncio.get_running_loop(); result = await loop.run_in_executor(None, cpu_bound_function, args)`. Never block the event loop.
10
-
11
- **Error Handling**: Use custom exceptions from `src/utils/exceptions.py`: `DeepCriticalError`, `SearchError`, `RateLimitError`, `JudgeError`, `ConfigurationError`. Always chain exceptions: `raise SearchError(...) from e`. Log with structlog: `logger.error("Operation failed", error=str(e), context=value)`.
12
-
13
- **Logging**: Use `structlog` for ALL logging (NOT `print` or `logging`). Import: `import structlog; logger = structlog.get_logger()`. Log with structured data: `logger.info("event", key=value)`. Use appropriate levels: DEBUG, INFO, WARNING, ERROR.
14
-
15
- **Pydantic Models**: All data exchange uses Pydantic models from `src/utils/models.py`. Models are frozen (`model_config = {"frozen": True}`) for immutability. Use `Field()` with descriptions. Validate with `ge=`, `le=`, `min_length=`, `max_length=` constraints.
16
-
17
- **Code Style**: Ruff with 100-char line length. Ignore rules: `PLR0913` (too many arguments), `PLR0912` (too many branches), `PLR0911` (too many returns), `PLR2004` (magic values), `PLW0603` (global statement), `PLC0415` (lazy imports).
18
-
19
- **Docstrings**: Google-style docstrings for all public functions. Include Args, Returns, Raises sections. Use type hints in docstrings only if needed for clarity.
20
-
21
- **Testing**: Unit tests in `tests/unit/` (mocked, fast). Integration tests in `tests/integration/` (real APIs, marked `@pytest.mark.integration`). Use `respx` for httpx mocking, `pytest-mock` for general mocking.
22
-
23
- **State Management**: Use `ContextVar` in middleware for thread-safe isolation. Never use global mutable state (except singletons via `@lru_cache`). Use `WorkflowState` from `src/middleware/state_machine.py` for workflow state.
24
-
25
- **Citation Validation**: ALWAYS validate references before returning reports. Use `validate_references()` from `src/utils/citation_validator.py`. Remove hallucinated citations. Log warnings for removed citations.
26
-
27
- ---
28
-
29
- ## src/agents/ - Agent Implementation Rules
30
-
31
- **Pattern**: All agents use Pydantic AI `Agent` class. Agents have structured output types (Pydantic models) or return strings. Use factory functions in `src/agent_factory/agents.py` for creation.
32
-
33
- **Agent Structure**:
34
- - System prompt as module-level constant (with date injection: `datetime.now().strftime("%Y-%m-%d")`)
35
- - Agent class with `__init__(model: Any | None = None)`
36
- - Main method (e.g., `async def evaluate()`, `async def write_report()`)
37
- - Factory function: `def create_agent_name(model: Any | None = None) -> AgentName`
38
-
39
- **Model Initialization**: Use `get_model()` from `src/agent_factory/judges.py` if no model provided. Support OpenAI/Anthropic/HF Inference via settings.
40
-
41
- **Error Handling**: Return fallback values (e.g., `KnowledgeGapOutput(research_complete=False, outstanding_gaps=[...])`) on failure. Log errors with context. Use retry logic (3 retries) in Pydantic AI Agent initialization.
42
-
43
- **Input Validation**: Validate query/inputs are not empty. Truncate very long inputs with warnings. Handle None values gracefully.
44
-
45
- **Output Types**: Use structured output types from `src/utils/models.py` (e.g., `KnowledgeGapOutput`, `AgentSelectionPlan`, `ReportDraft`). For text output (writer agents), return `str` directly.
46
-
47
- **Agent-Specific Rules**:
48
- - `knowledge_gap.py`: Outputs `KnowledgeGapOutput`. Evaluates research completeness.
49
- - `tool_selector.py`: Outputs `AgentSelectionPlan`. Selects tools (RAG/web/database).
50
- - `writer.py`: Returns markdown string. Includes citations in numbered format.
51
- - `long_writer.py`: Uses `ReportDraft` input/output. Handles section-by-section writing.
52
- - `proofreader.py`: Takes `ReportDraft`, returns polished markdown.
53
- - `thinking.py`: Returns observation string from conversation history.
54
- - `input_parser.py`: Outputs `ParsedQuery` with research mode detection.
55
-
56
- ---
57
-
58
- ## src/tools/ - Search Tool Rules
59
-
60
- **Protocol**: All tools implement `SearchTool` protocol from `src/tools/base.py`: `name` property and `async def search(query, max_results) -> list[Evidence]`.
61
-
62
- **Rate Limiting**: Use `@retry` decorator from tenacity: `@retry(stop=stop_after_attempt(3), wait=wait_exponential(...))`. Implement `_rate_limit()` method for APIs with limits. Use shared rate limiters from `src/tools/rate_limiter.py`.
63
-
64
- **Error Handling**: Raise `SearchError` or `RateLimitError` on failures. Handle HTTP errors (429, 500, timeout). Return empty list on non-critical errors (log warning).
65
-
66
- **Query Preprocessing**: Use `preprocess_query()` from `src/tools/query_utils.py` to remove noise and expand synonyms.
67
-
68
- **Evidence Conversion**: Convert API responses to `Evidence` objects with `Citation`. Extract metadata (title, url, date, authors). Set relevance scores (0.0-1.0). Handle missing fields gracefully.
69
-
70
- **Tool-Specific Rules**:
71
- - `pubmed.py`: Use NCBI E-utilities (ESearch → EFetch). Rate limit: 0.34s between requests. Parse XML with `xmltodict`. Handle single vs. multiple articles.
72
- - `clinicaltrials.py`: Use `requests` library (NOT httpx - WAF blocks httpx). Run in thread pool: `await asyncio.to_thread(requests.get, ...)`. Filter: Only interventional studies, active/completed.
73
- - `europepmc.py`: Handle preprint markers: `[PREPRINT - Not peer-reviewed]`. Build URLs from DOI or PMID.
74
- - `rag_tool.py`: Wraps `LlamaIndexRAGService`. Returns Evidence from RAG results. Handles ingestion.
75
- - `search_handler.py`: Orchestrates parallel searches across multiple tools. Uses `asyncio.gather()` with `return_exceptions=True`. Aggregates results into `SearchResult`.
76
-
77
- ---
78
-
79
- ## src/middleware/ - Middleware Rules
80
-
81
- **State Management**: Use `ContextVar` for thread-safe isolation. `WorkflowState` uses `ContextVar[WorkflowState | None]`. Initialize with `init_workflow_state(embedding_service)`. Access with `get_workflow_state()` (auto-initializes if missing).
82
-
83
- **WorkflowState**: Tracks `evidence: list[Evidence]`, `conversation: Conversation`, `embedding_service: Any`. Methods: `add_evidence()` (deduplicates by URL), `async search_related()` (semantic search).
84
-
85
- **WorkflowManager**: Manages parallel research loops. Methods: `add_loop()`, `run_loops_parallel()`, `update_loop_status()`, `sync_loop_evidence_to_state()`. Uses `asyncio.gather()` for parallel execution. Handles errors per loop (don't fail all if one fails).
86
-
87
- **BudgetTracker**: Tracks tokens, time, iterations per loop and globally. Methods: `create_budget()`, `add_tokens()`, `start_timer()`, `update_timer()`, `increment_iteration()`, `check_budget()`, `can_continue()`. Token estimation: `estimate_tokens(text)` (~4 chars per token), `estimate_llm_call_tokens(prompt, response)`.
88
-
89
- **Models**: All middleware models in `src/utils/models.py`. `IterationData`, `Conversation`, `ResearchLoop`, `BudgetStatus` are used by middleware.
90
-
91
- ---
92
-
93
- ## src/orchestrator/ - Orchestration Rules
94
-
95
- **Research Flows**: Two patterns: `IterativeResearchFlow` (single loop) and `DeepResearchFlow` (plan → parallel loops → synthesis). Both support agent chains (`use_graph=False`) and graph execution (`use_graph=True`).
96
-
97
- **IterativeResearchFlow**: Pattern: Generate observations → Evaluate gaps → Select tools → Execute → Judge → Continue/Complete. Uses `KnowledgeGapAgent`, `ToolSelectorAgent`, `ThinkingAgent`, `WriterAgent`, `JudgeHandler`. Tracks iterations, time, budget.
98
-
99
- **DeepResearchFlow**: Pattern: Planner → Parallel iterative loops per section → Synthesizer. Uses `PlannerAgent`, `IterativeResearchFlow` (per section), `LongWriterAgent` or `ProofreaderAgent`. Uses `WorkflowManager` for parallel execution.
100
-
101
- **Graph Orchestrator**: Uses Pydantic AI Graphs (when available) or agent chains (fallback). Routes based on research mode (iterative/deep/auto). Streams `AgentEvent` objects for UI.
102
-
103
- **State Initialization**: Always call `init_workflow_state()` before running flows. Initialize `BudgetTracker` per loop. Use `WorkflowManager` for parallel coordination.
104
-
105
- **Event Streaming**: Yield `AgentEvent` objects during execution. Event types: "started", "search_complete", "judge_complete", "hypothesizing", "synthesizing", "complete", "error". Include iteration numbers and data payloads.
106
-
107
- ---
108
-
109
- ## src/services/ - Service Rules
110
-
111
- **EmbeddingService**: Local sentence-transformers (NO API key required). All operations async-safe via `run_in_executor()`. ChromaDB for vector storage. Deduplication threshold: 0.85 (85% similarity = duplicate).
112
-
113
- **LlamaIndexRAGService**: Uses OpenAI embeddings (requires `OPENAI_API_KEY`). Methods: `ingest_evidence()`, `retrieve()`, `query()`. Returns documents with metadata (source, title, url, date, authors). Lazy initialization with graceful fallback.
114
-
115
- **StatisticalAnalyzer**: Generates Python code via LLM. Executes in Modal sandbox (secure, isolated). Library versions pinned in `SANDBOX_LIBRARIES` dict. Returns `AnalysisResult` with verdict (SUPPORTED/REFUTED/INCONCLUSIVE).
116
-
117
- **Singleton Pattern**: Use `@lru_cache(maxsize=1)` for singletons: `@lru_cache(maxsize=1); def get_service() -> Service: return Service()`. Lazy initialization to avoid requiring dependencies at import time.
118
-
119
- ---
120
-
121
- ## src/utils/ - Utility Rules
122
-
123
- **Models**: All Pydantic models in `src/utils/models.py`. Use frozen models (`model_config = {"frozen": True}`) except where mutation needed. Use `Field()` with descriptions. Validate with constraints.
124
-
125
- **Config**: Settings via Pydantic Settings (`src/utils/config.py`). Load from `.env` automatically. Use `settings` singleton: `from src.utils.config import settings`. Validate API keys with properties: `has_openai_key`, `has_anthropic_key`.
126
-
127
- **Exceptions**: Custom exception hierarchy in `src/utils/exceptions.py`. Base: `DeepCriticalError`. Specific: `SearchError`, `RateLimitError`, `JudgeError`, `ConfigurationError`. Always chain exceptions.
128
-
129
- **LLM Factory**: Centralized LLM model creation in `src/utils/llm_factory.py`. Supports OpenAI, Anthropic, HF Inference. Use `get_model()` or factory functions. Check requirements before initialization.
130
-
131
- **Citation Validator**: Use `validate_references()` from `src/utils/citation_validator.py`. Removes hallucinated citations (URLs not in evidence). Logs warnings. Returns validated report string.
132
-
133
- ---
134
-
135
- ## src/orchestrator_factory.py Rules
136
-
137
- **Purpose**: Factory for creating orchestrators. Supports "simple" (legacy) and "advanced" (magentic) modes. Auto-detects mode based on API key availability.
138
-
139
- **Pattern**: Lazy import for optional dependencies (`_get_magentic_orchestrator_class()`). Handles `ImportError` gracefully with clear error messages.
140
-
141
- **Mode Detection**: `_determine_mode()` checks explicit mode or auto-detects: "advanced" if `settings.has_openai_key`, else "simple". Maps "magentic" → "advanced".
142
-
143
- **Function Signature**: `create_orchestrator(search_handler, judge_handler, config, mode) -> Any`. Simple mode requires handlers. Advanced mode uses MagenticOrchestrator.
144
-
145
- **Error Handling**: Raise `ValueError` with clear messages if requirements not met. Log mode selection with structlog.
146
-
147
- ---
148
-
149
- ## src/orchestrator_hierarchical.py Rules
150
-
151
- **Purpose**: Hierarchical orchestrator using middleware and sub-teams. Adapts Magentic ChatAgent to SubIterationTeam protocol.
152
-
153
- **Pattern**: Uses `SubIterationMiddleware` with `ResearchTeam` and `LLMSubIterationJudge`. Event-driven via callback queue.
154
-
155
- **State Initialization**: Initialize embedding service with graceful fallback. Use `init_magentic_state()` (deprecated, but kept for compatibility).
156
-
157
- **Event Streaming**: Uses `asyncio.Queue` for event coordination. Yields `AgentEvent` objects. Handles event callback pattern with `asyncio.wait()`.
158
-
159
- **Error Handling**: Log errors with context. Yield error events. Process remaining events after task completion.
160
-
161
- ---
162
-
163
- ## src/orchestrator_magentic.py Rules
164
-
165
- **Purpose**: Magentic-based orchestrator using ChatAgent pattern. Each agent has internal LLM. Manager orchestrates agents.
166
-
167
- **Pattern**: Uses `MagenticBuilder` with participants (searcher, hypothesizer, judge, reporter). Manager uses `OpenAIChatClient`. Workflow built in `_build_workflow()`.
168
-
169
- **Event Processing**: `_process_event()` converts Magentic events to `AgentEvent`. Handles: `MagenticOrchestratorMessageEvent`, `MagenticAgentMessageEvent`, `MagenticFinalResultEvent`, `MagenticAgentDeltaEvent`, `WorkflowOutputEvent`.
170
-
171
- **Text Extraction**: `_extract_text()` defensively extracts text from messages. Priority: `.content` → `.text` → `str(message)`. Handles buggy message objects.
172
-
173
- **State Initialization**: Initialize embedding service with graceful fallback. Use `init_magentic_state()` (deprecated).
174
-
175
- **Requirements**: Must call `check_magentic_requirements()` in `__init__`. Requires `agent-framework-core` and OpenAI API key.
176
-
177
- **Event Types**: Maps agent names to event types: "search" → "search_complete", "judge" → "judge_complete", "hypothes" → "hypothesizing", "report" → "synthesizing".
178
-
179
- ---
180
-
181
- ## src/agent_factory/ - Factory Rules
182
-
183
- **Pattern**: Factory functions for creating agents and handlers. Lazy initialization for optional dependencies. Support OpenAI/Anthropic/HF Inference.
184
-
185
- **Judges**: `create_judge_handler()` creates `JudgeHandler` with structured output (`JudgeAssessment`). Supports `MockJudgeHandler`, `HFInferenceJudgeHandler` as fallbacks.
186
-
187
- **Agents**: Factory functions in `agents.py` for all Pydantic AI agents. Pattern: `create_agent_name(model: Any | None = None) -> AgentName`. Use `get_model()` if model not provided.
188
-
189
- **Graph Builder**: `graph_builder.py` contains utilities for building research graphs. Supports iterative and deep research graph construction.
190
-
191
- **Error Handling**: Raise `ConfigurationError` if required API keys missing. Log agent creation. Handle import errors gracefully.
192
-
193
- ---
194
-
195
- ## src/prompts/ - Prompt Rules
196
-
197
- **Pattern**: System prompts stored as module-level constants. Include date injection: `datetime.now().strftime("%Y-%m-%d")`. Format evidence with truncation (1500 chars per item).
198
-
199
- **Judge Prompts**: In `judge.py`. Handle empty evidence case separately. Always request structured JSON output.
200
-
201
- **Hypothesis Prompts**: In `hypothesis.py`. Use diverse evidence selection (MMR algorithm). Sentence-aware truncation.
202
-
203
- **Report Prompts**: In `report.py`. Include full citation details. Use diverse evidence selection (n=20). Emphasize citation validation rules.
204
-
205
- ---
206
-
207
- ## Testing Rules
208
-
209
- **Structure**: Unit tests in `tests/unit/` (mocked, fast). Integration tests in `tests/integration/` (real APIs, marked `@pytest.mark.integration`).
210
-
211
- **Mocking**: Use `respx` for httpx mocking. Use `pytest-mock` for general mocking. Mock LLM calls in unit tests (use `MockJudgeHandler`).
212
-
213
- **Fixtures**: Common fixtures in `tests/conftest.py`: `mock_httpx_client`, `mock_llm_response`.
214
-
215
- **Coverage**: Aim for >80% coverage. Test error handling, edge cases, and integration paths.
216
-
217
- ---
218
-
219
- ## File-Specific Agent Rules
220
-
221
- **knowledge_gap.py**: Outputs `KnowledgeGapOutput`. System prompt evaluates research completeness. Handles conversation history. Returns fallback on error.
222
-
223
- **writer.py**: Returns markdown string. System prompt includes citation format examples. Validates inputs. Truncates long findings. Retry logic for transient failures.
224
-
225
- **long_writer.py**: Uses `ReportDraft` input/output. Writes sections iteratively. Reformats references (deduplicates, renumbers). Reformats section headings.
226
-
227
- **proofreader.py**: Takes `ReportDraft`, returns polished markdown. Removes duplicates. Adds summary. Preserves references.
228
-
229
- **tool_selector.py**: Outputs `AgentSelectionPlan`. System prompt lists available agents (WebSearchAgent, SiteCrawlerAgent, RAGAgent). Guidelines for when to use each.
230
-
231
- **thinking.py**: Returns observation string. Generates observations from conversation history. Uses query and background context.
232
-
233
- **input_parser.py**: Outputs `ParsedQuery`. Detects research mode (iterative/deep). Extracts entities and research questions. Improves/refines query.
234
-
235
-
236
-
LICENSE.md DELETED
@@ -1,25 +0,0 @@
1
- # License
2
-
3
- DeepCritical is licensed under the MIT License.
4
-
5
- ## MIT License
6
-
7
- Copyright (c) 2024 DeepCritical Team
8
-
9
- Permission is hereby granted, free of charge, to any person obtaining a copy
10
- of this software and associated documentation files (the "Software"), to deal
11
- in the Software without restriction, including without limitation the rights
12
- to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
13
- copies of the Software, and to permit persons to whom the Software is
14
- furnished to do so, subject to the following conditions:
15
-
16
- The above copyright notice and this permission notice shall be included in all
17
- copies or substantial portions of the Software.
18
-
19
- THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
20
- IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
21
- FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
22
- AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
23
- LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
24
- OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
25
- SOFTWARE.
Makefile ADDED
@@ -0,0 +1,51 @@
1
+ .PHONY: install test lint format typecheck check clean all cov cov-html
2
+
3
+ # Default target
4
+ all: check
5
+
6
+ install:
7
+ uv sync --all-extras
8
+ uv run pre-commit install
9
+
10
+ test:
11
+ uv run pytest tests/unit/ -v -m "not openai" -p no:logfire
12
+
13
+ test-hf:
14
+ uv run pytest tests/ -v -m "huggingface" -p no:logfire
15
+
16
+ test-all:
17
+ uv run pytest tests/ -v -p no:logfire
18
+
19
+ # Coverage aliases
20
+ cov: test-cov
21
+ test-cov:
22
+ uv run pytest --cov=src --cov-report=term-missing -m "not openai" -p no:logfire
23
+
24
+ cov-html:
25
+ uv run pytest --cov=src --cov-report=html -p no:logfire
26
+ @echo "Coverage report: open htmlcov/index.html"
27
+
28
+ lint:
29
+ uv run ruff check src tests
30
+
31
+ format:
32
+ uv run ruff format src tests
33
+
34
+ typecheck:
35
+ uv run mypy src
36
+
37
+ check: lint typecheck test-cov
38
+ @echo "All checks passed!"
39
+
40
+ docs-build:
41
+ uv run mkdocs build
42
+
43
+ docs-serve:
44
+ uv run mkdocs serve
45
+
46
+ docs-clean:
47
+ rm -rf site/
48
+
49
+ clean:
50
+ rm -rf .pytest_cache .mypy_cache .ruff_cache __pycache__ .coverage htmlcov
51
+ find . -type d -name "__pycache__" -exec rm -rf {} + 2>/dev/null || true
README.md CHANGED
@@ -1,5 +1,5 @@
1
  ---
2
- title: The DETERMINATOR
3
  emoji: 🐉
4
  colorFrom: red
5
  colorTo: yellow
@@ -10,54 +10,114 @@ app_file: src/app.py
10
  hf_oauth: true
11
  hf_oauth_expiration_minutes: 480
12
  hf_oauth_scopes:
13
- # Required for HuggingFace Inference API (includes all third-party providers)
14
- # This scope grants access to:
15
- # - HuggingFace's own Inference API
16
- # - Third-party inference providers (nebius, together, scaleway, hyperbolic, novita, nscale, sambanova, ovh, fireworks, etc.)
17
- # - All models available through the Inference Providers API
18
- - inference-api
19
- # Optional: Uncomment if you need to access user's billing information
20
- # - read-billing
21
  pinned: true
22
  license: mit
23
  tags:
24
  - mcp-in-action-track-enterprise
25
  - mcp-hackathon
26
- - deep-research
27
  - biomedical-ai
28
  - pydantic-ai
29
  - llamaindex
30
  - modal
31
- - building-mcp-track-enterprise
32
- - building-mcp-track-consumer
33
- - mcp-in-action-track-enterprise
34
- - mcp-in-action-track-consumer
35
- - building-mcp-track-modal
36
- - building-mcp-track-blaxel
37
- - building-mcp-track-llama-index
38
- - building-mcp-track-HUGGINGFACE
39
  ---
40
 
41
  > [!IMPORTANT]
42
  > **You are reading the Gradio Demo README!**
43
  >
44
- > - 📚 **Documentation**: See our [technical documentation](https://deepcritical.github.io/GradioDemo/) for detailed information
45
- > - 📖 **Complete README**: Check out the [Github README](.github/README.md) for setup, configuration, and contribution guidelines
46
- > - ⚠️**This README is for our Gradio Demo Only !**
47
 
48
  <div align="center">
49
 
50
- [![GitHub](https://img.shields.io/github/stars/DeepCritical/GradioDemo?style=for-the-badge&logo=github&logoColor=white&label=GitHub&labelColor=181717&color=181717)](https://github.com/DeepCritical/GradioDemo)
51
- [![Documentation](https://img.shields.io/badge/Docs-0080FF?style=for-the-badge&logo=readthedocs&logoColor=white&labelColor=0080FF&color=0080FF)](deepcritical.github.io/GradioDemo/)
52
- [![Demo](https://img.shields.io/badge/Demo-FFD21E?style=for-the-badge&logo=huggingface&logoColor=white&labelColor=FFD21E&color=FFD21E)](https://huggingface.co/spaces/DataQuests/DeepCritical)
53
  [![codecov](https://codecov.io/gh/DeepCritical/GradioDemo/graph/badge.svg?token=B1f05RCGpz)](https://codecov.io/gh/DeepCritical/GradioDemo)
54
  [![Join us on Discord](https://img.shields.io/discord/1109943800132010065?label=Discord&logo=discord&style=flat-square)](https://discord.gg/qdfnvSPcqP)
55
 
56
 
57
  </div>
58
 
59
- # The DETERMINATOR
60
 
61
  ## About
62
 
63
- The DETERMINATOR is a powerful generalist deep research agent system that stops at nothing until finding precise answers to complex questions. It uses iterative search-and-judge loops to comprehensively investigate any research question from any domain.
1
  ---
2
+ title: Critical Deep Research
3
  emoji: 🐉
4
  colorFrom: red
5
  colorTo: yellow
 
10
  hf_oauth: true
11
  hf_oauth_expiration_minutes: 480
12
  hf_oauth_scopes:
13
+ - inference-api
 
 
 
 
 
 
 
14
  pinned: true
15
  license: mit
16
  tags:
17
  - mcp-in-action-track-enterprise
18
  - mcp-hackathon
19
+ - drug-repurposing
20
  - biomedical-ai
21
  - pydantic-ai
22
  - llamaindex
23
  - modal
 
 
 
 
 
 
 
 
24
  ---
25
 
26
  > [!IMPORTANT]
27
  > **You are reading the Gradio Demo README!**
28
  >
29
+ > - 📚 **Documentation**: See our [technical documentation](https://deepcritical.github.io/GradioDemo/) for detailed information
30
+ > - 📖 **Complete README**: Check out the [full README](.github/README.md) for setup, configuration, and contribution guidelines
31
+ > - 🏆 **Hackathon Submission**: Keep reading below for more information about our MCP Hackathon submission
32
 
33
  <div align="center">
34
 
35
+ [![GitHub](https://img.shields.io/github/stars/DeepCritical/GradioDemo?style=for-the-badge&logo=github&logoColor=white&label=🐙%20GitHub&labelColor=181717&color=181717)](https://github.com/DeepCritical/GradioDemo)
36
+ [![Documentation](https://img.shields.io/badge/📚%20Docs-0080FF?style=for-the-badge&logo=readthedocs&logoColor=white&labelColor=0080FF&color=0080FF)](https://deepcritical.github.io/GradioDemo/)
37
+ [![Demo](https://img.shields.io/badge/🚀%20Demo-FFD21E?style=for-the-badge&logo=huggingface&logoColor=white&labelColor=FFD21E&color=FFD21E)](https://huggingface.co/spaces/DataQuests/DeepCritical)
38
  [![codecov](https://codecov.io/gh/DeepCritical/GradioDemo/graph/badge.svg?token=B1f05RCGpz)](https://codecov.io/gh/DeepCritical/GradioDemo)
39
  [![Join us on Discord](https://img.shields.io/discord/1109943800132010065?label=Discord&logo=discord&style=flat-square)](https://discord.gg/qdfnvSPcqP)
40
 
41
 
42
  </div>
43
 
44
+ # DeepCritical
45
 
46
  ## About
47
 
48
+ The [Deep Critical Gradio Hackathon Team](#team) met online in the Alzheimer's Critical Literature Review Group in the Hugging Science initiative. We're building the agent framework we want to use for AI-assisted research, to [turn the vast amounts of clinical data into cures](https://github.com/DeepCritical/GradioDemo).
49
+
50
+ For this hackathon we're proposing a simple yet powerful Deep Research Agent that iteratively looks for the answer until it finds it, using general-purpose web search and special-purpose retrievers for technical sources.
51
+
52
+ ## Deep Critical in the Media
53
+
54
+ - Social media posts about Deep Critical:
55
+ -
56
+ -
57
+ -
58
+ -
59
+ -
60
+ -
61
+ -
62
+
63
+ ## Important information
64
+
65
+ - **[readme](.github/README.md)**: Configure, deploy, contribute, and learn more here.
66
+ - **[docs](https://deepcritical.github.io/GradioDemo/)**: Want to know how all this works? Read our detailed technical documentation here.
67
+ - **[demo](https://huggingface.co/spaces/DataQuests/DeepCritical)**: Try our demo on Hugging Face.
68
+ - **[team](#team)**: Join us, or follow us!
69
+ - **[video]**: See our demo video
70
+
71
+ ## Future Developments
72
+
73
+ - [ ] Apply Deep Research Systems To Generate Short Form Video (up to 5 minutes)
74
+ - [ ] Visualize Pydantic Graphs as Loading Screens in the UI
75
+ - [ ] Improve Data Science with more Complex Graph Agents
76
+ - [ ] Create Deep Critical Drug Repurposing / Discovery Demo
77
+ - [ ] Create Deep Critical Literature Review
78
+ - [ ] Create Deep Critical Hypothesis Generator
79
+ - [ ] Create PyPI Package
80
+
81
+ ## Completed
82
+
83
+ - [x] **Multi-Source Search**: PubMed, ClinicalTrials.gov, bioRxiv/medRxiv
84
+ - [x] **MCP Integration**: Use our tools from Claude Desktop or any MCP client
85
+ - [x] **HuggingFace OAuth**: Sign in with HuggingFace
86
+ - [x] **Modal Sandbox**: Secure execution of AI-generated statistical code
87
+ - [x] **LlamaIndex RAG**: Semantic search and evidence synthesis
88
+ - [x] **HuggingFace Inference**
89
+ - [x] **HuggingFace MCP Custom Config To Use Community Tools**
90
+ - [x] **Strongly Typed Composable Graphs**
91
+ - [x] **Specialized Research Teams of Agents**
92
+
93
+
94
+
95
+ ### Team
96
+
97
+ - ZJ
98
+ - MarioAderman
99
+ - Josephrp
100
+
101
+
102
+ ## Acknowledgements
103
+
104
+ - McSwaggins
105
+ - Magentic
106
+ - Huggingface
107
+ - Gradio
108
+ - DeepCritical
109
+ - Sponsors
110
+ - Microsoft
111
+ - Pydantic
112
+ - Llama-index
113
+ - Anthhropic/MCP
114
+ - List of Tools Makers
115
+
116
+
117
+ ## Links
118
+
119
+ [![GitHub](https://img.shields.io/github/stars/DeepCritical/GradioDemo?style=for-the-badge&logo=github&logoColor=white&label=🐙%20GitHub&labelColor=181717&color=181717)](https://github.com/DeepCritical/GradioDemo)
120
+ [![Documentation](https://img.shields.io/badge/📚%20Docs-0080FF?style=for-the-badge&logo=readthedocs&logoColor=white&labelColor=0080FF&color=0080FF)](https://deepcritical.github.io/GradioDemo/)
121
+ [![Demo](https://img.shields.io/badge/🚀%20Demo-FFD21E?style=for-the-badge&logo=huggingface&logoColor=white&labelColor=FFD21E&color=FFD21E)](https://huggingface.co/spaces/DataQuests/DeepCritical)
122
+ [![codecov](https://codecov.io/gh/DeepCritical/GradioDemo/graph/badge.svg?token=B1f05RCGpz)](https://codecov.io/gh/DeepCritical/GradioDemo)
123
+ [![Join us on Discord](https://img.shields.io/discord/1109943800132010065?label=Discord&logo=discord&style=flat-square)](https://discord.gg/qdfnvSPcqP)
deployments/README.md DELETED
@@ -1,46 +0,0 @@
1
- # Deployments
2
-
3
- This directory contains infrastructure deployment scripts for DeepCritical services.
4
-
5
- ## Modal Deployments
6
-
7
- ### TTS Service (`modal_tts.py`)
8
-
9
- Deploys the Kokoro TTS (Text-to-Speech) function to Modal's GPU infrastructure.
10
-
11
- **Deploy:**
12
- ```bash
13
- modal deploy deployments/modal_tts.py
14
- ```
15
-
16
- **Features:**
17
- - Kokoro 82M TTS model
18
- - GPU-accelerated (T4)
19
- - Voice options: af_heart, af_bella, am_michael, etc.
20
- - Configurable speech speed
21
-
22
- **Requirements:**
23
- - Modal account and credentials (`MODAL_TOKEN_ID`, `MODAL_TOKEN_SECRET` in `.env`)
24
- - GPU quota on Modal
25
-
26
- **After Deployment:**
27
- The function will be available at:
28
- - App: `deepcritical-tts`
29
- - Function: `kokoro_tts_function`
30
-
31
- The main application (`src/services/tts_modal.py`) will call this deployed function.
32
-
33
- ---
34
-
35
- ## Adding New Deployments
36
-
37
- When adding new deployment scripts:
38
-
39
- 1. Create a new file: `deployments/<service_name>.py`
40
- 2. Use Modal's app pattern:
41
- ```python
42
- import modal
43
- app = modal.App("deepcritical-<service-name>")
44
- ```
45
- 3. Document in this README
46
- 4. Test deployment: `modal deploy deployments/<service_name>.py`
deployments/modal_tts.py DELETED
@@ -1,97 +0,0 @@
1
- """Deploy Kokoro TTS function to Modal.
2
-
3
- This script deploys the TTS function to Modal so it can be called
4
- from the main DeepCritical application.
5
-
6
- Usage:
7
- modal deploy deploy_modal_tts.py
8
-
9
- After deployment, the function will be available at:
10
- App: deepcritical-tts
11
- Function: kokoro_tts_function
12
- """
13
-
14
- import modal
15
- import numpy as np
16
-
17
- # Create Modal app
18
- app = modal.App("deepcritical-tts")
19
-
20
- # Define Kokoro TTS dependencies
21
- KOKORO_DEPENDENCIES = [
22
- "torch>=2.0.0",
23
- "transformers>=4.30.0",
24
- "numpy<2.0",
25
- ]
26
-
27
- # Create Modal image with Kokoro
28
- tts_image = (
29
- modal.Image.debian_slim(python_version="3.11")
30
- .apt_install("git") # Install git first for pip install from github
31
- .pip_install(*KOKORO_DEPENDENCIES)
32
- .pip_install("git+https://github.com/hexgrad/kokoro.git")
33
- )
34
-
35
-
36
- @app.function(
37
- image=tts_image,
38
- gpu="T4",
39
- timeout=60,
40
- )
41
- def kokoro_tts_function(text: str, voice: str, speed: float) -> tuple[int, np.ndarray]:
42
- """Modal GPU function for Kokoro TTS.
43
-
44
- This function runs on Modal's GPU infrastructure.
45
- Based on: https://huggingface.co/spaces/hexgrad/Kokoro-TTS
46
-
47
- Args:
48
- text: Text to synthesize
49
- voice: Voice ID (e.g., af_heart, af_bella, am_michael)
50
- speed: Speech speed multiplier (0.5-2.0)
51
-
52
- Returns:
53
- Tuple of (sample_rate, audio_array)
54
- """
55
- import numpy as np
56
-
57
- try:
58
- import torch
59
- from kokoro import KModel, KPipeline
60
-
61
- # Initialize model (cached on GPU)
62
- model = KModel().to("cuda").eval()
63
- pipeline = KPipeline(lang_code=voice[0])
64
- pack = pipeline.load_voice(voice)
65
-
66
- # Generate audio - accumulate all chunks
67
- audio_chunks = []
68
- for _, ps, _ in pipeline(text, voice, speed):
69
- ref_s = pack[len(ps) - 1]
70
- audio = model(ps, ref_s, speed)
71
- audio_chunks.append(audio.numpy())
72
-
73
- # Concatenate all audio chunks
74
- if audio_chunks:
75
- full_audio = np.concatenate(audio_chunks)
76
- return (24000, full_audio)
77
-
78
- # If no audio generated, return empty
79
- return (24000, np.zeros(1, dtype=np.float32))
80
-
81
- except ImportError as e:
82
- raise RuntimeError(
83
- f"Kokoro not installed: {e}. "
84
- "Install with: pip install git+https://github.com/hexgrad/kokoro.git"
85
- ) from e
86
- except Exception as e:
87
- raise RuntimeError(f"TTS synthesis failed: {e}") from e
88
-
89
-
90
- # Optional: Add a test entrypoint
91
- @app.local_entrypoint()
92
- def test():
93
- """Test the TTS function."""
94
- print("Testing Modal TTS function...")
95
- sample_rate, audio = kokoro_tts_function.remote("Hello, this is a test.", "af_heart", 1.0)
96
- print(f"Generated audio: {sample_rate}Hz, shape={audio.shape}")
97
- print("✓ TTS function works!")
dev/Makefile ADDED
@@ -0,0 +1,51 @@
1
+ .PHONY: install test lint format typecheck check clean all cov cov-html
2
+
3
+ # Default target
4
+ all: check
5
+
6
+ install:
7
+ uv sync --all-extras
8
+ uv run pre-commit install
9
+
10
+ test:
11
+ uv run pytest tests/unit/ -v -m "not openai" -p no:logfire
12
+
13
+ test-hf:
14
+ uv run pytest tests/ -v -m "huggingface" -p no:logfire
15
+
16
+ test-all:
17
+ uv run pytest tests/ -v -p no:logfire
18
+
19
+ # Coverage aliases
20
+ cov: test-cov
21
+ test-cov:
22
+ uv run pytest --cov=src --cov-report=term-missing -m "not openai" -p no:logfire
23
+
24
+ cov-html:
25
+ uv run pytest --cov=src --cov-report=html -p no:logfire
26
+ @echo "Coverage report: open htmlcov/index.html"
27
+
28
+ lint:
29
+ uv run ruff check src tests
30
+
31
+ format:
32
+ uv run ruff format src tests
33
+
34
+ typecheck:
35
+ uv run mypy src
36
+
37
+ check: lint typecheck test-cov
38
+ @echo "All checks passed!"
39
+
40
+ docs-build:
41
+ uv run mkdocs build
42
+
43
+ docs-serve:
44
+ uv run mkdocs serve
45
+
46
+ docs-clean:
47
+ rm -rf site/
48
+
49
+ clean:
50
+ rm -rf .pytest_cache .mypy_cache .ruff_cache __pycache__ .coverage htmlcov
51
+ find . -type d -name "__pycache__" -exec rm -rf {} + 2>/dev/null || true
docs/api/agents.md CHANGED
@@ -12,19 +12,27 @@ This page documents the API for DeepCritical agents.
12
 
13
  #### `evaluate`
14
 
15
- <!--codeinclude-->
16
- [KnowledgeGapAgent.evaluate](../src/agents/knowledge_gap.py) start_line:66 end_line:74
17
- <!--/codeinclude-->
 
 
 
 
 
 
 
 
18
 
19
  Evaluates research completeness and identifies outstanding knowledge gaps.
20
 
21
  **Parameters**:
22
  - `query`: Research query string
23
- - `background_context`: Background context for the query (default: "")
24
- - `conversation_history`: History of actions, findings, and thoughts as string (default: "")
25
- - `iteration`: Current iteration number (default: 0)
26
- - `time_elapsed_minutes`: Elapsed time in minutes (default: 0.0)
27
- - `max_time_minutes`: Maximum time limit in minutes (default: 10)
28
 
29
  **Returns**: `KnowledgeGapOutput` with:
30
  - `research_complete`: Boolean indicating if research is complete
@@ -40,17 +48,21 @@ Evaluates research completeness and identifies outstanding knowledge gaps.
40
 
41
  #### `select_tools`
42
 
43
- <!--codeinclude-->
44
- [ToolSelectorAgent.select_tools](../src/agents/tool_selector.py) start_line:78 end_line:84
45
- <!--/codeinclude-->
 
 
 
 
 
46
 
47
- Selects tools for addressing a knowledge gap.
48
 
49
  **Parameters**:
50
- - `gap`: The knowledge gap to address
51
  - `query`: Research query string
52
- - `background_context`: Optional background context (default: "")
53
- - `conversation_history`: History of actions, findings, and thoughts as string (default: "")
54
 
55
  **Returns**: `AgentSelectionPlan` with list of `AgentTask` objects.
56
 
@@ -64,17 +76,23 @@ Selects tools for addressing a knowledge gap.
64
 
65
  #### `write_report`
66
 
67
- <!--codeinclude-->
68
- [WriterAgent.write_report](../src/agents/writer.py) start_line:67 end_line:73
69
- <!--/codeinclude-->
 
 
 
 
 
 
70
 
71
  Generates a markdown report from research findings.
72
 
73
  **Parameters**:
74
  - `query`: Research query string
75
  - `findings`: Research findings to include in report
76
- - `output_length`: Optional description of desired output length (default: "")
77
- - `output_instructions`: Optional additional instructions for report generation (default: "")
78
 
79
  **Returns**: Markdown string with numbered citations.
80
 
@@ -88,25 +106,36 @@ Generates a markdown report from research findings.
88
 
89
  #### `write_next_section`
90
 
91
- <!--codeinclude-->
92
- [LongWriterAgent.write_next_section](../src/agents/long_writer.py) start_line:94 end_line:100
93
- <!--/codeinclude-->
 
 
 
 
 
 
94
 
95
  Writes the next section of a long-form report.
96
 
97
  **Parameters**:
98
- - `original_query`: The original research query
99
- - `report_draft`: Current report draft as string (all sections written so far)
100
- - `next_section_title`: Title of the section to write
101
- - `next_section_draft`: Draft content for the next section
102
 
103
- **Returns**: `LongWriterOutput` with formatted section and references.
104
 
105
  #### `write_report`
106
 
107
- <!--codeinclude-->
108
- [LongWriterAgent.write_report](../src/agents/long_writer.py) start_line:263 end_line:268
109
- <!--/codeinclude-->
 
 
 
 
 
110
 
111
  Generates final report from draft.
112
 
@@ -127,9 +156,14 @@ Generates final report from draft.
127
 
128
  #### `proofread`
129
 
130
- <!--codeinclude-->
131
- [ProofreaderAgent.proofread](../src/agents/proofreader.py) start_line:72 end_line:76
132
- <!--/codeinclude-->
 
 
 
 
 
133
 
134
  Proofreads and polishes a report draft.
135
 
@@ -150,17 +184,21 @@ Proofreads and polishes a report draft.
150
 
151
  #### `generate_observations`
152
 
153
- <!--codeinclude-->
154
- [ThinkingAgent.generate_observations](../src/agents/thinking.py) start_line:70 end_line:76
155
- <!--/codeinclude-->
 
 
 
 
 
156
 
157
  Generates observations from conversation history.
158
 
159
  **Parameters**:
160
  - `query`: Research query string
161
- - `background_context`: Optional background context (default: "")
162
- - `conversation_history`: History of actions, findings, and thoughts as string (default: "")
163
- - `iteration`: Current iteration number (default: 1)
164
 
165
  **Returns**: Observation string.
166
 
@@ -172,11 +210,14 @@ Generates observations from conversation history.
172
 
173
  ### Methods
174
 
175
- #### `parse`
176
 
177
- <!--codeinclude-->
178
- [InputParserAgent.parse](../src/agents/input_parser.py) start_line:82 end_line:82
179
- <!--/codeinclude-->
 
 
 
180
 
181
  Parses and improves a user query.
182
 
@@ -194,13 +235,18 @@ Parses and improves a user query.
194
 
195
  All agents have factory functions in `src.agent_factory.agents`:
196
 
197
- <!--codeinclude-->
198
- [Factory Functions](../src/agent_factory/agents.py) start_line:30 end_line:50
199
- <!--/codeinclude-->
 
 
 
 
 
 
200
 
201
  **Parameters**:
202
  - `model`: Optional Pydantic AI model. If None, uses `get_model()` from settings.
203
- - `oauth_token`: Optional OAuth token from HuggingFace login (takes priority over env vars)
204
 
205
  **Returns**: Agent instance.
206
 
@@ -209,3 +255,12 @@ All agents have factory functions in `src.agent_factory.agents`:
209
  - [Architecture - Agents](../architecture/agents.md) - Architecture overview
210
  - [Models API](models.md) - Data models used by agents
211
 
12
 
13
  #### `evaluate`
14
 
15
+ ```python
16
+ async def evaluate(
17
+ self,
18
+ query: str,
19
+ background_context: str,
20
+ conversation_history: Conversation,
21
+ iteration: int,
22
+ time_elapsed_minutes: float,
23
+ max_time_minutes: float
24
+ ) -> KnowledgeGapOutput
25
+ ```
26
 
27
  Evaluates research completeness and identifies outstanding knowledge gaps.
28
 
29
  **Parameters**:
30
  - `query`: Research query string
31
+ - `background_context`: Background context for the query
32
+ - `conversation_history`: Conversation history with previous iterations
33
+ - `iteration`: Current iteration number
34
+ - `time_elapsed_minutes`: Elapsed time in minutes
35
+ - `max_time_minutes`: Maximum time limit in minutes
36
 
37
  **Returns**: `KnowledgeGapOutput` with:
38
  - `research_complete`: Boolean indicating if research is complete
 
48
 
49
  #### `select_tools`
50
 
51
+ ```python
52
+ async def select_tools(
53
+ self,
54
+ query: str,
55
+ knowledge_gaps: list[str],
56
+ available_tools: list[str]
57
+ ) -> AgentSelectionPlan
58
+ ```
59
 
60
+ Selects tools for addressing knowledge gaps.
61
 
62
  **Parameters**:
 
63
  - `query`: Research query string
64
+ - `knowledge_gaps`: List of knowledge gaps to address
65
+ - `available_tools`: List of available tool names
66
 
67
  **Returns**: `AgentSelectionPlan` with list of `AgentTask` objects.
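A minimal usage sketch of `select_tools`, assuming the `create_tool_selector_agent` factory documented below; the tool names are illustrative:

```python
# Sketch only: tool names are hypothetical; run inside an async function.
from src.agent_factory.agents import create_tool_selector_agent

selector = create_tool_selector_agent()
plan = await selector.select_tools(
    query="What is the evidence for metformin in Alzheimer's disease?",
    knowledge_gaps=["No clinical trial data collected yet"],
    available_tools=["pubmed", "clinicaltrials", "rag"],
)
for task in plan.tasks:  # one AgentTask per selected tool
    print(task.agent_name, task.query)
```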
68
 
 
76
 
77
  #### `write_report`
78
 
79
+ ```python
80
+ async def write_report(
81
+ self,
82
+ query: str,
83
+ findings: str,
84
+ output_length: str = "medium",
85
+ output_instructions: str | None = None
86
+ ) -> str
87
+ ```
88
 
89
  Generates a markdown report from research findings.
90
 
91
  **Parameters**:
92
  - `query`: Research query string
93
  - `findings`: Research findings to include in report
94
+ - `output_length`: Desired output length ("short", "medium", "long")
95
+ - `output_instructions`: Additional instructions for report generation
96
 
97
  **Returns**: Markdown string with numbered citations.
98
 
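For context, a minimal sketch of calling `write_report`, with the findings text and factory usage assumed from this page:

```python
# Sketch only: findings are illustrative; run inside an async function.
from src.agent_factory.agents import create_writer_agent

writer = create_writer_agent()
report_md = await writer.write_report(
    query="Does metformin slow cognitive decline?",
    findings="[1] Trial X reported ... [2] Cohort Y found ...",
    output_length="short",
)
print(report_md)  # markdown with numbered citations
```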
 
106
 
107
  #### `write_next_section`
108
 
109
+ ```python
110
+ async def write_next_section(
111
+ self,
112
+ query: str,
113
+ draft: ReportDraft,
114
+ section_title: str,
115
+ section_content: str
116
+ ) -> LongWriterOutput
117
+ ```
118
 
119
  Writes the next section of a long-form report.
120
 
121
  **Parameters**:
122
+ - `query`: Research query string
123
+ - `draft`: Current report draft
124
+ - `section_title`: Title of the section to write
125
+ - `section_content`: Content/guidance for the section
126
 
127
+ **Returns**: `LongWriterOutput` with updated draft.
128
 
129
  #### `write_report`
130
 
131
+ ```python
132
+ async def write_report(
133
+ self,
134
+ query: str,
135
+ report_title: str,
136
+ report_draft: ReportDraft
137
+ ) -> str
138
+ ```
139
 
140
  Generates final report from draft.
141
 
 
156
 
157
  #### `proofread`
158
 
159
+ ```python
160
+ async def proofread(
161
+ self,
162
+ query: str,
163
+ report_title: str,
164
+ report_draft: ReportDraft
165
+ ) -> str
166
+ ```
167
 
168
  Proofreads and polishes a report draft.
169
 
 
184
 
185
  #### `generate_observations`
186
 
187
+ ```python
188
+ async def generate_observations(
189
+ self,
190
+ query: str,
191
+ background_context: str,
192
+ conversation_history: Conversation
193
+ ) -> str
194
+ ```
195
 
196
  Generates observations from conversation history.
197
 
198
  **Parameters**:
199
  - `query`: Research query string
200
+ - `background_context`: Background context
201
+ - `conversation_history`: Conversation history
 
202
 
203
  **Returns**: Observation string.
204
 
 
210
 
211
  ### Methods
212
 
213
+ #### `parse_query`
214
 
215
+ ```python
216
+ async def parse_query(
217
+ self,
218
+ query: str
219
+ ) -> ParsedQuery
220
+ ```
221
 
222
  Parses and improves a user query.
223
 
 
235
 
236
  All agents have factory functions in `src.agent_factory.agents`:
237
 
238
+ ```python
239
+ def create_knowledge_gap_agent(model: Any | None = None) -> KnowledgeGapAgent
240
+ def create_tool_selector_agent(model: Any | None = None) -> ToolSelectorAgent
241
+ def create_writer_agent(model: Any | None = None) -> WriterAgent
242
+ def create_long_writer_agent(model: Any | None = None) -> LongWriterAgent
243
+ def create_proofreader_agent(model: Any | None = None) -> ProofreaderAgent
244
+ def create_thinking_agent(model: Any | None = None) -> ThinkingAgent
245
+ def create_input_parser_agent(model: Any | None = None) -> InputParserAgent
246
+ ```
247
 
248
  **Parameters**:
249
  - `model`: Optional Pydantic AI model. If None, uses `get_model()` from settings.
 
250
 
251
  **Returns**: Agent instance.
252
 
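A hedged end-to-end sketch tying a factory to an agent call, based only on the signatures documented above (query and limits are illustrative):

```python
# Sketch only: values are illustrative; signatures per this page.
import asyncio

from src.agent_factory.agents import create_knowledge_gap_agent
from src.utils.models import Conversation

async def main() -> None:
    agent = create_knowledge_gap_agent()  # model=None -> get_model() from settings
    result = await agent.evaluate(
        query="What drives tau aggregation?",
        background_context="",
        conversation_history=Conversation(),  # empty history on the first iteration
        iteration=1,
        time_elapsed_minutes=0.0,
        max_time_minutes=10.0,
    )
    print(result.research_complete, result.outstanding_gaps)

asyncio.run(main())
```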
 
255
  - [Architecture - Agents](../architecture/agents.md) - Architecture overview
256
  - [Models API](models.md) - Data models used by agents
257
 
258
+
259
+
260
+
261
+
262
+
263
+
264
+
265
+
266
+
docs/api/models.md CHANGED
@@ -8,14 +8,18 @@ This page documents the Pydantic models used throughout DeepCritical.
8
 
9
  **Purpose**: Represents evidence from search results.
10
 
11
- <!--codeinclude-->
12
- [Evidence Model](../src/utils/models.py) start_line:33 end_line:44
13
- <!--/codeinclude-->
 
 
 
 
14
 
15
  **Fields**:
16
  - `citation`: Citation information (title, URL, date, authors)
17
  - `content`: Evidence text content
18
- - `relevance`: Relevance score (0.0-1.0)
19
  - `metadata`: Additional metadata dictionary
20
 
21
  ## Citation
@@ -24,15 +28,18 @@ This page documents the Pydantic models used throughout DeepCritical.
24
 
25
  **Purpose**: Citation information for evidence.
26
 
27
- <!--codeinclude-->
28
- [Citation Model](../src/utils/models.py) start_line:12 end_line:30
29
- <!--/codeinclude-->
 
 
 
 
30
 
31
  **Fields**:
32
- - `source`: Source name (e.g., "pubmed", "clinicaltrials", "europepmc", "web", "rag")
33
  - `title`: Article/trial title
34
  - `url`: Source URL
35
- - `date`: Publication date (YYYY-MM-DD or "Unknown")
36
  - `authors`: List of authors (optional)
37
 
38
  ## KnowledgeGapOutput
@@ -41,9 +48,11 @@ This page documents the Pydantic models used throughout DeepCritical.
41
 
42
  **Purpose**: Output from knowledge gap evaluation.
43
 
44
- <!--codeinclude-->
45
- [KnowledgeGapOutput Model](../src/utils/models.py) start_line:494 end_line:504
46
- <!--/codeinclude-->
 
 
47
 
48
  **Fields**:
49
  - `research_complete`: Boolean indicating if research is complete
@@ -55,9 +64,10 @@ This page documents the Pydantic models used throughout DeepCritical.
55
 
56
  **Purpose**: Plan for tool/agent selection.
57
 
58
- <!--codeinclude-->
59
- [AgentSelectionPlan Model](../src/utils/models.py) start_line:521 end_line:526
60
- <!--/codeinclude-->
 
61
 
62
  **Fields**:
63
  - `tasks`: List of agent tasks to execute
@@ -68,15 +78,17 @@ This page documents the Pydantic models used throughout DeepCritical.
68
 
69
  **Purpose**: Individual agent task.
70
 
71
- <!--codeinclude-->
72
- [AgentTask Model](../src/utils/models.py) start_line:507 end_line:518
73
- <!--/codeinclude-->
 
 
 
74
 
75
  **Fields**:
76
- - `gap`: The knowledge gap being addressed (optional)
77
- - `agent`: Name of agent to use
78
- - `query`: The specific query for the agent
79
- - `entity_website`: The website of the entity being researched, if known (optional)
80
 
81
  ## ReportDraft
82
 
@@ -84,12 +96,17 @@ This page documents the Pydantic models used throughout DeepCritical.
84
 
85
  **Purpose**: Draft structure for long-form reports.
86
 
87
- <!--codeinclude-->
88
- [ReportDraft Model](../src/utils/models.py) start_line:538 end_line:545
89
- <!--/codeinclude-->
 
 
 
90
 
91
  **Fields**:
 
92
  - `sections`: List of report sections
 
93
 
94
  ## ReportSection
95
 
@@ -97,13 +114,17 @@ This page documents the Pydantic models used throughout DeepCritical.
97
 
98
  **Purpose**: Individual section in a report draft.
99
 
100
- <!--codeinclude-->
101
- [ReportDraftSection Model](../src/utils/models.py) start_line:529 end_line:535
102
- <!--/codeinclude-->
 
 
 
103
 
104
  **Fields**:
105
- - `section_title`: The title of the section
106
- - `section_content`: The content of the section
 
107
 
108
  ## ParsedQuery
109
 
@@ -111,9 +132,14 @@ This page documents the Pydantic models used throughout DeepCritical.
111
 
112
  **Purpose**: Parsed and improved query.
113
 
114
- <!--codeinclude-->
115
- [ParsedQuery Model](../src/utils/models.py) start_line:557 end_line:572
116
- <!--/codeinclude-->
 
 
 
 
 
117
 
118
  **Fields**:
119
  - `original_query`: Original query string
@@ -128,12 +154,13 @@ This page documents the Pydantic models used throughout DeepCritical.
128
 
129
  **Purpose**: Conversation history with iterations.
130
 
131
- <!--codeinclude-->
132
- [Conversation Model](../src/utils/models.py) start_line:331 end_line:337
133
- <!--/codeinclude-->
 
134
 
135
  **Fields**:
136
- - `history`: List of iteration data
137
 
138
  ## IterationData
139
 
@@ -141,15 +168,23 @@ This page documents the Pydantic models used throughout DeepCritical.
141
 
142
  **Purpose**: Data for a single iteration.
143
 
144
- <!--codeinclude-->
145
- [IterationData Model](../src/utils/models.py) start_line:315 end_line:328
146
- <!--/codeinclude-->
 
 
 
 
 
 
147
 
148
  **Fields**:
149
- - `gap`: The gap addressed in the iteration
150
- - `tool_calls`: The tool calls made
151
- - `findings`: The findings collected from tool calls
152
- - `thought`: The thinking done to reflect on the success of the iteration and next steps
 
 
153
 
154
  ## AgentEvent
155
 
@@ -157,9 +192,12 @@ This page documents the Pydantic models used throughout DeepCritical.
157
 
158
  **Purpose**: Event emitted during research execution.
159
 
160
- <!--codeinclude-->
161
- [AgentEvent Model](../src/utils/models.py) start_line:104 end_line:125
162
- <!--/codeinclude-->
 
 
 
163
 
164
  **Fields**:
165
  - `type`: Event type (e.g., "started", "search_complete", "complete")
@@ -172,20 +210,35 @@ This page documents the Pydantic models used throughout DeepCritical.
172
 
173
  **Purpose**: Current budget status.
174
 
175
- <!--codeinclude-->
176
- [BudgetStatus Model](../src/middleware/budget_tracker.py) start_line:15 end_line:25
177
- <!--/codeinclude-->
 
 
 
 
 
 
178
 
179
  **Fields**:
180
- - `tokens_used`: Total tokens used
181
- - `tokens_limit`: Token budget limit
182
- - `time_elapsed_seconds`: Time elapsed in seconds
183
- - `time_limit_seconds`: Time budget limit (default: 600.0 seconds / 10 minutes)
184
- - `iterations`: Number of iterations completed
185
- - `iterations_limit`: Maximum iterations (default: 10)
186
- - `iteration_tokens`: Tokens used per iteration (iteration number -> token count)
187
 
188
  ## See Also
189
 
190
  - [Architecture - Agents](../architecture/agents.md) - How models are used
191
  - [Configuration](../configuration/index.md) - Model configuration
 
8
 
9
  **Purpose**: Represents evidence from search results.
10
 
11
+ ```python
12
+ class Evidence(BaseModel):
13
+ citation: Citation
14
+ content: str
15
+ relevance_score: float = Field(ge=0.0, le=1.0)
16
+ metadata: dict[str, Any] = Field(default_factory=dict)
17
+ ```
18
 
19
  **Fields**:
20
  - `citation`: Citation information (title, URL, date, authors)
21
  - `content`: Evidence text content
22
+ - `relevance_score`: Relevance score (0.0-1.0)
23
  - `metadata`: Additional metadata dictionary
24
 
25
  ## Citation
 
28
 
29
  **Purpose**: Citation information for evidence.
30
 
31
+ ```python
32
+ class Citation(BaseModel):
33
+ title: str
34
+ url: str
35
+ date: str | None = None
36
+ authors: list[str] = Field(default_factory=list)
37
+ ```
38
 
39
  **Fields**:
 
40
  - `title`: Article/trial title
41
  - `url`: Source URL
42
+ - `date`: Publication date (optional)
43
  - `authors`: List of authors (optional)
44
 
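A minimal construction sketch for the two models above (all values illustrative):

```python
# Sketch only: values are illustrative; fields per the definitions above.
from src.utils.models import Citation, Evidence

citation = Citation(
    title="Metformin and cognitive decline",
    url="https://pubmed.ncbi.nlm.nih.gov/00000000/",
    date="2023-01-15",
    authors=["Doe J", "Smith A"],
)
evidence = Evidence(
    citation=citation,
    content="The trial reported a modest reduction in cognitive decline...",
    relevance_score=0.82,  # must lie in [0.0, 1.0]
)
```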
45
  ## KnowledgeGapOutput
 
48
 
49
  **Purpose**: Output from knowledge gap evaluation.
50
 
51
+ ```python
52
+ class KnowledgeGapOutput(BaseModel):
53
+ research_complete: bool
54
+ outstanding_gaps: list[str] = Field(default_factory=list)
55
+ ```
56
 
57
  **Fields**:
58
  - `research_complete`: Boolean indicating if research is complete
 
64
 
65
  **Purpose**: Plan for tool/agent selection.
66
 
67
+ ```python
68
+ class AgentSelectionPlan(BaseModel):
69
+ tasks: list[AgentTask] = Field(default_factory=list)
70
+ ```
71
 
72
  **Fields**:
73
  - `tasks`: List of agent tasks to execute
 
78
 
79
  **Purpose**: Individual agent task.
80
 
81
+ ```python
82
+ class AgentTask(BaseModel):
83
+ agent_name: str
84
+ query: str
85
+ context: dict[str, Any] = Field(default_factory=dict)
86
+ ```
87
 
88
  **Fields**:
89
+ - `agent_name`: Name of agent to use
90
+ - `query`: Task query
91
+ - `context`: Additional context dictionary
 
92
 
93
  ## ReportDraft
94
 
 
96
 
97
  **Purpose**: Draft structure for long-form reports.
98
 
99
+ ```python
100
+ class ReportDraft(BaseModel):
101
+ title: str
102
+ sections: list[ReportSection] = Field(default_factory=list)
103
+ references: list[Citation] = Field(default_factory=list)
104
+ ```
105
 
106
  **Fields**:
107
+ - `title`: Report title
108
  - `sections`: List of report sections
109
+ - `references`: List of citations
110
 
111
  ## ReportSection
112
 
 
114
 
115
  **Purpose**: Individual section in a report draft.
116
 
117
+ ```python
118
+ class ReportSection(BaseModel):
119
+ title: str
120
+ content: str
121
+ order: int
122
+ ```
123
 
124
  **Fields**:
125
+ - `title`: Section title
126
+ - `content`: Section content
127
+ - `order`: Section order number
128
 
129
  ## ParsedQuery
130
 
 
132
 
133
  **Purpose**: Parsed and improved query.
134
 
135
+ ```python
136
+ class ParsedQuery(BaseModel):
137
+ original_query: str
138
+ improved_query: str
139
+ research_mode: Literal["iterative", "deep"]
140
+ key_entities: list[str] = Field(default_factory=list)
141
+ research_questions: list[str] = Field(default_factory=list)
142
+ ```
143
 
144
  **Fields**:
145
  - `original_query`: Original query string
 
154
 
155
  **Purpose**: Conversation history with iterations.
156
 
157
+ ```python
158
+ class Conversation(BaseModel):
159
+ iterations: list[IterationData] = Field(default_factory=list)
160
+ ```
161
 
162
  **Fields**:
163
+ - `iterations`: List of iteration data
164
 
165
  ## IterationData
166
 
 
168
 
169
  **Purpose**: Data for a single iteration.
170
 
171
+ ```python
172
+ class IterationData(BaseModel):
173
+ iteration: int
174
+ observations: str | None = None
175
+ knowledge_gaps: list[str] = Field(default_factory=list)
176
+ tool_calls: list[dict[str, Any]] = Field(default_factory=list)
177
+ findings: str | None = None
178
+ thoughts: str | None = None
179
+ ```
180
 
181
  **Fields**:
182
+ - `iteration`: Iteration number
183
+ - `observations`: Generated observations
184
+ - `knowledge_gaps`: Identified knowledge gaps
185
+ - `tool_calls`: Tool calls made
186
+ - `findings`: Findings from tools
187
+ - `thoughts`: Agent thoughts
188
 
189
  ## AgentEvent
190
 
 
192
 
193
  **Purpose**: Event emitted during research execution.
194
 
195
+ ```python
196
+ class AgentEvent(BaseModel):
197
+ type: str
198
+ iteration: int | None = None
199
+ data: dict[str, Any] = Field(default_factory=dict)
200
+ ```
201
 
202
  **Fields**:
203
  - `type`: Event type (e.g., "started", "search_complete", "complete")
 
210
 
211
  **Purpose**: Current budget status.
212
 
213
+ ```python
214
+ class BudgetStatus(BaseModel):
215
+ tokens_used: int
216
+ tokens_limit: int
217
+ time_elapsed_seconds: float
218
+ time_limit_seconds: float
219
+ iterations: int
220
+ iterations_limit: int
221
+ ```
222
 
223
  **Fields**:
224
+ - `tokens_used`: Tokens used so far
225
+ - `tokens_limit`: Token limit
226
+ - `time_elapsed_seconds`: Elapsed time in seconds
227
+ - `time_limit_seconds`: Time limit in seconds
228
+ - `iterations`: Current iteration count
229
+ - `iterations_limit`: Iteration limit
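
A minimal usage sketch (the import path is an assumption):

```python
# Sketch: constructing a BudgetStatus and deriving remaining headroom.
from src.utils.models import BudgetStatus  # assumed path

status = BudgetStatus(
    tokens_used=12_500,
    tokens_limit=100_000,
    time_elapsed_seconds=42.0,
    time_limit_seconds=600.0,
    iterations=3,
    iterations_limit=10,
)
tokens_left = status.tokens_limit - status.tokens_used  # 87_500
```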
 
230
 
231
  ## See Also
232
 
233
  - [Architecture - Agents](../architecture/agents.md) - How models are used
234
  - [Configuration](../configuration/index.md) - Model configuration
235
docs/api/orchestrators.md CHANGED
@@ -12,24 +12,33 @@ This page documents the API for DeepCritical orchestrators.
12
 
13
  #### `run`
14
 
15
- <!--codeinclude-->
16
- [IterativeResearchFlow.run](../src/orchestrator/research_flow.py) start_line:134 end_line:140
17
- <!--/codeinclude-->
 
 
18
 
19
  Runs iterative research flow.
20
 
21
  **Parameters**:
22
  - `query`: Research query string
23
  - `background_context`: Background context (default: "")
24
- - `output_length`: Optional description of desired output length (default: "")
25
- - `output_instructions`: Optional additional instructions for report generation (default: "")
26
- - `message_history`: Optional user conversation history in Pydantic AI `ModelMessage` format (default: None)
27
-
28
- **Returns**: Final report string.
29
-
30
- **Note**: The `message_history` parameter enables multi-turn conversations by providing context from previous interactions.
31
-
32
- **Note**: `max_iterations`, `max_time_minutes`, and `token_budget` are constructor parameters, not `run()` parameters.
 
 
33
 
34
  ## DeepResearchFlow
35
 
@@ -41,21 +50,33 @@ Runs iterative research flow.
41
 
42
  #### `run`
43
 
44
- <!--codeinclude-->
45
- [DeepResearchFlow.run](../src/orchestrator/research_flow.py) start_line:778 end_line:778
46
- <!--/codeinclude-->
 
 
47
 
48
  Runs deep research flow.
49
 
50
  **Parameters**:
51
  - `query`: Research query string
52
- - `message_history`: Optional user conversation history in Pydantic AI `ModelMessage` format (default: None)
53
-
54
- **Returns**: Final report string.
55
-
56
- **Note**: The `message_history` parameter enables multi-turn conversations by providing context from previous interactions.
57
-
58
- **Note**: `max_iterations_per_section`, `max_time_minutes`, and `token_budget` are constructor parameters, not `run()` parameters.
 
 
 
 
 
59
 
60
  ## GraphOrchestrator
61
 
@@ -67,22 +88,24 @@ Runs deep research flow.
67
 
68
  #### `run`
69
 
70
- <!--codeinclude-->
71
- [GraphOrchestrator.run](../src/orchestrator/graph_orchestrator.py) start_line:177 end_line:177
72
- <!--/codeinclude-->
 
 
73
 
74
  Runs graph-based research orchestration.
75
 
76
  **Parameters**:
77
  - `query`: Research query string
78
- - `message_history`: Optional user conversation history in Pydantic AI `ModelMessage` format (default: None)
 
79
 
80
  **Yields**: `AgentEvent` objects during graph execution.
81
 
82
- **Note**:
83
- - `research_mode` and `use_graph` are constructor parameters, not `run()` parameters.
84
- - The `message_history` parameter enables multi-turn conversations by providing context from previous interactions. Message history is stored in `GraphExecutionContext` and passed to agents during execution.
85
-
86
  ## Orchestrator Factory
87
 
88
  **Module**: `src.orchestrator_factory`
@@ -93,18 +116,22 @@ Runs graph-based research orchestration.
93
 
94
  #### `create_orchestrator`
95
 
96
- <!--codeinclude-->
97
- [create_orchestrator](../src/orchestrator_factory.py) start_line:44 end_line:50
98
- <!--/codeinclude-->
 
 
99
 
100
  Creates an orchestrator instance.
101
 
102
  **Parameters**:
103
- - `search_handler`: Search handler protocol implementation (optional, required for simple mode)
104
- - `judge_handler`: Judge handler protocol implementation (optional, required for simple mode)
105
- - `config`: Configuration object (optional)
106
- - `mode`: Orchestrator mode ("simple", "advanced", "magentic", "iterative", "deep", "auto", or None for auto-detect)
107
- - `oauth_token`: Optional OAuth token from HuggingFace login (takes priority over env vars)
108
 
109
  **Returns**: Orchestrator instance.
110
 
@@ -126,19 +153,24 @@ Creates an orchestrator instance.
126
 
127
  #### `run`
128
 
129
- <!--codeinclude-->
130
- [MagenticOrchestrator.run](../src/orchestrator_magentic.py) start_line:101 end_line:101
131
- <!--/codeinclude-->
 
 
132
 
133
  Runs Magentic orchestration.
134
 
135
  **Parameters**:
136
  - `query`: Research query string
 
 
137
 
138
  **Yields**: `AgentEvent` objects converted from Magentic events.
139
 
140
- **Note**: `max_rounds` and `max_stalls` are constructor parameters, not `run()` parameters.
141
-
142
  **Requirements**:
143
  - `agent-framework-core` package
144
  - OpenAI API key
@@ -146,4 +178,14 @@ Runs Magentic orchestration.
146
  ## See Also
147
 
148
  - [Architecture - Orchestrators](../architecture/orchestrators.md) - Architecture overview
149
- - [Graph Orchestration](../architecture/graph_orchestration.md) - Graph execution details
 
 
12
 
13
  #### `run`
14
 
15
+ ```python
16
+ async def run(
17
+ self,
18
+ query: str,
19
+ background_context: str = "",
20
+ max_iterations: int | None = None,
21
+ max_time_minutes: float | None = None,
22
+ token_budget: int | None = None
23
+ ) -> AsyncGenerator[AgentEvent, None]
24
+ ```
25
 
26
  Runs iterative research flow.
27
 
28
  **Parameters**:
29
  - `query`: Research query string
30
  - `background_context`: Background context (default: "")
31
+ - `max_iterations`: Maximum iterations (default: from settings)
32
+ - `max_time_minutes`: Maximum time in minutes (default: from settings)
33
+ - `token_budget`: Token budget (default: from settings)
34
+
35
+ **Yields**: `AgentEvent` objects for:
36
+ - `started`: Research started
37
+ - `search_complete`: Search completed
38
+ - `judge_complete`: Evidence evaluation completed
39
+ - `synthesizing`: Generating report
40
+ - `complete`: Research completed
41
+ - `error`: Error occurred
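
Because `run()` is an async generator, callers consume it with `async for`. A minimal sketch (the import path and constructor wiring are assumptions):

```python
import asyncio

from src.orchestrator.research_flow import IterativeResearchFlow  # assumed path

async def main() -> None:
    flow = IterativeResearchFlow()  # handler/model wiring omitted in this sketch
    async for event in flow.run("What is known about drug X?", max_iterations=5):
        print(event.type, event.iteration)

asyncio.run(main())
```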
42
 
43
  ## DeepResearchFlow
44
 
 
50
 
51
  #### `run`
52
 
53
+ ```python
54
+ async def run(
55
+ self,
56
+ query: str,
57
+ background_context: str = "",
58
+ max_iterations_per_section: int | None = None,
59
+ max_time_minutes: float | None = None,
60
+ token_budget: int | None = None
61
+ ) -> AsyncGenerator[AgentEvent, None]
62
+ ```
63
 
64
  Runs deep research flow.
65
 
66
  **Parameters**:
67
  - `query`: Research query string
68
+ - `background_context`: Background context (default: "")
69
+ - `max_iterations_per_section`: Maximum iterations per section (default: from settings)
70
+ - `max_time_minutes`: Maximum time in minutes (default: from settings)
71
+ - `token_budget`: Token budget (default: from settings)
72
+
73
+ **Yields**: `AgentEvent` objects for:
74
+ - `started`: Research started
75
+ - `planning`: Creating research plan
76
+ - `looping`: Running parallel research loops
77
+ - `synthesizing`: Synthesizing results
78
+ - `complete`: Research completed
79
+ - `error`: Error occurred
80
 
81
  ## GraphOrchestrator
82
 
 
88
 
89
  #### `run`
90
 
91
+ ```python
92
+ async def run(
93
+ self,
94
+ query: str,
95
+ research_mode: str = "auto",
96
+ use_graph: bool = True
97
+ ) -> AsyncGenerator[AgentEvent, None]
98
+ ```
99
 
100
  Runs graph-based research orchestration.
101
 
102
  **Parameters**:
103
  - `query`: Research query string
104
+ - `research_mode`: Research mode ("iterative", "deep", or "auto")
105
+ - `use_graph`: Whether to use graph execution (default: True)
106
 
107
  **Yields**: `AgentEvent` objects during graph execution.
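
A hedged sketch of driving the orchestrator in a fixed mode (the orchestrator instance is assumed to be constructed elsewhere):

```python
# Sketch: force deep mode with graph execution and stop on the first error.
async def run_deep(orchestrator, query: str) -> None:
    async for event in orchestrator.run(query, research_mode="deep", use_graph=True):
        if event.type == "error":
            print("failed:", event.data)
            break
        print(event.type)
```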
108
 
 
 
 
 
109
  ## Orchestrator Factory
110
 
111
  **Module**: `src.orchestrator_factory`
 
116
 
117
  #### `create_orchestrator`
118
 
119
+ ```python
120
+ def create_orchestrator(
121
+ search_handler: SearchHandlerProtocol,
122
+ judge_handler: JudgeHandlerProtocol,
123
+ config: dict[str, Any],
124
+ mode: str | None = None
125
+ ) -> Any
126
+ ```
127
 
128
  Creates an orchestrator instance.
129
 
130
  **Parameters**:
131
+ - `search_handler`: Search handler protocol implementation
132
+ - `judge_handler`: Judge handler protocol implementation
133
+ - `config`: Configuration dictionary
134
+ - `mode`: Orchestrator mode ("simple", "advanced", "magentic", or None for auto-detect)
 
135
 
136
  **Returns**: Orchestrator instance.
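
A usage sketch, where `my_search_handler` and `my_judge_handler` stand in for hypothetical protocol implementations:

```python
from src.orchestrator_factory import create_orchestrator

orchestrator = create_orchestrator(
    search_handler=my_search_handler,  # hypothetical SearchHandlerProtocol impl
    judge_handler=my_judge_handler,    # hypothetical JudgeHandlerProtocol impl
    config={"max_iterations": 10},
    mode=None,  # None auto-detects a mode from the environment
)
```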
137
 
 
153
 
154
  #### `run`
155
 
156
+ ```python
157
+ async def run(
158
+ self,
159
+ query: str,
160
+ max_rounds: int = 15,
161
+ max_stalls: int = 3
162
+ ) -> AsyncGenerator[AgentEvent, None]
163
+ ```
164
 
165
  Runs Magentic orchestration.
166
 
167
  **Parameters**:
168
  - `query`: Research query string
169
+ - `max_rounds`: Maximum rounds (default: 15)
170
+ - `max_stalls`: Maximum stalls before reset (default: 3)
171
 
172
  **Yields**: `AgentEvent` objects converted from Magentic events.
173
 
 
 
174
  **Requirements**:
175
  - `agent-framework-core` package
176
  - OpenAI API key
 
178
  ## See Also
179
 
180
  - [Architecture - Orchestrators](../architecture/orchestrators.md) - Architecture overview
181
+ - [Graph Orchestration](../architecture/graph-orchestration.md) - Graph execution details
182
docs/api/services.md CHANGED
@@ -12,9 +12,9 @@ This page documents the API for DeepCritical services.
12
 
13
  #### `embed`
14
 
15
- <!--codeinclude-->
16
- [EmbeddingService.embed](../src/services/embeddings.py) start_line:55 end_line:55
17
- <!--/codeinclude-->
18
 
19
  Generates embedding for a text string.
20
 
@@ -68,60 +68,6 @@ Finds duplicate texts based on similarity threshold.
68
 
69
  **Returns**: List of (index1, index2) tuples for duplicate pairs.
70
 
71
- #### `add_evidence`
72
-
73
- ```python
74
- async def add_evidence(
75
- self,
76
- evidence_id: str,
77
- content: str,
78
- metadata: dict[str, Any]
79
- ) -> None
80
- ```
81
-
82
- Adds evidence to vector store for semantic search.
83
-
84
- **Parameters**:
85
- - `evidence_id`: Unique identifier for the evidence
86
- - `content`: Evidence text content
87
- - `metadata`: Additional metadata dictionary
88
-
89
- #### `search_similar`
90
-
91
- ```python
92
- async def search_similar(
93
- self,
94
- query: str,
95
- n_results: int = 5
96
- ) -> list[dict[str, Any]]
97
- ```
98
-
99
- Finds semantically similar evidence.
100
-
101
- **Parameters**:
102
- - `query`: Search query string
103
- - `n_results`: Number of results to return (default: 5)
104
-
105
- **Returns**: List of dictionaries with `id`, `content`, `metadata`, and `distance` keys.
106
-
107
- #### `deduplicate`
108
-
109
- ```python
110
- async def deduplicate(
111
- self,
112
- new_evidence: list[Evidence],
113
- threshold: float = 0.9
114
- ) -> list[Evidence]
115
- ```
116
-
117
- Removes semantically duplicate evidence.
118
-
119
- **Parameters**:
120
- - `new_evidence`: List of evidence items to deduplicate
121
- - `threshold`: Similarity threshold (default: 0.9, where 0.9 = 90% similar is duplicate)
122
-
123
- **Returns**: List of unique evidence items (not already in vector store).
124
-
125
  ### Factory Function
126
 
127
  #### `get_embedding_service`
@@ -143,97 +89,63 @@ Returns singleton EmbeddingService instance.
143
 
144
  #### `ingest_evidence`
145
 
146
- <!--codeinclude-->
147
- [LlamaIndexRAGService.ingest_evidence](../src/services/llamaindex_rag.py) start_line:290 end_line:290
148
- <!--/codeinclude-->
149
 
150
  Ingests evidence into RAG service.
151
 
152
  **Parameters**:
153
- - `evidence_list`: List of Evidence objects to ingest
154
 
155
- **Note**: Supports multiple embedding providers (OpenAI, local sentence-transformers, Hugging Face).
156
 
157
  #### `retrieve`
158
 
159
  ```python
160
- def retrieve(
161
  self,
162
  query: str,
163
- top_k: int | None = None
164
- ) -> list[dict[str, Any]]
165
  ```
166
 
167
  Retrieves relevant documents for a query.
168
 
169
  **Parameters**:
170
  - `query`: Search query string
171
- - `top_k`: Number of top results to return (defaults to `similarity_top_k` from constructor)
172
 
173
- **Returns**: List of dictionaries with `text`, `score`, and `metadata` keys.
174
 
175
  #### `query`
176
 
177
  ```python
178
- def query(
179
  self,
180
- query_str: str,
181
- top_k: int | None = None
182
  ) -> str
183
  ```
184
 
185
- Queries RAG service and returns synthesized response.
186
-
187
- **Parameters**:
188
- - `query_str`: Query string
189
- - `top_k`: Number of results to use (defaults to `similarity_top_k` from constructor)
190
-
191
- **Returns**: Synthesized response string.
192
-
193
- **Raises**:
194
- - `ConfigurationError`: If no LLM API key is available for query synthesis
195
-
196
- #### `ingest_documents`
197
-
198
- ```python
199
- def ingest_documents(self, documents: list[Any]) -> None
200
- ```
201
-
202
- Ingests raw LlamaIndex Documents.
203
 
204
  **Parameters**:
205
- - `documents`: List of LlamaIndex Document objects
206
-
207
- #### `clear_collection`
208
-
209
- ```python
210
- def clear_collection(self) -> None
211
- ```
212
 
213
- Clears all documents from the collection.
214
 
215
  ### Factory Function
216
 
217
  #### `get_rag_service`
218
 
219
  ```python
220
- def get_rag_service(
221
- collection_name: str = "deepcritical_evidence",
222
- oauth_token: str | None = None,
223
- **kwargs: Any
224
- ) -> LlamaIndexRAGService
225
  ```
226
 
227
- Get or create a RAG service instance.
228
-
229
- **Parameters**:
230
- - `collection_name`: Name of the ChromaDB collection (default: "deepcritical_evidence")
231
- - `oauth_token`: Optional OAuth token from HuggingFace login (takes priority over env vars)
232
- - `**kwargs`: Additional arguments for LlamaIndexRAGService (e.g., `use_openai_embeddings=False`)
233
-
234
- **Returns**: Configured LlamaIndexRAGService instance.
235
-
236
- **Note**: By default, uses local embeddings (sentence-transformers) which require no API keys.
237
 
238
  ## StatisticalAnalyzer
239
 
@@ -248,27 +160,24 @@ Get or create a RAG service instance.
248
  ```python
249
  async def analyze(
250
  self,
251
- query: str,
252
  evidence: list[Evidence],
253
- hypothesis: dict[str, Any] | None = None
254
  ) -> AnalysisResult
255
  ```
256
 
257
- Analyzes a research question using statistical methods.
258
 
259
  **Parameters**:
260
- - `query`: The research question
261
- - `evidence`: List of Evidence objects to analyze
262
- - `hypothesis`: Optional hypothesis dict with `drug`, `target`, `pathway`, `effect`, `confidence` keys
263
 
264
  **Returns**: `AnalysisResult` with:
265
  - `verdict`: SUPPORTED, REFUTED, or INCONCLUSIVE
266
- - `confidence`: Confidence in verdict (0.0-1.0)
267
- - `statistical_evidence`: Summary of statistical findings
268
- - `code_generated`: Python code that was executed
269
- - `execution_output`: Output from code execution
270
- - `key_takeaways`: Key takeaways from analysis
271
- - `limitations`: List of limitations
272
 
273
  **Note**: Requires Modal credentials for sandbox execution.
274
 
@@ -277,3 +186,12 @@ Analyzes a research question using statistical methods.
277
  - [Architecture - Services](../architecture/services.md) - Architecture overview
278
  - [Configuration](../configuration/index.md) - Service configuration
279
 
 
 
12
 
13
  #### `embed`
14
 
15
+ ```python
16
+ async def embed(self, text: str) -> list[float]
17
+ ```
18
 
19
  Generates embedding for a text string.
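
A small sketch comparing two embeddings by cosine similarity (`service` is assumed to come from `get_embedding_service()`, documented below):

```python
import math

async def similarity(service, a: str, b: str) -> float:
    # Embed both strings, then compute the cosine of the angle between them.
    va, vb = await service.embed(a), await service.embed(b)
    dot = sum(x * y for x, y in zip(va, vb))
    norm = math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(x * x for x in vb))
    return dot / norm
```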
20
 
 
68
 
69
  **Returns**: List of (index1, index2) tuples for duplicate pairs.
70
 
 
 
71
  ### Factory Function
72
 
73
  #### `get_embedding_service`
 
89
 
90
  #### `ingest_evidence`
91
 
92
+ ```python
93
+ async def ingest_evidence(self, evidence: list[Evidence]) -> None
94
+ ```
95
 
96
  Ingests evidence into RAG service.
97
 
98
  **Parameters**:
99
+ - `evidence`: List of Evidence objects to ingest
100
 
101
+ **Note**: Requires OpenAI API key for embeddings.
102
 
103
  #### `retrieve`
104
 
105
  ```python
106
+ async def retrieve(
107
  self,
108
  query: str,
109
+ top_k: int = 5
110
+ ) -> list[Document]
111
  ```
112
 
113
  Retrieves relevant documents for a query.
114
 
115
  **Parameters**:
116
  - `query`: Search query string
117
+ - `top_k`: Number of top results to return (default: 5)
118
 
119
+ **Returns**: List of Document objects with metadata.
120
 
121
  #### `query`
122
 
123
  ```python
124
+ async def query(
125
  self,
126
+ query: str,
127
+ top_k: int = 5
128
  ) -> str
129
  ```
130
 
131
+ Queries RAG service and returns formatted results.
 
 
132
 
133
  **Parameters**:
134
+ - `query`: Search query string
135
+ - `top_k`: Number of top results to return (default: 5)
 
 
136
 
137
+ **Returns**: Formatted query results as string.
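
A combined sketch of the three methods above, run inside an async context (`evidence` is an assumed `list[Evidence]` collected earlier):

```python
service = get_rag_service()  # factory documented below; may return None
if service is not None:
    await service.ingest_evidence(evidence)                    # list[Evidence]
    docs = await service.retrieve("dosing evidence", top_k=3)  # ranked documents
    answer = await service.query("What dosing evidence exists?")
```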
138
 
139
  ### Factory Function
140
 
141
  #### `get_rag_service`
142
 
143
  ```python
144
+ @lru_cache(maxsize=1)
145
+ def get_rag_service() -> LlamaIndexRAGService | None
 
 
 
146
  ```
147
 
148
+ Returns singleton LlamaIndexRAGService instance, or None if OpenAI key not available.
 
149
 
150
  ## StatisticalAnalyzer
151
 
 
160
  ```python
161
  async def analyze(
162
  self,
163
+ hypothesis: str,
164
  evidence: list[Evidence],
165
+ data_description: str | None = None
166
  ) -> AnalysisResult
167
  ```
168
 
169
+ Analyzes a hypothesis using statistical methods.
170
 
171
  **Parameters**:
172
+ - `hypothesis`: Hypothesis to analyze
173
+ - `evidence`: List of Evidence objects
174
+ - `data_description`: Optional data description
175
 
176
  **Returns**: `AnalysisResult` with:
177
  - `verdict`: SUPPORTED, REFUTED, or INCONCLUSIVE
178
+ - `code`: Generated analysis code
179
+ - `output`: Execution output
180
+ - `error`: Error message if execution failed
181
 
182
  **Note**: Requires Modal credentials for sandbox execution.
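
A usage sketch (`analyzer` is an assumed `StatisticalAnalyzer` instance and `evidence` an assumed list, both constructed elsewhere):

```python
# Sketch: checking a hypothesis against collected evidence.
result = await analyzer.analyze(
    hypothesis="Drug X lowers systolic blood pressure",
    evidence=evidence,
)
if result.verdict == "SUPPORTED":
    print(result.output)  # output from the sandboxed analysis run
```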
183
 
 
186
  - [Architecture - Services](../architecture/services.md) - Architecture overview
187
  - [Configuration](../configuration/index.md) - Service configuration
188
 
189
docs/api/tools.md CHANGED
@@ -56,10 +56,8 @@ Searches PubMed for articles.
56
  **Returns**: List of `Evidence` objects with PubMed articles.
57
 
58
  **Raises**:
59
- - `SearchError`: If search fails (timeout, HTTP error, XML parsing error)
60
- - `RateLimitError`: If rate limit is exceeded (429 status code)
61
-
62
- **Note**: Uses NCBI E-utilities (ESearch → EFetch). Rate limit: 0.34s between requests. Handles single vs. multiple articles.
63
 
64
  ## ClinicalTrialsTool
65
 
@@ -98,10 +96,10 @@ Searches ClinicalTrials.gov for trials.
98
 
99
  **Returns**: List of `Evidence` objects with clinical trials.
100
 
101
- **Note**: Only returns interventional studies with status: COMPLETED, ACTIVE_NOT_RECRUITING, RECRUITING, ENROLLING_BY_INVITATION. Uses `requests` library (NOT httpx - WAF blocks httpx). Runs in thread pool for async compatibility.
102
 
103
  **Raises**:
104
- - `SearchError`: If search fails (HTTP error, request exception)
105
 
106
  ## EuropePMCTool
107
 
@@ -140,10 +138,10 @@ Searches Europe PMC for articles and preprints.
140
 
141
  **Returns**: List of `Evidence` objects with articles/preprints.
142
 
143
- **Note**: Includes both preprints (marked with `[PREPRINT - Not peer-reviewed]`) and peer-reviewed articles. Handles preprint markers. Builds URLs from DOI or PMID.
144
 
145
  **Raises**:
146
- - `SearchError`: If search fails (HTTP error, connection error)
147
 
148
  ## RAGTool
149
 
@@ -151,20 +149,6 @@ Searches Europe PMC for articles and preprints.
151
 
152
  **Purpose**: Semantic search within collected evidence.
153
 
154
- ### Initialization
155
-
156
- ```python
157
- def __init__(
158
- self,
159
- rag_service: LlamaIndexRAGService | None = None,
160
- oauth_token: str | None = None
161
- ) -> None
162
- ```
163
-
164
- **Parameters**:
165
- - `rag_service`: Optional RAG service instance. If None, will be lazy-initialized.
166
- - `oauth_token`: Optional OAuth token from HuggingFace login (for RAG LLM)
167
-
168
  ### Properties
169
 
170
  #### `name`
@@ -196,10 +180,7 @@ Searches collected evidence using semantic similarity.
196
 
197
  **Returns**: List of `Evidence` objects from collected evidence.
198
 
199
- **Raises**:
200
- - `ConfigurationError`: If RAG service is unavailable
201
-
202
- **Note**: Requires evidence to be ingested into RAG service first. Wraps `LlamaIndexRAGService`. Returns Evidence from RAG results.
203
 
204
  ## SearchHandler
205
 
@@ -207,53 +188,44 @@ Searches collected evidence using semantic similarity.
207
 
208
  **Purpose**: Orchestrates parallel searches across multiple tools.
209
 
210
- ### Initialization
 
 
211
 
212
  ```python
213
- def __init__(
214
  self,
215
- tools: list[SearchTool],
216
- timeout: float = 30.0,
217
- include_rag: bool = False,
218
- auto_ingest_to_rag: bool = True,
219
- oauth_token: str | None = None
220
- ) -> None
221
  ```
222
 
223
- **Parameters**:
224
- - `tools`: List of search tools to use
225
- - `timeout`: Timeout for each search in seconds (default: 30.0)
226
- - `include_rag`: Whether to include RAG tool in searches (default: False)
227
- - `auto_ingest_to_rag`: Whether to automatically ingest results into RAG (default: True)
228
- - `oauth_token`: Optional OAuth token from HuggingFace login (for RAG LLM)
229
-
230
- ### Methods
231
-
232
- #### `execute`
233
-
234
- <!--codeinclude-->
235
- [SearchHandler.execute](../src/tools/search_handler.py) start_line:86 end_line:86
236
- <!--/codeinclude-->
237
-
238
  Searches multiple tools in parallel.
239
 
240
  **Parameters**:
241
  - `query`: Search query string
 
242
  - `max_results_per_tool`: Maximum results per tool (default: 10)
243
 
244
  **Returns**: `SearchResult` with:
245
- - `query`: The search query
246
  - `evidence`: Aggregated list of evidence
247
- - `sources_searched`: List of source names searched
248
- - `total_found`: Total number of results
249
- - `errors`: List of error messages from failed tools
250
 
251
- **Raises**:
252
- - `SearchError`: If search times out
253
-
254
- **Note**: Uses `asyncio.gather()` for parallel execution. Handles tool failures gracefully (returns errors in `SearchResult.errors`). Automatically ingests evidence into RAG if enabled.
255
 
256
  ## See Also
257
 
258
  - [Architecture - Tools](../architecture/tools.md) - Architecture overview
259
  - [Models API](models.md) - Data models used by tools
 
 
56
  **Returns**: List of `Evidence` objects with PubMed articles.
57
 
58
  **Raises**:
59
+ - `SearchError`: If search fails
60
+ - `RateLimitError`: If rate limit is exceeded
 
 
61
 
62
  ## ClinicalTrialsTool
63
 
 
96
 
97
  **Returns**: List of `Evidence` objects with clinical trials.
98
 
99
+ **Note**: Only returns interventional studies with status COMPLETED, ACTIVE_NOT_RECRUITING, RECRUITING, or ENROLLING_BY_INVITATION.
100
 
101
  **Raises**:
102
+ - `SearchError`: If search fails
103
 
104
  ## EuropePMCTool
105
 
 
138
 
139
  **Returns**: List of `Evidence` objects with articles/preprints.
140
 
141
+ **Note**: Includes both preprints (marked with `[PREPRINT - Not peer-reviewed]`) and peer-reviewed articles.
142
 
143
  **Raises**:
144
+ - `SearchError`: If search fails
145
 
146
  ## RAGTool
147
 
 
149
 
150
  **Purpose**: Semantic search within collected evidence.
151
 
 
 
152
  ### Properties
153
 
154
  #### `name`
 
180
 
181
  **Returns**: List of `Evidence` objects from collected evidence.
182
 
183
+ **Note**: Requires evidence to be ingested into RAG service first.
184
 
185
  ## SearchHandler
186
 
 
188
 
189
  **Purpose**: Orchestrates parallel searches across multiple tools.
190
 
191
+ ### Methods
192
+
193
+ #### `search`
194
 
195
  ```python
196
+ async def search(
197
  self,
198
+ query: str,
199
+ tools: list[SearchTool] | None = None,
200
+ max_results_per_tool: int = 10
201
+ ) -> SearchResult
 
 
202
  ```
203
 
 
 
204
  Searches multiple tools in parallel.
205
 
206
  **Parameters**:
207
  - `query`: Search query string
208
+ - `tools`: List of tools to use (default: all available tools)
209
  - `max_results_per_tool`: Maximum results per tool (default: 10)
210
 
211
  **Returns**: `SearchResult` with:
 
212
  - `evidence`: Aggregated list of evidence
213
+ - `tool_results`: Results per tool
214
+ - `total_count`: Total number of results
 
215
 
216
+ **Note**: Uses `asyncio.gather()` for parallel execution. Handles tool failures gracefully.
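
A usage sketch (`handler` is an assumed `SearchHandler` instance; `Evidence` is assumed to expose a `url` field, as used for deduplication elsewhere in the docs):

```python
# Sketch: one parallel query across every available tool.
result = await handler.search("semaglutide cardiovascular outcomes")
print(result.total_count, "results from", len(result.tool_results), "tools")
for item in result.evidence:
    print(item.url)  # assumed Evidence field
```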
217
 
218
  ## See Also
219
 
220
  - [Architecture - Tools](../architecture/tools.md) - Architecture overview
221
  - [Models API](models.md) - Data models used by tools
222
docs/architecture/agents.md CHANGED
@@ -4,16 +4,12 @@ DeepCritical uses Pydantic AI agents for all AI-powered operations. All agents f
4
 
5
  ## Agent Pattern
6
 
7
- ### Pydantic AI Agents
8
-
9
- Pydantic AI agents use the `Agent` class with the following structure:
10
 
11
  - **System Prompt**: Module-level constant with date injection
12
  - **Agent Class**: `__init__(model: Any | None = None)`
13
  - **Main Method**: Async method (e.g., `async def evaluate()`, `async def write_report()`)
14
- - **Factory Function**: `def create_agent_name(model: Any | None = None, oauth_token: str | None = None) -> AgentName`
15
-
16
- **Note**: Factory functions accept an optional `oauth_token` parameter for HuggingFace authentication, which takes priority over environment variables.
17
 
18
  ## Model Initialization
19
 
@@ -159,130 +155,19 @@ For text output (writer agents), agents return `str` directly.
159
  - `key_entities`: List of key entities
160
  - `research_questions`: List of research questions
161
 
162
- ## Magentic Agents
163
-
164
- The following agents use the `BaseAgent` pattern from `agent-framework` and are used exclusively with `MagenticOrchestrator`:
165
-
166
- ### Hypothesis Agent
167
-
168
- **File**: `src/agents/hypothesis_agent.py`
169
-
170
- **Purpose**: Generates mechanistic hypotheses based on evidence.
171
-
172
- **Pattern**: `BaseAgent` from `agent-framework`
173
-
174
- **Methods**:
175
- - `async def run(messages, thread, **kwargs) -> AgentRunResponse`
176
-
177
- **Features**:
178
- - Uses internal Pydantic AI `Agent` with `HypothesisAssessment` output type
179
- - Accesses shared `evidence_store` for evidence
180
- - Uses embedding service for diverse evidence selection (MMR algorithm)
181
- - Stores hypotheses in shared context
182
-
183
- ### Search Agent
184
-
185
- **File**: `src/agents/search_agent.py`
186
-
187
- **Purpose**: Wraps `SearchHandler` as an agent for Magentic orchestrator.
188
-
189
- **Pattern**: `BaseAgent` from `agent-framework`
190
-
191
- **Methods**:
192
- - `async def run(messages, thread, **kwargs) -> AgentRunResponse`
193
-
194
- **Features**:
195
- - Executes searches via `SearchHandlerProtocol`
196
- - Deduplicates evidence using embedding service
197
- - Searches for semantically related evidence
198
- - Updates shared evidence store
199
-
200
- ### Analysis Agent
201
-
202
- **File**: `src/agents/analysis_agent.py`
203
-
204
- **Purpose**: Performs statistical analysis using Modal sandbox.
205
-
206
- **Pattern**: `BaseAgent` from `agent-framework`
207
-
208
- **Methods**:
209
- - `async def run(messages, thread, **kwargs) -> AgentRunResponse`
210
-
211
- **Features**:
212
- - Wraps `StatisticalAnalyzer` service
213
- - Analyzes evidence and hypotheses
214
- - Returns verdict (SUPPORTED/REFUTED/INCONCLUSIVE)
215
- - Stores analysis results in shared context
216
-
217
- ### Report Agent (Magentic)
218
-
219
- **File**: `src/agents/report_agent.py`
220
-
221
- **Purpose**: Generates structured scientific reports from evidence and hypotheses.
222
-
223
- **Pattern**: `BaseAgent` from `agent-framework`
224
-
225
- **Methods**:
226
- - `async def run(messages, thread, **kwargs) -> AgentRunResponse`
227
-
228
- **Features**:
229
- - Uses internal Pydantic AI `Agent` with `ResearchReport` output type
230
- - Accesses shared evidence store and hypotheses
231
- - Validates citations before returning
232
- - Formats report as markdown
233
-
234
- ### Judge Agent
235
-
236
- **File**: `src/agents/judge_agent.py`
237
-
238
- **Purpose**: Evaluates evidence quality and determines if sufficient for synthesis.
239
-
240
- **Pattern**: `BaseAgent` from `agent-framework`
241
-
242
- **Methods**:
243
- - `async def run(messages, thread, **kwargs) -> AgentRunResponse`
244
- - `async def run_stream(messages, thread, **kwargs) -> AsyncIterable[AgentRunResponseUpdate]`
245
-
246
- **Features**:
247
- - Wraps `JudgeHandlerProtocol`
248
- - Accesses shared evidence store
249
- - Returns `JudgeAssessment` with sufficient flag, confidence, and recommendation
250
-
251
- ## Agent Patterns
252
-
253
- DeepCritical uses two distinct agent patterns:
254
-
255
- ### 1. Pydantic AI Agents (Traditional Pattern)
256
-
257
- These agents use the Pydantic AI `Agent` class directly and are used in iterative and deep research flows:
258
-
259
- - **Pattern**: `Agent(model, output_type, system_prompt)`
260
- - **Initialization**: `__init__(model: Any | None = None)`
261
- - **Methods**: Agent-specific async methods (e.g., `async def evaluate()`, `async def write_report()`)
262
- - **Examples**: `KnowledgeGapAgent`, `ToolSelectorAgent`, `WriterAgent`, `LongWriterAgent`, `ProofreaderAgent`, `ThinkingAgent`, `InputParserAgent`
263
-
264
- ### 2. Magentic Agents (Agent-Framework Pattern)
265
-
266
- These agents use the `BaseAgent` class from `agent-framework` and are used in Magentic orchestrator:
267
-
268
- - **Pattern**: `BaseAgent` from `agent-framework` with `async def run()` method
269
- - **Initialization**: `__init__(evidence_store, embedding_service, ...)`
270
- - **Methods**: `async def run(messages, thread, **kwargs) -> AgentRunResponse`
271
- - **Examples**: `HypothesisAgent`, `SearchAgent`, `AnalysisAgent`, `ReportAgent`, `JudgeAgent`
272
-
273
- **Note**: Magentic agents are used exclusively with the `MagenticOrchestrator` and follow the agent-framework protocol for multi-agent coordination.
274
-
275
  ## Factory Functions
276
 
277
  All agents have factory functions in `src/agent_factory/agents.py`:
278
 
279
- <!--codeinclude-->
280
- [Factory Functions](../src/agent_factory/agents.py) start_line:79 end_line:100
281
- <!--/codeinclude-->
 
 
 
282
 
283
  Factory functions:
284
  - Use `get_model()` if no model provided
285
- - Accept `oauth_token` parameter for HuggingFace authentication
286
  - Raise `ConfigurationError` if creation fails
287
  - Log agent creation
288
 
@@ -291,3 +176,13 @@ Factory functions:
291
  - [Orchestrators](orchestrators.md) - How agents are orchestrated
292
  - [API Reference - Agents](../api/agents.md) - API documentation
293
  - [Contributing - Code Style](../contributing/code-style.md) - Development guidelines
4
 
5
  ## Agent Pattern
6
 
7
+ All agents use the Pydantic AI `Agent` class with the following structure:
 
 
8
 
9
  - **System Prompt**: Module-level constant with date injection
10
  - **Agent Class**: `__init__(model: Any | None = None)`
11
  - **Main Method**: Async method (e.g., `async def evaluate()`, `async def write_report()`)
12
+ - **Factory Function**: `def create_agent_name(model: Any | None = None) -> AgentName`
 
 
13
 
14
  ## Model Initialization
15
 
 
155
  - `key_entities`: List of key entities
156
  - `research_questions`: List of research questions
157
 
 
 
158
  ## Factory Functions
159
 
160
  All agents have factory functions in `src/agent_factory/agents.py`:
161
 
162
+ ```python
163
+ def create_knowledge_gap_agent(model: Any | None = None) -> KnowledgeGapAgent
164
+ def create_tool_selector_agent(model: Any | None = None) -> ToolSelectorAgent
165
+ def create_writer_agent(model: Any | None = None) -> WriterAgent
166
+ # ... etc
167
+ ```
168
 
169
  Factory functions:
170
  - Use `get_model()` if no model provided
 
171
  - Raise `ConfigurationError` if creation fails
172
  - Log agent creation
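
An illustrative factory body that follows the three rules above; the import paths are assumptions, not the project's actual layout:

```python
from typing import Any

import structlog

from src.agent_factory.models import get_model                # assumed path
from src.agents.knowledge_gap_agent import KnowledgeGapAgent  # assumed path
from src.utils.exceptions import ConfigurationError

logger = structlog.get_logger()

def create_knowledge_gap_agent(model: Any | None = None) -> KnowledgeGapAgent:
    try:
        # Fall back to the configured default model when none is provided.
        agent = KnowledgeGapAgent(model=model or get_model())
    except Exception as e:
        raise ConfigurationError("Failed to create KnowledgeGapAgent") from e
    logger.info("agent_created", agent="KnowledgeGapAgent")
    return agent
```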
173
 
 
176
  - [Orchestrators](orchestrators.md) - How agents are orchestrated
177
  - [API Reference - Agents](../api/agents.md) - API documentation
178
  - [Contributing - Code Style](../contributing/code-style.md) - Development guidelines
179
+
180
+
181
+
182
+
183
+
184
+
185
+
186
+
187
+
188
+
docs/architecture/graph-orchestration.md ADDED
@@ -0,0 +1,152 @@
 
 
1
+ # Graph Orchestration Architecture
2
+
3
+ ## Overview
4
+
5
+ Phase 4 implements a graph-based orchestration system for research workflows using Pydantic AI agents as nodes. This enables better parallel execution, conditional routing, and state management compared to simple agent chains.
6
+
7
+ ## Graph Structure
8
+
9
+ ### Nodes
10
+
11
+ Graph nodes represent different stages in the research workflow:
12
+
13
+ 1. **Agent Nodes**: Execute Pydantic AI agents
14
+ - Input: Prompt/query
15
+ - Output: Structured or unstructured response
16
+ - Examples: `KnowledgeGapAgent`, `ToolSelectorAgent`, `ThinkingAgent`
17
+
18
+ 2. **State Nodes**: Update or read workflow state
19
+ - Input: Current state
20
+ - Output: Updated state
21
+ - Examples: Update evidence, update conversation history
22
+
23
+ 3. **Decision Nodes**: Make routing decisions based on conditions
24
+ - Input: Current state/results
25
+ - Output: Next node ID
26
+ - Examples: Continue research vs. complete research
27
+
28
+ 4. **Parallel Nodes**: Execute multiple nodes concurrently
29
+ - Input: List of node IDs
30
+ - Output: Aggregated results
31
+ - Examples: Parallel iterative research loops
32
+
33
+ ### Edges
34
+
35
+ Edges define transitions between nodes:
36
+
37
+ 1. **Sequential Edges**: Always traversed (no condition)
38
+ - From: Source node
39
+ - To: Target node
40
+ - Condition: None (always True)
41
+
42
+ 2. **Conditional Edges**: Traversed based on condition
43
+ - From: Source node
44
+ - To: Target node
45
+ - Condition: Callable that returns bool
46
+ - Example: If research complete → go to writer, else → continue loop
47
+
48
+ 3. **Parallel Edges**: Used for parallel execution branches
49
+ - From: Parallel node
50
+ - To: Multiple target nodes
51
+ - Execution: All targets run concurrently
52
+
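
The edge kinds above can be sketched as follows (class and field names are illustrative assumptions, not the project's actual API):

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Edge:
    source: str
    target: str
    condition: Callable[[dict[str, Any]], bool] | None = None  # None = sequential

    def should_traverse(self, state: dict[str, Any]) -> bool:
        return True if self.condition is None else self.condition(state)

# Conditional routing: research complete -> writer, otherwise keep looping.
to_writer = Edge("knowledge_gap", "writer", lambda s: s["research_complete"])
to_tools = Edge("knowledge_gap", "tool_selector", lambda s: not s["research_complete"])
```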
53
+ ## Graph Patterns
54
+
55
+ ### Iterative Research Graph
56
+
57
+ ```
58
+ [Input] → [Thinking] → [Knowledge Gap] → [Decision: Complete?]
59
+ ↓ No ↓ Yes
60
+ [Tool Selector] [Writer]
61
+
62
+ [Execute Tools] → [Loop Back]
63
+ ```
64
+
65
+ ### Deep Research Graph
66
+
67
+ ```
68
+ [Input] → [Planner] → [Parallel Iterative Loops] → [Synthesizer]
69
+ ↓ ↓ ↓
70
+ [Loop1] [Loop2] [Loop3]
71
+ ```
72
+
73
+ ## State Management
74
+
75
+ State is managed via `WorkflowState` using `ContextVar` for thread-safe isolation:
76
+
77
+ - **Evidence**: Collected evidence from searches
78
+ - **Conversation**: Iteration history (gaps, tool calls, findings, thoughts)
79
+ - **Embedding Service**: For semantic search
80
+
81
+ State transitions occur at state nodes, which update the global workflow state.
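
A minimal sketch of the `ContextVar` pattern, assuming illustrative names (the real accessors live in the middleware layer):

```python
from contextvars import ContextVar
from dataclasses import dataclass, field

@dataclass
class WorkflowState:
    evidence: list = field(default_factory=list)

_state: ContextVar[WorkflowState | None] = ContextVar("workflow_state", default=None)

def get_workflow_state() -> WorkflowState:
    state = _state.get()
    if state is None:  # lazily initialize for the current task/thread
        state = WorkflowState()
        _state.set(state)
    return state
```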
82
+
83
+ ## Execution Flow
84
+
85
+ 1. **Graph Construction**: Build graph from nodes and edges
86
+ 2. **Graph Validation**: Ensure graph is valid (no cycles, all nodes reachable)
87
+ 3. **Graph Execution**: Traverse graph from entry node
88
+ 4. **Node Execution**: Execute each node based on type
89
+ 5. **Edge Evaluation**: Determine next node(s) based on edges
90
+ 6. **Parallel Execution**: Use `asyncio.gather()` for parallel nodes
91
+ 7. **State Updates**: Update state at state nodes
92
+ 8. **Event Streaming**: Yield events during execution for UI
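
A sketch of the traversal loop behind steps 3-8; the graph/node helpers (`entry_node`, `node.run`, `next_node`) are assumptions:

```python
from src.utils.models import AgentEvent  # assumed path

async def execute_graph(graph, query: str):
    node = graph.entry_node
    while node is not None:
        yield AgentEvent(type="node_started", data={"node": node.id})
        result = await node.run(query)                 # node execution
        yield AgentEvent(type="node_complete", data={"node": node.id})
        node = graph.next_node(node, result)           # edge evaluation
```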
93
+
94
+ ## Conditional Routing
95
+
96
+ Decision nodes evaluate conditions and return next node IDs:
97
+
98
+ - **Knowledge Gap Decision**: If `research_complete` → writer, else → tool selector
99
+ - **Budget Decision**: If budget exceeded → exit, else → continue
100
+ - **Iteration Decision**: If max iterations → exit, else → continue
101
+
102
+ ## Parallel Execution
103
+
104
+ Parallel nodes execute multiple nodes concurrently:
105
+
106
+ - Each parallel branch runs independently
107
+ - Results are aggregated after all branches complete
108
+ - State is synchronized after parallel execution
109
+ - Errors in one branch don't stop other branches
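
A sketch of fault-tolerant branch execution: `asyncio.gather(return_exceptions=True)` keeps one failing branch from cancelling its siblings:

```python
import asyncio

async def run_parallel(branches):
    """Run async callables concurrently; exceptions are collected, not raised."""
    results = await asyncio.gather(*(b() for b in branches), return_exceptions=True)
    ok = [r for r in results if not isinstance(r, Exception)]
    errors = [r for r in results if isinstance(r, Exception)]
    return ok, errors
```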
110
+
111
+ ## Budget Enforcement
112
+
113
+ Budget constraints are enforced at decision nodes:
114
+
115
+ - **Token Budget**: Track LLM token usage
116
+ - **Time Budget**: Track elapsed time
117
+ - **Iteration Budget**: Track iteration count
118
+
119
+ If any budget is exceeded, execution routes to exit node.
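
A hypothetical decision-node check, reusing the `BudgetStatus` fields documented in the API reference:

```python
def budget_decision(status) -> str:
    """Route to the exit node when any limit is hit."""
    if (
        status.tokens_used >= status.tokens_limit
        or status.time_elapsed_seconds >= status.time_limit_seconds
        or status.iterations >= status.iterations_limit
    ):
        return "exit"
    return "continue"
```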
120
+
121
+ ## Error Handling
122
+
123
+ Errors are handled at multiple levels:
124
+
125
+ 1. **Node Level**: Catch errors in individual node execution
126
+ 2. **Graph Level**: Handle errors during graph traversal
127
+ 3. **State Level**: Rollback state changes on error
128
+
129
+ Errors are logged and surfaced to the UI as error events.
130
+
131
+ ## Backward Compatibility
132
+
133
+ Graph execution is optional via feature flag:
134
+
135
+ - `USE_GRAPH_EXECUTION=true`: Use graph-based execution
136
+ - `USE_GRAPH_EXECUTION=false`: Use agent chain execution (existing)
137
+
138
+ This allows gradual migration and fallback if needed.
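
One plausible way to read the flag (a sketch; the project may centralize this in its settings module instead):

```python
import os

# Defaults to graph execution unless explicitly disabled.
USE_GRAPH_EXECUTION = os.getenv("USE_GRAPH_EXECUTION", "true").lower() == "true"
```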
139
docs/architecture/graph_orchestration.md CHANGED
@@ -2,163 +2,7 @@
2
 
3
  ## Overview
4
 
5
- DeepCritical implements a graph-based orchestration system for research workflows using Pydantic AI agents as nodes. This enables better parallel execution, conditional routing, and state management compared to simple agent chains.
6
-
7
- ## Conversation History
8
-
9
- DeepCritical supports multi-turn conversations through Pydantic AI's native message history format. The system maintains two types of history:
10
-
11
- 1. **User Conversation History**: Multi-turn user interactions (from Gradio chat interface) stored as `list[ModelMessage]`
12
- 2. **Research Iteration History**: Internal research process state (existing `Conversation` model)
13
-
14
- ### Message History Flow
15
-
16
- ```
17
- Gradio Chat History → convert_gradio_to_message_history() → GraphOrchestrator.run(message_history)
18
-
19
- GraphExecutionContext (stores message_history)
20
-
21
- Agent Nodes (receive message_history via agent.run())
22
-
23
- WorkflowState (persists user_message_history)
24
- ```
25
-
26
- ### Usage
27
-
28
- Message history is automatically converted from Gradio format and passed through the orchestrator:
29
-
30
- ```python
31
- # In app.py - automatic conversion
32
- message_history = convert_gradio_to_message_history(history) if history else None
33
- async for event in orchestrator.run(query, message_history=message_history):
34
- yield event
35
- ```
36
-
37
- Agents receive message history through their `run()` methods:
38
-
39
- ```python
40
- # In agent execution
41
- if message_history:
42
- result = await agent.run(input_data, message_history=message_history)
43
- ```
44
-
45
- ## Graph Patterns
46
-
47
- ### Iterative Research Graph
48
-
49
- The iterative research graph follows this pattern:
50
-
51
- ```
52
- [Input] → [Thinking] → [Knowledge Gap] → [Decision: Complete?]
53
- ↓ No ↓ Yes
54
- [Tool Selector] [Writer]
55
-
56
- [Execute Tools] → [Loop Back]
57
- ```
58
-
59
- **Node IDs**: `thinking` → `knowledge_gap` → `continue_decision` → `tool_selector`/`writer` → `execute_tools` → (loop back to `thinking`)
60
-
61
- **Special Node Handling**:
62
- - `execute_tools`: State node that uses `search_handler` to execute searches and add evidence to workflow state
63
- - `continue_decision`: Decision node that routes based on `research_complete` flag from `KnowledgeGapOutput`
64
-
65
- ### Deep Research Graph
66
-
67
- The deep research graph follows this pattern:
68
-
69
- ```
70
- [Input] → [Planner] → [Store Plan] → [Parallel Loops] → [Collect Drafts] → [Synthesizer]
71
- ↓ ↓ ↓
72
- [Loop1] [Loop2] [Loop3]
73
- ```
74
-
75
- **Node IDs**: `planner` → `store_plan` → `parallel_loops` → `collect_drafts` → `synthesizer`
76
-
77
- **Special Node Handling**:
78
- - `planner`: Agent node that creates `ReportPlan` with report outline
79
- - `store_plan`: State node that stores `ReportPlan` in context for parallel loops
80
- - `parallel_loops`: Parallel node that executes `IterativeResearchFlow` instances for each section
81
- - `collect_drafts`: State node that collects section drafts from parallel loops
82
- - `synthesizer`: Agent node that calls `LongWriterAgent.write_report()` directly with `ReportDraft`
83
-
84
- ### Deep Research
85
-
86
- ```mermaid
87
-
88
- sequenceDiagram
89
- actor User
90
- participant GraphOrchestrator
91
- participant InputParser
92
- participant GraphBuilder
93
- participant GraphExecutor
94
- participant Agent
95
- participant BudgetTracker
96
- participant WorkflowState
97
-
98
- User->>GraphOrchestrator: run(query)
99
- GraphOrchestrator->>InputParser: detect_research_mode(query)
100
- InputParser-->>GraphOrchestrator: mode (iterative/deep)
101
- GraphOrchestrator->>GraphBuilder: build_graph(mode)
102
- GraphBuilder-->>GraphOrchestrator: ResearchGraph
103
- GraphOrchestrator->>WorkflowState: init_workflow_state()
104
- GraphOrchestrator->>BudgetTracker: create_budget()
105
- GraphOrchestrator->>GraphExecutor: _execute_graph(graph)
106
-
107
- loop For each node in graph
108
- GraphExecutor->>Agent: execute_node(agent_node)
109
- Agent->>Agent: process_input
110
- Agent-->>GraphExecutor: result
111
- GraphExecutor->>WorkflowState: update_state(result)
112
- GraphExecutor->>BudgetTracker: add_tokens(used)
113
- GraphExecutor->>BudgetTracker: check_budget()
114
- alt Budget exceeded
115
- GraphExecutor->>GraphOrchestrator: emit(error_event)
116
- else Continue
117
- GraphExecutor->>GraphOrchestrator: emit(progress_event)
118
- end
119
- end
120
-
121
- GraphOrchestrator->>User: AsyncGenerator[AgentEvent]
122
-
123
- ```
124
-
125
- ### Iterative Research
126
-
127
- ```mermaid
128
- sequenceDiagram
129
- participant IterativeFlow
130
- participant ThinkingAgent
131
- participant KnowledgeGapAgent
132
- participant ToolSelector
133
- participant ToolExecutor
134
- participant JudgeHandler
135
- participant WriterAgent
136
-
137
- IterativeFlow->>IterativeFlow: run(query)
138
-
139
- loop Until complete or max_iterations
140
- IterativeFlow->>ThinkingAgent: generate_observations()
141
- ThinkingAgent-->>IterativeFlow: observations
142
-
143
- IterativeFlow->>KnowledgeGapAgent: evaluate_gaps()
144
- KnowledgeGapAgent-->>IterativeFlow: KnowledgeGapOutput
145
-
146
- alt Research complete
147
- IterativeFlow->>WriterAgent: create_final_report()
148
- WriterAgent-->>IterativeFlow: final_report
149
- else Gaps remain
150
- IterativeFlow->>ToolSelector: select_agents(gap)
151
- ToolSelector-->>IterativeFlow: AgentSelectionPlan
152
-
153
- IterativeFlow->>ToolExecutor: execute_tool_tasks()
154
- ToolExecutor-->>IterativeFlow: ToolAgentOutput[]
155
-
156
- IterativeFlow->>JudgeHandler: assess_evidence()
157
- JudgeHandler-->>IterativeFlow: should_continue
158
- end
159
- end
160
- ```
161
-
162
 
163
  ## Graph Structure
164
 
@@ -206,6 +50,25 @@ Edges define transitions between nodes:
206
  - To: Multiple target nodes
207
  - Execution: All targets run concurrently
208
 
 
 
209
 
210
  ## State Management
211
 
@@ -219,35 +82,14 @@ State transitions occur at state nodes, which update the global workflow state.
219
 
220
  ## Execution Flow
221
 
222
- 1. **Graph Construction**: Build graph from nodes and edges using `create_iterative_graph()` or `create_deep_graph()`
223
- 2. **Graph Validation**: Ensure graph is valid (no cycles, all nodes reachable) via `ResearchGraph.validate_structure()`
224
- 3. **Graph Execution**: Traverse graph from entry node using `GraphOrchestrator._execute_graph()`
225
- 4. **Node Execution**: Execute each node based on type:
226
- - **Agent Nodes**: Call `agent.run()` with transformed input
227
- - **State Nodes**: Update workflow state via `state_updater` function
228
- - **Decision Nodes**: Evaluate `decision_function` to get next node ID
229
- - **Parallel Nodes**: Execute all parallel nodes concurrently via `asyncio.gather()`
230
- 5. **Edge Evaluation**: Determine next node(s) based on edges and conditions
231
  6. **Parallel Execution**: Use `asyncio.gather()` for parallel nodes
232
- 7. **State Updates**: Update state at state nodes via `GraphExecutionContext.update_state()`
233
- 8. **Event Streaming**: Yield `AgentEvent` objects during execution for UI
234
-
235
- ### GraphExecutionContext
236
-
237
- The `GraphExecutionContext` class manages execution state during graph traversal:
238
-
239
- - **State**: Current `WorkflowState` instance
240
- - **Budget Tracker**: `BudgetTracker` instance for budget enforcement
241
- - **Node Results**: Dictionary storing results from each node execution
242
- - **Visited Nodes**: Set of node IDs that have been executed
243
- - **Current Node**: ID of the node currently being executed
244
-
245
- Methods:
246
- - `set_node_result(node_id, result)`: Store result from node execution
247
- - `get_node_result(node_id)`: Retrieve stored result
248
- - `has_visited(node_id)`: Check if node was visited
249
- - `mark_visited(node_id)`: Mark node as visited
250
- - `update_state(updater, data)`: Update workflow state
251
 
252
  ## Conditional Routing
253
 
@@ -298,5 +140,20 @@ This allows gradual migration and fallback if needed.
298
  ## See Also
299
 
300
  - [Orchestrators](orchestrators.md) - Overview of all orchestrator patterns
 
301
  - [Workflow Diagrams](workflow-diagrams.md) - Detailed workflow diagrams
302
  - [API Reference - Orchestrators](../api/orchestrators.md) - API documentation
 
 
2
 
3
  ## Overview
4
 
5
+ Phase 4 implements a graph-based orchestration system for research workflows using Pydantic AI agents as nodes. This enables better parallel execution, conditional routing, and state management compared to simple agent chains.
 
 
6
 
7
  ## Graph Structure
8
 
 
50
  - To: Multiple target nodes
51
  - Execution: All targets run concurrently
52
 
53
+ ## Graph Patterns
54
+
55
+ ### Iterative Research Graph
56
+
57
+ ```
58
+ [Input] → [Thinking] → [Knowledge Gap] → [Decision: Complete?]
59
+ ↓ No ↓ Yes
60
+ [Tool Selector] [Writer]
61
+
62
+ [Execute Tools] → [Loop Back]
63
+ ```
64
+
65
+ ### Deep Research Graph
66
+
67
+ ```
68
+ [Input] → [Planner] → [Parallel Iterative Loops] → [Synthesizer]
69
+ ↓ ↓ ↓
70
+ [Loop1] [Loop2] [Loop3]
71
+ ```
72
 
73
  ## State Management
74
 
 
82
 
83
  ## Execution Flow
84
 
85
+ 1. **Graph Construction**: Build graph from nodes and edges
86
+ 2. **Graph Validation**: Ensure graph is valid (no cycles, all nodes reachable)
87
+ 3. **Graph Execution**: Traverse graph from entry node
88
+ 4. **Node Execution**: Execute each node based on type
89
+ 5. **Edge Evaluation**: Determine next node(s) based on edges
 
 
 
 
90
  6. **Parallel Execution**: Use `asyncio.gather()` for parallel nodes
91
+ 7. **State Updates**: Update state at state nodes
92
+ 8. **Event Streaming**: Yield events during execution for UI
 
 
 
 
93
 
94
  ## Conditional Routing
95
 
 
140
  ## See Also
141
 
142
  - [Orchestrators](orchestrators.md) - Overview of all orchestrator patterns
143
+ - [Workflows](workflows.md) - Workflow diagrams and patterns
144
  - [Workflow Diagrams](workflow-diagrams.md) - Detailed workflow diagrams
145
  - [API Reference - Orchestrators](../api/orchestrators.md) - API documentation
146
docs/architecture/middleware.md CHANGED
@@ -18,20 +18,22 @@ DeepCritical uses middleware for state management, budget tracking, and workflow
18
  - `embedding_service: Any`: Embedding service for semantic search
19
 
20
  **Methods**:
21
- - `add_evidence(new_evidence: list[Evidence]) -> int`: Adds evidence with URL-based deduplication. Returns the number of new items added (excluding duplicates).
22
- - `async search_related(query: str, n_results: int = 5) -> list[Evidence]`: Semantic search for related evidence using embedding service
23
 
24
  **Initialization**:
 
 
25
 
26
- <!--codeinclude-->
27
- [Initialize Workflow State](../src/middleware/state_machine.py) start_line:98 end_line:110
28
- <!--/codeinclude-->
29
 
30
  **Access**:
 
 
31
 
32
- <!--codeinclude-->
33
- [Get Workflow State](../src/middleware/state_machine.py) start_line:115 end_line:129
34
- <!--/codeinclude-->
35
 
36
  ## Workflow Manager
37
 
@@ -40,10 +42,10 @@ DeepCritical uses middleware for state management, budget tracking, and workflow
40
  **Purpose**: Coordinates parallel research loops
41
 
42
  **Methods**:
43
- - `async add_loop(loop_id: str, query: str) -> ResearchLoop`: Add a new research loop to manage
44
- - `async run_loops_parallel(loop_configs: list[dict], loop_func: Callable, judge_handler: Any | None = None, budget_tracker: Any | None = None) -> list[Any]`: Run multiple research loops in parallel. Takes configuration dicts and a loop function.
45
- - `async update_loop_status(loop_id: str, status: LoopStatus, error: str | None = None)`: Update loop status
46
- - `async sync_loop_evidence_to_state(loop_id: str)`: Synchronize evidence from a specific loop to global state
47
 
48
  **Features**:
49
  - Uses `asyncio.gather()` for parallel execution
@@ -56,22 +58,9 @@ DeepCritical uses middleware for state management, budget tracking, and workflow
56
  from src.middleware.workflow_manager import WorkflowManager
57
 
58
  manager = WorkflowManager()
59
- await manager.add_loop("loop1", "Research query 1")
60
- await manager.add_loop("loop2", "Research query 2")
61
-
62
- async def run_research(config: dict) -> str:
63
- loop_id = config["loop_id"]
64
- query = config["query"]
65
- # ... research logic ...
66
- return "report"
67
-
68
- results = await manager.run_loops_parallel(
69
- loop_configs=[
70
- {"loop_id": "loop1", "query": "Research query 1"},
71
- {"loop_id": "loop2", "query": "Research query 2"},
72
- ],
73
- loop_func=run_research,
74
- )
75
  ```
76
 
77
  ## Budget Tracker
@@ -86,13 +75,13 @@ results = await manager.run_loops_parallel(
86
  - **Iterations**: Number of iterations
87
 
88
  **Methods**:
89
- - `create_budget(loop_id: str, tokens_limit: int = 100000, time_limit_seconds: float = 600.0, iterations_limit: int = 10) -> BudgetStatus`: Create a budget for a specific loop
90
- - `add_tokens(loop_id: str, tokens: int)`: Add token usage to a loop's budget
91
- - `start_timer(loop_id: str)`: Start time tracking for a loop
92
- - `update_timer(loop_id: str)`: Update elapsed time for a loop
93
- - `increment_iteration(loop_id: str)`: Increment iteration count for a loop
94
- - `check_budget(loop_id: str) -> tuple[bool, str]`: Check if a loop's budget has been exceeded. Returns (exceeded: bool, reason: str)
95
- - `can_continue(loop_id: str) -> bool`: Check if a loop can continue based on budget
96
 
97
  **Token Estimation**:
98
  - `estimate_tokens(text: str) -> int`: ~4 chars per token
@@ -104,20 +93,13 @@ from src.middleware.budget_tracker import BudgetTracker
104
 
105
  tracker = BudgetTracker()
106
  budget = tracker.create_budget(
107
- loop_id="research_loop",
108
- tokens_limit=100000,
109
  time_limit_seconds=600,
110
  iterations_limit=10
111
  )
112
- tracker.start_timer("research_loop")
113
  # ... research operations ...
114
- tracker.add_tokens("research_loop", 5000)
115
- tracker.update_timer("research_loop")
116
- exceeded, reason = tracker.check_budget("research_loop")
117
- if exceeded:
118
- # Budget exceeded, stop research
119
- pass
120
- if not tracker.can_continue("research_loop"):
121
  # Budget exceeded, stop research
122
  pass
123
  ```
@@ -144,3 +126,13 @@ All middleware components use `ContextVar` for thread-safe isolation:
144
  - [Orchestrators](orchestrators.md) - How middleware is used in orchestration
145
  - [API Reference - Orchestrators](../api/orchestrators.md) - API documentation
146
  - [Contributing - Code Style](../contributing/code-style.md) - Development guidelines
 
 
 
18
  - `embedding_service: Any`: Embedding service for semantic search
19
 
20
  **Methods**:
21
+ - `add_evidence(evidence: Evidence)`: Adds evidence with URL-based deduplication
22
+ - `async search_related(query: str, top_k: int = 5) -> list[Evidence]`: Semantic search
23
 
24
  **Initialization**:
25
+ ```python
26
+ from src.middleware.state_machine import init_workflow_state
27
 
28
+ init_workflow_state(embedding_service)
29
+ ```
 
30
 
31
  **Access**:
32
+ ```python
33
+ from src.middleware.state_machine import get_workflow_state
34
 
35
+ state = get_workflow_state() # Auto-initializes if missing
36
+ ```
 
37
 
38
  ## Workflow Manager
39
 
 
42
  **Purpose**: Coordinates parallel research loops
43
 
44
  **Methods**:
45
+ - `add_loop(loop: ResearchLoop)`: Add a research loop to manage
46
+ - `async run_loops_parallel() -> list[ResearchLoop]`: Run all loops in parallel
47
+ - `update_loop_status(loop_id: str, status: str)`: Update loop status
48
+ - `sync_loop_evidence_to_state()`: Synchronize evidence from loops to global state
49
 
50
  **Features**:
51
  - Uses `asyncio.gather()` for parallel execution
 
58
  from src.middleware.workflow_manager import WorkflowManager
59
 
60
  manager = WorkflowManager()
61
+ manager.add_loop(loop1)
62
+ manager.add_loop(loop2)
63
+ completed_loops = await manager.run_loops_parallel()
 
 
64
  ```
65
 
66
  ## Budget Tracker
 
75
  - **Iterations**: Number of iterations
76
 
77
  **Methods**:
78
+ - `create_budget(token_limit, time_limit_seconds, iterations_limit) -> BudgetStatus`
79
+ - `add_tokens(tokens: int)`: Add token usage
80
+ - `start_timer()`: Start time tracking
81
+ - `update_timer()`: Update elapsed time
82
+ - `increment_iteration()`: Increment iteration count
83
+ - `check_budget() -> BudgetStatus`: Check current budget status
84
+ - `can_continue() -> bool`: Check if research can continue
85
 
86
  **Token Estimation**:
87
  - `estimate_tokens(text: str) -> int`: ~4 chars per token
 
93
 
94
  tracker = BudgetTracker()
95
  budget = tracker.create_budget(
96
+ token_limit=100000,
 
97
  time_limit_seconds=600,
98
  iterations_limit=10
99
  )
100
+ tracker.start_timer()
101
  # ... research operations ...
102
+ if not tracker.can_continue():
 
 
103
  # Budget exceeded, stop research
104
  pass
105
  ```
 
126
  - [Orchestrators](orchestrators.md) - How middleware is used in orchestration
127
  - [API Reference - Orchestrators](../api/orchestrators.md) - API documentation
128
  - [Contributing - Code Style](../contributing/code-style.md) - Development guidelines
129
+
130
+
131
+
132
+
133
+
134
+
135
+
136
+
137
+
138
+
docs/architecture/orchestrators.md CHANGED
@@ -23,10 +23,19 @@ DeepCritical supports multiple orchestration patterns for research workflows.
23
  - Iterates until research complete or constraints met
24
 
25
  **Usage**:
 
 
26
 
27
- <!--codeinclude-->
28
- [IterativeResearchFlow Initialization](../src/orchestrator/research_flow.py) start_line:57 end_line:80
29
- <!--/codeinclude-->
 
 
30
 
31
  ### DeepResearchFlow
32
 
@@ -46,10 +55,19 @@ DeepCritical supports multiple orchestration patterns for research workflows.
46
  - Supports graph execution and agent chains
47
 
48
  **Usage**:
 
 
49
 
50
- <!--codeinclude-->
51
- [DeepResearchFlow Initialization](../src/orchestrator/research_flow.py) start_line:709 end_line:728
52
- <!--/codeinclude-->
 
53
 
54
  ## Graph Orchestrator
55
 
@@ -58,10 +76,9 @@ DeepCritical supports multiple orchestration patterns for research workflows.
58
  **Purpose**: Graph-based execution using Pydantic AI agents as nodes
59
 
60
  **Features**:
61
- - Uses graph execution (`use_graph=True`) or agent chains (`use_graph=False`) as fallback
62
  - Routes based on research mode (iterative/deep/auto)
63
  - Streams `AgentEvent` objects for UI
64
- - Uses `GraphExecutionContext` to manage execution state
65
 
66
  **Node Types**:
67
  - **Agent Nodes**: Execute Pydantic AI agents
@@ -74,22 +91,6 @@ DeepCritical supports multiple orchestration patterns for research workflows.
74
  - **Conditional Edges**: Traversed based on condition
75
  - **Parallel Edges**: Used for parallel execution branches
76
 
77
- **Special Node Handling**:
78
-
79
- The `GraphOrchestrator` has special handling for certain nodes:
80
-
81
- - **`execute_tools` node**: State node that uses `search_handler` to execute searches and add evidence to workflow state
82
- - **`parallel_loops` node**: Parallel node that executes `IterativeResearchFlow` instances for each section in deep research mode
83
- - **`synthesizer` node**: Agent node that calls `LongWriterAgent.write_report()` directly with `ReportDraft` instead of using `agent.run()`
84
- - **`writer` node**: Agent node that calls `WriterAgent.write_report()` directly with findings instead of using `agent.run()`
85
-
86
- **GraphExecutionContext**:
87
-
88
- The orchestrator uses `GraphExecutionContext` to manage execution state:
89
- - Tracks current node, visited nodes, and node results
90
- - Manages workflow state and budget tracker
91
- - Provides methods to store and retrieve node execution results
92
-
93
  ## Orchestrator Factory
94
 
95
  **File**: `src/orchestrator_factory.py`
@@ -102,10 +103,16 @@ The orchestrator uses `GraphExecutionContext` to manage execution state:
102
  - **Auto-detect**: Chooses based on API key availability
103
 
104
  **Usage**:
 
 
105
 
106
- <!--codeinclude-->
107
- [Create Orchestrator](../src/orchestrator_factory.py) start_line:44 end_line:66
108
- <!--/codeinclude-->
 
 
 
 
109
 
110
  ## Magentic Orchestrator
111
 
@@ -116,26 +123,14 @@ The orchestrator uses `GraphExecutionContext` to manage execution state:
116
  **Features**:
117
  - Uses `agent-framework-core`
118
  - ChatAgent pattern with internal LLMs per agent
119
- - `MagenticBuilder` with participants:
120
- - `searcher`: SearchAgent (wraps SearchHandler)
121
- - `hypothesizer`: HypothesisAgent (generates hypotheses)
122
- - `judge`: JudgeAgent (evaluates evidence)
123
- - `reporter`: ReportAgent (generates final report)
124
- - Manager orchestrates agents via chat client (OpenAI or HuggingFace)
125
- - Event-driven: converts Magentic events to `AgentEvent` for UI streaming via `_process_event()` method
126
- - Supports max rounds, stall detection, and reset handling
127
-
128
- **Event Processing**:
129
-
130
- The orchestrator processes Magentic events and converts them to `AgentEvent`:
131
- - `MagenticOrchestratorMessageEvent` → `AgentEvent` with type based on message content
132
- - `MagenticAgentMessageEvent` → `AgentEvent` with type based on agent name
133
- - `MagenticAgentDeltaEvent` → `AgentEvent` for streaming updates
134
- - `MagenticFinalResultEvent` → `AgentEvent` with type "complete"
135
 
136
  **Requirements**:
137
  - `agent-framework-core` package
138
- - OpenAI API key or HuggingFace authentication
139
 
140
  ## Hierarchical Orchestrator
141
 
@@ -164,9 +159,13 @@ The orchestrator processes Magentic events and converts them to `AgentEvent`:
164
 
165
  All orchestrators must initialize workflow state:
166
 
167
- <!--codeinclude-->
168
- [Initialize Workflow State](../src/middleware/state_machine.py) start_line:98 end_line:112
169
- <!--/codeinclude-->
 
 
 
 
170
 
171
  ## Event Streaming
172
 
@@ -174,28 +173,26 @@ All orchestrators yield `AgentEvent` objects:
174
 
175
  **Event Types**:
176
  - `started`: Research started
177
- - `searching`: Search in progress
178
  - `search_complete`: Search completed
179
- - `judging`: Evidence evaluation in progress
180
  - `judge_complete`: Evidence evaluation completed
181
- - `looping`: Iteration in progress
182
  - `hypothesizing`: Generating hypotheses
183
- - `analyzing`: Statistical analysis in progress
184
- - `analysis_complete`: Statistical analysis completed
185
  - `synthesizing`: Synthesizing results
186
  - `complete`: Research completed
187
  - `error`: Error occurred
188
- - `streaming`: Streaming update (delta events)
189
 
190
  **Event Structure**:
191
-
192
- <!--codeinclude-->
193
- [AgentEvent Model](../src/utils/models.py) start_line:104 end_line:126
194
- <!--/codeinclude-->
 
 
195
 
196
  ## See Also
197
 
198
- - [Graph Orchestration](graph_orchestration.md) - Graph-based execution details
 
 
199
  - [Workflow Diagrams](workflow-diagrams.md) - Detailed workflow diagrams
200
  - [API Reference - Orchestrators](../api/orchestrators.md) - API documentation
201
 
 
23
  - Iterates until research complete or constraints met
24
 
25
  **Usage**:
26
+ ```python
27
+ from src.orchestrator.research_flow import IterativeResearchFlow
28
 
29
+ flow = IterativeResearchFlow(
+     search_handler=search_handler,
+     judge_handler=judge_handler,
+     use_graph=False,
+ )
+
+ async for event in flow.run(query):
+     # Handle events
+     pass
38
+ ```
39
 
40
  ### DeepResearchFlow
41
 
 
55
  - Supports graph execution and agent chains
56
 
57
  **Usage**:
58
+ ```python
59
+ from src.orchestrator.research_flow import DeepResearchFlow
60
+
61
+ flow = DeepResearchFlow(
+     search_handler=search_handler,
+     judge_handler=judge_handler,
+     use_graph=True,
+ )
+
+ async for event in flow.run(query):
+     # Handle events
+     pass
70
+ ```
71
 
72
  ## Graph Orchestrator
73
 
 
76
  **Purpose**: Graph-based execution using Pydantic AI agents as nodes
77
 
78
  **Features**:
79
+ - Uses Pydantic AI Graphs (when available) or agent chains (fallback)
80
  - Routes based on research mode (iterative/deep/auto)
81
  - Streams `AgentEvent` objects for UI
 
82
 
83
  **Node Types**:
84
  - **Agent Nodes**: Execute Pydantic AI agents
 
91
  - **Conditional Edges**: Traversed based on condition
92
  - **Parallel Edges**: Used for parallel execution branches
93
 
 
 
94
  ## Orchestrator Factory
95
 
96
  **File**: `src/orchestrator_factory.py`
 
103
  - **Auto-detect**: Chooses based on API key availability
104
 
105
  **Usage**:
106
+ ```python
107
+ from src.orchestrator_factory import create_orchestrator
108
 
109
+ orchestrator = create_orchestrator(
+     search_handler=search_handler,
+     judge_handler=judge_handler,
+     config={},
+     mode="advanced",  # or "simple" or None for auto-detect
+ )
115
+ ```
116
 
117
  ## Magentic Orchestrator
118
 
 
123
  **Features**:
124
  - Uses `agent-framework-core`
125
  - ChatAgent pattern with internal LLMs per agent
126
+ - `MagenticBuilder` with participants: searcher, hypothesizer, judge, reporter
127
+ - Manager orchestrates agents via `OpenAIChatClient`
128
+ - Requires OpenAI API key (function calling support)
129
+ - Event-driven: converts Magentic events to `AgentEvent` for UI streaming
 
 
130
 
131
  **Requirements**:
132
  - `agent-framework-core` package
133
+ - OpenAI API key
134
 
135
  ## Hierarchical Orchestrator
136
 
 
159
 
160
  All orchestrators must initialize workflow state:
161
 
162
+ ```python
163
+ from src.middleware.state_machine import init_workflow_state
164
+ from src.services.embeddings import get_embedding_service
165
+
166
+ embedding_service = get_embedding_service()
167
+ init_workflow_state(embedding_service)
168
+ ```
169
 
170
  ## Event Streaming
171
 
 
173
 
174
  **Event Types**:
175
  - `started`: Research started
 
176
  - `search_complete`: Search completed
 
177
  - `judge_complete`: Evidence evaluation completed
 
178
  - `hypothesizing`: Generating hypotheses
 
 
179
  - `synthesizing`: Synthesizing results
180
  - `complete`: Research completed
181
  - `error`: Error occurred
 
182
 
183
  **Event Structure**:
184
+ ```python
185
+ from typing import Any
+
+ class AgentEvent:
+     type: str
+     iteration: int | None
+     data: dict[str, Any]
189
+ ```
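+
+ A consumer typically dispatches on `type` while iterating the stream, e.g. (a sketch; `orchestrator.run(query)` mirrors the flow examples above):
+
+ ```python
+ import structlog
+
+ logger = structlog.get_logger()
+
+ async def consume(orchestrator, query: str) -> None:
+     async for event in orchestrator.run(query):
+         if event.type == "complete":
+             logger.info("research_complete", **event.data)
+         elif event.type == "error":
+             logger.error("research_failed", **event.data)
+         else:
+             logger.info("progress", type=event.type, iteration=event.iteration)
+ ```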
190
 
191
  ## See Also
192
 
193
+ - [Graph Orchestration](graph-orchestration.md) - Graph-based execution details
194
+ - [Graph Orchestration (Detailed)](graph_orchestration.md) - Detailed graph architecture
195
+ - [Workflows](workflows.md) - Workflow diagrams and patterns
196
  - [Workflow Diagrams](workflow-diagrams.md) - Detailed workflow diagrams
197
  - [API Reference - Orchestrators](../api/orchestrators.md) - API documentation
198
 
docs/architecture/services.md CHANGED
@@ -10,18 +10,17 @@ DeepCritical provides several services for embeddings, RAG, and statistical anal
10
 
11
  **Features**:
12
  - **No API Key Required**: Uses local sentence-transformers models
13
- - **Async-Safe**: All operations use `run_in_executor()` to avoid blocking the event loop
14
- - **ChromaDB Storage**: In-memory vector storage for embeddings
15
- - **Deduplication**: 0.9 similarity threshold by default (90% similarity = duplicate, configurable)
16
 
17
  **Model**: Configurable via `settings.local_embedding_model` (default: `all-MiniLM-L6-v2`)
18
 
19
  **Methods**:
20
- - `async def embed(text: str) -> list[float]`: Generate embeddings (async-safe via `run_in_executor()`)
21
- - `async def embed_batch(texts: list[str]) -> list[list[float]]`: Batch embedding (more efficient)
22
- - `async def add_evidence(evidence_id: str, content: str, metadata: dict[str, Any]) -> None`: Add evidence to vector store
23
- - `async def search_similar(query: str, n_results: int = 5) -> list[dict[str, Any]]`: Find semantically similar evidence
24
- - `async def deduplicate(new_evidence: list[Evidence], threshold: float = 0.9) -> list[Evidence]`: Remove semantically duplicate evidence
25
 
26
  **Usage**:
27
  ```python
@@ -33,21 +32,15 @@ embedding = await service.embed("text to embed")
33
 
34
  ## LlamaIndex RAG Service
35
 
36
- **File**: `src/services/llamaindex_rag.py`
37
 
38
  **Purpose**: Retrieval-Augmented Generation using LlamaIndex
39
 
40
  **Features**:
41
- - **Multiple Embedding Providers**: OpenAI embeddings (requires `OPENAI_API_KEY`) or local sentence-transformers (no API key)
42
- - **Multiple LLM Providers**: HuggingFace LLM (preferred) or OpenAI LLM (fallback) for query synthesis
43
- - **ChromaDB Storage**: Vector database for document storage (supports in-memory mode)
44
  - **Metadata Preservation**: Preserves source, title, URL, date, authors
45
- - **Lazy Initialization**: Graceful fallback if dependencies not available
46
-
47
- **Initialization Parameters**:
48
- - `use_openai_embeddings: bool | None`: Force OpenAI embeddings (None = auto-detect)
49
- - `use_in_memory: bool`: Use in-memory ChromaDB client (useful for tests)
50
- - `oauth_token: str | None`: Optional OAuth token from HuggingFace login (takes priority over env vars)
51
 
52
  **Methods**:
53
  - `async def ingest_evidence(evidence: list[Evidence]) -> None`: Ingest evidence into RAG
@@ -56,13 +49,9 @@ embedding = await service.embed("text to embed")
56
 
57
  **Usage**:
58
  ```python
59
- from src.services.llamaindex_rag import get_rag_service
60
 
61
- service = get_rag_service(
62
- use_openai_embeddings=False, # Use local embeddings
63
- use_in_memory=True, # Use in-memory ChromaDB
64
- oauth_token=token # Optional HuggingFace token
65
- )
66
  if service:
67
  documents = await service.retrieve("query", top_k=5)
68
  ```
@@ -103,19 +92,13 @@ result = await analyzer.analyze(
103
 
104
  ## Singleton Pattern
105
 
106
- Services use singleton patterns for lazy initialization:
107
-
108
- **EmbeddingService**: Uses a global variable pattern:
109
-
110
- <!--codeinclude-->
111
- [EmbeddingService Singleton](../src/services/embeddings.py) start_line:164 end_line:172
112
- <!--/codeinclude-->
113
 
114
- **LlamaIndexRAGService**: Direct instantiation (no caching):
115
-
116
- <!--codeinclude-->
117
- [LlamaIndexRAGService Factory](../src/services/llamaindex_rag.py) start_line:440 end_line:466
118
- <!--/codeinclude-->
119
 
120
  This ensures:
121
  - Single instance per process
@@ -144,3 +127,12 @@ if settings.has_openai_key:
144
  - [API Reference - Services](../api/services.md) - API documentation
145
  - [Configuration](../configuration/index.md) - Service configuration
146
 
 
 
 
 
 
 
 
 
 
 
10
 
11
  **Features**:
12
  - **No API Key Required**: Uses local sentence-transformers models
13
+ - **Async-Safe**: All operations use `run_in_executor()` to avoid blocking
14
+ - **ChromaDB Storage**: Vector storage for embeddings
15
+ - **Deduplication**: 0.85 similarity threshold (85% similarity = duplicate)
16
 
17
  **Model**: Configurable via `settings.local_embedding_model` (default: `all-MiniLM-L6-v2`)
18
 
19
  **Methods**:
20
+ - `async def embed(text: str) -> list[float]`: Generate embeddings
21
+ - `async def embed_batch(texts: list[str]) -> list[list[float]]`: Batch embedding
22
+ - `async def similarity(text1: str, text2: str) -> float`: Calculate similarity
23
+ - `async def find_duplicates(texts: list[str], threshold: float = 0.85) -> list[tuple[int, int]]`: Find duplicates
 
24
 
25
  **Usage**:
26
  ```python
 
32
 
33
  ## LlamaIndex RAG Service
34
 
35
+ **File**: `src/services/rag.py`
36
 
37
  **Purpose**: Retrieval-Augmented Generation using LlamaIndex
38
 
39
  **Features**:
40
+ - **OpenAI Embeddings**: Requires `OPENAI_API_KEY`
41
+ - **ChromaDB Storage**: Vector database for document storage
 
42
  - **Metadata Preservation**: Preserves source, title, URL, date, authors
43
+ - **Lazy Initialization**: Graceful fallback if OpenAI key not available
 
 
44
 
45
  **Methods**:
46
  - `async def ingest_evidence(evidence: list[Evidence]) -> None`: Ingest evidence into RAG
 
49
 
50
  **Usage**:
51
  ```python
52
+ from src.services.rag import get_rag_service
53
 
54
+ service = get_rag_service()
 
 
55
  if service:
56
  documents = await service.retrieve("query", top_k=5)
57
  ```
 
92
 
93
  ## Singleton Pattern
94
 
95
+ All services use the singleton pattern with `@lru_cache(maxsize=1)`:
 
 
96
 
97
+ ```python
98
+ from functools import lru_cache
+
+ @lru_cache(maxsize=1)
+ def get_embedding_service() -> EmbeddingService:
+     return EmbeddingService()
101
+ ```
102
 
103
  This ensures:
104
  - Single instance per process
 
127
  - [API Reference - Services](../api/services.md) - API documentation
128
  - [Configuration](../configuration/index.md) - Service configuration
129
 
130
+
131
+
132
+
133
+
134
+
135
+
136
+
137
+
138
+
docs/architecture/tools.md CHANGED
@@ -6,17 +6,30 @@ DeepCritical implements a protocol-based search tool system for retrieving evide
6
 
7
  All tools implement the `SearchTool` protocol from `src/tools/base.py`:
8
 
9
- <!--codeinclude-->
10
- [SearchTool Protocol](../src/tools/base.py) start_line:8 end_line:31
11
- <!--/codeinclude-->
 
 
12
 
13
  ## Rate Limiting
14
 
15
  All tools use the `@retry` decorator from tenacity:
16
 
17
- <!--codeinclude-->
18
- [Retry Decorator Pattern](../src/tools/pubmed.py) start_line:46 end_line:50
19
- <!--/codeinclude-->
 
 
20
 
21
  Tools with API rate limits implement `_rate_limit()` method and use shared rate limiters from `src/tools/rate_limiter.py`.
22
 
@@ -117,23 +130,11 @@ Missing fields are handled gracefully with defaults.
117
 
118
  **Purpose**: Orchestrates parallel searches across multiple tools
119
 
120
- **Initialization Parameters**:
121
- - `tools: list[SearchTool]`: List of search tools to use
122
- - `timeout: float = 30.0`: Timeout for each search in seconds
123
- - `include_rag: bool = False`: Whether to include RAG tool in searches
124
- - `auto_ingest_to_rag: bool = True`: Whether to automatically ingest results into RAG
125
- - `oauth_token: str | None = None`: Optional OAuth token from HuggingFace login (for RAG LLM)
126
-
127
- **Methods**:
128
- - `async def execute(query: str, max_results_per_tool: int = 10) -> SearchResult`: Execute search across all tools in parallel
129
-
130
  **Features**:
131
- - Uses `asyncio.gather()` with `return_exceptions=True` for parallel execution
132
- - Aggregates results into `SearchResult` with evidence and metadata
133
- - Handles tool failures gracefully (continues with other tools)
134
  - Deduplicates results by URL
135
- - Automatically ingests results into RAG if `auto_ingest_to_rag=True`
136
- - Can add RAG tool dynamically via `add_rag_tool()` method
137
 
138
  ## Tool Registration
139
 
@@ -143,21 +144,14 @@ Tools are registered in the search handler:
143
  from src.tools.pubmed import PubMedTool
144
  from src.tools.clinicaltrials import ClinicalTrialsTool
145
  from src.tools.europepmc import EuropePMCTool
146
- from src.tools.search_handler import SearchHandler
147
 
148
  search_handler = SearchHandler(
149
  tools=[
150
  PubMedTool(),
151
  ClinicalTrialsTool(),
152
  EuropePMCTool(),
153
- ],
154
- include_rag=True, # Include RAG tool for semantic search
155
- auto_ingest_to_rag=True, # Automatically ingest results into RAG
156
- oauth_token=token # Optional HuggingFace token for RAG LLM
157
  )
158
-
159
- # Execute search
160
- result = await search_handler.execute("query", max_results_per_tool=10)
161
  ```
162
 
163
  ## See Also
@@ -165,3 +159,13 @@ result = await search_handler.execute("query", max_results_per_tool=10)
165
  - [Services](services.md) - RAG and embedding services
166
  - [API Reference - Tools](../api/tools.md) - API documentation
167
  - [Contributing - Implementation Patterns](../contributing/implementation-patterns.md) - Development guidelines
 
 
 
6
 
7
  All tools implement the `SearchTool` protocol from `src/tools/base.py`:
8
 
9
+ ```python
10
+ from typing import Protocol
+
+ from src.utils.models import Evidence  # Evidence model (module referenced elsewhere in these docs)
+
+ class SearchTool(Protocol):
+     @property
+     def name(self) -> str: ...
+
+     async def search(
+         self,
+         query: str,
+         max_results: int = 10,
+     ) -> list[Evidence]: ...
19
+ ```
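+
+ Because this is a structural protocol, any object with the same shape satisfies it; a minimal stub (illustrative only) looks like:
+
+ ```python
+ class StaticTool:
+     """Illustrative stub that satisfies SearchTool structurally."""
+
+     @property
+     def name(self) -> str:
+         return "static"
+
+     async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
+         return []  # a real tool would query an external API here
+ ```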
20
 
21
  ## Rate Limiting
22
 
23
  All tools use the `@retry` decorator from tenacity:
24
 
25
+ ```python
26
+ from tenacity import retry, stop_after_attempt, wait_exponential
+
+ @retry(
+     stop=stop_after_attempt(3),
+     wait=wait_exponential(...)
+ )
+ async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
+     ...  # Implementation
32
+ ```
33
 
34
  Tools with API rate limits implement `_rate_limit()` method and use shared rate limiters from `src/tools/rate_limiter.py`.
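
  For example, `PubMedTool.__init__` obtains its shared limiter via `get_pubmed_limiter(api_key)` (see `src/tools/pubmed.py`); a `_rate_limit()` helper might then look like this sketch (the limiter's `acquire()` interface is an assumption):

  ```python
  from src.tools.rate_limiter import get_pubmed_limiter

  class RateLimitedTool:
      def __init__(self, api_key: str | None = None) -> None:
          self._limiter = get_pubmed_limiter(api_key)

      async def _rate_limit(self) -> None:
          # Assumed interface: block until a request slot is available.
          await self._limiter.acquire()
  ```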
35
 
 
130
 
131
  **Purpose**: Orchestrates parallel searches across multiple tools
132
 
 
 
133
  **Features**:
134
+ - Uses `asyncio.gather()` with `return_exceptions=True`
135
+ - Aggregates results into `SearchResult`
136
+ - Handles tool failures gracefully
137
  - Deduplicates results by URL
 
 
138
 
139
  ## Tool Registration
140
 
 
144
  from src.tools.pubmed import PubMedTool
145
  from src.tools.clinicaltrials import ClinicalTrialsTool
146
  from src.tools.europepmc import EuropePMCTool
+ from src.tools.search_handler import SearchHandler
 
147
 
148
  search_handler = SearchHandler(
149
  tools=[
150
  PubMedTool(),
151
  ClinicalTrialsTool(),
152
  EuropePMCTool(),
153
+ ]
 
 
 
154
  )
 
 
 
155
  ```
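
  A search can then be executed across all registered tools in parallel (a sketch; `execute` and `max_results_per_tool` follow the handler description above, and the `SearchResult.evidence` attribute is an assumption):

  ```python
  result = await search_handler.execute("CRISPR off-target effects", max_results_per_tool=10)
  for evidence in result.evidence:
      print(evidence.url)
  ```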
156
 
157
  ## See Also
 
159
  - [Services](services.md) - RAG and embedding services
160
  - [API Reference - Tools](../api/tools.md) - API documentation
161
  - [Contributing - Implementation Patterns](../contributing/implementation-patterns.md) - Development guidelines
162
+
163
+
164
+
165
+
166
+
167
+
168
+
169
+
170
+
171
+
docs/architecture/workflow-diagrams.md CHANGED
@@ -627,10 +627,23 @@ gantt
627
  ## Implementation Highlights
628
 
629
  **Simple 4-Agent Setup:**
630
-
631
- <!--codeinclude-->
632
- [Magentic Workflow Builder](../src/orchestrator_magentic.py) start_line:72 end_line:99
633
- <!--/codeinclude-->
 
 
 
634
 
635
  **Manager handles quality assessment in its instructions:**
636
  - Checks hypothesis quality (testable, novel, clear)
@@ -651,5 +664,7 @@ No separate Judge Agent needed - manager does it all!
651
  ## See Also
652
 
653
  - [Orchestrators](orchestrators.md) - Overview of all orchestrator patterns
654
- - [Graph Orchestration](graph_orchestration.md) - Graph-based execution overview
 
 
655
  - [API Reference - Orchestrators](../api/orchestrators.md) - API documentation
 
627
  ## Implementation Highlights
628
 
629
  **Simple 4-Agent Setup:**
630
+ ```python
631
+ workflow = (
+     MagenticBuilder()
+     .participants(
+         hypothesis=HypothesisAgent(tools=[background_tool]),
+         search=SearchAgent(tools=[web_search, rag_tool]),
+         analysis=AnalysisAgent(tools=[code_execution]),
+         report=ReportAgent(tools=[code_execution, visualization]),
+     )
+     .with_standard_manager(
+         chat_client=AnthropicClient(model="claude-sonnet-4"),
+         max_round_count=15,  # Prevent infinite loops
+         max_stall_count=3,   # Detect stuck workflows
+     )
+     .build()
+ )
646
+ ```
647
 
648
  **Manager handles quality assessment in its instructions:**
649
  - Checks hypothesis quality (testable, novel, clear)
 
664
  ## See Also
665
 
666
  - [Orchestrators](orchestrators.md) - Overview of all orchestrator patterns
667
+ - [Graph Orchestration](graph-orchestration.md) - Graph-based execution overview
668
+ - [Graph Orchestration (Detailed)](graph_orchestration.md) - Detailed graph architecture
669
+ - [Workflows](workflows.md) - Workflow patterns summary
670
  - [API Reference - Orchestrators](../api/orchestrators.md) - API documentation
docs/architecture/workflows.md ADDED
@@ -0,0 +1,662 @@
 
 
1
+ # DeepCritical Workflow - Simplified Magentic Architecture
2
+
3
+ > **Architecture Pattern**: Microsoft Magentic Orchestration
4
+ > **Design Philosophy**: Simple, dynamic, manager-driven coordination
5
+ > **Key Innovation**: Intelligent manager replaces rigid sequential phases
6
+
7
+ ---
8
+
9
+ ## 1. High-Level Magentic Workflow
10
+
11
+ ```mermaid
12
+ flowchart TD
13
+ Start([User Query]) --> Manager[Magentic Manager<br/>Plan • Select • Assess • Adapt]
14
+
15
+ Manager -->|Plans| Task1[Task Decomposition]
16
+ Task1 --> Manager
17
+
18
+ Manager -->|Selects & Executes| HypAgent[Hypothesis Agent]
19
+ Manager -->|Selects & Executes| SearchAgent[Search Agent]
20
+ Manager -->|Selects & Executes| AnalysisAgent[Analysis Agent]
21
+ Manager -->|Selects & Executes| ReportAgent[Report Agent]
22
+
23
+ HypAgent -->|Results| Manager
24
+ SearchAgent -->|Results| Manager
25
+ AnalysisAgent -->|Results| Manager
26
+ ReportAgent -->|Results| Manager
27
+
28
+ Manager -->|Assesses Quality| Decision{Good Enough?}
29
+ Decision -->|No - Refine| Manager
30
+ Decision -->|No - Different Agent| Manager
31
+ Decision -->|No - Stalled| Replan[Reset Plan]
32
+ Replan --> Manager
33
+
34
+ Decision -->|Yes| Synthesis[Synthesize Final Result]
35
+ Synthesis --> Output([Research Report])
36
+
37
+ style Start fill:#e1f5e1
38
+ style Manager fill:#ffe6e6
39
+ style HypAgent fill:#fff4e6
40
+ style SearchAgent fill:#fff4e6
41
+ style AnalysisAgent fill:#fff4e6
42
+ style ReportAgent fill:#fff4e6
43
+ style Decision fill:#ffd6d6
44
+ style Synthesis fill:#d4edda
45
+ style Output fill:#e1f5e1
46
+ ```
47
+
48
+ ## 2. Magentic Manager: The 6-Phase Cycle
49
+
50
+ ```mermaid
51
+ flowchart LR
52
+ P1[1. Planning<br/>Analyze task<br/>Create strategy] --> P2[2. Agent Selection<br/>Pick best agent<br/>for subtask]
53
+ P2 --> P3[3. Execution<br/>Run selected<br/>agent with tools]
54
+ P3 --> P4[4. Assessment<br/>Evaluate quality<br/>Check progress]
55
+ P4 --> Decision{Quality OK?<br/>Progress made?}
56
+ Decision -->|Yes| P6[6. Synthesis<br/>Combine results<br/>Generate report]
57
+ Decision -->|No| P5[5. Iteration<br/>Adjust plan<br/>Try again]
58
+ P5 --> P2
59
+ P6 --> Done([Complete])
60
+
61
+ style P1 fill:#fff4e6
62
+ style P2 fill:#ffe6e6
63
+ style P3 fill:#e6f3ff
64
+ style P4 fill:#ffd6d6
65
+ style P5 fill:#fff3cd
66
+ style P6 fill:#d4edda
67
+ style Done fill:#e1f5e1
68
+ ```
69
+
70
+ ## 3. Simplified Agent Architecture
71
+
72
+ ```mermaid
73
+ graph TB
74
+ subgraph "Orchestration Layer"
75
+ Manager[Magentic Manager<br/>• Plans workflow<br/>• Selects agents<br/>• Assesses quality<br/>• Adapts strategy]
76
+ SharedContext[(Shared Context<br/>• Hypotheses<br/>• Search Results<br/>• Analysis<br/>• Progress)]
77
+ Manager <--> SharedContext
78
+ end
79
+
80
+ subgraph "Specialist Agents"
81
+ HypAgent[Hypothesis Agent<br/>• Domain understanding<br/>• Hypothesis generation<br/>• Testability refinement]
82
+ SearchAgent[Search Agent<br/>• Multi-source search<br/>• RAG retrieval<br/>• Result ranking]
83
+ AnalysisAgent[Analysis Agent<br/>• Evidence extraction<br/>• Statistical analysis<br/>• Code execution]
84
+ ReportAgent[Report Agent<br/>• Report assembly<br/>• Visualization<br/>• Citation formatting]
85
+ end
86
+
87
+ subgraph "MCP Tools"
88
+ WebSearch[Web Search<br/>PubMed • arXiv • bioRxiv]
89
+ CodeExec[Code Execution<br/>Sandboxed Python]
90
+ RAG[RAG Retrieval<br/>Vector DB • Embeddings]
91
+ Viz[Visualization<br/>Charts • Graphs]
92
+ end
93
+
94
+ Manager -->|Selects & Directs| HypAgent
95
+ Manager -->|Selects & Directs| SearchAgent
96
+ Manager -->|Selects & Directs| AnalysisAgent
97
+ Manager -->|Selects & Directs| ReportAgent
98
+
99
+ HypAgent --> SharedContext
100
+ SearchAgent --> SharedContext
101
+ AnalysisAgent --> SharedContext
102
+ ReportAgent --> SharedContext
103
+
104
+ SearchAgent --> WebSearch
105
+ SearchAgent --> RAG
106
+ AnalysisAgent --> CodeExec
107
+ ReportAgent --> CodeExec
108
+ ReportAgent --> Viz
109
+
110
+ style Manager fill:#ffe6e6
111
+ style SharedContext fill:#ffe6f0
112
+ style HypAgent fill:#fff4e6
113
+ style SearchAgent fill:#fff4e6
114
+ style AnalysisAgent fill:#fff4e6
115
+ style ReportAgent fill:#fff4e6
116
+ style WebSearch fill:#e6f3ff
117
+ style CodeExec fill:#e6f3ff
118
+ style RAG fill:#e6f3ff
119
+ style Viz fill:#e6f3ff
120
+ ```
121
+
122
+ ## 4. Dynamic Workflow Example
123
+
124
+ ```mermaid
125
+ sequenceDiagram
126
+ participant User
127
+ participant Manager
128
+ participant HypAgent
129
+ participant SearchAgent
130
+ participant AnalysisAgent
131
+ participant ReportAgent
132
+
133
+ User->>Manager: "Research protein folding in Alzheimer's"
134
+
135
+ Note over Manager: PLAN: Generate hypotheses → Search → Analyze → Report
136
+
137
+ Manager->>HypAgent: Generate 3 hypotheses
138
+ HypAgent-->>Manager: Returns 3 hypotheses
139
+ Note over Manager: ASSESS: Good quality, proceed
140
+
141
+ Manager->>SearchAgent: Search literature for hypothesis 1
142
+ SearchAgent-->>Manager: Returns 15 papers
143
+ Note over Manager: ASSESS: Good results, continue
144
+
145
+ Manager->>SearchAgent: Search for hypothesis 2
146
+ SearchAgent-->>Manager: Only 2 papers found
147
+ Note over Manager: ASSESS: Insufficient, refine search
148
+
149
+ Manager->>SearchAgent: Refined query for hypothesis 2
150
+ SearchAgent-->>Manager: Returns 12 papers
151
+ Note over Manager: ASSESS: Better, proceed
152
+
153
+ Manager->>AnalysisAgent: Analyze evidence for all hypotheses
154
+ AnalysisAgent-->>Manager: Returns analysis with code
155
+ Note over Manager: ASSESS: Complete, generate report
156
+
157
+ Manager->>ReportAgent: Create comprehensive report
158
+ ReportAgent-->>Manager: Returns formatted report
159
+ Note over Manager: SYNTHESIZE: Combine all results
160
+
161
+ Manager->>User: Final Research Report
162
+ ```
163
+
164
+ ## 5. Manager Decision Logic
165
+
166
+ ```mermaid
167
+ flowchart TD
168
+ Start([Manager Receives Task]) --> Plan[Create Initial Plan]
169
+
170
+ Plan --> Select[Select Agent for Next Subtask]
171
+ Select --> Execute[Execute Agent]
172
+ Execute --> Collect[Collect Results]
173
+
174
+ Collect --> Assess[Assess Quality & Progress]
175
+
176
+ Assess --> Q1{Quality Sufficient?}
177
+ Q1 -->|No| Q2{Same Agent Can Fix?}
178
+ Q2 -->|Yes| Feedback[Provide Specific Feedback]
179
+ Feedback --> Execute
180
+ Q2 -->|No| Different[Try Different Agent]
181
+ Different --> Select
182
+
183
+ Q1 -->|Yes| Q3{Task Complete?}
184
+ Q3 -->|No| Q4{Making Progress?}
185
+ Q4 -->|Yes| Select
186
+ Q4 -->|No - Stalled| Replan[Reset Plan & Approach]
187
+ Replan --> Plan
188
+
189
+ Q3 -->|Yes| Synth[Synthesize Final Result]
190
+ Synth --> Done([Return Report])
191
+
192
+ style Start fill:#e1f5e1
193
+ style Plan fill:#fff4e6
194
+ style Select fill:#ffe6e6
195
+ style Execute fill:#e6f3ff
196
+ style Assess fill:#ffd6d6
197
+ style Q1 fill:#ffe6e6
198
+ style Q2 fill:#ffe6e6
199
+ style Q3 fill:#ffe6e6
200
+ style Q4 fill:#ffe6e6
201
+ style Synth fill:#d4edda
202
+ style Done fill:#e1f5e1
203
+ ```
204
+
205
+ ## 6. Hypothesis Agent Workflow
206
+
207
+ ```mermaid
208
+ flowchart LR
209
+ Input[Research Query] --> Domain[Identify Domain<br/>& Key Concepts]
210
+ Domain --> Context[Retrieve Background<br/>Knowledge]
211
+ Context --> Generate[Generate 3-5<br/>Initial Hypotheses]
212
+ Generate --> Refine[Refine for<br/>Testability]
213
+ Refine --> Rank[Rank by<br/>Quality Score]
214
+ Rank --> Output[Return Top<br/>Hypotheses]
215
+
216
+ Output --> Struct[Hypothesis Structure:<br/>• Statement<br/>• Rationale<br/>• Testability Score<br/>• Data Requirements<br/>• Expected Outcomes]
217
+
218
+ style Input fill:#e1f5e1
219
+ style Output fill:#fff4e6
220
+ style Struct fill:#e6f3ff
221
+ ```
222
+
223
+ ## 7. Search Agent Workflow
224
+
225
+ ```mermaid
226
+ flowchart TD
227
+ Input[Hypotheses] --> Strategy[Formulate Search<br/>Strategy per Hypothesis]
228
+
229
+ Strategy --> Multi[Multi-Source Search]
230
+
231
+ Multi --> PubMed[PubMed Search<br/>via MCP]
232
+ Multi --> ArXiv[arXiv Search<br/>via MCP]
233
+ Multi --> BioRxiv[bioRxiv Search<br/>via MCP]
234
+
235
+ PubMed --> Aggregate[Aggregate Results]
236
+ ArXiv --> Aggregate
237
+ BioRxiv --> Aggregate
238
+
239
+ Aggregate --> Filter[Filter & Rank<br/>by Relevance]
240
+ Filter --> Dedup[Deduplicate<br/>Cross-Reference]
241
+ Dedup --> Embed[Embed Documents<br/>via MCP]
242
+ Embed --> Vector[(Vector DB)]
243
+ Vector --> RAGRetrieval[RAG Retrieval<br/>Top-K per Hypothesis]
244
+ RAGRetrieval --> Output[Return Contextualized<br/>Search Results]
245
+
246
+ style Input fill:#fff4e6
247
+ style Multi fill:#ffe6e6
248
+ style Vector fill:#ffe6f0
249
+ style Output fill:#e6f3ff
250
+ ```
251
+
252
+ ## 8. Analysis Agent Workflow
253
+
254
+ ```mermaid
255
+ flowchart TD
256
+ Input1[Hypotheses] --> Extract
257
+ Input2[Search Results] --> Extract[Extract Evidence<br/>per Hypothesis]
258
+
259
+ Extract --> Methods[Determine Analysis<br/>Methods Needed]
260
+
261
+ Methods --> Branch{Requires<br/>Computation?}
262
+ Branch -->|Yes| GenCode[Generate Python<br/>Analysis Code]
263
+ Branch -->|No| Qual[Qualitative<br/>Synthesis]
264
+
265
+ GenCode --> Execute[Execute Code<br/>via MCP Sandbox]
266
+ Execute --> Interpret1[Interpret<br/>Results]
267
+ Qual --> Interpret2[Interpret<br/>Findings]
268
+
269
+ Interpret1 --> Synthesize[Synthesize Evidence<br/>Across Sources]
270
+ Interpret2 --> Synthesize
271
+
272
+ Synthesize --> Verdict[Determine Verdict<br/>per Hypothesis]
273
+ Verdict --> Support[• Supported<br/>• Refuted<br/>• Inconclusive]
274
+ Support --> Gaps[Identify Knowledge<br/>Gaps & Limitations]
275
+ Gaps --> Output[Return Analysis<br/>Report]
276
+
277
+ style Input1 fill:#fff4e6
278
+ style Input2 fill:#e6f3ff
279
+ style Execute fill:#ffe6e6
280
+ style Output fill:#e6ffe6
281
+ ```
282
+
283
+ ## 9. Report Agent Workflow
284
+
285
+ ```mermaid
286
+ flowchart TD
287
+ Input1[Query] --> Assemble
288
+ Input2[Hypotheses] --> Assemble
289
+ Input3[Search Results] --> Assemble
290
+ Input4[Analysis] --> Assemble[Assemble Report<br/>Sections]
291
+
292
+ Assemble --> Exec[Executive Summary]
293
+ Assemble --> Intro[Introduction]
294
+ Assemble --> Methods[Methods]
295
+ Assemble --> Results[Results per<br/>Hypothesis]
296
+ Assemble --> Discussion[Discussion]
297
+ Assemble --> Future[Future Directions]
298
+ Assemble --> Refs[References]
299
+
300
+ Results --> VizCheck{Needs<br/>Visualization?}
301
+ VizCheck -->|Yes| GenViz[Generate Viz Code]
302
+ GenViz --> ExecViz[Execute via MCP<br/>Create Charts]
303
+ ExecViz --> Combine
304
+ VizCheck -->|No| Combine[Combine All<br/>Sections]
305
+
306
+ Exec --> Combine
307
+ Intro --> Combine
308
+ Methods --> Combine
309
+ Discussion --> Combine
310
+ Future --> Combine
311
+ Refs --> Combine
312
+
313
+ Combine --> Format[Format Output]
314
+ Format --> MD[Markdown]
315
+ Format --> PDF[PDF]
316
+ Format --> JSON[JSON]
317
+
318
+ MD --> Output[Return Final<br/>Report]
319
+ PDF --> Output
320
+ JSON --> Output
321
+
322
+ style Input1 fill:#e1f5e1
323
+ style Input2 fill:#fff4e6
324
+ style Input3 fill:#e6f3ff
325
+ style Input4 fill:#e6ffe6
326
+ style Output fill:#d4edda
327
+ ```
328
+
329
+ ## 10. Data Flow & Event Streaming
330
+
331
+ ```mermaid
332
+ flowchart TD
333
+ User[👤 User] -->|Research Query| UI[Gradio UI]
334
+ UI -->|Submit| Manager[Magentic Manager]
335
+
336
+ Manager -->|Event: Planning| UI
337
+ Manager -->|Select Agent| HypAgent[Hypothesis Agent]
338
+ HypAgent -->|Event: Delta/Message| UI
339
+ HypAgent -->|Hypotheses| Context[(Shared Context)]
340
+
341
+ Context -->|Retrieved by| Manager
342
+ Manager -->|Select Agent| SearchAgent[Search Agent]
343
+ SearchAgent -->|MCP Request| WebSearch[Web Search Tool]
344
+ WebSearch -->|Results| SearchAgent
345
+ SearchAgent -->|Event: Delta/Message| UI
346
+ SearchAgent -->|Documents| Context
347
+ SearchAgent -->|Embeddings| VectorDB[(Vector DB)]
348
+
349
+ Context -->|Retrieved by| Manager
350
+ Manager -->|Select Agent| AnalysisAgent[Analysis Agent]
351
+ AnalysisAgent -->|MCP Request| CodeExec[Code Execution Tool]
352
+ CodeExec -->|Results| AnalysisAgent
353
+ AnalysisAgent -->|Event: Delta/Message| UI
354
+ AnalysisAgent -->|Analysis| Context
355
+
356
+ Context -->|Retrieved by| Manager
357
+ Manager -->|Select Agent| ReportAgent[Report Agent]
358
+ ReportAgent -->|MCP Request| CodeExec
359
+ ReportAgent -->|Event: Delta/Message| UI
360
+ ReportAgent -->|Report| Context
361
+
362
+ Manager -->|Event: Final Result| UI
363
+ UI -->|Display| User
364
+
365
+ style User fill:#e1f5e1
366
+ style UI fill:#e6f3ff
367
+ style Manager fill:#ffe6e6
368
+ style Context fill:#ffe6f0
369
+ style VectorDB fill:#ffe6f0
370
+ style WebSearch fill:#f0f0f0
371
+ style CodeExec fill:#f0f0f0
372
+ ```
373
+
374
+ ## 11. MCP Tool Architecture
375
+
376
+ ```mermaid
377
+ graph TB
378
+ subgraph "Agent Layer"
379
+ Manager[Magentic Manager]
380
+ HypAgent[Hypothesis Agent]
381
+ SearchAgent[Search Agent]
382
+ AnalysisAgent[Analysis Agent]
383
+ ReportAgent[Report Agent]
384
+ end
385
+
386
+ subgraph "MCP Protocol Layer"
387
+ Registry[MCP Tool Registry<br/>• Discovers tools<br/>• Routes requests<br/>• Manages connections]
388
+ end
389
+
390
+ subgraph "MCP Servers"
391
+ Server1[Web Search Server<br/>localhost:8001<br/>• PubMed<br/>• arXiv<br/>• bioRxiv]
392
+ Server2[Code Execution Server<br/>localhost:8002<br/>• Sandboxed Python<br/>• Package management]
393
+ Server3[RAG Server<br/>localhost:8003<br/>• Vector embeddings<br/>• Similarity search]
394
+ Server4[Visualization Server<br/>localhost:8004<br/>• Chart generation<br/>• Plot rendering]
395
+ end
396
+
397
+ subgraph "External Services"
398
+ PubMed[PubMed API]
399
+ ArXiv[arXiv API]
400
+ BioRxiv[bioRxiv API]
401
+ Modal[Modal Sandbox]
402
+ ChromaDB[(ChromaDB)]
403
+ end
404
+
405
+ SearchAgent -->|Request| Registry
406
+ AnalysisAgent -->|Request| Registry
407
+ ReportAgent -->|Request| Registry
408
+
409
+ Registry --> Server1
410
+ Registry --> Server2
411
+ Registry --> Server3
412
+ Registry --> Server4
413
+
414
+ Server1 --> PubMed
415
+ Server1 --> ArXiv
416
+ Server1 --> BioRxiv
417
+ Server2 --> Modal
418
+ Server3 --> ChromaDB
419
+
420
+ style Manager fill:#ffe6e6
421
+ style Registry fill:#fff4e6
422
+ style Server1 fill:#e6f3ff
423
+ style Server2 fill:#e6f3ff
424
+ style Server3 fill:#e6f3ff
425
+ style Server4 fill:#e6f3ff
426
+ ```
427
+
428
+ ## 12. Progress Tracking & Stall Detection
429
+
430
+ ```mermaid
431
+ stateDiagram-v2
432
+ [*] --> Initialization: User Query
433
+
434
+ Initialization --> Planning: Manager starts
435
+
436
+ Planning --> AgentExecution: Select agent
437
+
438
+ AgentExecution --> Assessment: Collect results
439
+
440
+ Assessment --> QualityCheck: Evaluate output
441
+
442
+ QualityCheck --> AgentExecution: Poor quality<br/>(retry < max_rounds)
443
+ QualityCheck --> Planning: Poor quality<br/>(try different agent)
444
+ QualityCheck --> NextAgent: Good quality<br/>(task incomplete)
445
+ QualityCheck --> Synthesis: Good quality<br/>(task complete)
446
+
447
+ NextAgent --> AgentExecution: Select next agent
448
+
449
+ state StallDetection <<choice>>
450
+ Assessment --> StallDetection: Check progress
451
+ StallDetection --> Planning: No progress<br/>(stall count < max)
452
+ StallDetection --> ErrorRecovery: No progress<br/>(max stalls reached)
453
+
454
+ ErrorRecovery --> PartialReport: Generate partial results
455
+ PartialReport --> [*]
456
+
457
+ Synthesis --> FinalReport: Combine all outputs
458
+ FinalReport --> [*]
459
+
460
+ note right of QualityCheck
461
+ Manager assesses:
462
+ • Output completeness
463
+ • Quality metrics
464
+ • Progress made
465
+ end note
466
+
467
+ note right of StallDetection
468
+ Stall = no new progress
469
+ after agent execution
470
+ Triggers plan reset
471
+ end note
472
+ ```
473
+
474
+ ## 13. Gradio UI Integration
475
+
476
+ ```mermaid
477
+ graph TD
478
+ App[Gradio App<br/>DeepCritical Research Agent]
479
+
480
+ App --> Input[Input Section]
481
+ App --> Status[Status Section]
482
+ App --> Output[Output Section]
483
+
484
+ Input --> Query[Research Question<br/>Text Area]
485
+ Input --> Controls[Controls]
486
+ Controls --> MaxHyp[Max Hypotheses: 1-10]
487
+ Controls --> MaxRounds[Max Rounds: 5-20]
488
+ Controls --> Submit[Start Research Button]
489
+
490
+ Status --> Log[Real-time Event Log<br/>• Manager planning<br/>• Agent selection<br/>• Execution updates<br/>• Quality assessment]
491
+ Status --> Progress[Progress Tracker<br/>• Current agent<br/>• Round count<br/>• Stall count]
492
+
493
+ Output --> Tabs[Tabbed Results]
494
+ Tabs --> Tab1[Hypotheses Tab<br/>Generated hypotheses with scores]
495
+ Tabs --> Tab2[Search Results Tab<br/>Papers & sources found]
496
+ Tabs --> Tab3[Analysis Tab<br/>Evidence & verdicts]
497
+ Tabs --> Tab4[Report Tab<br/>Final research report]
498
+ Tab4 --> Download[Download Report<br/>MD / PDF / JSON]
499
+
500
+ Submit -.->|Triggers| Workflow[Magentic Workflow]
501
+ Workflow -.->|MagenticOrchestratorMessageEvent| Log
502
+ Workflow -.->|MagenticAgentDeltaEvent| Log
503
+ Workflow -.->|MagenticAgentMessageEvent| Log
504
+ Workflow -.->|MagenticFinalResultEvent| Tab4
505
+
506
+ style App fill:#e1f5e1
507
+ style Input fill:#fff4e6
508
+ style Status fill:#e6f3ff
509
+ style Output fill:#e6ffe6
510
+ style Workflow fill:#ffe6e6
511
+ ```
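+
+ A hedged sketch of the event wiring implied above (the event class names come from the diagram; `run_stream` and the attribute access are assumptions about `agent-framework-core`, and `workflow` is the builder result from the Implementation Highlights below):
+
+ ```python
+ event_log: list[str] = []
+ report_output = ""
+
+ async for event in workflow.run_stream(query):
+     kind = type(event).__name__
+     if kind in ("MagenticOrchestratorMessageEvent", "MagenticAgentMessageEvent"):
+         event_log.append(str(event))  # Status section: real-time log
+     elif kind == "MagenticAgentDeltaEvent":
+         event_log.append(str(event))  # streaming token deltas
+     elif kind == "MagenticFinalResultEvent":
+         report_output = str(event)    # Output section: Report tab
+ ```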
512
+
513
+ ## 14. Complete System Context
514
+
515
+ ```mermaid
516
+ graph LR
517
+ User[👤 Researcher<br/>Asks research questions] -->|Submits query| DC[DeepCritical<br/>Magentic Workflow]
518
+
519
+ DC -->|Literature search| PubMed[PubMed API<br/>Medical papers]
520
+ DC -->|Preprint search| ArXiv[arXiv API<br/>Scientific preprints]
521
+ DC -->|Biology search| BioRxiv[bioRxiv API<br/>Biology preprints]
522
+ DC -->|Agent reasoning| Claude[Claude API<br/>Sonnet 4 / Opus]
523
+ DC -->|Code execution| Modal[Modal Sandbox<br/>Safe Python env]
524
+ DC -->|Vector storage| Chroma[ChromaDB<br/>Embeddings & RAG]
525
+
526
+ DC -->|Deployed on| HF[HuggingFace Spaces<br/>Gradio 6.0]
527
+
528
+ PubMed -->|Results| DC
529
+ ArXiv -->|Results| DC
530
+ BioRxiv -->|Results| DC
531
+ Claude -->|Responses| DC
532
+ Modal -->|Output| DC
533
+ Chroma -->|Context| DC
534
+
535
+ DC -->|Research report| User
536
+
537
+ style User fill:#e1f5e1
538
+ style DC fill:#ffe6e6
539
+ style PubMed fill:#e6f3ff
540
+ style ArXiv fill:#e6f3ff
541
+ style BioRxiv fill:#e6f3ff
542
+ style Claude fill:#ffd6d6
543
+ style Modal fill:#f0f0f0
544
+ style Chroma fill:#ffe6f0
545
+ style HF fill:#d4edda
546
+ ```
547
+
548
+ ## 15. Workflow Timeline (Simplified)
549
+
550
+ ```mermaid
551
+ gantt
552
+ title DeepCritical Magentic Workflow - Typical Execution
553
+ dateFormat mm:ss
554
+ axisFormat %M:%S
555
+
556
+ section Manager Planning
557
+ Initial planning :p1, 00:00, 10s
558
+
559
+ section Hypothesis Agent
560
+ Generate hypotheses :h1, after p1, 30s
561
+ Manager assessment :h2, after h1, 5s
562
+
563
+ section Search Agent
564
+ Search hypothesis 1 :s1, after h2, 20s
565
+ Search hypothesis 2 :s2, after s1, 20s
566
+ Search hypothesis 3 :s3, after s2, 20s
567
+ RAG processing :s4, after s3, 15s
568
+ Manager assessment :s5, after s4, 5s
569
+
570
+ section Analysis Agent
571
+ Evidence extraction :a1, after s5, 15s
572
+ Code generation :a2, after a1, 20s
573
+ Code execution :a3, after a2, 25s
574
+ Synthesis :a4, after a3, 20s
575
+ Manager assessment :a5, after a4, 5s
576
+
577
+ section Report Agent
578
+ Report assembly :r1, after a5, 30s
579
+ Visualization :r2, after r1, 15s
580
+ Formatting :r3, after r2, 10s
581
+
582
+ section Manager Synthesis
583
+ Final synthesis :f1, after r3, 10s
584
+ ```
585
+
586
+ ---
587
+
588
+ ## Key Differences from Original Design
589
+
590
+ | Aspect | Original (Judge-in-Loop) | New (Magentic) |
591
+ |--------|-------------------------|----------------|
592
+ | **Control Flow** | Fixed sequential phases | Dynamic agent selection |
593
+ | **Quality Control** | Separate Judge Agent | Manager assessment built-in |
594
+ | **Retry Logic** | Phase-level with feedback | Agent-level with adaptation |
595
+ | **Flexibility** | Rigid 4-phase pipeline | Adaptive workflow |
596
+ | **Complexity** | 5 agents (including Judge) | 4 agents (no Judge) |
597
+ | **Progress Tracking** | Manual state management | Built-in round/stall detection |
598
+ | **Agent Coordination** | Sequential handoff | Manager-driven dynamic selection |
599
+ | **Error Recovery** | Retry same phase | Try different agent or replan |
600
+
601
+ ---
602
+
603
+ ## Simplified Design Principles
604
+
605
+ 1. **Manager is Intelligent**: LLM-powered manager handles planning, selection, and quality assessment
606
+ 2. **No Separate Judge**: Manager's assessment phase replaces dedicated Judge Agent
607
+ 3. **Dynamic Workflow**: Agents can be called multiple times in any order based on need
608
+ 4. **Built-in Safety**: max_round_count (15) and max_stall_count (3) prevent infinite loops
609
+ 5. **Event-Driven UI**: Real-time streaming updates to Gradio interface
610
+ 6. **MCP-Powered Tools**: All external capabilities via Model Context Protocol
611
+ 7. **Shared Context**: Centralized state accessible to all agents
612
+ 8. **Progress Awareness**: Manager tracks what's been done and what's needed
613
+
614
+ ---
615
+
616
+ ## Legend
617
+
618
+ - 🔴 **Red/Pink**: Manager, orchestration, decision-making
619
+ - 🟡 **Yellow/Orange**: Specialist agents, processing
620
+ - 🔵 **Blue**: Data, tools, MCP services
621
+ - 🟣 **Purple/Pink**: Storage, databases, state
622
+ - 🟢 **Green**: User interactions, final outputs
623
+ - ⚪ **Gray**: External services, APIs
624
+
625
+ ---
626
+
627
+ ## Implementation Highlights
628
+
629
+ **Simple 4-Agent Setup:**
630
+ ```python
631
+ workflow = (
+     MagenticBuilder()
+     .participants(
+         hypothesis=HypothesisAgent(tools=[background_tool]),
+         search=SearchAgent(tools=[web_search, rag_tool]),
+         analysis=AnalysisAgent(tools=[code_execution]),
+         report=ReportAgent(tools=[code_execution, visualization]),
+     )
+     .with_standard_manager(
+         chat_client=AnthropicClient(model="claude-sonnet-4"),
+         max_round_count=15,  # Prevent infinite loops
+         max_stall_count=3,   # Detect stuck workflows
+     )
+     .build()
+ )
646
+ ```
647
+
648
+ **Manager handles quality assessment in its instructions:**
649
+ - Checks hypothesis quality (testable, novel, clear)
650
+ - Validates search results (relevant, authoritative, recent)
651
+ - Assesses analysis soundness (methodology, evidence, conclusions)
652
+ - Ensures report completeness (all sections, proper citations)
653
+
654
+ No separate Judge Agent needed - manager does it all!
655
+
656
+ ---
657
+
658
+ **Document Version**: 2.0 (Magentic Simplified)
659
+ **Last Updated**: 2025-11-24
660
+ **Architecture**: Microsoft Magentic Orchestration Pattern
661
+ **Agents**: 4 (Hypothesis, Search, Analysis, Report) + 1 Manager
662
+ **License**: MIT
docs/configuration/CONFIGURATION.md ADDED
@@ -0,0 +1,743 @@
 
 
1
+ # Configuration Guide
2
+
3
+ ## Overview
4
+
5
+ DeepCritical uses **Pydantic Settings** for centralized configuration management. All settings are defined in the `Settings` class in `src/utils/config.py` and can be configured via environment variables or a `.env` file.
6
+
7
+ The configuration system provides:
8
+
9
+ - **Type Safety**: Strongly-typed fields with Pydantic validation
10
+ - **Environment File Support**: Automatically loads from `.env` file (if present)
11
+ - **Case-Insensitive**: Environment variables are case-insensitive
12
+ - **Singleton Pattern**: Global `settings` instance for easy access throughout the codebase
13
+ - **Validation**: Automatic validation on load with helpful error messages
14
+
15
+ ## Quick Start
16
+
17
+ 1. Create a `.env` file in the project root
18
+ 2. Set at least one LLM API key (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, or `HF_TOKEN`)
19
+ 3. Optionally configure other services as needed
20
+ 4. The application will automatically load and validate your configuration
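+
+ A minimal `.env` for an OpenAI-backed setup might look like this (placeholder values; all variables are documented below):
+
+ ```bash
+ LLM_PROVIDER=openai
+ OPENAI_API_KEY=your_openai_api_key_here
+ LOG_LEVEL=INFO
+ ```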
21
+
22
+ ## Configuration System Architecture
23
+
24
+ ### Settings Class
25
+
26
+ The `Settings` class extends `BaseSettings` from `pydantic_settings` and defines all application configuration:
27
+
28
+ ```13:21:src/utils/config.py
29
+ class Settings(BaseSettings):
+     """Strongly-typed application settings."""
+
+     model_config = SettingsConfigDict(
+         env_file=".env",
+         env_file_encoding="utf-8",
+         case_sensitive=False,
+         extra="ignore",
+     )
38
+ ```
39
+
40
+ ### Singleton Instance
41
+
42
+ A global `settings` instance is available for import:
43
+
44
+ ```234:235:src/utils/config.py
45
+ # Singleton for easy import
46
+ settings = get_settings()
47
+ ```
48
+
49
+ ### Usage Pattern
50
+
51
+ Access configuration throughout the codebase:
52
+
53
+ ```python
54
+ from src.utils.config import settings
55
+
56
+ # Check if API keys are available
57
+ if settings.has_openai_key:
+     # Use OpenAI
+     pass
60
+
61
+ # Access configuration values
62
+ max_iterations = settings.max_iterations
63
+ web_search_provider = settings.web_search_provider
64
+ ```
65
+
66
+ ## Required Configuration
67
+
68
+ ### LLM Provider
69
+
70
+ You must configure at least one LLM provider. The system supports:
71
+
72
+ - **OpenAI**: Requires `OPENAI_API_KEY`
73
+ - **Anthropic**: Requires `ANTHROPIC_API_KEY`
74
+ - **HuggingFace**: Optional `HF_TOKEN` or `HUGGINGFACE_API_KEY` (can work without key for public models)
75
+
76
+ #### OpenAI Configuration
77
+
78
+ ```bash
79
+ LLM_PROVIDER=openai
80
+ OPENAI_API_KEY=your_openai_api_key_here
81
+ OPENAI_MODEL=gpt-5.1
82
+ ```
83
+
84
+ The default model is defined in the `Settings` class:
85
+
86
+ ```29:29:src/utils/config.py
87
+ openai_model: str = Field(default="gpt-5.1", description="OpenAI model name")
88
+ ```
89
+
90
+ #### Anthropic Configuration
91
+
92
+ ```bash
93
+ LLM_PROVIDER=anthropic
94
+ ANTHROPIC_API_KEY=your_anthropic_api_key_here
95
+ ANTHROPIC_MODEL=claude-sonnet-4-5-20250929
96
+ ```
97
+
98
+ The default model is defined in the `Settings` class:
99
+
100
+ ```30:32:src/utils/config.py
101
+ anthropic_model: str = Field(
+     default="claude-sonnet-4-5-20250929", description="Anthropic model"
+ )
104
+ ```
105
+
106
+ #### HuggingFace Configuration
107
+
108
+ HuggingFace can work without an API key for public models, but an API key provides higher rate limits:
109
+
110
+ ```bash
111
+ # Option 1: Using HF_TOKEN (preferred)
112
+ HF_TOKEN=your_huggingface_token_here
113
+
114
+ # Option 2: Using HUGGINGFACE_API_KEY (alternative)
115
+ HUGGINGFACE_API_KEY=your_huggingface_api_key_here
116
+
117
+ # Default model
118
+ HUGGINGFACE_MODEL=meta-llama/Llama-3.1-8B-Instruct
119
+ ```
120
+
121
+ The HuggingFace token can be set via either environment variable:
122
+
123
+ ```33:35:src/utils/config.py
124
+ hf_token: str | None = Field(
+     default=None, alias="HF_TOKEN", description="HuggingFace API token"
+ )
127
+ ```
128
+
129
+ ```57:59:src/utils/config.py
130
+ huggingface_api_key: str | None = Field(
+     default=None, description="HuggingFace API token (HF_TOKEN or HUGGINGFACE_API_KEY)"
+ )
133
+ ```
134
+
135
+ ## Optional Configuration
136
+
137
+ ### Embedding Configuration
138
+
139
+ DeepCritical supports multiple embedding providers for semantic search and RAG:
140
+
141
+ ```bash
142
+ # Embedding Provider: "openai", "local", or "huggingface"
143
+ EMBEDDING_PROVIDER=local
144
+
145
+ # OpenAI Embedding Model (used by LlamaIndex RAG)
146
+ OPENAI_EMBEDDING_MODEL=text-embedding-3-small
147
+
148
+ # Local Embedding Model (sentence-transformers, used by EmbeddingService)
149
+ LOCAL_EMBEDDING_MODEL=all-MiniLM-L6-v2
150
+
151
+ # HuggingFace Embedding Model
152
+ HUGGINGFACE_EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2
153
+ ```
154
+
155
+ The embedding provider configuration:
156
+
157
+ ```47:50:src/utils/config.py
158
+ embedding_provider: Literal["openai", "local", "huggingface"] = Field(
+     default="local",
+     description="Embedding provider to use",
+ )
162
+ ```
163
+
164
+ **Note**: OpenAI embeddings require `OPENAI_API_KEY`. The local provider (default) uses sentence-transformers and requires no API key.
165
+
166
+ ### Web Search Configuration
167
+
168
+ DeepCritical supports multiple web search providers:
169
+
170
+ ```bash
171
+ # Web Search Provider: "serper", "searchxng", "brave", "tavily", or "duckduckgo"
172
+ # Default: "duckduckgo" (no API key required)
173
+ WEB_SEARCH_PROVIDER=duckduckgo
174
+
175
+ # Serper API Key (for Google search via Serper)
176
+ SERPER_API_KEY=your_serper_api_key_here
177
+
178
+ # SearchXNG Host URL (for self-hosted search)
179
+ SEARCHXNG_HOST=http://localhost:8080
180
+
181
+ # Brave Search API Key
182
+ BRAVE_API_KEY=your_brave_api_key_here
183
+
184
+ # Tavily API Key
185
+ TAVILY_API_KEY=your_tavily_api_key_here
186
+ ```
187
+
188
+ The web search provider configuration:
189
+
190
+ ```71:74:src/utils/config.py
191
+ web_search_provider: Literal["serper", "searchxng", "brave", "tavily", "duckduckgo"] = Field(
+     default="duckduckgo",
+     description="Web search provider to use",
+ )
195
+ ```
196
+
197
+ **Note**: DuckDuckGo is the default and requires no API key, making it ideal for development and testing.
198
+
199
+ ### PubMed Configuration
200
+
201
+ PubMed search supports optional NCBI API key for higher rate limits:
202
+
203
+ ```bash
204
+ # NCBI API Key (optional, for higher rate limits: 10 req/sec vs 3 req/sec)
205
+ NCBI_API_KEY=your_ncbi_api_key_here
206
+ ```
207
+
208
+ The PubMed tool uses this configuration:
209
+
210
+ ```22:29:src/tools/pubmed.py
211
+ def __init__(self, api_key: str | None = None) -> None:
+     self.api_key = api_key or settings.ncbi_api_key
+     # Ignore placeholder values from .env.example
+     if self.api_key == "your-ncbi-key-here":
+         self.api_key = None
+
+     # Use shared rate limiter
+     self._limiter = get_pubmed_limiter(self.api_key)
219
+ ```
220
+
221
+ ### Agent Configuration
222
+
223
+ Control agent behavior and research loop execution:
224
+
225
+ ```bash
226
+ # Maximum iterations per research loop (1-50, default: 10)
227
+ MAX_ITERATIONS=10
228
+
229
+ # Search timeout in seconds
230
+ SEARCH_TIMEOUT=30
231
+
232
+ # Use graph-based execution for research flows
233
+ USE_GRAPH_EXECUTION=false
234
+ ```
235
+
236
+ The agent configuration fields:
237
+
238
+ ```80:85:src/utils/config.py
239
+ # Agent Configuration
240
+ max_iterations: int = Field(default=10, ge=1, le=50)
241
+ search_timeout: int = Field(default=30, description="Seconds to wait for search")
242
+ use_graph_execution: bool = Field(
+     default=False, description="Use graph-based execution for research flows"
+ )
245
+ ```
246
+
247
+ ### Budget & Rate Limiting Configuration
248
+
249
+ Control resource limits for research loops:
250
+
251
+ ```bash
252
+ # Default token budget per research loop (1000-1000000, default: 100000)
253
+ DEFAULT_TOKEN_LIMIT=100000
254
+
255
+ # Default time limit per research loop in minutes (1-120, default: 10)
256
+ DEFAULT_TIME_LIMIT_MINUTES=10
257
+
258
+ # Default iterations limit per research loop (1-50, default: 10)
259
+ DEFAULT_ITERATIONS_LIMIT=10
260
+ ```
261
+
262
+ The budget configuration with validation:
263
+
264
+ ```87:105:src/utils/config.py
265
+ # Budget & Rate Limiting Configuration
266
+ default_token_limit: int = Field(
+     default=100000,
+     ge=1000,
+     le=1000000,
+     description="Default token budget per research loop",
+ )
+ default_time_limit_minutes: int = Field(
+     default=10,
+     ge=1,
+     le=120,
+     description="Default time limit per research loop (minutes)",
+ )
+ default_iterations_limit: int = Field(
+     default=10,
+     ge=1,
+     le=50,
+     description="Default iterations limit per research loop",
+ )
284
+ ```
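+
+ These defaults feed the `BudgetTracker` described in the middleware docs; wiring them together might look like this sketch (the tracker's import path is an assumption):
+
+ ```python
+ from src.middleware.budget_tracker import BudgetTracker  # assumed module path
+ from src.utils.config import settings
+
+ tracker = BudgetTracker()
+ budget = tracker.create_budget(
+     token_limit=settings.default_token_limit,
+     time_limit_seconds=settings.default_time_limit_minutes * 60,
+     iterations_limit=settings.default_iterations_limit,
+ )
+ ```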
285
+
286
+ ### RAG Service Configuration
287
+
288
+ Configure the Retrieval-Augmented Generation service:
289
+
290
+ ```bash
291
+ # ChromaDB collection name for RAG
292
+ RAG_COLLECTION_NAME=deepcritical_evidence
293
+
294
+ # Number of top results to retrieve from RAG (1-50, default: 5)
295
+ RAG_SIMILARITY_TOP_K=5
296
+
297
+ # Automatically ingest evidence into RAG
298
+ RAG_AUTO_INGEST=true
299
+ ```
300
+
301
+ The RAG configuration:
302
+
303
+ ```127:141:src/utils/config.py
304
+ # RAG Service Configuration
305
+ rag_collection_name: str = Field(
306
+ default="deepcritical_evidence",
307
+ description="ChromaDB collection name for RAG",
308
+ )
309
+ rag_similarity_top_k: int = Field(
310
+ default=5,
311
+ ge=1,
312
+ le=50,
313
+ description="Number of top results to retrieve from RAG",
314
+ )
315
+ rag_auto_ingest: bool = Field(
316
+ default=True,
317
+ description="Automatically ingest evidence into RAG",
318
+ )
319
+ ```
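+
+ An illustrative retrieval sketch using the standard `chromadb` client (the project's RAG service may wire this differently):
+
+ ```python
+ import chromadb
+
+ from src.utils.config import settings
+
+ client = chromadb.PersistentClient(path=settings.chroma_db_path)
+ collection = client.get_or_create_collection(settings.rag_collection_name)
+ results = collection.query(
+     query_texts=["What does the evidence say?"],
+     n_results=settings.rag_similarity_top_k,  # top-k comes from configuration
+ )
+ ```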
320
+
321
+ ### ChromaDB Configuration
322
+
323
+ Configure the vector database for embeddings and RAG:
324
+
325
+ ```bash
326
+ # ChromaDB storage path
327
+ CHROMA_DB_PATH=./chroma_db
328
+
329
+ # Whether to persist ChromaDB to disk
330
+ CHROMA_DB_PERSIST=true
331
+
332
+ # ChromaDB server host (for remote ChromaDB, optional)
333
+ CHROMA_DB_HOST=localhost
334
+
335
+ # ChromaDB server port (for remote ChromaDB, optional)
336
+ CHROMA_DB_PORT=8000
337
+ ```
338
+
339
+ The ChromaDB configuration:
340
+
341
+ ```113:125:src/utils/config.py
342
+ chroma_db_path: str = Field(default="./chroma_db", description="ChromaDB storage path")
343
+ chroma_db_persist: bool = Field(
344
+ default=True,
345
+ description="Whether to persist ChromaDB to disk",
346
+ )
347
+ chroma_db_host: str | None = Field(
348
+ default=None,
349
+ description="ChromaDB server host (for remote ChromaDB)",
350
+ )
351
+ chroma_db_port: int | None = Field(
352
+ default=None,
353
+ description="ChromaDB server port (for remote ChromaDB)",
354
+ )
355
+ ```
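+
+ A hedged sketch of how these settings might map onto the standard `chromadb` client types (illustrative wiring only; the actual service code may differ):
+
+ ```python
+ import chromadb
+
+ from src.utils.config import settings
+
+ if settings.chroma_db_host and settings.chroma_db_port:
+     # A remote server takes precedence when both host and port are set.
+     client = chromadb.HttpClient(host=settings.chroma_db_host, port=settings.chroma_db_port)
+ elif settings.chroma_db_persist:
+     client = chromadb.PersistentClient(path=settings.chroma_db_path)
+ else:
+     client = chromadb.EphemeralClient()  # in-memory, nothing written to disk
+ ```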
356
+
357
+ ### External Services
358
+
359
+ #### Modal Configuration
360
+
361
+ Modal is used for secure sandbox execution of statistical analysis:
362
+
363
+ ```bash
364
+ # Modal Token ID (for Modal sandbox execution)
365
+ MODAL_TOKEN_ID=your_modal_token_id_here
366
+
367
+ # Modal Token Secret
368
+ MODAL_TOKEN_SECRET=your_modal_token_secret_here
369
+ ```
370
+
371
+ The Modal configuration:
372
+
373
+ ```110:112:src/utils/config.py
374
+ # External Services
375
+ modal_token_id: str | None = Field(default=None, description="Modal token ID")
376
+ modal_token_secret: str | None = Field(default=None, description="Modal token secret")
377
+ ```
378
+
379
+ ### Logging Configuration
380
+
381
+ Configure structured logging:
382
+
383
+ ```bash
384
+ # Log Level: "DEBUG", "INFO", "WARNING", or "ERROR"
385
+ LOG_LEVEL=INFO
386
+ ```
387
+
388
+ The logging configuration:
389
+
390
+ ```107:108:src/utils/config.py
391
+ # Logging
392
+ log_level: Literal["DEBUG", "INFO", "WARNING", "ERROR"] = "INFO"
393
+ ```
394
+
395
+ Logging is configured via the `configure_logging()` function:
396
+
397
+ ```212:231:src/utils/config.py
398
+ def configure_logging(settings: Settings) -> None:
399
+ """Configure structured logging with the configured log level."""
400
+ # Set stdlib logging level from settings
401
+ logging.basicConfig(
402
+ level=getattr(logging, settings.log_level),
403
+ format="%(message)s",
404
+ )
405
+
406
+ structlog.configure(
407
+ processors=[
408
+ structlog.stdlib.filter_by_level,
409
+ structlog.stdlib.add_logger_name,
410
+ structlog.stdlib.add_log_level,
411
+ structlog.processors.TimeStamper(fmt="iso"),
412
+ structlog.processors.JSONRenderer(),
413
+ ],
414
+ wrapper_class=structlog.stdlib.BoundLogger,
415
+ context_class=dict,
416
+ logger_factory=structlog.stdlib.LoggerFactory(),
417
+ )
418
+ ```
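+
+ A typical call site, sketched under the assumption that `configure_logging()` runs once at startup:
+
+ ```python
+ import structlog
+
+ from src.utils.config import configure_logging, settings
+
+ configure_logging(settings)
+ logger = structlog.get_logger()
+ logger.info("app_start", log_level=settings.log_level)
+ ```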
419
+
420
+ ## Configuration Properties
421
+
422
+ The `Settings` class provides helpful properties for checking configuration state:
423
+
424
+ ### API Key Availability
425
+
426
+ Check which API keys are available:
427
+
428
+ ```171:189:src/utils/config.py
429
+ @property
430
+ def has_openai_key(self) -> bool:
431
+ """Check if OpenAI API key is available."""
432
+ return bool(self.openai_api_key)
433
+
434
+ @property
435
+ def has_anthropic_key(self) -> bool:
436
+ """Check if Anthropic API key is available."""
437
+ return bool(self.anthropic_api_key)
438
+
439
+ @property
440
+ def has_huggingface_key(self) -> bool:
441
+ """Check if HuggingFace API key is available."""
442
+ return bool(self.huggingface_api_key or self.hf_token)
443
+
444
+ @property
445
+ def has_any_llm_key(self) -> bool:
446
+ """Check if any LLM API key is available."""
447
+ return self.has_openai_key or self.has_anthropic_key or self.has_huggingface_key
448
+ ```
449
+
450
+ **Usage:**
451
+
452
+ ```python
453
+ from src.utils.config import settings
454
+
455
+ # Check API key availability
456
+ if settings.has_openai_key:
457
+ # Use OpenAI
458
+ pass
459
+
460
+ if settings.has_anthropic_key:
461
+ # Use Anthropic
462
+ pass
463
+
464
+ if settings.has_huggingface_key:
465
+ # Use HuggingFace
466
+ pass
467
+
468
+ if settings.has_any_llm_key:
469
+ # At least one LLM is available
470
+ pass
471
+ ```
472
+
473
+ ### Service Availability
474
+
475
+ Check if external services are configured:
476
+
477
+ ```143:146:src/utils/config.py
478
+ @property
479
+ def modal_available(self) -> bool:
480
+ """Check if Modal credentials are configured."""
481
+ return bool(self.modal_token_id and self.modal_token_secret)
482
+ ```
483
+
484
+ ```191:204:src/utils/config.py
485
+ @property
486
+ def web_search_available(self) -> bool:
487
+ """Check if web search is available (either no-key provider or API key present)."""
488
+ if self.web_search_provider == "duckduckgo":
489
+ return True # No API key required
490
+ if self.web_search_provider == "serper":
491
+ return bool(self.serper_api_key)
492
+ if self.web_search_provider == "searchxng":
493
+ return bool(self.searchxng_host)
494
+ if self.web_search_provider == "brave":
495
+ return bool(self.brave_api_key)
496
+ if self.web_search_provider == "tavily":
497
+ return bool(self.tavily_api_key)
498
+ return False
499
+ ```
500
+
501
+ **Usage:**
502
+
503
+ ```python
504
+ from src.utils.config import settings
505
+
506
+ # Check service availability
507
+ if settings.modal_available:
508
+ # Use Modal sandbox
509
+ pass
510
+
511
+ if settings.web_search_available:
512
+ # Web search is configured
513
+ pass
514
+ ```
515
+
516
+ ### API Key Retrieval
517
+
518
+ Get the API key for the configured provider:
519
+
520
+ ```148:160:src/utils/config.py
521
+ def get_api_key(self) -> str:
522
+ """Get the API key for the configured provider."""
523
+ if self.llm_provider == "openai":
524
+ if not self.openai_api_key:
525
+ raise ConfigurationError("OPENAI_API_KEY not set")
526
+ return self.openai_api_key
527
+
528
+ if self.llm_provider == "anthropic":
529
+ if not self.anthropic_api_key:
530
+ raise ConfigurationError("ANTHROPIC_API_KEY not set")
531
+ return self.anthropic_api_key
532
+
533
+ raise ConfigurationError(f"Unknown LLM provider: {self.llm_provider}")
534
+ ```
535
+
536
+ For OpenAI-specific operations (e.g., Magentic mode):
537
+
538
+ ```162:169:src/utils/config.py
539
+ def get_openai_api_key(self) -> str:
540
+ """Get OpenAI API key (required for Magentic function calling)."""
541
+ if not self.openai_api_key:
542
+ raise ConfigurationError(
543
+ "OPENAI_API_KEY not set. Magentic mode requires OpenAI for function calling. "
544
+ "Use mode='simple' for other providers."
545
+ )
546
+ return self.openai_api_key
547
+ ```
548
+
549
+ ## Configuration Usage in Codebase
550
+
551
+ The configuration system is used throughout the codebase:
552
+
553
+ ### LLM Factory
554
+
555
+ The LLM factory uses settings to create appropriate models:
556
+
557
+ ```129:144:src/utils/llm_factory.py
558
+ if settings.llm_provider == "huggingface":
559
+ model_name = settings.huggingface_model or "meta-llama/Llama-3.1-8B-Instruct"
560
+ hf_provider = HuggingFaceProvider(api_key=settings.hf_token)
561
+ return HuggingFaceModel(model_name, provider=hf_provider)
562
+
563
+ if settings.llm_provider == "openai":
564
+ if not settings.openai_api_key:
565
+ raise ConfigurationError("OPENAI_API_KEY not set for pydantic-ai")
566
+ provider = OpenAIProvider(api_key=settings.openai_api_key)
567
+ return OpenAIModel(settings.openai_model, provider=provider)
568
+
569
+ if settings.llm_provider == "anthropic":
570
+ if not settings.anthropic_api_key:
571
+ raise ConfigurationError("ANTHROPIC_API_KEY not set for pydantic-ai")
572
+ anthropic_provider = AnthropicProvider(api_key=settings.anthropic_api_key)
573
+ return AnthropicModel(settings.anthropic_model, provider=anthropic_provider)
574
+ ```
575
+
576
+ ### Embedding Service
577
+
578
+ The embedding service uses local embedding model configuration:
579
+
580
+ ```29:31:src/services/embeddings.py
581
+ def __init__(self, model_name: str | None = None):
582
+ self._model_name = model_name or settings.local_embedding_model
583
+ self._model = SentenceTransformer(self._model_name)
584
+ ```
585
+
586
+ ### Orchestrator Factory
587
+
588
+ The orchestrator factory uses settings to determine mode:
589
+
590
+ ```69:80:src/orchestrator_factory.py
591
+ def _determine_mode(explicit_mode: str | None) -> str:
592
+ """Determine which mode to use."""
593
+ if explicit_mode:
594
+ if explicit_mode in ("magentic", "advanced"):
595
+ return "advanced"
596
+ return "simple"
597
+
598
+ # Auto-detect: advanced if paid API key available
599
+ if settings.has_openai_key:
600
+ return "advanced"
601
+
602
+ return "simple"
603
+ ```
604
+
605
+ ## Environment Variables Reference
606
+
607
+ ### Required (at least one LLM)
608
+
609
+ - `OPENAI_API_KEY` - OpenAI API key (required for OpenAI provider)
610
+ - `ANTHROPIC_API_KEY` - Anthropic API key (required for Anthropic provider)
611
+ - `HF_TOKEN` or `HUGGINGFACE_API_KEY` - HuggingFace API token (optional, can work without for public models)
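+
+ For example, a minimal `.env` for local development might look like this (illustrative values):
+
+ ```bash
+ LLM_PROVIDER=huggingface
+ HF_TOKEN=hf_xxx                  # or set OPENAI_API_KEY / ANTHROPIC_API_KEY
+ WEB_SEARCH_PROVIDER=duckduckgo   # no API key required
+ EMBEDDING_PROVIDER=local         # no API key required
+ ```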
612
+
613
+ #### LLM Configuration Variables
614
+
615
+ - `LLM_PROVIDER` - Provider to use: `"openai"`, `"anthropic"`, or `"huggingface"` (default: `"openai"`, per the `Field` default in `src/utils/config.py`)
616
+ - `OPENAI_MODEL` - OpenAI model name (default: `"gpt-5.1"`)
617
+ - `ANTHROPIC_MODEL` - Anthropic model name (default: `"claude-sonnet-4-5-20250929"`)
618
+ - `HUGGINGFACE_MODEL` - HuggingFace model ID (default: `"meta-llama/Llama-3.1-8B-Instruct"`)
619
+
620
+ #### Embedding Configuration Variables
621
+
622
+ - `EMBEDDING_PROVIDER` - Provider: `"openai"`, `"local"`, or `"huggingface"` (default: `"local"`)
623
+ - `OPENAI_EMBEDDING_MODEL` - OpenAI embedding model (default: `"text-embedding-3-small"`)
624
+ - `LOCAL_EMBEDDING_MODEL` - Local sentence-transformers model (default: `"all-MiniLM-L6-v2"`)
625
+ - `HUGGINGFACE_EMBEDDING_MODEL` - HuggingFace embedding model (default: `"sentence-transformers/all-MiniLM-L6-v2"`)
626
+
627
+ #### Web Search Configuration Variables
628
+
629
+ - `WEB_SEARCH_PROVIDER` - Provider: `"serper"`, `"searchxng"`, `"brave"`, `"tavily"`, or `"duckduckgo"` (default: `"duckduckgo"`)
630
+ - `SERPER_API_KEY` - Serper API key (required for Serper provider)
631
+ - `SEARCHXNG_HOST` - SearchXNG host URL (required for SearchXNG provider)
632
+ - `BRAVE_API_KEY` - Brave Search API key (required for Brave provider)
633
+ - `TAVILY_API_KEY` - Tavily API key (required for Tavily provider)
634
+
635
+ #### PubMed Configuration Variables
636
+
637
+ - `NCBI_API_KEY` - NCBI API key (optional, increases rate limit from 3 to 10 req/sec)
638
+
639
+ #### Agent Configuration Variables
640
+
641
+ - `MAX_ITERATIONS` - Maximum iterations per research loop (1-50, default: `10`)
642
+ - `SEARCH_TIMEOUT` - Search timeout in seconds (default: `30`)
643
+ - `USE_GRAPH_EXECUTION` - Use graph-based execution (default: `false`)
644
+
645
+ #### Budget Configuration Variables
646
+
647
+ - `DEFAULT_TOKEN_LIMIT` - Default token budget per research loop (1000-1000000, default: `100000`)
648
+ - `DEFAULT_TIME_LIMIT_MINUTES` - Default time limit in minutes (1-120, default: `10`)
649
+ - `DEFAULT_ITERATIONS_LIMIT` - Default iterations limit (1-50, default: `10`)
650
+
651
+ #### RAG Configuration Variables
652
+
653
+ - `RAG_COLLECTION_NAME` - ChromaDB collection name (default: `"deepcritical_evidence"`)
654
+ - `RAG_SIMILARITY_TOP_K` - Number of top results to retrieve (1-50, default: `5`)
655
+ - `RAG_AUTO_INGEST` - Automatically ingest evidence into RAG (default: `true`)
656
+
657
+ #### ChromaDB Configuration Variables
658
+
659
+ - `CHROMA_DB_PATH` - ChromaDB storage path (default: `"./chroma_db"`)
660
+ - `CHROMA_DB_PERSIST` - Whether to persist ChromaDB to disk (default: `true`)
661
+ - `CHROMA_DB_HOST` - ChromaDB server host (optional, for remote ChromaDB)
662
+ - `CHROMA_DB_PORT` - ChromaDB server port (optional, for remote ChromaDB)
663
+
664
+ #### External Services Variables
665
+
666
+ - `MODAL_TOKEN_ID` - Modal token ID (optional, for Modal sandbox execution)
667
+ - `MODAL_TOKEN_SECRET` - Modal token secret (optional, for Modal sandbox execution)
668
+
669
+ #### Logging Configuration Variables
670
+
671
+ - `LOG_LEVEL` - Log level: `"DEBUG"`, `"INFO"`, `"WARNING"`, or `"ERROR"` (default: `"INFO"`)
672
+
673
+ ## Validation
674
+
675
+ Settings are validated on load using Pydantic validation:
676
+
677
+ - **Type Checking**: All fields are strongly typed
678
+ - **Range Validation**: Numeric fields have min/max constraints (e.g., `ge=1, le=50` for `max_iterations`)
679
+ - **Literal Validation**: Enum fields only accept specific values (e.g., `Literal["openai", "anthropic", "huggingface"]`)
680
+ - **Required Fields**: API keys are checked when accessed via `get_api_key()` or `get_openai_api_key()`
681
+
682
+ ### Validation Examples
683
+
684
+ The `max_iterations` field has range validation:
685
+
686
+ ```81:81:src/utils/config.py
687
+ max_iterations: int = Field(default=10, ge=1, le=50)
688
+ ```
689
+
690
+ The `llm_provider` field has literal validation:
691
+
692
+ ```26:28:src/utils/config.py
693
+ llm_provider: Literal["openai", "anthropic", "huggingface"] = Field(
694
+ default="openai", description="Which LLM provider to use"
695
+ )
696
+ ```
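+
+ A minimal sketch of what a validation failure looks like (constructing `Settings` directly with an out-of-range value raises pydantic's `ValidationError`):
+
+ ```python
+ from pydantic import ValidationError
+
+ from src.utils.config import Settings
+
+ try:
+     Settings(max_iterations=100)  # violates the le=50 constraint
+ except ValidationError as exc:
+     print(f"{exc.error_count()} validation error(s)")
+ ```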
697
+
698
+ ## Error Handling
699
+
700
+ Configuration errors raise `ConfigurationError` from `src/utils/exceptions.py`:
701
+
702
+ ```22:25:src/utils/exceptions.py
703
+ class ConfigurationError(DeepCriticalError):
704
+ """Raised when configuration is invalid."""
705
+
706
+ pass
707
+ ```
708
+
709
+ ### Error Handling Example
710
+
711
+ ```python
712
+ from src.utils.config import settings
713
+ from src.utils.exceptions import ConfigurationError
714
+
715
+ try:
716
+ api_key = settings.get_api_key()
717
+ except ConfigurationError as e:
718
+ print(f"Configuration error: {e}")
719
+ ```
720
+
721
+ ### Common Configuration Errors
722
+
723
+ 1. **Missing API Key**: When `get_api_key()` is called but the required API key is not set
724
+ 2. **Invalid Provider**: When `llm_provider` is set to an unsupported value
725
+ 3. **Out of Range**: When numeric values fall outside their min/max constraints
726
+ 4. **Invalid Literal**: When enum fields receive unsupported values
727
+
728
+ ## Configuration Best Practices
729
+
730
+ 1. **Use `.env` File**: Store sensitive keys in `.env` file (add to `.gitignore`)
731
+ 2. **Check Availability**: Use properties like `has_openai_key` before accessing API keys
732
+ 3. **Handle Errors**: Always catch `ConfigurationError` when calling `get_api_key()`
733
+ 4. **Validate Early**: Configuration is validated on import, so errors surface immediately
734
+ 5. **Use Defaults**: Leverage sensible defaults for optional configuration
735
+
736
+ ## Future Enhancements
737
+
738
+ The following configurations are planned for future phases:
739
+
740
+ 1. **Additional LLM Providers**: DeepSeek, OpenRouter, Gemini, Perplexity, Azure OpenAI, Local models
741
+ 2. **Model Selection**: Reasoning/main/fast model configuration
742
+ 3. **Service Integration**: Additional service integrations and configurations
743
+
docs/configuration/index.md CHANGED
@@ -25,9 +25,17 @@ The configuration system provides:
25
 
26
  The [`Settings`][settings-class] class extends `BaseSettings` from `pydantic_settings` and defines all application configuration:
27
 
28
- <!--codeinclude-->
29
- [Settings Class Definition](../src/utils/config.py) start_line:13 end_line:21
30
- <!--/codeinclude-->
 
 
 
 
 
 
 
 
31
 
32
  [View source](https://github.com/DeepCritical/GradioDemo/blob/main/src/utils/config.py#L13-L21)
33
 
@@ -35,9 +43,10 @@ The [`Settings`][settings-class] class extends `BaseSettings` from `pydantic_set
35
 
36
  A global `settings` instance is available for import:
37
 
38
- <!--codeinclude-->
39
- [Singleton Instance](../src/utils/config.py) start_line:234 end_line:235
40
- <!--/codeinclude-->
 
41
 
42
  [View source](https://github.com/DeepCritical/GradioDemo/blob/main/src/utils/config.py#L234-L235)
43
 
@@ -78,9 +87,9 @@ OPENAI_MODEL=gpt-5.1
78
 
79
  The default model is defined in the `Settings` class:
80
 
81
- <!--codeinclude-->
82
- [OpenAI Model Configuration](../src/utils/config.py) start_line:29 end_line:29
83
- <!--/codeinclude-->
84
 
85
  #### Anthropic Configuration
86
 
@@ -92,9 +101,11 @@ ANTHROPIC_MODEL=claude-sonnet-4-5-20250929
92
 
93
  The default model is defined in the `Settings` class:
94
 
95
- <!--codeinclude-->
96
- [Anthropic Model Configuration](../src/utils/config.py) start_line:30 end_line:32
97
- <!--/codeinclude-->
 
 
98
 
99
  #### HuggingFace Configuration
100
 
@@ -113,13 +124,17 @@ HUGGINGFACE_MODEL=meta-llama/Llama-3.1-8B-Instruct
113
 
114
  The HuggingFace token can be set via either environment variable:
115
 
116
- <!--codeinclude-->
117
- [HuggingFace Token Configuration](../src/utils/config.py) start_line:33 end_line:35
118
- <!--/codeinclude-->
 
 
119
 
120
- <!--codeinclude-->
121
- [HuggingFace API Key Configuration](../src/utils/config.py) start_line:57 end_line:59
122
- <!--/codeinclude-->
 
 
123
 
124
  ## Optional Configuration
125
 
@@ -143,9 +158,12 @@ HUGGINGFACE_EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2
143
 
144
  The embedding provider configuration:
145
 
146
- <!--codeinclude-->
147
- [Embedding Provider Configuration](../src/utils/config.py) start_line:47 end_line:50
148
- <!--/codeinclude-->
 
 
 
149
 
150
  **Note**: OpenAI embeddings require `OPENAI_API_KEY`. The local provider (default) uses sentence-transformers and requires no API key.
151
 
@@ -173,9 +191,12 @@ TAVILY_API_KEY=your_tavily_api_key_here
173
 
174
  The web search provider configuration:
175
 
176
- <!--codeinclude-->
177
- [Web Search Provider Configuration](../src/utils/config.py) start_line:71 end_line:74
178
- <!--/codeinclude-->
 
 
 
179
 
180
  **Note**: DuckDuckGo is the default and requires no API key, making it ideal for development and testing.
181
 
@@ -190,9 +211,16 @@ NCBI_API_KEY=your_ncbi_api_key_here
190
 
191
  The PubMed tool uses this configuration:
192
 
193
- <!--codeinclude-->
194
- [PubMed Tool Configuration](../src/tools/pubmed.py) start_line:22 end_line:29
195
- <!--/codeinclude-->
 
 
 
 
 
 
 
196
 
197
  ### Agent Configuration
198
 
@@ -211,9 +239,14 @@ USE_GRAPH_EXECUTION=false
211
 
212
  The agent configuration fields:
213
 
214
- <!--codeinclude-->
215
- [Agent Configuration](../src/utils/config.py) start_line:80 end_line:85
216
- <!--/codeinclude-->
 
 
 
 
 
217
 
218
  ### Budget & Rate Limiting Configuration
219
 
@@ -232,9 +265,27 @@ DEFAULT_ITERATIONS_LIMIT=10
232
 
233
  The budget configuration with validation:
234
 
235
- <!--codeinclude-->
236
- [Budget Configuration](../src/utils/config.py) start_line:87 end_line:105
237
- <!--/codeinclude-->
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
238
 
239
  ### RAG Service Configuration
240
 
@@ -253,9 +304,23 @@ RAG_AUTO_INGEST=true
253
 
254
  The RAG configuration:
255
 
256
- <!--codeinclude-->
257
- [RAG Service Configuration](../src/utils/config.py) start_line:127 end_line:141
258
- <!--/codeinclude-->
 
 
 
 
 
 
 
 
 
 
 
 
 
 
259
 
260
  ### ChromaDB Configuration
261
 
@@ -277,9 +342,21 @@ CHROMA_DB_PORT=8000
277
 
278
  The ChromaDB configuration:
279
 
280
- <!--codeinclude-->
281
- [ChromaDB Configuration](../src/utils/config.py) start_line:113 end_line:125
282
- <!--/codeinclude-->
 
 
 
 
 
 
 
 
 
 
 
 
283
 
284
  ### External Services
285
 
@@ -297,9 +374,11 @@ MODAL_TOKEN_SECRET=your_modal_token_secret_here
297
 
298
  The Modal configuration:
299
 
300
- <!--codeinclude-->
301
- [Modal Configuration](../src/utils/config.py) start_line:110 end_line:112
302
- <!--/codeinclude-->
 
 
303
 
304
  ### Logging Configuration
305
 
@@ -312,15 +391,35 @@ LOG_LEVEL=INFO
312
 
313
  The logging configuration:
314
 
315
- <!--codeinclude-->
316
- [Logging Configuration](../src/utils/config.py) start_line:107 end_line:108
317
- <!--/codeinclude-->
 
318
 
319
  Logging is configured via the `configure_logging()` function:
320
 
321
- <!--codeinclude-->
322
- [Configure Logging Function](../src/utils/config.py) start_line:212 end_line:231
323
- <!--/codeinclude-->
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
324
 
325
  ## Configuration Properties
326
 
@@ -330,9 +429,27 @@ The `Settings` class provides helpful properties for checking configuration stat
330
 
331
  Check which API keys are available:
332
 
333
- <!--codeinclude-->
334
- [API Key Availability Properties](../src/utils/config.py) start_line:171 end_line:189
335
- <!--/codeinclude-->
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
336
 
337
  **Usage:**
338
 
@@ -361,13 +478,29 @@ if settings.has_any_llm_key:
361
 
362
  Check if external services are configured:
363
 
364
- <!--codeinclude-->
365
- [Modal Availability Property](../src/utils/config.py) start_line:143 end_line:146
366
- <!--/codeinclude-->
 
 
 
367
 
368
- <!--codeinclude-->
369
- [Web Search Availability Property](../src/utils/config.py) start_line:191 end_line:204
370
- <!--/codeinclude-->
 
 
 
 
 
 
 
 
 
 
 
 
 
371
 
372
  **Usage:**
373
 
@@ -388,15 +521,34 @@ if settings.web_search_available:
388
 
389
  Get the API key for the configured provider:
390
 
391
- <!--codeinclude-->
392
- [Get API Key Method](../src/utils/config.py) start_line:148 end_line:160
393
- <!--/codeinclude-->
 
 
 
 
 
 
 
 
 
 
 
 
394
 
395
  For OpenAI-specific operations (e.g., Magentic mode):
396
 
397
- <!--codeinclude-->
398
- [Get OpenAI API Key Method](../src/utils/config.py) start_line:162 end_line:169
399
- <!--/codeinclude-->
 
 
 
 
 
 
 
400
 
401
  ## Configuration Usage in Codebase
402
 
@@ -406,25 +558,53 @@ The configuration system is used throughout the codebase:
406
 
407
  The LLM factory uses settings to create appropriate models:
408
 
409
- <!--codeinclude-->
410
- [LLM Factory Usage](../src/utils/llm_factory.py) start_line:129 end_line:144
411
- <!--/codeinclude-->
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
412
 
413
  ### Embedding Service
414
 
415
  The embedding service uses local embedding model configuration:
416
 
417
- <!--codeinclude-->
418
- [Embedding Service Usage](../src/services/embeddings.py) start_line:29 end_line:31
419
- <!--/codeinclude-->
 
 
420
 
421
  ### Orchestrator Factory
422
 
423
  The orchestrator factory uses settings to determine mode:
424
 
425
- <!--codeinclude-->
426
- [Orchestrator Factory Mode Detection](../src/orchestrator_factory.py) start_line:69 end_line:80
427
- <!--/codeinclude-->
 
 
 
 
 
 
 
 
 
 
 
428
 
429
  ## Environment Variables Reference
430
 
@@ -507,15 +687,17 @@ Settings are validated on load using Pydantic validation:
507
 
508
  The `max_iterations` field has range validation:
509
 
510
- <!--codeinclude-->
511
- [Max Iterations Validation](../src/utils/config.py) start_line:81 end_line:81
512
- <!--/codeinclude-->
513
 
514
  The `llm_provider` field has literal validation:
515
 
516
- <!--codeinclude-->
517
- [LLM Provider Literal Validation](../src/utils/config.py) start_line:26 end_line:28
518
- <!--/codeinclude-->
 
 
519
 
520
  ## Error Handling
521
 
 
25
 
26
  The [`Settings`][settings-class] class extends `BaseSettings` from `pydantic_settings` and defines all application configuration:
27
 
28
+ ```13:21:src/utils/config.py
29
+ class Settings(BaseSettings):
30
+ """Strongly-typed application settings."""
31
+
32
+ model_config = SettingsConfigDict(
33
+ env_file=".env",
34
+ env_file_encoding="utf-8",
35
+ case_sensitive=False,
36
+ extra="ignore",
37
+ )
38
+ ```
39
 
40
  [View source](https://github.com/DeepCritical/GradioDemo/blob/main/src/utils/config.py#L13-L21)
41
 
 
43
 
44
  A global `settings` instance is available for import:
45
 
46
+ ```234:235:src/utils/config.py
47
+ # Singleton for easy import
48
+ settings = get_settings()
49
+ ```
50
 
51
  [View source](https://github.com/DeepCritical/GradioDemo/blob/main/src/utils/config.py#L234-L235)
52
 
 
87
 
88
  The default model is defined in the `Settings` class:
89
 
90
+ ```29:29:src/utils/config.py
91
+ openai_model: str = Field(default="gpt-5.1", description="OpenAI model name")
92
+ ```
93
 
94
  #### Anthropic Configuration
95
 
 
101
 
102
  The default model is defined in the `Settings` class:
103
 
104
+ ```30:32:src/utils/config.py
105
+ anthropic_model: str = Field(
106
+ default="claude-sonnet-4-5-20250929", description="Anthropic model"
107
+ )
108
+ ```
109
 
110
  #### HuggingFace Configuration
111
 
 
124
 
125
  The HuggingFace token can be set via either environment variable:
126
 
127
+ ```33:35:src/utils/config.py
128
+ hf_token: str | None = Field(
129
+ default=None, alias="HF_TOKEN", description="HuggingFace API token"
130
+ )
131
+ ```
132
 
133
+ ```57:59:src/utils/config.py
134
+ huggingface_api_key: str | None = Field(
135
+ default=None, description="HuggingFace API token (HF_TOKEN or HUGGINGFACE_API_KEY)"
136
+ )
137
+ ```
138
 
139
  ## Optional Configuration
140
 
 
158
 
159
  The embedding provider configuration:
160
 
161
+ ```47:50:src/utils/config.py
162
+ embedding_provider: Literal["openai", "local", "huggingface"] = Field(
163
+ default="local",
164
+ description="Embedding provider to use",
165
+ )
166
+ ```
167
 
168
  **Note**: OpenAI embeddings require `OPENAI_API_KEY`. The local provider (default) uses sentence-transformers and requires no API key.
169
 
 
191
 
192
  The web search provider configuration:
193
 
194
+ ```71:74:src/utils/config.py
195
+ web_search_provider: Literal["serper", "searchxng", "brave", "tavily", "duckduckgo"] = Field(
196
+ default="duckduckgo",
197
+ description="Web search provider to use",
198
+ )
199
+ ```
200
 
201
  **Note**: DuckDuckGo is the default and requires no API key, making it ideal for development and testing.
202
 
 
211
 
212
  The PubMed tool uses this configuration:
213
 
214
+ ```22:29:src/tools/pubmed.py
215
+ def __init__(self, api_key: str | None = None) -> None:
216
+ self.api_key = api_key or settings.ncbi_api_key
217
+ # Ignore placeholder values from .env.example
218
+ if self.api_key == "your-ncbi-key-here":
219
+ self.api_key = None
220
+
221
+ # Use shared rate limiter
222
+ self._limiter = get_pubmed_limiter(self.api_key)
223
+ ```
224
 
225
  ### Agent Configuration
226
 
 
239
 
240
  The agent configuration fields:
241
 
242
+ ```80:85:src/utils/config.py
243
+ # Agent Configuration
244
+ max_iterations: int = Field(default=10, ge=1, le=50)
245
+ search_timeout: int = Field(default=30, description="Seconds to wait for search")
246
+ use_graph_execution: bool = Field(
247
+ default=False, description="Use graph-based execution for research flows"
248
+ )
249
+ ```
250
 
251
  ### Budget & Rate Limiting Configuration
252
 
 
265
 
266
  The budget configuration with validation:
267
 
268
+ ```87:105:src/utils/config.py
269
+ # Budget & Rate Limiting Configuration
270
+ default_token_limit: int = Field(
271
+ default=100000,
272
+ ge=1000,
273
+ le=1000000,
274
+ description="Default token budget per research loop",
275
+ )
276
+ default_time_limit_minutes: int = Field(
277
+ default=10,
278
+ ge=1,
279
+ le=120,
280
+ description="Default time limit per research loop (minutes)",
281
+ )
282
+ default_iterations_limit: int = Field(
283
+ default=10,
284
+ ge=1,
285
+ le=50,
286
+ description="Default iterations limit per research loop",
287
+ )
288
+ ```
289
 
290
  ### RAG Service Configuration
291
 
 
304
 
305
  The RAG configuration:
306
 
307
+ ```127:141:src/utils/config.py
308
+ # RAG Service Configuration
309
+ rag_collection_name: str = Field(
310
+ default="deepcritical_evidence",
311
+ description="ChromaDB collection name for RAG",
312
+ )
313
+ rag_similarity_top_k: int = Field(
314
+ default=5,
315
+ ge=1,
316
+ le=50,
317
+ description="Number of top results to retrieve from RAG",
318
+ )
319
+ rag_auto_ingest: bool = Field(
320
+ default=True,
321
+ description="Automatically ingest evidence into RAG",
322
+ )
323
+ ```
324
 
325
  ### ChromaDB Configuration
326
 
 
342
 
343
  The ChromaDB configuration:
344
 
345
+ ```113:125:src/utils/config.py
346
+ chroma_db_path: str = Field(default="./chroma_db", description="ChromaDB storage path")
347
+ chroma_db_persist: bool = Field(
348
+ default=True,
349
+ description="Whether to persist ChromaDB to disk",
350
+ )
351
+ chroma_db_host: str | None = Field(
352
+ default=None,
353
+ description="ChromaDB server host (for remote ChromaDB)",
354
+ )
355
+ chroma_db_port: int | None = Field(
356
+ default=None,
357
+ description="ChromaDB server port (for remote ChromaDB)",
358
+ )
359
+ ```
360
 
361
  ### External Services
362
 
 
374
 
375
  The Modal configuration:
376
 
377
+ ```110:112:src/utils/config.py
378
+ # External Services
379
+ modal_token_id: str | None = Field(default=None, description="Modal token ID")
380
+ modal_token_secret: str | None = Field(default=None, description="Modal token secret")
381
+ ```
382
 
383
  ### Logging Configuration
384
 
 
391
 
392
  The logging configuration:
393
 
394
+ ```107:108:src/utils/config.py
395
+ # Logging
396
+ log_level: Literal["DEBUG", "INFO", "WARNING", "ERROR"] = "INFO"
397
+ ```
398
 
399
  Logging is configured via the `configure_logging()` function:
400
 
401
+ ```212:231:src/utils/config.py
402
+ def configure_logging(settings: Settings) -> None:
403
+ """Configure structured logging with the configured log level."""
404
+ # Set stdlib logging level from settings
405
+ logging.basicConfig(
406
+ level=getattr(logging, settings.log_level),
407
+ format="%(message)s",
408
+ )
409
+
410
+ structlog.configure(
411
+ processors=[
412
+ structlog.stdlib.filter_by_level,
413
+ structlog.stdlib.add_logger_name,
414
+ structlog.stdlib.add_log_level,
415
+ structlog.processors.TimeStamper(fmt="iso"),
416
+ structlog.processors.JSONRenderer(),
417
+ ],
418
+ wrapper_class=structlog.stdlib.BoundLogger,
419
+ context_class=dict,
420
+ logger_factory=structlog.stdlib.LoggerFactory(),
421
+ )
422
+ ```
423
 
424
  ## Configuration Properties
425
 
 
429
 
430
  Check which API keys are available:
431
 
432
+ ```171:189:src/utils/config.py
433
+ @property
434
+ def has_openai_key(self) -> bool:
435
+ """Check if OpenAI API key is available."""
436
+ return bool(self.openai_api_key)
437
+
438
+ @property
439
+ def has_anthropic_key(self) -> bool:
440
+ """Check if Anthropic API key is available."""
441
+ return bool(self.anthropic_api_key)
442
+
443
+ @property
444
+ def has_huggingface_key(self) -> bool:
445
+ """Check if HuggingFace API key is available."""
446
+ return bool(self.huggingface_api_key or self.hf_token)
447
+
448
+ @property
449
+ def has_any_llm_key(self) -> bool:
450
+ """Check if any LLM API key is available."""
451
+ return self.has_openai_key or self.has_anthropic_key or self.has_huggingface_key
452
+ ```
453
 
454
  **Usage:**
455
 
 
478
 
479
  Check if external services are configured:
480
 
481
+ ```143:146:src/utils/config.py
482
+ @property
483
+ def modal_available(self) -> bool:
484
+ """Check if Modal credentials are configured."""
485
+ return bool(self.modal_token_id and self.modal_token_secret)
486
+ ```
487
 
488
+ ```191:204:src/utils/config.py
489
+ @property
490
+ def web_search_available(self) -> bool:
491
+ """Check if web search is available (either no-key provider or API key present)."""
492
+ if self.web_search_provider == "duckduckgo":
493
+ return True # No API key required
494
+ if self.web_search_provider == "serper":
495
+ return bool(self.serper_api_key)
496
+ if self.web_search_provider == "searchxng":
497
+ return bool(self.searchxng_host)
498
+ if self.web_search_provider == "brave":
499
+ return bool(self.brave_api_key)
500
+ if self.web_search_provider == "tavily":
501
+ return bool(self.tavily_api_key)
502
+ return False
503
+ ```
504
 
505
  **Usage:**
506
 
 
521
 
522
  Get the API key for the configured provider:
523
 
524
+ ```148:160:src/utils/config.py
525
+ def get_api_key(self) -> str:
526
+ """Get the API key for the configured provider."""
527
+ if self.llm_provider == "openai":
528
+ if not self.openai_api_key:
529
+ raise ConfigurationError("OPENAI_API_KEY not set")
530
+ return self.openai_api_key
531
+
532
+ if self.llm_provider == "anthropic":
533
+ if not self.anthropic_api_key:
534
+ raise ConfigurationError("ANTHROPIC_API_KEY not set")
535
+ return self.anthropic_api_key
536
+
537
+ raise ConfigurationError(f"Unknown LLM provider: {self.llm_provider}")
538
+ ```
539
 
540
  For OpenAI-specific operations (e.g., Magentic mode):
541
 
542
+ ```162:169:src/utils/config.py
543
+ def get_openai_api_key(self) -> str:
544
+ """Get OpenAI API key (required for Magentic function calling)."""
545
+ if not self.openai_api_key:
546
+ raise ConfigurationError(
547
+ "OPENAI_API_KEY not set. Magentic mode requires OpenAI for function calling. "
548
+ "Use mode='simple' for other providers."
549
+ )
550
+ return self.openai_api_key
551
+ ```
552
 
553
  ## Configuration Usage in Codebase
554
 
 
558
 
559
  The LLM factory uses settings to create appropriate models:
560
 
561
+ ```129:144:src/utils/llm_factory.py
562
+ if settings.llm_provider == "huggingface":
563
+ model_name = settings.huggingface_model or "meta-llama/Llama-3.1-8B-Instruct"
564
+ hf_provider = HuggingFaceProvider(api_key=settings.hf_token)
565
+ return HuggingFaceModel(model_name, provider=hf_provider)
566
+
567
+ if settings.llm_provider == "openai":
568
+ if not settings.openai_api_key:
569
+ raise ConfigurationError("OPENAI_API_KEY not set for pydantic-ai")
570
+ provider = OpenAIProvider(api_key=settings.openai_api_key)
571
+ return OpenAIModel(settings.openai_model, provider=provider)
572
+
573
+ if settings.llm_provider == "anthropic":
574
+ if not settings.anthropic_api_key:
575
+ raise ConfigurationError("ANTHROPIC_API_KEY not set for pydantic-ai")
576
+ anthropic_provider = AnthropicProvider(api_key=settings.anthropic_api_key)
577
+ return AnthropicModel(settings.anthropic_model, provider=anthropic_provider)
578
+ ```
579
 
580
  ### Embedding Service
581
 
582
  The embedding service uses local embedding model configuration:
583
 
584
+ ```29:31:src/services/embeddings.py
585
+ def __init__(self, model_name: str | None = None):
586
+ self._model_name = model_name or settings.local_embedding_model
587
+ self._model = SentenceTransformer(self._model_name)
588
+ ```
589
 
590
  ### Orchestrator Factory
591
 
592
  The orchestrator factory uses settings to determine mode:
593
 
594
+ ```69:80:src/orchestrator_factory.py
595
+ def _determine_mode(explicit_mode: str | None) -> str:
596
+ """Determine which mode to use."""
597
+ if explicit_mode:
598
+ if explicit_mode in ("magentic", "advanced"):
599
+ return "advanced"
600
+ return "simple"
601
+
602
+ # Auto-detect: advanced if paid API key available
603
+ if settings.has_openai_key:
604
+ return "advanced"
605
+
606
+ return "simple"
607
+ ```
608
 
609
  ## Environment Variables Reference
610
 
 
687
 
688
  The `max_iterations` field has range validation:
689
 
690
+ ```81:81:src/utils/config.py
691
+ max_iterations: int = Field(default=10, ge=1, le=50)
692
+ ```
693
 
694
  The `llm_provider` field has literal validation:
695
 
696
+ ```26:28:src/utils/config.py
697
+ llm_provider: Literal["openai", "anthropic", "huggingface"] = Field(
698
+ default="openai", description="Which LLM provider to use"
699
+ )
700
+ ```
701
 
702
  ## Error Handling
703
 
CONTRIBUTING.md → docs/contributing.md RENAMED
@@ -1,26 +1,24 @@
1
- # Contributing to The DETERMINATOR
2
 
3
- Thank you for your interest in contributing to The DETERMINATOR! This guide will help you get started.
4
 
5
  ## Table of Contents
6
 
7
  - [Git Workflow](#git-workflow)
8
  - [Getting Started](#getting-started)
9
  - [Development Commands](#development-commands)
 
 
 
 
 
 
 
10
  - [MCP Integration](#mcp-integration)
11
  - [Common Pitfalls](#common-pitfalls)
12
  - [Key Principles](#key-principles)
13
  - [Pull Request Process](#pull-request-process)
14
 
15
- > **Note**: Additional sections (Code Style, Error Handling, Testing, Implementation Patterns, Code Quality, and Prompt Engineering) are available as separate pages in the [documentation](https://deepcritical.github.io/GradioDemo/contributing/).
16
- > **Note on Project Names**: "The DETERMINATOR" is the product name, "DeepCritical" is the organization/project name, and "determinator" is the Python package name.
17
-
18
- ## Repository Information
19
-
20
- - **GitHub Repository**: [`DeepCritical/GradioDemo`](https://github.com/DeepCritical/GradioDemo) (source of truth, PRs, code review)
21
- - **HuggingFace Space**: [`DataQuests/DeepCritical`](https://huggingface.co/spaces/DataQuests/DeepCritical) (deployment/demo)
22
- - **Package Name**: `determinator` (Python package name in `pyproject.toml`)
23
-
24
  ## Git Workflow
25
 
26
  - `main`: Production-ready (GitHub)
@@ -29,31 +27,9 @@ Thank you for your interest in contributing to The DETERMINATOR! This guide will
29
  - **NEVER** push directly to `main` or `dev` on HuggingFace
30
  - GitHub is source of truth; HuggingFace is for deployment
31
 
32
- ### Dual Repository Setup
33
-
34
- This project uses a dual repository setup:
35
-
36
- - **GitHub (`DeepCritical/GradioDemo`)**: Source of truth for code, PRs, and code review
37
- - **HuggingFace (`DataQuests/DeepCritical`)**: Deployment target for the Gradio demo
38
-
39
- #### Remote Configuration
40
-
41
- When cloning, set up remotes as follows:
42
-
43
- ```bash
44
- # Clone from GitHub
45
- git clone https://github.com/DeepCritical/GradioDemo.git
46
- cd GradioDemo
47
-
48
- # Add HuggingFace remote (optional, for deployment)
49
- git remote add huggingface-upstream https://huggingface.co/spaces/DataQuests/DeepCritical
50
- ```
51
-
52
- **Important**: Never push directly to `main` or `dev` on HuggingFace. Always work through GitHub PRs. GitHub is the source of truth; HuggingFace is for deployment/demo only.
53
-
54
  ## Getting Started
55
 
56
- 1. **Fork the repository** on GitHub: [`DeepCritical/GradioDemo`](https://github.com/DeepCritical/GradioDemo)
57
  2. **Clone your fork**:
58
 
59
  ```bash
@@ -64,8 +40,7 @@ git remote add huggingface-upstream https://huggingface.co/spaces/DataQuests/Dee
64
  3. **Install dependencies**:
65
 
66
  ```bash
67
- uv sync --all-extras
68
- uv run pre-commit install
69
  ```
70
 
71
  4. **Create a feature branch**:
@@ -78,9 +53,7 @@ git remote add huggingface-upstream https://huggingface.co/spaces/DataQuests/Dee
78
  6. **Run checks**:
79
 
80
  ```bash
81
- uv run ruff check src tests
82
- uv run mypy src
83
- uv run pytest --cov=src --cov-report=term-missing tests/unit/ -v -m "not openai" -p no:logfire
84
  ```
85
 
86
  7. **Commit and push**:
@@ -89,72 +62,22 @@ git remote add huggingface-upstream https://huggingface.co/spaces/DataQuests/Dee
89
  git commit -m "Description of changes"
90
  git push origin yourname-feature-name
91
  ```
92
-
93
  8. **Create a pull request** on GitHub
94
 
95
- ## Package Manager
96
-
97
- This project uses [`uv`](https://github.com/astral-sh/uv) as the package manager. All commands should be prefixed with `uv run` to ensure they run in the correct environment.
98
-
99
- ### Installation
100
-
101
- ```bash
102
- # Install uv if you haven't already (recommended: standalone installer)
103
- # Unix/macOS/Linux:
104
- curl -LsSf https://astral.sh/uv/install.sh | sh
105
-
106
- # Windows (PowerShell):
107
- powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
108
-
109
- # Alternative: pipx install uv
110
- # Or: pip install uv
111
-
112
- # Sync all dependencies including dev extras
113
- uv sync --all-extras
114
-
115
- # Install pre-commit hooks
116
- uv run pre-commit install
117
- ```
118
-
119
  ## Development Commands
120
 
121
  ```bash
122
- # Installation
123
- uv sync --all-extras # Install all dependencies including dev
124
- uv run pre-commit install # Install pre-commit hooks
125
-
126
- # Code Quality Checks (run all before committing)
127
- uv run ruff check src tests # Lint with ruff
128
- uv run ruff format src tests # Format with ruff
129
- uv run mypy src # Type checking
130
- uv run pytest --cov=src --cov-report=term-missing tests/unit/ -v -m "not openai" -p no:logfire # Tests with coverage
131
-
132
- # Testing Commands
133
- uv run pytest tests/unit/ -v -m "not openai" -p no:logfire # Run unit tests (excludes OpenAI tests)
134
- uv run pytest tests/ -v -m "huggingface" -p no:logfire # Run HuggingFace tests
135
- uv run pytest tests/ -v -p no:logfire # Run all tests
136
- uv run pytest --cov=src --cov-report=term-missing tests/unit/ -v -m "not openai" -p no:logfire # Tests with terminal coverage
137
- uv run pytest --cov=src --cov-report=html -p no:logfire # Generate HTML coverage report (opens htmlcov/index.html)
138
-
139
- # Documentation Commands
140
- uv run mkdocs build # Build documentation
141
- uv run mkdocs serve # Serve documentation locally (http://127.0.0.1:8000)
142
  ```
143
 
144
- ### Test Markers
145
-
146
- The project uses pytest markers to categorize tests. See [Testing Guidelines](docs/contributing/testing.md) for details:
147
-
148
- - `unit`: Unit tests (mocked, fast)
149
- - `integration`: Integration tests (real APIs)
150
- - `slow`: Slow tests
151
- - `openai`: Tests requiring OpenAI API key
152
- - `huggingface`: Tests requiring HuggingFace API key
153
- - `embedding_provider`: Tests requiring API-based embedding providers
154
- - `local_embeddings`: Tests using local embeddings
155
-
156
- **Note**: The `-p no:logfire` flag disables the logfire plugin to avoid conflicts during testing.
157
-
158
  ## Code Style & Conventions
159
 
160
  ### Type Safety
@@ -163,9 +86,11 @@ The project uses pytest markers to categorize tests. See [Testing Guidelines](do
163
  - Use `mypy --strict` compliance (no `Any` unless absolutely necessary)
164
  - Use `TYPE_CHECKING` imports for circular dependencies:
165
 
166
- <!--codeinclude-->
167
- [TYPE_CHECKING Import Pattern](../src/utils/citation_validator.py) start_line:8 end_line:11
168
- <!--/codeinclude-->
 
 
169
 
170
  ### Pydantic Models
171
 
@@ -200,10 +125,10 @@ result = await loop.run_in_executor(None, cpu_bound_function, args)
200
 
201
  ### Pre-commit
202
 
203
- - Pre-commit hooks run automatically on commit
204
  - Must pass: lint + typecheck + test-cov
205
- - Install hooks with: `uv run pre-commit install`
206
- - Note: `uv sync --all-extras` installs the pre-commit package, but you must run `uv run pre-commit install` separately to set up the git hooks
207
 
208
  ## Error Handling & Logging
209
 
@@ -211,9 +136,10 @@ result = await loop.run_in_executor(None, cpu_bound_function, args)
211
 
212
  Use custom exception hierarchy (`src/utils/exceptions.py`):
213
 
214
- <!--codeinclude-->
215
- [Exception Hierarchy](../src/utils/exceptions.py) start_line:4 end_line:31
216
- <!--/codeinclude-->
 
217
 
218
  ### Error Handling Rules
219
 
@@ -273,7 +199,7 @@ except httpx.HTTPError as e:
273
  1. Write failing test in `tests/unit/`
274
  2. Implement in `src/`
275
  3. Ensure test passes
276
- 4. Run checks: `uv run ruff check src tests && uv run mypy src && uv run pytest --cov=src --cov-report=term-missing tests/unit/ -v -m "not openai" -p no:logfire`
277
 
278
  ### Test Examples
279
 
@@ -294,8 +220,7 @@ async def test_real_pubmed_search():
294
 
295
  ### Test Coverage
296
 
297
- - Run `uv run pytest --cov=src --cov-report=term-missing tests/unit/ -v -m "not openai" -p no:logfire` for coverage report
298
- - Run `uv run pytest --cov=src --cov-report=html -p no:logfire` for HTML coverage report (opens `htmlcov/index.html`)
299
  - Aim for >80% coverage on critical paths
300
  - Exclude: `__init__.py`, `TYPE_CHECKING` blocks
301
 
@@ -339,9 +264,11 @@ class MySearchTool:
339
  - Lazy initialization for optional dependencies (e.g., embeddings, Modal)
340
  - Check requirements before initialization:
341
 
342
- <!--codeinclude-->
343
- [Check Magentic Requirements](../src/utils/llm_factory.py) start_line:152 end_line:170
344
- <!--/codeinclude-->
 
 
345
 
346
  ### State Management
347
 
@@ -353,9 +280,11 @@ class MySearchTool:
353
 
354
  Use `@lru_cache(maxsize=1)` for singletons:
355
 
356
- <!--codeinclude-->
357
- [Singleton Pattern Example](../src/services/statistical_analyzer.py) start_line:252 end_line:255
358
- <!--/codeinclude-->
 
 
359
 
360
  - Lazy initialization to avoid requiring dependencies at import time
361
 
@@ -369,9 +298,22 @@ Use `@lru_cache(maxsize=1)` for singletons:
369
 
370
  Example:
371
 
372
- <!--codeinclude-->
373
- [Search Method Docstring Example](../src/tools/pubmed.py) start_line:51 end_line:58
374
- <!--/codeinclude-->
 
 
 
 
 
 
 
 
 
 
 
 
 
375
 
376
  ### Code Comments
377
 
@@ -468,7 +410,7 @@ Example:
468
 
469
  ## Pull Request Process
470
 
471
- 1. Ensure all checks pass: `uv run ruff check src tests && uv run mypy src && uv run pytest --cov=src --cov-report=term-missing tests/unit/ -v -m "not openai" -p no:logfire`
472
  2. Update documentation if needed
473
  3. Add tests for new features
474
  4. Update CHANGELOG if applicable
@@ -476,19 +418,11 @@ Example:
476
  6. Address review feedback
477
  7. Wait for approval before merging
478
 
479
- ## Project Structure
480
-
481
- - `src/`: Main source code
482
- - `tests/`: Test files (`unit/` and `integration/`)
483
- - `docs/`: Documentation source files (MkDocs)
484
- - `examples/`: Example usage scripts
485
- - `pyproject.toml`: Project configuration and dependencies
486
- - `.pre-commit-config.yaml`: Pre-commit hook configuration
487
-
488
  ## Questions?
489
 
490
- - Open an issue on [GitHub](https://github.com/DeepCritical/GradioDemo)
491
- - Check existing [documentation](https://deepcritical.github.io/GradioDemo/)
492
  - Review code examples in the codebase
493
 
494
- Thank you for contributing to The DETERMINATOR!
 
 
1
+ # Contributing to DeepCritical
2
 
3
+ Thank you for your interest in contributing to DeepCritical! This guide will help you get started.
4
 
5
  ## Table of Contents
6
 
7
  - [Git Workflow](#git-workflow)
8
  - [Getting Started](#getting-started)
9
  - [Development Commands](#development-commands)
10
+ - [Code Style & Conventions](#code-style--conventions)
11
+ - [Type Safety](#type-safety)
12
+ - [Error Handling & Logging](#error-handling--logging)
13
+ - [Testing Requirements](#testing-requirements)
14
+ - [Implementation Patterns](#implementation-patterns)
15
+ - [Code Quality & Documentation](#code-quality--documentation)
16
+ - [Prompt Engineering & Citation Validation](#prompt-engineering--citation-validation)
17
  - [MCP Integration](#mcp-integration)
18
  - [Common Pitfalls](#common-pitfalls)
19
  - [Key Principles](#key-principles)
20
  - [Pull Request Process](#pull-request-process)
21
 
 
 
 
 
 
 
 
 
 
22
  ## Git Workflow
23
 
24
  - `main`: Production-ready (GitHub)
 
27
  - **NEVER** push directly to `main` or `dev` on HuggingFace
28
  - GitHub is source of truth; HuggingFace is for deployment
29
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
30
  ## Getting Started
31
 
32
+ 1. **Fork the repository** on GitHub
33
  2. **Clone your fork**:
34
 
35
  ```bash
 
40
  3. **Install dependencies**:
41
 
42
  ```bash
43
+ make install
 
44
  ```
45
 
46
  4. **Create a feature branch**:
 
53
  6. **Run checks**:
54
 
55
  ```bash
56
+ make check
 
 
57
  ```
58
 
59
  7. **Commit and push**:
 
62
  git commit -m "Description of changes"
63
  git push origin yourname-feature-name
64
  ```
 
65
  8. **Create a pull request** on GitHub
66
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
67
  ## Development Commands
68
 
69
  ```bash
70
+ make install # Install dependencies + pre-commit
71
+ make check # Lint + typecheck + test (MUST PASS)
72
+ make test # Run unit tests
73
+ make lint # Run ruff
74
+ make format # Format with ruff
75
+ make typecheck # Run mypy
76
+ make test-cov # Test with coverage
77
+ make docs-build # Build documentation
78
+ make docs-serve # Serve documentation locally
 
 
 
 
 
 
 
 
 
 
 
79
  ```
80
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
81
  ## Code Style & Conventions
82
 
83
  ### Type Safety
 
86
  - Use `mypy --strict` compliance (no `Any` unless absolutely necessary)
87
  - Use `TYPE_CHECKING` imports for circular dependencies:
88
 
89
+ ```python
90
+ from typing import TYPE_CHECKING
91
+ if TYPE_CHECKING:
92
+ from src.services.embeddings import EmbeddingService
93
+ ```
94
 
95
  ### Pydantic Models
96
 
 
125
 
126
  ### Pre-commit
127
 
128
+ - Run `make check` before committing
129
  - Must pass: lint + typecheck + test-cov
130
+ - Pre-commit hooks installed via `make install`
131
+ - **CRITICAL**: Run the full pre-commit checks before opening a PR for review (draft PRs are exempt); otherwise Obstacle is the Way will lose his mind
132
 
133
  ## Error Handling & Logging
134
 
 
136
 
137
  Use custom exception hierarchy (`src/utils/exceptions.py`):
138
 
139
+ - `DeepCriticalError` (base)
140
+ - `SearchError`
+ - `RateLimitError`
141
+ - `JudgeError`
142
+ - `ConfigurationError`
143
 
144
  ### Error Handling Rules
145
 
 
199
  1. Write failing test in `tests/unit/`
200
  2. Implement in `src/`
201
  3. Ensure test passes
202
+ 4. Run `make check` (lint + typecheck + test)
203
 
204
  ### Test Examples
205
 
 
220
 
221
  ### Test Coverage
222
 
223
+ - Run `make test-cov` for coverage report
 
224
  - Aim for >80% coverage on critical paths
225
  - Exclude: `__init__.py`, `TYPE_CHECKING` blocks
226
 
 
264
  - Lazy initialization for optional dependencies (e.g., embeddings, Modal)
265
  - Check requirements before initialization:
266
 
267
+ ```python
268
+ def check_magentic_requirements() -> None:
269
+ if not settings.has_openai_key:
270
+ raise ConfigurationError("Magentic requires OpenAI")
271
+ ```
272
 
273
  ### State Management
274
 
 
280
 
281
  Use `@lru_cache(maxsize=1)` for singletons:
282
 
283
+ ```python
284
+ @lru_cache(maxsize=1)
285
+ def get_embedding_service() -> EmbeddingService:
286
+ return EmbeddingService()
287
+ ```
288
 
289
  - Lazy initialization to avoid requiring dependencies at import time
290
 
 
298
 
299
  Example:
300
 
301
+ ```python
302
+ async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
303
+ """Search PubMed and return evidence.
304
+
305
+ Args:
306
+ query: The search query string
307
+ max_results: Maximum number of results to return
308
+
309
+ Returns:
310
+ List of Evidence objects
311
+
312
+ Raises:
313
+ SearchError: If the search fails
314
+ RateLimitError: If we hit rate limits
315
+ """
316
+ ```
317
 
318
  ### Code Comments
319
 
 
410
 
411
  ## Pull Request Process
412
 
413
+ 1. Ensure all checks pass: `make check`
414
  2. Update documentation if needed
415
  3. Add tests for new features
416
  4. Update CHANGELOG if applicable
 
418
  6. Address review feedback
419
  7. Wait for approval before merging
420
 
 
 
 
 
 
 
 
 
 
421
  ## Questions?
422
 
423
+ - Open an issue on GitHub
424
+ - Check existing documentation
425
  - Review code examples in the codebase
426
 
427
+ Thank you for contributing to DeepCritical!
428
+
docs/contributing/code-quality.md CHANGED
@@ -1,6 +1,6 @@
1
  # Code Quality & Documentation
2
 
3
- This document outlines code quality standards and documentation requirements for The DETERMINATOR.
4
 
5
  ## Linting
6
 
@@ -12,9 +12,6 @@ This document outlines code quality standards and documentation requirements for
12
  - `PLR2004`: Magic values (statistical constants)
13
  - `PLW0603`: Global statement (singleton pattern)
14
  - `PLC0415`: Lazy imports for optional dependencies
15
- - `E402`: Module level import not at top (needed for pytest.importorskip)
16
- - `E501`: Line too long (ignore line length violations)
17
- - `RUF100`: Unused noqa (version differences between local/CI)
18
 
19
  ## Type Checking
20
 
@@ -25,75 +22,12 @@ This document outlines code quality standards and documentation requirements for
25
 
26
  ## Pre-commit
27
 
28
- Pre-commit hooks run automatically on commit to ensure code quality. Configuration is in `.pre-commit-config.yaml`.
29
-
30
- ### Installation
31
-
32
- ```bash
33
- # Install dependencies (includes pre-commit package)
34
- uv sync --all-extras
35
-
36
- # Set up git hooks (must be run separately)
37
- uv run pre-commit install
38
- ```
39
-
40
- **Note**: `uv sync --all-extras` installs the pre-commit package, but you must run `uv run pre-commit install` separately to set up the git hooks.
41
-
42
- ### Pre-commit Hooks
43
-
44
- The following hooks run automatically on commit:
45
-
46
- 1. **ruff**: Lints code and fixes issues automatically
47
- - Runs on: `src/` (excludes `tests/`, `reference_repos/`)
48
- - Auto-fixes: Yes
49
-
50
- 2. **ruff-format**: Formats code with ruff
51
- - Runs on: `src/` (excludes `tests/`, `reference_repos/`)
52
- - Auto-fixes: Yes
53
-
54
- 3. **mypy**: Type checking
55
- - Runs on: `src/` (excludes `folder/`)
56
- - Additional dependencies: pydantic, pydantic-settings, tenacity, pydantic-ai
57
-
58
- 4. **pytest-unit**: Runs unit tests (excludes OpenAI and embedding_provider tests)
59
- - Runs: `tests/unit/` with `-m "not openai and not embedding_provider"`
60
- - Always runs: Yes (not just on changed files)
61
-
62
- 5. **pytest-local-embeddings**: Runs local embedding tests
63
- - Runs: `tests/` with `-m "local_embeddings"`
64
- - Always runs: Yes
65
-
66
- ### Manual Pre-commit Run
67
-
68
- To run pre-commit hooks manually (without committing):
69
-
70
- ```bash
71
- uv run pre-commit run --all-files
72
- ```
73
-
74
- ### Troubleshooting
75
-
76
- - **Hooks failing**: Fix the issues shown in the output, then commit again
77
- - **Skipping hooks**: Use `git commit --no-verify` (not recommended)
78
- - **Hook not running**: Ensure hooks are installed with `uv run pre-commit install`
79
- - **Type errors**: Check that all dependencies are installed with `uv sync --all-extras`
80
 
81
  ## Documentation
82
 
83
- ### Building Documentation
84
-
85
- Documentation is built using MkDocs. Source files are in `docs/`, and the configuration is in `mkdocs.yml`.
86
-
87
- ```bash
88
- # Build documentation
89
- uv run mkdocs build
90
-
91
- # Serve documentation locally (http://127.0.0.1:8000)
92
- uv run mkdocs serve
93
- ```
94
-
95
- The documentation site is published at: <https://deepcritical.github.io/GradioDemo/>
96
-
97
  ### Docstrings
98
 
99
  - Google-style docstrings for all public functions
@@ -102,9 +36,22 @@ The documentation site is published at: <https://deepcritical.github.io/GradioDe
102
 
103
  Example:
104
 
105
- <!--codeinclude-->
106
- [Search Method Docstring Example](../src/tools/pubmed.py) start_line:51 end_line:70
107
- <!--/codeinclude-->
 
 
 
 
 
 
 
 
 
 
 
 
 
108
 
109
  ### Code Comments
110
 
@@ -118,3 +65,13 @@ Example:
118
 
119
  - [Code Style](code-style.md) - Code style guidelines
120
  - [Testing](testing.md) - Testing guidelines
 
 
 
 
 
 
 
 
 
 
 
1
  # Code Quality & Documentation
2
 
3
+ This document outlines code quality standards and documentation requirements.
4
 
5
  ## Linting
6
 
 
12
  - `PLR2004`: Magic values (statistical constants)
13
  - `PLW0603`: Global statement (singleton pattern)
14
  - `PLC0415`: Lazy imports for optional dependencies
 
 
 
15
 
16
  ## Type Checking
17
 
 
22
 
23
  ## Pre-commit
24
 
25
+ - Run `make check` before committing
26
+ - Must pass: lint + typecheck + test-cov
27
+ - Pre-commit hooks installed via `make install`
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
28
 
29
  ## Documentation

  ### Docstrings

  - Google-style docstrings for all public functions

  Example:

+ ```python
+ async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
+     """Search PubMed and return evidence.
+
+     Args:
+         query: The search query string
+         max_results: Maximum number of results to return
+
+     Returns:
+         List of Evidence objects
+
+     Raises:
+         SearchError: If the search fails
+         RateLimitError: If we hit rate limits
+     """
+ ```

  ### Code Comments

  - [Code Style](code-style.md) - Code style guidelines
  - [Testing](testing.md) - Testing guidelines
docs/contributing/code-style.md CHANGED

@@ -1,44 +1,6 @@

  # Code Style & Conventions

- This document outlines the code style and conventions for The DETERMINATOR.
-
- ## Package Manager
-
- This project uses [`uv`](https://github.com/astral-sh/uv) as the package manager. All commands should be prefixed with `uv run` to ensure they run in the correct environment.
-
- ### Installation
-
- ```bash
- # Install uv if you haven't already (recommended: standalone installer)
- # Unix/macOS/Linux:
- curl -LsSf https://astral.sh/uv/install.sh | sh
-
- # Windows (PowerShell):
- powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
-
- # Alternative: pipx install uv
- # Or: pip install uv
-
- # Sync all dependencies including dev extras
- uv sync --all-extras
- ```
-
- ### Running Commands
-
- All development commands should use the `uv run` prefix:
-
- ```bash
- # Instead of: pytest tests/
- uv run pytest tests/
-
- # Instead of: ruff check src
- uv run ruff check src
-
- # Instead of: mypy src
- uv run mypy src
- ```
-
- This ensures commands run in the correct virtual environment managed by `uv`.

  ## Type Safety

@@ -46,9 +8,11 @@

  - Use `mypy --strict` compliance (no `Any` unless absolutely necessary)
  - Use `TYPE_CHECKING` imports for circular dependencies:

- <!--codeinclude-->
- [TYPE_CHECKING Import Pattern](../src/utils/citation_validator.py) start_line:8 end_line:11
- <!--/codeinclude-->

  ## Pydantic Models

@@ -81,3 +45,13 @@ result = await loop.run_in_executor(None, cpu_bound_function, args)

  - [Error Handling](error-handling.md) - Error handling guidelines
  - [Implementation Patterns](implementation-patterns.md) - Common patterns
  # Code Style & Conventions

+ This document outlines the code style and conventions for DeepCritical.

  ## Type Safety

  - Use `mypy --strict` compliance (no `Any` unless absolutely necessary)
  - Use `TYPE_CHECKING` imports for circular dependencies:

+ ```python
+ from typing import TYPE_CHECKING
+ if TYPE_CHECKING:
+     from src.services.embeddings import EmbeddingService
+ ```
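
Because the import above is skipped at runtime, the name only exists for the type checker, so annotations that reference it must be strings (or `from __future__ import annotations` must be in effect). A minimal illustration, with a hypothetical function name:

```python
def get_service() -> "EmbeddingService":  # string annotation; resolved only by the type checker
    ...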
```
  ## Pydantic Models

  - [Error Handling](error-handling.md) - Error handling guidelines
  - [Implementation Patterns](implementation-patterns.md) - Common patterns
docs/contributing/error-handling.md CHANGED

@@ -1,14 +1,15 @@

  # Error Handling & Logging

- This document outlines error handling and logging conventions for The DETERMINATOR.

  ## Exception Hierarchy

  Use custom exception hierarchy (`src/utils/exceptions.py`):

- <!--codeinclude-->
- [Exception Hierarchy](../src/utils/exceptions.py) start_line:4 end_line:31
- <!--/codeinclude-->

  ## Error Handling Rules

@@ -52,3 +53,13 @@ except httpx.HTTPError as e:

  - [Code Style](code-style.md) - Code style guidelines
  - [Testing](testing.md) - Testing guidelines
  # Error Handling & Logging

+ This document outlines error handling and logging conventions for DeepCritical.

  ## Exception Hierarchy

  Use custom exception hierarchy (`src/utils/exceptions.py`):

+ - `DeepCriticalError` (base)
+ - `SearchError`
+ - `RateLimitError`
+ - `JudgeError`
+ - `ConfigurationError`
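
A minimal sketch of how such a hierarchy might be defined (the real definitions live in `src/utils/exceptions.py`; treating `RateLimitError` as a subclass of `SearchError` is an assumption based on the grouping above):

```python
class DeepCriticalError(Exception):
    """Base class for all DeepCritical errors."""

class SearchError(DeepCriticalError):
    """A search operation failed."""

class RateLimitError(SearchError):
    """An upstream API rate limit was hit (assumed subclass of SearchError)."""

class JudgeError(DeepCriticalError):
    """Evidence judging failed."""

class ConfigurationError(DeepCriticalError):
    """Configuration is missing or invalid."""
```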
  ## Error Handling Rules

  - [Code Style](code-style.md) - Code style guidelines
  - [Testing](testing.md) - Testing guidelines
docs/contributing/implementation-patterns.md CHANGED

@@ -1,6 +1,6 @@

  # Implementation Patterns

- This document outlines common implementation patterns used in The DETERMINATOR.

  ## Search Tools

@@ -40,9 +40,11 @@ class MySearchTool:

  - Lazy initialization for optional dependencies (e.g., embeddings, Modal)
  - Check requirements before initialization:

- <!--codeinclude-->
- [Check Magentic Requirements](../src/utils/llm_factory.py) start_line:152 end_line:170
- <!--/codeinclude-->

  ## State Management

@@ -54,9 +56,11 @@

  Use `@lru_cache(maxsize=1)` for singletons:

- <!--codeinclude-->
- [Singleton Pattern Example](../src/services/statistical_analyzer.py) start_line:252 end_line:255
- <!--/codeinclude-->

  - Lazy initialization to avoid requiring dependencies at import time

@@ -65,3 +69,12 @@

  - [Code Style](code-style.md) - Code style guidelines
  - [Error Handling](error-handling.md) - Error handling guidelines
  # Implementation Patterns

+ This document outlines common implementation patterns used in DeepCritical.

  ## Search Tools

  - Lazy initialization for optional dependencies (e.g., embeddings, Modal)
  - Check requirements before initialization:

+ ```python
+ def check_magentic_requirements() -> None:
+     if not settings.has_openai_key:
+         raise ConfigurationError("Magentic requires OpenAI")
+ ```

  ## State Management

  Use `@lru_cache(maxsize=1)` for singletons:

+ ```python
+ @lru_cache(maxsize=1)
+ def get_embedding_service() -> EmbeddingService:
+     return EmbeddingService()
+ ```
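
Because `maxsize=1` caches the single zero-argument call, repeated lookups return the same instance:

```python
assert get_embedding_service() is get_embedding_service()  # same cached object
```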
  - Lazy initialization to avoid requiring dependencies at import time

  - [Code Style](code-style.md) - Code style guidelines
  - [Error Handling](error-handling.md) - Error handling guidelines
docs/contributing/index.md CHANGED

@@ -1,8 +1,6 @@

- # Contributing to The DETERMINATOR

- Thank you for your interest in contributing to The DETERMINATOR! This guide will help you get started.
-
- > **Note on Project Names**: "The DETERMINATOR" is the product name, "DeepCritical" is the organization/project name, and "determinator" is the Python package name.

  ## Git Workflow

@@ -12,138 +10,44 @@

  - **NEVER** push directly to `main` or `dev` on HuggingFace
  - GitHub is source of truth; HuggingFace is for deployment

- ## Repository Information
-
- - **GitHub Repository**: [`DeepCritical/GradioDemo`](https://github.com/DeepCritical/GradioDemo) (source of truth, PRs, code review)
- - **HuggingFace Space**: [`DataQuests/DeepCritical`](https://huggingface.co/spaces/DataQuests/DeepCritical) (deployment/demo)
- - **Package Name**: `determinator` (Python package name in `pyproject.toml`)
-
- ### Dual Repository Setup
-
- This project uses a dual repository setup:
-
- - **GitHub (`DeepCritical/GradioDemo`)**: Source of truth for code, PRs, and code review
- - **HuggingFace (`DataQuests/DeepCritical`)**: Deployment target for the Gradio demo
-
- #### Remote Configuration
-
- When cloning, set up remotes as follows:
-
- ```bash
- # Clone from GitHub
- git clone https://github.com/DeepCritical/GradioDemo.git
- cd GradioDemo
-
- # Add HuggingFace remote (optional, for deployment)
- git remote add huggingface-upstream https://huggingface.co/spaces/DataQuests/DeepCritical
- ```
-
- **Important**: Never push directly to `main` or `dev` on HuggingFace. Always work through GitHub PRs. GitHub is the source of truth; HuggingFace is for deployment/demo only.
-
- ## Package Manager
-
- This project uses [`uv`](https://github.com/astral-sh/uv) as the package manager. All commands should be prefixed with `uv run` to ensure they run in the correct environment.
-
- ### Installation
-
- ```bash
- # Install uv if you haven't already (recommended: standalone installer)
- # Unix/macOS/Linux:
- curl -LsSf https://astral.sh/uv/install.sh | sh
-
- # Windows (PowerShell):
- powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
-
- # Alternative: pipx install uv
- # Or: pip install uv
-
- # Sync all dependencies including dev extras
- uv sync --all-extras
-
- # Install pre-commit hooks
- uv run pre-commit install
- ```
-
  ## Development Commands

  ```bash
- # Installation
- uv sync --all-extras          # Install all dependencies including dev
- uv run pre-commit install     # Install pre-commit hooks
-
- # Code Quality Checks (run all before committing)
- uv run ruff check src tests   # Lint with ruff
- uv run ruff format src tests  # Format with ruff
- uv run mypy src               # Type checking
- uv run pytest --cov=src --cov-report=term-missing tests/unit/ -v -m "not openai" -p no:logfire  # Tests with coverage
-
- # Testing Commands
- uv run pytest tests/unit/ -v -m "not openai" -p no:logfire  # Run unit tests (excludes OpenAI tests)
- uv run pytest tests/ -v -m "huggingface" -p no:logfire      # Run HuggingFace tests
- uv run pytest tests/ -v -p no:logfire                       # Run all tests
- uv run pytest --cov=src --cov-report=term-missing tests/unit/ -v -m "not openai" -p no:logfire  # Tests with terminal coverage
- uv run pytest --cov=src --cov-report=html -p no:logfire     # Generate HTML coverage report (opens htmlcov/index.html)
-
- # Documentation Commands
- uv run mkdocs build           # Build documentation
- uv run mkdocs serve           # Serve documentation locally (http://127.0.0.1:8000)
  ```

- ### Test Markers
-
- The project uses pytest markers to categorize tests. See [Testing Guidelines](testing.md) for details:
-
- - `unit`: Unit tests (mocked, fast)
- - `integration`: Integration tests (real APIs)
- - `slow`: Slow tests
- - `openai`: Tests requiring OpenAI API key
- - `huggingface`: Tests requiring HuggingFace API key
- - `embedding_provider`: Tests requiring API-based embedding providers
- - `local_embeddings`: Tests using local embeddings
-
- **Note**: The `-p no:logfire` flag disables the logfire plugin to avoid conflicts during testing.
-
  ## Getting Started

- 1. **Fork the repository** on GitHub: [`DeepCritical/GradioDemo`](https://github.com/DeepCritical/GradioDemo)
-
  2. **Clone your fork**:
-
  ```bash
  git clone https://github.com/yourusername/GradioDemo.git
  cd GradioDemo
  ```
-
  3. **Install dependencies**:
-
  ```bash
- uv sync --all-extras
- uv run pre-commit install
  ```
-
  4. **Create a feature branch**:
-
  ```bash
  git checkout -b yourname-feature-name
  ```
-
  5. **Make your changes** following the guidelines below
-
  6. **Run checks**:
-
  ```bash
- uv run ruff check src tests
- uv run mypy src
- uv run pytest --cov=src --cov-report=term-missing tests/unit/ -v -m "not openai" -p no:logfire
  ```
-
  7. **Commit and push**:
-
  ```bash
  git commit -m "Description of changes"
  git push origin yourname-feature-name
  ```
-
  8. **Create a pull request** on GitHub

  ## Development Guidelines

@@ -228,7 +132,7 @@

  ## Pull Request Process

- 1. Ensure all checks pass: `uv run ruff check src tests && uv run mypy src && uv run pytest --cov=src --cov-report=term-missing tests/unit/ -v -m "not openai" -p no:logfire`
  2. Update documentation if needed
  3. Add tests for new features
  4. Update CHANGELOG if applicable

@@ -236,19 +140,20 @@

  6. Address review feedback
  7. Wait for approval before merging

- ## Project Structure
-
- - `src/`: Main source code
- - `tests/`: Test files (`unit/` and `integration/`)
- - `docs/`: Documentation source files (MkDocs)
- - `examples/`: Example usage scripts
- - `pyproject.toml`: Project configuration and dependencies
- - `.pre-commit-config.yaml`: Pre-commit hook configuration
-
  ## Questions?

- - Open an issue on [GitHub](https://github.com/DeepCritical/GradioDemo)
- - Check existing [documentation](https://deepcritical.github.io/GradioDemo/)
  - Review code examples in the codebase

- Thank you for contributing to The DETERMINATOR!
  # Contributing to DeepCritical

+ Thank you for your interest in contributing to DeepCritical! This guide will help you get started.

  ## Git Workflow

  - **NEVER** push directly to `main` or `dev` on HuggingFace
  - GitHub is source of truth; HuggingFace is for deployment

  ## Development Commands

  ```bash
+ make install    # Install dependencies + pre-commit
+ make check      # Lint + typecheck + test (MUST PASS)
+ make test       # Run unit tests
+ make lint       # Run ruff
+ make format     # Format with ruff
+ make typecheck  # Run mypy
+ make test-cov   # Test with coverage
  ```

  ## Getting Started

+ 1. **Fork the repository** on GitHub

  2. **Clone your fork**:

  ```bash
  git clone https://github.com/yourusername/GradioDemo.git
  cd GradioDemo
  ```

  3. **Install dependencies**:

  ```bash
+ make install
  ```

  4. **Create a feature branch**:

  ```bash
  git checkout -b yourname-feature-name
  ```

  5. **Make your changes** following the guidelines below

  6. **Run checks**:

  ```bash
+ make check
  ```

  7. **Commit and push**:

  ```bash
  git commit -m "Description of changes"
  git push origin yourname-feature-name
  ```

  8. **Create a pull request** on GitHub

  ## Development Guidelines

  ## Pull Request Process

+ 1. Ensure all checks pass: `make check`
  2. Update documentation if needed
  3. Add tests for new features
  4. Update CHANGELOG if applicable

  6. Address review feedback
  7. Wait for approval before merging

  ## Questions?

+ - Open an issue on GitHub
+ - Check existing documentation
  - Review code examples in the codebase

+ Thank you for contributing to DeepCritical!
docs/contributing/prompt-engineering.md CHANGED

@@ -53,3 +53,13 @@ This document outlines prompt engineering guidelines and citation validation rules.

  - [Code Quality](code-quality.md) - Code quality guidelines
  - [Error Handling](error-handling.md) - Error handling guidelines
docs/contributing/testing.md CHANGED

@@ -1,45 +1,12 @@

  # Testing Requirements

- This document outlines testing requirements and guidelines for The DETERMINATOR.

  ## Test Structure

  - Unit tests in `tests/unit/` (mocked, fast)
  - Integration tests in `tests/integration/` (real APIs, marked `@pytest.mark.integration`)
- - Use markers: `unit`, `integration`, `slow`, `openai`, `huggingface`, `embedding_provider`, `local_embeddings`
-
- ## Test Markers
-
- The project uses pytest markers to categorize tests. These markers are defined in `pyproject.toml`:
-
- - `@pytest.mark.unit`: Unit tests (mocked, fast) - Run with `-m "unit"`
- - `@pytest.mark.integration`: Integration tests (real APIs) - Run with `-m "integration"`
- - `@pytest.mark.slow`: Slow tests - Run with `-m "slow"`
- - `@pytest.mark.openai`: Tests requiring OpenAI API key - Run with `-m "openai"` or exclude with `-m "not openai"`
- - `@pytest.mark.huggingface`: Tests requiring HuggingFace API key or using HuggingFace models - Run with `-m "huggingface"`
- - `@pytest.mark.embedding_provider`: Tests requiring API-based embedding providers (OpenAI, etc.) - Run with `-m "embedding_provider"`
- - `@pytest.mark.local_embeddings`: Tests using local embeddings (sentence-transformers, ChromaDB) - Run with `-m "local_embeddings"`
-
- ### Running Tests by Marker
-
- ```bash
- # Run only unit tests (excludes OpenAI tests by default)
- uv run pytest tests/unit/ -v -m "not openai" -p no:logfire
-
- # Run HuggingFace tests
- uv run pytest tests/ -v -m "huggingface" -p no:logfire
-
- # Run all tests
- uv run pytest tests/ -v -p no:logfire
-
- # Run only local embedding tests
- uv run pytest tests/ -v -m "local_embeddings" -p no:logfire
-
- # Exclude slow tests
- uv run pytest tests/ -v -m "not slow" -p no:logfire
- ```
-
- **Note**: The `-p no:logfire` flag disables the logfire plugin to avoid conflicts during testing.

  ## Mocking

@@ -53,20 +20,7 @@

  1. Write failing test in `tests/unit/`
  2. Implement in `src/`
  3. Ensure test passes
- 4. Run checks: `uv run ruff check src tests && uv run mypy src && uv run pytest --cov=src --cov-report=term-missing tests/unit/ -v -m "not openai" -p no:logfire`
-
- ### Test Command Examples
-
- ```bash
- # Run unit tests (default, excludes OpenAI tests)
- uv run pytest tests/unit/ -v -m "not openai" -p no:logfire
-
- # Run HuggingFace tests
- uv run pytest tests/ -v -m "huggingface" -p no:logfire
-
- # Run all tests
- uv run pytest tests/ -v -p no:logfire
- ```

  ## Test Examples

@@ -87,29 +41,21 @@ async def test_real_pubmed_search():

  ## Test Coverage

- ### Terminal Coverage Report
-
- ```bash
- uv run pytest --cov=src --cov-report=term-missing tests/unit/ -v -m "not openai" -p no:logfire
- ```
-
- This shows coverage with missing lines highlighted in the terminal output.
-
- ### HTML Coverage Report
-
- ```bash
- uv run pytest --cov=src --cov-report=html -p no:logfire
- ```
-
- This generates an HTML coverage report in `htmlcov/index.html`. Open this file in your browser to see detailed coverage information.
-
- ### Coverage Goals
-
- - Aim for >80% coverage on critical paths
- - Exclude: `__init__.py`, `TYPE_CHECKING` blocks
- - Coverage configuration is in `pyproject.toml` under `[tool.coverage.*]`
-
- ## See Also
-
- - [Code Style](code-style.md) - Code style guidelines
- - [Implementation Patterns](implementation-patterns.md) - Common patterns
  # Testing Requirements

+ This document outlines testing requirements and guidelines for DeepCritical.

  ## Test Structure

  - Unit tests in `tests/unit/` (mocked, fast)
  - Integration tests in `tests/integration/` (real APIs, marked `@pytest.mark.integration`)
+ - Use markers: `unit`, `integration`, `slow`
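
Markers are selected with pytest's `-m` flag, for example (commands as documented for this project; `-p no:logfire` disables the logfire plugin during testing):

```bash
# Fast unit tests only, excluding tests that need an OpenAI key
uv run pytest tests/unit/ -v -m "not openai" -p no:logfire

# Integration tests against real APIs
uv run pytest tests/ -v -m "integration" -p no:logfire
```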
  ## Mocking

  1. Write failing test in `tests/unit/`
  2. Implement in `src/`
  3. Ensure test passes
+ 4. Run `make check` (lint + typecheck + test)
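
As an illustration of step 1, a minimal failing unit test might look like this (a sketch: the exception import matches `src/utils/exceptions.py`, but the tool class, its module path, and the empty-query behavior are hypothetical, and `@pytest.mark.asyncio` assumes pytest-asyncio is installed):

```python
import pytest

from src.tools.my_tool import MySearchTool  # hypothetical module path
from src.utils.exceptions import SearchError

@pytest.mark.unit
@pytest.mark.asyncio
async def test_search_rejects_empty_query() -> None:
    with pytest.raises(SearchError):
        await MySearchTool().search("")
```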
  ## Test Examples

  ## Test Coverage

+ - Run `make test-cov` for coverage report
+ - Aim for >80% coverage on critical paths
+ - Exclude: `__init__.py`, `TYPE_CHECKING` blocks
+
+ ## See Also
+
+ - [Code Style](code-style.md) - Code style guidelines
+ - [Implementation Patterns](implementation-patterns.md) - Common patterns
docs/getting-started/examples.md CHANGED

@@ -1,6 +1,6 @@

  # Examples

- This page provides examples of using The DETERMINATOR for various research tasks.

  ## Basic Research Query

@@ -11,7 +11,7 @@

  What are the latest treatments for Alzheimer's disease?
  ```

- **What The DETERMINATOR Does**:
  1. Searches PubMed for recent papers
  2. Searches ClinicalTrials.gov for active trials
  3. Evaluates evidence quality

@@ -24,8 +24,7 @@

  What clinical trials are investigating metformin for cancer prevention?
  ```

- **What The DETERMINATOR Does**:
-
  1. Searches ClinicalTrials.gov for relevant trials
  2. Searches PubMed for supporting literature
  3. Provides trial details and status

@@ -36,13 +35,12 @@

  ### Example 3: Comprehensive Review

  **Query**:
-
  ```
  Review the evidence for using metformin as an anti-aging intervention,
  including clinical trials, mechanisms of action, and safety profile.
  ```

- **What The DETERMINATOR Does**:
  1. Uses deep research mode (multi-section)
  2. Searches multiple sources in parallel
  3. Generates sections on:

@@ -58,7 +56,7 @@

  Test the hypothesis that regular exercise reduces Alzheimer's disease risk.
  ```

- **What The DETERMINATOR Does**:
  1. Generates testable hypotheses
  2. Searches for supporting/contradicting evidence
  3. Performs statistical analysis (if Modal configured)

@@ -102,13 +100,13 @@ from src.agent_factory.judges import create_judge_handler

  # Create orchestrator
  search_handler = SearchHandler()
  judge_handler = create_judge_handler()
- ```
-
- <!--codeinclude-->
- [Create Orchestrator](../src/orchestrator_factory.py) start_line:44 end_line:66
- <!--/codeinclude-->
-
- ```python
  # Run research query
  query = "What are the latest treatments for Alzheimer's disease?"
  async for event in orchestrator.run(query):

@@ -136,13 +134,13 @@ Single-loop research with search-judge-synthesize cycles:

  ```python
  from src.orchestrator.research_flow import IterativeResearchFlow
- ```
-
- <!--codeinclude-->
- [IterativeResearchFlow Initialization](../src/orchestrator/research_flow.py) start_line:56 end_line:77
- <!--/codeinclude-->
-
- ```python
  async for event in flow.run(query):
      # Handle events
      pass

@@ -154,13 +152,13 @@ Multi-section parallel research:

  ```python
  from src.orchestrator.research_flow import DeepResearchFlow
- ```
-
- <!--codeinclude-->
- [DeepResearchFlow Initialization](../src/orchestrator/research_flow.py) start_line:674 end_line:697
- <!--/codeinclude-->
-
- ```python
  async for event in flow.run(query):
      # Handle events
      pass

@@ -193,6 +191,15 @@ USE_GRAPH_EXECUTION=true

  ## Next Steps

  - Read the [Configuration Guide](../configuration/index.md) for all options
- - Explore the [Architecture Documentation](../architecture/graph_orchestration.md)
  - Check out the [API Reference](../api/agents.md) for programmatic usage
  # Examples

+ This page provides examples of using DeepCritical for various research tasks.

  ## Basic Research Query

  What are the latest treatments for Alzheimer's disease?
  ```

+ **What DeepCritical Does**:
  1. Searches PubMed for recent papers
  2. Searches ClinicalTrials.gov for active trials
  3. Evaluates evidence quality

  What clinical trials are investigating metformin for cancer prevention?
  ```

+ **What DeepCritical Does**:
  1. Searches ClinicalTrials.gov for relevant trials
  2. Searches PubMed for supporting literature
  3. Provides trial details and status

  ### Example 3: Comprehensive Review

  **Query**:
  ```
  Review the evidence for using metformin as an anti-aging intervention,
  including clinical trials, mechanisms of action, and safety profile.
  ```

+ **What DeepCritical Does**:
  1. Uses deep research mode (multi-section)
  2. Searches multiple sources in parallel
  3. Generates sections on:

  Test the hypothesis that regular exercise reduces Alzheimer's disease risk.
  ```

+ **What DeepCritical Does**:
  1. Generates testable hypotheses
  2. Searches for supporting/contradicting evidence
  3. Performs statistical analysis (if Modal configured)

  # Create orchestrator
  search_handler = SearchHandler()
  judge_handler = create_judge_handler()
+ orchestrator = create_orchestrator(
+     search_handler=search_handler,
+     judge_handler=judge_handler,
+     config={},
+     mode="advanced"
+ )

  # Run research query
  query = "What are the latest treatments for Alzheimer's disease?"
  async for event in orchestrator.run(query):

  ```python
  from src.orchestrator.research_flow import IterativeResearchFlow

+ flow = IterativeResearchFlow(
+     search_handler=search_handler,
+     judge_handler=judge_handler,
+     use_graph=False
+ )

  async for event in flow.run(query):
      # Handle events
      pass

  ```python
  from src.orchestrator.research_flow import DeepResearchFlow

+ flow = DeepResearchFlow(
+     search_handler=search_handler,
+     judge_handler=judge_handler,
+     use_graph=True
+ )

  async for event in flow.run(query):
      # Handle events
      pass

  ## Next Steps

  - Read the [Configuration Guide](../configuration/index.md) for all options
+ - Explore the [Architecture Documentation](../architecture/graph-orchestration.md)
  - Check out the [API Reference](../api/agents.md) for programmatic usage
docs/getting-started/installation.md CHANGED

@@ -12,29 +12,12 @@ This guide will help you install and set up DeepCritical on your system.

  ### 1. Install uv (Recommended)

- `uv` is a fast Python package installer and resolver. Install it using the standalone installer (recommended):

- **Unix/macOS/Linux:**
- ```bash
- curl -LsSf https://astral.sh/uv/install.sh | sh
- ```
-
- **Windows (PowerShell):**
- ```powershell
- powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
- ```
-
- **Alternative methods:**
- ```bash
- # Using pipx (recommended if you have pipx installed)
- pipx install uv
-
- # Or using pip
  pip install uv
  ```

- After installation, restart your terminal or add `~/.cargo/bin` to your PATH.
-
  ### 2. Clone the Repository

  ```bash

@@ -150,3 +133,12 @@

  - Learn about [MCP Integration](mcp-integration.md)
  - Explore [Examples](examples.md)

  ### 1. Install uv (Recommended)

+ `uv` is a fast Python package installer and resolver. Install it with:

  ```bash
  pip install uv
  ```
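
To confirm the installation worked:

```bash
uv --version
```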
  ### 2. Clone the Repository

  ```bash

  - Learn about [MCP Integration](mcp-integration.md)
  - Explore [Examples](examples.md)
docs/getting-started/mcp-integration.md CHANGED

@@ -1,10 +1,10 @@

  # MCP Integration

- The DETERMINATOR exposes a Model Context Protocol (MCP) server, allowing you to use its search tools directly from Claude Desktop or other MCP clients.

  ## What is MCP?

- The Model Context Protocol (MCP) is a standard for connecting AI assistants to external tools and data sources. The DETERMINATOR implements an MCP server that exposes its search capabilities as MCP tools.

  ## MCP Server URL

@@ -33,14 +33,14 @@

  ~/.config/Claude/claude_desktop_config.json
  ```

- ### 2. Add The DETERMINATOR Server

  Edit `claude_desktop_config.json` and add:

  ```json
  {
    "mcpServers": {
-     "determinator": {
        "url": "http://localhost:7860/gradio_api/mcp/"
      }
    }

@@ -53,7 +53,7 @@ Close and restart Claude Desktop for changes to take effect.

  ### 4. Verify Connection

- In Claude Desktop, you should see The DETERMINATOR tools available:
  - `search_pubmed`
  - `search_clinical_trials`
  - `search_biorxiv`

@@ -198,6 +198,14 @@ You can configure multiple DeepCritical instances:

  - Learn about [Configuration](../configuration/index.md) for advanced settings
  - Explore [Examples](examples.md) for use cases
- - Read the [Architecture Documentation](../architecture/graph_orchestration.md)

  # MCP Integration

+ DeepCritical exposes a Model Context Protocol (MCP) server, allowing you to use its search tools directly from Claude Desktop or other MCP clients.

  ## What is MCP?

+ The Model Context Protocol (MCP) is a standard for connecting AI assistants to external tools and data sources. DeepCritical implements an MCP server that exposes its search capabilities as MCP tools.

  ## MCP Server URL

  ~/.config/Claude/claude_desktop_config.json
  ```

+ ### 2. Add DeepCritical Server

  Edit `claude_desktop_config.json` and add:

  ```json
  {
    "mcpServers": {
+     "deepcritical": {
        "url": "http://localhost:7860/gradio_api/mcp/"
      }
    }

  ### 4. Verify Connection

+ In Claude Desktop, you should see DeepCritical tools available:
  - `search_pubmed`
  - `search_clinical_trials`
  - `search_biorxiv`

  - Learn about [Configuration](../configuration/index.md) for advanced settings
  - Explore [Examples](examples.md) for use cases
+ - Read the [Architecture Documentation](../architecture/graph-orchestration.md)
docs/getting-started/quick-start.md CHANGED

@@ -1,47 +1,11 @@

- # Single Command Deploy

- Deploy with docker instantly with a single command:
-
- ```bash
- docker run -it -p 7860:7860 --platform=linux/amd64 \
-   -e DB_KEY="YOUR_VALUE_HERE" \
-   -e SERP_API="YOUR_VALUE_HERE" \
-   -e INFERENCE_API="YOUR_VALUE_HERE" \
-   -e MODAL_TOKEN_ID="YOUR_VALUE_HERE" \
-   -e MODAL_TOKEN_SECRET="YOUR_VALUE_HERE" \
-   -e NCBI_API_KEY="YOUR_VALUE_HERE" \
-   -e SERPER_API_KEY="YOUR_VALUE_HERE" \
-   -e CHROMA_DB_PATH="./chroma_db" \
-   -e CHROMA_DB_HOST="localhost" \
-   -e CHROMA_DB_PORT="8000" \
-   -e RAG_COLLECTION_NAME="deepcritical_evidence" \
-   -e RAG_SIMILARITY_TOP_K="5" \
-   -e RAG_AUTO_INGEST="true" \
-   -e USE_GRAPH_EXECUTION="false" \
-   -e DEFAULT_TOKEN_LIMIT="100000" \
-   -e DEFAULT_TIME_LIMIT_MINUTES="10" \
-   -e DEFAULT_ITERATIONS_LIMIT="10" \
-   -e WEB_SEARCH_PROVIDER="duckduckgo" \
-   -e MAX_ITERATIONS="10" \
-   -e SEARCH_TIMEOUT="30" \
-   -e LOG_LEVEL="DEBUG" \
-   -e EMBEDDING_PROVIDER="local" \
-   -e OPENAI_EMBEDDING_MODEL="text-embedding-3-small" \
-   -e LOCAL_EMBEDDING_MODEL="BAAI/bge-small-en-v1.5" \
-   -e HUGGINGFACE_EMBEDDING_MODEL="sentence-transformers/all-MiniLM-L6-v2" \
-   -e HF_FALLBACK_MODELS="Qwen/Qwen3-Next-80B-A3B-Thinking,Qwen/Qwen3-Next-80B-A3B-Instruct,meta-llama/Llama-3.3-70B-Instruct,meta-llama/Llama-3.1-8B-Instruct,HuggingFaceH4/zephyr-7b-beta,Qwen/Qwen2-7B-Instruct" \
-   -e HUGGINGFACE_MODEL="Qwen/Qwen3-Next-80B-A3B-Thinking" \
-   registry.hf.space/dataquests-deepcritical:latest python src/app.py
- ```
-
- ## Quick start guide
-
- Get up and running with The DETERMINATOR in minutes.

  ## Start the Application

  ```bash
- gradio src/app.py
  ```

  Open your browser to `http://localhost:7860`.

@@ -135,8 +99,17 @@

  ## Next Steps

- - Learn about [MCP Integration](mcp-integration.md) to use The DETERMINATOR from Claude Desktop
  - Explore [Examples](examples.md) for more use cases
  - Read the [Configuration Guide](../configuration/index.md) for advanced settings
- - Check out the [Architecture Documentation](../architecture/graph_orchestration.md) to understand how it works

+ # Quick Start Guide

+ Get up and running with DeepCritical in minutes.

  ## Start the Application

  ```bash
+ uv run gradio src/app.py
  ```

  Open your browser to `http://localhost:7860`.

  ## Next Steps

+ - Learn about [MCP Integration](mcp-integration.md) to use DeepCritical from Claude Desktop
  - Explore [Examples](examples.md) for more use cases
  - Read the [Configuration Guide](../configuration/index.md) for advanced settings
+ - Check out the [Architecture Documentation](../architecture/graph-orchestration.md) to understand how it works
docs/index.md CHANGED

@@ -1,24 +1,12 @@

- # The DETERMINATOR

- **Generalist Deep Research Agent - Stops at Nothing Until Finding Precise Answers**

- The DETERMINATOR is a powerful generalist deep research agent system that uses iterative search-and-judge loops to comprehensively investigate any research question. It stops at nothing until finding precise answers, only stopping at configured limits (budget, time, iterations).
-
- **Key Features**:
- - **Generalist**: Handles queries from any domain (medical, technical, business, scientific, etc.)
- - **Automatic Source Selection**: Automatically determines if medical knowledge sources (PubMed, ClinicalTrials.gov) are needed
- - **Multi-Source Search**: Web search, PubMed, ClinicalTrials.gov, Europe PMC, RAG
- - **Iterative Refinement**: Continues searching and refining until precise answers are found
- - **Evidence Synthesis**: Comprehensive reports with proper citations
-
- **Important**: The DETERMINATOR is a research tool that synthesizes evidence. It cannot provide medical advice or answer medical questions directly.

  ## Features

- - **Generalist Research**: Handles any research question from any domain
- - **Automatic Medical Detection**: Automatically determines if medical knowledge sources are needed
- - **Multi-Source Search**: Web search, PubMed, ClinicalTrials.gov, Europe PMC (includes bioRxiv/medRxiv), RAG
- - **Iterative Until Precise**: Stops at nothing until finding precise answers (only stops at configured limits)
  - **MCP Integration**: Use our tools from Claude Desktop or any MCP client
  - **HuggingFace OAuth**: Sign in with your HuggingFace account to automatically use your API token
  - **Modal Sandbox**: Secure execution of AI-generated statistical code

@@ -30,15 +18,8 @@

  ## Quick Start

  ```bash
- # Install uv if you haven't already (recommended: standalone installer)
- # Unix/macOS/Linux:
- curl -LsSf https://astral.sh/uv/install.sh | sh
-
- # Windows (PowerShell):
- powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
-
- # Alternative: pipx install uv
- # Or: pip install uv

  # Sync dependencies
  uv sync

@@ -53,9 +34,9 @@

  ## Architecture

- The DETERMINATOR uses a Vertical Slice Architecture:

- 1. **Search Slice**: Retrieving evidence from multiple sources (web, PubMed, ClinicalTrials.gov, Europe PMC, RAG) based on query analysis
  2. **Judge Slice**: Evaluating evidence quality using LLMs
  3. **Orchestrator Slice**: Managing the research loop and UI

@@ -73,7 +54,7 @@ Learn more about the [Architecture](overview/architecture.md).

  - [Getting Started](getting-started/installation.md) - Installation and setup
  - [Configuration](configuration/index.md) - Configuration guide
  - [API Reference](api/agents.md) - API documentation
- - [Contributing](contributing/index.md) - Development guidelines

  ## Links

+ # DeepCritical

+ **AI-Native Drug Repurposing Research Agent**

+ DeepCritical is a deep research agent system that uses iterative search-and-judge loops to comprehensively answer research questions. The system supports multiple orchestration patterns, graph-based execution, parallel research workflows, and long-running task management with real-time streaming.

  ## Features

+ - **Multi-Source Search**: PubMed, ClinicalTrials.gov, Europe PMC (includes bioRxiv/medRxiv)
  - **MCP Integration**: Use our tools from Claude Desktop or any MCP client
  - **HuggingFace OAuth**: Sign in with your HuggingFace account to automatically use your API token
  - **Modal Sandbox**: Secure execution of AI-generated statistical code

  ## Quick Start

  ```bash
+ # Install uv if you haven't already
+ pip install uv

  # Sync dependencies
  uv sync

  ## Architecture

+ DeepCritical uses a Vertical Slice Architecture:

+ 1. **Search Slice**: Retrieving evidence from PubMed, ClinicalTrials.gov, and Europe PMC
  2. **Judge Slice**: Evaluating evidence quality using LLMs
  3. **Orchestrator Slice**: Managing the research loop and UI

  - [Getting Started](getting-started/installation.md) - Installation and setup
  - [Configuration](configuration/index.md) - Configuration guide
  - [API Reference](api/agents.md) - API documentation
+ - [Contributing](contributing.md) - Development guidelines

  ## Links
docs/{LICENSE.md → license.md} RENAMED
File without changes
docs/overview/architecture.md CHANGED

@@ -1,6 +1,6 @@

  # Architecture Overview

- The DETERMINATOR is a powerful generalist deep research agent system that uses iterative search-and-judge loops to comprehensively investigate any research question. It stops at nothing until finding precise answers, only stopping at configured limits (budget, time, iterations). The system automatically determines if medical knowledge sources are needed and adapts its search strategy accordingly. It supports multiple orchestration patterns, graph-based execution, parallel research workflows, and long-running task management with real-time streaming.

  ## Core Architecture

@@ -134,11 +134,10 @@

  - **Research Flows**: Iterative and deep research patterns (`src/orchestrator/research_flow.py`)
  - **Graph Builder**: Graph construction utilities (`src/agent_factory/graph_builder.py`)
  - **Agents**: Pydantic AI agents (`src/agents/`, `src/agent_factory/agents.py`)
- - **Search Tools**: Neo4j knowledge graph, PubMed, ClinicalTrials.gov, Europe PMC, Web search, RAG (`src/tools/`)
  - **Judge Handler**: LLM-based evidence assessment (`src/agent_factory/judges.py`)
  - **Embeddings**: Semantic search & deduplication (`src/services/embeddings.py`)
  - **Statistical Analyzer**: Modal sandbox execution (`src/services/statistical_analyzer.py`)
- - **Multimodal Processing**: Image OCR and audio STT/TTS services (`src/services/multimodal_processing.py`, `src/services/audio_processing.py`)
  - **Middleware**: State management, budget tracking, workflow coordination (`src/middleware/`)
  - **MCP Tools**: Claude Desktop integration (`src/mcp_tools.py`)
  - **Gradio UI**: Web interface with MCP server and streaming (`src/app.py`)

@@ -170,25 +169,24 @@ The system supports complex research workflows through:

  - **Orchestrator Factory** (`src/orchestrator_factory.py`):
    - Auto-detects mode: "advanced" if OpenAI key available, else "simple"
-   - Supports explicit mode selection: "simple", "magentic" (alias for "advanced"), "advanced", "iterative", "deep", "auto"
    - Lazy imports for optional dependencies

- - **Orchestrator Modes** (selected in UI or via factory):
-   - `simple`: Legacy linear search-judge loop (Free Tier)
-   - `advanced` or `magentic`: Multi-agent coordination using Microsoft Agent Framework (requires OpenAI API key)
-   - `iterative`: Knowledge-gap-driven research with single loop (Free Tier)
-   - `deep`: Parallel section-based research with planning (Free Tier)
-   - `auto`: Intelligent mode detection based on query complexity (Free Tier)
-
- - **Graph Research Modes** (used within graph orchestrator, separate from orchestrator mode):
-   - `iterative`: Single research loop pattern
-   - `deep`: Multi-section parallel research pattern
-   - `auto`: Auto-detect pattern based on query complexity

  - **Execution Modes**:
    - `use_graph=True`: Graph-based execution (parallel, conditional routing)
    - `use_graph=False`: Agent chains (sequential, backward compatible)

- **Note**: The UI provides separate controls for orchestrator mode and graph research mode. When using graph-based orchestrators (iterative/deep/auto), the graph research mode determines the specific pattern used within the graph execution.

  # Architecture Overview

+ DeepCritical is a deep research agent system that uses iterative search-and-judge loops to comprehensively answer research questions. The system supports multiple orchestration patterns, graph-based execution, parallel research workflows, and long-running task management with real-time streaming.

  ## Core Architecture

  - **Research Flows**: Iterative and deep research patterns (`src/orchestrator/research_flow.py`)
  - **Graph Builder**: Graph construction utilities (`src/agent_factory/graph_builder.py`)
  - **Agents**: Pydantic AI agents (`src/agents/`, `src/agent_factory/agents.py`)
+ - **Search Tools**: PubMed, ClinicalTrials.gov, Europe PMC, RAG (`src/tools/`)
  - **Judge Handler**: LLM-based evidence assessment (`src/agent_factory/judges.py`)
  - **Embeddings**: Semantic search & deduplication (`src/services/embeddings.py`)
  - **Statistical Analyzer**: Modal sandbox execution (`src/services/statistical_analyzer.py`)
  - **Middleware**: State management, budget tracking, workflow coordination (`src/middleware/`)
  - **MCP Tools**: Claude Desktop integration (`src/mcp_tools.py`)
  - **Gradio UI**: Web interface with MCP server and streaming (`src/app.py`)

  - **Orchestrator Factory** (`src/orchestrator_factory.py`):
    - Auto-detects mode: "advanced" if OpenAI key available, else "simple"
+   - Supports explicit mode selection: "simple", "magentic", "advanced"
    - Lazy imports for optional dependencies

+ - **Research Modes**:
+   - `iterative`: Single research loop
+   - `deep`: Multi-section parallel research
+   - `auto`: Auto-detect based on query complexity

  - **Execution Modes**:
    - `use_graph=True`: Graph-based execution (parallel, conditional routing)
    - `use_graph=False`: Agent chains (sequential, backward compatible)
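
In code, the execution mode is just a constructor flag on the research flows; a minimal sketch reusing the setup shown on the examples page (handler construction elided):

```python
from src.orchestrator.research_flow import IterativeResearchFlow

flow = IterativeResearchFlow(
    search_handler=search_handler,  # built as on the examples page
    judge_handler=judge_handler,
    use_graph=True,  # False selects sequential agent chains instead
)
```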
docs/overview/features.md CHANGED

@@ -1,32 +1,27 @@

  # Features

- The DETERMINATOR provides a comprehensive set of features for AI-assisted research:

  ## Core Features

  ### Multi-Source Search

- - **General Web Search**: Search general knowledge sources for any domain
- - **Neo4j Knowledge Graph**: Search structured knowledge graph for papers and disease relationships
- - **PubMed**: Search peer-reviewed biomedical literature via NCBI E-utilities (automatically used when medical knowledge needed)
- - **ClinicalTrials.gov**: Search interventional clinical trials (automatically used when medical knowledge needed)
  - **Europe PMC**: Search preprints and peer-reviewed articles (includes bioRxiv/medRxiv)
  - **RAG**: Semantic search within collected evidence using LlamaIndex
- - **Automatic Source Selection**: Automatically determines which sources are needed based on query analysis

  ### MCP Integration

  - **Model Context Protocol**: Expose search tools via MCP server
- - **Claude Desktop**: Use The DETERMINATOR tools directly from Claude Desktop
  - **MCP Clients**: Compatible with any MCP-compatible client

  ### Authentication

- - **REQUIRED**: Authentication is mandatory before using the application
- - **HuggingFace OAuth**: Sign in with HuggingFace account for automatic API token usage (recommended)
- - **Manual API Keys**: Support for HuggingFace API keys via environment variables (`HF_TOKEN` or `HUGGINGFACE_API_KEY`)
- - **Free Tier Support**: Automatic fallback to HuggingFace Inference API (public models) when no API key is available
- - **Authentication Check**: The application will display an error message if authentication is not provided

  ### Secure Code Execution

@@ -45,26 +40,9 @@

  - **Graph-Based Execution**: Flexible graph orchestration with conditional routing
  - **Parallel Research Loops**: Run multiple research tasks concurrently
- - **Iterative Research**: Single-loop research with search-judge-synthesize cycles that continues until precise answers are found
  - **Deep Research**: Multi-section parallel research with planning and synthesis
- - **Magentic Orchestration**: Multi-agent coordination using Microsoft Agent Framework (alias: "advanced" mode)
- - **Stops at Nothing**: Only stops at configured limits (budget, time, iterations), otherwise continues until finding precise answers
-
- **Orchestrator Modes**:
- - `simple`: Legacy linear search-judge loop
- - `advanced` (or `magentic`): Multi-agent coordination (requires OpenAI API key)
- - `iterative`: Knowledge-gap-driven research with single loop
- - `deep`: Parallel section-based research with planning
- - `auto`: Intelligent mode detection based on query complexity
-
- **Graph Research Modes** (used within graph orchestrator):
- - `iterative`: Single research loop pattern
- - `deep`: Multi-section parallel research pattern
- - `auto`: Auto-detect pattern based on query complexity
-
- **Execution Modes**:
- - `use_graph=True`: Graph-based execution with parallel and conditional routing
- - `use_graph=False`: Agent chains with sequential execution (backward compatible)

  ### Real-Time Streaming

@@ -86,16 +64,6 @@

  - **Conversation History**: Track iteration history and agent interactions
  - **State Synchronization**: Share evidence across parallel loops

- ### Multimodal Input & Output
-
- - **Image Input (OCR)**: Upload images and extract text using optical character recognition
- - **Audio Input (STT)**: Record or upload audio files and transcribe to text using speech-to-text
- - **Audio Output (TTS)**: Generate audio responses with text-to-speech synthesis
- - **Configurable Settings**: Enable/disable multimodal features via sidebar settings
- - **Voice Selection**: Choose from multiple TTS voices (American English: af_*, am_*)
- - **Speech Speed Control**: Adjust TTS speech speed (0.5x to 2.0x)
- - **Multimodal Processing Service**: Integrated service for processing images and audio files
-
  ## Advanced Features

  ### Agent System

@@ -137,12 +105,10 @@

  ### Gradio Interface

- - **Real-Time Chat**: Interactive chat interface with multimodal support
  - **Streaming Updates**: Live progress updates
  - **Accordion UI**: Organized display of pending/done operations
  - **OAuth Integration**: Seamless HuggingFace authentication
- - **Multimodal Input**: Support for text, images, and audio input in the same interface
- - **Sidebar Settings**: Configuration accordions for research, multimodal, and audio settings

  ### MCP Server

@@ -167,3 +133,12 @@

  - **Architecture Diagrams**: Visual architecture documentation
  - **API Reference**: Complete API documentation

  # Features

+ DeepCritical provides a comprehensive set of features for AI-assisted research:

  ## Core Features

  ### Multi-Source Search

+ - **PubMed**: Search peer-reviewed biomedical literature via NCBI E-utilities
+ - **ClinicalTrials.gov**: Search interventional clinical trials
  - **Europe PMC**: Search preprints and peer-reviewed articles (includes bioRxiv/medRxiv)
  - **RAG**: Semantic search within collected evidence using LlamaIndex

  ### MCP Integration

  - **Model Context Protocol**: Expose search tools via MCP server
+ - **Claude Desktop**: Use DeepCritical tools directly from Claude Desktop
  - **MCP Clients**: Compatible with any MCP-compatible client

  ### Authentication

+ - **HuggingFace OAuth**: Sign in with HuggingFace account for automatic API token usage
+ - **Manual API Keys**: Support for OpenAI, Anthropic, and HuggingFace API keys
+ - **Free Tier Support**: Automatic fallback to HuggingFace Inference API
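
Manual keys are supplied via environment variables. The HuggingFace names below appear elsewhere in these docs; the OpenAI and Anthropic names follow those providers' usual conventions and are assumptions here:

```bash
# .env
HF_TOKEN=hf_...               # or HUGGINGFACE_API_KEY
OPENAI_API_KEY=sk-...         # assumed conventional name
ANTHROPIC_API_KEY=sk-ant-...  # assumed conventional name
```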
  ### Secure Code Execution

  - **Graph-Based Execution**: Flexible graph orchestration with conditional routing
  - **Parallel Research Loops**: Run multiple research tasks concurrently
+ - **Iterative Research**: Single-loop research with search-judge-synthesize cycles
  - **Deep Research**: Multi-section parallel research with planning and synthesis
+ - **Magentic Orchestration**: Multi-agent coordination using Microsoft Agent Framework

  ### Real-Time Streaming

  - **Conversation History**: Track iteration history and agent interactions
  - **State Synchronization**: Share evidence across parallel loops

  ## Advanced Features

  ### Agent System

  ### Gradio Interface

+ - **Real-Time Chat**: Interactive chat interface
  - **Streaming Updates**: Live progress updates
  - **Accordion UI**: Organized display of pending/done operations
  - **OAuth Integration**: Seamless HuggingFace authentication

  ### MCP Server

  - **Architecture Diagrams**: Visual architecture documentation
  - **API Reference**: Complete API documentation