Deploy Option B to CTapi-raw HuggingFace Space
Your HuggingFace Space
- Space: https://huggingface.co/spaces/gmkdigitalmedia/CTapi-raw
- Local files: /mnt/c/Users/ibm/Documents/HF/CTapi-raw/
- Target: Deploy Option B (7-10s per query)
Files You Already Have (Ready to Deploy!)
Core Files
- ✅ app.py - has the /search endpoint (Option B!)
- ✅ foundation_engine.py - has all the Option B logic
- ✅ requirements.txt - all dependencies
- ✅ Dockerfile - Docker configuration
Documentation
- ✅ OPTION_B_IMPLEMENTATION_GUIDE.md - complete guide
- ✅ TEST_RESULTS_PHYSICIAN_QUERY.md - test results
- ✅ QUICK_START.md - quick reference
Deployment Steps
Step 1: Set HuggingFace Token in Space Settings
- Go to: https://huggingface.co/spaces/gmkdigitalmedia/CTapi-raw/settings
- Add Secret:
  Name: HF_TOKEN
  Value: <your_huggingface_token>
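HuggingFace Spaces expose secrets to the running container as environment variables, so app.py can pick the token up at startup. A minimal sketch, assuming the code reads the variable directly (the exact handling inside foundation_engine.py may differ):

import os

# The secret added in the Space settings arrives as an environment variable
hf_token = os.environ.get("HF_TOKEN")
if not hf_token:
    raise RuntimeError("HF_TOKEN is not set - add it under Settings -> Secrets")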
Step 2: Push Your Local Files to HuggingFace
cd /mnt/c/Users/ibm/Documents/HF/CTapi-raw
# Initialize git if needed
git init
git remote add origin https://huggingface.co/spaces/gmkdigitalmedia/CTapi-raw
# Or if already initialized
git remote set-url origin https://huggingface.co/spaces/gmkdigitalmedia/CTapi-raw
# Stage all files
git add app.py foundation_engine.py requirements.txt Dockerfile README.md
# Commit
git commit -m "Deploy Option B: Query Parser + RAG + 355M Ranking"
# Push to HuggingFace
git push origin main
Step 3: Wait for Build
HuggingFace will automatically:
- Build the Docker container
- Download data files (3GB from gmkdigitalmedia/foundation1.2-data)
- Start the API server
- Expose it at: https://gmkdigitalmedia-ctapi-raw.hf.space
Build time: ~10-15 minutes
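If you would rather script the wait than watch the build page, you can poll the Space until it answers. A small sketch, assuming /health returns HTTP 200 with a JSON body once the server is up:

import time
import requests

HEALTH_URL = "https://gmkdigitalmedia-ctapi-raw.hf.space/health"

for attempt in range(60):  # roughly 15 minutes at 15-second intervals
    try:
        r = requests.get(HEALTH_URL, timeout=10)
        if r.status_code == 200:
            print("Space is up:", r.json())
            break
    except requests.RequestException:
        pass  # still building or booting
    time.sleep(15)
else:
    print("Space did not come up - check the build logs")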
What Your Space Will Have
Endpoints
Primary (Option B):
POST /search
Auxiliary:
GET / # API info
GET /health # Health check
GET /docs # Swagger UI
GET /redoc # ReDoc
Example Usage
# Test the API
curl -X POST https://gmkdigitalmedia-ctapi-raw.hf.space/search \
-H "Content-Type: application/json" \
-d '{
"query": "what should a physician prescribing ianalumab for sjogrens know",
"top_k": 5
}'
Expected Response:
{
  "query": "...",
  "processing_time": 7.5,
  "query_analysis": {
    "extracted_entities": {
      "drugs": ["ianalumab", "VAY736"],
      "diseases": ["Sjögren's syndrome"]
    }
  },
  "results": {
    "total_found": 15,
    "returned": 5
  },
  "trials": [...],
  "benchmarking": {
    "query_parsing_time": 2.3,
    "rag_search_time": 2.9,
    "355m_ranking_time": 2.3
  }
}
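To sanity-check that the three stages stay inside the 7-10 second budget, you can print the benchmarking block directly. A short sketch using the field names from the example response above (the exact keys depend on app.py):

import requests

resp = requests.post(
    "https://gmkdigitalmedia-ctapi-raw.hf.space/search",
    json={"query": "ianalumab sjogren disease", "top_k": 5},
    timeout=60,
).json()

# Per-stage timings plus the overall processing time
for stage, seconds in resp.get("benchmarking", {}).items():
    print(f"{stage}: {seconds:.1f}s")
print(f"total: {resp.get('processing_time', 0):.1f}s")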
For Your Clients
Client Code Example (Python)
import requests

# Your API endpoint
API_URL = "https://gmkdigitalmedia-ctapi-raw.hf.space/search"

def search_trials(query, top_k=10):
    """Search clinical trials using the Option B API."""
    response = requests.post(
        API_URL,
        json={"query": query, "top_k": top_k}
    )
    return response.json()

# Use it
query = "what should a physician prescribing ianalumab for sjogrens know"
results = search_trials(query, top_k=5)

# Get structured data
trials = results["trials"]
for trial in trials:
    print(f"NCT ID: {trial['nct_id']}")
    print(f"Title: {trial['title']}")
    print(f"Relevance: {trial['scoring']['relevance_score']:.2%}")
    print(f"URL: {trial['url']}")
    print()

# Client generates their own response with their LLM
client_llm_response = their_llm.generate(
    f"Based on these trials: {trials}\nAnswer: {query}"
)
Client Code Example (JavaScript)
const API_URL = "https://gmkdigitalmedia-ctapi-raw.hf.space/search";

async function searchTrials(query, topK = 10) {
    const response = await fetch(API_URL, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ query, top_k: topK })
    });
    return response.json();
}

// Use it
const query = "what should a physician prescribing ianalumab for sjogrens know";
const results = await searchTrials(query, 5);

// Process results
results.trials.forEach(trial => {
    console.log(`NCT ID: ${trial.nct_id}`);
    console.log(`Title: ${trial.title}`);
    console.log(`Relevance: ${trial.scoring.relevance_score}`);
});
Performance on HuggingFace
With GPU (Automatic on HF Spaces)
Query Parsing: 2-3s
RAG Search: 2-3s
355M Ranking: 2-3s (GPU-accelerated with @spaces.GPU; pattern sketched below)
Total: 7-10s
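For reference, this is the @spaces.GPU pattern the ranking step relies on; the function name and signature below are placeholders, the real code lives in foundation_engine.py:

import spaces  # HuggingFace 'spaces' package, available inside the Space

@spaces.GPU  # requests the GPU only while this function runs
def rank_with_355m(query, candidates):
    # hypothetical signature - run the 355M model over the candidates and return scores
    ...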
Resource Usage
RAM: ~10 GB (for 556K trials + embeddings + models)
GPU: T4 or better (automatic)
Storage: ~4 GB (data files cached)
Troubleshooting
If the space doesn't start:
Check the logs:
- Go to space settings → Logs
- Look for errors during data download or model loading
Common issues:
- Missing HF_TOKEN → add it under space secrets
- Out of memory → increase the hardware tier
- Data download fails → check that gmkdigitalmedia/foundation1.2-data exists
Check data files: Your space should download:
- dataset_chunks_TRIAL_AWARE.pkl (2.7 GB)
- dataset_embeddings_TRIAL_AWARE_FIXED.npy (816 MB)
- inverted_index_COMPREHENSIVE.pkl (308 MB)
These download automatically on the first run (a manual fetch is sketched below).
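If the automatic download fails, you can confirm the files are reachable with your token from any machine. A sketch assuming gmkdigitalmedia/foundation1.2-data is a dataset repo (adjust repo_type if it is not); note that this actually downloads the files, so allow ~4 GB of disk:

import os
from huggingface_hub import hf_hub_download

DATA_FILES = [
    "dataset_chunks_TRIAL_AWARE.pkl",
    "dataset_embeddings_TRIAL_AWARE_FIXED.npy",
    "inverted_index_COMPREHENSIVE.pkl",
]

for name in DATA_FILES:
    path = hf_hub_download(
        repo_id="gmkdigitalmedia/foundation1.2-data",
        repo_type="dataset",
        filename=name,
        token=os.environ.get("HF_TOKEN"),
    )
    print("OK:", name, "->", path)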
If queries are slow:
Check that the GPU is enabled:
- Space settings → Hardware → should be T4 or A10
- The @spaces.GPU decorator enables the GPU for 355M ranking
The first query is always slower:
- Models need to load (one-time)
- Subsequent queries are fast (see the warm-up sketch below)
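To take that one-time model load off your first real user, you can fire a throwaway warm-up request right after the build finishes. A sketch; any short query works:

import requests

requests.post(
    "https://gmkdigitalmedia-ctapi-raw.hf.space/search",
    json={"query": "warm up", "top_k": 1},
    timeout=300,  # the first query can take much longer while models load
)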
Verification Checklist
After deployment, verify:
- Space is running (green badge)
- /health endpoint returns healthy
- /search returns JSON in 7-10s
- Top trials have >90% relevance
- Perplexity scores are calculated
- No hallucinations (the 355M model only scores, it does not generate)
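A quick smoke test that covers most of this checklist (field names follow the example response earlier in this guide and may differ slightly in app.py):

import time
import requests

BASE = "https://gmkdigitalmedia-ctapi-raw.hf.space"

# /health should answer once the Space is running
assert requests.get(f"{BASE}/health", timeout=30).status_code == 200

# /search should come back in roughly 7-10 seconds after warm-up
start = time.time()
resp = requests.post(
    f"{BASE}/search",
    json={"query": "ianalumab sjogren disease", "top_k": 5},
    timeout=120,
).json()
print(f"search took {time.time() - start:.1f}s")

trials = resp.get("trials", [])
print("trials returned:", len(trials))
if trials:
    print("top relevance:", trials[0].get("scoring", {}).get("relevance_score"))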
Client Onboarding
Send this to your clients:
Clinical Trial API - Option B
Fast foundational RAG for clinical trial search.
- Endpoint: https://gmkdigitalmedia-ctapi-raw.hf.space/search
- Response time: 7-10 seconds
- Cost: $0.001 per query
- Returns: structured JSON with ranked trials
- Documentation: https://gmkdigitalmedia-ctapi-raw.hf.space/docs
Example:
curl -X POST https://gmkdigitalmedia-ctapi-raw.hf.space/search \
-H "Content-Type: application/json" \
-d '{"query": "ianalumab sjogren disease", "top_k": 10}'
Your LLM can then generate responses from the structured data.
Summary
You have everything ready to deploy!
- ✅ All code is in /mnt/c/Users/ibm/Documents/HF/CTapi-raw/
- ✅ Option B already implemented
- ✅ Tested locally (works as expected)
- ✅ Just needs to be pushed to HuggingFace
Next step:
cd /mnt/c/Users/ibm/Documents/HF/CTapi-raw
git push origin main
That's it!