# TransLingo - Full-Stack Transformer Translation System
## Project Overview
Build a complete, production-ready machine translation system with:
- Transformer model from scratch (following "Attention is All You Need")
- Web interface for user interaction
- REST API for programmatic access
- Deployment-ready architecture
Target: a 26+ BLEU score with a user-friendly interface
## Complete System Architecture
### Core Components
1. **ML Pipeline**: Transformer model (German-English translation)
2. **Backend API**: FastAPI/Flask serving model
3. **Frontend**: React/Gradio interface
4. **Infrastructure**: Docker, caching, monitoring
## Architecture Specifications
- Model dimension (d_model): 512
- Feed-forward dimension (d_ff): 2048
- Number of attention heads: 8
- Number of encoder/decoder layers: 6
- Dropout rate: 0.1
- Vocabulary size: ~37,000 tokens (shared BPE)
- Maximum sequence length: 100 tokens
- Label smoothing: 0.1
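A minimal Python mirror of these hyperparameters (illustrative only; the project's source of truth is configs/config.yaml):
```python
# Illustrative sketch of the architecture specs above; treat configs/config.yaml
# as the canonical configuration, not this dataclass.
from dataclasses import dataclass

@dataclass
class TransformerConfig:
    d_model: int = 512            # model dimension
    d_ff: int = 2048              # feed-forward inner dimension
    n_heads: int = 8              # attention heads
    n_layers: int = 6             # encoder and decoder layers each
    dropout: float = 0.1
    vocab_size: int = 37000       # shared BPE vocabulary
    max_len: int = 100            # maximum sequence length
    label_smoothing: float = 0.1
```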
## Complete Project Structure
```
translingo/
β”œβ”€β”€ data/
β”‚   β”œβ”€β”€ download.py
β”‚   └── preprocessing.py
β”œβ”€β”€ model/
β”‚   β”œβ”€β”€ transformer.py
β”‚   β”œβ”€β”€ attention.py
β”‚   β”œβ”€β”€ embeddings.py
β”‚   └── layers.py
β”œβ”€β”€ training/
β”‚   β”œβ”€β”€ train.py
β”‚   β”œβ”€β”€ optimizer.py
β”‚   └── loss.py
β”œβ”€β”€ inference/
β”‚   β”œβ”€β”€ beam_search.py
β”‚   └── translate.py
β”œβ”€β”€ utils/
β”‚   β”œβ”€β”€ metrics.py
β”‚   └── visualization.py
β”œβ”€β”€ api/
β”‚   β”œβ”€β”€ app.py               (FastAPI)
β”‚   β”œβ”€β”€ routes.py
β”‚   └── middleware.py
β”œβ”€β”€ frontend/
β”‚   β”œβ”€β”€ gradio_app.py        (Quick start)
β”‚   β”œβ”€β”€ streamlit_app.py     (Alternative)
β”‚   └── web/                 (Full React app)
β”œβ”€β”€ deployment/
β”‚   β”œβ”€β”€ Dockerfile
β”‚   β”œβ”€β”€ docker-compose.yml
β”‚   └── kubernetes/
└── configs/
    └── config.yaml
```
## Implementation Phases
### Phase 1: Data Pipeline
1. Implement data downloader for Multi30k dataset (German-English)
- Use torchtext.datasets or download directly
- Split: train (29k), valid (1k), test (1k)
2. Create a BPE tokenizer (see the sketch after this list)
- Use sentencepiece library
- Train on combined source and target text
- Vocabulary size: 37,000 tokens
- Add special tokens: <pad>, <sos>, <eos>, <unk>
3. Build dataset class
- Handle sequence padding dynamically
- Create source and target masks properly
- Implement bucketing by sequence length for efficiency
- Return tensors ready for model input
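A minimal sketch of the tokenizer training from step 2, assuming the combined source+target training text lives at a hypothetical data/train.de-en.txt:
```python
# Hypothetical sketch of the shared BPE tokenizer; the file paths are
# assumptions, not project settings.
import sentencepiece as spm

# Train a shared BPE model on the concatenated German + English training text.
spm.SentencePieceTrainer.train(
    input="data/train.de-en.txt",      # assumed combined source+target file
    model_prefix="translingo_bpe",
    vocab_size=37000,
    model_type="bpe",
    pad_id=0, unk_id=1, bos_id=2, eos_id=3,
    pad_piece="<pad>", unk_piece="<unk>", bos_piece="<sos>", eos_piece="<eos>",
)

sp = spm.SentencePieceProcessor(model_file="translingo_bpe.model")
ids = sp.encode("Ein Mann fΓ€hrt Fahrrad.", out_type=int)
```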
### Phase 2: Core Transformer Components
1. Scaled Dot-Product Attention (see the sketch after this list)
- Implement attention(Q, K, V) = softmax(QK^T / sqrt(d_k))V
- Handle padding mask and future mask correctly
- Store attention weights for visualization
2. Multi-Head Attention
- Split d_model into h heads of size d_k = d_model/h
- Project Q, K, V separately for each head
- Concatenate heads and project output
- Include residual connection and layer norm
3. Positional Encoding
- Use sinusoidal encoding as per paper
- PE(pos, 2i) = sin(pos/10000^(2i/d_model))
- PE(pos, 2i+1) = cos(pos/10000^(2i/d_model))
- Add to embeddings, scale embeddings by sqrt(d_model)
4. Feed-Forward Network
- Two linear layers with ReLU activation
- Inner dimension = 2048 (4x model dimension)
- Include dropout between layers
- Residual connection and layer norm
5. Encoder Block
- Self-attention sublayer
- Feed-forward sublayer
- Each with residual connection and layer norm
- Apply dropout after each sublayer
6. Decoder Block
- Masked self-attention sublayer
- Cross-attention sublayer (attending to encoder output)
- Feed-forward sublayer
- Each with residual connection and layer norm
7. Full Transformer Model
- Stack N=6 encoder and decoder blocks
- Share embeddings between encoder, decoder, and output projection
- Initialize with Xavier/Glorot initialization
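Minimal sketches of step 1's attention and step 3's positional encoding; the tensor shapes and mask convention are assumptions, not the project's final API:
```python
# Sketches of scaled dot-product attention (step 1) and sinusoidal positional
# encoding (step 3). Shapes and the `mask` convention are illustrative.
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None, dropout=None):
    """attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, with optional mask."""
    d_k = q.size(-1)
    scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(d_k)
    if mask is not None:
        # mask == 0 marks padding (or future positions in the decoder)
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)   # keep for visualization
    if dropout is not None:
        weights = dropout(weights)
    return torch.matmul(weights, v), weights

def sinusoidal_positional_encoding(max_len, d_model):
    """PE(pos, 2i) = sin(pos / 10000^(2i/d_model)); cos for odd indices."""
    position = torch.arange(max_len).unsqueeze(1)
    div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)
    pe[:, 1::2] = torch.cos(position * div_term)
    return pe  # add to the sqrt(d_model)-scaled embeddings
```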
### Phase 3: Training Components
1. Noam Learning Rate Schedule (see the sketch after this list)
- lrate = d_model^(-0.5) * min(step^(-0.5), step * warmup^(-1.5))
- Warmup steps = 4000
- Use with Adam optimizer (beta1=0.9, beta2=0.98, eps=1e-9)
2. Label Smoothing Loss
- Implement KL divergence loss with smoothing
- Distribute smoothing_value/(vocab_size-1) to non-target tokens
- Keep 1-smoothing_value for target token
- Mask padding tokens from loss calculation
3. Training Loop
- Implement gradient accumulation (4-8 steps)
- Gradient clipping (max norm = 1.0)
- Teacher forcing during training
- Log training metrics to TensorBoard
- Save checkpoints every epoch
- Implement early stopping based on validation BLEU
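A minimal sketch of the Noam schedule from step 1, wired to Adam via PyTorch's LambdaLR; the wrapper itself is an assumption, and the base lr=1.0 makes the lambda return the actual learning rate:
```python
# Sketch of the Noam schedule; call scheduler.step() once per optimizer step.
import torch

def noam_lambda(d_model=512, warmup=4000):
    def lrate(step):
        step = max(step, 1)  # avoid 0^(-0.5) on the first call
        return d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)
    return lrate

model = torch.nn.Linear(512, 512)  # stand-in for the Transformer
optimizer = torch.optim.Adam(model.parameters(), lr=1.0,
                             betas=(0.9, 0.98), eps=1e-9)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=noam_lambda())
```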
### Phase 4: Inference and Evaluation
1. Beam Search Decoder
- Beam size = 4
- Length penalty alpha = 0.6
- Coverage penalty to avoid repetition
- Handle <eos> token properly
2. BLEU Score Calculation (see the sketch after this list)
- Use sacrebleu library for standard evaluation
- Calculate BLEU-4 with smoothing
- Evaluate on both validation and test sets
3. Attention Visualization
- Extract and save attention weights
- Create heatmaps for each attention head
- Visualize encoder self-attention, decoder self-attention, and cross-attention
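A minimal sketch of the step 2 evaluation with sacrebleu; the hypothesis and reference lists are placeholders:
```python
# Corpus-level BLEU with sacrebleu; each inner reference list holds one
# reference per hypothesis.
import sacrebleu

hypotheses = ["a man is riding a bicycle ."]   # model outputs (placeholders)
references = [["a man rides a bicycle ."]]     # one list per reference set
bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.2f}")
```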
### Phase 5: Frontend Development
1. **Quick Demo Interface (Gradio)**
- Create gradio_app.py in frontend/ (a minimal sketch follows this list)
- Text input/output boxes
- Language selection dropdowns
- Translation button
- Confidence score display
- Attention visualization toggle
- Examples for quick testing
- share=True for a public URL
2. **Alternative Interface (Streamlit)**
- Create streamlit_app.py as backup option
- Two-column layout (source | target)
- Real-time translation option
- History sidebar
- Download translations as file
- Batch upload capability
3. **Production Web App (React)**
- Create full React application in frontend/web/
- Components:
- TranslationBox component
- LanguageSelector component
- AttentionVisualizer component
- HistoryPanel component
- Features:
- Real-time translation debouncing
- Copy to clipboard
- Share translation link
- Dark mode toggle
- Mobile responsive
- State management with Context API or Redux
- Axios for API calls
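A minimal Gradio sketch for the step 1 demo; translate is a hypothetical stub to be replaced by the real beam-search decoder:
```python
# Quick-demo interface sketch; wire `translate` to inference/translate.py.
import gradio as gr

def translate(text: str) -> str:
    return f"[translation of: {text}]"  # hypothetical stub

demo = gr.Interface(
    fn=translate,
    inputs=gr.Textbox(label="German"),
    outputs=gr.Textbox(label="English"),
    title="TransLingo",
    examples=["Ein Hund spielt im Park."],
)
demo.launch(share=True)  # share=True gives a temporary public URL
```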
### Phase 6: Backend API Development
1. **FastAPI Application**
- Create api/app.py (a minimal sketch follows this list)
- Endpoints:
- POST /translate - single translation
- POST /batch_translate - multiple texts
- GET /languages - supported language pairs
- GET /model_info - model statistics
- POST /feedback - user feedback collection
- Request/Response models with Pydantic
- CORS middleware for frontend
- Rate limiting (100 requests/minute)
- API key authentication option
- Request logging
2. **Model Serving Optimization**
- Load model once at startup
- Implement caching layer (Redis)
- Batch inference for concurrent requests
- GPU/CPU automatic selection
- Model versioning support
- A/B testing capability
3. **WebSocket Support**
- Real-time translation endpoint
- Stream partial translations
- Low latency for typing experience
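A minimal sketch of the POST /translate endpoint from step 1 with Pydantic models; translate_text is a hypothetical stub for the real inference call:
```python
# Single-translation endpoint sketch, versioned under /v1/ per the API notes.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="TransLingo API")

class TranslationRequest(BaseModel):
    text: str
    source_lang: str = "de"
    target_lang: str = "en"

class TranslationResponse(BaseModel):
    translation: str

def translate_text(text: str) -> str:
    return f"[translation of: {text}]"  # hypothetical stub

@app.post("/v1/translate", response_model=TranslationResponse)
def translate(req: TranslationRequest) -> TranslationResponse:
    return TranslationResponse(translation=translate_text(req.text))
```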
### Phase 7: Deployment
1. **Docker Configuration**
- Multi-stage Dockerfile:
- Base: Python 3.9 + PyTorch
- Build: Install dependencies
- Runtime: Minimal image
- docker-compose.yml:
- App service (FastAPI)
- Redis service (caching)
- Nginx (reverse proxy)
- Frontend service
2. **Local Deployment**
- Single command startup: docker-compose up
- Environment variables in .env file
- Volume mounting for model weights
- Hot reload for development
3. **Cloud Deployment Options**
a. **Heroku** (Easiest for demo):
- Procfile configuration
- heroku.yml for container deployment
- Handling free dyno limitations
- Slug size optimization
b. **AWS EC2** (Production):
- EC2 instance with GPU (p2.xlarge)
- Application Load Balancer
- Auto-scaling group
- S3 for model storage
- CloudWatch monitoring
c. **Google Cloud Run** (Serverless):
- Container registry setup
- Automatic scaling configuration
- Cloud Storage for models
- Cloud CDN for static assets
d. **HuggingFace Spaces** (ML-focused):
- Direct Gradio deployment
- Free GPU access
- Community visibility
4. **Kubernetes Deployment**
- Deployment manifests
- Service configuration
- Ingress controller
- Horizontal Pod Autoscaler
- ConfigMaps and Secrets
### Phase 8: Production Features
1. **Monitoring & Logging**
- Prometheus metrics
- Grafana dashboards
- ELK stack for logs
- Sentry for error tracking
- Performance monitoring
2. **Advanced Features**
- Document translation (PDF, DOCX)
- Language auto-detection
- Translation memory/cache
- User accounts & history
- API documentation (Swagger/OpenAPI)
- Webhook support
- Translation confidence scores
- Alternative translations
3. **Performance Optimization**
- Model quantization for faster inference
- ONNX export option
- TorchScript compilation
- Response caching strategy
- CDN for static assets
- Database indexing for history
## Critical Implementation Notes
### Model-Specific
- MASK HANDLING: Always create proper padding masks and causal masks for the decoder (see the sketch after this list)
- MEMORY MANAGEMENT: Clear cache regularly, use gradient accumulation
- INITIALIZATION: Use Xavier initialization for all linear layers
- EMBEDDING SCALING: Multiply embeddings by sqrt(d_model) before positional encoding
- CROSS-ATTENTION: Decoder must attend to encoder output (not decoder input)
- CHECKPOINT: Save model, optimizer, scheduler, and epoch
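A minimal sketch of the padding and causal masks from the MASK HANDLING note; pad_idx and the broadcastable shapes are assumptions about the data layout:
```python
# Padding mask for the encoder, padding + causal mask for the decoder.
import torch

def make_src_mask(src: torch.Tensor, pad_idx: int = 0) -> torch.Tensor:
    # (batch, 1, 1, src_len): True at real tokens, False at padding
    return (src != pad_idx).unsqueeze(1).unsqueeze(2)

def make_tgt_mask(tgt: torch.Tensor, pad_idx: int = 0) -> torch.Tensor:
    # Combine the padding mask with a lower-triangular causal (future) mask.
    pad_mask = (tgt != pad_idx).unsqueeze(1).unsqueeze(2)  # (batch, 1, 1, tgt_len)
    tgt_len = tgt.size(1)
    causal = torch.tril(torch.ones(tgt_len, tgt_len, dtype=torch.bool,
                                   device=tgt.device))      # (tgt_len, tgt_len)
    return pad_mask & causal  # broadcasts to (batch, 1, tgt_len, tgt_len)
```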
### Frontend-Specific
- STATE MANAGEMENT: Use React Context for global state
- ERROR HANDLING: Show user-friendly error messages
- LOADING STATES: Implement skeletons and spinners
- ACCESSIBILITY: ARIA labels, keyboard navigation
- SEO: Meta tags for social sharing
- PERFORMANCE: Lazy loading, code splitting
### API-Specific
- VALIDATION: Validate all inputs with Pydantic
- SECURITY: Rate limiting, input sanitization
- CACHING: Cache frequent translations
- VERSIONING: API versioning from start (/v1/)
- DOCUMENTATION: Auto-generate with FastAPI
- TESTING: Unit tests for all endpoints
## Testing Strategy
1. **ML Model Tests**
- Unit tests for each component (see the sketch after this list)
- Integration tests with small data
- Performance benchmarks
- BLEU score validation
2. **API Tests**
- Endpoint unit tests
- Load testing with Locust
- Error handling tests
- Authentication tests
3. **Frontend Tests**
- Component testing (Jest)
- E2E testing (Cypress)
- Accessibility testing
- Cross-browser testing
4. **Integration Tests**
- Full pipeline testing
- Docker compose testing
- Deployment verification
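A starting point for the component unit tests in item 1; the import path assumes the Phase 2 attention sketch was placed in model/attention.py:
```python
# Shape and masking test for the attention component (pytest).
import torch
from model.attention import scaled_dot_product_attention  # assumed location

def test_attention_shapes_and_masking():
    q = k = v = torch.randn(2, 8, 10, 64)            # (batch, heads, len, d_k)
    mask = torch.ones(2, 1, 1, 10, dtype=torch.bool)
    mask[..., -3:] = False                            # pretend last 3 are padding
    out, weights = scaled_dot_product_attention(q, k, v, mask=mask)
    assert out.shape == (2, 8, 10, 64)
    assert weights.shape == (2, 8, 10, 10)
    assert weights[..., -3:].abs().sum() == 0         # no weight on padding
```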
## Development Workflow
1. Start with Gradio for quick prototype
2. Train minimal model (2 layers) for testing
3. Build FastAPI backend
4. Create production model
5. Develop React frontend
6. Dockerize everything
7. Deploy to cloud
8. Add monitoring
## Success Metrics
### Model Performance
- Training loss < 2.0 after 10 epochs
- Validation BLEU > 20 after 20 epochs
- Final BLEU > 26 after full training
- Inference time < 500ms per sentence
### System Performance
- API response time < 1 second
- 99.9% uptime
- Support 100 concurrent users
- < 5% error rate
### User Experience
- User satisfaction with translation accuracy > 85%
- Page load time < 2 seconds
- Mobile responsive design
- Intuitive UI/UX
## Environment Setup
```bash
# Required packages
pip install torch torchtext
pip install fastapi uvicorn
pip install gradio streamlit
pip install sentencepiece sacrebleu
pip install redis celery
pip install pytest locust
pip install docker-compose
# Frontend setup
npm install react axios
npm install @mui/material
npm install recharts # for visualizations
```
## Common Pitfalls to Avoid
### Model Training
- Don't forget to scale embeddings by sqrt(d_model)
- Don't use teacher forcing during evaluation
- Don't mix up src_mask and tgt_mask
### Frontend Development
- Don't make synchronous API calls
- Don't forget loading states
- Don't ignore mobile users
### Deployment
- Don't hardcode credentials
- Don't skip health checks
- Don't forget CORS configuration
- Don't deploy without monitoring
## Code Style Requirements
- Use type hints for all functions
- Add docstrings with examples
- Follow PEP 8 for Python
- Follow Airbnb style guide for JavaScript
- Use meaningful variable names
- Implement proper error handling
- Add comprehensive logging
## Timeline
- Phase 1-4 (Core ML): 15-20 hours
- Phase 5 (Frontend): 8-10 hours
- Phase 6 (API): 5-6 hours
- Phase 7 (Deployment): 6-8 hours
- Phase 8 (Production): 10-12 hours
- Total: ~45-55 hours
## Demo First Approach
1. Start with Gradio interface immediately
2. Use pre-trained model if available for quick demo
3. Show working translation in < 2 hours
4. Iterate and improve from there
5. Deploy to HuggingFace Spaces for instant sharing