LLM Engineering Course Curriculum
It stretches your mind, helping you think better and create even better.
Topics:
Duration: 2 Weeks
0.1 Mathematical and Programming Foundations
Week 1: Core Mathematics and Python
Linear Algebra for LLMs
Matrix Operations and Decomposition
Eigenvalues and Eigenvectors
Tensor Operations
Attention Matrix Mathematics
Dimensionality and Projections
Calculus and Optimization
Gradients and Backpropagation
Chain Rule in Deep Networks
Optimization Landscapes
Convergence Theory
Stochastic Optimization
Probability and Information Theory
Probability Distributions
Bayes’ Theorem
Entropy and Cross-Entropy
KL Divergence
Mutual Information
Advanced Python for LLMs
Object-Oriented Programming
Functional Programming
Async/Await Patterns
Memory Management
Performance Profiling
Debugging Large Systems
0.2 Deep Learning and NLP Foundations
Week 2: Neural Networks and NLP
Deep Learning Essentials
Neural Network Architectures
Backpropagation Algorithm
Activation Functions
Regularization Techniques
Batch Normalization
Gradient Flow
Natural Language Processing
Tokenization Strategies
Word Embeddings (Word2Vec, GloVe)
Sequence Modeling Basics
Language Modeling Fundamentals
Evaluation Metrics
NLP Tasks Overview
PyTorch for LLMs
Tensor Operations
Autograd System
nn.Module Design
Custom Layers
Data Loading Pipeline
Distributed Features
Development Environment
GPU/TPU Setup
CUDA Programming Basics
Docker for ML
Experiment Tracking (W&B, MLflow)
Version Control for ML
Cloud Platforms
Lab Project
Implement a basic transformer from scratch in PyTorch
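For orientation, here is a minimal sketch of the kind of model this lab builds: a single pre-norm transformer block with multi-head self-attention in plain PyTorch. Class names and dimensions are illustrative, not a prescribed solution.

```python
# Minimal single transformer block: multi-head self-attention + feed-forward.
# A sketch of the lab's target, not a prescribed solution.
import math
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x):                       # x: (batch, seq, d_model)
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # reshape to (batch, heads, seq, d_head)
        q, k, v = (z.view(b, t, self.n_heads, self.d_head).transpose(1, 2) for z in (q, k, v))
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)   # scaled dot-product
        attn = scores.softmax(dim=-1)
        ctx = (attn @ v).transpose(1, 2).reshape(b, t, d)
        return self.out(ctx)

class TransformerBlock(nn.Module):
    def __init__(self, d_model=256, n_heads=4, d_ff=1024):
        super().__init__()
        self.attn = SelfAttention(d_model, n_heads)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
        self.ln1, self.ln2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)

    def forward(self, x):
        x = x + self.attn(self.ln1(x))   # pre-norm residual connections
        x = x + self.ff(self.ln2(x))
        return x

if __name__ == "__main__":
    block = TransformerBlock()
    print(block(torch.randn(2, 16, 256)).shape)  # torch.Size([2, 16, 256])
```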
Topics:
Duration: 2 Weeks
1.1 Attention Mechanisms
Week 1: From RNNs to Attention
Evolution of Sequence Models
RNN and LSTM Limitations
Seq2Seq Models
Attention Revolution
Transformer Breakthrough
Why Attention Works
Self-Attention Mechanism
Query, Key, Value Computation
Scaled Dot-Product Attention
Attention Score Calculation
Softmax and Normalization
Attention Patterns Analysis
Multi-Head Attention
Head Projection Matrices
Parallel Attention Computation
Head Concatenation
Output Projection
Why Multiple Heads Matter
Attention Variants
Cross-Attention
Causal Attention
Bidirectional Attention
Local Attention
Sliding Window Attention
Mathematical Deep Dive
Attention as Matrix Multiplication
Complexity Analysis O(n²)
Memory Requirements
Gradient Flow Through Attention
Numerical Stability
1.2 Complete Transformer Architecture
Week 2: Building Blocks and Variants
Transformer Components
Input Embeddings
Positional Encodings
Layer Normalization
Feed-Forward Networks
Residual Connections
Output Layers
Positional Encoding Strategies
Sinusoidal Encodings
Learned Position Embeddings
Rotary Position Embeddings (RoPE)
ALiBi (Attention with Linear Biases)
Relative Position Encodings
Length Extrapolation
Architecture Variants
Encoder-Only (BERT-style)
Decoder-Only (GPT-style)
Encoder-Decoder (T5-style)
Prefix LM
UniLM Architecture
Implementation Details
Initialization Strategies
Gradient Checkpointing
Mixed Precision Training
Attention Masking
Padding Strategies
Project
Build a complete transformer model with configurable architecture
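A possible starting point for this project is a config-driven model, sketched below. The ModelConfig fields and the use of nn.TransformerEncoder as the block stack are assumptions for illustration; the causal flag switches between GPT-style and BERT-style attention masking.

```python
# Sketch of a configurable transformer (hypothetical config fields).
from dataclasses import dataclass
import torch
import torch.nn as nn

@dataclass
class ModelConfig:
    vocab_size: int = 32000
    d_model: int = 512
    n_heads: int = 8
    n_layers: int = 6
    max_seq_len: int = 1024
    causal: bool = True          # True -> decoder-only (GPT-style), False -> encoder-only (BERT-style)

class ConfigurableTransformer(nn.Module):
    def __init__(self, cfg: ModelConfig):
        super().__init__()
        self.cfg = cfg
        self.tok_emb = nn.Embedding(cfg.vocab_size, cfg.d_model)
        self.pos_emb = nn.Embedding(cfg.max_seq_len, cfg.d_model)   # learned positions
        layer = nn.TransformerEncoderLayer(
            d_model=cfg.d_model, nhead=cfg.n_heads,
            dim_feedforward=4 * cfg.d_model, batch_first=True, norm_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=cfg.n_layers)
        self.lm_head = nn.Linear(cfg.d_model, cfg.vocab_size, bias=False)

    def forward(self, idx):                           # idx: (batch, seq) token ids
        b, t = idx.shape
        pos = torch.arange(t, device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)
        mask = None
        if self.cfg.causal:                            # additive causal attention mask
            mask = torch.triu(torch.full((t, t), float("-inf"), device=idx.device), diagonal=1)
        x = self.blocks(x, mask=mask)
        return self.lm_head(x)                         # (batch, seq, vocab)

if __name__ == "__main__":
    model = ConfigurableTransformer(ModelConfig(n_layers=2))
    logits = model(torch.randint(0, 32000, (2, 16)))
    print(logits.shape)   # torch.Size([2, 16, 32000])
```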
Topics:
Duration: 2 Weeks
2.1 State-of-the-Art Models
Week 1: Commercial and Open Models
GPT Family Evolution
GPT-3 Architecture (175B)
GPT-3.5 Improvements
GPT-4 Multimodal Design
GPT-4 Turbo Optimizations
Context Length Scaling
Claude Architecture
Constitutional AI Design
Context Window (100K+)
Safety Mechanisms
Training Innovations
Performance Characteristics
Google Models
PaLM and PaLM 2
Gemini Architecture
Mixture of Experts in Gemini
Multimodal Integration
Efficiency Improvements
Open Source Revolution
LLaMA/LLaMA 2 Architecture
Mistral and Mixtral MoE
Falcon Design Choices
Qwen Technical Details
Yi and DeepSeek Models
2.2 Architectural Innovations
Week 2: Cutting-Edge Techniques
Mixture of Experts (MoE)
Sparse Activation
Expert Routing
Load Balancing
Switch Transformers
Mixtral Implementation
Efficient Attention Mechanisms
Flash Attention v1/v2
Linear Attention
Performer Architecture
Reformer Techniques
Memory-Efficient Attention
Long Context Innovations
RoPE Scaling
YaRN Method
Positional Interpolation
Context Length Extension
Streaming Transformers
Model Scaling Strategies
Depth vs Width Scaling
Compound Scaling
Chinchilla Optimal
Training Compute Laws
Inference Compute Trade-offs
Lab: Implement and benchmark different attention mechanisms
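One way to structure this benchmark is to time a naive attention implementation against PyTorch's fused scaled_dot_product_attention kernel (available in PyTorch 2.x). The harness below is a sketch; shapes and iteration counts are illustrative and results depend entirely on hardware and PyTorch build.

```python
# Benchmark naive attention against PyTorch's fused SDPA kernel (timing harness only).
import math, time
import torch
import torch.nn.functional as F

def naive_attention(q, k, v):
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.shape[-1])
    return scores.softmax(dim=-1) @ v

def bench(fn, *args, iters=10):
    for _ in range(3):                 # warm-up
        fn(*args)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    # (batch, heads, seq, d_head)
    q, k, v = (torch.randn(8, 16, 1024, 64, device=device) for _ in range(3))
    print("naive attention :", bench(naive_attention, q, k, v))
    # Fused kernel (Flash / memory-efficient backends where available, PyTorch >= 2.0)
    print("fused SDPA      :", bench(F.scaled_dot_product_attention, q, k, v))
```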
Topics:
Duration: 3 Weeks
3.1 Data Engineering for LLMs
Week 1: Dataset Preparation
Data Collection
Web Crawling (Common Crawl)
Wikipedia and Books
Code Repositories
Scientific Papers
Multilingual Data
Domain-Specific Corpora
Data Processing Pipeline
Text Extraction
Deduplication Strategies
Quality Filtering
Language Detection
PII Removal
Format Standardization
Data Quality and Filtering
Perplexity Filtering
Repetition Detection
Toxicity Filtering
Domain Classification
Length Filtering
Statistical Analysis
Tokenization
BPE (Byte-Pair Encoding)
SentencePiece
WordPiece
Unigram Model
Custom Tokenizers
Multilingual Tokenization
Dataset Mixing
Domain Weighting
Upsampling Strategies
Curriculum Learning
Data Scheduling
Validation Set Creation
3.2 Training Objectives and Losses
Week 2: Learning Strategies
Language Modeling Objectives
Causal Language Modeling (CLM)
Masked Language Modeling (MLM)
Prefix Language Modeling
Span Corruption (T5)
Fill-in-the-Middle (FIM)
UL2 Unified Framework
Loss Functions
Cross-Entropy Loss
Perplexity Calculation
Label Smoothing
Focal Loss
Contrastive Losses
Training Dynamics
Learning Rate Schedules
Warm-up Strategies
Cosine Annealing
Gradient Clipping
Weight Decay
Optimization Algorithms
AdamW Optimizer
LAMB Optimizer
Adafactor
8-bit Optimizers
Sharpness-Aware Minimization
3.3 Distributed Training
Week 3: Training at Scale
Parallelism Strategies
Data Parallelism (DDP)
Model Parallelism
Pipeline Parallelism
Tensor Parallelism
3D Parallelism
Distributed Frameworks
PyTorch FSDP
DeepSpeed ZeRO
FairScale
Megatron-LM
JAX/Flax for TPUs
Memory Optimization
Gradient Checkpointing
CPU Offloading
Mixed Precision (FP16/BF16)
Gradient Accumulation
Activation Checkpointing
Training Infrastructure
Multi-GPU Setup
Multi-Node Training
NCCL Communication
InfiniBand Networks
Cloud Training (AWS, GCP, Azure)
Monitoring and Debugging
Loss Curves Analysis
Gradient Statistics
Training Instabilities
Checkpoint Management
Experiment Tracking
Project
Pre-train a small language model from scratch
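A compressed view of what this project involves: a causal LM trained with next-token cross-entropy, a cosine learning-rate schedule, and gradient clipping. The tiny model and random token data below are stand-ins for a real architecture and tokenized corpus.

```python
# Minimal causal language-modeling pre-training loop (next-token prediction).
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

vocab_size, seq_len, d_model = 1000, 64, 128
device = "cuda" if torch.cuda.is_available() else "cpu"

class TinyLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, dim_feedforward=512,
                                           batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, x):
        t = x.shape[1]
        mask = torch.triu(torch.full((t, t), float("-inf"), device=x.device), 1)
        return self.head(self.encoder(self.emb(x), mask=mask))

# Random tokens stand in for a tokenized corpus.
data = torch.randint(0, vocab_size, (512, seq_len + 1))
loader = DataLoader(TensorDataset(data), batch_size=32, shuffle=True)

model = TinyLM().to(device)
opt = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.1)
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=len(loader))

model.train()
for (batch,) in loader:
    batch = batch.to(device)
    inputs, targets = batch[:, :-1], batch[:, 1:]              # shift for next-token prediction
    logits = model(inputs)
    loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
    opt.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)    # gradient clipping
    opt.step()
    sched.step()
print("final loss:", loss.item(), "perplexity:", loss.exp().item())
```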
Topics:
Duration: 2 Weeks
4.1 Training Infrastructure
Week 1: Hardware and Systems
Hardware Considerations
GPU Selection (A100, H100, MI300)
TPU Architecture and Usage
Memory Requirements
Interconnect Bandwidth
Storage Systems
Cluster Management
SLURM Job Scheduling
Kubernetes for ML
Resource Allocation
Queue Management
Fault Tolerance
Data Infrastructure
High-Performance Storage
Data Loading Optimization
Streaming Datasets
Caching Strategies
Prefetching
Cost Optimization
Spot Instance Training
Preemptible VMs
Resource Utilization
Training Efficiency
Budget Management
4.2 Engineering Best Practices
Week 2: Production Training
Code Organization
Modular Architecture
Configuration Management
Dependency Management
Testing Strategies
Documentation
Reproducibility
Seed Management
Deterministic Operations
Environment Versioning
Dataset Versioning
Result Validation
Continuous Training
Incremental Training
Online Learning
Model Updates
A/B Testing
Rollback Strategies
Team Collaboration
Code Review Practices
Experiment Sharing
Knowledge Transfer
Model Registry
Documentation Standards
Lab: Set up a distributed training pipeline
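A minimal DistributedDataParallel skeleton for this lab is sketched below, assuming a torchrun launch; the linear model and synthetic dataset are placeholders for the real pipeline.

```python
# Minimal DDP skeleton, intended to be launched with:
#   torchrun --nproc_per_node=NUM_GPUS ddp_train.py
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset, DistributedSampler

def main():
    dist.init_process_group(backend="nccl" if torch.cuda.is_available() else "gloo")
    rank = dist.get_rank()
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    device = torch.device(f"cuda:{local_rank}" if torch.cuda.is_available() else "cpu")

    model = nn.Linear(128, 128).to(device)               # placeholder model
    model = DDP(model, device_ids=[local_rank] if torch.cuda.is_available() else None)

    dataset = TensorDataset(torch.randn(1024, 128), torch.randn(1024, 128))
    sampler = DistributedSampler(dataset)                 # shards data across ranks
    loader = DataLoader(dataset, batch_size=32, sampler=sampler)
    opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)                          # reshuffle shards each epoch
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            loss = loss_fn(model(x), y)
            opt.zero_grad()
            loss.backward()                               # DDP all-reduces gradients here
            opt.step()
        if rank == 0:
            print(f"epoch {epoch} loss {loss.item():.4f}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```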
Topics:
Duration: 2 Weeks
5.1 Full Fine-tuning
Week 1: Traditional Approaches
Fine-tuning Strategies
Task-Specific Fine-tuning
Multi-Task Learning
Sequential Fine-tuning
Continual Learning
Domain Adaptation
Dataset Preparation
Instruction Datasets
Task Formatting
Data Augmentation
Synthetic Data Generation
Quality Control
Training Considerations
Learning Rate Selection
Batch Size Effects
Epochs vs Steps
Early Stopping
Overfitting Prevention
Catastrophic Forgetting
Problem Analysis
Elastic Weight Consolidation
Progressive Neural Networks
Memory Replay
Regularization Techniques
5.2 Parameter-Efficient Fine-tuning
Week 2: PEFT Methods
LoRA (Low-Rank Adaptation)
Mathematical Foundation
Rank Selection
Alpha Parameter
Dropout Strategies
LoRA Variants (QLoRA, LongLoRA)
Other PEFT Methods
Prefix Tuning
Prompt Tuning
Adapter Layers
IA³ (Infused Adapter by Inhibiting and Amplifying Inner Activations)
BitFit
Advanced Techniques
LoRA Composition
Multi-LoRA Systems
Dynamic Rank Allocation
Task-Specific Adapters
Modular Fine-tuning
Efficiency Analysis
Memory Savings
Training Speed
Inference Overhead
Quality Trade-offs
Scaling Properties
Project: Implement and compare different PEFT methods
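The core of LoRA can be prototyped in a few lines before reaching for a library: wrap a frozen nn.Linear with a trainable low-rank update scaled by alpha/r. The sketch below uses illustrative hyperparameters, not recommended settings.

```python
# Minimal LoRA: frozen base weight W plus trainable low-rank update (alpha/r) * B @ A.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16, dropout: float = 0.05):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                       # freeze pretrained weights
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))  # zero-init: no change at start
        self.scaling = alpha / r
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        return self.base(x) + self.dropout(x) @ self.lora_A.T @ self.lora_B.T * self.scaling

if __name__ == "__main__":
    lora = LoRALinear(nn.Linear(512, 512), r=8, alpha=16)
    trainable = sum(p.numel() for p in lora.parameters() if p.requires_grad)
    total = sum(p.numel() for p in lora.parameters())
    print(f"trainable {trainable} / total {total} ({100 * trainable / total:.1f}%)")
    print(lora(torch.randn(4, 512)).shape)
```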
Topics:
Duration: 2 Weeks
6.1 Instruction Tuning
Week 1: Creating Instruction-Following Models
Instruction Datasets
Dataset Collection
Instruction Templates
Task Diversity
Quality Metrics
Data Mixing Strategies
Training Methodology
Supervised Fine-tuning (SFT)
Instruction Formatting
System Prompts
Multi-Turn Dialogues
Chain-of-Thought Training
Popular Datasets
Alpaca Dataset
ShareGPT
OpenAssistant
Dolly
FLAN Collection
Evaluation
Instruction Following
Task Generalization
Zero-Shot Performance
Benchmark Suites
Human Evaluation
6.2 Reinforcement Learning from Human Feedback
Week 2: RLHF Pipeline
RLHF Components
Reward Model Training
Policy Optimization
PPO Algorithm
Value Functions
KL Divergence Penalty
Reward Modeling
Preference Data Collection
Bradley-Terry Model
Ranking Losses
Reward Model Architecture
Calibration
Policy Optimization
PPO Implementation
Advantage Estimation
Clipping Strategies
Entropy Bonus
Training Stability
Alternative Approaches
DPO (Direct Preference Optimization)
RLAIF (RL from AI Feedback)
Constitutional AI
SLiC (Sequence Likelihood Calibration)
Rejection Sampling
Lab: Implement a complete RLHF pipeline
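The full lab covers reward modeling and PPO, which is too large to sketch here. As a compact illustration of the alternative approaches listed above, here is the DPO loss computed from per-sequence log-probabilities; the toy numbers in the usage example are illustrative only.

```python
# DPO loss: prefer chosen over rejected responses, relative to a frozen reference model.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Each argument is a tensor of summed log-probabilities of a full response
    under the trainable policy or the frozen reference model."""
    policy_ratio = policy_chosen_logps - policy_rejected_logps
    ref_ratio = ref_chosen_logps - ref_rejected_logps
    # Maximize the margin by which the policy prefers chosen over rejected,
    # relative to the reference model (beta controls the implicit KL penalty).
    return -F.logsigmoid(beta * (policy_ratio - ref_ratio)).mean()

if __name__ == "__main__":
    # Toy numbers: the policy already prefers the chosen responses slightly.
    loss = dpo_loss(torch.tensor([-12.0, -9.5]), torch.tensor([-13.0, -11.0]),
                    torch.tensor([-12.5, -10.0]), torch.tensor([-12.8, -10.5]))
    print(loss.item())
```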
Topics:
Duration: 1 Week
7.1 Domain Adaptation
Vertical Fine-tuning
Medical LLMs
Legal LLMs
Financial LLMs
Scientific LLMs
Code LLMs
Multilingual Adaptation
Cross-lingual Transfer
Language-Specific Tuning
Code-Switching
Translation Tasks
Cultural Adaptation
Multimodal Extensions
Vision-Language Models
Audio Integration
Document Understanding
Video Processing
Embodied AI
7.2 Advanced Adaptation
Continual Learning
Lifelong Learning
Task Incremental Learning
Experience Replay
Dynamic Architecture
Knowledge Consolidation
Few-Shot Adaptation
In-Context Learning
Meta-Learning
Prompt Engineering
Demonstration Selection
Task Instructions
Project: Fine-tune an LLM for a specialized domain
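As one concrete preparation step for such a project, the sketch below formats a domain instruction/response pair with a simple template and masks the prompt tokens out of the loss. The template and the stand-in tokenizer are illustrative assumptions; in practice you would use the target model's own tokenizer and chat format.

```python
# Illustrative preparation of one domain-specific SFT example.
PROMPT_TEMPLATE = (
    "### Instruction:\n{instruction}\n\n### Context:\n{context}\n\n### Response:\n"
)

def build_example(instruction, context, response, tokenize, ignore_index=-100):
    prompt = PROMPT_TEMPLATE.format(instruction=instruction, context=context)
    prompt_ids = tokenize(prompt)
    response_ids = tokenize(response)
    input_ids = prompt_ids + response_ids
    # Loss is computed only on the response: prompt positions get ignore_index.
    labels = [ignore_index] * len(prompt_ids) + response_ids
    return {"input_ids": input_ids, "labels": labels}

if __name__ == "__main__":
    fake_tokenize = lambda text: [ord(c) % 255 for c in text]   # stand-in tokenizer
    ex = build_example(
        instruction="Summarize the key risk factors in this clinical note.",
        context="Patient reports chest pain on exertion...",
        response="Key risks: exertional chest pain suggestive of angina...",
        tokenize=fake_tokenize)
    print(len(ex["input_ids"]), ex["labels"][:5])
```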
Topics:
Duration: 2 Weeks
8.1 Quantization
Week 1: Reducing Model Size
Quantization Fundamentals
Float32 to Lower Precision
Symmetric vs Asymmetric
Per-Channel vs Per-Tensor
Dynamic vs Static
Quantization-Aware Training
Quantization Methods
INT8 Quantization
INT4 and Below
GPTQ Method
AWQ (Activation-aware Weight Quantization)
SmoothQuant
Mixed Precision
FP16 Training and Inference
BF16 Advantages
Automatic Mixed Precision
Tensor Cores Utilization
Numerical Stability
Implementation
PyTorch Quantization
ONNX Quantization
TensorRT INT8
llama.cpp Quantization
bitsandbytes Library
8.2 Model Compression
Week 2: Advanced Optimization
Pruning Techniques
Magnitude Pruning
Structured Pruning
Gradual Pruning
Lottery Ticket Hypothesis
Movement Pruning
Knowledge Distillation
Teacher-Student Setup
Distillation Loss
Temperature Scaling
Feature Matching
Progressive Distillation
Model Architecture Search
Neural Architecture Search
Efficient Architectures
Compound Scaling
Hardware-Aware Design
AutoML for LLMs
Compilation and Optimization
TorchScript
TensorRT
ONNX Runtime
Graph Optimization
Kernel Fusion
Lab: Optimize an LLM for edge deployment
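A quick way to feel the size/precision trade-off is PyTorch's built-in dynamic INT8 quantization of nn.Linear layers, sketched below; a production edge flow would more likely use the GPTQ, AWQ, or llama.cpp formats covered above.

```python
# Dynamic INT8 quantization of linear layers and a rough on-disk size comparison.
import os, tempfile
import torch
import torch.nn as nn

model = nn.Sequential(                        # stand-in for a small transformer MLP stack
    nn.Linear(1024, 4096), nn.GELU(),
    nn.Linear(4096, 1024))

# Weights of nn.Linear layers are quantized to int8; activations stay in float (CPU only).
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

def size_mb(m):
    with tempfile.NamedTemporaryFile(delete=False) as f:
        torch.save(m.state_dict(), f.name)
        size = os.path.getsize(f.name) / 1e6
    os.remove(f.name)
    return size

print(f"fp32: {size_mb(model):.1f} MB")
print(f"int8: {size_mb(quantized):.1f} MB")
print(quantized(torch.randn(1, 1024)).shape)   # quantized model still runs on CPU
```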
Topics:
Duration: 2 Weeks
9.1 Efficient Inference
Week 1: Speed Optimization
Attention Optimization
Flash Attention Implementation
PagedAttention (vLLM)
Multi-Query Attention
Grouped-Query Attention
Sparse Attention Patterns
KV Cache Management
Cache Structure
Memory Layout
Cache Compression
Sharing Strategies
Dynamic Allocation
Speculative Decoding
Draft Models
Verification Process
Acceptance Rates
Speed-Quality Trade-offs
Implementation Strategies
Batch Processing
Dynamic Batching
Continuous Batching
In-Flight Batching
Padding Strategies
Memory Management
9.2 Serving Infrastructure
Week 2: Production Deployment
Serving Frameworks
vLLM Architecture
TGI (Text Generation Inference)
TensorRT-LLM
Triton Inference Server
Ray Serve
API Design
REST API Design
Streaming Responses
WebSocket Support
gRPC Services
GraphQL Integration
Scaling Strategies
Horizontal Scaling
Model Replication
Load Balancing
Request Routing
Auto-scaling
Performance Monitoring
Latency Metrics
Throughput Analysis
GPU Utilization
Memory Monitoring
Cost Tracking
Project
Build a high-performance LLM serving system
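A serving system usually starts from a streaming HTTP endpoint. The sketch below assumes FastAPI and uvicorn are available; generate_tokens() is a placeholder for a real backend such as vLLM or TGI behind the same interface.

```python
# Skeleton of a streaming text-generation endpoint.
import asyncio
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

app = FastAPI()

class GenerationRequest(BaseModel):
    prompt: str
    max_tokens: int = 128

async def generate_tokens(prompt: str, max_tokens: int):
    # Placeholder backend: echoes the prompt word by word to demonstrate streaming.
    for word in prompt.split()[:max_tokens]:
        await asyncio.sleep(0.05)          # simulate per-token latency
        yield word + " "

@app.post("/generate")
async def generate(req: GenerationRequest):
    # Streaming the response keeps time-to-first-token low for clients.
    return StreamingResponse(generate_tokens(req.prompt, req.max_tokens),
                             media_type="text/plain")

# Run with: uvicorn serve:app --host 0.0.0.0 --port 8000
```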
Topics:
Duration: 2 Weeks
10.1 Application Architectures
Week 1: Building LLM Apps
RAG Systems
Architecture Design
Retrieval Strategies
Context Management
Answer Generation
Citation Systems
Agent Systems
Tool Integration
Memory Systems
Planning Modules
Action Execution
Multi-Agent Coordination
Conversational AI
Dialogue Management
Context Tracking
Persona Consistency
Turn-Taking
Error Recovery
Code Generation Systems
IDE Integration
Context Collection
Code Completion
Refactoring Support
Test Generation
10.2 Advanced Applications
Week 2: Complex Systems
Hybrid Systems
LLM + Traditional ML
Rule-Based Integration
Knowledge Graph Integration
Database Integration
External API Calls
Streaming Applications
Real-time Processing
Event-Driven Architecture
Stream Processing
WebSocket Implementation
Server-Sent Events
Multimodal Applications
Image + Text
Audio + Text
Video Understanding
Document Processing
Cross-Modal Search
Enterprise Integration
Authentication/Authorization
Audit Logging
Compliance Features
Data Privacy
Security Measures
Lab
Build a production LLM application
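As a small-scale illustration of the RAG pattern covered in this module, the sketch below embeds documents, retrieves by cosine similarity, and builds a grounded prompt. The hashed bag-of-words embedder and the llm() call are stand-ins for a real embedding model and LLM endpoint.

```python
# Minimal retrieval-augmented generation loop.
import numpy as np

DOCS = [
    "The transformer architecture was introduced in the paper 'Attention Is All You Need'.",
    "LoRA fine-tunes large models by learning low-rank weight updates.",
    "vLLM uses PagedAttention to manage the KV cache efficiently.",
]

def embed(text: str) -> np.ndarray:
    # Stand-in embedding: hashed bag of words. Replace with a real embedding model.
    vec = np.zeros(256)
    for token in text.lower().split():
        vec[hash(token) % 256] += 1.0
    return vec / (np.linalg.norm(vec) + 1e-8)

DOC_VECS = np.stack([embed(d) for d in DOCS])

def retrieve(query: str, k: int = 2):
    scores = DOC_VECS @ embed(query)                 # cosine similarity (unit vectors)
    return [DOCS[i] for i in np.argsort(scores)[::-1][:k]]

def llm(prompt: str) -> str:
    # Placeholder LLM call; swap in a real generation backend.
    return "[LLM response would be generated here]\n" + prompt[:120]

def answer(query: str) -> str:
    context = "\n".join(retrieve(query))
    prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}\nAnswer:"
    return llm(prompt)

if __name__ == "__main__":
    print(answer("How does vLLM manage the KV cache?"))
```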
Topics:
Duration: 1 Week
11.1 Model Evaluation
Automatic Evaluation
Perplexity Measurement
BLEU, ROUGE Scores
BERTScore
Task-Specific Metrics
Benchmark Suites (MMLU, HellaSwag)
Human Evaluation
Evaluation Protocols
Inter-Rater Agreement
Bias Mitigation
Cost Management
Platform Design
A/B Testing
Experiment Design
Statistical Significance
Metric Selection
User Segmentation
Result Analysis
11.2 Quality Assurance
Testing Strategies
Unit Testing for LLMs
Integration Testing
Regression Testing
Performance Testing
Security Testing
Safety Evaluation
Toxicity Detection
Bias Measurement
Hallucination Detection
Factuality Checking
Adversarial Testing
Project
Implement comprehensive evaluation for an LLM system
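One metric from the automatic-evaluation list above, perplexity, can be computed directly as the exponential of the mean next-token cross-entropy on held-out text. The model and data below are stand-ins for the system under evaluation.

```python
# Perplexity of a causal LM on held-out token sequences.
import torch
import torch.nn as nn
import torch.nn.functional as F

@torch.no_grad()
def perplexity(model, token_batches, vocab_size):
    model.eval()
    total_nll, total_tokens = 0.0, 0
    for batch in token_batches:                      # batch: (B, T+1) token ids
        inputs, targets = batch[:, :-1], batch[:, 1:]
        logits = model(inputs)
        nll = F.cross_entropy(logits.reshape(-1, vocab_size),
                              targets.reshape(-1), reduction="sum")
        total_nll += nll.item()
        total_tokens += targets.numel()
    return float(torch.exp(torch.tensor(total_nll / total_tokens)))

if __name__ == "__main__":
    vocab = 100
    model = nn.Sequential(nn.Embedding(vocab, 32), nn.Linear(32, vocab))  # stand-in LM
    data = [torch.randint(0, vocab, (4, 33)) for _ in range(8)]
    # An untrained model should score close to the vocabulary size (near-uniform predictions).
    print(f"perplexity: {perplexity(model, data, vocab):.1f}")
```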
Topics:
Duration: 1 Week
12.1 Cutting-Edge Research
Emerging Architectures
Mamba and State Space Models
RWKV Architecture
Retentive Networks
Hyena Hierarchy
Linear Transformers
Training Innovations
Flash Attention 3
Ring Attention
Blockwise Parallel
Sequence Parallelism
Pipeline Bubbles Reduction
Efficiency Research
1-bit LLMs
Extreme Quantization
Sparse Models
Conditional Computation
Early Exit Strategies
12.2 Future Directions
AGI Progress
Reasoning Capabilities
Planning and Agency
Tool Use Evolution
Multi-step Reasoning
Self-Improvement
Technical Challenges
Hallucination Mitigation
Consistency Improvement
Context Length Scaling
Multimodal Integration
Embodied Intelligence
Lab
Implement a research paper’s novel technique
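As an example of the scale such a lab might target, here is a compact reproduction of one published idea: linear attention with the elu(x)+1 feature map from "Transformers are RNNs" (Katharopoulos et al., 2020), which replaces softmax attention's quadratic cost with cost linear in sequence length. Shapes and the non-causal formulation are illustrative.

```python
# Linear attention via a positive feature map phi(x) = elu(x) + 1.
import torch
import torch.nn.functional as F

def feature_map(x):
    return F.elu(x) + 1          # keeps attention weights non-negative

def linear_attention(q, k, v, eps=1e-6):
    # q, k, v: (batch, heads, seq, d_head)
    q, k = feature_map(q), feature_map(k)
    kv = torch.einsum("bhsd,bhse->bhde", k, v)              # sum_s phi(k_s) v_s^T
    z = 1.0 / (torch.einsum("bhsd,bhd->bhs", q, k.sum(dim=2)) + eps)   # normalizer
    return torch.einsum("bhsd,bhde,bhs->bhse", q, kv, z)

if __name__ == "__main__":
    q, k, v = (torch.randn(2, 4, 128, 32) for _ in range(3))
    print(linear_attention(q, k, v).shape)   # torch.Size([2, 4, 128, 32])
```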
TOOLS & PLATFORMS
Real-Time Projects
Build a complete multi-agent customer service system with:
- Natural language understanding
- Intent recognition and routing
- Knowledge base integration
- Escalation handling
- Sentiment analysis
- Performance monitoring
Develop an AI research agent capable of:
- Literature review automation
- Data collection and analysis
- Report generation
- Citation management
- Collaborative research
- Quality validation
Create an agent system for business process automation:
- Workflow orchestration
- Document processing
- Decision automation
- Integration with enterprise systems
- Compliance checking
- Performance optimization
100,000+ learners uplifted through our hybrid classroom & online training, enriched by real-time projects and job support.
Come and chat with us about your goals over a cup of coffee.
2nd Floor, Hitech City Rd, Above Domino's, opp. Cyber Towers, Jai Hind Enclave, Hyderabad, Telangana.
3rd Floor, Site No 1&2 Saroj Square, Whitefield Main Road, Munnekollal Village Post, Marathahalli, Bengaluru, Karnataka.