Class 014 · DATA SCIENCE & AI AGENTS · RESEARCH + AGENTIC AI

Data Science
+ AI Agents

Master end-to-end Data Science with Agentic AI. Build Python and PostgreSQL data foundations, ship statistical and ML experiments with proper cross-validation, engineer Deep Learning and NLP models with interpretability, and deploy a Data Science AI Agent that runs autonomous EDA and hypothesis generation.

3mo

duration

50+

modules

4.7/5

class rating

100k+

enrolled

What you'll ship by week 14

DSA-014 · 13 JUL

A rigorous data science portfolio

EDA reports with data quality assessment, hypothesis tests with p-values, A/B analyses, and regression models backed by statistical rigour.

A complete ML modelling portfolio

Supervised, unsupervised, and reinforcement learning with scikit-learn — Optuna tuning, MLflow tracking, and SHAP/LIME interpretability.

A Deep Learning + NLP applied portfolio

CNNs, RNNs, and Transformers for vision and sequence modelling, plus a BERT-based NLP pipeline deployed via FastAPI with attention visualisation.

A deployed Data Science AI Agent

A production agent on LangGraph + Claude Agent SDK + MCP that runs autonomous EDA, generates hypotheses, and proposes experiments with LangSmith observability.

Where our Data Science alumni work

Microsoft

Amazon

Salesforce

ServiceNow

Deloitte

Infosys

Accenture

TCS

Wipro

Capgemini

Cognizant

HCL

What you leave with

Four things every Data Scientist grad walks away with.

Agent-Ready data science skills

Statistical rigour plus the full ML/DL/NLP and GenAI stack — NumPy, Pandas, PyTorch, TensorFlow, LangGraph, Claude Agent SDK, MCP — production-grade, not notebook demos.

A shipped project

A production-deployed Data Science AI Agent built with LangGraph + Claude Agent SDK + MCP that runs autonomous EDA, statistical tests, and feature engineering with a public verification URL.

Verifiable credential

2026 Agent-Ready rubric mapped to DP-100, AWS ML Specialty, Google Pro ML Engineer, and TensorFlow Developer, graded 1–5, with a public verification URL recruiters can check in 30 seconds.

Direct placement pipeline

GitHub + LinkedIn portfolio rewrite, DS-tuned resume rebuild, and warm intros into our 1,000+ hiring partners actively staffing Data Scientist and ML/AI Research roles.

3 MONTHS · FOUR PHASES · ONE DATA SCIENCE AGENT

From “plots a scatter chart” to “ships intelligent data agents.”

Weeks 1–2 · Data Foundations

Python + PostgreSQL for Data Scientists

Python fundamentals, OOP, decorators, generators, and context managers
Data structures and file handling for messy real-world CSV/JSON
PostgreSQL — DDL, DML, JOINs, window functions, CTEs, PL/pgSQL
Database design and query optimisation with EXPLAIN ANALYZE

YOU SHIP: A production-quality Python codebase plus a normalised PostgreSQL schema with analytics queries, the foundation every later ML run sits on.

Weeks 3–7 · Scientific Toolkit

Power BI + Math/Stats + Python Libraries

Power BI with DAX, time intelligence, and Row-Level Security
Linear algebra, calculus, probability, and Bayes' theorem foundations
Hypothesis testing, A/B tests, power analysis, and Bayesian inference
NumPy, Pandas, Matplotlib, Seaborn, and Plotly for data work

YOU SHIP: A rigorous EDA portfolio with hypothesis tests, A/B analyses with power calculations, and a Power BI dashboard suite.

Weeks 8–12 · ML + Deep Learning + NLP

Machine Learning + Deep Learning + NLP

Regression, classification, and tree ensembles including XGBoost
Unsupervised learning, clustering, PCA, t-SNE, and UMAP
Model evaluation, Optuna tuning, MLflow tracking, SHAP and LIME
Deep learning with CNNs, Transformers, and modern NLP pipelines

YOU SHIP: A complete ML/DL/NLP portfolio: cross-validated models, a CNN image classifier, and a Transformer NLP model with attention.

Weeks 12–14 · GenAI + Agentic AI

Master the 2026 GenAI + Agentic AI stack — and ship a Data Science AI Agent that runs autonomous EDA, proposes experiments, and explains results in plain English.

Engineer with LLM APIs from OpenAI, Anthropic, Google GenAI, and DeepSeek. Master prompt engineering (zero-shot, few-shot, CoT, ReAct) and context engineering — the 2026 frontier discipline. Build production RAG pipelines with ChromaDB, Pinecone, Qdrant, and pgvector over your datasets, notebooks, and research papers. Master the 2026 production agent stack — LangGraph 1.0 (#1 production default), Claude Agent SDK (#2 MCP-native), CrewAI (#3 multi-agent crews). Wire it all through the Model Context Protocol (MCP) — 200+ server implementations, 97M+ monthly SDK downloads. Final project — a deployed Data Science AI Agent with MCP servers exposing your data sources, trained models, and statistical libraries. The data scientist’s force multiplier.

Partner orgs (2026): 54

Data Science projects deployed: 300+

→ Placement offers: 86%

Course curriculum

Eight sections. 65+ modules. The AI-native data science stack.

Python for AI & Data

The dominant language for data science. Master Python syntax, data structures, and advanced programming concepts essential for AI and data science applications. 10 modules from environment setup through advanced OOP — the language fluency that makes every later section possible.

10 MODULES
SECTION 1

Python interpreter installation for Windows and Mac

Visual Studio Code + Jupyter for data science workflows

Python's 35 essential keywords

Identifiers and naming conventions

Variables and memory management fundamentals

Simple and complex data types

Type conversion and casting

Arithmetic, comparison, and logical operators

User input with input() function

Conditional statements — if, elif, else, match-case

while and for loops with range() function

break, continue, and pass statements

String definition and syntax rules

Positive and negative indexing

Slicing with start:end:step notation

Concatenation and repetition operations

f-strings and .format() for analysis output formatting

Immutability concept and implications

String methods — case conversion, search, checking, trimming, replacement, split/join

Critical for text data preprocessing in NLP analyses

Lists — creation, indexing, slicing, modification operations

List comprehensions for elegant data transformation

Sorting, reversing, copying patterns

Tuples — creation and basic operations

Tuple packing and unpacking

Performance advantages over lists

Use cases for immutability in scientific computing

Dictionaries — creation, access, operations

Dictionary comprehensions

Nested dictionaries for structured experimental data

Essential for representing experimental configurations and results

Sets and the UUU properties (Unique, Unordered, Unindexed)

Mathematical operations — union, intersection, difference

Subset and superset checks

Applications in deduplication and set algebra

Collections module — namedtuple, Counter, defaultdict, deque

Iteration protocol and custom iterators

Generators using yield for memory-efficient processing of large datasets

Generator expressions and pipelines

Lambda functions for anonymous operations

Higher-order functions — map(), filter(), reduce()

Functional patterns for reproducible data transformations

Function definition, parameters, return values

Default arguments, *args, **kwargs

Variable scope (LEGB rule)

First-class functions and higher-order patterns

Recursion and recursive design patterns

Type hints (Python 3.5+) — essential for self-documenting analysis code

Documenting functions with docstrings (PEP 257)

Built-in modules, user-defined modules, packages

pip for package management

requirements.txt for reproducible analyses

Virtual environments for project isolation

Reproducibility is non-negotiable in data science — invest here

CRUD operations with open()

File modes and pathlib

CSV files — csv module for tabular data ingestion

JSON files — json module for API responses and config files

The most common data input formats for data science work

Exception Handling — robust error handling for unreliable data sources

Decorators — for logging experiment runs, timing functions, caching results

Generators Deep Dive — memory efficiency for processing larger-than-RAM datasets

Context Managers — proper resource management for database connections, file handles

These four patterns are what every senior data science role expects fluency in

Object-Oriented Programming models real-world entities through classes and objects — essential for structuring data pipelines, custom estimators, and reproducible analysis workflows

Classes, objects, methods, special methods

Instance variables vs class variables

Encapsulation: naming conventions (_protected, __private) and properties control data visibility

Inheritance — single, multi-level, and multiple inheritance

Abstraction — abstract classes and methods

Polymorphism — method overriding and duck typing

scikit-learn's estimator pattern relies on OOP — master it before Section 6
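
The four patterns named in this section (exception handling, decorators, generators, context managers) can be sketched together in a few lines. This is a minimal illustration using only the standard library, not course material: the function names, the `SAMPLE_CSV` data, and the use of an in-memory `io.StringIO` in place of a real file or database handle are all assumptions made for the example.

```python
import csv
import functools
import io
import time
from contextlib import contextmanager


def timed(func):
    """Decorator: store the duration of the most recent call on the wrapper."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            wrapper.last_seconds = time.perf_counter() - start
    wrapper.last_seconds = None
    return wrapper


def read_rows(handle):
    """Generator: yield one parsed row at a time instead of loading everything."""
    for row in csv.DictReader(handle):
        try:
            yield {"id": int(row["id"]), "value": float(row["value"])}
        except (KeyError, TypeError, ValueError):
            # Exception handling: skip malformed rows from an unreliable source.
            continue


@contextmanager
def open_source(text):
    """Context manager: guarantee the handle is closed, even on errors."""
    handle = io.StringIO(text)  # hypothetical stand-in for a file or DB connection
    try:
        yield handle
    finally:
        handle.close()


@timed
def mean_value(text):
    """Stream rows through the generator and average the valid values."""
    with open_source(text) as handle:
        total, count = 0.0, 0
        for row in read_rows(handle):
            total += row["value"]
            count += 1
    return total / count if count else float("nan")


SAMPLE_CSV = "id,value\n1,2.0\n2,not-a-number\n3,4.0\n"
result = mean_value(SAMPLE_CSV)
```

The malformed second row is skipped rather than crashing the run, and the timing is available afterwards as `mean_value.last_seconds`.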

SQL for AI & Data

The data backbone of every data scientist's daily workflow. Five modules covering PostgreSQL from foundations through advanced analytics queries — window functions, CTEs, programming with PL/pgSQL — the data layer that powers your experiments and your A/B test analyses.

5 MODULES
SECTION 2

Databases, DBMS, RDBMS — concepts and terminology

ACID properties — Atomicity, Consistency, Isolation, Durability

PostgreSQL setup, psql, pgAdmin 4, DBeaver

Data types — numeric, character, date/time, boolean, JSON, arrays

Constraints — PRIMARY KEY, FOREIGN KEY, UNIQUE, NOT NULL, CHECK, DEFAULT

SELECT statements and column projection

WHERE clauses with operators and conditions

Built-in functions — string, numeric, date, conditional

Aggregates — COUNT, SUM, AVG, MIN, MAX

GROUP BY and HAVING for aggregation

Window functions — ROW_NUMBER, RANK, DENSE_RANK, LAG, LEAD — essential for cohort analysis and time-series work

JOIN operations — INNER, LEFT, RIGHT, FULL OUTER, CROSS, SELF

Subqueries — scalar, row, table subqueries

CTEs (Common Table Expressions) — the data scientist's favourite for readable analytical queries

Recursive CTEs for hierarchical data

Set operators — UNION, UNION ALL, INTERSECT, EXCEPT

DML — INSERT, UPDATE, DELETE patterns

Transactions — atomicity for data integrity

ALTER TABLE for schema evolution

Indexes — B-tree, Hash, GiST, GIN

Views — virtual tables, materialized views for caching analytical aggregates

PL/pgSQL — variables, control structures, exception handling

Stored procedures vs functions

Triggers — automation for data quality checks

ER modelling, normalization (1NF, 2NF, 3NF)

OLTP vs analytics workload patterns

Star schema design for analytics

Query plan analysis with EXPLAIN and EXPLAIN ANALYZE

Index strategies

VACUUM, ANALYZE, partitioning

A scientist who can't read EXPLAIN ANALYZE is at the mercy of slow queries forever — invest here
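
As a rough sketch of how a CTE and window functions combine in an analytical query, the example below ranks each user's orders and compares every order with the previous one. To stay self-contained it runs on Python's built-in `sqlite3` module rather than PostgreSQL (an assumption of this example; it needs SQLite 3.25 or newer for window functions). The `orders` table and its rows are invented for illustration; the `WITH`, `ROW_NUMBER`, and `LAG` syntax shown is the same in PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (
        user_id INTEGER NOT NULL,
        order_date TEXT NOT NULL,
        amount REAL NOT NULL CHECK (amount >= 0)
    );
    INSERT INTO orders VALUES
        (1, '2026-01-05', 20.0),
        (1, '2026-01-19', 35.0),
        (2, '2026-01-07', 50.0),
        (2, '2026-02-02', 10.0),
        (2, '2026-02-20', 15.0);
    """
)

# The CTE names the intermediate result; the window functions work per user.
QUERY = """
WITH ranked AS (
    SELECT
        user_id,
        order_date,
        amount,
        ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY order_date) AS order_rank,
        LAG(amount) OVER (PARTITION BY user_id ORDER BY order_date) AS previous_amount
    FROM orders
)
SELECT user_id, order_date, order_rank, amount - previous_amount AS change
FROM ranked
WHERE order_rank > 1
ORDER BY user_id, order_date;
"""

rows = conn.execute(QUERY).fetchall()
conn.close()
```

Each returned row holds a user's second or later order together with the change in amount from that user's previous order.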

Power BI for Data Analysis

The data scientist's tool for stakeholder communication. Power BI provides a comprehensive platform for connecting to data sources, creating interactive visualisations, and sharing insights across organisations through intuitive dashboards and reports — turning your analysis into the kind of executive dashboards that drive business decisions.

10 MODULES
SECTION 3

BI fundamentals and modern analytics approaches

Power BI components — Desktop, Service, Mobile, Gateway

Interface navigation, workspace setup, first report creation

Desktop versus Service capabilities

File, database, cloud, web sources

Connection modes — Import, DirectQuery, Live Connection

Performance considerations for analytical workloads

M language fundamentals

Step-by-step transformations

Data profiling and quality assessment

Reshaping operations

Star schema design for analytical reporting

Relationships — one-to-many, many-to-many

Hierarchies — date, geography, organisational

Calculated columns vs measures

Core visualisations — bar, line, scatter, pie, treemap, waterfall

Choosing the right chart for the right insight

Interactive elements — slicers, filters, bookmarks, drill-through

DAX syntax and structure

Aggregation, logical, text, date/time functions

CALCULATE and FILTER — the heart of DAX

Variables for performance and readability

Year-over-year, quarter-over-quarter comparisons

Period-over-period growth calculations

Custom calendar handling

Iterator functions (SUMX, AVERAGEX, COUNTX)

ALL, ALLEXCEPT for filter manipulation

AI visuals — Q&A, Key Influencers, Decomposition Tree

Workspaces, apps for distribution

Sharing and access management

Row-Level Security (RLS) — dynamic security with USERPRINCIPALNAME()

Sensitivity labels

DAX performance tuning, composite models

Foundation for the Microsoft Certified: Power BI Data Analyst Associate (PL-300) certification

Math & Stats for AI & Data

The mathematical foundation that distinguishes Data Scientists from data analysts. Four modules covering linear algebra, calculus, probability theory, and inferential statistics — including Bayesian methods — the rigorous foundation behind every meaningful analytical recommendation.

4 MODULES
SECTION 4

Set theory and logical operations

Functions and graphs — linear, polynomial, exponential, logarithmic

Understanding mathematical relationships

Vector operations, dot product, magnitude

Matrix creation, operations, transpose, inverse, determinant

Matrix algebra and applications in PCA, regression

Systems of linear equations

Gaussian elimination method

Eigenvalues and eigenvectors

Diagonalisation and matrix decomposition

Introduction to PCA intuition

Limits, derivatives, partial derivatives

The chain rule — the mathematics of model training

Optimisation foundations

Definite and indefinite integrals

Probability density functions

Critical points and extrema

Gradient descent — the engine of ML training

Higher-dimensional optimisation

Sample spaces, events, probability axioms

Conditional probability and Bayes' Theorem

Independent vs dependent events

The probability chain rule

Discrete distributions — Bernoulli, Binomial, Poisson

Continuous distributions — Normal, Uniform, Exponential

Distribution properties — mean, variance, skewness

Central Limit Theorem

Joint and marginal distributions

Covariance and correlation

Multivariate Normal distribution

Measures of central tendency, spread

Distribution shape — skewness, kurtosis

The most important module for data scientists — get this right and your analyses are trustworthy

Null and alternative hypotheses

Type I and Type II errors

p-values, significance levels, statistical power

t-tests (one-sample, two-sample, paired)

ANOVA for comparing multiple groups

Chi-square tests for categorical data

Non-parametric tests — Mann-Whitney, Kruskal-Wallis

A/B Testing — proper experimental design

Power analysis and sample size calculation

Multiple testing correction (Bonferroni, FDR)

Sequential testing pitfalls

p-hacking and reproducibility

Confidence intervals — construction and interpretation

Bootstrap confidence intervals

Bayesian vs frequentist statistics

Priors, likelihoods, and posteriors

Bayesian A/B testing — modern best practice

Introduction to MCMC concepts

PyMC and Stan ecosystems

The hypothesis testing and Bayesian methods you learn here are what make every later ML evaluation defensible to scientific scrutiny
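
To make the power analysis and multiple testing topics concrete, here is a small sketch that sizes an A/B test on conversion rates. It assumes a two-sided two-proportion z-test with the usual normal approximation and unpooled variances; the function names and the 10% versus 12% rates are illustrative choices, not figures from the course.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p_control, p_variant, alpha=0.05, power=0.80):
    """Approximate n per arm for a two-sided two-proportion z-test."""
    if p_control == p_variant:
        raise ValueError("The two rates must differ to size a test.")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the target power
    variance = p_control * (1 - p_control) + p_variant * (1 - p_variant)
    effect = p_variant - p_control
    return math.ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate <= alpha."""
    return alpha / n_tests


# One comparison at alpha = 0.05, then the same comparison as one of five tests.
n_single = sample_size_per_group(0.10, 0.12)
n_corrected = sample_size_per_group(0.10, 0.12, alpha=bonferroni_alpha(0.05, 5))
```

Tightening alpha through the Bonferroni correction raises the required sample size, which is why the number of planned comparisons belongs in the design stage rather than after the data arrives.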

Python Libraries for AI & Data

The Python ecosystem that powers daily data science work. Five modules covering NumPy for vectorised computing, Pandas for data manipulation, and the visualisation trio (Matplotlib, Seaborn, Plotly) — your daily-driver libraries.

5 MODULES
SECTION 5

N-dimensional arrays — the foundation of all numerical Python

Array creation, indexing, slicing

Vectorised operations and broadcasting

Aggregations along axes

Linear algebra with np.linalg

Why vectorisation is often orders of magnitude faster than Python loops

Series and DataFrame structures

Index manipulation and MultiIndex

Data loading — read_csv, read_excel, read_sql, read_json

head, tail, info, describe

Handling missing data — multiple imputation strategies

Removing duplicates, type conversion, string operations

groupby operations — the split-apply-combine pattern

Pivot tables and crosstabs

Merging and joining DataFrames

Reshaping — pivot, melt, stack, unstack

Time series — DateTime indexing, resampling, rolling windows

Time zone handling

Figure and Axes architecture

Object-oriented vs pyplot interfaces

Plot types — line, scatter, bar, histogram, box, heatmap

Customisation, annotations, legends

Subplots with subplots and GridSpec

Saving figures at publication quality

Distribution plots — histplot, kdeplot, boxplot, violinplot

Relational plots — scatterplot, lineplot, relplot

Categorical plots — barplot, countplot, pointplot

Matrix plots — heatmap, clustermap

FacetGrid for multi-panel comparative plots

PairPlot and JointPlot for distribution comparisons

Plotly Express for quick interactive plots

Plotly Graph Objects for custom control

Subplots and dashboards

Animation frames for time-series visualisations

Plotly Dash for web-deployed analytical dashboards

Machine Learning

The heart of data science — 10 modules covering supervised, unsupervised, and reinforcement learning, with proper cross-validation, hyperparameter tuning, and model interpretability (SHAP, LIME). Every module emphasises scientific rigour, reproducibility, and evidence-based modelling decisions.

10 MODULES
SECTION 6

What is Machine Learning? AI vs ML vs Deep Learning

The scientific method applied to learning algorithms

The ML workflow: data → features → model → evaluation → interpretation

Supervised learning with labelled data

Unsupervised learning for pattern discovery

Reinforcement learning through interaction

Models, parameters, and hyperparameters

Role of optimisers in training

Train/validation/test splits — proper experimental design

Handling missing values — multiple imputation strategies

Encoding categorical variables — one-hot, label, target encoding

Feature scaling — StandardScaler, MinMaxScaler, RobustScaler

Feature engineering — polynomial features, binning, datetime features

Class imbalance — SMOTE, undersampling, class weights, threshold tuning

Simple and multiple linear regression

Ordinary Least Squares (OLS) estimation

Coefficient interpretation

Assumption checking — linearity, normality, homoscedasticity, independence

Diagnostic plots — residual plots, Q-Q plots

Train/test split and cross-validation

Regression metrics — MAE, MSE, RMSE, R², MAPE

Bias-variance tradeoff

Learning curves for model diagnosis

Gradient descent — batch, stochastic, mini-batch

Learning rate selection and convergence diagnostics

Multivariate regression with multicollinearity (VIF)

Interaction terms and their interpretation

Polynomial regression and feature transformation

Bias-Variance Tradeoff fundamentals

Understanding overfitting vs underfitting

Ridge Regression (L2) and Lasso Regression (L1)

Elastic Net combining both

Cross-validation for hyperparameter selection

GridSearchCV, RandomizedSearchCV, Optuna for Bayesian optimisation

Binary vs multi-class classification

Logistic regression and sigmoid function

Log Loss (Binary Cross-Entropy)

Decision boundary visualisation

Multi-class strategies (One-vs-Rest, One-vs-One, Softmax)

Naive Bayes — probabilistic classification based on Bayes' Theorem

Types of Naive Bayes (Gaussian, Multinomial, Bernoulli)

Laplace smoothing for zero probabilities

Applications in spam detection and sentiment analysis

Tree construction algorithms (ID3, C4.5, CART)

Splitting criteria — Gini impurity, entropy, information gain

Tree pruning to prevent overfitting

Interpretability advantages — trees you can explain to executives

Random Forests — ensemble methods, bagging and bootstrap aggregation

Feature importance ranking

Out-of-bag (OOB) evaluation

Gradient Boosting — XGBoost, LightGBM, CatBoost — the production champions

Boosting vs bagging

Early stopping and learning rate scheduling

Feature importance in boosted models

Linear and non-linear SVMs

Kernel trick — RBF, polynomial, sigmoid kernels

Soft-margin classification

K-Nearest Neighbours — distance metrics and weighting

Curse of dimensionality

Optimal k selection

Classification metrics — accuracy, precision, recall, F1-score

ROC curve and AUC

Precision-recall curves for imbalanced data

Confusion matrix interpretation

Calibration curves and threshold tuning

SHAP (SHapley Additive exPlanations) — game-theoretic feature attribution

LIME (Local Interpretable Model-agnostic Explanations)

Partial dependence plots

Permutation importance

Interpretable models build stakeholder trust — and meet regulatory requirements (EU AI Act)

Discover hidden patterns in unlabelled data through clustering and dimensionality reduction

K-Means Clustering and elbow method

Hierarchical Clustering and dendrograms

DBSCAN (density-based clustering)

Gaussian Mixture Models (GMM)

Clustering evaluation metrics — silhouette, Davies-Bouldin

Principal Component Analysis (PCA) for feature extraction

t-SNE for visualisation

UMAP for non-linear manifold learning

Curse of dimensionality

Feature extraction vs feature selection

RL fundamentals and paradigm

Agent, Environment, State, Action, Reward

Markov Decision Process (MDP)

Q-Learning Algorithm and Bellman equation

Exploration vs Exploitation tradeoff

Model persistence (pickle, joblib, ONNX)

Creating prediction APIs (Flask/FastAPI)

Model versioning and A/B testing

Model monitoring and drift detection

ML best practices and ethics

Responsible AI principles

MLflow for reproducible experiment tracking

Section Project — build, evaluate, and deploy a complete ML model with SHAP interpretability, proper cross-validation, and a publication-quality results report
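
The idea behind "proper cross-validation" is that every sample is scored exactly once by a model that never saw it during fitting. The sketch below shows that mechanic in plain Python with a mean-only baseline standing in for a real model; in practice scikit-learn's `KFold` and `cross_val_score` do this work. The helper names and the `targets` values are assumptions for illustration.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, valid_idx) pairs; each sample is validated exactly once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed keeps runs reproducible
    folds = [indices[i::k] for i in range(k)]
    for i, valid_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, valid_idx


def cross_val_mae(y, k=5, seed=0):
    """Score a mean-only baseline: fit on the train folds, evaluate on the held-out fold."""
    scores = []
    for train_idx, valid_idx in k_fold_indices(len(y), k, seed):
        prediction = mean(y[j] for j in train_idx)  # "fit" uses training rows only
        scores.append(mean(abs(y[j] - prediction) for j in valid_idx))
    return scores


targets = [3.0, 5.0, 4.0, 6.0, 8.0, 7.0, 5.5, 4.5, 6.5, 7.5]
fold_scores = cross_val_mae(targets, k=5)
```

Reporting the spread of `fold_scores` alongside their mean gives a more honest picture of model stability than a single train/test split.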

Deep Learning & NLP

Where data science meets the frontier. 10 modules covering neural networks from scratch through PyTorch/TensorFlow, CNNs for computer vision, RNNs and Transformers for sequence work, and complete NLP pipelines including named entity recognition — all with the interpretability and rigour data scientists demand.

10 MODULES
SECTION 7

Introduction to artificial neural networks and perceptrons

Activation functions — sigmoid, tanh, ReLU, and variants

Forward propagation and backpropagation algorithms

Gradient descent and optimisation techniques (SGD, Adam, RMSprop)

Loss functions and performance metrics

Building ANNs from scratch in NumPy — understand before you use

Overfitting, underfitting, and regularisation

Introduction to PyTorch and TensorFlow

Tensors, computational graphs, and automatic differentiation

Building neural networks with PyTorch nn.Module

Data loading, batching, and augmentation

Training loops, validation, and checkpointing

GPU acceleration and performance optimisation

Debugging neural networks

Image representations and pixel manipulation

Image augmentation techniques

Transfer learning fundamentals

CNN architecture — convolution, pooling, padding, stride

Famous architectures — LeNet, AlexNet, VGG, ResNet, Inception

Transfer learning with pre-trained models

Image classification end-to-end

Attention visualisation and feature maps for interpretability

Object detection introduction (YOLO, Faster R-CNN)

RNN architecture and challenges

Vanishing/exploding gradients

LSTM (Long Short-Term Memory) networks

GRU (Gated Recurrent Units)

Bidirectional RNNs

Time series prediction with RNNs

Attention mechanism fundamentals

Self-attention and multi-head attention

Transformer architecture — encoder, decoder, positional encoding

Why Transformers replaced RNNs for most NLP

Introduction to BERT and GPT architectures

Text preprocessing and tokenisation

Bag of Words and TF-IDF

Word embeddings — Word2Vec, GloVe, FastText

Training custom word embeddings

Language modelling fundamentals

N-gram models and neural language models

Evaluation metrics for NLP

Text classification pipeline design

Feature extraction for text data

CNN for text classification

RNN and LSTM for text classification

Sentiment analysis techniques

Multi-class and multi-label classification

Handling imbalanced text datasets

Model evaluation and error analysis

Sequence-to-sequence architecture deep dive

Machine translation fundamentals

Encoder-decoder models

Handling long sequences

Beam search and decoding strategies

Evaluation metrics (BLEU, ROUGE)

Training strategies

Named Entity Recognition (NER) fundamentals

Part-of-speech tagging

Sequence labelling with RNNs and LSTMs

BiLSTM-CRF for NER

Information extraction

Relation extraction

Domain-specific NER systems

Evaluation metrics

Section Project — a computer vision model (CNN) plus an NLP model (BERT-based), both deployed with proper evaluation, attention visualisation, and a publication-quality results report
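
The self-attention topic in this section, and the attention weights that the visualisations display, can be sketched without any framework. The example below is a simplified single-head scaled dot-product attention in plain Python; it assumes no learned projection matrices (queries, keys, and values are the raw token vectors), and the `tokens` values are made up.

```python
import math


def softmax(scores):
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Return (outputs, weights) for single-head scaled dot-product attention."""
    scale = math.sqrt(len(keys[0]))  # divide scores by sqrt(d_k)
    outputs, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        w = softmax(scores)  # one weight per key, summing to 1
        weights.append(w)
        outputs.append(
            [sum(wj * v[d] for wj, v in zip(w, values)) for d in range(len(values[0]))]
        )
    return outputs, weights


# Self-attention: the same vectors act as queries, keys, and values.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(tokens, tokens, tokens)
```

Each row of `weights` is what an attention heatmap plots: how strongly one token attends to every other token.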

Generative AI & Agentic AI

The frontier — and the culmination of the data science programme. 10 modules covering the complete 2026 GenAI engineering stack tuned for data science work: frontier models, prompt engineering, RAG over scientific literature, agent frameworks, and the Model Context Protocol. The named Data Science AI Agent project lives here.

10 MODULES
SECTION 8

Narrow AI — image classifiers, speech recognition (pre-2022)

Generative AI — LLMs, image/video/audio generation (post-2022)

Agentic AI — Plan / Reason / Act / Learn loops (post-2024)

2022 inflection point — ChatGPT launch

2024 inflection point — Agentic emergence

For Data Scientists — AI assistants that draft EDA reports

Hypothesis generation from data summaries

Autonomous experiment design (with human approval)

GPT-5.5 — Terminal-Bench 2.0 leader at 82.7%. Best for autonomous research agents

Claude Opus 4.7 — SWE-bench Pro leader at 64.3%, lowest hallucination rate at 36%. Best for long-form analysis writing and statistical reasoning

Gemini 3.1 Pro — 2M+ token context window. Best for ingesting entire research papers and codebases

Open-source frontier — Llama 4, DeepSeek, Mistral, Qwen — for sensitive data workflows

Intelligent routing — Opus 4.7 for analytical writing and interpretation

GPT-5.5 for autonomous research agents

Gemini 3.1 Pro for massive-context analysis

Perplexity — citation-grounded research

NotebookLM — long-document analysis for papers

Eight lessons culminating in a portfolio-ready 30+ prompt library for data science work

Fundamentals — Context + Task + Examples + Format + Constraints

Core Techniques — Zero-shot, few-shot, CoT, ReAct, Tree-of-Thought

System Prompts — persistent persona design, guardrails

Multimodal — reading plots, dashboards, hand-drawn diagrams

Hallucination & Context — grounding for accurate analyses

Domain & Library — data science prompt patterns

Context Engineering — the 2026 frontier discipline; managing what enters the LLM's context window for accurate scientific reasoning

ChatGPT, Claude, Gemini for daily data science work

AI for analytical writing — reports, summaries, executive memos

Research with Perplexity for literature reviews

AI for code — GitHub Copilot, Cursor for analysis acceleration

Building data-science-specific AI workflows

Reading plots and dashboards with vision models

Analysing whiteboard sketches and hand-drawn DAGs

OCR for legacy research documents

Image generation for paper figures

Audio — Whisper for interview transcription

Multimodal LLMs now read your scatter plots and identify outliers better than novice analysts

Hallucination — when an LLM invents a statistical fact

Prompt injection through documents

Privacy — keeping research data out of public LLMs

Regulatory landscape — EU AI Act, India DPDP Act

Validating AI-generated analyses — the data scientist's new responsibility

Streamlit — rapid prototyping for internal data science apps

FastAPI — production-grade Python API for AI services

Building chatbots for analytical Q&A

Building diagnostic agents for data quality

Build and deploy a Streamlit + FastAPI internal tool

LLM APIs in production — OpenAI, Anthropic, Google GenAI, DeepSeek Python SDKs

API patterns — completions, streaming, function calling, structured outputs

Rate limits, retries, exponential backoff

Function calling & structured outputs — the 2026 production pattern for reliable JSON

Pydantic-validated structured outputs

Embeddings — OpenAI text-embedding-3-large, Voyage, Cohere

Vector databases — ChromaDB, Pinecone, Qdrant, pgvector

HNSW, IVF indexing strategies

RAG pipeline for data science — canonical flow applied to research literature, dataset documentation, and analytical reports

Chunking strategies — fixed-size, semantic, hierarchical

Hybrid search (BM25 + embeddings)

Re-ranking with cross-encoders

Agentic RAG — self-improving retrieval over scientific literature

Multi-step retrieval — find relevant papers, extract methods, compare findings

Project — Internal Data Science RAG App: RAG over your team's analyses, datasets, and research papers

LangGraph 1.0 — complex stateful workflows — the production default for autonomous data science

Claude Agent SDK — deepest MCP integration, extended thinking for complex analysis

CrewAI — multi-agent crews; use case: a "data science team" of agents (EDA Lead, Hypothesis Generator, Modeller, Reviewer)

Semantic Kernel / Microsoft Agent Framework

Pydantic AI — type-safe Python

ReAct — investigate a dataset, then propose an analysis

Plan-and-Execute — generate a multi-step analysis plan

Reflection loops — agent reviews its own analysis before finalising

Multi-agent collaboration — EDA agent proposes, Modeller agent builds, Reviewer agent critiques

Human-in-the-loop checkpoints — humans approve every interpretation-critical step

MCP — open standard for connecting agents to tools, data, and systems

Proposed by Anthropic in late 2024

Stewarded by Linux Foundation — 200+ servers, 97M+ monthly SDK downloads

Build an MCP server exposing your trained models for agent inspection

Build an MCP server exposing your data sources and notebooks

Connect LangGraph agents to multiple MCP servers

Use Claude Agent SDK's deepest native MCP integration

DATA SCIENCE AI AGENT CAPSTONE — multi-agent Data Science AI Agent using LangGraph + Claude Agent SDK with MCP servers exposing your data sources, trained models, and analysis notebooks

The agent runs autonomous EDA, suggests appropriate statistical tests, proposes feature engineering, and explains results in plain English — with human approval gates for every interpretation-critical step

Frontend with Streamlit, backend with FastAPI, observability via LangSmith — public verification URL with the 2026 Agent-Ready rubric — the named project for the entire Data Science & AI programme
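
The "rate limits, retries, exponential backoff" pattern listed in this section is the piece most agents need first in production. Below is a minimal sketch, independent of any particular LLM SDK: `TransientError` and `flaky_request` are hypothetical stand-ins for an SDK's rate-limit error and a real API call, and the default delays are arbitrary illustrative values.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical stand-in for a rate-limit or timeout error from an SDK."""


def call_with_backoff(func, max_attempts=5, base_delay=0.5, max_delay=8.0,
                      sleep=time.sleep, rng=random.random):
    """Call func(); on TransientError wait up to base_delay * 2**attempt seconds
    (capped at max_delay, with random jitter) and try again."""
    for attempt in range(max_attempts):
        try:
            return func()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * rng())  # "full jitter": wait a random fraction of the delay


# Usage with a fake client that fails twice, then succeeds.
calls = {"count": 0}


def flaky_request():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("rate limited")
    return {"status": "ok"}


waits = []  # record the waits instead of actually sleeping
response = call_with_backoff(flaky_request, sleep=waits.append, rng=lambda: 1.0)
```

Passing `sleep` and `rng` as parameters keeps the retry logic testable without real delays; in production the defaults apply, and the jitter helps stop many clients from retrying in lockstep.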

Tools you'll master

32+ ML, DL & AI tools, one production project.

Real-time projects

You don't watch videos. You ship software.

Three full-production projects, each threaded through the entire curriculum. By the final project, you've built the whole stack around them.

Hero project · weeks 3–12

Production ML system: train → serve → monitor

Build an end-to-end production ML pipeline — reproducible training, model serving, drift monitoring, and an LLM augmentation layer that explains predictions and answers analyst questions.

01. Reproducible training pipeline: DVC-versioned data & features, MLflow experiments, Optuna hyperparameter search, Weights & Biases tracking.

02. Model-serving stack: a Triton or BentoML server packaging your model with ONNX Runtime, behind a FastAPI gateway, autoscaling on Kubernetes.

03. Drift + quality monitoring: Evidently dashboards for data & concept drift, alerting hooks into Slack, retraining triggers.

04. LLM augmentation layer: a small Hugging Face pipeline plus a LangChain RAG layer that explains predictions and answers analyst questions.

Outcome: AUC +12 pts vs baseline

p95 inference: <120ms

Reviewer: ML Engineering panel

PyTorch · MLflow · Triton · Evidently · LangChain

Enterprise · weeks 6–11

NLP & LLM fine-tuning

Train a domain-tuned transformer end-to-end — Hugging Face PEFT/LoRA fine-tuning, evaluation on golden datasets, ONNX export, served via Triton with token-level latency dashboards.

Hugging Face · PEFT · ONNX · Triton

Real-time · weeks 8–12

Real-time recommendation system

Build a candidate-generation + re-ranking recommender on Spark + Feast feature store, served on SageMaker / BentoML, with online evaluation and Evidently drift monitoring.

Spark · Feast · BentoML · SageMaker
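The candidate-generation + re-ranking pattern named above can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions: the interaction log, the freshness scores, and both function names are hypothetical, and a real system would replace the dictionaries with Spark jobs and a feature store.

```python
from collections import Counter
from typing import Dict, List, Set

# Hypothetical interaction log: user -> items they engaged with
HISTORY: Dict[str, Set[str]] = {
    "u1": {"a", "b", "c"},
    "u2": {"a", "b", "d"},
    "u3": {"b", "d", "e"},
}
# Hypothetical per-item feature used only by the re-ranker
FRESHNESS: Dict[str, float] = {"a": 0.1, "b": 0.2, "c": 0.3, "d": 0.6, "e": 0.95}


def generate_candidates(user: str, k: int = 10) -> List[str]:
    """Stage 1: cheap recall via co-occurrence with users who share items."""
    seen = HISTORY[user]
    votes: Counter = Counter()
    for other, items in HISTORY.items():
        if other == user:
            continue
        overlap = len(seen & items)  # more shared items means a stronger vote
        for item in items - seen:
            votes[item] += overlap
    return [item for item, _ in votes.most_common(k)]


def rerank(candidates: List[str]) -> List[str]:
    """Stage 2: reorder the short list with a richer, costlier score."""
    return sorted(candidates, key=lambda item: FRESHNESS[item], reverse=True)


candidates = generate_candidates("u1")
recs = rerank(candidates)
```

Splitting the work this way lets the first stage stay fast over the full catalogue while the second stage spends its compute on a short list only.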

Project · weeks 11–12

Your ML system in a real partner org.

Pick a real partner ML problem. Deploy a production system end-to-end — feature store, training pipeline, model serving, drift monitoring, LLM explanation layer — into a partner team that's running it for real users.

Download the real-world project

Full scope, sample partner orgs, weekly milestones, and grading rubric — PDF, 14 pages.

2026: 220+ deployed · 76% → placement offers

Your instructor

Taught by engineers who shipped agentic AI to production.

Manikanta Kona

Founder, Digital Lync · AI & Data Science Architect

Python · PyTorch · TensorFlow · scikit-learn · Hugging Face · LangChain

"A 2026 data scientist doesn't stop at notebooks. They ship the training pipeline, stand up the model behind FastAPI, monitor drift in production, and wire an LLM into the loop so the business actually understands what the model is saying. That's the bar I teach to, every cohort."

15 yrs

AI & DATA SCIENCE

2,400+

LEARNERS

4.9 /5

RATING

Manikanta is the founder of Digital Lync and brings 15 years of applied AI & data science from AT&T, Salesforce, Cox Communications, and Broadcom — where he led recommendation, fraud, forecasting, NLP and computer-vision systems for Fortune-500 banks, telcos, and insurers. Most recently he architected production ML pipelines that pair classical and deep models with an LLM augmentation layer that explains predictions to business stakeholders.

His classes get you two things other programs don't give you: a founding architect who still ships production ML, and a curriculum rewritten every quarter to match what hiring managers actually ask about — credentials like AWS Machine Learning Specialty, Azure AI Engineer, Databricks ML Associate, TensorFlow Developer, and Pragmatic AI Engineer included. M.S. in Engineering, Purdue University.

Ravi Krishna

Chief Technologist, Digital Lync · ML Engineering & MLOps Lead

PyTorch · MLflow · Triton · ONNX · Spark · MLOps · LLM Fine-tuning

"MLOps is where data science stops being a notebook and starts being a system — reproducible training pipelines you can re-run a year later, model serving you can stake an SLA on, drift monitoring that's quiet on purpose, and an LLM layer that explains decisions to the people who own the business outcome. That's what I teach."

10 yrs

ML ENGINEERING

1,800+

LEARNERS

4.8 /5

RATING

Ravi is Chief Technologist at Digital Lync, where he leads the ML engineering and MLOps practice. After 8 years building and running production ML pipelines, he stepped into the Chief Technologist seat to wire MLflow, Triton, Evidently, and Hugging Face into the way ML teams actually work — feature stores that stay accurate through retrains, drift monitoring that filters noise before it hits humans, and serving stacks that on-call engineers don't fight with.

His MLOps modules are built from real production post-mortems, not slide decks. Expect to leave with working training pipelines, model serving on Triton/BentoML, drift dashboards in Evidently, and an LLM fine-tuning workflow you can stake an SLA on. Ten years at Digital Lync, eight of them shipping production ML — Hyderabad-based, hands-on, and known for the unglamorous parts of data science that everyone else skips.

HIRING PARTNERS · INDUSTRY VOICES

What ML & data science employers say about Digital Lync grads.

Real feedback from data and ML leaders at AI-first companies and the firms hiring our AI & Data Science graduates.

Digital Lync grads ramp 40% faster on production ML deploys than typical data science hires. Best AI & Data Science pipeline in India.

Aakash Mehta, Engineering Director, Microsoft

We've onboarded 80+ Digital Lync alumni in 18 months. Lowest ramp time we've seen for production ML systems and drift monitoring practices.

Anita Sharma, Senior Manager, Deloitte

The AI & Data Science programme is comprehensive — PyTorch, MLflow, Triton, MLOps. Grads come pre-trained for production ML & LLM engineering.

Rahul Bhatt, Solutions Lead, Mphasis

Their MLOps + drift monitoring track produces PMs who ship production-grade ML systems on day one. Rare combination of modeling rigor and engineering craft.

Deepak Pillai, Senior Architect, TCS

What sets Digital Lync apart is the production ML layer baked into the data science track. Our enterprise clients ask for exactly this profile.

Suresh Menon, Practice Lead, Accenture

Their AWS ML Specialty + Pragmatic AI Engineer prep is rigorous, and the shipped project — training pipeline, model serving, drift monitoring — is what closes interviews for us.

Vikram Iyer, Director, Infosys

Digital Lync's data scientists ship production ML systems twice as fast in the first 90 days. Our internal modeling metrics back this up clearly.

Lakshmi Nair, VP Engineering, Wipro

Best AI & Data Science pipeline we've sourced from in India. Their projects are real production deploys, not notebooks.

Karthik Subramanian, Engineering Director, Cognizant

Strong PyTorch and MLOps foundation. Their AI/Data grads need almost zero ramp time on enterprise ML engagements with us.

Arun Joshi, Practice Director, Capgemini

We've placed 40+ Digital Lync alumni across our ML and watsonx engineering teams. Strong fundamentals, sharp on eval and drift monitoring.

Sanjay Verma, Talent Director, IBM

ML systems + drift monitoring is exactly the talent gap we've been struggling to close. Digital Lync is filling it for us reliably.

Anjali Desai, Practice Head, LTIMindtree

Their AI/Data Science track delivers engineers who navigate PyTorch, MLflow, and Triton on customer engagements unsupervised.

Ramesh Iyer, Senior Manager, Tech Mahindra

Hired 25+ Digital Lync graduates for our ML engineering practice. Strong on PyTorch, sharp on MLflow, fluent in MLOps.

Geetha Pillai, Talent Acquisition Lead, Cyient

Digital Lync grads who blend ML systems with Azure ML evals land production-ready on day one. Rare combination, well-trained.

Priya Reddy, Talent Lead, Microsoft

03 Program certifications

An Agent‑Ready credential, not a participation trophy.

Digital Lync · Institute Certificate

Agent‑Ready AI & Data Scientist

Presented to

Spandana Bala

For the successful training, deployment, and monitoring of a production ML system — reproducible training pipeline, model-serving stack, drift monitoring, and LLM augmentation — evaluated against the AWS ML Specialty, Azure AI Engineer, Databricks ML Associate, and Pragmatic AI Engineer credential rubrics.

Manikanta Kona

CEO · Digital Lync

AGENT
READY
2026

Industry‑recognized

Co‑branded with the ML engineering community and mapped to AWS ML Specialty and Pragmatic AI Engineer credentials — names that hiring managers already scan for on resumes.

Project artifact included

Every certificate carries your shipped project — training pipeline, model serving, drift monitoring, LLM augmentation — with a link to the live partner-org deployment. Proof, not a promise.

Enhanced skill validation

Graded against the 2026 Agent‑Ready rubric: training pipelines, model serving, drift monitoring, LLM augmentation, MLOps automation. No pass/fail — a level 1‑5 band.

Verifiable on a public URL

Each credential has a public verification page recruiters can check in 10 seconds — no PDF back‑and‑forth.

04 Job placement support

Your first AI/Data Science offer isn't a lottery ticket. It's a built process.

GitHub, LinkedIn, resume — and most importantly, warm intros into AI labs and ML-heavy product orgs. Our placement team works your search like an account, not a helpdesk.

01 / GITHUB & PORTFOLIO

A portfolio, not a graveyard.

Guidance on building a portfolio that showcases your training pipeline, model server, drift dashboard, LLM augmentation layer, and a public verification URL — reviewed 1:1, not via template.

02 / RESUME PREP

Rewrite, don't proofread.

A one-page resume rebuilt around the ML systems you shipped (training pipelines, model serving, drift monitoring), the partner-org project, and the business outcome. Reviewed by ML engineers who've read 10,000+ resumes.

03 / LINKEDIN + INTROS

Where most opportunities actually live.

Profile tuning plus direct warm introductions into AI labs and ML-heavy product orgs — Microsoft, Anthropic, OpenAI partners, Hugging Face, Databricks, Snowflake, Scale AI, NVIDIA, Stripe, Razorpay, plus services that staff data science teams (Deloitte, Accenture, Cognizant, TCS). You leave with recruiter contacts, not a generic "good luck."

AI & Data Science alumni

Hundreds of ML & data science careers launched — here are eight.

Spandana Bala

AI/Data Scientist

Hyderabad · India

Now at · Microsoft

Naveen Vedala

Senior ML Engineer

Hyderabad · India

Now at · Atlassian

Tejashwini Addla

Staff Data Scientist (LLM)

Hyderabad · India

Now at · Salesforce

Tharunesh Dillikar

Principal ML Engineer

Seattle · United States

Now at · Scale AI

Mujahed Mohammed

NLP Engineer

Hyderabad · India

Now at · Databricks

Bhargav Kumar Murala

MLOps Engineer

Hyderabad · India

Now at · Adobe

Sai Manasa Leburi

Computer Vision Engineer

New York · United States

Now at · Hugging Face

Rahul Dhamma

Recommendation Systems Engineer

Hyderabad · India

Now at · NVIDIA

Our locations

Come chat with us — over coffee, or over Zoom.

One flagship campus in Hyderabad, plus online Principal ML Engineer cohorts running on Indian and US timezones.

Flagship campus

Hyderabad

2nd Floor, Hitech City Road · Above Domino's · Opp. Cyber Towers, Jai Hind Enclave · Hyderabad, Telangana

Call

+91 81858 87766

US desk

+1 346 588 7766

Hours

Mon–Sat · 9am–9pm

Online class

Global

Weekend and evening AI/Data Science cohorts running on IST and PST. Every online cohort delivers the same shipped project — training pipeline, model serving, drift monitoring, LLM augmentation — as the on‑campus track.

Timezones

IST & PST

Format

Live + 1:1 mentorship

Next class

15 JUL 2026

FAQ

Questions we actually get — answered honestly.

Straight answers on prerequisites, the ML/DL stack, certifications, and placement. If something's missing, book a 20-minute advisor call — no slides, no pitch.

Do I need a CS or stats background?

No on both counts. Roughly 40% of every class comes from non-CS streams — engineering, math, physics, BCom, BBA, and self-taught modelers. Weeks 1–2 cover the NumPy/Pandas fundamentals, scikit-learn patterns, and the ML training loop from scratch. What you do need: consistency and 12–15 hours a week.

Will I actually ship production ML, or only do notebooks?

You actually ship to production. Every learner builds a reproducible training pipeline with DVC + MLflow + Optuna, packages the model with ONNX + Triton/BentoML, deploys it on Kubernetes with autoscaling, and sets up Evidently drift monitoring. The project runs in a partner org — not a notebook.

Which frameworks and tools will I use?

Core ML: Python, NumPy, Pandas, Polars, scikit-learn. Deep learning: PyTorch, TensorFlow, JAX, Hugging Face. MLOps: MLflow, Weights & Biases, DVC, Airflow, Prefect, Feast. Serving: Triton, BentoML, ONNX, SageMaker, Modal. Monitoring: Evidently, Arize.

Will I prep for AIPMM AI/Data Scientist and Pragmatic Principal ML Engineer certs?

Yes. The curriculum is mapped to the AIPMM AI/Data Scientist track and the Pragmatic Principal ML Engineer credential. We run two full mock exams and reimburse the voucher fee on first-attempt pass.

What's the time commitment per week?

Plan for 12–15 hours: 2 live classes × 2 hours, 1 lab × 3 hours training models on your dev tenant, and ~5 hours of project work (training pipelines, serving, drift). Saturday office hours with the TA team are optional, but most learners use them.

Is placement support really 1:1, and which companies hire data scientists / ML engineers?

Yes — a dedicated placement advisor from week 8, not a helpdesk. AI product hiring partners include Microsoft, Adobe, Salesforce, Atlassian, Notion, Linear, Anthropic, Hugging Face, Databricks, Snowflake, Stripe, Razorpay, Freshworks, Zoho, and Postman. Resume, LinkedIn, mock interviews, and warm intros are individual.

Online, weekend, or on-campus?

All three. On-campus at the Hyderabad flagship, live online (IST and PST cohorts), and a weekend track for working professionals. Every format delivers the same shipped project — training pipeline, model serving, drift monitoring, LLM augmentation — only the schedule changes.

What if I fall behind, or can't continue mid-class?

Freeze your seat for up to 90 days and rejoin the next class — no extra fee. TAs run catch-up sessions every Saturday for anyone more than a week behind, and recordings of every live session are available for the lifetime of your account.

Still have a question? Talk to an advisor — no slides, no pitch.

Class ADS-023 starts 13 JUL 2026.
40 seats. 12 already claimed.

Book a 20-minute advisor call. We'll walk through the curriculum, match it to your current role, and show you two real projects from class 022.

