Master LLM Engineering.
A 4-week intensive workshop taught live by Dr. Raj Dandekar (MIT PhD) with 9 industry experts from Anthropic, NVIDIA, Apple & more. Can't attend live? All sessions are recorded for lifetime access.
Two phases.
One complete education.
14 lectures across 4 weeks. Each phase is self-contained — take one or both.
Foundations & Optimization: Apr 27 – May 10, 2026
Production & Edge Deployment: May 11 – May 25, 2026
The tools that power
production inference.
You won't just learn theory — you'll build with the same frameworks used at Anthropic, NVIDIA, and Apple.
vLLM
High-throughput serving with PagedAttention, continuous batching & scheduler internals
SGLang
Fast inference with RadixAttention, structured generation & compiler-driven optimization
Ray Serve
Distributed serving & batch processing at scale, covered by Suman Debnath (Anyscale)
Megatron-LM
Large-scale model parallelism, distributed training & GRPO for online RL
FlashAttention
IO-aware attention kernels — memory-efficient tiling & online softmax
TensorRT-LLM
NVIDIA's optimized inference engine with INT4/INT8 quantization & kernel fusion
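To give a feel for the "online softmax" idea mentioned under FlashAttention, here is a minimal sketch in plain Python. It is an illustration under simplifying assumptions, not the FlashAttention kernel itself: it processes a list of scores block by block, keeps only a running maximum and a running sum, and rescales the sum whenever a larger maximum appears. The function names and the `block_size` parameter are chosen for this example only.

```python
import math


def online_softmax(scores, block_size=2):
    """Softmax over `scores`, accumulated block by block.

    Only a running max and a running sum are carried between blocks,
    which is the core trick that lets tiled attention avoid holding
    the full row of scores at once.
    """
    running_max = float("-inf")
    running_sum = 0.0
    for start in range(0, len(scores), block_size):
        block = scores[start:start + block_size]
        new_max = max(running_max, max(block))
        # Rescale the previous sum to the new max, then add this block's terms.
        running_sum = running_sum * math.exp(running_max - new_max) + sum(
            math.exp(s - new_max) for s in block
        )
        running_max = new_max
    return [math.exp(s - running_max) / running_sum for s in scores]


def reference_softmax(scores):
    """Standard numerically stable softmax, used here only for comparison."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

Both functions should agree to floating-point precision for any block size. A real kernel applies the same rescaling to partial attention outputs as well, which this sketch leaves out.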
Build something
you can actually ship.
Each phase culminates in a hands-on capstone project that ties together everything you've learned.
Build a Speed-Optimized LLM Inference Server
Combine every optimization from Lectures 1–7 into one deployable pipeline
Take a 7B model from raw weights to a fully optimized inference server — then benchmark every layer of the stack live.
OpenClaw-RL: Self-Improving WhatsApp AI Assistant
A full RL pipeline where your everyday messages become training data
Build and deploy a personal AI assistant that improves from every conversation using reinforcement learning — no labeling, no datasets.
Don't just learn it.
Run it on real hardware.
Dedicated lab days included in every phase. Every device has a different bottleneck — and you'll benchmark each one live.
Your Own Laptop / PC
Set up llama.cpp, run your first inference, benchmark tok/s across model sizes on your own machine.

Raspberry Pi 4
Quantization experiments on ARM. Compare INT4 vs INT8 latency. Power-aware inference on a 1.5GHz Quad-core Cortex-A72.
Android Device
SmolChat-Android live session with Shubham Panchal. Deploy a real LLM on your phone.

Jetson Orin Nano
CUDA inference on edge GPU. TensorRT-LLM on Jetson. GPU vs CPU throughput battle. Demo by Dr. Raj — not yet confirmed.
Lab Day 3 (Android · Shubham Panchal) is a confirmed 3-hour workshop. The Jetson Orin Nano demo is still to be confirmed. All other labs are conducted by Dr. Raj Dandekar.
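As a rough idea of what "benchmark tok/s" can look like in practice, here is a small Python sketch. It is a sketch under stated assumptions, not the workshop's lab code: `generate` is a hypothetical callable that takes a prompt and a token budget and returns the list of generated tokens, and `fake_generate` is a stand-in so the example runs without any model installed.

```python
import time


def measure_tokens_per_second(generate, prompt, max_tokens, runs=3):
    """Return the best observed tokens/second over several timed runs.

    `generate` is a hypothetical callable: generate(prompt, max_tokens)
    must return a list of generated tokens.
    """
    best = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        tokens = generate(prompt, max_tokens)
        elapsed = time.perf_counter() - start
        if elapsed > 0:
            best = max(best, len(tokens) / elapsed)
    return best


def fake_generate(prompt, max_tokens):
    """Stand-in for a real model: sleeps about 1 ms per token."""
    time.sleep(0.001 * max_tokens)
    return ["tok"] * max_tokens
```

Calling `measure_tokens_per_second(fake_generate, "hello", 20)` returns a throughput figure for the stand-in. Swapping in a wrapper around a real runtime on each device gives numbers you can compare across a laptop, a Raspberry Pi, and a phone.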
What you need to get started
Most labs run on hardware you already own. Only the Raspberry Pi 4 needs to be purchased separately. The Jetson Orin Nano is optional — Dr. Raj will demo it live if the session is confirmed.
Your Own Laptop / PC
Required · Any OS
Lab Day 1 uses llama.cpp and benchmarking tools that run on any modern laptop or desktop — macOS, Windows, or Linux.
You already have this
Raspberry Pi 4 Model B
Required · 1.5GHz Quad-core, up to 8GB RAM
Used in Lab Day 2 for ARM inference and quantization experiments. Broadcom BCM2711 SoC, dual-band Wi-Fi, Bluetooth 5.0, Gigabit Ethernet, 2x USB 3.0, 2x USB 2.0.
Android Phone
Required · Any Android 10+
Lab Day 3 with Shubham Panchal. Any Android 10+ phone with at least 6GB RAM works. You almost certainly already own one.
You likely already own this
Jetson Orin Nano
Optional · Demo to be decided
The Jetson Orin Nano is relatively expensive, so you don't need to buy one: Dr. Raj plans to demo it live in the workshop if the session is confirmed. Specs: NVIDIA Ampere GPU with 1024 CUDA cores, 8GB RAM.
Hardware labs are included in Phase 1 and Phase 2 — no extra fee for lab access. Hardware devices must be purchased separately. Prices and availability vary by region.
Built for engineers
who want to go deep.
Engineers transitioning into ML infrastructure or AI engineering
Students targeting roles at Anthropic, NVIDIA, Microsoft, Apple, Amazon
Engineers who want to go beyond using LLMs — to building inference systems
Researchers who need production engineering depth alongside theory
Leave
interview-ready.
Top-company interview question:
"Design a low-latency, high-throughput LLM inference system handling millions of requests. Walk me through the engineering trade-offs."
Asked at Anthropic, NVIDIA, Microsoft, Meta, Google DeepMind. You will have a complete answer.
Answer end-to-end inference system design questions in any ML interview
Explain low-latency, high-throughput, cost-optimised LLM serving at scale
Deploy real LLMs on your own laptop, Raspberry Pi 4, Android & Jetson Orin Nano
Build industry-level portfolio projects from hands-on hardware lab days
Get career insights directly from engineers at Anthropic, NVIDIA & Microsoft
Learn from engineers
at the frontier.
9 industry experts from Anthropic, NVIDIA, Microsoft, Apple, Anyscale, Red Hat, Amazon and more. Sessions led by the speakers from Anthropic, NVIDIA, Microsoft and Apple include a dedicated career insights segment.
Start your research
with a head start.
Don't start from scratch. Tell us your topic of interest and we'll generate a personalised research roadmap and an initial version of your research paper — delivered asynchronously, so you can hit the ground running from day one.
What's in the kit
Personalised Research Roadmap (PDF)
You tell us your topic of interest. We generate an 8-week structured plan with milestones, deliverables, and acceptance criteria — tailored to your specific research area. Includes literature review scope, data pipeline design, experiment matrix, and manuscript timeline. Delivered asynchronously.
Initial Research Paper Draft
We generate an initial version of your research paper — research questions framed, methodology outlined, related work surveyed, and experiment setup defined. You don't start with a blank page — you start with a 6–8 page scaffold ready to build on. Delivered asynchronously based on your topic.
Curated Paper Reading List
12–15 handpicked papers relevant to your topic with reading order, key takeaways, and connections between papers. Includes a literature matrix template for systematic tracking.
Starter Code Template
A clean, documented codebase scaffold for your research project — data loading, training loop, evaluation pipeline, and experiment config. Ready to run on day one.
Example research topics
Your roadmap is personalised to your background and goals. Here are some topics our students have worked on:
Vision-Language Planning for Autonomous Navigation with Nano-Scale Models
Knowledge Distillation for Edge LLM Deployment on Jetson
Efficient Speculative Decoding for On-Device Inference
RT-DETR for Real-Time BEV Perception in Driving Simulators
KV Cache Compression for Memory-Constrained Serving
Multimodal Inference Pipelines with Sub-200M Parameter Models
Quantization-Aware Training for Mobile LLM Deployment
Cache-Aware Routing for Multi-Model Inference Systems
Personal guidance from
industry & research leaders.
Two months of 1:1 mentorship with Yash Dixit and Dr. Raj Dandekar. One live call every two weeks, where they review your progress, guide your next steps, and help you work towards a publishable research paper. Get both industry and research exposure from mentors with experience at Apple, McKinsey, and MIT.

Yash Dixit
AI/ML Product Manager · Apple
Apple
AI/ML Product Manager — on-device intelligence, ML product strategy, CoreML
McKinsey
Management Consultant — data-driven strategy for Fortune 500 clients
MIT
Graduate research in AI/ML systems and applied machine learning
IIT
Undergraduate engineering — top-tier technical foundation

Dr. Raj Dandekar
Founder, Vizuara AI Labs · MIT PhD
MIT
PhD in AI/ML — scientific machine learning, neural ODEs, physics-informed models
Vizuara AI Labs
Founder — building AI education and inference infrastructure
Published Researcher
Multiple publications in top ML venues — NeurIPS, AAAI, Nature
What mentorship includes
1:1 Call Every Two Weeks
4 live sessions over 2 months. Yash and Dr. Raj personally review your progress, give feedback, and set the direction for your next two-week sprint.
Target: Publishable Paper
The goal is a research paper. Your mentors guide you from topic selection through experiments to a publication-ready manuscript.
Every Step Guided
Literature review, experiment design, ablation studies, writing — your mentors walk you through every step of the research process so you never feel stuck.
Industry + Research Exposure
Get career strategy from Yash (Apple, McKinsey) and deep research guidance from Dr. Raj (MIT PhD, published researcher). Both perspectives in one mentorship.
Paper Reading Guidance
Curated reading lists, paper discussion, and feedback on how to extract and apply insights from the literature.
Actionable Next Steps
Every session ends with clear deliverables and deadlines. You always know exactly what to do next.
Choose your workshop.
Select what you need. Everything adjusts instantly.
Step 1 — Choose your base
Foundations & Optimization
Apr 27 – May 10, 2026 · 7 lectures
Production & Edge Deployment
May 11 – May 25, 2026 · 7 lectures
Step 2 — Add-ons (optional)
Guest Speaker Pass
All 9 sessions: Anthropic, NVIDIA, Microsoft, Apple, Anyscale, Red Hat, Amazon and more
+₹50,000
Research Roadmap + Code Starter
Personalised roadmap PDF + starter code template for your research project
+₹15,000
1:1 Research Mentorship — 2 Months
with Yash Dixit, AI/ML Product Manager at Apple · 4 sessions, one every two weeks
+₹70,000

Dr. Raj Dandekar
MIT PhD · Co-founder & Director, Vizuara AI Labs
Dr. Raj holds a PhD from MIT and is the co-founder and director of Vizuara AI Labs. He has built a 50,000+ subscriber YouTube channel dedicated to teaching LLMs from first principles, and has taught 200+ engineers across previous workshop cohorts.
His teaching philosophy: visual intuition first, mathematical rigour second, hands-on implementation always. Every concept is taught from scratch — no hand-waving.
Common questions.
About the Workshop
Research Starter Kit
Research Mentorship with Yash Dixit
Guest Speakers