Chris Lattner (of Modular, which is building a next-generation AI infrastructure stack) describes the DeepSeek moment as a wake-up call: it showed how working at the PTX layer (Nvidia's low-level intermediate GPU instruction set, beneath CUDA) gives true control over compute and drives innovation. He praised DeepSeek's team for pushing AI research forward with impressive engineering (advancing Multi-head Latent Attention/MLA, low-precision training, and reverse-engineering undocumented PTX tensor core instructions), openly publishing their methods, and revealing capabilities previously thought exclusive to trillion-dollar companies. This openness, he emphasized, temporarily caught the industry off guard and effectively accelerated global AI progress by about six months. The main industry lesson: real power lies in controlling the compute stack, not just the models.

Modular's bet is to generalize compute infrastructure across hardware via Mojo, a new, partially open-source programming language designed to combine Python's usability with systems-level performance. In Lattner's words, Mojo "gets rid of all of CUDA": it operates directly at the PTX level or below, replacing a stack that is currently brittle because model kernels are tightly tied to a specific GPU generation.
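To make "working at the PTX level, beyond CUDA" concrete, here is a minimal sketch of inline PTX embedded in an ordinary CUDA kernel. The `fma_ptx` helper and the choice of the simple, documented `fma.rn.f32` instruction are illustrative assumptions, not anything from the interview; DeepSeek's work targeted undocumented tensor core instructions, which are not reproduced here.

```cuda
// Minimal sketch: issuing a PTX instruction directly from a CUDA kernel.
#include <cstdio>
#include <cuda_runtime.h>

__device__ float fma_ptx(float a, float b, float c) {
    float d;
    // Emit a fused multiply-add as a PTX instruction, bypassing whatever
    // the CUDA C++ compiler would otherwise generate for a*b + c.
    asm("fma.rn.f32 %0, %1, %2, %3;" : "=f"(d) : "f"(a), "f"(b), "f"(c));
    return d;
}

__global__ void kernel(float* out) {
    // Each thread computes 2*3 + 1 = 7 via the inline-PTX helper.
    out[threadIdx.x] = fma_ptx(2.0f, 3.0f, 1.0f);
}

int main() {
    float* d_out;
    float h_out[32];
    cudaMalloc(&d_out, 32 * sizeof(float));
    kernel<<<1, 32>>>(d_out);
    cudaMemcpy(h_out, d_out, 32 * sizeof(float), cudaMemcpyDeviceToHost);
    printf("out[0] = %f\n", h_out[0]);  // expect 7.0
    cudaFree(d_out);
    return 0;
}
```

The point of the sketch is the layering, not the arithmetic: the inline `asm` block is the escape hatch below CUDA C++, and hand-tuned code at this level is exactly what ties kernels to a particular GPU generation.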
Here is the entire interview with Chris Lattner: