I am Wenbo Ji, a second-year M.Sc. student at TUM working on video diffusion, world models, and robotic manipulation. I write about research, work, and lasting thoughts.

-
Embodied AI World Models
A robotics-friendly tutorial map of world models for embodied AI, explaining method families, robot data tuples, action grounding, closed-loop deployment, and how to start experimenting.
-
A Token's Journey Across the LLM Stack
Part 4 of The Life of a Token Across the LLM Stack: a synthesis of token, compute, memory, and request lifecycles for diagnosing LLM inference systems.
-
The Life of a Request Inside vLLM
Part 3 of The Life of a Token Across the LLM Stack: a serving-level walkthrough of how vLLM-style engines schedule prefill, decode, continuous batching, KV cache, and streamed tokens.
-
Why Transformer Performance Is Mostly a Matmul Problem
Part 2 of The Life of a Token Across the LLM Stack: a kernel-level explanation of how Transformer linear layers become GPU matmul, tiling, memory movement, and tensor-core execution.
-
The Life of a Token Inside a Transformer
Part 1 of The Life of a Token Across the LLM Stack: a systems-oriented explanation of how tokens move through embeddings, attention, MLPs, positional encoding, KV cache, and logits.
-
LLM Inference Is a Full-Stack Systems Problem
Part 0 of The Life of a Token Across the LLM Stack: an entry-point map of LLM inference across Transformer computation, GPU kernels, KV cache, and serving systems.
-
From π0 to π0.7: A Tutorial on Open-pi and Robot Foundation Models
A tutorial reading of Physical Intelligence's open-pi line: how VLA robot policies turn language into action, generalize through richer context, and become steerable through experience, memory, and world-model-generated visual subgoals.
-
Work Is Important Enough Not to Become Everything
A reflection on Simone Stolzoff's The Good Enough Job, and why work should be allowed to matter without becoming the whole measure of a life.
-
The Age of Auto Research, and Other Certainties
A reflection on auto research, AI scientist systems, and why automated local search makes human problem framing and research taste more important rather than obsolete.
-
On 3D, Video World Models, and the Approaching ImageNet Moment of Perception-Action Learning
How can perception-action learning reach its own ImageNet moment, and what role might 3D and video world models play in building causal, physically accurate representations of the world that are useful for action?