Wenbo Ji

Master’s student and researcher at TUM, working on video diffusion, world models, and robotic manipulation.

I write technical notes on research and engineering, with occasional reflections on work and life.

Snow-covered Zugspitze summit beneath a blue sky. Zugspitze · 2,962 m · Bavaria

When Execution Gets Cheap, What Still Compounds?

What remains scarce when AI makes execution cheap—and why wealth should include ownership, dignity, and the freedom to learn, care, belong, and choose.

28 Jul 2026 · 20 min read
Human Motion as a Control Interface for Interactive World Models

How 2D pose, SMPL bodies, and tracked head-and-hand motion become causal guidance for responsive video world models rather than offline animation controls.

23 Jul 2026 · Updated 23 Jul 2026 · 38 min read
Reasoning Effort Is an Inference-Time Systems Knob

Reasoning effort changes hidden token use, latency, cost, context pressure, request residency, and serving capacity across the LLM inference stack.

22 Jul 2026 · 16 min read
From Human Video to Robot Action: Five Interfaces Across the Embodiment Gap

Five interfaces that turn human video into robot supervision: latent actions, interaction tokens, digital twins, contact topology, and robotized demonstrations.

18 Jul 2026 · Updated 20 Jul 2026 · 38 min read
AI Infrastructure Is Not Plumbing

Why AI infrastructure determines whether experiments are trustworthy, inference is economical, and AI products remain reliable under real workloads.

13 Jul 2026 · 22 min read
Streaming Autoregressive Video Generation

A research review of streaming autoregressive video generation: causal distillation, self-forced rollouts, visual memory, few-step sampling, and world models.

23 Jun 2026 · Updated 13 Jul 2026 · 29 min read
Embodied AI World Models

A practical map of embodied world models covering prediction targets, action grounding, robot data, uncertainty, planning, and closed-loop deployment.

09 Jun 2026 · Updated 13 Jul 2026 · 50 min read
A Token's Journey Across the LLM Stack

A unified diagnostic model connecting token computation, GPU kernels, KV-cache memory, request scheduling, latency, and throughput across LLM inference.

06 Jun 2026 · Updated 13 Jul 2026 · 17 min read
The Life of a Request Inside vLLM

How vLLM-style serving engines coordinate prefill, decode, continuous batching, paged KV cache, scheduling, speculative decoding, and streaming.

04 Jun 2026 · Updated 13 Jul 2026 · 18 min read
Why Transformer Performance Is Mostly a Matmul Problem

How Transformer linear layers become tiled GPU matrix multiplications shaped by memory movement, tensor-core utilization, batch shape, and arithmetic intensity.

02 Jun 2026 · Updated 13 Jul 2026 · 17 min read
The Life of a Token Inside a Transformer

How token IDs become contextual representations through embeddings, normalization, attention, MLPs, positional encoding, KV cache, and logits.

31 May 2026 · Updated 13 Jul 2026 · 17 min read
LLM Inference Is a Full-Stack Systems Problem

A full-stack map of LLM inference across Transformer computation, GPU kernels, KV-cache memory, request scheduling, and streamed token delivery.

29 May 2026 · Updated 13 Jul 2026 · 16 min read
From π0 to π0.7: A Tutorial on Open-pi and Robot Foundation Models

How Physical Intelligence's open-pi models turn language and vision into continuous robot actions, then add context, memory, feedback, and steering.

25 May 2026 · Updated 13 Jul 2026 · 30 min read
Work Is Important Enough Not to Become Everything

A reflection on The Good Enough Job and why meaningful work should develop craft and responsibility without becoming the sole measure of a life.

20 May 2026 · 10 min read
The Age of Auto Research, and Other Certainties

Where AI scientist systems automate bounded experimental search, where their verification loops fail, and why human framing and judgment remain consequential.

06 Apr 2026 · Updated 13 Jul 2026 · 12 min read
On 3D, Video World Models, and the Approaching ImageNet Moment of Perception-Action Learning

How 3D structure and video world models could support an ImageNet-scale shift from isolated vision tasks toward causal perception-action learning.

30 Mar 2026 · Updated 13 Jul 2026 · 7 min read
Humans Are Imperfect, but Worthy of Love

A reading of The Stories of Ibis about machine love, human imperfection, and why coexistence may depend on acceptance without complete understanding.

14 Jan 2024 · 8 min read
