On 3D, Video World Models, and the Approaching ImageNet Moment of Perception-Action Learning
How can perception-action learning reach its own ImageNet moment, and what role might 3D and video world models play in building causal, physically accurate representations of the world that are useful for action?
Wenbo Ji · 7 min read