New research with @Tsinghua_Uni: Spatial-TTT, a framework for streaming vision-based spatial intelligence with test-time training (TTT). Spatial-TTT adapts fast weights to capture and organize spatial evidence from long video streams, enabling models to build structured 3D spatial memory over time.

Highlights:

🔹 Efficient streaming memory. Fast weights act as a compact spatial memory with sublinear memory growth over 7,000+ frames and more than 40% lower compute.
🔹 Spatial-predictive mechanism. TTT layers with 3D spatiotemporal convolution capture geometric correspondence and temporal continuity.
🔹 SOTA results on long-horizon video spatial understanding (VSI-Bench).

The paper ranked #1 on Daily Papers on March 13.

Project page: liuff19.github.io/Spatial-TTT/
GitHub: github.com/THU-SI/Spatial
Paper: huggingface.co/papers/2603.12
Model & Data: huggingface.co/THU-SI
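The fast-weight idea behind the highlights above — a fixed-size memory updated by a gradient step on every incoming frame, rather than a cache that grows with the stream — can be illustrated with a toy sketch. This is not the paper's implementation: the dimension `d`, learning rate, and self-supervised reconstruction objective below are all hypothetical stand-ins, and a plain linear map plays the role of the TTT layer's fast-weight state.

```python
import random

d = 4                             # feature dimension (hypothetical)
W = [[0.0] * d for _ in range(d)] # fast weights: the ONLY state kept across frames
lr = 0.05                         # test-time learning rate (hypothetical)

def ttt_step(W, x, lr):
    """One test-time training step on a toy self-supervised objective:
    minimize 0.5 * ||W x - x||^2 for the current frame feature x."""
    # Residual err = W @ x - x
    err = [sum(W[i][j] * x[j] for j in range(d)) - x[i] for i in range(d)]
    # Gradient of the loss w.r.t. W is outer(err, x); take one SGD step.
    return [[W[i][j] - lr * err[i] * x[j] for j in range(d)] for i in range(d)]

random.seed(0)
# Stream many "frame features"; memory stays O(d*d) no matter how many
# frames arrive -- the compact-spatial-memory property the post describes.
for _ in range(3000):
    x = [random.gauss(0.0, 1.0) for _ in range(d)]
    W = ttt_step(W, x, lr)
# After adapting online, W has absorbed the reconstruction task from the
# stream alone (here it converges toward the identity map).
```

The point of the sketch is the shape of the computation, not the objective: state size is constant in stream length, and each frame costs one cheap update, which is what makes sublinear memory over 7,000+ frames possible in principle.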
