Hunyuan (@TXhunyuan)

New research with @Tsinghua_Uni: Spatial-TTT. A framework for streaming visual-based spatial intell...

Score: 8.5

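AI In-Depth Summary
  • Spatial-TTT uses fast weights to build a compact spatial memory with sublinear growth, handling videos of 7000+ frames.
  • It introduces TTT layers with 3D spatiotemporal convolution that capture geometric correspondence and temporal continuity.
  • It achieves SOTA results on VSI-Bench, a long-horizon video spatial understanding benchmark.
#ComputerVision #3DSpatialIntelligence #TestTimeTraining #VideoUnderstanding #Tencent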



New research with @Tsinghua_Uni: Spatial-TTT. A framework for streaming visual-based spatial intelligence with test-time training (TTT). Spatial-TTT adapts fast weights to capture and organize spatial evidence from long video streams, enabling models to build structured 3D spatial memory over time.

Highlights:
🔹 Efficient streaming memory. Fast weights act as compact spatial memory with sublinear memory growth over 7000+ frames and more than 40% lower compute.
🔹 Spatial-predictive mechanism. TTT layers with 3D spatiotemporal convolution capture geometric correspondence and temporal continuity.
🔹 SOTA results on long-horizon video spatial understanding (VSI-Bench).

The paper ranked #1 on Daily Papers on March 13.

Project page: liuff19.github.io/Spatial-TTT/
GitHub: github.com/THU-SI/Spatial
Paper: huggingface.co/papers/2603.12
Model & Data: huggingface.co/THU-SI
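
For readers unfamiliar with fast weights, the idea of a memory that is updated at test time can be illustrated with a toy sketch. This is not the Spatial-TTT implementation: the function name `ttt_stream`, the linear fast-weight layer, the squared-error objective, and the learning rate are all assumptions chosen for illustration, and the 3D spatiotemporal convolution mentioned in the post is omitted. Each incoming frame is represented by a hypothetical (key, value) feature pair, and the fast-weight matrix takes one gradient step per frame.

```python
def ttt_stream(frames, dim, lr=0.1):
    """Toy fast-weight memory (illustrative only, not the paper's method).

    frames: iterable of (key, value) pairs, each a list of `dim` floats.
    Returns the final fast weights and the per-frame loss before each update.
    """
    # Fast weights W (dim x dim), initialised to zero. This matrix is the
    # only state kept, so its size does not depend on the number of frames.
    W = [[0.0] * dim for _ in range(dim)]
    losses = []
    for key, value in frames:
        # Prediction of the fast-weight layer: pred = W @ key
        pred = [sum(W[i][j] * key[j] for j in range(dim)) for i in range(dim)]
        err = [pred[i] - value[i] for i in range(dim)]
        losses.append(sum(e * e for e in err))
        # Gradient of 0.5 * ||W @ key - value||^2 w.r.t. W is outer(err, key);
        # take one gradient-descent step on the fast weights.
        for i in range(dim):
            for j in range(dim):
                W[i][j] -= lr * err[i] * key[j]
    return W, losses


# Usage: stream the same (hypothetical) frame feature 20 times.
frames = [([1.0, 0.0, 0.0, 0.0], [1.0, 2.0, 3.0, 4.0])] * 20
W, losses = ttt_stream(frames, dim=4, lr=0.5)
```

In this toy setting the loss shrinks as the same evidence is seen repeatedly, while the stored state stays a single `dim x dim` matrix however long the stream is. How Spatial-TTT actually achieves the sublinear memory growth quoted above is described in the paper, not here.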
