Selected papers on 3D robotics and Vision-Language-Action (VLA) large models
3D-MVP: 3D Multiview Pretraining for Robotic Manipulation [arxiv] [note]
3D Diffuser Actor: Policy Diffusion with 3D Scene Representations [arxiv] [note]
3D Diffusion Policy: Generalizable Visuomotor Policy Learning via Simple 3D Representations [arxiv] [note]
Any2Point: Empowering Any-modality Large Models for Efficient 3D Understanding [arxiv] [note]
Act3D: 3D Feature Field Transformers for Multi-Task Robotic Manipulation [arxiv] [note]
Learning 3D Representations from 2D Pre-trained Models via Image-to-Point Masked Autoencoders [arxiv] [note]
Point Cloud Matters: Rethinking the Impact of Different Observation Spaces on Robot Learning [arxiv] [note]
PolarNet: 3D Point Clouds for Language-Guided Robotic Manipulation [arxiv] [note]
ManiCM: Real-time 3D Diffusion Policy via Consistency Model for Robotic Manipulation [arxiv] [note]
SUGAR: Pre-training 3D Visual Representations for Robotics [arxiv] [note]
SPA: 3D Spatial-Awareness Enables Effective Embodied Representation [arxiv] [note]
RVT: Robotic View Transformer for 3D Object Manipulation [arxiv] [note]
RVT-2: Learning Precise Manipulation from Few Demonstrations [arxiv] [note]
3D-VLA: A 3D Vision-Language-Action Generative World Model [arxiv] [note]
AffordDP: Generalizable Diffusion Policy with Transferable Affordance [arxiv] [note]
Beyond Sight: Finetuning Generalist Robot Policies with Heterogeneous Sensors via Language Grounding [arxiv] [note]
CogACT: A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation [arxiv] [note]
DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution [arxiv] [note]
DexVLA: Vision-Language Model with Plug-In Diffusion Expert for General Robot Control [arxiv] [note]
FAST: Efficient Action Tokenization for Vision-Language-Action Models [arxiv] [note]
Improving Vision-Language-Action Models via Chain-of-Affordance [arxiv] [note]
Diffusion-VLA: Scaling Robot Foundation Models via Unified Diffusion and Autoregression [arxiv] [note]
RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation [arxiv] [note]
Robo-ABC: Affordance Generalization Beyond Categories via Semantic Correspondence for Robot Manipulation [arxiv] [note]