
Research Scientist - Vision-Language-Action Models for Autonomous Systems
- Sunnyvale, CA
- $160,000-200,000 per year
- Permanent
- Full-time
- Reinvent yourself: At Bosch, you will evolve.
- Discover new directions: At Bosch, you will find your place.
- Balance your life: At Bosch, your job matches your lifestyle.
- Celebrate success: At Bosch, we celebrate you.
- Be yourself: At Bosch, we value values.
- Shape tomorrow: At Bosch, you change lives.
- Conduct research and engineering in core AI and machine learning fields (including computer vision, autonomous planning, and open-world learning) to enable Embodied AI for Bosch's AIoT (AI+IoT) business domains, such as autonomous driving, industrial automation, and robotics.
- Push the boundaries of (modular) end-to-end perception and planning for automated driving, incorporating advances in large vision-language-(action) models to improve reasoning capabilities and explainability.
- Collaborate with a global team to transfer cutting-edge research findings to Bosch's operational units.
- Implement research results to solve real-world challenges, ensuring high-quality system integration within Bosch's existing platforms.
- Stay abreast of the latest technological advancements and market trends by attending academic conferences, technical events, and seminars.
- Document and disseminate research findings through high-caliber publications and/or patent submissions.
- Ph.D. in Computer Science, Robotics, or a related discipline, or a Master's degree, with at least 1 year (Ph.D.) or 3 years (Master's) of industry experience after graduation.
- A minimum of 3 years of R&D experience, or an equivalent graduate research background, primarily in AI technologies including computer vision and motion and behavioral planning for robotics or automotive applications.
- Proficiency in one or more programming languages commonly used in machine learning (e.g., Python, C++, Rust).
- Strong interpersonal, communication, and teamwork capabilities.
- Knowledge of major machine learning frameworks like TensorFlow or PyTorch.
- Hands-on experience building and applying multimodal transformer-based sequence-to-sequence models.
- Familiarity with concepts used in vision-language-action models, such as mixture-of-experts (MoE), Group Relative Policy Optimization (GRPO), and Low-Rank Adaptation (LoRA).
- Experience with real-world product development and deployment of autonomous systems.
- Hands-on experience in computer vision and deep learning, with work in any of the following areas: multimodal transformers, multimodal language models, diffusion models, NeRF, Gaussian splatting, object detection/segmentation, 3D scene understanding, sensor calibration, structure-from-motion (SfM), or voxel/BEV grid-based feature representation.
- A strong portfolio of publications in premier machine learning, deep learning, robotics, and computer vision journals and conferences.