
Software Engineer Intern (Data Ecosystem) - 2026 Summer (BS/MS)
- Seattle, WA
- Training
- Full-time
- May 11th, 2026
- May 18th, 2026
- May 26th, 2026
- June 8th, 2026
- June 22nd, 2026

Responsibilities
1. Design and implement real-time and offline data architecture for large-scale recommendation systems.
2. Build scalable and high-performance streaming Lakehouse systems that power feature pipelines, model training, and real-time inference.
3. Collaborate with ML platform teams to support PyTorch-based model training workflows and design efficient data formats and access patterns for large-scale samples and features.
4. Own core components of our distributed storage and processing stack, from file formats to stream compaction to metadata management.

Qualifications

Minimum Qualifications:
- Currently pursuing a Bachelor's or Master's degree in Computer Science or a related technical discipline
- Able to commit to working for 12 weeks during Summer 2026
- Must obtain work authorization in the country of employment at the time of hire, and maintain ongoing work authorization during employment.
Preferred Qualifications:
- Familiarity with modern Lakehouse technologies such as Apache Paimon, Iceberg, Delta Lake, or Hudi, especially around incremental ingestion, schema evolution, and snapshot isolation.
- Experience in designing and optimizing Flink + Paimon architectures for unified batch/stream processing.
- Familiarity with feature storage and training data pipelines, and their integration with PyTorch, especially for large-scale model training.
- Knowledge of columnar file formats (Parquet, ORC, Lance) and how they are used in feature engineering or ML data loading.
- Proficiency in Java, Scala, or C++, and strong debugging and performance-tuning skills.
- Previous experience in Lakehouse metadata management, compaction scheduling, or data versioning is a plus.
- Solid understanding of Apache Flink internals, with hands-on experience in state management, connectors, or UDFs.