
Principal Software Engineer, ML Platform - Game Tech Group
- Los Angeles, CA
- Permanent
- Full-time
- Architect and implement Riot's core ML inference infrastructure, covering both live inference and nearline batch inference, with a focus on scalable model serving, CPU- and GPU-aware orchestration, and automated deployment pipelines.
- Partner with researchers, game teams, and platform engineers to understand product needs and deliver generalizable, reusable solutions.
- Define and build CI/CD workflows and MLOps practices for ML artifacts, supporting rapid iteration and safe promotion from dev to production.
- Own tooling for environment and dependency management strategies (e.g., Conda/Poetry lock files, secure image builds) for ML runtimes.
- Instrument and emit platform metrics for observability, model monitoring, drift detection, CPU/GPU utilization, and latency SLAs.
- Establish patterns and tooling for multi-version model support, blue/green and shadow deployments, and rollback.
- Be thoughtful about developer UX and take an iterative approach to improving it.
- Serve as the founding technical voice for a new platform: defining long-term architecture, mentoring incoming engineers, and collaborating on hiring.
- Contribute upstream to shared infra initiatives and build feedback loops and collaboration models with other Riot platform teams.
- 10+ years of experience in software engineering, with substantial time spent in platform or infrastructure teams
- Proven technical leadership in building large-scale distributed systems, production ML systems, or model serving infrastructure
- Deep experience with cloud-native systems (e.g., Kubernetes, containerization, autoscaling, observability stacks)
- Experience with one or more inference serving frameworks (e.g., NVIDIA Triton, KServe, TorchServe, BentoML, Seldon Core)
- Familiarity with GPU orchestration, performance tuning, and cost-aware scheduling
- Strong background in CI/CD automation, IaC tools (e.g., Terraform), and artifact management
- Hands-on experience with Python ML ecosystems, package management (Poetry, Conda, etc.), and vulnerability scanning
- Ability to mentor engineers, write clear documentation, and influence cross-functional stakeholders
- Experience building ML infrastructure within a real-time or latency-sensitive environment
- Familiarity with ML workflow tools (MLflow, DVC, lakeFS, etc.) and drift monitoring strategies
- Exposure to A/B testing and experimentation frameworks, especially in online model evaluation
- Prior success in founding or greenfield platform work, especially building toward multi-tenancy or self-service capabilities
- Passion for player experience, game systems, or creative technology development
- Familiarity or experience with technical deployments in China, particularly with Tencent
- Safeguarding confidential and sensitive Company data
- Communication with others, including Rioters and third parties such as vendors, and/or players, including minors
- Accessing Company assets, secure digital systems, and networks
- Ensuring a safe interactive environment for players and other Rioters