
Senior Software Engineer, ML Platform
- San Francisco, CA
- $170,000-$230,000 per year
- Permanent
- Full-time
- Expand, mature, and optimize our ML platform, built around cutting-edge tooling like Ray, MLflow, Metaflow, Argo, and Spark, to support traditional ML, deep learning, and reinforcement learning models
- Build and mature capabilities to support CPU/GPU clusters, model performance monitoring, drift detection, automated rollouts, and an improved developer experience
- Build, operate, and maintain a low-latency, high-volume ML serving layer covering both online and batch inference use cases
- Orchestrate Kubernetes and ML training/inference infrastructure, and expose it as an ML platform
- Expose and manage environments, interfaces, and workflows to enable ML engineers to develop, build, and test ML models and services
- You have 5+ years of experience in ML platform, MLOps, platform engineering, DevOps, or infrastructure roles, and you understand gold-standard practices and best-in-class tooling for ML
- Your passion is exposing platform capabilities through interfaces that enable high-performance ML practices, rather than designing ML experiments (this team does not directly develop ML models)
- You understand the key differences between online and offline ML inference and can articulate what each requires to succeed in meeting business needs
- You understand the importance of CI/CD in building high-performing teams and have worked with tools like Jenkins, CircleCI, Argo Workflows, and ArgoCD
- You are passionate about observability and have worked with tools such as Splunk, Nagios, Sensu, Datadog, and New Relic
- Design and implement an online inference pipeline with champion/challenger shadow model testing
- Scale real-time feature streaming to support low-latency, high-volume reinforcement learning (RL) use cases
- Build a universal data access layer (DAL) and serving interface to expose predictions to different parts of Attentive's products
- Mature platform interfaces toward full self-service for stakeholders
- Improve existing build and release pipelines for better reliability and more robust Python package management