
Lead MLOps Engineer
- Menomonee Falls, WI
- Permanent
- Full-time
- Architect and implement scalable MLOps infrastructure across Databricks, Azure, and AWS.
- Build and maintain CI/CD/CT pipelines for model training, validation, deployment, and monitoring.
- Collaborate with developers to establish best practices for model versioning, reproducibility, and governance.
- Implement observability tools to monitor model performance, drift, latency, and system health.
- Collaborate with ML engineers, data scientists, electrical engineers, and DevOps teams to integrate models into production systems.
- Work with the ML community to define and enforce security, compliance, and privacy standards across the ML lifecycle.
- Document and promote MLOps best practices and tooling.
- Continuously evolve MLOps pipelines to support ML pipelines for both internal IT and end-user solutions.
- Drive awareness and adoption of existing ML tools and platforms across the organization through documentation, training, and internal community engagement.
- 5+ years of experience in MLOps, DataOps, DevOps, or backend engineering roles.
- Experience with Databricks ML services, SageMaker, and Azure ML.
- Strong Python skills and familiarity with ML frameworks and tooling (e.g., PyTorch, MLflow).
- Experience with infrastructure-as-code and CI/CD tooling (Terraform, Spacelift, GitHub Actions, Databricks Asset Bundles, Azure Pipelines) and with containerization and orchestration (Docker, Kubernetes).
- Proven ability to build CI/CD pipelines and model registries from scratch.
- Familiarity with monitoring tools (e.g., Azure Synapse Monitoring, Azure ML Studio monitoring, Databricks Lakehouse Monitoring, CloudWatch, CloudTrail).
- Hands-on experience with model and data quality monitoring.
- Strong understanding of the ML lifecycle, from data ingestion to model deployment strategies and retraining.
- Experience supporting multi-cloud environments and cross-functional collaboration.
- Experience maintaining ops pipelines for end-user-facing solutions, ideally in situations where access to data and/or deployed models may be limited.
- Experience with GenAI/LLMOps workflows and prompt management.
- Knowledge of security and compliance in regulated environments.
- Experience deploying ML models to edge devices and working with C and static data types in embedded environments.
- Familiarity with the Citrine ML service and with data governance and lineage tools.
- Experience with performance testing, observability, and cost optimization for ML workloads.
- Familiarity with transformer-based architectures and LLM frameworks (e.g., Hugging Face, OpenAI, LangChain) including prompt orchestration and autonomous agent flows.
- Frequently required to stand, walk, bend, stretch, reach, and effectively communicate with others in the workplace
- Sitting for prolonged periods of time
- Prolonged exposure to computer screens
- Repetitive use of hands and fingers to operate office equipment, machinery, hand tools and/or power tools
- Specific vision abilities required by this job include close vision, color vision, peripheral vision, depth perception, and ability to adjust focus
- May be required to wear personal protective equipment, including but not limited to safety glasses, gloves, and hearing protection
- May work in laboratories and/or controlled, enclosed, restricted areas
- Noise levels range from moderate to loud
- Must be able to lift up to 50 pounds at a time
- May require travel, depending on company needs
- Robust health, dental, and vision insurance plans
- Generous 401(k) savings plan
- Education assistance
- On-site wellness, fitness center, food, and coffee service
- And many more; see our benefits site for details