
Senior Manager, DevOps & AI/MLOps
- Boston, MA
- Permanent
- Full-time
- Proven experience as a DevOps or MLOps manager/leader, or as an architect-level engineer with management responsibilities supporting AI/ML teams or large-scale production systems.
- A strategic communicator who can translate senior leadership and business goals into clear team priorities, direction, and deliverables, and who keeps stakeholders aligned.
- Background in building, scaling, or operating ML systems (preferably from ingestion to deployment and serving), or direct experience with public ML applications and their unique challenges.
- Deep understanding of cloud computing at the architect level (any major cloud; vendor specifics less important than breadth and adaptability).
- Expertise in containerized environments and infrastructure as code (IaC); must be able to articulate the organizational value of IaC and Policy as Code (PaC) and how they interact with management processes.
- Track record of infrastructure minimalism—prioritizing simplicity, efficiency, and automation.
- Project management skills and familiarity with Agile methodologies, with openness to adapting processes to fit the team's needs.
- Strong history of hands-on delivery, rapid learning, and adaptability—what you’ve built and shipped matters more than degrees or certifications alone.
- Strong technical background; comfortable engaging in architecture discussions, evaluating tradeoffs, and sharing guidance.
- Support and empower the team: clear blockers, protect and respect focus time, trust the team to make technical decisions, and give autonomy while keeping us aligned with the broader vision.
- Experience with GitHub, Terraform, OpenShift, and Argo CD is highly desirable.
- Team Leadership: Manage and mentor a global, cross-functional DevOps/MLOps team supporting AI/ML engineers and data scientists.
- Infrastructure Strategy: Guide the selection, design, and deployment of minimal, reliable, and secure cloud-native infrastructure for AI/ML projects, including hybrid-cloud and on-premises environments running Red Hat OpenShift.
- Workflow Automation: Oversee the development and automation of ML pipelines from data ingestion to model serving and monitoring.
- Collaboration: Serve as a bridge between IT, software developers, and ML practitioners, ensuring alignment and successful delivery of production-grade ML applications.
- Agile Execution: Drive projects using Agile or alternative methodologies, adapting to the needs of the team and the organization.
- Cloud Architecture: Architect and oversee cloud solutions using containerization, GitOps and Infrastructure as Code principles, while enabling the team’s autonomy in tooling selection.
- Process Improvement: Continually assess and evolve DevOps/MLOps processes for better efficiency, scalability, and security.