
Sr. Infrastructure and Ops Engineer - Remote
- Eden Prairie, MN
- $89,900-160,600 per year
- Permanent
- Full-time
- Automation & DevOps: Implement automation across the infrastructure lifecycle, leveraging Infrastructure as Code (IaC) and DevOps principles and best practices to streamline deployment and management processes for agentic workflows on UAIS
- Systems Monitoring & Performance Tuning: Develop and implement agent monitoring frameworks for infrastructure, identify areas for performance improvement and optimization, and ensure high availability
- Continuous Support: Provide SRE support to geographically distributed users on the UAIS platform: respond to tickets, triage support requests, and liaise with customers
- Disaster Recovery & Business Continuity: Design, test, and implement disaster recovery and business continuity plans to ensure minimal downtime and data integrity
- Security & Compliance: Collaborate with enterprise cybersecurity and AI security teams to ensure all systems and operations comply with industry standards and are secure against evolving threats
- AI Builder: Design, develop, and deploy AI-powered solutions using no-code, low-code, and advanced platforms, translating business needs into scalable applications that enhance products, workflows, and decision-making
- Bachelor's degree in computer science or information technology
- 6+ years of infrastructure experience: Proven experience working on large-scale, cloud-based, enterprise-level software platforms, with a hands-on, deep understanding of multi-cloud architectures, specifically Azure, AWS, and GCP
- 4+ years of practical experience with Infrastructure-as-Code and CI/CD tools such as Terraform, GitHub Actions, and the like
- 3+ years of practical experience in containerization technologies (Kubernetes, Docker), observability, and orchestration
- 3+ years of practical experience in scripting and automation: Advanced proficiency in scripting languages such as Python and Bash to support automation and system integration efforts
- 1+ years of experience building infrastructure for (Gen)AI platforms and systems
- Security & Compliance Knowledge: Strong understanding of security best practices and experience ensuring compliance with relevant regulatory frameworks
- Machine Learning and LLM Operations: Exposure to modern tools and techniques in MLOps and LLMOps fields
- Exposure to AI/ML-specific infrastructure tools (e.g., MLflow, Kubeflow) for managing and deploying models at scale
- Exposure to a Regulated Industry: Experience working within healthcare or another regulated industry, with a solid understanding of its unique challenges and compliance requirements
- Ability to work independently, manage multiple projects simultaneously, and adapt to changing priorities in a fast-paced environment