
Observability and Automation Engineer
- Chicago, IL
- $125,000-$145,000 per year
- Permanent
- Full-time
- Design, implement, and maintain observability integrations across hybrid environments (on-prem and cloud), including metrics collection, logging, alerting, and distributed tracing.
- Support and extend centralized dashboards and alerting logic to provide actionable insights and reduce noise across platforms such as Dynatrace, ServiceNow ITOM, and LogicMonitor.
- Integrate observability instrumentation into automation workflows and infrastructure deployments for real-time telemetry and proactive monitoring.
- Provide support, root cause analysis, and iterative improvement for automation and observability incidents, acting as a bridge between engineering and operations.
- Work cross-functionally to evaluate processes and procedures for automation opportunities, and identify the process improvements needed to convert them successfully to automated workflows.
- Define, implement, and manage toolsets required to enable self-service, service desk, and infrastructure automation.
- Build, maintain, and extend reusable Ansible roles/playbooks and Terraform modules in alignment with enterprise Infrastructure as Code (IaC) standards and governance models.
- Collaborate with Cloud, Infrastructure, and Network teams to automate the full-stack lifecycle, from provisioning to decommissioning, using IaC and CI/CD pipelines.
- Develop and manage Jenkins pipelines and other CI/CD automation tools to ensure reliable, test-driven deployment of automation artifacts.
- Build and maintain automation APIs, CLI tools, and scripts to reduce manual work across infrastructure and operations.
- Create a framework that other IT teams can use to automate jobs, processes, and tasks. Act as a consultant to those teams, training staff and defining automation best practices.
- Participate in code reviews, documentation, and knowledge sharing for new observability and automation capabilities across IT teams.
- Experience working with observability and automation platforms, including ServiceNow ITOM, Dynatrace, and LogicMonitor, with a strong understanding of telemetry standards such as OpenTelemetry, SNMP, and WMI.
- Experience with Microsoft Windows and Red Hat Enterprise Linux (RHEL) operating systems in hybrid cloud environments.
- Proficiency with Infrastructure as Code (IaC) and automation tooling such as Ansible, Terraform, and Jenkins, including hands-on experience creating reusable modules, libraries, and pipeline integrations.
- Deep knowledge of CI/CD practices, with experience designing and managing pipelines that support infrastructure lifecycle automation and automated testing.
- Proficiency in version control systems such as GitHub and Bitbucket, supporting secure, efficient collaboration and code lifecycle management.
- Strong understanding of programming concepts and automation design, with experience building CLI tools and working with REST APIs.
- Languages: Python, .NET, PowerShell, GoLang
- Frameworks: Django, Flask, FastAPI
- Concepts: Software Development Life Cycle (SDLC), Test Driven Development (TDD)
- Experience with identity platforms, including on-prem Active Directory, Azure Active Directory, and Office 365.
- Excellent troubleshooting skills with a focus on documentation, collaboration, and continuous improvement across cross-functional teams.
- Familiarity with cloud platforms (AWS, Azure, GCP) and modern cloud-native observability and automation practices.
- Familiarity with on-prem virtualization technologies.