
Staff Reliability Engineer – Application Owner
- Chicago, IL
- $126,160-$189,240 per year
- Permanent
- Full-time
- Serve as the technical owner for one or more applications, ensuring their reliability, scalability, and performance.
- Drive adoption of best practices in observability, automation, and incident prevention.
- Ensure compliance with enterprise architecture, security, and regulatory standards.
- Design and implement automation to reduce manual toil and improve operational efficiency.
- Build and maintain monitoring, alerting, and self-healing capabilities using tools like Dynatrace, Splunk, and CloudWatch.
- Lead root cause analysis and implement long-term fixes for recurring issues.
- Collaborate with DevOps teams to enhance CI/CD pipelines for secure and efficient deployments.
- Integrate security and compliance checks into the software delivery lifecycle.
- Promote infrastructure-as-code (IaC) practices using tools like Terraform or CloudFormation.
- Lead triage and resolution of high-severity incidents, minimizing business impact.
- Improve incident response processes and reduce mean time to recovery (MTTR).
- Maintain accurate documentation, runbooks, and operational metadata.
- Partner with development, QA, and infrastructure teams to drive reliability initiatives.
- Contribute to the Reliability Engineering Community of Practice.
- Mentor junior engineers and promote a culture of continuous improvement.
- 7+ years of experience in software engineering, SRE, or application support.
- Strong knowledge of AWS services (EC2, Lambda, S3, CloudWatch, IAM).
- Proficiency in scripting (Python, Node.js, Bash, PowerShell) and automation.
- Experience with observability tools (Dynatrace, Splunk, Prometheus, Grafana).
- Familiarity with CI/CD tools (Jenkins, GitHub Actions, Azure DevOps).
- Hands-on experience with containerization (Docker, Kubernetes, ECS/EKS).
- Proficiency in infrastructure-as-code (Terraform, CloudFormation).
- Proven ability to lead incident response and root cause analysis.
- Experience implementing SLIs, SLOs, and SLAs.
- Ability to design and implement runbooks, playbooks, and automated health checks.
- Understanding of DevSecOps principles and secure software delivery.
- Familiarity with compliance frameworks (SOC 2, HIPAA, PCI DSS).
- Strong cross-functional collaboration and communication skills.
- Ability to explain technical concepts to non-technical stakeholders.
- Experience mentoring or leading technical discussions.
- AWS Certified DevOps Engineer, CKA, or Google SRE certification.
- Experience in financial services or insurance, especially contact center or claims operations.
- Exposure to hybrid cloud environments and legacy modernization.