
Site Reliability Engineer, Infrastructure and Assurance Services - USDS
- Seattle, WA
- Permanent
- Full-time
Responsibilities:
- Drive infrastructure automation and tooling: Design, develop, and maintain solutions for efficient operation, optimization, and comprehensive monitoring of global infrastructure, minimizing manual intervention.
- Collaborate on service lifecycle management: Partner with engineering teams to design, deploy, operate, and continuously improve robust and scalable systems and services, from inception to refinement.
- Ensure service reliability and performance: Proactively monitor system health, conduct performance testing, and manage incidents to maximize uptime, availability, and adherence to defined SLAs/SLOs.
- Execute core SRE practices: Perform on-call duties and production operations, including change management, capacity planning, and disaster recovery, while contributing to documentation and process improvements across teams.
Qualifications:
Minimum Qualifications:
- Bachelor's degree in Computer Science, a related field, or equivalent practical experience.
- Demonstrated experience in software development with one or more programming languages.
- Experience with Linux operating systems, networking, database concepts, monitoring, and shell scripting.
- Superb analytical, problem-solving, and critical-thinking skills.
- Excellent communicator, team player, self-starter, and fast learner.
Preferred Qualifications:
- Master's degree in Computer Science, Engineering, or a related field.
- Proficiency in any of the following languages: Python, Go, C++.
- Expertise in any of the following: SRE philosophy, AIOps, APM, disaster recovery.
- Expertise in any of these tech stacks: Kubernetes, Elasticsearch, ClickHouse, message queues, OpenTSDB, service mesh.

As a condition of employment, all successful candidates must be able to establish authorization to work in the United States. For this position, the Company does not provide sponsorship for any immigration-related benefits.