Staff Cloud Site Reliability Engineer

Dearborn, MI
Permanent
Full-time

1 month ago

Collaborate with development teams as a software engineer to design, build, and operate scalable and resilient cloud infrastructure.
Guide teams in implementing best practices for designing safe rollout strategies for critical cloud infrastructure, automating scaling into a resilient architecture pattern, and ultimately improving software delivery for our cloud-based shared services.
Perform root cause analysis of production incidents and implement preventive measures.
Establish on-call practices for the team, and serve in rotation as an escalation point for critical incidents.
Enable and guide Cloud Operations teams to regularly review key site technical metrics such as transaction errors, logging, response times, caching strategies, and capacity and resource utilization.

Established and active employee resource groups.

Experience with automated testing (unit, integration, load) and/or test-driven development.
Experience implementing deployment strategies at scale, including rollout and rollback automation.
Experience solving complex architecture, design, and business problems, working to simplify, optimize, and remove bottlenecks.
Familiarity with DevSecOps practices and integrating security into CI/CD pipelines.
Experience with SCA, SAST, DAST, Vulnerability Management, and CSPM tools to help customers deliver secure services.
Proficiency in CI/CD and DevOps/GitOps practices.
Experience with GCP cloud services.
Demonstrable experience as a Site Reliability Engineer or in a similar role.
SRE certification(s) are a plus.
Kubernetes experience is a plus.

Ford
