
Software Engineer
- Atlanta, GA
- Permanent
- Full-time
- Contributes to defining system reliability goals through Service Level Objectives (SLOs) and enhancing production posture with targeted improvements in observability and operability (telemetry, alerting, incident/change management, safe deployment practices).
- Builds reusable automation and processes that help multiple teams meet their reliability goals. With guidance, influences product architecture and roadmaps to ensure customer-experienced reliability is a core design principle.
- Works directly on product code to achieve reliability outcomes. Leverages AI to proactively detect anomalies, predict incidents, and automate operational workflows, scaling reliability efforts across complex systems.
- With guidance, supports the design and development of large-scale distributed software services and solutions. Delivers “best-in-class” engineering by ensuring services are modular, secure, reliable, testable, diagnosable, observable, and reusable.
- Collaborates with internal and external partners to support team goals. Balances pragmatism with vision, driving continuous improvements in process and codebase. Builds automation to prevent or remediate service issues before they impact users.
- Applies cutting-edge AI tools and techniques to reduce operational toil and scale reliability engineering across complex systems.
- Gains a working understanding of Microsoft businesses and contributes to cohesive, end-to-end user experiences.
- Bachelor's Degree in Computer Science or a related technical discipline with proven experience coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python, OR equivalent experience.
- Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.
- Familiarity with modern distributed software design patterns and cloud systems architecture, including microservices, containers, load balancing, queuing, and caching.
- Experience in building, shipping, and operating reliable solutions.
- Experience with data technologies (SQL, NoSQL, etc.).
- Experience with Azure.
- Experience in AI adoption with tools like GitHub Copilot, Azure OpenAI, and custom Copilots to streamline development and reduce toil.