Sr Site Reliability Engineer - Remote
Allscripts
- Washington, DC
- Permanent
- Full-time
- Reduce the administrative burden associated with ever-changing regulatory and reimbursement requirements
- Improve practice financial performance and take advantage of health information technology innovations
- Enhance patient satisfaction by reducing the high costs and long wait times common with many prescriptions
- Get patients all their specialty medications faster and more easily
- Serve as an on-call engineer, responsible for managing and resolving incidents that affect the availability and performance of our systems.
- Collaborate with experienced development, operations, and infrastructure teams to design, implement, and maintain robust, scalable, and reliable systems.
- Proactively monitor and analyze system metrics to identify potential issues and take the necessary action to prevent or mitigate them.
- Conduct thorough root cause analysis of incidents, identifying underlying issues and implementing long-term solutions to prevent recurrence.
- Automate manual processes and tasks to improve efficiency and reduce human error.
- Participate in capacity planning and performance optimization efforts to ensure system scalability and reliability.
- Stay updated with the latest industry trends and emerging technologies related to cloud services and Site Reliability Engineering.
- Bachelor's degree in computer science, engineering, or a related field (or equivalent work experience).
- 4-7 years of experience in development, operations, and infrastructure, including at least 2-3 years in a current or most recent role as a Site Reliability Engineer, DevOps Engineer, or equivalent.
- Coding proficiency in a high-level programming language such as Java, Objective-C, C#, C/C++, or Python (C# preferred), with applied knowledge of object-oriented programming.
- Proficient in scripting and automation using languages such as Python, Bash, or PowerShell.
- 3+ years of experience with service-oriented architectures and microservices.
- Solid understanding of Site Reliability Engineering principles, with a proven track record of applying SLAs, SLIs, and SLOs to measure and improve system reliability and efficiency.
- Extensive experience in incident management and on-call support, preferably in a high-availability production environment.
- Strong knowledge of cloud services, particularly Azure and AWS, including virtual machines, networking, storage, and load balancing.
- Excellent troubleshooting and problem-solving skills, with keen attention to detail.
- Self-driven and motivated, with the ability to work independently and prioritize tasks effectively.
- Strong communication and interpersonal skills, with the ability to collaborate and communicate effectively with cross-functional teams.
- Familiarity with DevOps practices and tools, such as CI/CD pipelines and infrastructure-as-code.
- Experience with monitoring and logging tools, such as Splunk, Prometheus, Grafana, ELK stack, or similar.
- Certifications in Azure, AWS, Terraform, or Kubernetes.