
Site Reliability Engineer (SRE)
- Morgantown, WV
- $85,150-$153,925 per year
- Permanent
- Full-time
- Gather and analyze metrics from operating systems and applications to assist in performance tuning and fault finding of a microservice enterprise system (cloud and on-premises)
- Partner with development teams to improve services through rigorous testing and release procedures
- Participate in system design consulting, platform management, and capacity planning
- Create sustainable systems and services through service automation
- Design, develop, troubleshoot, and debug mission-critical infrastructure
- Manage on-premises and private/public cloud environments via infrastructure-as-code (IaC).
- Participate in the design of reusable infrastructure components for scalable, highly available, secure architectures for cloud native applications.
- Enable continuous integration and continuous delivery of our diverse suite of software products by applying best practices for infrastructure provisioning, configuration, and automated software deployments.
- Continually evaluate fielded system deployments and apply best practices to facilitate continuous improvement that can be applied across teams.
- Work closely with other engineers to develop the best technical design and approach for new product installation and field service activities (software patches, cyber updates, etc.)
- Develop solutions to complex technical issues and problems that impact multiple areas or disciplines.
- Communicate with internal team members across multiple areas and coordinate completion of key deliverables across teams.
- Liaise with external and internal customer stakeholders on technical design decisions and trade-offs, and ensure the software solution meets required functional, performance, and SLA thresholds.
- Mentor other SREs in the art of building, deploying, and maintaining mission-critical production microservice enterprise systems.
- Resolve roadblocks for the field service team, working collaboratively with product engineering, technical leadership, and others.
- Bachelor's degree in computer science or computer engineering with 4+ years of experience in a relevant field, or Master's degree with 2+ years of experience. Additional years of experience may be considered in lieu of a degree.
- Must have the ability to obtain a Public Trust clearance (US citizenship required).
- Experience delivering entire projects or processes spanning multiple technical areas.
- Experience serving as a technical lead managing large projects or processes.
- Working knowledge of Agile Development and continuous integration and continuous delivery methodologies and tools.
- Expertise with Linux and Windows operating systems, network administration, and networking protocols/functions (e.g., HTTP, HTTPS, SSL/TLS, SMTP, DNS)
- Expertise provisioning and managing resources within IaaS/Cloud infrastructures (e.g., Azure, AWS, Google Cloud Platform, etc.)
- Expertise with Infrastructure as Code technologies
- Expertise with Terraform, Ansible, Helm, Bash scripting, CloudFormation, Chef, Puppet, and/or similar technologies
- Expertise with container technologies such as Docker and container orchestration tools like Kubernetes
- Expertise with Kubernetes kubectl
- Expertise with a version control system (e.g., Git).
- Experience with API gateways such as Istio
- Experience with GitOps tools such as Argo CD, Flux CD, Fleet, or similar
- Strong, self-motivated desire to learn new tools, frameworks, and techniques.
- Ability to complete tasks independently with minimal direct supervision.
- Ability to work and collaborate effectively within a multi-disciplined engineering team.
- Experience with enterprise event broker technologies (e.g., Kafka, NATS)
- Experience with Rancher Harvester virtualization
- Experience with VMware vSphere
- Experience with monitoring and alerting tools such as Grafana and Prometheus
- Professional cybersecurity certification such as Security+ or similar.
- Knowledge of Agile Development methodologies.
- Familiarity with at least one Relational Database Management System (Oracle, MySQL, PostgreSQL, SQL Server, etc.).