Infrastructure & Platform SRE
KBR
- Sioux Falls, SD
- Permanent
- Full-time
- Works with the ITSD team and stakeholders to ensure the organization can deliver quality IT products and services through its computing environment.
- Applies education and experience in performing complex analysis, including the consideration of viable innovative alternatives.
- Conducts studies, including requirements definition, feasibility, trade, and enterprise architectures.
- Identifies, tracks, reports, and works on incident prevention tasks.
- Performs deep dives into both systemic and latent reliability issues.
- Builds new and/or enhances existing monitoring solutions to address early detection of issues, notifications, and corrective actions.
- Identifies, establishes, and implements key performance indicators and metrics for IT monitoring, and develops alerting and notifications for operational issues.
- Standardizes and automates routine operational tasks like VM, OS, and storage provisioning.
- Drives standardization efforts across multiple disciplines and services in conjunction with stakeholders across EROS.
- Leads continuous improvement initiatives for optimizing support and delivery of IT technology solutions.
- Establishes and enforces the systems engineering management plan and corresponding support plan templates to integrate quality-first service delivery practices.
- Actively leads and/or participates in Root Cause Analysis (RCA) for OS platform and infrastructure systems problems (recurring incidents) and works across teams to define, recommend, and implement corrective actions.
- Assists with the technology selection for core computing, hardware, and software infrastructure.
- Assists with performance testing and optimization of applications and IT technology.
- Defines and implements quality control tools, processes, and measures to ensure the quality delivery of IT technology and systems.
- Excellent interpersonal, organizational, leadership, and communication skills.
- Ability to work independently and as part of a team.
- Advanced experience in systems engineering methodology and processes.
- Strong understanding and experience with security concepts and methods.
- Strong understanding of Enterprise Architecture principles and methods.
- Experience with testing and configuration management (CM) tools such as SVN.
- Experience with virtualization, storage, network, and compute technology configuration, monitoring, trending, alerting, and high availability architectures.
- Experience with incident management, problem management and root cause analysis techniques.
- 5+ years’ experience automating provisioning, application layering, and configuration on one or more of the Unix, Linux, and Windows server operating system platforms.
- 5+ years’ experience collecting metrics and generating reports and dashboards for performance, capacity, and availability key performance indicators related to VMware, storage solutions, operating systems, and applications.
- Practical knowledge of shell scripting and at least one higher-level language.
- Proficiency with scripting languages such as PowerShell and with data formats such as JSON.
- Experience configuring and using monitoring and event management platforms and tools.
- Experience with workflow automation using platform management tools and/or service management platforms.
- Monitoring platforms: SolarWinds, Xymon
- Log collection and analytics platforms: LogRhythm, Splunk
- NetApp storage appliance
- Linux systems management and patch platforms, e.g., Foreman Katello, Spacewalk
- Versity and/or ScoutAM Hierarchical Storage Management (HSM) platform
- CommVault installation and configuration
- VMware and VMware Horizon implementation and administration
- Cisco MDS and Brocade storage networking
- Amazon Web Services (AWS) IaaS, PaaS, and SaaS configuration, provisioning, maintenance, and support
- Tape library implementation and configuration
- ITIL v3/v4 Foundation certification
- PMP certification
- VMware, NetApp, Cisco Cloud certifications
- Certified Data Center Professional (CDCP®)
- Certifications in ITIL Asset Management and ITIL Configuration Management
- Experience using Jira for Agile Scrum
- Experience using BMC Remedy for request and incident management
- Experience using Microsoft productivity tools such as Excel, Word, Outlook, Teams, Visio
- Experience using management tools such as Nlyte DCIM, SolarWinds, BigFix, BMC Remedy
- Awareness and/or experience with AWS and Azure
- Advanced experience with code debugging methods and tools
- Three years of continuous residency in the US for issuance of a Government Security credential
- The candidate must be able to obtain and maintain a national agency check and background investigation after hire in order to receive a badge for government facility access and a user account.