
Systems Engineer, HPC
- Everett, WA
- $130,000-165,000 per year
- Permanent
- Full-time
- Manage the availability and performance of Linux-based HPC systems across on-premises and cloud environments (AWS, Azure, boutique GPU providers)
- Administer system software, including patching, updates, and OS upgrades; troubleshoot and resolve hardware, software, and third-party issues
- Automate system configuration, monitoring, and updates using DevOps tools such as Ansible and GitHub
- Install, configure, and maintain a wide range of COTS, open-source, and in-house applications; package and deploy software using environment modules
- Plan and implement HPC infrastructure enhancements; communicate system issues and propose solutions to management
- Collaborate with scientists and engineers on system roadmaps and feature requirements; support end users with HPC usage and performance inquiries
- Bachelor's degree in Computer Science, Computer Engineering, or equivalent experience
- 5+ years of experience managing and administering production HPC clusters, with hands-on expertise in job schedulers (preferably Slurm)
- Proficient in scripting (Bash, Python) and experienced with programming languages such as Fortran, C++, or R
- Familiar with GPU-aware MPI, high-speed interconnects (e.g., NVLink), and GPUDirect RDMA; experienced supporting NVIDIA GPU-accelerated computing
- Experienced with HPC storage and I/O technologies (e.g., Lustre, ZFS, Parallel HDF5, ADIOS) and containerization tools (e.g., Docker, Singularity, Apptainer)
- Strong understanding of Linux systems, networking, and high-performance applications, with excellent collaboration skills and a team-oriented mindset
- Medical, Dental, and Vision plans for employees and their families
- 31 days of PTO (21 vacation days and 10 sick days)
- 10 paid holidays, plus a company-wide winter break
- Up to 5% employer 401(k) match
- Short-term disability, long-term disability, and life insurance
- Paid parental leave and support (up to 16 weeks)
- Annual wellness stipend