
Advisor - AI HPC Platform Engineering
- Indianapolis, IN
- Permanent
- Full-time
- Hands-on experience in HPC and AI platforms, including in-depth knowledge of accelerators (e.g., GPU), HPC schedulers (e.g., Altair Grid Engine, Slurm), Kubernetes platforms, and container technologies (e.g., Docker, Apptainer).
- 6+ years of demonstrated experience in AI/ML and HPC workloads, infrastructure, and cluster architectures.
- Expertise in Linux system and HPC administration, including experience with platform observability (e.g., alerting, logging, and metrics).
- Knowledge of Run:ai core concepts, including roles, departments, projects, workloads, quotas, GPU fractions, and preemptible vs. non-preemptible jobs.
- Experience writing, building, and running containers, with an understanding of container registry management and the use of NGC images.
- Experience with machine learning frameworks such as PyTorch, Keras, and TensorFlow.
- Passion for continual learning and staying informed of new technologies, infrastructure trends, and approaches in the AI/ML field.
- Strong programming and scripting skills in languages such as Python or Bash.
- Bachelor’s degree in Computer Science, Information Technology, or a related technical field.
- 10+ years’ experience as an HPC Platform Engineer.
- Demonstrated experience leading a global large-scale infrastructure project.
- Hybrid role located in Indianapolis, IN (relocation required).