
AIML - ML Infrastructure Engineer, ML Platform & Technology - ML Compute
- San Francisco, CA
- Permanent
- Full-time
- Bachelor's degree in Computer Science, engineering, or a related field
- 1+ years of hands-on experience building scalable backend systems for training and evaluating machine learning models
- Proficient in relevant programming languages, such as Python or Go
- Strong expertise in distributed systems, reliability and scalability, containerization, and cloud platforms
- Proficient in cloud computing infrastructure and tools: Kubernetes, Ray, PySpark
- Ability to clearly and concisely communicate technical and architectural problems, while working with partners to iteratively find solutions
- Advanced degree in Computer Science, engineering, or a related field
- Proficient in working with and debugging accelerators, such as GPUs, TPUs, and AWS Trainium
- Proficient in ML training and deployment frameworks, such as JAX, TensorFlow, PyTorch, TensorRT, and vLLM