
Sr. Engineer, AI Model Evaluation
- San Jose, CA
- $22,000-30,000 per year
- Permanent
- Full-time
- United States of America - California - San Jose
- Design, implement, and maintain comprehensive evaluation pipelines for large generative AI models, encompassing a range of metrics and methodologies.
- Evaluate the performance of publicly available models and report on their relative strengths and weaknesses.
- Establish and maintain benchmarks for evaluating model performance across a range of tasks and datasets.
- Conduct thorough error analysis to identify patterns in model failures and provide actionable insights for improvement.
- Design and implement methods to detect and mitigate biases in model outputs, ensuring fairness and equitable performance.
- Develop and execute robustness tests to assess model resilience against adversarial inputs, noise, and variations in real-world data.
- Assess model safety, including identifying and mitigating harmful or inappropriate outputs.
- Experiment with various evaluation techniques, metrics, and datasets to optimize model quality and reliability.
- Contribute to the development and refinement of evaluation metrics that accurately reflect model performance and desired characteristics.
- Clearly communicate evaluation results and insights to engineers, researchers, and stakeholders.
- Identify potential partnerships with third parties.
- Develop and maintain evaluation tools and infrastructure.
- Monitor and analyze model performance in production environments, identify degradation, and propose solutions.
- Stay up to date with the latest advancements in large language and multi-modal models, model evaluation techniques, metrics, and related technologies.
- Contribute to the development of internal tools and infrastructure for model evaluation and monitoring.
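As an illustration of the robustness-testing responsibility above, the following is a minimal sketch of a perturbation-based stability check. The `model` callable, the character-typo perturbation, and the exact-match stability criterion are all illustrative assumptions, not a prescribed methodology.

```python
import random


def perturb(text: str, rate: float = 0.1, seed: int = 0) -> str:
    """Inject character-level typos into `text` at the given rate (illustrative noise model)."""
    rng = random.Random(seed)
    chars = list(text)
    for i in range(len(chars)):
        if chars[i].isalpha() and rng.random() < rate:
            chars[i] = rng.choice("abcdefghijklmnopqrstuvwxyz")
    return "".join(chars)


def robustness_score(model, prompts, rate: float = 0.1) -> float:
    """Fraction of prompts whose model output is unchanged under perturbation."""
    stable = sum(model(p) == model(perturb(p, rate)) for p in prompts)
    return stable / len(prompts)
```

In practice the exact-match criterion would typically be replaced by a task-appropriate similarity measure, and the noise model by perturbations representative of real-world input variation.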
- Bachelor's or Master's degree in Computer Science, Machine Learning, or a related field.
- 12+ years of software development experience.
- Strong programming skills in Python and experience with deep learning frameworks like PyTorch.
- Deep understanding of machine learning evaluation principles, including various metrics (e.g., BLEU, ROUGE, perplexity, F1-score) and methodologies.
- Proven ability to design and conduct rigorous experiments, analyze data, and draw meaningful conclusions.
- Familiarity with large language models, transformer architectures, and related concepts.
- Experience with data processing tools and techniques (e.g., Pandas, NumPy).
- Experience working with Linux systems and/or HPC cluster job scheduling (e.g., Slurm, PBS).
- Ph.D. in Computer Science, Machine Learning, or a related field.
- Excellent communication, collaboration, and problem-solving skills.
- Experience with automated model evaluation frameworks and tools.
- Experience with techniques for detecting and mitigating bias in AI models.
- Experience with safety and alignment evaluation methodologies.
- Experience with A/B testing and online evaluation techniques.
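To make the metrics named above concrete, here is a minimal sketch of two of them: token-overlap F1 (as used in QA-style evaluation) and perplexity computed from per-token log-probabilities. Whitespace tokenization and natural-log probabilities are simplifying assumptions for illustration.

```python
import math
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a prediction and a reference (whitespace tokens)."""
    pred, ref = prediction.split(), reference.split()
    common = Counter(pred) & Counter(ref)  # multiset intersection of tokens
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def perplexity(token_log_probs) -> float:
    """Perplexity from per-token natural-log probabilities: exp of the mean NLL."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))
```

Production pipelines would typically rely on established implementations (e.g., in evaluation libraries) rather than hand-rolled metrics, but the definitions are as above.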