
Advisor - Federated Learning Data Scientist
- Indianapolis, IN
- Permanent
- Full-time
- Multi-Task Model Design: Architect and implement advanced multi-task learning (MTL) models that effectively leverage shared representations across tasks to improve predictive performance and data efficiency in a federated ecosystem.
- Handling Data Heterogeneity: Develop novel algorithms specifically designed to address extreme task and feature heterogeneity across clients. This includes creating personalized models, implementing meta-learning approaches, or designing gradient aggregation methods that are robust to non-IID data.
- Knowledge Transfer & Regularization: Investigate and apply techniques to manage the balance between shared and task-specific learning. Implement regularization methods to prevent negative transfer (where learning one task hurts the performance of another) and encourage positive knowledge sharing.
- Problem Formulation: Collaborate closely with domain experts and stakeholders to define complex biological or chemical endpoints. Translate these scientific problems into a well-posed multi-task learning framework, identifying relevant tasks and data sources.
- Model Validation in MTL: Establish rigorous validation and evaluation frameworks for federated multi-task models. This includes defining appropriate metrics for each task and developing strategies to assess overall model performance and fairness across different clients and tasks.
- Interpretability and Explainability (XAI): Implement XAI techniques to understand and explain the predictions of complex multi-task models. Uncover the relationships between different endpoints as learned by the model to generate novel scientific insights.
- Code & Model Governance: Write clean, high-quality, and reproducible code. Contribute to internal libraries and ML platforms. Implement version control for data, code, and models to ensure robust and transparent research.
- Cross-Functional Collaboration: Work in a collaborative, multi-disciplinary team alongside software engineers, MLOps specialists, privacy experts, and domain scientists to translate research concepts into practical, impactful solutions.
- Literature Review & Innovation: Maintain a thorough understanding of the latest advancements in federated learning, deep learning, and related fields to drive innovation and contribute to the team's research strategy.
- PhD in a data science field such as Biostatistics, Statistics, Machine Learning, Computational Biology, Computational Chemistry, Physics, Applied Mathematics, or a related field from an accredited college or university.
- Minimum of 2 years of experience in the biopharmaceutical industry or related fields, with demonstrated expertise in drug discovery and early development.
- Experience in developing statistical and machine learning models for complex endpoints.
- Broad understanding of emerging scientific and technical breakthroughs.
- Exceptional interpersonal and communication skills, with a keen ability to understand, empathize, and navigate complex relationships and dynamics.
- Outstanding emotional intelligence (EQ) and strong problem-solving, analytical, and project management skills.
- Highly self-motivated and organized.
- Demonstrated ability to connect and influence at various levels across disciplines, both externally and internally.
- Learning Agility: Ability to quickly adapt to changing circumstances, learn from past experiences, and apply those learnings to new situations.
- Portfolio Mindset: Strong ability to think with a portfolio-level mentality, ensuring that individual program decisions align with the overall goals of Catalyze360.
- Independent self-starter, able to work without supervision.
- This is a site-based role in Indianapolis (preferred), San Diego (preferred), San Francisco, or Boston; relocation is provided.