Lead Data Platform Architect
Vir Biotechnology
- San Francisco, CA
- Permanent
- Full-time
- Lead the architecture, design, and deployment of an enterprise healthcare data lake.
- Help design core data platforms that enable the building of bioinformatics pipelines, machine learning models, data science applications, and custom applications and dashboards.
- Participate as a member of the Vir data architecture forum.
- Evaluate and select AWS technologies, including data governance tools, to support building the enterprise healthcare data lake.
- Develop a scalable and reliable infrastructure that supports bioinformatics pipelines and machine learning model deployment.
- Be a liaison between IT, Machine Learning, Bioinformatics, and the Data Engineering team.
- Review and enhance documentation and the Vir knowledge base.
- 12+ years of development experience in Python or a similar object-oriented language.
- 5+ years of experience as a full stack developer.
- Experience as a lead or principal data engineer.
- Deep experience with AWS technologies, APIs, and MLOps.
- Building data lakes using AWS or similar technologies.
- Designing and building data processing pipelines using Nextflow, Airflow, or similar technologies.
- Enterprise data governance, the machine learning lifecycle, and LLMs and their applications.
- Code/Build/Deployment: Git, Docker, Jenkins, Kubernetes.
- Scientific pipelines, e.g., bioinformatics (genomic processing) pipelines, on AWS or other cloud providers.
- Executive-level dashboards using Tableau or Spotfire.
- Strong database experience, e.g., vector databases such as Milvus (open source) or Pinecone.
- Relevant certifications would include AWS Certified Data Analytics or AWS Certified Machine Learning.
- BS in Computer Science or related discipline, or equivalent industry experience.