
Senior Data Scientist (NLP)
- Ann Arbor, MI
- $117,000–$147,000 per year
- Permanent
- Full-time
- Bachelor’s degree in Computer Science, Data Science, Computational Linguistics, or a related field
- At least 5 years of hands-on experience in data science, focused on natural language processing (NLP)
- At least 5 years of experience using Python, with expertise in LLM application frameworks such as LangChain, LangGraph, or related tools in the LangChain ecosystem
- Proven experience in model development and applying machine learning techniques to real-world problems
- Expertise in retrieval-augmented generation (RAG) workflows for LLMs, including VRAG and GraphRAG
- Deep understanding of embedding models, semantic search, and vector stores (e.g., FAISS, Pinecone)
- Experience with document loaders, text splitters, and document chunking strategies
- Familiarity with MLOps practices and production-level deployment of AI pipelines
- Experience with cloud platforms (e.g., AWS, Azure, or GCP)
- Experience applying Graph Neural Networks (GNNs) to retrieval-augmented generation
- Knowledge of LangSmith and vector orchestration platforms
- Familiarity with multilingual NLP and cross-lingual embeddings
- Exposure to real-time knowledge graphs and stream-based RAG systems
- A Master’s or PhD in a technical field (Computer Science, Data Science, etc.)
- Design NLP Workflows: Develop scalable pipelines for text ingestion, cleaning, normalization, and tokenization to support downstream applications.
- Implement Indexing and Vectorization Strategies: Architect and maintain robust indexing systems and vector databases for semantic search and retrieval.
- Develop Prompting and Fine-Tuning Frameworks: Create reusable prompting strategies and lead fine-tuning initiatives that adapt LLMs to business-specific tasks.
- Build LangChain/LangGraph Applications: Construct dynamic knowledge systems and agentic workflows using LangChain and LangGraph.
- Integrate Advanced RAG Architectures: Apply VRAG and GraphRAG design patterns to enrich information retrieval and contextual understanding.
- Optimize Performance: Run benchmark tests and model evaluations to improve the accuracy, efficiency, and scalability of NLP systems.
- Collaborate Across Teams: Work closely with engineering, product, and research stakeholders to deliver integrated AI-driven features.
- Provide Technical Leadership: Mentor junior data scientists, guide best practices, and drive innovation across AI projects.
- Full-time, permanent position working primarily during core business hours in your time zone, with flexibility to accommodate other global time zones as needed
- Fully remote position based in the US