
Staff Data Engineer
- Highlands Ranch, CO
- $211,300 per year
- Permanent
- Full-time
- Build MLOps pipelines to support model development, model production, model validation, model performance monitoring, model recalibration, continuous integration, and continuous delivery of AI/ML models.
- Responsible for new model development and existing model retraining, performance evaluation, and score optimization.
- Design, develop, and implement Deep Learning methodologies and newer ML model implementations at scale.
- Build and maintain high-performing ETL processes, including data quality checks and testing, aligned across technology, internal reporting, and other functional teams.
- Build ETL pipelines in Spark, Python, Hive, or Scala that process transaction- and account-level data and standardize data fields across various data sources.
- Create data dictionaries, set up and monitor data validation alerts, and execute periodic jobs such as performance dashboards and predictive model scoring for client deliverables.
- Define and build technical/data documentation.
- Ensure data accuracy, integrity, and consistency.
- Position reports to the Highlands Ranch, CO office and may allow for partial telecommuting.
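The field-standardization and data-validation duties above can be sketched in plain Python. This is an illustrative sketch only: the field names, canonical schema, and validation rules here are hypothetical, not taken from the posting.

```python
# Hypothetical mapping from source-specific field names to one canonical schema.
FIELD_MAP = {
    "txn_amt": "transaction_amount",
    "amount": "transaction_amount",
    "acct_id": "account_id",
    "account_number": "account_id",
    "ts": "transaction_timestamp",
    "event_time": "transaction_timestamp",
}

# Fields every standardized record must carry (assumed for this sketch).
REQUIRED_FIELDS = {"transaction_amount", "account_id", "transaction_timestamp"}

def standardize_record(record: dict) -> dict:
    """Rename source-specific fields to the canonical schema; pass others through."""
    return {FIELD_MAP.get(key, key): value for key, value in record.items()}

def validate_record(record: dict) -> list:
    """Return a list of data-quality issues; an empty list means the record passes."""
    issues = [f"missing field: {f}" for f in REQUIRED_FIELDS - record.keys()]
    amount = record.get("transaction_amount")
    if amount is not None and amount < 0:
        issues.append("negative transaction_amount")
    return issues
```

In a Spark pipeline the same mapping would typically be expressed as DataFrame column renames, with the validation step feeding the data-validation alerts mentioned above.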
- Master's degree in Computer Science, Engineering, Data Science, Business Analytics, or related field and 2 years of experience in the job offered or in a data engineer-related occupation.
- Alternatively, employer will accept a Bachelor's degree in Computer Science, Engineering, Data Science, Business Analytics, or related field, followed by 5 years of progressive, post-baccalaureate experience in the job offered or in a data engineer-related occupation.
- Position requires experience in the following skills:
- Deep Learning techniques.
- Federated learning and transfer learning.
- Programming in Spark, Python, Hive, SQL, Presto, and Scala.
- Working with Airflow and GitHub for building and maintaining ETL pipelines.
- Linear and Logistic Regression, Decision Trees, XGBoost, Random Forests, K-Nearest Neighbors, Markov Chain Monte Carlo, Gibbs Sampling, Evolutionary Algorithms, and Support Vector Machines.
- Advanced data mining and statistical modeling techniques, including predictive modeling, classification techniques, and decision tree techniques.
- Working with large-scale data ingestion, processing, and storage in distributed computing environments or big data platforms (Hadoop), as well as common database systems, storage formats, and value stores (Parquet, Avro, or HBase).
- Linux or Shell Scripting.
- Building and integrating code within the defined CI/CD framework using Git.
- Onboarding machine learning models to the MLOps framework.
- Maintenance and performance monitoring as part of the MLOps lifecycle using industry best practices and tools (MLflow and Evidently).
- Experience with data visualization techniques and common industry data visualization tools: Tableau, Power BI, and JS visualization libraries (D3.js).
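The model performance monitoring skill listed above often centers on score-drift detection, which tools such as MLflow and Evidently automate. A minimal stand-alone sketch of one common drift metric, the Population Stability Index (PSI), is below; the bin edges and the ~0.2 drift threshold are industry conventions assumed for illustration, not details from the posting.

```python
import math

def population_stability_index(expected, actual, bins):
    """Compute PSI between a baseline score distribution and a recent one.

    `bins` are interior cut points; a PSI above roughly 0.2 is conventionally
    read as significant drift warranting recalibration (assumed threshold).
    """
    def proportions(scores):
        counts = [0] * (len(bins) + 1)
        for s in scores:
            idx = sum(s > b for b in bins)   # bucket index for this score
            counts[idx] += 1
        total = len(scores)
        # Small floor avoids log(0) for empty buckets.
        return [max(c / total, 1e-6) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

An identical baseline and recent distribution yields a PSI of zero; a recent batch concentrated in one bucket yields a large positive PSI, which a monitoring job could turn into a recalibration alert.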