
Curated Data Integration Engineer
- Minneapolis, MN
- $70,000-110,000 per year
- Permanent
- Full-time
Employment Type: Full-Time | Long-Term
Location: Hybrid from the South area of Minneapolis.
Time Zone: The client operates in the ET time zone.
Industry: Airport ManagementAt Kopius, we are seeking a Curated Data Integration Engineer to join our team! The project operates in the Airport Management sector and is based in Romulus, MI. This organization oversees airport facilities and operations, providing safe, efficient, and reliable air travel services.About the role We are seeking a Curated Data Integration Engineer who will focus on turning cleansed data into business-ready, Curated layer datasets for analytics and reporting. This role centers on combining and refining data from the Enriched zone - which may include applying master data management rules and integrating geospatial context - to produce trusted, analytics-friendly tables and views. The engineer will work closely with our Master Data Management system and the Enterprise GIS team to ensure that Curated data aligns with master data definitions and supports situational awareness use cases. In this position, you will build Delta Lake tables and SQL views that serve as the single source of truth for various business domains, accessible via Synapse serverless SQL and Power BI. You will also implement data quality checks, complex joins, and data harmonization logic to guarantee that the Curated zone data is consistent, accurate, and ready for consumption.Responsibilities:
- Combine data from multiple Enriched zone tables to create business-ready Curated (Gold) datasets.
- Design and build denormalized tables, star-schema models, or consolidated views that meet specific reporting and analytics requirements (e.g. situational awareness dashboards, enterprise KPIs).
- Ensure these Curated Delta tables are optimized (partitioning, indexing as appropriate) for query performance via Synapse serverless SQL pools and Power BI.
- Apply master data management (MDM) and data quality rules during transformation.
- Use Profisee MDM information (master records, hierarchies, reference data) to merge and reconcile records from different sources, achieving a single version of truth.
- Implement data cleansing, deduplication, and consistency checks so that Curated data is reliable and free of significant errors or mismatches.
- Work with the Enterprise GIS team to integrate geospatial attributes or ensure that Curated datasets align with spatial data (e.g., location IDs, facility maps) for situational awareness.
- While the GIS team manages ArcGIS, you will coordinate to incorporate any necessary GIS keys or outputs into the Curated tables so that dashboards can link tabular data with maps or spatial visualizations seamlessly.
- Engage with business stakeholders (operations, safety, finance, etc.) to understand the information needs for dashboards and reports.
- Translate those needs into Curated data models, ensuring that the resulting tables capture the required metrics and dimensions.
- Collaborate with the Data Architect to align Curated layer design with the overall lakehouse architecture and follow best practices for schema design and documentation.
- Develop and maintain the data pipelines (using Synapse Spark notebooks, SQL scripts, or Synapse data flows) that transform Enriched zone data into Curated. This includes writing complex SQL/Spark code for joins, aggregations, and handling incremental updates.
- Ensure all transformation logic is parameterized and documented within metadata-driven framework so that changes in upstream metadata (new columns, schema changes) can be accommodated with minimal code change.
- Implement validation checks at the end of pipelines to verify row counts, data distributions, and referential integrity between Curated tables. When issues are identified, work with source data owners or adjust transformation logic to address data quality gaps.
- Continuously monitor query performance on Curated tables and optimize as needed (through caching, aggregations, or refinement of the data model) to meet SLAs for report refresh and responsiveness.
- Self-motivated, quick learner, and adaptable to new technologies and legacy systems.
- 4+ years of experience in data engineering or BI development, with a focus on building data transformations and integrations for analytics.
- Solid understanding of data warehouse/lakehouse concepts and hands-on experience crafting fact and dimension tables, aggregate tables, or materialized views.
- Proficient in using Azure Synapse Analytics for data pipeline development - including Spark notebooks for complex processing and serverless SQL pools for creating views or querying data.
- Experience with Delta Lake format on Azure Data Lake Storage (ADLS) - able to read/write Delta tables, handle schema evolution, and use Delta time-travel or ACID features in transformations.
- Strong SQL skills for data manipulation, analysis, and performance tuning (e.g., writing efficient joins, window functions).
- Experience with PySpark or Spark SQL in a data lake environment to perform transformations that go beyond SQL capabilities when needed (using Python for custom business logic in notebooks).
- Familiarity with Master Data Management principles and experience integrating master data or reference data into data pipelines.
- Able to implement data quality checks, identify anomalies, and reconcile data from different sources.
- Understanding of how to harmonize data (e.g., resolving duplicate entities, applying consistent codes and definitions) across a large organization.
- Ability to understand business context and requirements, especially in operational reporting or situational awareness scenarios.
- Good communication skills to work with business users and translate their KPI or reporting needs into technical data solutions.
- Capable of writing clear documentation and performing user demos of Curated data outputs.
- Prior experience working in a medallion architecture (Raw/Enriched/Curated) or lakehouse environment is highly preferred.
- Understanding of metadata-driven pipelines and how each zone (Raw, Enriched, Curated) is used in such frameworks.
- Experience with Profisee MDM or other MDM tools (e.g., Microsoft MDS, Informatica MDM) in a data engineering context.
- Exposure to GIS data or spatial analytics (ArcGIS, PostGIS, etc.) is a plus, indicating an ability to understand and join geospatial data with enterprise data.
- Knowledge of Power BI report development or optimization, and familiarity with how Power BI interacts with Azure Synapse (via serverless SQL or DirectQuery to Delta).
- Azure Data Engineer Associate certification or similar.
- Experience in government, airport operations, transportation, or other regulated industries which provides relevant domain context is beneficial.
- Applicants should be eligible to work in the US without requiring sponsorship.
- This is a full-time, long-term and hybrid opportunity from the US.
- This role is will share time between billable projects and internal solution and innovation work and will heavy communication customers.
- Pay range: $70,000- $110,000 anually. Individual pay is determined by multiple factors, including job-related knowledge, skills, experience, and relevant education or training.
- Flex PTO, plus 11 paid holidays!
- 12 weeks paid Parental Leave.
- Flexible working hours!
- Fully paid medical/dental/vision benefits for you and your family
- Numerous company and team building events.
- Professional Development Fund for continuing education and training!
- Short & Long-Term Disability fully paid for Life Insurance.
- 401(k)- with employer matching up to 4% of salary contributions.
- Community and Connection: Foster team connection; contribute to a collaborative environment.
- Exceptional Experiences: Strive for outstanding communication experiences; align actions with delivering excellence.
- Embrace New Ideas: Demonstrate openness to innovation; adapt and learn quickly.
- Ownership Mentality: Take responsibility and ownership of tasks and projects; drive issues to resolution.
- Walk the Talk: Consistently align actions, communication, and problem-solving with company values.