Position Title: Data Engineer
Position Type: Regular - Full-Time
Position Location: Gurgaon
Grade: Grade 05
Requisition ID: 33488
Position Summary
Data engineers are mainly responsible for designing, building, managing, and operationalizing data pipelines to support key data and analytics use cases. They play a crucial role in constructing and maintaining a modern, scalable data platform that utilizes the full capabilities of a Lakehouse Platform.
You will be a key contributor to our data-driven organization, playing a vital role in both building a modern data platform and maintaining our Enterprise Data Warehouse (EDW). You will leverage your expertise in the Lakehouse Platform to design, develop, and deploy scalable data pipelines using modern and evolving technologies. Simultaneously, you will take ownership of the EDW architecture, ensuring its performance, scalability, and alignment with evolving business needs. Your responsibilities will encompass the full data lifecycle, from ingestion and transformation to delivery of high-quality datasets that empower analytics and decision-making.
Duties and Responsibilities
Build data pipelines using Azure Databricks:
- Build and maintain scalable data pipelines and workflows within the Lakehouse environment.
- Transform, cleanse, and aggregate data using Spark SQL or PySpark.
- Optimize Spark jobs for performance, cost efficiency, and reliability.
- Develop and manage Lakehouse tables for efficient data storage and versioning.
- Utilize notebooks for interactive data exploration, analysis, and development.
- Implement data quality checks and monitoring to ensure accuracy and reliability.
Drive Automation:
- Implement automated data ingestion processes using functionality available in the data platform, optimizing for performance and minimizing manual intervention.
- Design and implement end-to-end data pipelines, incorporating transformations, data quality checks, and monitoring.
- Utilize CI/CD tools (Azure DevOps/GitHub Actions) to automate pipeline testing, deployment, and version control.
Enterprise Data Warehouse (EDW) Management:
- Create and maintain data models, schemas, and documentation for the EDW.
- Collaborate with data analysts, data scientists and business stakeholders to gather requirements, design data marts, and provide support for reporting and analytics initiatives.
- Troubleshoot and resolve any issues related to data loading, transformation, or access within the EDW.
Educate and train: The data engineer should be curious and knowledgeable about new data initiatives and how to address them. This includes applying their data and/or domain understanding in addressing new data requirements. They will also be responsible for proposing appropriate (and innovative) data ingestion, preparation, integration and operationalization techniques in addressing these data requirements. The data engineer will be required to train counterparts in these data pipelining and preparation techniques.
Ensure compliance with data governance and security: The data engineer is responsible for ensuring that the data sets provided to users are compliant with established governance and security policies. Data engineers should work with data governance and data security teams while creating new and maintaining existing data pipelines to guarantee alignment and compliance.
Qualifications
Education
Bachelor's or Master's degree in Computer Science, Information Management, Software Engineering, or equivalent work experience.
Work Experience
At least four years of experience working in data management disciplines, including data integration, modeling, optimization, and data quality, and/or other areas directly relevant to data engineering responsibilities and tasks.
At least three years of experience working in cross-functional teams and collaborating with business stakeholders in support of a departmental and/or multi-departmental data management and analytics initiative.
Technical Knowledge, Abilities, and Skills
Ability to design, build and manage data pipelines for data structures encompassing data transformation, data models, schemas, metadata, and workload management. The ability to work with both IT and business in integrating analytics and data science output into business processes and workflows.
Strong knowledge of database programming languages and hands-on experience with any RDBMS.
Ability to work with large, heterogeneous datasets, build and optimize data pipelines, pipeline architectures and integrated datasets using traditional data integration technologies. These should include ETL/ELT, data replication, CDC, API design and access, and other data ingestion and integration technologies such as data streaming and data virtualization.
Good knowledge of advanced analytics languages such as R and Python.
Basic knowledge of popular data discovery, analytics, and BI tools such as Power BI, Tableau, Qlik, MicroStrategy, and the like.
Ability to work with data science teams in refining and optimizing data science and machine learning models and algorithms.
Able to work with large, heterogeneous datasets to extract business value using popular data preparation tools.
Strong knowledge of data warehousing architecture, and the ability to design, build, and implement relational and multi-dimensional data models.
Strong understanding of industry data management architectures and patterns such as Data Mesh, Lakehouse, and Medallion architecture.
Strong understanding of data governance, data stewardship, data quality, data privacy, and data security.
Ability to work across multiple deployment environments including cloud, on-premises and hybrid.
Strong understanding of agile methodologies and the ability to apply them.
Strong problem-solving skills.
Interpersonal Skills and Characteristics
Able to collaborate with both the business and IT teams to define the business problem, refine the requirements, and design and develop data deliverables accordingly.
Good judgment, a sense of urgency, and commitment to high standards of ethics, regulatory compliance, customer service and business integrity.
Strong drive to stay current with industry best practices and trends on data acquisition, data modeling, data warehousing, and Big Data technologies.
Core Competencies
- Ensures accountability
- Collaborates
- Courage
- Customer focus
- Being resilient
- Drives results
McCain Foods is an equal opportunity employer. We see value in ensuring we have a diverse, antiracist, inclusive, merit-based, and equitable workplace. As a global family-owned company we are proud to reflect the diverse communities around the world in which we live and work. We recognize that diversity drives our creativity, resilience, and success and makes our business stronger.
McCain is an accessible employer. If you require an accommodation throughout the recruitment process (including alternate formats of materials or accessible meeting rooms), please let us know and we will work with you to meet your needs.
Your privacy is important to us. By submitting personal data or information to us, you agree this will be handled in accordance with the Global Privacy Policy.
Job Family: Information Technology
Division: Global Digital Technology
Department: Data and Analytics
Location(s): IN - India : Haryana : Gurgaon
Company: McCain Foods (India) P Ltd