Lead Data Engineer

Thermo Fisher Scientific

Posted 9 Days Ago

In-Office

Bengaluru, Bengaluru Urban, Karnataka

Senior level

Lead Data Engineer to design, build, and maintain scalable ETL pipelines and cloud data platforms, optimize data models/warehouses, ensure data quality and governance, and collaborate with analytics and business teams.

The summary above was generated by AI

Work Schedule

Other

Environmental Conditions

Office

Job Description

Summarized Purpose:

We are seeking a Lead Data Engineer to own the complete lifecycle of enterprise data pipelines, from development through production. The role spans roadmap planning, scalable ETL architecture, AWS data services, secure handling of PHI/PII, healthcare data standards, AI-assisted mapping automation, data quality, transformation, catalog standards, and RAG-enabled data solutions.

Education/Experience:

Bachelor's degree or equivalent in Computer Science, Information Technology, Data Engineering, or related field
7+ years of experience in data engineering, ETL development, cloud data platforms, healthcare or regulated data environments, and production data pipeline delivery

Major Job Responsibilities:

Design, develop, deploy, and operate scalable ETL and data pipelines using PySpark, Python, advanced SQL, and AWS data services
Own the data pipeline lifecycle end to end: requirements, mapping, development, testing, deployment, monitoring, production support, release management, and future roadmap planning
Build ingestion and transformation pipelines for flat files, relational databases, APIs, data warehouses, healthcare data sources, and enterprise data platforms
Implement mapping automation, preferably using AI, along with LLM-assisted data cleaning, transformation, data quality checks, and RAG use cases
Implement secure handling of PHI/PII data including encryption, access controls, auditability, retention, masking, de-identification, governance, and operational readiness
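As an illustration of the PHI/PII requirement, the sketch below shows three common techniques in plain Python: dropping direct identifiers, keyed pseudonymization of an identifier with HMAC-SHA256, and generalizing a birth date to its year. The field names, the hard-coded key, and the specific rules are assumptions made for this example. They do not describe Thermo Fisher's pipelines, and applying them does not by itself make a dataset compliant with any particular de-identification standard.

```python
import hashlib
import hmac

# Hypothetical key for illustration only; a real pipeline would load it from a secrets manager.
PSEUDONYM_KEY = b"example-secret-key"

# Assumed direct identifiers for this sketch; these fields are removed entirely.
DIRECT_IDENTIFIERS = {"name", "phone"}


def pseudonymize(value: str, key: bytes = PSEUDONYM_KEY) -> str:
    """Map an identifier to a stable keyed token (HMAC-SHA256 hex digest)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def generalize_date(iso_date: str) -> str:
    """Reduce an ISO-8601 date (YYYY-MM-DD) to its year."""
    return iso_date[:4]


def deidentify(record: dict) -> dict:
    """Return a protected copy of a record; the input dict is not modified."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "patient_id" in out:
        out["patient_id"] = pseudonymize(out["patient_id"])
    if "birth_date" in out:
        out["birth_date"] = generalize_date(out["birth_date"])
    return out


row = {"patient_id": "P-001", "name": "Jane Doe", "phone": "555-0100",
       "birth_date": "1984-07-12", "dx_code": "E11.9"}
print(deidentify(row))
```

A keyed hash keeps each token stable, so records for the same patient still join across tables. Unlike a plain unkeyed hash, it also stops someone without the key from recovering a short, guessable ID by hashing candidate values.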

Knowledge, Skills, and Abilities:

Advanced expertise in PySpark, Python, and SQL, along with ETL best practices, data modeling, and large-scale data processing
Strong hands-on experience with AWS services including S3, Glue, Lambda, Step Functions, ECS, DynamoDB, Redshift, RDS/PostgreSQL, and related data services
Experience with PostgreSQL, SQL Server, Redshift, flat files, complex source-to-target mappings, HL7, claims data, EMR extracts, and clinical trial data
Knowledge of data cataloging, metadata management, transformation standards, orchestration, monitoring, data quality, CI/CD, automated testing, and production support practices
Ability to lead technical design, mentor engineers, guide delivery decisions, troubleshoot complex issues, and communicate with cross-functional teams

Must Have Skills:

Advanced PySpark, Python, and SQL skills, plus expertise in ETL design and data pipeline engineering
AWS data services experience including S3, Glue, Lambda, Step Functions, ECS, DynamoDB, Redshift, PostgreSQL, and SQL Server integration
Secure PHI/PII handling, flat-file ingestion, source-to-target mapping, transformation, data catalog, governance, and healthcare data standards experience
CI/CD, GitHub workflows, automated testing, release management for data pipelines and database changes, and dev-to-prod pipeline ownership
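For the CI/CD and automated-testing requirement, here is a minimal sketch of a source-to-target mapping with a data quality check and a unit test. The source columns (`PAT_ID`, `VISIT_DT`, `CHG_AMT`), the target schema, and the quality rules are hypothetical. A production pipeline would more likely express the same mapping in PySpark or SQL, with a framework such as pytest running the test from a GitHub Actions workflow.

```python
from datetime import datetime

# Hypothetical source-to-target mapping: target column -> (source column, transform).
MAPPING = {
    "patient_key": ("PAT_ID", str.strip),
    "visit_date": ("VISIT_DT", lambda s: datetime.strptime(s, "%m/%d/%Y").date().isoformat()),
    "charge_usd": ("CHG_AMT", lambda s: round(float(s), 2)),
}
REQUIRED = {"patient_key", "visit_date"}


def apply_mapping(source_row: dict) -> dict:
    """Build a target row by applying each column's transform to its source value."""
    return {tgt: fn(source_row[src]) for tgt, (src, fn) in MAPPING.items()}


def quality_errors(target_row: dict) -> list:
    """Return data quality violations; an empty list means the row passes."""
    errors = [f"missing {c}" for c in sorted(REQUIRED) if not target_row.get(c)]
    charge = target_row.get("charge_usd")
    if charge is not None and charge < 0:
        errors.append("negative charge_usd")
    return errors


def test_mapping_and_quality():
    # The kind of unit test a CI workflow could run on every pull request.
    row = apply_mapping({"PAT_ID": " P-001 ", "VISIT_DT": "03/15/2024", "CHG_AMT": "120.5"})
    assert row == {"patient_key": "P-001", "visit_date": "2024-03-15", "charge_usd": 120.5}
    assert quality_errors(row) == []
    bad = {"patient_key": "", "visit_date": "2024-03-15", "charge_usd": -1.0}
    assert quality_errors(bad) == ["missing patient_key", "negative charge_usd"]


test_mapping_and_quality()
```

Keeping the mapping as data rather than hand-written code makes it straightforward to review, version, and regenerate.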

Good to Have Skills:

AI-assisted mapping automation and use of LLMs for data cleaning, data quality checks, transformation logic, documentation, and patient de-identification support
Experience with RAG patterns, embeddings, vector databases, semantic search, or AI-enabled data discovery solutions
Familiarity with infrastructure-as-code tools such as Terraform or CloudFormation, as well as streaming, Databricks, Snowflake, observability, and DevOps practices

Working Hours:

India: 05:30 PM to 02:30 AM IST
Philippines: 08:00 PM to 05:00 AM PHT
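The two shift windows above describe the same hours. A quick check with Python's standard `datetime` module uses fixed offsets of UTC+05:30 for IST and UTC+08:00 for PHT, since neither zone observes daylight saving time. It shows that both shifts map to 12:00 to 21:00 UTC, a nine-hour window. The helper `shift_in_utc` and the reference date are illustrative choices.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC offsets; neither India (IST) nor the Philippines (PHT) observes daylight saving time.
IST = timezone(timedelta(hours=5, minutes=30))
PHT = timezone(timedelta(hours=8))


def shift_in_utc(start: str, end: str, tz: timezone, day: str = "2024-01-01"):
    """Convert a local HH:MM shift, possibly ending after midnight, to UTC datetimes."""
    start_dt = datetime.fromisoformat(f"{day}T{start}").replace(tzinfo=tz)
    end_dt = datetime.fromisoformat(f"{day}T{end}").replace(tzinfo=tz)
    if end_dt <= start_dt:
        end_dt += timedelta(days=1)  # the shift crosses midnight local time
    return start_dt.astimezone(timezone.utc), end_dt.astimezone(timezone.utc)


india = shift_in_utc("17:30", "02:30", IST)
philippines = shift_in_utc("20:00", "05:00", PHT)
print(india[0].time(), india[1].time())  # → 12:00:00 21:00:00
```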

Similar Jobs

Brillio

Lead Data Engineer

2 Days Ago

Hybrid

Senior level

Information Technology

Design, build, and optimize scalable Spark/PySpark data pipelines on Databricks. Develop ETL/ELT workflows using AWS EMR, S3, and Hadoop/Hive. Build and maintain data lake and warehouse solutions, integrate APIs, orchestrate workflows with Airflow/Autosys, ensure data quality and governance, and tune performance. Collaborate with analytics, product, and engineering teams.

Top Skills: Airflow, Spark, APIs, Athena, Aurora, Autosys, AWS EC2, AWS EMR, AWS Lambda, CloudFront, Data Modeling, Data Warehousing, Databricks, EBS, EFS, Elasticsearch, ETL, Git, Glue, Hadoop, Hive, HTML, Lake Formation, Modern Data Platform, PL/SQL, PySpark, Python, S3, Scala, Shell Scripting, SQL, Step Functions, SVN, Unix

Brillio

Lead Data Engineer

2 Days Ago

Hybrid

Senior level

Information Technology

Hands-on Lead Data Engineer to design, build, and scale Databricks-based ETL/ELT pipelines and lakehouse architectures using PySpark and Delta Lake. Responsibilities include ingestion (batch & real-time), Delta features, pipeline orchestration, Spark optimization, data quality/governance, production support, and migrating legacy platforms to cloud.

Top Skills: Airflow, AWS EMR, AWS Glue, AWS S3, Azure Data Factory, Azure Data Lake Storage, Azure Synapse, CI/CD, Databricks, Delta Lake, Git, PL/SQL, PySpark, Python, Spark SQL, SQL, T-SQL

Weekday, Inc.

Lead Data Engineer

14 Days Ago

In-Office

Senior level

Artificial Intelligence • HR Tech • Professional Services • Software

Design, develop, and maintain scalable cloud-based data pipelines and ETL/ELT processes. Build and optimize data solutions on GCP (BigQuery, Dataflow), ensure data quality, governance, monitoring, and high availability. Develop backend data processing services in Java, support BI, analytics, and ML teams, troubleshoot pipeline issues, and contribute to data platform architecture and modernization.

Top Skills: BigQuery, Data Warehousing, Dataflow, ETL/ELT, Google Cloud Platform (GCP), Java, Monitoring/Automation Tools, SQL, Workflow Orchestration