Thermo Fisher Scientific

Lead Data Engineer

Posted 9 Days Ago
In-Office
Bengaluru, Bengaluru Urban, Karnataka
Senior level
Lead Data Engineer to design, build, and maintain scalable ETL pipelines and cloud data platforms, optimize data models/warehouses, ensure data quality and governance, and collaborate with analytics and business teams.

Work Schedule

Other

Environmental Conditions

Office

Job Description

Summarized Purpose:

We are seeking a Lead Data Engineer to own the complete lifecycle of enterprise data pipelines, from development to production. The role covers roadmap planning, scalable ETL architecture, AWS data services, secure PHI/PII handling, healthcare data standards, AI-assisted mapping automation, data quality, transformation, catalog standards, and RAG-enabled data solutions.

Education/Experience:

  • Bachelor's degree or equivalent in Computer Science, Information Technology, Data Engineering, or related field
  • 7+ years of experience in data engineering, ETL development, cloud data platforms, healthcare or regulated data environments, and production data pipeline delivery

Major Job Responsibilities:

  • Design, develop, deploy, and operate scalable ETL and data pipelines using PySpark, Python, advanced SQL, and AWS data services
  • Own the data pipeline lifecycle, from requirements and mapping through development, testing, deployment, monitoring, production support, and release management, to future roadmap planning
  • Build ingestion and transformation pipelines for flat files, relational databases, APIs, data warehouses, healthcare data sources, and enterprise data platforms
  • Implement mapping automation, preferably using AI, along with LLM-assisted data cleaning, transformation, data quality checks, and RAG use cases
  • Implement secure handling of PHI/PII data including encryption, access controls, auditability, retention, masking, de-identification, governance, and operational readiness

Knowledge, Skills, and Abilities:

  • Advanced expertise in PySpark, Python, advanced SQL, ETL best practices, data modeling, and large-scale data processing
  • Strong hands-on experience with AWS services including S3, Glue, Lambda, Step Functions, ECS, DynamoDB, Redshift, RDS/PostgreSQL, and related data services
  • Experience with PostgreSQL, SQL Server, Redshift, flat files, complex source-to-target mappings, HL7, claims data, EMR extracts, and clinical trial data
  • Knowledge of data cataloging, metadata management, transformation standards, orchestration, monitoring, data quality, CI/CD, automated testing, and production support practices
  • Ability to lead technical design, mentor engineers, guide delivery decisions, troubleshoot complex issues, and communicate with cross-functional teams

Must Have Skills:

  • Advanced PySpark, Python, advanced SQL, ETL design, and data pipeline engineering expertise
  • AWS data services experience including S3, Glue, Lambda, Step Functions, ECS, DynamoDB, and Redshift, plus PostgreSQL and SQL Server integration
  • Secure PHI/PII handling, flat-file ingestion, source-to-target mapping, transformation, data catalog, governance, and healthcare data standards experience
  • CI/CD, GitHub workflows, automated testing, release management for data pipelines and database changes, and dev-to-prod pipeline ownership

Good to Have Skills:

  • AI-assisted mapping automation and use of LLMs for data cleaning, data quality checks, transformation logic, documentation, and patient de-identification support
  • Experience with RAG patterns, embeddings, vector databases, semantic search, or AI-enabled data discovery solutions
  • Familiarity with infrastructure as code such as Terraform or CloudFormation, plus streaming, Databricks, Snowflake, observability, and DevOps practices

Working Hours:

  • India: 05:30 PM to 02:30 AM IST
  • Philippines: 08:00 PM to 05:00 AM PHT
