Description
We are seeking a Senior Data Engineer to lead the design, build, and operation of scalable data pipelines and domain-aligned data products on Databricks using Python. This role is accountable for delivering governed, discoverable, high-quality datasets and services aligned to data mesh principles (domain ownership, product thinking, and federated governance), enabling analytics and ML use cases with strong reliability, security, and performance.
Key Responsibilities
- Architect and deliver end-to-end data engineering solutions on Databricks (Spark/Delta Lake) using Python and SQL.
- Build data products aligned to data mesh principles: clear contracts, documentation, metadata, lineage, access controls, and measurable SLAs/SLOs.
- Develop and optimize batch and streaming pipelines with strong data quality checks and observability.
- Implement curated data layers and models (e.g., bronze/silver/gold as appropriate), including partitioning, file sizing, clustering strategies, and query optimization.
- Establish engineering standards: code quality, unit/integration testing, CI/CD, release governance, and reusable frameworks.
- Partner with governance, security, and platform teams to ensure compliance with firm policies (RBAC, encryption, retention, auditability) and smooth onboarding of consumers.
- Provide technical leadership across squads: architecture reviews, mentoring, and hands-on delivery for critical components.
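To give a flavor of the pipeline data quality work described above, here is a minimal, hypothetical sketch in plain Python (all names, fields, and rules are illustrative assumptions, not part of this role description; in practice such checks would typically run inside a Spark/Delta pipeline):

```python
# Hypothetical sketch: a row-level data quality gate of the kind a curated
# (e.g., silver) layer might apply before publishing a data product.
# REQUIRED_FIELDS and check_non_null are assumed names for illustration only.

REQUIRED_FIELDS = ("record_id", "amount")  # assumed contract fields

def check_non_null(rows, required=REQUIRED_FIELDS):
    """Split rows into (valid, rejected) based on required fields being non-null."""
    valid, rejected = [], []
    for row in rows:
        if all(row.get(f) is not None for f in required):
            valid.append(row)
        else:
            rejected.append(row)
    return valid, rejected

if __name__ == "__main__":
    sample = [
        {"record_id": 1, "amount": 100.0},
        {"record_id": 2, "amount": None},  # violates the non-null rule
    ]
    good, bad = check_non_null(sample)
    print(len(good), len(bad))
```

Separating valid from rejected rows (rather than dropping failures silently) is what makes quality measurable against the SLAs/SLOs mentioned above: rejected counts become an observable metric.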
AI-Enabled Engineering Expectations
- Be well-versed in using AI-assisted development tools to accelerate delivery while maintaining strong engineering rigor (secure coding, testing, and review discipline).
- Use AI tools for tasks such as code scaffolding, refactoring, test generation, documentation, and troubleshooting, with human validation and adherence to internal controls.
- Promote responsible adoption patterns within the team (e.g., prompt hygiene, avoiding sensitive data exposure, and ensuring outputs meet quality/security standards).
Required Qualifications
- 8–12+ years of experience in data engineering / software engineering with end-to-end delivery ownership.
- Strong hands-on expertise with AWS, Databricks, and Delta Lake.
- Strong Python engineering skills (modular design, packaging, testing, dependency management).
- Strong SQL and data modeling fundamentals.
- Solid Spark fundamentals (joins, shuffle, partitioning, caching, skew handling, cluster sizing).
- Experience implementing CI/CD and SDLC best practices (Git-based workflows, code reviews, automated testing).
- Proven stakeholder management and ability to drive cross-team delivery.
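One of the Spark fundamentals listed above, skew handling, can be sketched outside Spark in a few lines of plain Python. This is a hedged illustration of key salting (the names and salt count are assumptions), not a definitive Spark implementation:

```python
# Hypothetical sketch of key salting: a hot join/group-by key is spread
# across several buckets by appending a random salt, so no single
# partition receives all of its rows.
import random
from collections import Counter

random.seed(0)  # deterministic for the illustration

def salted_key(key, num_salts=4):
    """Append a random salt suffix so one hot key maps to several buckets."""
    return f"{key}#{random.randrange(num_salts)}"

# Simulate a skewed workload: one key dominates the dataset.
keys = ["hot"] * 1000 + ["cold"] * 10
buckets = Counter(salted_key(k) for k in keys)
hot_buckets = [k for k in buckets if k.startswith("hot#")]
print(len(hot_buckets))  # the hot key now occupies multiple buckets, not one
```

In Spark the same idea is applied by salting the skewed side of a join and replicating the other side across the salt values; modern Databricks runtimes can also mitigate skew automatically via adaptive query execution.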