About CACI Data Engineering

CACI has implemented a Data Platform that supports and enables a Data Mesh organisation. It uses AWS technology to deliver an open, federated Lakehouse with a unified user experience across AWS services. The Data Platform focusses on enabling decentralised management, processing, analysis and delivery of data, while enforcing corporate-wide federated governance over data and project environments across business domains.
The goal is to empower multiple teams to create and manage high-integrity data and data products that are analytics- and AI-ready and are consumed both internally and externally.
Data Engineers will be part of a community of data engineers and are often embedded in different business units to develop data pipelines and products, maintaining these for use in consulting engagements and service delivery to clients.
Data Engineers will work with the platform and to corporate standards and processes. In a business that values continuous improvement, data engineers are expected to contribute to improving the wider data engineering capability in the business.
What does a Data Engineer do?

A Data Engineer partners closely with business units to maintain existing cloud data architectures, and to assess, design and execute the migration of their existing cloud and on-premises/desktop data products and workflows onto a modern cloud data platform.
This involves understanding current data architectures, dependencies and transformation logic, and translating that into cloud native solutions in harmony with the Company data platform, governance and strategy.
The Data Engineer will need skills in developing data pipelines that operate across a medallion architecture, with an obsession with the quality and integrity of data products and processes, always weighing the cost-benefit of different approaches. This demands both technical rigour and pragmatic delivery.
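For illustration, a minimal sketch of one medallion step (bronze to silver) in PySpark; the bucket paths, column names and quality rules are hypothetical placeholders, not CACI's actual pipeline:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("bronze-to-silver").getOrCreate()

# Bronze: raw data landed as-is, preserved for lineage and replay.
# Bucket and table names here are illustrative only.
bronze = spark.read.parquet("s3://example-lake/bronze/orders/")

# Silver: deduplicated, typed and quality-checked records.
silver = (
    bronze.dropDuplicates(["order_id"])
    .withColumn("order_ts", F.to_timestamp("order_ts"))
    .withColumn("order_date", F.to_date("order_ts"))
    .filter(F.col("order_id").isNotNull() & (F.col("amount") >= 0))
)

# Partitioning by date keeps downstream reads and reprocessing cheap.
silver.write.mode("overwrite").partitionBy("order_date").parquet(
    "s3://example-lake/silver/orders/"
)
```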
You will leverage skills in AWS services such as Glue, EMR, S3, MWAA (Apache Airflow), Step Functions, Redshift and SageMaker Unified Studio, as well as an understanding of traditional RDBMSs such as Postgres, Oracle and SQL Server. SQL, Python and PySpark will be essential in this role. Experience with infrastructure as code (IaC) such as CloudFormation will be an important skill to have, or to develop.
You will be able to design architectures and create reusable solutions that reflect business needs. It is important that we create “Patterns” that can be reused across multiple products and services, so that we are not designing new solutions for every need.
Some products will have linear data ingest, transformation and publishing requirements; others will require extensive analytical work within the pipelines, including machine learning and deep learning, with the potential to integrate more unstructured data as new products are built and matured.
Responsibilities will include:

Collaborating across CACI departments to develop and maintain data products and the data platform
Designing and implementing data processing environments and integrations using AWS services such as Glue, S3, Lambda, Fargate, EMR, SageMaker, Redshift and Aurora, alongside Snowflake
Data architecture and data modelling across the full data lifecycle, as well as the more detailed modelling requirements of databases and data products
Building data processing and analytics pipelines as code, using Python, SQL, PySpark, Spark, CloudFormation, Lambda, Step Functions and Apache Airflow (see the sketch after this list)
Designing and applying security and access control architectures to secure sensitive data
Enabling business units by working with them to deliver complete and manageable solutions, while providing support and expert advice
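As a sketch of what pipelines-as-code can look like on this stack, here is a minimal Airflow (MWAA) DAG chaining two Glue jobs; the DAG id, schedule and Glue job names are hypothetical placeholders, and it assumes Airflow 2.4+ with the Amazon provider package installed:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.glue import GlueJobOperator

with DAG(
    dag_id="orders_medallion_pipeline",  # illustrative name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    # Each medallion layer is an idempotent, independently retryable task.
    bronze_to_silver = GlueJobOperator(
        task_id="bronze_to_silver",
        job_name="orders-bronze-to-silver",  # hypothetical Glue job
    )
    silver_to_gold = GlueJobOperator(
        task_id="silver_to_gold",
        job_name="orders-silver-to-gold",  # hypothetical Glue job
    )

    bronze_to_silver >> silver_to_gold
```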
You will have:

4 years of experience in a Data Engineering role
Strong experience and knowledge of data architectures implemented in AWS with native services such as S3, DataZone, Glue, EMR, SageMaker, Aurora and Redshift, using Python, PySpark and SQL
Experience developing and administering databases, data platforms and solutions
Good coding discipline in terms of style, structure, versioning, documentation and unit tests
A well-developed understanding of a data mesh organisation, as well as Master and Reference Data Management
Experience migrating legacy systems to modern platforms, as well as between modern platforms
Knowledge and experience of relational databases such as Postgres and Redshift
Experience using Git for code versioning and lifecycle management
Experience operating to Agile principles and ceremonies
Hands-on experience with CI/CD tools such as GitLab
Strong problem-solving skills and ability to work independently or in a team environment.
Excellent communication and collaboration skills.
A keen eye for detail, and a passion for accuracy and correctness in numbers
Whilst not essential, the following skills would also be useful:

Experience using Jira or other Agile project management and issue-tracking software
Experience with Spatial Data Processing, technology and approaches.
Experience with Machine Learning and AI workflows
Experience with IaC such as CloudFormation or Terraform