Description:
We are looking for someone who thrives on autonomy and has experience driving long-term projects to completion. You are detail- and quality-oriented, and excited about the prospect of making a big impact with data. Our tech stack includes Airflow, EMR, PySpark, and various AWS services.
Responsibilities:
- Must have hands-on experience with AWS services (EMR, S3, Redshift, Lambda, EC2), PySpark, Apache Airflow, and Python.
- Develops and maintains scalable data pipelines and builds out new API integrations to support continuing increases in data volume and complexity.
- Collaborates with analytics and business teams to improve data models that feed business intelligence tools, increasing data accessibility and fostering data-driven decision making across the organization.
- Implements processes and systems to monitor data quality, ensuring production data is always accurate and available for key stakeholders and business processes that depend on it.
- Writes unit and integration tests, contributes to the engineering wiki, and documents work.
- Performs the data analysis required to troubleshoot and resolve data-related issues.
- Works closely with a team of frontend and backend engineers, product managers, and analysts.
- Designs data integrations and a data quality framework.
- Designs and evaluates open-source and vendor tools for data lineage.
- Works closely with business units and engineering teams to develop a strategy for the long-term data platform architecture.