Description:
Emumba is looking for a talented and passionate AWS GenAI Data Engineer (Specialist) to join our team and play a vital role in building the future of generative AI solutions for our clients on AWS. In this role, you will bridge the gap between data engineering and GenAI, ensuring our clients have a robust data foundation to power innovative applications built on the AWS GenAI service stack.
Key Responsibilities
- Collaborate with client stakeholders to understand their business goals and translate them into a data strategy for GenAI applications on AWS.
- Design and implement scalable data infrastructure for data collection, storage, and management using AWS services such as Amazon S3, RDS, and DynamoDB.
- Develop and execute data pipelines using AWS Glue or other data integration services to extract, transform, and load (ETL) data for GenAI models.
- Clean, pre-process, and format client data, including unstructured data, to meet the specific requirements of GenAI applications built with Bedrock or Amazon Q. This may involve techniques like text cleaning, feature engineering, and data vectorization.
- Apply hands-on experience with vector databases and graph databases to handle the complex data structures used in GenAI models.
- Ensure data quality and integrity through data validation and monitoring techniques.
- Train and fine-tune generative AI models using AWS services.
- Stay up to date on the latest advancements in GenAI technologies, AWS services for data engineering and GenAI (SageMaker, Bedrock, Amazon Q), and best practices.
- Document data pipelines and processes for maintainability and future reference.
- Work collaboratively with the GenAI team to ensure seamless integration of data pipelines with AI models.
- Identify and troubleshoot data-related issues that arise during the GenAI development process.
Requirements
Skills and Qualifications
- Strong understanding of AWS Architecture, AWS Well-Architected Framework, design guidelines, and best practices.
- Ability to work independently and as part of a team.
- Passion for AWS and cloud computing.
- Excellent communication and interpersonal skills.
Education and Experience
- Bachelor's degree in Computer Science or a related field.
- 4+ years of experience working in software development.
- Strong understanding of data architectures, data warehousing, and data pipelines.
- Proficient in SQL and scripting languages like Python.
- Experience with data cleaning, transformation, and manipulation techniques.
- Experience with AWS data services is highly desirable.
- Working knowledge of generative AI concepts and applications is highly desirable.
- Previous experience working with RAG applications and/or fine-tuning LLMs is a plus.