Description:
We are a tight-knit team responsible for ensuring that our many production systems deliver quality products in accordance with contractual timelines.
What’s In It For You
- You will be part of the SRE Team for MI Data Delivery Engineering and will be assisting multiple development teams
- Fulfill the requirements of several application teams by setting up end-to-end CI/CD pipelines on cloud infrastructure.
- Migrate legacy applications from on-premises to AWS cloud infrastructure.
- Identify and automate BAU (business-as-usual) toil
- Improve reliability, scalability, quality, and observability for multiple data delivery applications
- Ensure service level objectives are met through proactive monitoring, custom tooling, and on-call support/troubleshooting for all data delivery systems/products.
- Our tech stack includes: Amazon Web Services, DevOps (Kubernetes, Rundeck, Datadog, Docker), Autosys, Linux, Windows, JavaScript, and Python
Responsibilities
- Oversee the transfer of data from traditional to cloud-based services
- Understand business proposals and translate them into technical solutions.
- Participate in the design of information and operational support systems
- Create production and migration schedules for large projects, with timelines and milestones
- Develop and leverage AWS tools and services to manage and automate key operational capabilities. This includes AWS Systems Manager, Patch Manager, CloudFormation, and custom scripting to extend the AWS services.
- Proactively ensure the highest levels of systems and infrastructure availability
- Monitor and test application performance for potential bottlenecks, identify possible solutions, and work with developers to implement the fixes.
- Write and maintain custom scripts to increase system efficiency and reduce human intervention time on tasks.
- Maintain security, backup, and redundancy strategies
- Provide 3rd-level support for AWS infrastructure.
- Improve alerting and monitoring quality, reduce alarm noise, and close observability gaps
- Optimize cloud costs and perform capacity planning
- Reduce operational exposure, speed up incident recovery, and implement resiliency and remediation plans
- Identify and correct problems raised by audits and compliance reviews.
- Liaise with vendors and other IT personnel for problem resolution