Lead 1, Site Reliability Engineer

Description:

We are a tight-knit team responsible for ensuring that our many production systems deliver quality products in accordance with contractual timelines.

What’s In It For You

You will be part of the SRE Team for MI Data Delivery Engineering and will be assisting multiple development teams
Fulfill several application teams’ requirements by deploying end-to-end CI/CD pipelines on cloud infrastructure.
Migrate legacy applications from on-premises to AWS cloud infrastructure.
Identify and automate BAU toil.
Improve reliability, scalability, quality, and observability for multiple data delivery applications
Ensure service level objectives through proactive monitoring, custom tooling, and on-call support/troubleshooting for all data delivery systems/products.
Our tech stack includes: Amazon Web Services, DevOps (Kubernetes, Rundeck, Datadog, Docker), Autosys, Linux, Windows, JavaScript, and Python

Responsibilities

Oversee the transfer of data from traditional to cloud-based services
Understand business proposals and transform them into technical solutions.
Participate in the design of information and operational support systems
Create production and migration schedules for large projects with timelines/milestones
Develop and leverage AWS tools and services to manage and automate key operations capabilities. This includes AWS Systems Manager, Patch Manager, CloudFormation, and custom scripting to extend the AWS services.
Proactively ensure the highest levels of systems and infrastructure availability
Monitor and test application performance for potential bottlenecks, identify possible solutions and work with developers to implement those fixes.
Write and maintain custom scripts to increase system efficiency and reduce human intervention time on tasks.
Maintain security, backup, and redundancy strategies
Provide 3rd-level support for AWS infrastructure.
Improve alerting and monitoring quality, reduce alarm noise, and close observability gaps
Optimize cloud costs and carry out capacity planning
Reduce operational exposure, speed up incident recovery, and implement resiliency and remediation plans
Identify and correct problems stemming from audit and compliance.
Liaise with vendors and other IT personnel for problem resolution

Organization	S&P Global
Industry	IT / Telecom / Software Jobs
Occupational Category	Site Reliability Engineer
Job Location	Islamabad, Pakistan
Shift Type	Morning
Job Type	Full Time
Gender	No Preference
Career Level	Intermediate
Experience	2 Years
Posted at	2023-06-20 1:32 pm
Expires on	2024-12-28