Senior Lead Site Reliability Engineer

Mumbai, Maharashtra, India | Full time | Technology

Guide and shape the future of technology at a globally recognized firm, driven by pride in ownership.

As a Senior Manager of Site Reliability Engineering at JPMorgan Chase within the Finance Technology team, which is aligned to the Corporate Technology division, you are the non-functional requirement owner and champion for the applications in your remit. You are a key influencer in your team’s strategic planning, driving continual improvement in customer experience, resiliency, security, scalability, monitoring, instrumentation, and automation of the software in your area. You act in a blameless, data-driven manner and navigate difficult situations with composure and tact.

Job responsibilities

• Demonstrates expertise in site reliability principles and an understanding of the fine balance between features, efficiency, and stability

• Effectively negotiates with peers and executive partners to ensure optimal outcomes for all

• Drives the adoption of site reliability practices throughout the organization

• Ensures your teams follow site reliability best practices and can show this empirically through stability and reliability metrics

• Drives a culture of continual improvement and solicits real-time feedback to improve the customer’s experience

• Ensures your team collaborates with other teams within your group’s specialization and avoids duplication of work where possible

• Follows blameless, data-driven, post-mortem strategies and conducts regular team debriefs to enable learning from both successes and mistakes

• Provides personalized coaching for entry to mid-level team members

• Ensures your team documents and shares their knowledge and innovations via internal forums, communities of practice, guilds, and conferences

• Focuses on reducing manual toil for the team

• Works with the larger organization to identify and prioritize fixes that improve stability and SLOs

Required qualifications, capabilities and skills

• Formal training or certification in software engineering concepts and 5+ years of applied experience

• Advanced proficiency in site reliability culture and principles, with the ability to demonstrate how to implement site reliability across application and platform teams while avoiding common pitfalls

• Experience leading technologists to manage and solve complex technological issues at a firmwide level

• Ability to influence the team’s culture by championing innovation and change for success

• Experience hiring, developing, and recognizing talent

• Proficiency in at least one programming language (e.g., Python, Java Spring Boot, .NET)

• Proficiency in software applications and technical processes within a technical discipline (e.g., cloud, artificial intelligence, machine learning, mobile)

• Proficiency in continuous integration and continuous delivery tools (e.g., Jenkins, GitLab, Terraform)

• Experience building observability platforms using tools such as OTEL, Dynatrace, CloudWatch, Datadog, Grafana, and Prometheus

• Experience with containers and container orchestration (e.g., ECS, Kubernetes, Docker)

Preferred qualifications, capabilities and skills

• Experience working with Big Data and cloud-based applications; AWS (including S3) is preferred

• Experience in the financial domain

Skills

• Site Reliability Engineering

• Leadership

• Influencing

• Continuous Improvement

• Coaching

• Programming (Python, Java Spring Boot, .NET)

• Continuous Integration and Continuous Delivery (CI/CD)

• Observability Platforms (OTEL, Dynatrace, CloudWatch, Datadog, Grafana, Prometheus)

• Container Orchestration (ECS, Kubernetes, Docker)

• Negotiation