Lead Site Reliability Engineer
Company: JPMorgan Chase & Co.
Location: Jersey City
Posted on: April 1, 2026
Job Description:
Assume a critical role in defining the future of a
globally recognized firm and have a direct and significant effect
in a realm tailored for top achievers in site reliability. As a
Lead Site Reliability Engineer at JPMorgan Chase within Employee
Platforms, you hold a leadership role in your team, demonstrate
strong knowledge across multiple technical domains, and advise
others on the technical and business issues facing them. Take the
lead in conducting resiliency design reviews, break up complex
problems into digestible work for other engineers, act as a
technical lead for medium to large-sized products, and provide
advice and mentoring to other engineers.

Job responsibilities:
- Demonstrates and
champions site reliability culture and practices and exerts
technical influence throughout your team
- Leads initiatives to improve the reliability and stability of
your team’s applications and platforms, using data-driven
analytics to improve service levels
- Collaborates with team members to identify comprehensive service
level indicators, and with stakeholders to establish reasonable
service level objectives and error budgets with customers
- Demonstrates a high level of technical expertise within one or
more technical domains and proactively identifies and solves
technology-related bottlenecks in your areas of expertise
- Acts as the main point of contact during major incidents for
your application and demonstrates the skills to identify and solve
issues quickly to avoid financial losses
- Documents and shares knowledge within your organization via
internal forums and communities of practice

Required qualifications, capabilities, and skills:
- Formal training
or certification on site reliability engineering concepts and 5
years of applied experience
- Deep proficiency in reliability, scalability, performance,
security, enterprise system architecture, toil reduction, and
other site reliability best practices, with the ability to
implement these practices within an application or platform
- Fluency in at least one programming language (e.g., Python,
Java Spring Boot, .NET)
- Deep knowledge of software applications and technical processes
with emerging depth in one or more technical disciplines
- Proficiency and experience in observability, such as white and
black box monitoring, SLO alerting, and telemetry collection,
using tools such as Grafana, Dynatrace, Prometheus, Datadog, and
Splunk
- Proficiency in continuous integration and continuous delivery
tools (e.g., Jenkins, GitLab, Terraform)
- Experience with containers and container orchestration (e.g.,
ECS, Kubernetes, Docker)
- Experience with troubleshooting common networking technologies
and issues
- Ability to identify and solve problems related to complex data
structures and algorithms
- Drive to self-educate and evaluate new technology

Preferred qualifications, capabilities, and skills:
- Experienced in
optimizing Microsoft 365 infrastructure, including SharePoint
Online, Exchange Online, OneDrive, and Teams, with expertise in
using Splunk and Azure to facilitate the migration from monolithic
to distributed services
- Experience with cloud infrastructure management
- Demonstrated achievements with automation of operational
excellence, especially proactive monitoring
- Ability to teach new programming languages to team members
- Ability to expand and collaborate across different levels and
stakeholder groups