Posted on 2021-10-08

System Reliability Engineer, Associate

Morgan Stanley

About Us:

Morgan Stanley is a global financial services firm helping governments, corporations, institutions and individuals around the world to achieve their financial goals. Morgan Stanley is committed to maintaining the first-class service that has defined the firm. At its foundation are five core values (putting clients first, doing the right thing, leading with exceptional ideas, committing to diversity and inclusion, and giving back) that guide its 55,000+ employees in over 1,200 offices across 43 countries.
On a daily basis, we process hundreds of millions of transactions and we serve more than a trillion dollars of assets across every global market. If this scale resonates with you, come join us.

Systems Reliability Engineering (SRE) is a discipline focused on improving system service availability, observability, scalability, performance, and resilience across Morgan Stanley by applying sound software engineering principles and adopting the latest technology and tooling.
We are growing SRE capabilities within our Reliability & Production Engineering (RPE) organization as part of the transformation of Morgan Stanley's Technology.

We would like to talk to you if you:
  • Are interested in distributed systems and working with highly scalable and reliable services.
  • Like to work in a fast-moving environment and you aren't afraid to change things to make them better.
  • Enjoy new technological challenges and solving hard problems.
  • Believe a team working well together is smarter than the single smartest person on that team.
  • Aspire to grow as a person, as a teammate, and as an engineer.
  • Have grit, drive and a deep sense of ownership.
Your responsibilities will include, but not be limited to:
  • Working closely with engineering/development teams to design, build, and maintain systems.
  • Troubleshooting issues across the entire technology stack: hardware, software, application, and network.
  • Identifying and driving opportunities to improve automation for our platforms; scoping and creating automation for deployment, management, and visibility of our services.
  • Proactively identifying and addressing systems reliability risks.
  • Working alongside existing global and regional team members on a follow-the-sun basis.
  • Representing the RPE organization in design reviews and operational readiness exercises for new and existing services.


Successful candidates have often had some or all of the following:
  • Demonstrated ability to troubleshoot problems and debug to identify root cause.
  • Knowledge of Linux/Unix (RHCE or equivalent desired), network protocols, and storage infrastructure.
  • Hands-on experience with enterprise tools such as AppDynamics, Grafana, Splunk, and Dynatrace.
  • Experience with Ansible, GitHub, or other automation, configuration, or release management tools.
  • Automation experience using scripting languages such as Python, Bash, Perl, or Ruby is particularly valued. Experience with one higher-level language is desired.
  • Awareness of, and ability to reason about, modern software and systems architectures, including load balancing, databases, queueing, caching, distributed systems failure modes, microservices, cloud, etc.
  • Practical experience running large-scale systems is an advantage.

Interested parties, please click Apply Now to apply for this job.


More job information
Job Function
Employment Term
  • Permanent
  • Full-time
Career Level
  • Non-management level
  • Degree