Job ref no.: CT3127248-01#9129

Cloud Engineer (Site Reliability)

IT Search (Recruitment Firm)

Position: Cloud Engineer (Site Reliability)

Client: A global corporation in the telecommunications industry

Duties:

The role will be responsible for the reliability and performance of production applications and services and for implementing tooling and automation into operations functions to improve service availability, scalability, and performance.

The role works with teams across Digital IT, helping them build robust, scalable services, improve quality, and accelerate delivery. Key responsibilities include:

  • Ensuring applications and services scale and perform consistently and reliably.
  • Maintaining production services by measuring and monitoring availability, latency and overall system health.
  • Responding to incidents and providing 24x7 standby support for those services in the SRE portfolio.
  • Working closely with software development teams across Digital IT to analyze system and application metrics and devise strategies to enhance reliability, scalability and performance.
  • Automating operational functions to increase scalability and speed of operational processes.

This may include the following:

  • Monitoring: Azure Monitor, AWS CloudWatch, PRTG, ScienceLogic, and Datadog.
  • Fine-tuning: Continuously review resource usage, monitoring thresholds, and performance tuning. Balance site performance and reliability against well-defined service level objectives.
  • Automation: Create sustainable systems and services through automation and uplift. Automate routine jobs to increase the scalability and speed of operational processes, reduce operational costs, and avoid human error.

Expected Skills:

The successful candidate will have experience developing microservices in multiple languages and on multiple cloud platforms. They will be familiar with API development and lifecycle management, test-driven development and automated testing, continuous integration and delivery, packaging, containerization, and infrastructure automation.

The candidate must be highly analytical and detail-oriented, with a demonstrated ability to work with complex logic and other people’s code. They must demonstrate an automation mindset and advanced troubleshooting skills. The candidate will be familiar with containers, with sidecar and proxy service edges, and with cloud-based observability, monitoring, and logging platforms.

The successful candidate will be familiar with a number of the following tools: Git, GitLab, Jenkins, Bamboo, Nexus, Selenium, Ansible, Capistrano, Consul, Terraform, Pact, SoapUI, JUnit, JIRA, Rally, Docker, Kubernetes, Istio, Cucumber, Honeycomb, Envoy, Kibana, Logstash, Splunk, New Relic, and others.

The candidate will need to develop close working relationships with developers and operations support staff across the organization, and mentor them in how to develop and maintain robust cloud-native services.

Candidate Requirements:

  • Degree
  • Azure and AWS certifications in system administration (e.g. AZ-103 or AZ-104 Microsoft Azure Administrator, AWS Associate or above)
  • 3+ years of design and implementation experience on Azure and AWS
  • 3+ years of experience as a software developer, having written code for auto-scaling, geographically distributed microservices
  • Solid programming and scripting skills relating to automation in multiple languages.
  • 5+ years of experience as a system administrator, preferably on Linux (cloud or on-premises)
  • 5+ years of experience supporting business-critical services in production
  • Experience as part of a cloud-scale operational support team
  • Able to set up proactive processes and approaches for spotting problems, areas for improvement, and performance bottlenecks
  • Good understanding of modern development approaches and architectures such as DevOps, CI/CD, APIs, microservices, Docker, containers, and Kubernetes
  • Familiar with a number of the following tools: Git, GitLab, Jenkins, Bamboo, Nexus, Selenium, Ansible, Capistrano, Consul, Terraform, Pact, SoapUI, JUnit, JIRA, Rally, Docker, Kubernetes, Istio, Cucumber, Honeycomb, Envoy, Kibana, Logstash, Splunk, New Relic, and others
  • Standby support outside office hours is required

The data collected will be used for recruitment purposes only. For job enquiries, please email by clicking Apply Now.

All applications submitted through our system will be delivered directly to the advertiser, and the privacy of applicants' personal data will be protected.

More job information
Salary
  • 35,000 - 55,000 / month
Employment Term
  • Permanent
  • Full-time
Experience
  • 0 - 5 years
Career Level
  • Non-management level
Education
  • N/A