Intern - Site Reliability Engineer

Workato

Software Engineering
Singapore
Posted on Saturday, September 9, 2023

About Workato

Workato is the only integration and automation platform that is as simple as it is powerful — and because it’s built to power the largest enterprises, it is quite powerful.

Simultaneously, it’s a low-code/no-code platform. This empowers any user (dev/non-dev) to painlessly automate workflows across any apps and databases.

We’re proud to be named a leader by both Forrester and Gartner and trusted by 7,000+ of the world's top brands such as Box, Grab, Slack, and more. But what is most exciting is that this is only the beginning.

Why join us?

Ultimately, Workato believes in fostering a flexible, trust-oriented culture that empowers everyone to take full ownership of their roles. We are driven by innovation and looking for team players who want to actively build our company.

But, we also believe in balancing productivity with self-care. That’s why we offer all of our employees a vibrant and dynamic work environment along with a multitude of benefits they can enjoy inside and outside of their work lives.

If this sounds right up your alley, please submit an application. We look forward to getting to know you!

Also, feel free to check out why:

  • Business Insider named us an “enterprise startup to bet your career on”

  • Forbes’ Cloud 100 recognized us as one of the top 100 private cloud companies in the world

  • Deloitte Tech Fast 500 ranked us as the 17th fastest growing tech company in the Bay Area, and 96th in North America

  • Quartz ranked us the #1 best company for remote workers

Responsibilities

If you’re looking for a real challenge in terms of mission criticality, multi-geographic region deployments, diversity of managed services, and the chance to work with cutting-edge technologies like Kubernetes, Kafka, Serverless, ArgoCD and more, then this might be the position for you! We are looking for an Intern - Site Reliability Engineer (6-month internship).

In this role, you will be responsible for:

  • Monitoring end-to-end availability and performance of critical services

  • Independently troubleshooting complex issues and events affecting the entire platform and application stack, as well as the infrastructure used to manage and deliver the product

  • Supporting production platform and application-related events and incidents

  • Performing root cause analysis on issues and participating in blameless post-mortems so we can learn from incidents and automate away their recurrence

  • Working with engineering teams to better address their needs and enable more effective and efficient developer throughput

  • Identifying performance bottlenecks and triaging them with engineering teams to design and implement secure and performant solutions

Daily and Monthly Responsibilities

  • On-call activity, including validation and investigation, and processing of development requests. Gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault-finding

  • Partner with development teams to improve services through rigorous testing and release procedures

  • Create automation to reduce manual activities and tasks

Expectations

We are hiring an engineer who will help us with day-to-day product-related operations such as incident management, improving application-related monitoring and alerting, responding promptly to alerts as part of the on-call rotation, and driving incidents. This role requires day-to-day work with dev and infrastructure teams in order to make the product robust and rock solid by continuously improving quality through visibility and proper incident management.

Requirements

Qualifications / Experience / Technical Skills

  • Bachelor's Degree in Computer Science, Information Systems, or a related field. (We are open to 3rd- and 4th-year students as well.)

  • Knowledge of administering Kubernetes-based microservices, ingress controllers, web servers (nginx), and databases (Postgres, MySQL, MongoDB; desirable: Redis, ClickHouse)

  • Basic programming skills in one or more high-level languages, such as Python, Java, C/C++, Ruby, or JavaScript

  • Experience with AWS technologies such as EKS, ELB, RDS, S3, VPC

  • Strong troubleshooting experience in the realm of networking fundamentals, web applications, and DNS

  • Hands-on experience creating automation using scripting languages and/or code

  • Knowledge of working with modern CI/CD tools such as ArgoCD, GitHub Actions, or similar solutions

  • Experience with Infrastructure as Code tools (e.g. Terraform, CloudFormation)

  • Experience with observability, logging, and tracing tools like Grafana, Prometheus, Loki, Elasticsearch, CloudWatch, and Jaeger

  • Knowledge about operating high-traffic production environments in public clouds: AWS, GCP, or Azure

  • Programming experience in production environments sufficient to create automation and/or simple services

  • Experience with modern cloud environments: containerization, infrastructure-as-code, DevOps, CI/CD pipelines and general automation

  • Hands-on experience with network security, database systems, and related tools

  • Experience operating Kubernetes clusters and a good understanding of the basics of Kubernetes' moving parts

  • Experience performing stress-testing, failure analysis, and load-testing apps