The Site Reliability Engineer (SRE) position requires a mix of strategic engineering and design along with hands-on, technical work.
A successful candidate will have a background as a Systems Administrator who has since moved into DevOps/automation.
The SRE will configure, tune, and troubleshoot multi-tiered systems to achieve optimal application performance, stability, and availability.
The SRE will work closely with the systems engineers, network engineers, database administrators, monitoring team, and information security team. In this role, strict application security requirements must be balanced against high-availability requirements to achieve optimal solutions.
- Support the development team in managing production environments, releases, and pipeline execution
- Identify requirements for new features and propose designs and solutions
- Implement features in a suitable programming language
- Take ownership of delivering features and improvements on time
- Ability to wear multiple hats, with a do-what-it-takes attitude
- Excellent analytical and problem-solving skills
- Excellent oral and written communication skills.
- 5+ years of managing services in a large-scale *nix environment.
- Strong hands-on knowledge in Unix/Linux environments.
- A systematic, test-and-measure approach to continually improving service operations.
- Understanding of standard networking protocols and components such as HTTP, DNS, TCP/IP, ICMP, and load balancing.
- Hands-on experience with Git, automation, and continuous integration.
- Experience with configuration management / infrastructure as code.
- Practical knowledge of shell scripting and at least one programming language.
Extra Merit Qualifications
- Production experience with Docker and Kubernetes, Mesos, or another orchestrator.
- Knowledge of pipeline/workflow technologies.
- Working knowledge of Kafka and Splunk.
- Good understanding of the Java Virtual Machine.
- Java/Scala programming skills
Duration: 6+ months
Minimum 5 years of professional IT experience.