The Role: Site Reliability/DevOps Engineer
Salary: Up to €110,000 per year
Duration: 18-month ANÜ/AÜG contract
Remote: 90%+ (Monthly meeting in Berlin)
Location: Berlin
Language Skills: German & English C1
About the Role
Join a highly skilled team of 10 SREs and DevOps professionals who ensure operational excellence across multiple projects. The organization plays a crucial role in delivering secure and innovative solutions for government and public services. You will design, set up, and maintain robust, secure, and scalable systems that meet stringent quality and data protection standards.
Your Responsibilities
* Design, implement, and configure logging and monitoring infrastructures.
* Define Service Level Objectives (SLOs) based on customer Service Level Agreements (SLAs) and calculate error budgets.
* Collaborate with software and DevOps engineers to develop modern web application architectures.
* Standardize and automate build, test, and deployment processes with a focus on quality and data protection.
* Set up and manage continuous integration environments.
* Administer and synchronize container registries.
* Set up and manage virtual network infrastructures.
* Introduce and evolve SRE practices across the organization.
* Analyze and resolve system malfunctions through thorough research and documentation.
Your Profile
* Completed degree in IT, natural sciences, mathematics, or engineering, or a comparable qualification.
* Experience in cloud environments (e.g., AWS, MS Azure) and operating distributed network infrastructures.
* Proficiency with monitoring, debugging, and performance measurement tools.
* Experience with containers and orchestration tools.
* Strong skills in regular expressions, scripting, and programming languages (e.g., Bash, Python, Ruby, Go).
* Proficiency in distributed version control systems (e.g., Git, Mercurial).
* Familiarity with SRE methods and culture.
* Systematic, goal-oriented, and independent working style.
* Strong social competence, teamwork, and communication skills.
* Willingness to be on call and work shifts.
Tech Stack
* Kubernetes, Docker
* Terraform, Ansible
* Monitoring: Grafana, Prometheus
This role offers a unique opportunity to work within a team dedicated to maintaining high standards in operational performance and reliability, contributing to critical infrastructure projects that impact millions of users.