Senior DevOps Engineer

Job Summary

The Senior DevOps Engineer serves as a critical execution layer for Mercans’ AI-native infrastructure strategy and for broader product and engineering delivery and automation. The role is based in Tartu, Estonia, or remote, and reports directly to the CTO. Working with Product Managers, Engineering Managers, software engineers, data scientists, and SRE teams, the engineer operationalizes the technical vision through hands-on automation, GitLab DevSecOps pipelines, and resilient platform operations.

The position focuses on building and maintaining a secure, cost-efficient private cloud environment capable of hyperscale payroll processing and proprietary AI model training/inference, while providing deployment automation, feature flagging, and release orchestration to accelerate Product and Engineering team velocity.

Duties and Responsibilities:

Platform Automation & CI/CD

  • Support Product and Engineering Teams by implementing and maintaining GitLab CI/CD pipelines that enable rapid feature delivery, A/B testing, feature flagging, and blue-green deployments across all product lines, enforcing architectural standards, security controls, and AI-first engineering patterns defined by the Enterprise Architecture Board.
  • Provide deployment automation for Product releases with GitLab CI/CD pipelines including shift-left security (SAST, DAST, dependency scanning, container scanning, IaC scanning, secret detection) integrated into merge requests, environment promotion gates, and production deployment approvals.
  • Enable Engineering velocity through self-service deployment templates, environment provisioning APIs, and GitLab pipeline libraries that reduce cognitive load for application teams building payroll/HR features.
  • Automate Product experimentation with GitLab Feature Flags, progressive delivery patterns, and canary releases to enable Product Teams to test hypotheses with minimal deployment risk.

Private Cloud Operations

  • Automate infrastructure provisioning for the private cloud (Kubernetes, HCI, GPU nodes, storage) using Infrastructure as Code in line with the AI Cloud reference architecture, scanning Terraform/Kubernetes manifests for misconfigurations via GitLab.
  • Operate and optimize GPU-enabled Kubernetes clusters, including bin-packing, autoscaling, and fractional GPU scheduling to support AI training and inference workloads efficiently, with GitLab runtime security policies and container image scanning for CVEs.

Observability & Resilience

  • Implement observability (logging, metrics, tracing) and SRE practices to contribute toward the 99.999% availability target and active-active multi-datacenter strategy for core payroll and AI services, leveraging GitLab security dashboards for vulnerability tracking and remediation.
  • Identify operational issues, implement fixes and performance improvements, and contribute to chaos engineering and resilience drills to build an anti-fragile engineering culture, with GitLab conditional pipelines for secure testing and deployment.

Security & Compliance

  • Protect systems against cybersecurity threats by embedding GitLab security policies into pipelines, managing secrets and running secret detection scans, enforcing role-based access control (RBAC), and achieving policy compliance through merge request (MR) approvals and dashboards.
  • Work closely with Product Managers, software engineers, data scientists, and MLOps teams to standardize release processes for AI models and product features, reduce lead time to production, and integrate with model registries, compliance checks, and feature management platforms using GitLab’s end-to-end DevSecOps workflows.

Documentation & Knowledge Transfer

  • Produce high-quality documentation for runbooks, deployment procedures, GitLab pipeline templates, and platform standards, and contribute to internal Centers of Excellence for SRE and AI Engineering, including GitLab security best practices training.

Skills and experience

  • 4–6+ years of experience as a DevOps / SRE / Platform Engineer operating production‑grade Kubernetes‑based systems and CI/CD pipelines.
  • Hands‑on experience with private cloud or on‑prem Kubernetes (e.g., CAPI‑based clusters, HCI) and automation tools (Terraform/Ansible or equivalents).
  • Experience running containerized workloads with GPUs, including familiarity with scheduling, resource quotas, and performance tuning for AI/ML workloads.
  • Strong automation skills and programming ability in at least one language (e.g., Python, Go, or similar) for scripting, integrations, and tooling.
  • Good understanding of observability stacks, incident management, and SRE practices (SLIs/SLOs, error budgets, postmortems).
  • Knowledge of secure software delivery practices, secrets management, and compliance‑aware deployment in regulated or data‑sensitive environments.
  • Proficiency with GitLab DevSecOps: configuring .gitlab-ci.yml templates for SAST/DAST/dependency/container/IaC scanning, and working with security dashboards, RBAC, policy enforcement, feature flags, and progressive delivery in CI/CD pipelines.
  • Experience enabling Product/Engineering teams with self-service deployment platforms, GitOps workflows, and golden deployment paths that balance velocity and safety.
  • Experience with Agile teams and collaborative ways of working across Product, development, architecture, and data/AI functions.
  • Strong documentation, time‑management, and communication skills in English, with readiness to take initiative and shape DevOps practices from the ground up in alignment with architectural guidelines.

Performance Goals:

CI/CD reliability and speed

  • Specific: Design and standardize GitLab CI/CD pipelines for core payroll, AI services, and Product feature releases, including automated security testing (SAST, DAST, dependency and container scanning) and deployment approvals.
  • Measurable: Achieve a pipeline success rate of at least 98% and reduce median lead time from commit to production to under 2 hours for target services and product features.
  • Achievable: Leverage GitLab security templates and collaborate with Product, development, and QA teams to streamline stages and remove manual bottlenecks.
  • Relevant: Directly supports AI‑first engineering standards and Product velocity for faster time‑to‑market for new features and AI models.
  • Time‑bound: Target achieved by end of Q3 2026.

Infrastructure cost and utilization optimization

  • Specific: Implement bin‑packing strategies, right‑size workloads, and refine Kubernetes scheduling for CPU, memory, and GPU resources in the private cloud.
  • Measurable: Contribute to a 15% reduction in infrastructure cost per payslip and increase average GPU and node utilization to at least 70% on production clusters.
  • Achievable: Use monitoring data and autoscaling capabilities; coordinate with architecture on capacity planning and hardware lifecycle.
  • Relevant: Supports broader COGS reduction and maximizes ROI on AI hardware investments.
  • Time‑bound: Target achieved by end of Q4 2026.

Platform resilience and incident reduction

  • Specific: Implement SRE practices, incident runbooks, and active‑active‑aware deployment patterns for critical payroll, AI, and Product services.
  • Measurable: Help reach 99.99%+ availability for owned services on the path to five nines for the core engine, and reduce high‑severity incidents (Sev‑1 and Sev‑2) by 30% year‑over‑year.
  • Achievable: Introduce improved alerting, standardized playbooks, and participate in chaos drills and postmortems to address systemic issues.
  • Relevant: Aligned with the strategic goal of enterprise‑grade resilience for Tier‑1 clients.
  • Time‑bound: Measured over the 12‑month period following the hire date.

MLOps and Product deployment velocity

  • Specific: Integrate GitLab CI/CD pipelines with the model registry, compliance checks, and Product feature management, enabling automated deployment of AI models and product releases to the private cloud.
  • Measurable: Reduce the lead time for deploying updated AI models and Product features from weeks to less than 24 hours for prioritized use cases, with zero unapproved deployments.
  • Achievable: Build pipeline templates for AI workloads and Product releases and collaborate with data science, Product, and GRC teams.
  • Relevant: Supports the organization’s target of advanced MLOps maturity and Product velocity with safe AI and feature adoption at scale.
  • Time‑bound: Initial target achieved by Q4 2026, with continuous improvement thereafter.

Operational excellence and knowledge sharing

  • Specific: Create and maintain platform documentation, runbooks, and internal knowledge sessions focused on private cloud, GitLab DevSecOps, CI/CD, Product deployment patterns, and AI infrastructure operations.
  • Measurable: Publish at least 10 high‑quality runbooks or platform guides and lead a minimum of 6 internal technical sessions or deep‑dives per year.
  • Achievable: Integrate documentation and knowledge‑sharing into incident resolution, new feature rollout, and architectural change activities.
  • Relevant: Strengthens internal Centers of Excellence and supports talent density and mentorship objectives.
  • Time‑bound: Targets measured on an annual basis, with the first cycle ending Q2 2026.

Apply now

    By submitting this form, you agree to Mercans – General Privacy Policy and GDPR.

    If you prefer to apply directly, please email your resume to [email protected] and specify the job title in the subject line as "Vacancy: Job Title you want to apply for".

    Disclaimer

    Mercans collects and processes personal data in accordance with applicable data protection laws. If you are a European job applicant, see the privacy notice for further details. Mercans does not discriminate on the basis of race, sex, color, religion, age, national origin, marital status, disability, veteran status, genetic information, sexual orientation, gender identity, or any other reason prohibited by law in the provision of employment opportunities and benefits.