
DevOps Support Engineer

Platform support at CELUM, homelab tinkerer, previously shipped an AI assistant at Steelcase.

Experience

CELUM (Contractor via Weasweb) – DevOps Support Engineer

dec 2025 – present · remote

L3 platform operations for a Digital Asset Management platform across Azure/AKS and on-prem. Root-cause isolation using Grafana + centralized logs, structured bug/feature escalation, and operational lifecycle management.

  • Troubleshoot service health, configuration, and connectivity across AKS/Kubernetes, Docker, Linux, and Azure SQL.
  • Perform root-cause analysis with Grafana dashboards, centralized logging, and API validation (Swagger/Postman) to inform escalation decisions.
  • Create structured engineering items: bugs with repro steps, feature requests with acceptance criteria, and support requests with evidence + impact assessment.
  • Handle operational requests: access provisioning, environment lifecycle ops, and data/integration triage with SQL + API validation.

Steelcase – Service Desk Analyst L2

feb 2024 – dec 2025 · ~2 yrs · cluj-napoca (hybrid)

Shipped an AI chatbot with zero dedicated allocation, led security incident response, and drove process automation across the NA support team.

  • Proposed and delivered AI chatbot MVP (RAG-based ServiceNow assistant in Teams): designed dual-agent architecture, ran structured testing rounds, analyzed failure modes (traced to KB gaps, not retrieval logic), and produced rollout plan with automation roadmap (Graph Connector → Power Automate ticket creation).
  • Led security incident response: investigated phishing campaigns, stopped a live intrusion attempt, and improved escalation workflows.
  • Proposed and delivered process improvements: phishing workflow changes, DL/SG provisioning automation, alert/incident management optimization.
  • Daily exposure to enterprise tooling: AD, ServiceNow, SCCM/Intune, Exchange admin, Cisco ISE, Darktrace, FortiMail.
  • Participated in monthly on-call rotation for critical incidents.

Electronic Arts – BioWare – QA Tester

oct 2022 – sept 2023 · ~1 yr · remote

Quality assurance for Star Wars: The Old Republic; built testing discipline, issue triage, and crisp reporting practices.

  • Supported 3-month release cycles through regression, feature, black-box, A/B, and compliance testing.
  • Automated repetitive test setup tasks with batch scripts (hours → minutes).
  • Collaborated with distributed teams across Disney, EA, and BioWare.

Projects

tresor homelab

A self-hosted platform run with the same standards I’d want in a small production environment: explicit routing, repeatable service lifecycles, living operational docs, and recovery paths that have been exercised in practice. The live system is a split 2-node stack (tresor + tresor-vps) with a separate KVM QA sandbox.

  • Public ingress is intentionally split: status.raduhhr.xyz goes through Cloudflare Tunnel → Traefik, while media.raduhhr.xyz, cloud.raduhhr.xyz, and Minecraft route through the VPS over WireGuard. The stateful node stays private.
  • The repo is the control plane: 23 Ansible roles / 170 playbooks, consistent per-service lifecycles, and docs that capture deployment order, network flows, and recovery boundaries.
  • Networking is explicit rather than accidental: public_net, internal_net, lan_pub, and wg0. LAN services bind to 192.168.0.42, WireGuard-only services to 10.66.66.2, and Paper runs on host networking.
  • tresor-ctl: Python TUI (Rich + Questionary + Paramiko) that auto-discovers services from the playbook tree and turns repo conventions into one-screen ops over SSH.
  • Backups are layered: local snapshots, private GitHub, and selected Cloudflare R2 offsite copies with matching restore-r2.yml playbooks. The important restore paths have been tested.
  • Services include Jellyfin, FileBrowser, Grafana + Prometheus + Node Exporter + cAdvisor, Uptime Kuma with a repo-synced public status page, Kiwix, PaperMC, and three Discord bots.
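
A minimal sketch of the kind of auto-discovery tresor-ctl does, assuming a hypothetical layout of `<playbook root>/<service>/<action>.yml`. The real repo conventions may differ, and the function and directory names here are illustrative only.

```python
import tempfile
from collections import defaultdict
from pathlib import Path


def discover_services(playbook_root: Path) -> dict[str, list[str]]:
    """Map each service directory to the lifecycle actions found inside it.

    Assumes (hypothetically) one directory per service and one playbook
    file per action, e.g. jellyfin/deploy.yml or jellyfin/restore-r2.yml.
    """
    services: dict[str, list[str]] = defaultdict(list)
    for playbook in playbook_root.glob("*/*.yml"):
        services[playbook.parent.name].append(playbook.stem)
    # Sort both levels so the menu order is stable between runs.
    return {name: sorted(actions) for name, actions in sorted(services.items())}


def make_demo_tree(root: Path) -> None:
    """Create a tiny throwaway playbook tree for the demo."""
    for rel in ("jellyfin/deploy.yml", "jellyfin/restore-r2.yml", "kiwix/deploy.yml"):
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        make_demo_tree(Path(tmp))
        for service, actions in discover_services(Path(tmp)).items():
            print(f"{service}: {', '.join(actions)}")
```

Deriving the menu from the directory tree means a new service shows up in the TUI as soon as its playbooks are added, with no separate registry to maintain.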
tresor-ctl // service dashboard: TUI showing all services with version, state, uptime, memory and CPU across both hosts
grafana // host dashboard
grafana // containers dashboard
ansible docker wireguard traefik cloudflare tunnel grafana prometheus debian ufw/fail2ban python
repo

nursing pas cu pas platform

Took over a rough handoff and turned it into something that feels maintainable: clearer repo boundaries, a sane path into QA, a proper image pipeline, and enough cleanup in the app itself that it reads like a real platform instead of a stitched-together demo.

  • Pulled the work apart into frontend, backend, and infra repos so ownership was obvious and deployment concerns stopped leaking through the app.
  • Defined the runtime layout: Angular on the public side, NestJS behind /api, Redis for cached quiz state, and Postgres as the source of truth.
  • Replaced the old image handling path with signed uploads plus CDN delivery, so question media stopped bloating the normal request flow.
  • Wired app changes into a repeatable QA release path through GitHub Actions and Ansible instead of manual VPS updates.
  • Helped steady the product itself too: question and category management, image-backed quizzes, Romanian copy, and cleaner theme behavior.
  • Captured the architecture, delivery flow, and sanitized examples so the platform could be understood without tribal knowledge.
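
The split between Redis for cached quiz state and Postgres as the source of truth can be illustrated with a cache-aside read. This is a sketch under stated assumptions, not the platform's code: the actual backend is NestJS, and here `TTLCache` is an in-memory stand-in for Redis while `fetch_from_db` is a hypothetical callable standing in for a Postgres query.

```python
import time
from typing import Callable, Optional


class TTLCache:
    """In-memory stand-in for Redis: entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._items: dict[str, tuple[float, dict]] = {}

    def get(self, key: str) -> Optional[dict]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Expired entries are dropped so the next read hits the database.
            del self._items[key]
            return None
        return value

    def set(self, key: str, value: dict) -> None:
        self._items[key] = (self._clock() + self._ttl, value)


def load_quiz_state(quiz_id: str, cache: TTLCache,
                    fetch_from_db: Callable[[str], dict]) -> dict:
    """Cache-aside read: try the cache first, fall back to the database."""
    key = f"quiz:{quiz_id}"  # hypothetical key scheme
    cached = cache.get(key)
    if cached is not None:
        return cached
    state = fetch_from_db(quiz_id)  # the database stays the source of truth
    cache.set(key, state)
    return state
```

Because every miss falls back to the database, losing the cache costs latency rather than data.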
Nursing Pas cu Pas architecture: simplified platform diagram showing the QA release path and the core runtime pieces.
  • To QA: frontend (Angular app), backend (NestJS API), infra repo (env + deploy logic), GitHub Actions (build + staged release), QA server (shared validation env).
  • At runtime, edge (the public entry point): Caddy at qa.raduhhr.xyz serves the app and forwards /api.
  • At runtime, app (the parts users actually hit): Angular (quiz + admin) and NestJS (auth + API) on the same host with split roles.
  • At runtime, state + media (what keeps the app useful): Postgres (main DB), Redis (quiz cache), R2 images (CDN delivery).
diagram // release path and core runtime pieces
angular nestjs caddy cloudflare r2 cdn docker compose postgres redis ansible github actions
repo

service desk ai chatbot

RAG-based assistant for Steelcase's service desk team to surface ServiceNow documentation directly in Teams. Built the MVP end-to-end with zero dedicated allocation, from architecture to testing to rollout planning.

  • Dual-agent architecture: one bot for end users, one for the service desk team, both backed by the same knowledge source.
  • Ran structured testing rounds with the team and validated answer consistency across both agents.
  • Analyzed response gaps: most failures traced to missing or outdated KB articles, not retrieval logic. Documented remediation list.
  • KB ingestion via manual PDF export from ServiceNow into SharePoint (documented automation path via Graph Connector).
  • Designed phased roadmap: Azure AI Search indexing → Power Automate ticket creation from chat → auto-sync when new KBs are published.
  • Produced full demo deck and rollout documentation for leadership review.
rag copilot studio sharepoint teams azure ai search power automate
production

other selected projects

portfolio website (this site)

End-to-end static site build: domain registration, DNS config, responsive frontend, Cloudflare Pages deployment, Worker backend, Turnstile integration, and AWS SES in sandbox mode with verified addresses.

cloudflare pages workers turnstile aws ses dns responsive design
repo

batch-yt-downloader

Bash script using yt-dlp to mirror YouTube "Liked Videos" playlists into local MP3 files with thumbnails and metadata. Music synced to Jellyfin for streaming across devices.
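
The original is a Bash script; the sketch below rebuilds a comparable yt-dlp invocation in Python so the flags are easy to read. The function name, output template, archive file, and placeholder playlist URL are assumptions for illustration, and a private playlist such as "Liked Videos" generally also needs authentication, which is left out here.

```python
import shlex


def build_ytdlp_command(playlist_url: str, output_dir: str, archive_file: str) -> list[str]:
    """Build a yt-dlp argument list that mirrors a playlist as tagged MP3 files."""
    return [
        "yt-dlp",
        "--extract-audio",                   # keep only the audio track
        "--audio-format", "mp3",             # convert audio to MP3
        "--embed-thumbnail",                 # use the video thumbnail as cover art
        "--embed-metadata",                  # write title/uploader tags into the file
        "--download-archive", archive_file,  # skip videos already downloaded
        "--output", f"{output_dir}/%(title)s.%(ext)s",
        playlist_url,
    ]


if __name__ == "__main__":
    command = build_ytdlp_command(
        "https://www.youtube.com/playlist?list=PLAYLIST_ID",  # placeholder URL
        "music",
        "downloaded.txt",
    )
    # Print the command instead of running it; pass the list to
    # subprocess.run(command, check=True) to actually download.
    print(shlex.join(command))
```

The download archive is what makes repeated runs cheap: only newly liked videos are fetched on each sync.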

bash yt-dlp jellyfin automation scripting
repo

life-scheduler

Trello-based automation combining Butler rules with a Python script (run via GitHub Actions) to keep daily rituals and recurring tasks self-managing.

python github actions trello api automation
repo

email spam detection

End-to-end KDD pipeline on classic spam datasets: cleaning, feature engineering, model training and evaluation (logistic regression, random forest, gradient boosting).

python scikit-learn pandas email kdd
repo

Education & Learning Path

Informatics & Economics (FSEGA)

2021 – 2024 · no degree

Prioritized full-time roles at EA and Steelcase.

Contact

Open to platform, DevOps, and infrastructure roles. Drop a message or reach out directly.
