DevOps Support Engineer
Platform support at CELUM, homelab tinkerer, previously shipped an AI assistant at Steelcase.
Experience
CELUM (Contractor via Weasweb) – DevOps Support Engineer
L3 platform operations for a Digital Asset Management platform across Azure/AKS and on-prem. Root-cause isolation using Grafana + centralized logs, structured bug/feature escalation, and operational lifecycle management.
- Troubleshoot service health, configuration, and connectivity across AKS/Kubernetes, Docker, Linux, and Azure SQL.
- Root-cause analysis using Grafana dashboards, centralized logging, and API validation (Swagger/Postman) to inform escalation decisions.
- Create structured engineering items: bugs with repro steps, feature requests with acceptance criteria, and support requests with evidence + impact assessment.
- Handle operational requests: access provisioning, environment lifecycle ops, and data/integration triage with SQL + API validation.
Steelcase – Service Desk Analyst L2
Shipped an AI chatbot from zero allocation, led security incident response, and drove process automation across the NA support team.
- Proposed and delivered AI chatbot MVP (RAG-based ServiceNow assistant in Teams): designed dual-agent architecture, ran structured testing rounds, analyzed failure modes (traced to KB gaps, not retrieval logic), and produced rollout plan with automation roadmap (Graph Connector → Power Automate ticket creation).
- Led security incident response: investigated phishing campaigns, stopped a live intrusion attempt, and improved escalation workflows.
- Proposed and delivered process improvements: phishing workflow changes, DL/SG provisioning automation, alert/incident management optimization.
- Daily exposure to enterprise tooling: AD, ServiceNow, SCCM/Intune, Exchange admin, Cisco ISE, Darktrace, Fortimail.
- Participated in monthly on-call rotation for critical incidents.
Electronic Arts – BioWare – QA Tester
Quality assurance for Star Wars: The Old Republic; built testing discipline, issue triage, and crisp reporting practices.
- Supported 3-month release cycles through regression, feature, black-box, A/B, and compliance testing.
- Automated repetitive test setup tasks with batch scripts (hours → minutes).
- Collaborated with distributed teams across Disney, EA, and BioWare.
Projects
tresor homelab
A self-hosted platform run with the same standards I’d want in a small production environment: explicit routing, repeatable service lifecycles, living operational docs, and recovery paths that have been exercised in practice. The live system is a split 2-node stack (tresor + tresor-vps) with a separate KVM QA sandbox.
- Public ingress is intentionally split: status.raduhhr.xyz goes through Cloudflare Tunnel -> Traefik, while media.raduhhr.xyz, cloud.raduhhr.xyz, and Minecraft route through the VPS over WireGuard. The stateful node stays private.
- The repo is the control plane: 23 Ansible roles / 170 playbooks, consistent per-service lifecycles, and docs that capture deployment order, network flows, and recovery boundaries.
- Networking is explicit rather than accidental: public_net, internal_net, lan_pub, and wg0. LAN services bind to 192.168.0.42, WireGuard-only services to 10.66.66.2, and Paper runs on host networking.
- tresor-ctl: Python TUI (Rich + Questionary + Paramiko) that auto-discovers services from the playbook tree and turns repo conventions into one-screen ops over SSH.
- Backups are layered: local snapshots, private GitHub, and selected Cloudflare R2 offsite copies with matching restore-r2.yml playbooks. The important restore paths have been tested.
- Services include Jellyfin, FileBrowser, Grafana + Prometheus + Node Exporter + cAdvisor, Uptime Kuma with a repo-synced public status page, Kiwix, PaperMC, and three Discord bots.
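To illustrate the kind of auto-discovery tresor-ctl performs, here is a minimal sketch of building a service menu from a playbook tree. The `<root>/<service>/<action>.yml` layout and the `discover_services` name are assumptions made for this example, not the repo's actual conventions.

```python
from pathlib import Path


def discover_services(playbook_root: Path) -> dict[str, list[str]]:
    """Map each service directory to the sorted names of its playbooks.

    Assumes a hypothetical layout: <playbook_root>/<service>/<action>.yml
    """
    services: dict[str, list[str]] = {}
    for service_dir in sorted(p for p in playbook_root.iterdir() if p.is_dir()):
        # The file stem ("deploy", "restore-r2", ...) becomes the action name.
        actions = sorted(f.stem for f in service_dir.glob("*.yml"))
        if actions:  # skip directories that hold no playbooks
            services[service_dir.name] = actions
    return services
```

A TUI could then render the returned mapping as a menu and run the selected playbook over SSH, so adding a service to the tree would be enough to make it appear in the tool.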
nursing pas cu pas platform
Took over a rough handoff and turned it into something that feels maintainable: clearer repo boundaries, a sane path into QA, a proper image pipeline, and enough cleanup in the app itself that it reads like a real platform instead of a stitched-together demo.
- Pulled the work apart into frontend, backend, and infra repos so ownership was obvious and deployment concerns stopped leaking through the app.
- Defined the runtime layout: Angular on the public side, NestJS behind /api, Redis for cached quiz state, and Postgres as the source of truth.
- Replaced the old image handling path with signed uploads plus CDN delivery, so question media stopped bloating the normal request flow.
- Wired app changes into a repeatable QA release path through GitHub Actions and Ansible instead of manual VPS updates.
- Helped steady the product itself too: question and category management, image-backed quizzes, Romanian copy, and cleaner theme behavior.
- Captured the architecture, delivery flow, and sanitized examples so the platform could be understood without tribal knowledge.
service desk ai chatbot
RAG-based assistant for Steelcase's service desk team to surface ServiceNow documentation directly in Teams. Built the MVP end-to-end with zero dedicated allocation, from architecture to testing to rollout planning.
- Dual-agent architecture: one bot for end users, one for the service desk team, both backed by the same knowledge source.
- Ran structured testing rounds with the team and validated answer consistency across both agents.
- Analyzed response gaps: most failures traced to missing or outdated KB articles, not retrieval logic. Documented remediation list.
- KB ingestion via manual PDF export from ServiceNow into SharePoint (documented automation path via Graph Connector).
- Designed phased roadmap: Azure AI Search indexing → Power Automate ticket creation from chat → auto-sync when new KBs are published.
- Produced full demo deck and rollout documentation for leadership review.
other selected projects
portfolio website (this site)
End-to-end static site build: domain registration, DNS config, responsive frontend, CF Pages deployment, Worker backend, Turnstile integration, and AWS SES in sandbox with verified addresses.
batch-yt-downloader
Bash script using yt-dlp to mirror YouTube "Liked Videos" playlists into local MP3 files with thumbnails and metadata. Music synced to Jellyfin for streaming across devices.
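The original is a Bash script; as a rough Python sketch of the same idea, the function below assembles a yt-dlp invocation that extracts MP3 audio with embedded thumbnail and metadata. The function names, the output template, and the use of `--download-archive` to skip already-mirrored videos are assumptions for illustration, not details of the actual script.

```python
import subprocess


def build_ytdlp_command(playlist_url: str, output_dir: str, archive_file: str) -> list[str]:
    """Build a yt-dlp command line for mirroring a playlist as MP3 files."""
    return [
        "yt-dlp",
        "--extract-audio",                    # keep only the audio stream
        "--audio-format", "mp3",              # convert to MP3 (needs ffmpeg)
        "--embed-thumbnail",                  # embed the thumbnail as cover art
        "--embed-metadata",                   # write title/uploader tags
        "--download-archive", archive_file,   # assumed: skip videos already fetched
        "--output", f"{output_dir}/%(title)s.%(ext)s",
        playlist_url,
    ]


def mirror_playlist(playlist_url: str, output_dir: str, archive_file: str) -> int:
    """Run yt-dlp and return its exit code (requires yt-dlp on PATH)."""
    command = build_ytdlp_command(playlist_url, output_dir, archive_file)
    return subprocess.run(command, check=False).returncode
```

Keeping command construction separate from execution makes the flags easy to inspect without touching the network.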
life-scheduler
Trello-based automation combining Butler rules with a Python script (run via GitHub Actions) to keep daily rituals and recurring tasks self-managing.
email spam detection
End-to-end KDD pipeline on classic spam datasets: cleaning, feature engineering, model training and evaluation (LR, RF, GB).
Education & Learning Path
Informatics & Economics (FSEGA)
Prioritized full-time roles at EA and Steelcase.
Contact
Open to platform, DevOps, and infrastructure roles. Drop a message or reach out directly.