Private Infrastructure Platform

Infrastructure Cluster

Private server cluster for CI/CD, GitOps, AI tooling, and controlled development deployments

OpenTofu, Terragrunt, Proxmox VE

Project Overview

This project is our own infrastructure foundation: not a single-purpose server, but a private server cluster we actively use for CI/CD experiments, controlled development deployments, GitOps rollouts, quality tooling, internal services, and AI-agent workflows. The repository keeps its layers clearly separated. OpenTofu and Terragrunt model the Proxmox substrate and VM compositions, Ansible handles last-mile provisioning and hardening, K3s provides the container platform, and FluxCD owns cluster state. Specialized machines cover infrastructure services, Linux and Windows runners, a development sandbox, and a dedicated OpenClaw AI control plane. In practice, it behaves like a compact internal platform engineered to keep experiments reproducible, environments predictable, and growth manageable without depending on public-cloud building blocks for every new initiative.

Product Surfaces

Platform Core

The underlying infrastructure layer that combines Proxmox-backed VM compositions, generated inventory, K3s bootstrap, Flux-managed base applications, internal PKI, ingress, monitoring, storage, and service routing across the private cluster domain.
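As a rough illustration of the "generated inventory" step, the sketch below turns a list of provisioned VMs into the JSON layout that Ansible accepts from a dynamic inventory source. The `Vm` class, the group names, and the addresses are hypothetical; the real repository may generate its inventory differently (for example, from OpenTofu outputs).

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Vm:
    name: str  # hypothetical VM name
    ip: str    # hypothetical management address
    role: str  # inventory group, e.g. "k3s_server" (assumed naming)


def build_inventory(vms):
    """Return a dict in Ansible's dynamic-inventory JSON layout."""
    inventory = {"_meta": {"hostvars": {}}}
    for vm in vms:
        # One group per role; each group lists its member hosts.
        group = inventory.setdefault(vm.role, {"hosts": []})
        group["hosts"].append(vm.name)
        # Per-host variables live under _meta.hostvars.
        inventory["_meta"]["hostvars"][vm.name] = {"ansible_host": vm.ip}
    return inventory


if __name__ == "__main__":
    example = [
        Vm("k3s-server-1", "10.0.0.11", "k3s_server"),
        Vm("runner-linux-1", "10.0.0.21", "runners"),
    ]
    print(json.dumps(build_inventory(example), indent=2))
```

Keeping the inventory derived from the VM definitions, rather than hand-edited, is one way to stop the provisioning layer from drifting away from what the IaC layer actually created.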

CI/CD Runner Fleet

A delivery layer spanning Linux and Kubernetes GitLab runners, a separate Windows runner for Unity/.NET and heavier build scenarios, shared caches, SonarQube integration, and secure CA distribution into runner hosts and pods.

AI Control Plane

A dedicated AI VM that runs OpenClaw as a thin agent-control plane, receives vault-backed runtime configuration, validates model availability against NVIDIA NIM, and stays ready for agent workflows, tool execution, and future local-model experiments.
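The model-availability check could look roughly like the sketch below. It assumes the upstream endpoint returns an OpenAI-compatible model listing of the form `{"data": [{"id": ...}]}`; the function name and the model identifiers are hypothetical, and the payload is passed in directly so the logic can be read without any network access.

```python
def missing_models(required, models_payload):
    """Return the required model IDs that the listing does not contain.

    `models_payload` is assumed to follow the OpenAI-compatible shape
    {"data": [{"id": "<model-id>"}, ...]}.
    """
    available = {entry["id"] for entry in models_payload.get("data", [])}
    return sorted(set(required) - available)


if __name__ == "__main__":
    # Hypothetical policy and listing, for illustration only.
    policy = ["example/model-a", "example/model-b"]
    listing = {"data": [{"id": "example/model-a"}]}
    gaps = missing_models(policy, listing)
    if gaps:
        raise SystemExit(f"deployment blocked, models unavailable: {gaps}")
```

Failing the deployment when the list is non-empty keeps the control plane from starting with a model policy that the upstream cannot serve.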

The Challenge

Turn a private server setup from ad hoc machine management into a reproducible internal platform that spans VMs, Kubernetes, CI/CD runners, internal services, and AI tooling.

Run experimentation, delivery, and platform services on one self-hosted substrate without letting secrets, GPU allocation, runner scope, or service ownership collapse into operational chaos.

Keep the system extensible enough for new workloads and environments while still being deterministic, testable, and maintainable as a real engineering foundation.

The Execution

Designed the root infrastructure layer around OpenTofu and the bpg/proxmox provider, using dedicated VM profiles, feature toggles, GPU hardware mapping, and composition modules that build a K3s cluster plus specialized machines for infra-core, Linux runner, Windows runner, development sandbox, and AI platform.
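The combination of VM profiles, feature toggles, and GPU hardware mapping can be sketched as plain data plus one selection function. This is an illustration in Python rather than the project's actual OpenTofu code; the profile sizes, machine keys, and the `plan_machines` helper are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VmProfile:
    cores: int
    memory_mb: int
    gpu: bool = False  # whether the machine expects a mapped GPU


# Hypothetical profiles for the specialized machines named in the text.
PROFILES = {
    "infra-core": VmProfile(cores=4, memory_mb=8192),
    "linux-runner": VmProfile(cores=8, memory_mb=16384),
    "windows-runner": VmProfile(cores=8, memory_mb=16384, gpu=True),
    "dev-sandbox": VmProfile(cores=4, memory_mb=8192),
    "ai-platform": VmProfile(cores=8, memory_mb=32768),
}


def plan_machines(toggles, gpu_mappings):
    """Select enabled machines and check GPU profiles have a mapping."""
    planned = {}
    for name, profile in PROFILES.items():
        if not toggles.get(name, False):
            continue  # feature toggle off: machine is not built
        if profile.gpu and name not in gpu_mappings:
            raise ValueError(f"{name} requests a GPU but has no hardware mapping")
        planned[name] = profile
    return planned
```

For instance, `plan_machines({"infra-core": True, "windows-runner": True}, {"windows-runner": "gpu0"})` would select two machines, while omitting the mapping for `windows-runner` raises an error before anything is created.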

Used Terragrunt as a DRY multi-environment wrapper and Taskfile as the operational entrypoint for init, plan, apply, deploy, diagnostics, and focused update flows, giving the platform a clear day-to-day operating model instead of scattered shell commands.

Treated Ansible as a modular provisioner rather than a monolithic config dump: hardening, Docker, K3s, internal PKI, GitLab runners, Flutter/Android/Node/Go/PHP toolchains, AI platform deployment, and infrastructure services are all split into focused roles with vault-backed variables and idempotent execution.

Built the cluster and service layer around K3s, FluxCD, Cilium, Traefik, CloudNativePG, External Secrets, and object-store backups so the cluster behaves more like a small internal platform than a collection of manually patched machines.

Added dedicated CI/CD and AI capabilities through Linux and Kubernetes runners, a Windows runner with optional GPU passthrough, SonarQube quality-gate integration, CA propagation into containers and pods, and a thin OpenClaw control plane that verifies its upstream NVIDIA model policy during deployment.

The Outcome

A real self-hosted internal platform that supports CI/CD experiments, internal service hosting, AI-agent workflows, and controlled developer deployments from one maintainable codebase.

A cleaner separation between IaC, provisioning, cluster state, and service operations, which makes the platform materially easier to extend than a conventional private cluster assembled through manual changes and accumulated configuration drift.

A long-term foundation for future products and experiments because runners, ingress, databases, secret flows, monitoring, and AI control surfaces are already in place instead of being rebuilt from scratch for every new initiative.
