On-Prem AI Platform for GPU Clusters

Diagnose GPU failures
before they cost you

AI-powered root cause analysis for HPC and GPU clusters. Automated diagnosis, real-time streaming, human-in-the-loop remediation. Runs entirely on your infrastructure.

Download GUI Quick Start

Built for GPU Operations

Everything you need to monitor, diagnose, and remediate GPU cluster issues.

🔍

AI Root Cause Analysis

LangGraph ReAct agent queries Prometheus metrics, Loki logs, and K8s state to find root causes automatically.

📊

Datacenter Topology

Visualize your entire GPU fleet — racks, nodes, GPUs. See health status at a glance with interactive floor plan.

Real-time Streaming

Watch the AI think in real-time. See tool calls, reasoning steps, and results streamed live to the GUI.

🛡

Human-in-the-Loop

Operator approval for high-risk actions. The AI proposes, you approve. Full audit trail for every action.

🔒

On-Prem & Air-Gapped

Runs entirely on your infrastructure. Self-hosted vLLM inference. No data leaves your cluster.

🚀

One-Command Deploy

Single binary installer. Deploy the server, add GPU nodes from the GUI, agents deploy automatically via SSH.

Architecture

User Laptop On-Prem K8s Cluster +-----------+ +----------------------------------+ | OpsPilot |---- :80/:443 ---->| xperf-gateway | | (macOS) | | Nginx → Gateway (FastAPI) | +-----------+ +----------------------------------+ | xperf-data | | Redis · PostgreSQL (pgvector) | +----------------------------------+ | xperf-core | | SmartAnalyzer + Cortex | +----------------------------------+ | xperf-llm | | vLLM · vLLM-Embed · vLLM-guard | +----------------------------------+ | xperf-infra | | Prometheus · Loki · Grafana | +----------------------------------+ GPU Nodes (bare-metal) +----------------------------------------------+ | alloy-agent → pushes metrics + logs | | to Prometheus & Loki in xperf-infra | +----------------------------------------------+

Download

Install the server on your machine, then connect with the GUI.

SERVER (install on your GPU cluster)

🖥

Linux x86_64

Helm chart + Docker images

Download Server

Install: tar xzf metis-server-*.tar.gz && ./metis-server install

GUI (install on your laptop)

🍎

macOS

Apple Silicon & Intel

Download .dmg
🐧

Linux Desktop

x86_64 AppImage

Download .AppImage
🪟

Windows

x86_64 installer

Download .exe