Enterprise IT services — ITSM operations

AI-Powered Change Risk Engine That Predicts Incidents Before They Happen

Built an end-to-end change-risk assessment platform that mines years of ITSM history to predict whether a planned infrastructure change will cause an incident, and gives implementers concrete steps to prevent it.

The challenge

What needed solving.

A global IT services provider was approving infrastructure changes with no systematic way to know which ones would blow up into incidents. The history existed (tens of thousands of change requests and the incidents they caused), but it was scattered across inconsistent CSV exports with drifting schemas. The harder problem wasn't classification; it was making the output specific. Generic "be careful" advice gets ignored. Implementers needed verdicts tied to actual past incidents, with concrete prevention steps and an auditable explanation of why each risk level was assigned.

What we built

The system.

  • Ingestion pipeline normalizing ~100K change/incident records across drifting CSV schemas, linking changes to the incidents they caused
  • Embedding pipeline (OpenAI text-embedding-3-small) populating two Qdrant collections — change-level and incident-level — for company-filtered semantic retrieval
  • Deterministic risk-scoring service computing incident rate, severity weighting, and recency factors from retrieved history
  • Agent-based LLM assessment layer that fuses retrieved evidence and computed metrics into a High/Medium/Low verdict with a grounded explanation, plus a rule-based fallback when the LLM is unavailable
  • FastAPI backend with prediction, ticket-analysis, feedback, and history endpoints; daily retraining/re-embedding pipeline with run telemetry
  • React + TypeScript frontend for analysts: change input, risk summary, similar-incident evidence, monthly patterns, and prompt customization
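To make the ingestion bullet concrete, here is a minimal sketch of schema normalization and change-to-incident linking. It uses only the standard library rather than pandas. The alias tables, the canonical field names (`change_id`, `caused_by`, and so on), and both helper functions are hypothetical; real ITSM exports will use their own headers.

```python
import csv
import io
from collections import defaultdict

# Hypothetical alias tables: each canonical field lists header variants that
# different export vintages might use. Real ITSM exports will differ.
CHANGE_ALIASES = {
    "change_id": {"number", "change number", "change_id"},
    "company": {"company", "client", "organization"},
    "description": {"short description", "short_description", "description"},
}
INCIDENT_ALIASES = {
    "incident_id": {"number", "incident number", "incident_id"},
    "caused_by": {"caused by change", "caused_by", "rfc"},
    "priority": {"priority", "severity"},
}


def normalize_rows(csv_text, aliases):
    """Parse one CSV export and rename its headers to canonical field names."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Map each header actually present in this export to its canonical name.
    header_map = {}
    for header in reader.fieldnames or []:
        key = header.strip().lower()
        for canonical, variants in aliases.items():
            if key in variants:
                header_map[header] = canonical
    rows = []
    for raw in reader:
        # Short rows yield None for missing cells; treat those as empty strings.
        row = {canon: (raw.get(header) or "").strip() for header, canon in header_map.items()}
        # Canonical fields absent from this export default to "".
        for canonical in aliases:
            row.setdefault(canonical, "")
        rows.append(row)
    return rows


def link_changes_to_incidents(changes, incidents):
    """Map each change ID to the IDs of incidents that name it as their cause."""
    caused = defaultdict(list)
    for inc in incidents:
        if inc["caused_by"]:
            caused[inc["caused_by"]].append(inc["incident_id"])
    return {c["change_id"]: caused.get(c["change_id"], []) for c in changes}


if __name__ == "__main__":
    old_export = "Number,Client,Short Description\nCHG001,Acme,Patch DB cluster\n"
    new_export = "change_id,company,description\nCHG002,Acme,Rotate TLS certs\n"
    incident_export = "Incident Number,Caused by Change,Severity\nINC9,CHG001,1\n"
    changes = normalize_rows(old_export, CHANGE_ALIASES) + normalize_rows(new_export, CHANGE_ALIASES)
    incidents = normalize_rows(incident_export, INCIDENT_ALIASES)
    print(link_changes_to_incidents(changes, incidents))  # {'CHG001': ['INC9'], 'CHG002': []}
```

Mapping every vintage onto one canonical schema up front means the downstream embedding and linking code never has to know which export a record came from.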

Architecture

The core engineering judgment was refusing to let a single model do everything. An early iteration tried exactly that — a binary classifier (Random Forest over embedding-derived features) predicting “will this change cause an incident?” It worked statistically but failed operationally: a probability with no evidence trail is something a change advisory board can’t act on. The production system splits the problem into three layers that each do what they’re best at.

Retrieval layer. Every historical change description is embedded with text-embedding-3-small and stored in Qdrant across two collections: one keyed to changes (filterable by client organization) and one keyed to the incidents those changes caused. When a new change comes in, the system pulls up to 200 semantically similar past changes above a similarity threshold, then walks the change→incident linkage to surface what actually went wrong last time someone did something like this. Dot-product search over normalized vectors keeps retrieval in the tens of milliseconds.
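The retrieval step can be illustrated with a brute-force, in-memory stand-in for Qdrant. It filters by client organization, scores by dot product over unit-length vectors (equivalent to cosine similarity), keeps hits above a threshold, caps the result at 200, and then follows the change→incident links. The 0.5 threshold, the tuple-based index, and the function names are assumptions for illustration only.

```python
import math


def unit(vec):
    """Scale a vector to unit length so dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def retrieve_similar(query_vec, index, company, threshold=0.5, limit=200):
    """Return (score, change_id) pairs for one company, best first.

    index: hypothetical list of (change_id, company, unit_vector) tuples that
    stands in for a Qdrant collection with a company payload filter.
    threshold: hypothetical cutoff; the real value is not given in the source.
    """
    q = unit(query_vec)
    hits = []
    for change_id, org, vec in index:
        if org != company:  # company filter applied before scoring
            continue
        score = sum(a * b for a, b in zip(q, vec))  # dot product of unit vectors
        if score >= threshold:
            hits.append((score, change_id))
    hits.sort(key=lambda h: h[0], reverse=True)
    return hits[:limit]


def incidents_for(hits, links):
    """Walk the change->incident linkage for each retrieved change."""
    return {change_id: links.get(change_id, []) for _, change_id in hits}


if __name__ == "__main__":
    index = [
        ("CHG001", "Acme", unit([1.0, 0.0])),
        ("CHG002", "Acme", unit([0.0, 1.0])),
        ("CHG003", "Globex", unit([1.0, 0.0])),
    ]
    hits = retrieve_similar([2.0, 0.2], index, company="Acme")
    print(incidents_for(hits, {"CHG001": ["INC9"]}))  # {'CHG001': ['INC9']}
```

In Qdrant itself, the same behavior comes from a vector search with a payload filter on the company field and a score threshold. The linear scan here only shows the logic and would not scale to ~100K vectors.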

Deterministic risk math. Rather than asking an LLM to “feel out” the risk, the backend computes hard metrics from the retrieved set — high-impact incident rate, a severity-weighted score, and a recency factor that up-weights incidents from the last few months. These numbers are auditable, reproducible, and independent of any model’s mood.
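Below is a sketch of the three deterministic metrics, under stated assumptions. "High-impact" is taken to mean priority 1 or 2, the severity weights and the 90-day recency half-life are invented, and the recency factor is modeled as the mean exponential-decay weight of the linked incidents. The source does not specify the actual formulas.

```python
from dataclasses import dataclass
from datetime import date

# All constants below are hypothetical; the source does not publish them.
SEVERITY_WEIGHTS = {1: 1.0, 2: 0.6, 3: 0.3, 4: 0.1}  # priority -> weight
HIGH_IMPACT_PRIORITIES = {1, 2}
RECENCY_HALF_LIFE_DAYS = 90.0


@dataclass
class Incident:
    priority: int  # 1 = most severe
    opened: date


@dataclass
class RiskMetrics:
    high_impact_rate: float  # share of similar changes with >= 1 high-impact incident
    severity_score: float    # summed severity weight of incidents, per similar change
    recency_factor: float    # mean decay weight; 1.0 = every incident opened today


def compute_metrics(similar_changes, today):
    """similar_changes: one list of linked incidents per retrieved change."""
    n = len(similar_changes)
    if n == 0:
        return RiskMetrics(0.0, 0.0, 0.0)
    high = sum(
        1 for incs in similar_changes
        if any(i.priority in HIGH_IMPACT_PRIORITIES for i in incs)
    )
    incidents = [i for incs in similar_changes for i in incs]
    severity = sum(SEVERITY_WEIGHTS.get(i.priority, 0.0) for i in incidents) / n
    # Exponential decay: an incident RECENCY_HALF_LIFE_DAYS old counts half as much.
    decay = [0.5 ** ((today - i.opened).days / RECENCY_HALF_LIFE_DAYS) for i in incidents]
    recency = sum(decay) / len(decay) if decay else 0.0
    return RiskMetrics(high / n, severity, recency)


if __name__ == "__main__":
    history = [
        [Incident(1, date(2024, 6, 1))],
        [],
        [Incident(3, date(2024, 3, 3))],  # exactly 90 days before 2024-06-01
        [],
    ]
    print(compute_metrics(history, today=date(2024, 6, 1)))
```

Because every input is a retrieved record and every constant is explicit, the same history always yields the same numbers. This is the auditability property the paragraph above relies on.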

The LLM never decides the risk alone — it interprets evidence the system has already retrieved and quantified, which is what makes the output defensible to an enterprise change board.

Agent assessment layer. The metrics, similar incidents, monthly failure patterns, and severity breakdown are assembled into a structured prompt for a hosted LLM agent. The agent returns a constrained JSON verdict (High/Medium/Low) plus a short grounded explanation and prevention recommendations that reference the retrieved incidents. A rule-based fallback maps the computed metrics onto the same High/Medium/Low thresholds, so the API degrades gracefully instead of failing when the LLM is unreachable.

A feedback loop lets analysts submit corrections that are fed into subsequent assessments. A daily pipeline re-ingests new tickets, regenerates embeddings, and records run telemetry (merge counts, vector-creation timing, failures) for operational visibility.
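The agent layer's verdict contract and its rule-based fallback can be sketched as follows. The JSON field names (`risk_level`, `explanation`, `recommendations`), the 30%/10% thresholds, and the `call_llm` callable are hypothetical stand-ins for the hosted agent. The sketch shows only the shape of the logic: validate the model's JSON strictly, and fall back to deterministic threshold rules when the call fails or the output does not parse.

```python
import json

# Hypothetical contract and thresholds; the source does not publish either.
VALID_LEVELS = ("High", "Medium", "Low")  # tuple: membership test never needs hashing
HIGH_RATE_THRESHOLD = 0.30
MEDIUM_RATE_THRESHOLD = 0.10


def rule_based_verdict(high_impact_rate):
    """Deterministic fallback that maps the computed rate onto High/Medium/Low."""
    if high_impact_rate >= HIGH_RATE_THRESHOLD:
        level = "High"
    elif high_impact_rate >= MEDIUM_RATE_THRESHOLD:
        level = "Medium"
    else:
        level = "Low"
    return {
        "risk_level": level,
        "explanation": f"Rule-based fallback: high-impact incident rate {high_impact_rate:.0%}.",
        "recommendations": [],
        "source": "fallback",
    }


def parse_llm_verdict(raw):
    """Return a verdict dict if raw is JSON of the expected shape, else None."""
    try:
        data = json.loads(raw)
    except (TypeError, ValueError):  # json.JSONDecodeError subclasses ValueError
        return None
    if not isinstance(data, dict) or data.get("risk_level") not in VALID_LEVELS:
        return None
    recommendations = data.get("recommendations", [])
    if not isinstance(data.get("explanation"), str) or not isinstance(recommendations, list):
        return None
    return {
        "risk_level": data["risk_level"],
        "explanation": data["explanation"],
        "recommendations": recommendations,
        "source": "llm",
    }


def assess(high_impact_rate, call_llm):
    """call_llm: hypothetical zero-argument callable returning the agent's raw text."""
    try:
        raw = call_llm()
    except Exception:  # network errors, timeouts, provider outages
        return rule_based_verdict(high_impact_rate)
    return parse_llm_verdict(raw) or rule_based_verdict(high_impact_rate)


if __name__ == "__main__":
    def unreachable():
        raise ConnectionError("LLM endpoint unreachable")

    print(assess(0.12, unreachable)["risk_level"])  # Medium
```

Tagging each verdict with its `source` lets the UI and the feedback loop tell model-written explanations apart from fallback ones.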

            ┌────────────────────────────────────────────────────┐
            │              DAILY PIPELINE (CRON)                  │
            │  CSV exports → normalize → link CR↔incident →       │
            │  embed → upsert Qdrant → run telemetry              │
            └───────────────┬────────────────────────────────────┘
                            │
   ┌────────────┐    ┌──────▼──────┐     ┌──────────────────┐
   │  React UI  │───▶│   FastAPI   │────▶│ Qdrant (2 colls) │
   │  (analyst) │◀───│   backend   │◀────│ changes/incidents│
   └────────────┘    │             │     └──────────────────┘
                     │  risk math  │
                     │ rate·sev·rec│     ┌──────────────────┐
                     │             │────▶│  LLM agent layer │
                     │  fallback   │◀────│  (JSON verdict)  │
                     └──────┬──────┘     └──────────────────┘
                            │
                     ┌──────▼──────┐
                     │ PostgreSQL  │
                     │ history +   │
                     │ feedback    │
                     └─────────────┘
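The daily pipeline box in the diagram records run telemetry (merge counts, vector-creation timing, failures). A sketch of how that might be captured is below. All names are hypothetical: `embed_fn` stands in for the embedding call, `upsert_fn` for the Qdrant upsert, and "merge" is assumed to mean deduplicating tickets by ID, with later exports winning.

```python
import time
from dataclasses import asdict, dataclass, field


@dataclass
class RunTelemetry:
    merged_records: int = 0      # unique tickets after merging all exports
    vectors_created: int = 0     # embeddings produced and handed to upsert_fn
    embed_seconds: float = 0.0   # wall-clock time spent embedding
    failures: list = field(default_factory=list)  # "<id>: <error>" strings


def run_daily(exports, embed_fn, upsert_fn):
    """exports: record lists, oldest first; later rows with the same id win.

    embed_fn and upsert_fn are hypothetical hooks standing in for the
    embedding API call and the vector-store upsert.
    """
    telemetry = RunTelemetry()
    merged = {}
    for export in exports:
        for record in export:
            merged[record["id"]] = record  # newer export overwrites older row
    telemetry.merged_records = len(merged)

    points = []
    start = time.perf_counter()
    for record in merged.values():
        try:
            points.append((record["id"], embed_fn(record["text"])))
        except Exception as exc:  # record the failure and keep processing
            telemetry.failures.append(f"{record['id']}: {exc}")
    telemetry.embed_seconds = time.perf_counter() - start

    if points:
        upsert_fn(points)
    telemetry.vectors_created = len(points)
    return telemetry


if __name__ == "__main__":
    exports = [
        [{"id": "CHG1", "text": "old wording"}],
        [{"id": "CHG1", "text": "patch db"}, {"id": "CHG2", "text": "rotate certs"}],
    ]
    report = run_daily(exports, embed_fn=lambda text: [float(len(text))], upsert_fn=lambda pts: None)
    print(asdict(report))
```

Recording per-ticket failures instead of aborting the run keeps one malformed ticket from blocking the day's refresh, while still making it visible.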

Results

Outcomes.

  • ~50× faster similar-incident retrieval vs. manual search
  • 100% of risk verdicts grounded in cited historical incidents
  • ~40% reduction in change-triage effort per assessment

Stack

What it runs on.

  • Python / FastAPI
  • Qdrant
  • OpenAI embeddings
  • PostgreSQL
  • pandas / scikit-learn
  • React + TypeScript (Vite, Tailwind)
