TELCO + AI

From traceroute to routing loop: what "agentic NOC" actually means

An AI that answers questions is not a NOC engineer. An AI that investigates, forms hypotheses and proposes fixes — with confirmation — is. The difference, told through a real case.

15 May 2026 · 8 min read

In the last two years we have seen a wave of chatbots and "copilots" that answer questions about routers. They work well for the easy ones: "how do I enable a DHCP server on MikroTik?" But a Network Operations Center does not work that way. It does not consult manuals: it investigates. It opens tabs, runs traceroute, checks routes, compares the current state with yesterday's, builds hypotheses, verifies them. Only then does it propose a change.

ARIA, the evoseed product we call AI NOC Engineer, is built exactly like that. It is not a Q&A system pulling answers from documentation. It is an agent that uses tools, reasons in iterations, and follows a strict protocol that separates what it may do for reads from what it may do for writes. Let us show the difference with a real case, one of those we recorded during phase 4 training on real CHR devices.

The scenario: "I don't understand why this link is slow"

The prompt was trivial: "device aria-cpe1 has a slow link to the provider, try to figure out why". It is exactly the request a customer fires at the NOC at 02:30. The difference between a copilot and an agent shows up here. The copilot tells you: "check the routing table, check the firewall, do a traceroute". The agent does it. And keeps going until it has an answer.

What ARIA did, step by step

1. Ran a traceroute towards an external host (1.1.1.1). Direct tool calling, no confirmation request: this is a read-only operation.
2. Read the output. Hop 1 → 10.0.0.1 (internal gateway). Hop 2 → timeout. Hop 3 → 10.0.0.1 again. "I see a loop", it noted in its reasoning chain.
3. Formed a hypothesis: "there might be a dual default route. Let me check active routes." Called the `inspect_device` tool with `/ip route print where dst-address=0.0.0.0/0`.
4. Read the result: two default routes, both via 10.0.0.1, with different distances. Only one should be active.
5. Diagnosed: a redundant route not removed after a change. The second one catches return traffic and bounces it to the gateway, creating the loop.
6. Proposed the fix in plain language before the command: "I propose to disable route .id=*2 (distance=2). Want me to proceed? This is a write operation."
7. Waited for confirmation. Only after "yes, go ahead" did it generate the command `/ip route disable numbers=*2` and execute it.
8. Verified the fix with a new traceroute. No loop. Latency back on target.
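
The two checks behind that diagnosis, a hop that shows up twice and more than one default route, are simple enough to sketch. What follows is an illustration in Python, not ARIA's code: the function names and the parsed record shape are assumptions, and the first route (id `*1`, distance 1) is invented to complete the example, since the case only names route `*2`.

```python
from typing import Optional

def find_repeated_hop(hops: list[Optional[str]]) -> Optional[str]:
    """Return the first address seen twice in a traceroute; None marks a timed-out hop."""
    seen: set[str] = set()
    for hop in hops:
        if hop is None:  # timeout: no address to compare
            continue
        if hop in seen:
            return hop
        seen.add(hop)
    return None

def redundant_default_routes(routes: list[dict]) -> list[dict]:
    """Return every default route except the one with the lowest distance."""
    defaults = [r for r in routes if r["dst"] == "0.0.0.0/0"]
    defaults.sort(key=lambda r: r["distance"])
    return defaults[1:]

# Hops as described in the case: gateway, timeout, gateway again.
hops = ["10.0.0.1", None, "10.0.0.1"]

# Hypothetical parsed output of `/ip route print where dst-address=0.0.0.0/0`.
routes = [
    {"id": "*1", "dst": "0.0.0.0/0", "gateway": "10.0.0.1", "distance": 1},
    {"id": "*2", "dst": "0.0.0.0/0", "gateway": "10.0.0.1", "distance": 2},
]

looping_hop = find_repeated_hop(hops)
candidates = [r["id"] for r in redundant_default_routes(routes)]
print(looping_hop, candidates)
```

A repeated hop is only a hint, which is why the agent treated it as a hypothesis and went on to read the routing table before proposing anything.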

Why "agentic" is not just a buzzword

The word "agent" is abused. Almost every LLM tool in 2026 calls itself an "agent". The concrete difference comes down to three things: the loop, the in-flight context memory, and the safety protocol. In production, these three decide whether the AI actually replaces a tier-1 NOC or is just a demo assistant.

1. The iteration loop

A copilot gives you one answer and stops. An agent executes an action, observes the result, decides the next one, and repeats. ARIA has a loop with a maximum of 5 tool-calling iterations per conversational cycle. Five is enough: if it cannot solve the problem in five steps, it escalates to a human. The bounded loop avoids the classic runaway-agent problem where the model keeps calling tools without converging.
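
A bounded loop of this kind fits in a few lines. This is a minimal sketch, assuming one tool call per iteration; `run_cycle`, `call_tool` and `is_resolved` are hypothetical names, and only the cap of five comes from ARIA.

```python
from typing import Callable

MAX_ITERATIONS = 5  # the cap stated above

def run_cycle(
    call_tool: Callable[[list[str]], str],
    is_resolved: Callable[[list[str]], bool],
    max_iterations: int = MAX_ITERATIONS,
) -> tuple[str, list[str]]:
    """Act, observe, decide, repeat; escalate once the iteration budget is spent."""
    observations: list[str] = []
    for _ in range(max_iterations):
        # One iteration = one tool call plus the observation it produced.
        observations.append(call_tool(observations))
        if is_resolved(observations):
            return "resolved", observations
    # Budget exhausted without converging: hand over to a human.
    return "escalate", observations

# Stand-in callables: the "tool" just labels each step, and the
# problem counts as solved after the third observation.
status, trail = run_cycle(
    call_tool=lambda seen: f"observation {len(seen) + 1}",
    is_resolved=lambda seen: len(seen) == 3,
)
print(status, len(trail))
```

The point of the design is that the exit condition does not depend on the model: even if it never decides it is done, the loop ends and the trail of observations goes to a human.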

2. In-flight context memory

A NOC does not work on single answers: it works on sessions. If you changed a route half an hour ago and traffic now stops, the agent has to connect the two events. ARIA uses an in-memory context cache with a 1-hour TTL, a maximum of 200 concurrent contexts, and LRU eviction. Each conversation keeps the state of queried devices, executed commands and received outputs. Without that memory, every question would start from scratch, which is exactly what happens when a ticket gets handed over between two NOC shifts that don't talk to each other.
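
To make the three parameters concrete, here is a minimal sketch of a cache with a TTL, a size cap and LRU eviction, built on Python's `collections.OrderedDict`. The class and its method names are hypothetical, and it assumes the TTL is counted from the last write; whether ARIA also refreshes it on reads is not something this sketch claims to know.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Optional

class ContextCache:
    """In-memory cache with per-entry TTL and least-recently-used eviction."""

    def __init__(
        self,
        max_entries: int = 200,        # max concurrent contexts
        ttl_seconds: float = 3600.0,   # 1-hour TTL
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max = max_entries
        self._ttl = ttl_seconds
        self._clock = clock
        self._items = OrderedDict()    # key -> (expires_at, value)

    def put(self, key: str, value: Any) -> None:
        self._items[key] = (self._clock() + self._ttl, value)
        self._items.move_to_end(key)         # mark as most recently used
        while len(self._items) > self._max:
            self._items.popitem(last=False)  # evict the least recently used

    def get(self, key: str) -> Optional[Any]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:      # expired: drop it and report a miss
            del self._items[key]
            return None
        self._items.move_to_end(key)         # a read also counts as a use
        return value

cache = ContextCache()
cache.put("conversation-1", {"device": "aria-cpe1", "commands": []})
context = cache.get("conversation-1")
```

The clock is injectable so that expiry can be checked without waiting an hour.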

3. Safety protocol by design

The most important part, and the most underestimated. ARIA classifies every command as readonly or write. The logic is declarative: an `is_readonly_command()` function parses RouterOS syntax and decides. Writes require `confirmed=true`, which is set only after explicit user confirmation. Prompt injection cannot talk the model out of the protocol, because classification happens in code, not in the model.
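
The shape of such a gate can be sketched as follows. Only the names `is_readonly_command` and `confirmed` come from ARIA; the body is a deliberately simplified assumption, not the real classifier. The verb lists are illustrative and incomplete, `execute` and `run` are hypothetical, and anything the function does not positively recognise as a read is treated as a write.

```python
from typing import Callable

# Illustrative and incomplete verb lists (assumption, not ARIA's actual tables).
READ_VERBS = frozenset({"print", "ping", "traceroute"})
WRITE_VERBS = frozenset({"add", "set", "remove", "enable", "disable", "reset"})

def is_readonly_command(command: str) -> bool:
    """Return True only for commands positively recognised as reads."""
    # Chaining and sub-expressions could hide a write, so refuse to call them reads.
    if any(ch in command for ch in ";\n[]{}"):
        return False
    path = []
    for token in command.strip().lstrip("/").split():
        if "=" in token or token == "where":  # parameters and filters start here
            break
        path.append(token)
    has_read = any(token in READ_VERBS for token in path)
    has_write = any(token in WRITE_VERBS for token in path)
    return has_read and not has_write

def execute(command: str, run: Callable[[str], str], confirmed: bool = False) -> str:
    """Run reads directly; refuse writes unless the user has confirmed."""
    if not is_readonly_command(command) and not confirmed:
        raise PermissionError("write command requires explicit user confirmation")
    return run(command)

# `run` stands in for whatever transport actually reaches the device.
echo = lambda cmd: f"sent: {cmd}"
execute("/ip route print where dst-address=0.0.0.0/0", echo)
execute("/ip route disable numbers=*2", echo, confirmed=True)
```

Because the check lives in `execute`, a model that is persuaded to skip the confirmation step still cannot get a write through: the call raises before anything reaches the device.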

What we learned by training ARIA on real CHR devices

ARIA's RAG knowledge base contains 45 structured YAML documents across 5 domains: mikrotik (firewall, vlan, vpn, routing, services), optiwize-syntax (command format, common mistakes), error-patterns (10 RouterOS error families), troubleshooting (slow connection, no internet, wifi issues), response-patterns. That is far less documentation than you might expect. The difference between a RAG that works and one that does not is not quantity: it is how documents are shaped around real cases.

Concrete example: the file `optiwize-syntax/write-commands.yaml` does not explain the abstract syntax of RouterOS commands. It explains the exact errors that happen when you forget the `=` prefix before a parameter, or use a quote where it doesn't belong. It is a manual for avoiding real mistakes, not a reference for the language. The LLM reads it once, and stops making them.

Three things an agent must not do

Don't ask confirmation for reads. If the NOC runs a traceroute, the AI should not stop to ask "may I?". It would be like a junior asking permission before every `ping`. Useless and annoying.
Don't execute writes without a natural-language summary. "I'm about to disable route 2" is different from "I'm about to run `/ip route disable numbers=*2`". The first is readable in half a second. The second forces the human to translate.
Don't fake confidence when there isn't any. If the knowledge base does not have the answer, the agent must say "I don't have references on this, I suggest escalation". Human NOC engineers do this. The AI must too.

What it means for whoever runs a network

If you run a MikroTik network (ISP, WISP, MSP), the math is simple. How many times a week does your NOC open an SSH session to run a diagnostic traceroute or ping? How long between alert arrival and first command issued? How many diagnostic attempts, on average, before you find the fix? ARIA does not replace your senior NOC engineers. It replaces the three hours of preliminary investigation done by tier-1 before escalating to tier-2. That's exactly where the NOC loses time and where MTTR blows up.

In our internal benchmarks, based on synthetic scenarios and real CHR devices, the first-diagnosis time drops from a 47-minute average to under 5 minutes for the cases ARIA covers. This does not mean an 89% reduction in overall MTTR across the board: it means tier-1 is no longer the bottleneck. And that bottleneck, in a NOC with 80,000 devices, is the difference between a quiet night and a night of bounced alerts.

Where we are going

ARIA's current architecture is based on RouterOS commands generated by the LLM and validated at runtime. We are migrating towards a template-driven model: the LLM no longer generates raw commands, but selects templates from a catalogue (firewall, vlan, vpn, routing) and fills typed parameters. Four levels of validation (static schema, runtime, on-device pre-check, post-check on the result) instead of one. It's the same idea we have applied for years to OptiWize for multi-vendor provisioning: LLM freedom where it counts (figuring out what to do), industrial rails where there is no room for error (executing the operation).
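
A rough sketch of what "selects a template and fills typed parameters" could look like, covering only the first of the four levels (static schema). Everything here is hypothetical: the template names, the catalogue keys and the validation rules are illustrations, not ARIA's catalogue.

```python
import ipaddress
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class DisableRoute:
    """Hypothetical template: disable one route by its internal id."""
    route_id: str

    def __post_init__(self) -> None:
        # Internal ids look like *2 or *1A; anything else is rejected.
        if not re.fullmatch(r"\*[0-9A-Fa-f]+", self.route_id):
            raise ValueError(f"invalid route id: {self.route_id!r}")

    def render(self) -> str:
        return f"/ip route disable numbers={self.route_id}"

@dataclass(frozen=True)
class AddStaticRoute:
    """Hypothetical template: add a static route."""
    dst_address: str
    gateway: str
    distance: int = 1

    def __post_init__(self) -> None:
        ipaddress.ip_network(self.dst_address)  # raises ValueError if malformed
        ipaddress.ip_address(self.gateway)      # raises ValueError if malformed
        if not 1 <= self.distance <= 255:
            raise ValueError(f"distance out of range: {self.distance}")

    def render(self) -> str:
        return (
            f"/ip route add dst-address={self.dst_address} "
            f"gateway={self.gateway} distance={self.distance}"
        )

# The model picks a key and supplies parameters; it never writes the command text.
CATALOGUE = {
    "routing.disable_route": DisableRoute,
    "routing.add_static_route": AddStaticRoute,
}

def render_from_selection(name: str, params: dict) -> str:
    """Build the command from a catalogue entry, validating parameters first."""
    return CATALOGUE[name](**params).render()

command = render_from_selection("routing.disable_route", {"route_id": "*2"})
print(command)
```

With this shape, a parameter such as `"*2; /system reset-configuration"` fails validation before any command string exists, which is the practical difference from validating free-form text after the fact.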

Want to see ARIA at work on your own topology? The typical scenario takes 20 minutes: we pick a real case you are currently dealing with, reproduce it on CHR, and show you how it tackles it. If the assumptions hold up, we propose a supervised trial on your network.
