A Framework for LLM Agent-Assisted Network Management with Human-in-the-Loop

Internet-Draft	LLM4Net	July 2026
Cui, et al.	Expires 2 January 2027	[Page]

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶

This Internet-Draft will expire on 2 January 2027.¶

Copyright Notice

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document.¶

1. Introduction

1.1. Motivation

Traditional network automation systems often fail to handle unanticipated scenarios or manage complex, multi-domain data dependencies. Large Language Models (LLMs), when used as agent-assisted components, offer multimodal data comprehension, adaptive reasoning, and broad generalization, making them a candidate technology for network management assistance [TM-IG1230]. This document describes a framework for using LLM-assisted agents as decision-support components in network management workflows.¶

1.2. Why Human-in-the-Loop is Necessary

Human-in-the-loop operation is necessary because network management actions can affect service availability, security posture, customer traffic, and compliance obligations. An LLM-generated recommendation may be syntactically valid but still operationally unsafe if it relies on stale telemetry, misinterprets local policy, affects the wrong service scope, or ignores maintenance windows and business constraints.¶

In addition, many network-management decisions depend on operational knowledge that may not be fully represented in telemetry or retrieved documents, such as planned maintenance, customer exceptions, escalation procedures, and local risk tolerance. Human review therefore provides the point at which evidence, uncertainty, operational impact, and authorization are assessed before a recommendation is applied to the network. This document treats human-in-the-loop control as a necessary part of the safety and accountability model for LLM agent-assisted network management.¶

1.3. Problem Statement

Network management presents persistent operational challenges, including multi-vendor configuration complexity, correlation of heterogeneous telemetry data, and timely response to dynamic security threats. LLM agents offer a potential approach to address these challenges through their data comprehension and reasoning capabilities.¶

However, applying LLM agents in network management raises several research and engineering questions. These include how to provide semantic context for telemetry and configuration state, how to represent confidence and operational risk, how to validate LLM-generated recommendations before execution, and how to preserve auditability through provenance tracking.¶

This document is intended as input for NMRG discussion on AI in network management. It focuses on research challenges and reference components rather than specifying a new network management protocol, a new LLM interface, or fully autonomous network control. Although the framework shows several supporting components, their purpose is to illustrate the human-supervised decision path from context collection to recommendation, validation, operator audit, and controlled execution.¶

1.4. Research Questions

This document motivates discussion of the following research questions:¶

What semantic context is needed for LLM-assisted systems to reason correctly over network telemetry and configuration state?¶
How can confidence, uncertainty, validation results, and operational risk be represented so that operators can make informed decisions?¶
How can human review, approval, modification, and rejection be designed as a meaningful control step rather than a procedural confirmation?¶
What evidence and decision records are needed to make LLM-assisted recommendations and operator actions auditable and reproducible?¶

2. Terminology

2.1. Acronyms and Abbreviations

LLM: Large Language Model¶
RAG: Retrieval-Augmented Generation¶
MCP: Model Context Protocol¶
A2A: Agent-to-Agent Protocol¶
NACM: NETCONF Access Control Model¶

3. Reference Framework

+-------------------------------------------------------------+
|         LLM-Agent Assisted Network Management System        |
+-------------------------------------------------------------+
|+---------------LLM Agent Decision Module-------------------+|
||                                                           ||
||               +----Task Agent Module---+  +-------------+ ||
||               | +---------------------+|  | Task Agent  <-----+
||               | | Tools/Agent Comms   ||<-> Mgt Module  | ||  |
||               | +---------------------+|  +-------------+ ||  |
||               | +------+  +----------+ |  |Syntax Verify| ||  |
||               | |Prompt|  |Fine-Tuned| <->|     Module  | ||  |
||               | | Lib  |  |Weight Lib| |  +-------------+ ||  |
|| +----------+  | +------+  +----------+ |  +--------------+||  |
|| |RAG Module|<-> +--------------------+ |  |Access Control|||  |
|| +-----^----+  | |Foundation Model Lib| -->|    Module    |||  |
||       |       | +--------------------+ |  +-------|------+||  |
||       |       +----^---------------^---+          |       ||  |
|+-------|------------|---------------|--------------|-------+|  |
|+-------v------------v----+ +--------v--------------v-------+|  |
||Enhanced Telemetry Module| |   Operator Audit Module       ||  |
|+-----------^-------------+ +--------------|-------------^--+|  |
+------------|------------------------------|-------------|---+  |
             |                              |       +-----v---+  |
             |                              |       |Operator <--+
             |                              |       +---------+
+------------v------------------------------v------------------+
|               Original Network Management System             |
+------------------------------^-------------------------------+
                               |
+------------------------------v-------------------------------+
|                       Physical Network                       |
+--------------------------------------------------------------+

Figure 1: The LLM agent-Assisted Network Management Framework

Figure 1 illustrates the principal components of the LLM agent-assisted network management framework. The figure is a functional decomposition used to discuss human-supervised decision support, not a complete product architecture. A human operator instantiates a specific task agent (e.g., for fault analysis or topology optimization) via the Task Agent Management Module by specifying a foundation model, a prompt, and optional fine-tuned adapter parameters [Hu22]. The Enhanced Telemetry Module enriches raw telemetry data obtained from the underlying network management system and supplies it to the LLM Agent Decision Module. After decision-making, the generated configuration is validated for syntactic correctness and checked against access control rules. The Operator Audit Module provides a structured mechanism for human review of generated configurations; upon operator approval, configurations are issued to the network management system for deployment.¶

3.1. Enhanced Telemetry Module

The Enhanced Telemetry Module enriches raw telemetry data with semantic context, providing structured input to the LLM Agent Decision Module. Telemetry data retrieved from network devices via NETCONF [RFC6241] (e.g., in XML format) typically lacks field descriptions, structured metadata, and vendor-specific context. Because this supplementary information is not present in the pre-trained knowledge of general-purpose LLMs, its absence can lead to misinterpretation and erroneous reasoning. To address this, an external knowledge base is used to store YANG model schemas, device manuals, and other relevant documentation. The Enhanced Telemetry Module operates as middleware between the network management system and the external knowledge base. Through its southbound interface, it retrieves NETCONF data from the NETCONF client of the existing network management system. Through its northbound interface, it queries the external knowledge base for the corresponding YANG model or device documentation. To improve semantic richness, the module processes retrieved data by simplifying formatted content (e.g., removing redundant or closing XML tags) and appending YANG tree path and field-description information to the relevant data elements. This produces structured, context-enriched input suitable for LLM analysis and reasoning.¶

3.2. LLM Agent Decision Module

3.2.1. RAG Module

A pre-trained LLM may lack knowledge of operator-specific configurations, vendor-specific syntax, or domain-specific operational procedures. Retrieval-Augmented Generation (RAG) [Lewis20] can retrieve relevant information from operator-defined sources, such as device documentation and operational knowledge bases, and combine it with semantically enriched telemetry. For human-in-the-loop review, retrieved evidence needs to be traceable. The identifiers, versions, timestamps, and scope of retrieved documents SHOULD be recorded so that the operator and later audit processes can understand which evidence influenced the recommendation.¶

3.2.2. Task Agent Module

A task agent is created to support a specific network management task, such as traffic analysis, traffic optimization, or fault remediation. A task agent can include a selected foundation model, an associated prompt, and optionally fine-tuned adapter weights. These are deployment choices rather than the primary focus of this document.¶

The framework can use libraries for foundation models, prompts, and fine-tuned adapter weights [Hu22]. From the human-in-the-loop perspective, the important requirement is that the selected model, prompt template, adapter version, input context, and output format are recorded as part of the decision context. This allows operators to understand the source of a recommendation and allows later audit or reproduction of the decision process.¶

3.2.3. Task Agent Communication Module

A task agent may interact with external tools (e.g., Python scripts, network verification tools such as Batfish, or optimization solvers) to acquire additional information or perform specific operations. Emerging agent protocols such as the Model Context Protocol (MCP) [mcp] and Agent-to-Agent Protocol (A2A) [a2a] illustrate possible mechanisms for tool invocation and inter-agent coordination. This document does not require a specific agent protocol. A deployment may use any mechanism that provides authenticated tool access, schema validation for tool inputs and outputs, error propagation, and audit correlation between an LLM-assisted recommendation and the external evidence or tool results used to produce it.¶

In multi-domain or complex scenarios, multiple task agents may collaborate to achieve a shared network management objective. Such collaboration should preserve task context, partial results, constraints, and confidence information so that handoffs remain auditable and operator review remains meaningful.¶

3.2.4. Task Agent Management Module

The Task Agent Management Module is responsible for controlled creation, update, and deletion of task agents. In this framework, its main role is to bind a task agent to an operational objective, permission scope, and audit context.¶

A task agent may be instantiated in response to an operator request, an automated policy trigger, or a higher-level orchestration workflow. Creation includes parsing the high-level intent, selecting an appropriate task template, initializing network context, assigning credentials or permissions, and creating a session identifier for audit correlation.¶

Task agent updates may be needed when network conditions, model versions, prompts, or operator policies change. Updates SHOULD preserve the audit context and SHOULD record changes that may affect the interpretation of later recommendations.¶

When a task agent completes, fails, or is explicitly terminated, its final state, generated actions, tool results, and relevant performance information SHOULD be archived in the audit log. This ensures that operator decisions and LLM-assisted recommendations can be traced after the task has ended.¶

By providing structured, auditable, and policy-governed lifecycle management, the Task Agent Management Module supports operator-supervised LLM-assisted decision support rather than unconstrained autonomous network control.¶

3.3. Config Verification Module

3.3.1. Syntax Validation Module

LLM-generated configurations MUST pass YANG schema validation before being queued for human approval. This module ensures that only syntactically correct configurations are presented for operator review, reducing the likelihood of invalid configurations reaching the deployment stage.¶

3.3.2. Access Control Module

Syntactic correctness alone does not prevent an LLM from generating configurations that would perform unintended or harmful operations on critical network devices. It is therefore necessary to enforce explicit permission boundaries for LLM task agents.¶

The NETCONF Access Control Model (NACM) defined in [RFC8341] provides a framework for specifying access permissions that can be applied to LLM task agents. NACM defines the concepts of users, groups, access operation types, and action types, which are applied as follows:¶

User and Group: Each task agent is registered as a distinct user, representing an entity with defined access permissions for specific devices. A task agent (user) is identified by a unique string within the system. Access control may also be applied at the group level, where a group consists of zero or more members and a task agent may belong to multiple groups.¶
Access Operation Types: These define the types of operations permitted, including create, read, update, delete, and execute. Each task agent is assigned a set of permitted operation types based on its role.¶
Action Types: These specify whether a given operation is permitted or denied, determining whether an LLM-generated operation request is allowed under the configured access control rules.¶
Rule List: Each rule governs access control by specifying the content and operations a task agent is authorized to handle within the system.¶

This module MUST enforce explicit restrictions on the operations an LLM agent is permitted to perform, ensuring that network configurations remain compliant with operational security policies.¶

3.3.3. Feedback Module

LLM-generated configurations may not always satisfy YANG schema constraints, access control rules, or operational requirements. The Feedback Module supplies structured feedback (e.g., in structured text format) and corrective hints to the LLM agent, enabling iterative refinement of generated configurations to meet these constraints.¶

3.4. Operator Audit Module

The Operator Audit Module provides a structured mechanism for human review of LLM-generated configurations prior to deployment. The output of the LLM Decision Module includes both the generated configuration and an associated confidence score. The configuration is validated against the YANG model and subject to access control enforcement. The confidence score (e.g., on a scale of 0 to 100) provides operators with a quantitative reference for assessing the reliability of the recommendation.¶

The purpose of this module is not only to collect an approval action. It is intended to give operators enough context to judge whether the recommendation is consistent with the operational objective, whether the supporting data is complete and fresh, whether the affected network scope is acceptable, and whether additional verification or escalation is needed.¶

Each audit instance MUST record the input context (e.g., input data, RAG query content, model selection, relevant configuration files) and the corresponding decision output. The audit steps include the following:¶

Result Verification: The operator verifies that the LLM-generated output is consistent with operational objectives and policy requirements.¶
Compliance Check: The operator confirms that the output adheres to applicable regulatory standards and operational policies.¶
Security Verification: The operator checks the output for potential security issues, such as misconfigurations or unintended access changes.¶
Correction: If issues are identified, the operator documents the findings and applies corrective modifications.¶

Upon completion of the audit, the system records an audit decision entry to ensure traceability of operator actions. The audit record includes:¶

Timestamp of the audit action¶
LLM Task Agent ID¶
Operator decision (approve, reject, modify, or defer)¶
Final executed command¶
Operation type (e.g., configuration update, deletion, or execution)¶

The Operator Audit Module also provides explainability support to improve transparency in LLM-assisted decision-making. Each LLM-generated configuration includes a structured rationale indicating the key factors that influenced the decision. For example, if the system recommends increasing bandwidth allocation, the decision log indicates whether this was driven by high latency observed in telemetry, an SLA threshold breach, or another contributing factor.¶

The audit process additionally supports counterfactual reasoning, enabling operators to assess the projected outcome if no action is taken. For example, the system may indicate that without intervention, packet loss is expected to increase by a specified percentage within a defined time window. This provides operators with a comparative basis for evaluating proposed actions.¶

If an LLM agent decision is based on incomplete or uncertain data, the system flags it accordingly. For example, if real-time telemetry data is insufficient, the confidence score is lowered and the condition is noted in the audit record, allowing operators to exercise appropriate judgment.¶

4. Research Challenges and Considerations

This section summarizes research challenges raised by the reference framework. The intent is to identify issues for further NMRG discussion rather than to prescribe a single implementation.¶

4.1. Semantic Context and Provenance

Human review depends on context that is both machine-processable and meaningful to operators. Telemetry values, topology information, configuration state, and retrieved documents need scope, timestamp, source, collection method, and provenance metadata so that operators can judge whether the recommendation is based on relevant and current evidence.¶

4.2. Uncertainty and Risk Representation

Confidence scores alone are insufficient for network-management decisions. Research is needed on how to present model uncertainty, input-data uncertainty, validation status, and operational impact as operator-facing risk information.¶

4.3. Human Review Workflow

Human review needs to be designed so that operators can make informed decisions under time pressure. Open questions include what evidence to present, when to require secondary approval, how to avoid automation bias, and how to support operator requests for more evidence.¶

4.4. Accountability and Auditability

LLM-assisted decisions need audit records that connect input evidence, model output, tool results, validation outcomes, operator decisions, and final actions. Such records are needed for incident analysis, rollback, compliance, and research reproducibility, while still protecting sensitive operational data and credentials.¶

5. Use Cases

5.1. DDoS Intelligent Defense

Distributed Denial of Service (DDoS) attacks represent a persistent operational threat. Conventional mitigation systems based on rate-limiting and signature matching may not adapt rapidly enough to generate fine-grained filtering rules in response to multi-dimensional telemetry patterns.¶

This use case illustrates how the LLM agent-assisted framework supports filtering recommendation generation with human oversight. The Enhanced Telemetry Module retrieves and enriches traffic statistics, and the LLM-assisted agent generates a filtering recommendation with a rationale and evidence references.¶

The recommendation is passed through syntax validation and access-control enforcement before being shown to the operator. Human review is needed because the operator may need to judge collateral impact, customer exceptions, threat-intelligence context, and escalation policy. The final operator decision and any executed command are recorded in the audit log.¶

5.2. Traffic Scheduling and Optimization

In large-scale networks, dynamic traffic scheduling is required to respond to fluctuating load, maintain QoS, and satisfy SLA requirements. Static or rule-based methods may not provide sufficient responsiveness or cross-domain visibility.¶

This use case illustrates how the framework supports LLM agent-assisted traffic scheduling with operator control. The Enhanced Telemetry Module collects and enriches link utilization, queue occupancy, delay metrics, topology information, and service-class constraints.¶

The LLM-assisted agent proposes a traffic-engineering adjustment, such as rerouting selected traffic classes through underutilized paths or adjusting policy parameters. The recommendation is validated before operator review. The operator then reviews the recommendation, validation result, expected path shift, policy constraints, and possible impact on other traffic classes before approving, modifying, rejecting, or deferring the action. The final operator decision and any revised configuration are stored in the audit log.¶

6. Security Considerations

This section summarizes the main security issues introduced by LLM-assisted network management. The analysis assumes that LLM-assisted agents can access telemetry, retrieve contextual knowledge, invoke external tools, and generate recommendations or candidate configuration changes subject to human oversight.¶

Security-sensitive assets include network configuration state, telemetry data, external knowledge bases, prompts and system instructions, fine-tuned weights, agent credentials, and human audit records. Compromise of these assets may lead to service disruption, policy violations, data leakage, or unauthorized configuration changes.¶

The framework introduces trust boundaries between the LLM agent and the network management system, between the agent and external toolchains, between cooperating task agents, between retrieval databases and the LLM context, and between automated decision modules and human operators. Each boundary needs explicit protection through authentication, authorization, input validation, integrity protection, and audit logging.¶

Prompt injection and RAG knowledge poisoning are important risks because they can influence operator-facing recommendations. Malicious telemetry fields, compromised documentation, poisoned retrieval databases, or cross-agent messages may affect model output. Mitigations include context separation, structured role tagging, input sanitization, integrity verification of retrieval documents, versioning of knowledge sources, logging of retrieved document identifiers, and deterministic validation of generated configurations.¶

Agent identity and tool access also need protection. Each task agent should be bound to a distinct identity and to explicit permissions, for example through NACM-based access control. Tool invocation should use authenticated access, schema validation for tool inputs and outputs, sandboxing where appropriate, and human confirmation for high-risk tool invocations.¶

The LLM-assisted decision layer can itself become a denial-of-service target. Excessive task instantiation requests, high-frequency telemetry triggers, or multi-agent loops may cause resource exhaustion or delayed incident response. Mitigations include rate limiting, admission control, quotas, maximum reasoning-depth or token limits, and circuit breakers in the Task Agent Management Module.¶

LLMs may generate syntactically correct but semantically invalid configurations, such as referencing non-existent interfaces, misinterpreting vendor-specific syntax, or using incorrect parameter units. Mitigations include YANG schema validation, deterministic configuration simulation, access-control checks, confidence-based escalation thresholds, explicit reasoning logs, and operator review. Human approval MUST remain the final authority for high-impact changes.¶

To support structured oversight, each generated configuration SHOULD be assigned a risk level derived from factors such as scope of impact, operation type, policy sensitivity, confidence score, and historical rollback frequency. Risk classification MUST be included in the audit record.¶

7. IANA Considerations

This document includes no request to IANA.¶

8. References

8.1. Normative References

[RFC6241]: Enns, R., Ed., Bjorklund, M., Ed., Schoenwaelder, J., Ed., and A. Bierman, Ed., "Network Configuration Protocol (NETCONF)", RFC 6241, DOI 10.17487/RFC6241, June 2011, <https://www.rfc-editor.org/rfc/rfc6241>.
[RFC8341]: Bierman, A. and M. Bjorklund, "Network Configuration Access Control Model", STD 91, RFC 8341, DOI 10.17487/RFC8341, March 2018, <https://www.rfc-editor.org/rfc/rfc8341>.

8.2. Informative References

[a2a]: "Announcing the Agent2Agent Protocol (A2A)", July 2025, <https://developers.googleblog.com/en/a2a-a-new-era-of-agent-interoperability/>.
[Hu22]: Hu, E. J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., and W. Chen, "LoRA Low-Rank Adaptation of Large Language Models", n.d..
[Lewis20]: Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W.-t., Rocktäschel, T., and S. Riede, "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks", n.d..
[mcp]: "Model Context Protocol", July 2025, <https://modelcontextprotocol.io/introduction>.
[TM-IG1230]: McDonnell, K., Machwe, A., Milham, D., O’Sullivan, J., Niemöller, J., Varvello, L. F., Devadatta, V., Lei, W., Xu, W., Yuan, X., and Y. Stein, "Autonomous Networks Technical Architecture", February 2023.

module: llm-response-module
  +--rw llm-response
     +--rw config?         string
     +--rw confidence?     uint8
     +--rw error-reason?   enumeration

The LLM response YANG model is structured as follows:¶

module llm-response-module {
  namespace "urn:ietf:params:xml:ns:yang:ietf-nmrg-llmn4et";
  prefix llmresponse;
  container llm-response {
    leaf config {
      type string;
    }
    leaf confidence {
      type uint8;
    }
    leaf error-reason {
      type enumeration {
        enum unsupported-task;
        enum unsupported-vendor;
      }
    }
  }
}

Human Audit Data Model

This data model defines the structure for human audit operations and records. It supports collaborative decision-making by recording LLM-generated actions alongside the operator's final decision.¶

module: human-audit-module
  +--rw human-audit
     +--rw task-id?      string
     +--rw generated-config?   string
     +--rw confidence?         uint8
     +--rw human-actions
        +--rw operator?          string
        +--rw action?            enumeration
        +--rw modified-config?   string
        +--rw timestamp?         yang:date-and-time

The human audit YANG model is structured as follows:¶

module human-audit-module {
  namespace "urn:ietf:params:xml:ns:yang:ietf-nmrg-llmn4et";
  prefix llmaudit;
  import ietf-yang-types { prefix yang; }

  container human-audit {
    leaf task-id {
      type string;
      }
    leaf generated-config {
      type string;
      }
    leaf confidence {
      type uint8;
      }
    container human-actions {
      leaf operator {
        type string;
        }
      leaf action {
        type enumeration {
          enum approve;
          enum modify;
          enum reject;
          }
        }
      leaf modified-config {
        type string;
        }
      leaf timestamp {
        type yang:date-and-time;
        }
    }
  }
}