Technology | eclicktech's Engineering Practices in Agentic AI

As 2026 arrives, generative AI has left the era of ubiquitous demos behind and entered the deep waters of engineering implementation. With the capability boundaries of large models no longer a mystery, the core question facing enterprise architects has shifted: how to tame the inherent "hallucinations" and "forgetfulness" of Agents within complex enterprise architectures that demand extremely high determinism, so that probabilistic intelligence runs stably on deterministic production systems?

At the 2026 QCon Global Software Development Conference in Beijing, He Yuhang, Director of Middle Platform R&D at eclicktech, shared his engineering practices and insights in the field of Agentic AI. Relying on enterprise-grade Context Engineering and a deep defense-in-depth security system, he detailed how eclicktech securely embeds probabilistic AI into the capillaries of its global business.

01 Underlying Foundation: A Deterministic Architecture of Multi-Cloud Symbiosis

The stable operation of Agents cannot be separated from a solid infrastructure. eclicktech's core business covers more than 230 countries and regions worldwide. Facing extremely high compliance and network connectivity challenges, its underlying Cycor platform established a multi-cloud strategy from the very beginning of its design.

Currently, the platform has achieved seamless access and unified resource scheduling for major cloud vendors including AWS, GCP, Alibaba Cloud, Tencent Cloud and HUAWEI Cloud, actually managing a large number of K8s clusters and underlying components, forming a cross-cloud, cross-region unified control plane.

In the construction logic of Agents, this multi-cloud symbiotic architecture that does not rely on a single cloud base has extremely high strategic value: it not only fundamentally avoids the risk of vendor lock-in, but more importantly, enables dynamic balancing between cost, effectiveness and controllability when scheduling large model computing power. For the underlying O&M team with relatively lean manpower, this highly unified multi-cloud scheduling capability is the physical prerequisite for building DevOps Agents later.

02 Technical Breakthrough: From Prompt Orchestration to Context Engineering

In the early stage of Agent exploration (the V1 phase), the R&D team built linear workflow orchestration on a low-code platform: a pre-classifier, driven by the System Prompt, routed requests to different fixed Agents. After three months of operation, the architecture's pain points were fully exposed:

First, the classifier, heavily dependent on system prompts, was extremely unstable, with a classification error rate hovering around 15% for a long time; the team repeatedly fell into the dilemma of "fixing scenario A but breaking scenario B". Second, memory was confined to a single context window with no cross-session persistence, so the same fault was reasoned about from scratch in every new session. Finally, fixed orchestration left each Agent operating in isolation, unable to collaborate on cross-domain link problems.

Therefore, the technical team decisively abandoned linear orchestration and turned to an autonomous agent architecture based on Agent Loop (up to 15 rounds of tool call loops within a single turn of dialogue), and completely shifted the engineering focus from "how to phrase (Prompt)" to "what information to give at each step (Context)" — Context Engineering.

Simply put, Context Engineering solves three intertwined fundamental problems: how to get the required information in, how to block out irrelevant information, and how to spend the precious Token budget where it matters most. Around these three main lines, eclicktech has built a complete engineering system covering "layered memory + active injection + budget governance + compression and continuation".

① Building a Six-Layer Context System

To enable Agents to both remember key clues and effectively filter noise in long-term tasks, the system designed a dynamic information pipeline that refines context into six levels:

  • L1 Session Memory (Current Session): Based on standard PostgreSQL tables, hard-isolated by session_id, supporting millisecond-level read/write and automatic cleanup for the current session.

  • L2 Short-Term Memory: Maintains a 24-hour cross-session time window for identifying recurring faults in the short term.

  • L3 Long-Term Memory (Persistent Facts): Introduces a memory engine and vector storage to extract high-value dialogues into objective facts and persist them, cooperating with Agentic Search for semantic retrieval and conflict merging.

  • L4 Knowledge Graph (Entity Relationships): Triples extracted by LLMs are stored in a graph database, helping Agents establish topological awareness of resources in complex microservice networks.

  • L5 Experience (Personal Experience Library): The system automatically clusters high-frequency fault patterns and extracts experience tags such as "check limits first when encountering OOM", which are automatically injected in similar error scenarios.

  • L6 Skill (Organizational Skill Manual): Standardized Markdown manuals solidified from manually verified experience and distilled into organization-level Skill assets, completing the leap from "personal experience" to "team assets".
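The six layers above can be pictured as a single lookup pipeline that an Agent queries before each turn. The sketch below is a minimal illustration of that idea; the class names, the substring-match retrieval, and the layer internals are all assumptions standing in for the real SQL, vector, and graph backends described in the talk.

```python
from dataclasses import dataclass, field

# Illustrative only: MemoryLayer/SixLayerContext are not eclicktech's actual API.

@dataclass
class MemoryLayer:
    name: str
    entries: list = field(default_factory=list)

    def search(self, query: str) -> list:
        # Substring match stands in for the real per-layer retrieval
        # (PostgreSQL rows, vector similarity, graph traversal, ...).
        return [e for e in self.entries if query.lower() in e.lower()]

class SixLayerContext:
    """L1 session memory through L6 organizational skills."""
    def __init__(self):
        self.layers = [
            MemoryLayer("L1-session"),     # per-session PostgreSQL rows
            MemoryLayer("L2-short-term"),  # 24h cross-session window
            MemoryLayer("L3-long-term"),   # persisted facts + vector search
            MemoryLayer("L4-graph"),       # entity/relationship triples
            MemoryLayer("L5-experience"),  # clustered fault patterns
            MemoryLayer("L6-skill"),       # curated Markdown manuals
        ]

    def gather(self, query: str) -> dict:
        # Collect hits per layer; the injection stage decides how much to use.
        return {l.name: l.search(query) for l in self.layers if l.search(query)}

ctx = SixLayerContext()
ctx.layers[4].entries.append("check limits first when encountering OOM")
print(ctx.gather("OOM"))
# → {'L5-experience': ['check limits first when encountering OOM']}
```

The point of the structure is that each layer has a different lifetime and retrieval cost, so the caller can query them independently and budget what it injects.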

② Active Injection: Let Agents "Know Exactly When They Need It"

Just storing information is far from enough. The traditional "Agent retrieves on demand" model has a fundamental flaw — the model does not know what it does not know, and cannot actively retrieve a fault record it has never heard of. To address this, eclicktech drew on the hook-based active push approach and built three types of retrieval hooks at key nodes in the Agent lifecycle:

  • UserMessage Hook: Before a user's question enters the Agent Loop, perform intent filtering and dual-path recall of keywords/semantics, and inject relevant memories into the System Prompt in layers.

  • PreToolUse Hook: Before sensitive tool calls such as writing files or modifying configurations, match historical change records and known risks by precise resource ID to prevent Agents from repeating mistakes.

  • ErrorSignal Hook: Once error keywords (timeout, OOM, ImagePullBackOff, etc.) are detected, automatically pull historical solutions by bugs/errors dimensions and inject them in layers.

This mechanism upgrades "memory" from a passive database to an active co-pilot — the relevant context is already quietly in place before the Agent actually needs a piece of knowledge.
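The hook mechanism can be sketched as a small event registry: handlers subscribe to lifecycle events and return context to inject. This is a toy illustration under assumed names (the event strings, payload fields, and matching logic are invented for the example, not eclicktech's real interfaces).

```python
from collections import defaultdict

# Hypothetical hook registry; event names and payload shapes are assumptions.
HOOKS = defaultdict(list)

def on(event):
    def register(fn):
        HOOKS[event].append(fn)
        return fn
    return register

def fire(event, payload):
    # Run every handler for the event, collecting context snippets to inject.
    injected = []
    for fn in HOOKS[event]:
        injected.extend(fn(payload))
    return injected

@on("user_message")
def recall_memories(payload):
    # Dual-path (keyword + semantic) recall would run here before the Agent Loop.
    return [f"memory hit for: {payload['text']}"] if "timeout" in payload["text"] else []

@on("pre_tool_use")
def match_change_history(payload):
    # Match prior changes and known risks by exact resource ID before risky writes.
    return [f"known risk on {payload['resource_id']}"] if payload["resource_id"] == "deploy/api" else []

@on("error_signal")
def pull_solutions(payload):
    KNOWN = {"OOM": "raise memory limit / check for leak"}
    return [KNOWN[k] for k in KNOWN if k in payload["error"]]

print(fire("error_signal", {"error": "container killed: OOM"}))
# → ['raise memory limit / check for leak']
```

The design choice worth noting is that the hooks push context *before* the model acts, sidestepping the "model does not know what it does not know" problem that pure on-demand retrieval suffers from.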

③ Token Budget Governance: Progressive Injection and Layered Content

The context window is the scarcest resource in the Agent era. Experience shows that crudely inserting 3 pieces of knowledge at 500 tokens each consumes about 10% of the available window, which not only crowds out reasoning space but also amplifies the Lost in the Middle effect. To address this, eclicktech built a layered Token budget governance system:

  • Three-Tier Content Stratification: Each piece of knowledge is pre-generated into three "resolutions" — L0 Abstract (a one-sentence summary of about 100 tokens), L1 Overview (detailed key points of about 300 tokens), and L2 Full (the complete Markdown text).

  • Dynamic Tiering by Relevance: After a retrieval hit, inject L1 when the relevance score is > 0.8 and downgrade to L0 when it is ≤ 0.8; expand to L2 only when the user or Agent actively reads it. Each injection is kept within a tight window of 100–300 tokens.

  • Short Session Pass-Through, Long Session Sampling: When the total character count of the entire session is within the budget, pass through without compression for zero information loss. Once over budget, prioritize truncating single assistantText instead of discarding entire question-answer pairs to preserve the integrity of the reasoning chain.

  • Hard Budget + Soft Degradation: Each link has clear performance budgets (e.g., UserMessage injection completed within 3 seconds, PreToolUse injection within 100 milliseconds). Timeout triggers the degradation path — better to inject less than to block the main process.

With this combination of measures, the Token consumption per injection decreased by about 80%, while information integrity was better guaranteed because the full L2 content was always "one click away".
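The tiering rule (L1 above a 0.8 relevance score, L0 below it, soft degradation when the budget runs out) reduces to a few lines of selection logic. The sketch below is an assumed implementation; the 4-characters-per-token estimate is a rough stand-in for a real tokenizer, and the field names are invented.

```python
# Pick a pre-generated "resolution" (L0 abstract vs L1 overview) per retrieval hit.
def select_tier(hit, budget_left):
    # hit = {"score": float, "l0": str, "l1": str, "l2": str} (illustrative shape)
    text = hit["l1"] if hit["score"] > 0.8 else hit["l0"]
    cost = len(text) // 4  # rough token estimate; a real system uses a tokenizer
    if cost > budget_left:
        return None, 0      # soft degradation: better to inject less than to block
    return text, cost

hits = [
    {"score": 0.92, "l0": "OOM summary", "l1": "OOM: check limits, then leaks", "l2": "..."},
    {"score": 0.55, "l0": "DNS flake summary", "l1": "...", "l2": "..."},
]
budget = 300
injected = []
for h in sorted(hits, key=lambda x: -x["score"]):  # highest relevance first
    text, cost = select_tier(h, budget)
    if text:
        injected.append(text)
        budget -= cost
print(injected)
# → ['OOM: check limits, then leaks', 'DNS flake summary']
```

Sorting by relevance before spending the budget means that when degradation does kick in, it is always the weakest hits that get dropped.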

④ Progressive Tool Loading to Break the Token Bottleneck

In real K8s O&M scenarios, Agents face dozens or even hundreds of optional tools. If all Tool Schemas are stuffed into the Prompt at once, it not only wastes Tokens but also triggers the large model's Lost in the Middle effect, leading to disordered tool selection.

For this, eclicktech designed the Deferred Tool Registry mechanism: only core tools such as list_pods are activated in the initial state, and the rest of the long-tail tools only retain minimal descriptions in the Prompt. When the model reasoning requires it, the corresponding tools are dynamically awakened and loaded on demand through the internal tool_search capability.
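A deferred registry of this kind can be sketched as two pools of tools, with `tool_search` promoting entries from one to the other. Everything below is an assumed illustration: the class, the schema shapes, and the substring-based search are placeholders for whatever eclicktech's internal `tool_search` actually does.

```python
# Sketch of a Deferred Tool Registry: only core tools expose full schemas in the
# prompt; long-tail tools stay as one-line stubs until tool_search activates them.
class ToolRegistry:
    def __init__(self):
        self.core, self.deferred, self.active = {}, {}, {}

    def register(self, name, schema, summary, core=False):
        (self.core if core else self.deferred)[name] = (schema, summary)

    def prompt_view(self):
        # What the model sees: full schemas for core + activated tools,
        # minimal one-line stubs for everything still deferred.
        full = {n: s for n, (s, _) in {**self.core, **self.active}.items()}
        stubs = {n: summary for n, (_, summary) in self.deferred.items()}
        return full, stubs

    def tool_search(self, query):
        # Wake up deferred tools whose summaries match the model's query.
        for name in list(self.deferred):
            if query in self.deferred[name][1]:
                self.active[name] = self.deferred.pop(name)

reg = ToolRegistry()
reg.register("list_pods", {"args": ["namespace"]}, "list pods", core=True)
reg.register("scale_deploy", {"args": ["name", "replicas"]}, "scale a deployment")
reg.tool_search("scale")
full, stubs = reg.prompt_view()
print(sorted(full))
# → ['list_pods', 'scale_deploy']
```

The key property is that the prompt cost of the long tail is a one-line stub per tool until the model actually asks for it, which is what breaks the Token bottleneck.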

This engineering measure brought a leapfrog improvement in effectiveness: the tool call accuracy increased from about 70% in V1 to about 90% in V2. Since the memory layer can directly hit solved fault patterns, the processing time for repetitive problems was shortened from the original 60-second level to less than 5 seconds, achieving an order-of-magnitude improvement in response cycle.

⑤ Compression and Continuation: Preventing Long Tasks from "Forgetting"

Even with strict budget governance, long-link troubleshooting in real O&M scenarios can still approach the context limit. When the window nears the threshold, the system triggers the PreCompact hook: it compresses the existing dialogue into a structured "Problem-Action-Observation-Conclusion" summary, producing a three-part session digest of {overview, steps, todos}. In the next round, this digest is injected as the Warm layer (summaries of the last 10 sessions, FIFO eviction). As a result, Agents remember what was done last time and which TODOs remain open, even across multi-stage tasks spanning several hours, leaving behind the V1-era dilemma of "forgetting once the dialogue closes".
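The trigger-and-summarize step can be sketched as follows. This is a schematic stand-in: the real PreCompact hook presumably asks an LLM to write the summary, whereas here simple string truncation plays that role, and the window size, threshold, and field extraction rules are all assumed values.

```python
# Sketch of a PreCompact hook: when estimated usage crosses a threshold,
# compress the dialogue into a {overview, steps, todos} digest.
def precompact(turns, window_limit=8000, threshold=0.8):
    used = sum(len(t["text"]) // 4 for t in turns)  # rough token estimate
    if used < threshold * window_limit:
        return None  # still under threshold: no compaction needed
    return {
        "overview": turns[0]["text"][:100],  # what the task was about
        "steps": [t["text"][:60] for t in turns if t["role"] == "assistant"],
        "todos": [t["text"] for t in turns if "TODO" in t["text"]],
    }

summary = precompact([
    {"role": "user", "text": "api pods crashloop " * 2000},
    {"role": "assistant", "text": "TODO: raise memory limit"},
])
print(summary["todos"])
# → ['TODO: raise memory limit']
```

In the article's design, the resulting digest would then sit in the Warm layer alongside the summaries of the previous sessions, evicted FIFO after 10 entries.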

It can be said that if L1–L6 answer where to put the information, and active injection solves when to send it in, then Token budget governance and compression and continuation answer how to spend every Token on the most valuable place within the limited window — these three things together constitute eclicktech's true "Context Engineering".

03 Security Bottom Line: Five Lines of Defense to Ensure Controllability

Handing decision-making in production environments with extremely high determinism requirements (such as operating massive K8s clusters) to probabilistic large models makes security a non-negotiable bottom line. In eclicktech's governance philosophy, "AI is an accelerator, not a brake"; but an accelerator must still run on a track with guardrails.

To that end, the system places five layers of structured defense-in-depth gates along the Agent operation link. Only one of these layers allows large models (LLMs) to participate in decision-making; the remaining four are enforced entirely by deterministic, rule-based code:

  1. Whitelist Access Control (NamespaceGuard): Directly blocks the visibility and operation permissions of LLMs to core namespaces such as kube-system at the middleware code level, isolating risks at the source.

  2. Dry Run + Human-in-the-Loop (HITL): O&M instructions generated by LLMs first undergo dry-run verification, and sensitive operations forcefully trigger the manual approval workflow (this is the only layer where LLMs participate in verification judgment).

  3. Resource Locks and Blast Radius Limitation: Hardcodes the resource quota and impact scope of a single operation through code to prevent cascading avalanches.

  4. Rule Verification (Do Not Trust LLMs): After instruction execution, the system refuses to rely on the LLM's natural language response, but re-calls the system interface through code to compare whether the actual state meets expectations.

  5. Mandatory Rollback Mechanism: The system requires all modification tools to be registered with degradation and rollback logic, enabling one-click return to a safe state in case of exceptions.

Through this defense mechanism, eclicktech has reduced the misexecution rate of complex cluster operations to nearly zero, achieving a stable balance between efficiency and security.
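The shape of layers 1, 4, and 5 (the pure-code gates) can be illustrated in a few lines. This is a hedged sketch, not eclicktech's implementation: `namespace_guard`, the callback signatures, and the protected-namespace set are all invented for the example, and the dry-run/HITL and blast-radius layers are represented only by a comment.

```python
# Rule-based gates wrapped around an LLM-proposed operation (illustrative names).
PROTECTED = {"kube-system", "kube-public"}

def namespace_guard(op):
    # Layer 1: hard whitelist in code; protected namespaces are invisible to the LLM.
    if op["namespace"] in PROTECTED:
        raise PermissionError(f"namespace {op['namespace']} is off-limits")

def execute_with_gates(op, apply_fn, read_fn, rollback_fn):
    namespace_guard(op)            # layer 1: whitelist access control
    # layers 2 (dry run + HITL) and 3 (blast radius limits) would run here
    apply_fn(op)                   # perform the change
    actual = read_fn(op)           # layer 4: re-read real system state in code,
    if actual != op["desired"]:    #          never trust the LLM's own report
        rollback_fn(op)            # layer 5: mandatory registered rollback
        raise RuntimeError("post-check failed, rolled back")
    return actual

# Toy usage against an in-memory "cluster" state:
state = {"replicas": 1}
op = {"namespace": "prod", "desired": 3}
execute_with_gates(op,
                   apply_fn=lambda o: state.update(replicas=o["desired"]),
                   read_fn=lambda o: state["replicas"],
                   rollback_fn=lambda o: state.update(replicas=1))
print(state["replicas"])
# → 3
```

The structural point matches the article's claim: four of the five gates are plain code paths that an LLM cannot talk its way around; only the dry-run/HITL layer involves model judgment at all.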

04 Future Insights: Productivity Restructuring and the New Identity of Developers

Driven by engineering implementation, the positioning of AI has undergone a qualitative change. As stated in its internal practice summary: in the 2026 era of AI coding, the working posture of developers will be completely restructured — "AI is responsible for execution, and humans are responsible for Taste (aesthetics and logical judgment)".

At present, Agents at eclicktech have long broken through the boundaries of the underlying O&M laboratory and deeply penetrated into the capillaries of the business as "digital companions". Relying on a solid AI middle platform architecture, nearly a hundred Agents with different functions are actively running within the company, covering multiple dimensions including marketing business (strategy insight, automated delivery), internal operations (BI analysis, approval collaboration), technical O&M, and customer service.

As large model bases gradually converge, the real technical barriers will be built on three things: the depth of an enterprise's understanding of Context Engineering, the control over multi-cloud architecture, and the ability to precipitate organizational experience into executable Skills. eclicktech's engineering practices once again prove that in this wave of productivity restructuring, harnessing the uncertainty of AI with the certainty of engineering is the only way for enterprises to move toward intelligence.
