
LLMOps Infrastructure & Security | 2026-05-11

May 11, 2026

🔥 Story of the Day

The attack surface moved inside the agent. So did Arcjet. (https://thenewstack.io/arcjet-wafs-guards-ai-agents-security/) — The New Stack

Arcjet's Guards capability tackles a critical blind spot in AI application security: malicious activity originating deep inside an agent's execution flow. Traditional security mechanisms like WAFs and proxies are inherently perimeter-based, only inspecting traffic that passes through a defined network boundary. This assumption fails when an agent processes untrusted input through internal functions, queue handlers, or complex workflow steps—paths that never materialize as a standard HTTP request.

This shift in attack surface requires a change in defense posture. Instead of just inspecting incoming payload signatures, security enforcement must become context-aware and internal to the execution logic. Arcjet’s approach focuses on embedding policy enforcement points directly into the agent runtime environment, effectively treating the internal logic calls as points requiring inspection, regardless of their network origin.

For building robust, secure ML infrastructure, this is significant because it demands a shift from network-security thinking to compute-graph security thinking. A WAF cannot detect malicious content exfiltrated through a function argument passed to an agent, because that data never reaches the request-body inspection layer; this exemplifies the operational gap.
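The in-process enforcement idea can be sketched as a guard that wraps internal agent functions and inspects their arguments at the call boundary, regardless of whether the input ever arrived over HTTP. This is a minimal illustrative pattern, not Arcjet's actual API; the pattern list and function names are assumptions.

```python
import functools
import re

# Illustrative deny-pattern; a real guard would use richer, context-aware policies.
SUSPICIOUS = re.compile(r"(ignore previous instructions|BEGIN PRIVATE KEY)", re.I)

class PolicyViolation(Exception):
    """Raised when an internal call violates the guard policy."""

def guard(fn):
    """Inspect string arguments of an internal function call and block
    matches, independent of any network perimeter."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        for value in list(args) + list(kwargs.values()):
            if isinstance(value, str) and SUSPICIOUS.search(value):
                raise PolicyViolation(f"blocked call to {fn.__name__}")
        return fn(*args, **kwargs)
    return wrapper

@guard
def summarize(document: str) -> str:
    # Stand-in for an LLM call deep inside an agent workflow.
    return document[:40]
```

The key design point is that enforcement travels with the function, so a queue handler or workflow step invoking `summarize` gets the same inspection as an HTTP entry point.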

⚡ Quick Hits

Running local models on an M4 with 24GB memory (https://jola.dev/posts/running-local-models-on-m4) — Hacker News - Best

Running LLMs locally on Apple Silicon M4 hardware demonstrates the feasibility of shifting inference entirely to on-device accelerators. This improves data privacy and reduces operating costs by removing the dependency on external cloud inference APIs.
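A quick back-of-envelope check clarifies what fits in 24 GB of unified memory. The numbers below (bits per weight, headroom for OS, KV cache, and activations) are assumptions for illustration, not figures from the article.

```python
def model_weight_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GB: 1e9 params * (bits/8) bytes."""
    return params_billions * bits_per_weight / 8

def fits_in_memory(params_billions: float, bits_per_weight: float,
                   memory_gb: float, headroom_gb: float = 6.0) -> bool:
    """Leave assumed headroom for the OS, KV cache, and activations."""
    return model_weight_gb(params_billions, bits_per_weight) <= memory_gb - headroom_gb

# An 8B model quantized to 4-bit needs ~4 GB of weights -> comfortable on 24 GB.
# A 70B model at 4-bit needs ~35 GB -> does not fit.
```

This is why 4-bit quantization is the default recommendation for consumer Apple Silicon: it roughly quarters the footprint versus 16-bit weights.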

Local LLM Speed Calculator (https://martinalderson.com/posts/local-llm-speed-calculator/) — Hacker News - LLM

This calculator estimates LLM inference speed by factoring in model size, quantization level (e.g., 4-bit), and target GPU memory bandwidth. This allows for proactive capacity planning and realistic throughput benchmarking on self-hosted LLM deployments without needing to wait for vendor performance metrics.
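The core of such an estimate is the memory-bound decode model: generating each token streams every weight through memory once, so token throughput is roughly bandwidth divided by model size. A minimal sketch, with an assumed bandwidth figure rather than the calculator's actual formula:

```python
def decode_tokens_per_sec(bandwidth_gbps: float, params_billions: float,
                          bits_per_weight: float) -> float:
    """Upper-bound decode speed for a memory-bound LLM: each token requires
    reading all weights once, so speed ~= bandwidth / weight size."""
    weight_gb = params_billions * bits_per_weight / 8
    return bandwidth_gbps / weight_gb

# e.g. ~120 GB/s bandwidth (assumed), 8B model at 4-bit (~4 GB of weights):
# roughly 30 tokens/sec as a ceiling, before overheads.
```

Real throughput lands below this ceiling once KV-cache reads, attention compute, and framework overhead are included, which is why a calculator that folds these in is useful for capacity planning.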

Agent VCR – Time-travel debugging for LLM agents (rewind, edit state, resume) (https://github.com/ixchio/agent-vcr) — Hacker News - LLM

agent-vcr captures the complete state, prompts, and outputs of complex agentic interactions, enabling recording and deterministic replay. This is essential for debugging agent logic and verifying reproducible behavior in production MLOps pipelines.
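The record/replay idea resembles the "cassette" pattern from HTTP-mocking tools: record each model call's prompt and output in order, then replay outputs deterministically while checking the agent has not diverged. This is a generic sketch of the pattern, not agent-vcr's actual API.

```python
class Cassette:
    """Record LLM interactions, then replay them deterministically."""

    def __init__(self):
        self.mode = "record"
        self.tape = []      # ordered {"prompt", "output"} entries
        self.cursor = 0

    def call(self, llm_fn, prompt: str) -> str:
        if self.mode == "record":
            out = llm_fn(prompt)
            self.tape.append({"prompt": prompt, "output": out})
            return out
        # Replay: serve the recorded output and verify the agent issued
        # the same prompt in the same order as during recording.
        entry = self.tape[self.cursor]
        self.cursor += 1
        if entry["prompt"] != prompt:
            raise RuntimeError("agent diverged from recording")
        return entry["output"]

def fake_llm(prompt: str) -> str:
    # Stand-in for a nondeterministic model call.
    return f"answer:{prompt}"
```

Rewinding is then just resetting `cursor` to an earlier point; editing state between rewind and resume is what enables the "time-travel" debugging workflow the project describes.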

Why 157,000 developers are hedging against Anthropic with OpenCode (https://thenewstack.io/anthropic-claudecode-opencode-split/) — The New Stack

Anthropic is increasing rate limits and announcing public beta features such as "Outcomes" for Managed Agents. At the same time, the article notes a trend toward vendor-locked, managed agent execution, citing measures such as OAuth lockouts that gate platform access.

How to get engineering time back from Kubernetes upgrades (https://www.cncf.io/blog/2026/05/11/how-to-get-engineering-time-back-from-kubernetes-upgrades/) — CNCF Blog

Maintaining large-scale Kubernetes deployments incurs substantial operational overhead from API deprecations and open-source drift. A minor cross-region EKS upgrade was reported to consume four to six weeks of engineering effort, diverting focus from product features.


Researcher: gemma4:e4b • Writer: gemma4:e4b • Editor: gemma4:e4b