
In 2025, MEV revenue on Solana reached $720.1 million, overtaking priority fees for the first time as the largest component of the network's real economic value. Over 3 billion Jito bundles were processed in the same period, generating 3.75 million SOL in tips. MEV extraction is now the dominant economic layer of Solana trading.
The conversation around AI agents in crypto often focuses on language models, autonomous decision-making, and strategy design. Those things matter. But in the context of Solana HFT, none of them matter if the execution pipeline underneath isn't fast enough to act on the decisions those models make.
This article covers how AI HFT agents are actually built on Solana in 2026—the architecture, the data layer, the execution path from signal to block inclusion, and where latency appears at each stage. We also cover why most agents fail, what dedicated infrastructure changes, and what the performance numbers look like in practice.
What high-frequency trading means on Solana
High-frequency trading in traditional finance operates on microsecond or nanosecond timescales, colocated at exchange matching engines, using FPGA hardware and kernel-bypass networking. On Solana, the timescales are different but the competitive dynamic is identical: whoever gets there first captures the value.
Solana's defining constraint for HFT is the 400ms slot. Every block closes in approximately 400 milliseconds. An opportunity that appears at the start of a slot—a mispriced pool, an undercollateralized loan, a large swap creating an arbitrage gap—exists until the slot closes or a competing transaction corrects it. The HFT agent's goal is to detect, compute, and land a transaction within that window, ahead of all competing agents.
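The budget arithmetic is simple enough to sketch. The TypeScript fragment below (a prototyping sketch with illustrative stage names and figures, not measured values) checks whether a pipeline's stage latencies sum to less than the slot:

```typescript
// Check whether a pipeline's stage latencies fit inside Solana's ~400ms slot.
// Stage names and numbers below are illustrative, not measured values.
const SLOT_MS = 400;

interface StageLatency {
  stage: string;
  ms: number;
}

function fitsInSlot(stages: StageLatency[], slotMs: number = SLOT_MS): boolean {
  const total = stages.reduce((sum, s) => sum + s.ms, 0);
  return total < slotMs;
}

// Example budget for a colocated agent (illustrative figures):
const budget: StageLatency[] = [
  { stage: "signal (gRPC push)", ms: 40 },
  { stage: "route computation", ms: 1 },
  { stage: "construction + signing", ms: 5 },
  { stage: "bundle submission", ms: 30 },
];
```

The point of writing the budget down explicitly is that every stage added later (an extra RPC round trip, a slower signer) has to fit inside the same fixed window.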
The competitive pressure is severe. According to RPC Fast's internal benchmarks, over 70% of sniper bots target sub-50ms RPC latency for transaction sends, yet only about 10% deliver steady profits—because infrastructure kills their edge before strategy gets a chance.
Solana's slot architecture means that even a 50ms latency advantage can be the difference between first and last in the same block. Unlike traditional HFT where microseconds matter for queue position, on Solana the unit that matters is the slot—400ms—and being inside or outside it is binary.
What a Solana HFT AI agent is
A Solana HFT AI agent is an autonomous system that observes on-chain state, applies decision logic to identify profitable opportunities, constructs transactions, and submits them—all within a sub-slot timeframe, without human intervention.
The "AI" component can mean several different things depending on the strategy:
- Rule-based ML models: trained classifiers that score incoming account updates for opportunity probability. Fast inference, predictable behavior, easy to tune.
- Reinforcement learning agents: models that learn optimal bidding and timing behavior through simulated and live feedback. Used for tip calibration and strategy parameter optimization.
- LLM-based orchestration: language model reasoning for higher-level strategy decisions—position sizing, risk exposure, multi-step trade planning. Runs on a slower loop than the execution layer.
- Hybrid architectures: the most common production setup. An ML model handles microsecond signal classification; an LLM handles strategy-level decisions on a slower cycle; a deterministic execution engine handles transaction construction and submission.
The distinction matters because different components have different latency requirements. The signal detection and execution engine must operate in sub-10ms. The LLM reasoning layer can operate on a 1–5 second cycle without affecting execution speed, as long as the outputs feed a fast execution pipeline.
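One common way to wire the slow and fast loops together is a snapshot swap: the slow loop publishes a fresh parameter object, and the fast path only ever reads the current reference, so it never blocks on strategy reasoning. A minimal TypeScript prototype of this decoupling (field names are illustrative):

```typescript
// Decouple a slow strategy loop from a fast execution loop via snapshot swap.
// The slow loop replaces the snapshot object wholesale; the fast loop only
// reads the current reference, so it sees old or new parameters, never a mix.
interface StrategyParams {
  maxPositionLamports: number;
  minProfitBps: number;
  tipFraction: number; // share of estimated profit paid as Jito tip
}

class ParamStore {
  private current: StrategyParams;
  constructor(initial: StrategyParams) {
    this.current = initial;
  }
  // Called by the slow (LLM / strategy) loop, e.g. every few seconds.
  publish(next: StrategyParams): void {
    this.current = next; // single reference swap, no locking needed
  }
  // Called by the fast execution path on every signal.
  read(): StrategyParams {
    return this.current;
  }
}
```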
High-level architecture of a Solana AI HFT agent
A production Solana HFT AI agent consists of five interconnected components: data ingestion, signal processing, decision logic, transaction construction, and bundle submission. Each has specific performance requirements, and a failure in any one of them breaks the whole system.
The data ingestion and signal processing layers run continuously. The decision and construction layers activate only when the signal layer flags an opportunity. The submission layer fires immediately after construction completes.
Most production agents separate the data ingestion component from the execution component—running them on different threads or even different processes—to prevent I/O overhead from blocking execution logic.
End-to-end execution path: from data to block inclusion
Understanding the full execution path reveals where time is spent and where optimizations have the most impact. Here is the complete flow from on-chain event to confirmed transaction:
Step 1: shred arrival via ShredStream
The execution path begins before a transaction confirms. Jito ShredStream delivers block data at the shred level—raw fragments of transactions produced by the slot leader before they assemble into a full block. An agent connected to ShredStream sees incoming swap activity 50–100ms earlier than an agent waiting on Yellowstone gRPC account updates. On a 400ms slot, that head start represents 12–25% of the total window.
Step 2: account state update via Yellowstone gRPC
For strategies that react to confirmed state changes rather than pending transactions, Yellowstone gRPC provides push-based account updates directly from validator memory. Latency from state change to agent notification runs under 50ms on a properly configured dedicated node, versus 100–300ms on standard setups that rely on gossip propagation.
Step 3: opportunity detection and route computation
The signal engine receives the account update, recalculates prices across all monitored pool pairs, and evaluates whether a profitable route exists after fees and the required Jito tip. For simple two-pool atomic arbitrage, this computation completes in microseconds on optimized Rust code. Triangular routes across three pools require slightly more computation but still complete well under 1ms.
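As a prototyping sketch (a production engine would do this in Rust, per the note at the end of this article), the gross profitability check for a two-pool route on constant-product pools looks like the following. Reserves and fees are illustrative; a real engine reads them from account state and also subtracts the priority fee and Jito tip before firing:

```typescript
// Gross profit check for atomic two-pool arbitrage on constant-product pools.
interface Pool {
  reserveIn: number;  // reserve of the token being paid in
  reserveOut: number; // reserve of the token being received
  feeBps: number;     // swap fee in basis points (30 = 0.30%)
}

// Constant-product swap output: out = y * dx' / (x + dx'), dx' = dx after fee.
function swapOut(pool: Pool, amountIn: number): number {
  const inAfterFee = amountIn * (1 - pool.feeBps / 10_000);
  return (pool.reserveOut * inAfterFee) / (pool.reserveIn + inAfterFee);
}

// Route: tokenA -> tokenB on poolAB, then tokenB -> tokenA on poolBA.
// Positive result means the round trip returns more tokenA than it spent.
function grossProfit(poolAB: Pool, poolBA: Pool, amountIn: number): number {
  const midOut = swapOut(poolAB, amountIn);
  return swapOut(poolBA, midOut) - amountIn;
}
```

Triangular routes extend the same pattern with one more `swapOut` hop; the math stays closed-form, which is why detection completes well under a millisecond.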
Step 4: transaction construction
The agent constructs the transaction: swap instruction sequence, ComputeBudgetProgram.setComputeUnitPrice instruction with a calibrated priority fee, and a Jito tip instruction. The ComputeUnitPrice targets the 75th–90th percentile of recent fees for the same program, calculated from getRecentPrioritizationFees on a live polling loop.
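The percentile selection can be sketched as a nearest-rank pick over recent fee samples (illustrative values; a live agent would feed this from its getRecentPrioritizationFees polling loop):

```typescript
// Pick a priority fee at a chosen percentile of recently observed fees
// (nearest-rank method). Sample values in the test are illustrative.
function percentileFee(samples: number[], pct: number): number {
  if (samples.length === 0) return 0;
  const sorted = [...samples].sort((a, b) => a - b);
  const idx = Math.min(
    sorted.length - 1,
    Math.floor((pct / 100) * sorted.length),
  );
  return sorted[idx];
}
```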
Step 5: bundle submission to Jito Block Engine
The transaction is wrapped in a Jito bundle and submitted to multiple Block Engine regional endpoints in parallel—US East, EU, Tokyo—because the current slot leader's geographic proximity varies across the leader schedule. Parallel submission reduces p99 submission latency by hedging against regional variance. The Jito tip (typically 50–60% of estimated profit) determines bundle priority in the block engine auction.
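The hedging logic itself is a first-fulfilled race across regions. The sketch below assumes a `BundleSender` function type standing in for Jito's regional Block Engine clients; it is an illustration of the pattern, not a real client API:

```typescript
// Hedged bundle submission: send the same bundle to several regional Block
// Engine endpoints in parallel and take whichever acknowledges first.
// BundleSender is an assumed interface for illustration only.
type BundleSender = (bundle: Uint8Array) => Promise<string>; // resolves to a bundle id

// Resolve with the first fulfilled promise; reject only if every one fails.
function firstFulfilled<T>(promises: Promise<T>[]): Promise<T> {
  return new Promise((resolve, reject) => {
    let failures = 0;
    for (const p of promises) {
      p.then(resolve, () => {
        failures += 1;
        if (failures === promises.length) {
          reject(new Error("all regional submissions failed"));
        }
      });
    }
  });
}

function submitHedged(bundle: Uint8Array, senders: BundleSender[]): Promise<string> {
  return firstFulfilled(senders.map((send) => send(bundle)));
}
```

Because the bundle is identical in every region, duplicate landings are not a concern; the slot leader deduplicates by signature, so the race only buys latency.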
Step 6: confirmation at processed commitment
The agent monitors confirmation via getSignatureStatuses at processed commitment—not confirmed or finalized, which add 400–800ms of unnecessary latency. On successful execution, the agent logs route, profit, tip paid, and slot delta for tip calibration. On failure, it logs the failure reason and adjusts parameters accordingly.
Where latency appears in the pipeline
Latency is not distributed evenly across the pipeline. Understanding where it actually accumulates helps prioritize optimization effort.
The cumulative difference between a public endpoint and a dedicated colocated node can exceed 300–500ms on the full pipeline. For strategies operating inside a 400ms slot, this difference is not a performance gap—it's the difference between competing and not competing.
Why most Solana AI HFT agents fail
The tooling is public, the strategies are documented, and the infrastructure is accessible. Yet the overwhelming majority of HFT agents on Solana produce no consistent profit. The failure patterns are predictable:
Wrong commitment level
Using confirmed or finalized commitment instead of processed adds 400–800ms of latency to every data update. Processed commitment is the only option that provides data within the current slot. Teams that accept library defaults often miss this entirely.
Public RPC under load
Public endpoints apply rate limits during high-traffic events—exactly when HFT opportunities appear. A token launch, a liquidation cascade, a large market order: all of them spike RPC traffic while creating pricing inefficiencies. An agent that hits a rate limit during a meme coin launch misses every opportunity that launch generates.
Static tip calibration
The Jito bundle auction clears at a price that evolves with competition. A tip calibrated during development and never revisited gradually falls below the clearing price as more sophisticated agents enter. Bundle acceptance rate drops quietly. The agent fires, pays compute fees, and captures nothing.
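One way to keep the tip tracking the clearing price is a simple feedback controller on the observed bundle acceptance rate: raise the tip fraction when acceptance falls below target, lower it toward a floor when acceptance is comfortably above. The thresholds, step size, and bounds below are illustrative, not recommended values:

```typescript
// Feedback controller for the Jito tip fraction, driven by the bundle
// acceptance rate measured over a recent window. All parameters illustrative.
interface TipState {
  fraction: number; // share of estimated profit paid as tip (0.5 = 50%)
}

function calibrateTip(
  state: TipState,
  acceptanceRate: number, // accepted bundles / submitted bundles
  target: number = 0.8,
  step: number = 0.05,
  floor: number = 0.3,
  ceiling: number = 0.9,
): TipState {
  if (acceptanceRate < target) {
    // Losing auctions: bid a larger share of profit, up to the ceiling.
    return { fraction: Math.min(ceiling, state.fraction + step) };
  }
  if (acceptanceRate > target + 0.1) {
    // Winning comfortably: give back margin, down to the floor.
    return { fraction: Math.max(floor, state.fraction - step) };
  }
  return state; // inside the dead band: hold steady
}
```

The dead band between `target` and `target + 0.1` keeps the tip from oscillating on every measurement window.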
No ShredStream
Agents relying solely on Yellowstone gRPC miss the pre-confirmation signal window. ShredStream provides 50–100ms of additional lead time—enough to detect an incoming large swap and respond before it confirms. Without it, the agent reacts to events rather than anticipating them.
No colocation
The physical distance between the agent's server and the nearest Solana validator adds latency that no software optimization eliminates. A server in a cloud region 200ms from the nearest validator loses 200ms on every submission. In a 400ms window, that's half the slot, gone before the transaction is even constructed. Colocated bare-metal setups in Frankfurt, London, or New York achieve sub-30ms full-path latency including signing, forwarding, and confirmation tracking.
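The physics can be bounded directly: light in fiber travels at roughly two-thirds of c, about 200,000 km/s, so every kilometer of distance costs at least ~5 microseconds one way, before any routing or queueing overhead is added:

```typescript
// Lower bound on one-way network latency from distance alone. Light in fiber
// propagates at roughly 200,000 km/s, i.e. 200 km per millisecond; real paths
// are longer and slower, so this is a hard floor, not an estimate.
const FIBER_KM_PER_MS = 200;

function minOneWayMs(distanceKm: number): number {
  return distanceKm / FIBER_KM_PER_MS;
}
```

At 6,000 km the floor is already 30ms one way; no amount of software work recovers that, which is the whole argument for colocation.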
Public RPC vs dedicated infrastructure for HFT
The performance difference between public shared endpoints and dedicated colocated infrastructure is not incremental—it's structural.
The transaction landing rate difference under congestion—40–60% on public endpoints versus 90%+ on dedicated infrastructure—is the single most important number for HFT profitability. An agent with a 45% landing rate captures less than half its opportunities regardless of how good the strategy is.
Required infrastructure stack for Solana AI HFT agents
Based on production deployments, the infrastructure stack that competitive Solana HFT AI agents run in 2026 combines Jito ShredStream for pre-confirmation data, Yellowstone gRPC for push-based state updates, SWQoS-enabled transaction paths, bare-metal colocation near validators, dynamic tip calibration, and automated failover.
Real-world performance comparison (micro case)
The following comparison draws from a reconstructed case profile of a three-person prop trading desk running market-making and arbitrage strategies on Solana. The team ran on a premium shared endpoint for eight months before moving to dedicated infrastructure.
The most significant gain was not average latency but p99 latency under congestion—the metric that determines whether the agent can compete during the high-volatility events that generate the most HFT opportunities. A 92% reduction in worst-case latency effectively moved the agent from the non-competitive tier to the competitive tier for the strategies it runs.
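Computing the metric is straightforward; the discipline is computing it over a congestion window rather than a quiet one. A nearest-rank p99 over a window of latency samples:

```typescript
// Nearest-rank p99 over a window of latency samples (ms). Averages hide
// congestion spikes; the tail is what determines competitiveness under load.
function p99(samplesMs: number[]): number {
  if (samplesMs.length === 0) return 0;
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const idx = Math.min(sorted.length - 1, Math.ceil(0.99 * sorted.length) - 1);
  return sorted[idx];
}
```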
Key takeaways: what actually wins in Solana HFT
After working through the architecture, latency sources, failure modes, and infrastructure comparisons, the competitive advantages in Solana HFT reduce to a clear set of principles:
- ShredStream is not optional for competitive strategies. Sub-slot signal detection requires pre-confirmation data. Without ShredStream, the agent reacts to confirmed events while competitors act on pending ones.
- p99 latency under congestion is the only metric that matters. Average latency looks good on shared endpoints. p99 under load reveals the actual competitive position. Always benchmark during high-traffic windows, not quiet periods.
- Processed commitment, not confirmed. Every millisecond counts. Confirmed and finalized commitment add hundreds of milliseconds for no benefit in atomic strategies that revert on failure anyway.
- Parallel bundle submission across regions. Leader proximity varies with the rotation schedule. Sending to US East, EU, and Tokyo simultaneously hedges this variance and reduces p99 submission latency.
- Dynamic tip calibration is a continuous process. Static tips degrade as competition evolves. Track bundle acceptance rate in real time and adjust the tip relative to observed clearing prices.
- Colocation is physics, not preference. 200ms geographic distance cannot be optimized away in software. Leader-adjacent bare metal is the only path to sub-30ms full-path latency.
- Rust for execution, TypeScript for prototyping. Garbage-collection pauses on JavaScript runtimes are unpredictable. For any strategy competing at the sub-slot level, Rust is required for the execution engine.
Solana HFT AI agents are not primarily an AI problem. They're an infrastructure problem that AI tools help solve. The decision logic—whether rule-based, ML-driven, or LLM-orchestrated—runs on top of a data and execution pipeline that either delivers the right information at the right time or doesn't. When it doesn't, no AI model compensates for a 300ms pipeline gap in a 400ms slot.
The infrastructure stack described in this article—Jito ShredStream, Yellowstone gRPC, SWQoS-enabled transaction paths, bare-metal colocation, dynamic tip calibration, automated failover—is what production Solana HFT agents actually run in 2026. The teams that compete consistently aren't running smarter models. They're running smarter models on infrastructure that doesn't lose slots, drop bundles, or serve stale state during the moments that determine P&L.

