Solana AI agents for HFT: architecture, latency, and execution path

Written by: Maksym Bogdan

12 min read

Date: April 16, 2026 | Updated on: April 16, 2026

In 2025, MEV revenue on Solana reached $720.1 million for the full year, overtaking priority fees for the first time as the largest component of the network's real economic value. Over 3 billion Jito bundles were processed in that same period, generating 3.75 million SOL in tips. This is the dominant economic layer of Solana trading.

The conversation around AI agents in crypto often focuses on language models, autonomous decision-making, and strategy design. Those things matter. But in the context of Solana HFT, none of them matter if the execution pipeline underneath isn't fast enough to act on the decisions those models make.

This article covers how AI HFT agents are actually built on Solana in 2026—the architecture, the data layer, the execution path from signal to block inclusion, and where latency appears at each stage. We also cover why most agents fail, what dedicated infrastructure changes, and what the performance numbers look like in practice.

What high-frequency trading means on Solana

High-frequency trading in traditional finance operates on microsecond or nanosecond timescales, colocated at exchange matching engines, using FPGA hardware and kernel-bypass networking. On Solana, the timescales are different but the competitive dynamic is identical: whoever gets there first captures the value.

Solana's defining constraint for HFT is the 400ms slot. Every block closes in approximately 400 milliseconds. An opportunity that appears at the start of a slot—a mispriced pool, an undercollateralized loan, a large swap creating an arbitrage gap—exists until the slot closes or a competing transaction corrects it. The HFT agent's goal is to detect, compute, and land a transaction within that window, ahead of all competing agents.

The competitive pressure is severe. According to RPC Fast's internal benchmarks, over 70% of sniper bots target sub-50ms RPC latency for transaction sends, yet only about 10% deliver steady profits—because infrastructure kills their edge before strategy gets a chance.

Solana's slot architecture means that even a 50ms latency advantage can be the difference between first and last in the same block. Unlike traditional HFT where microseconds matter for queue position, on Solana the unit that matters is the slot—400ms—and being inside or outside it is binary.

What a Solana HFT AI agent is

A Solana HFT AI agent is an autonomous system that observes on-chain state, applies decision logic to identify profitable opportunities, constructs transactions, and submits them—all within a sub-slot timeframe, without human intervention.

The "AI" component can mean several different things depending on the strategy:

  • Lightweight ML classifiers: trained models that score incoming account updates for opportunity probability. Fast inference, predictable behavior, easy to tune.
  • Reinforcement learning agents: models that learn optimal bidding and timing behavior through simulated and live feedback. Used for tip calibration and strategy parameter optimization.
  • LLM-based orchestration: language model reasoning for higher-level strategy decisions—position sizing, risk exposure, multi-step trade planning. Runs on a slower loop than the execution layer.
  • Hybrid architectures: the most common production setup. An ML model handles microsecond signal classification; an LLM handles strategy-level decisions on a slower cycle; a deterministic execution engine handles transaction construction and submission.

The distinction matters because different components have different latency requirements. Signal detection and the execution engine must complete in under 10ms. The LLM reasoning layer can run on a 1–5 second cycle without affecting execution speed, as long as its outputs feed a fast execution pipeline.
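That split between a slow strategy loop and a fast execution path can be sketched in Rust with only the standard library. `StrategyParams`, its field names, and the tip fraction are illustrative assumptions, not any agent's actual API; the point is that the fast path reads a cheap snapshot of the latest parameters and never waits on the slow loop.

```rust
use std::sync::{Arc, RwLock};
use std::thread;
use std::time::Duration;

// Hypothetical parameters produced by the slow (LLM/strategy) loop.
#[derive(Clone, Copy, Debug)]
struct StrategyParams {
    max_position_lamports: u64,
    tip_fraction_bps: u64, // tip as basis points of estimated profit
}

// Fast path: pure, per-signal decision using the current parameter snapshot.
fn plan_tip(params: &StrategyParams, estimated_profit: u64, position: u64) -> Option<u64> {
    // Skip the trade entirely when it exceeds the configured position cap.
    if position > params.max_position_lamports {
        return None;
    }
    Some(estimated_profit * params.tip_fraction_bps / 10_000)
}

fn main() {
    let params = Arc::new(RwLock::new(StrategyParams {
        max_position_lamports: 1_000_000_000,
        tip_fraction_bps: 5_500, // 55% of estimated profit
    }));

    // Slow loop: revises parameters every few seconds (stubbed here).
    let slow = Arc::clone(&params);
    thread::spawn(move || {
        thread::sleep(Duration::from_millis(10));
        slow.write().unwrap().tip_fraction_bps = 6_000;
    });

    // Fast loop: takes a snapshot, then decides without holding the lock.
    let snapshot = *params.read().unwrap();
    let tip = plan_tip(&snapshot, 2_000_000, 500_000_000);
    println!("tip for this signal: {:?} lamports", tip);
}
```

In production the slow loop would be fed by the LLM layer and the fast path would sit inside the signal handler; the lock is held only long enough to copy the struct.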

High-level architecture of a Solana AI HFT agent

A production Solana HFT AI agent consists of five interconnected components. Each has specific performance requirements, and failure in any one of them breaks the whole system.

| Component | Function | Latency requirement | Technology |
|---|---|---|---|
| Data ingestion layer | Receives real-time account and transaction updates | <50ms from state change | Yellowstone gRPC, Jito ShredStream |
| Signal processing engine | Classifies incoming data, detects opportunities | <5ms per update | Rust, ML model inference |
| Decision logic | Computes trade parameters, validates profitability | <2ms for simple routes | Rust / compiled strategy logic |
| Transaction construction | Builds signed tx with correct compute budget and tip | <1ms | solana-sdk, Rust |
| Submission layer | Broadcasts to validators via multiple paths in parallel | <25ms to leader | Jito Block Engine, SWQoS paths |

The data ingestion and signal processing layers run continuously. The decision and construction layers activate only when the signal layer flags an opportunity. The submission layer fires immediately after construction completes.

Most production agents separate the data ingestion component from the execution component—running them on different threads or even different processes—to prevent I/O overhead from blocking execution logic.
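A minimal sketch of that separation, using a std `mpsc` channel so the execution side never blocks on ingestion I/O. The `AccountUpdate` struct, the price threshold, and the synthetic updates are hypothetical stand-ins for a real gRPC stream:

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical account update, as it might arrive off a stream consumer.
#[derive(Debug)]
struct AccountUpdate {
    slot: u64,
    pool_price: f64,
}

// Pure signal check, kept separate from I/O so it stays testable.
fn is_opportunity(update: &AccountUpdate, reference_price: f64, threshold: f64) -> bool {
    (update.pool_price - reference_price).abs() / reference_price > threshold
}

fn main() {
    let (tx, rx) = mpsc::channel::<AccountUpdate>();

    // Ingestion thread: in production this would consume Yellowstone gRPC;
    // here it is stubbed with synthetic updates.
    let ingest = thread::spawn(move || {
        for (slot, pool_price) in [(100, 1.000), (101, 1.013), (102, 0.999)] {
            tx.send(AccountUpdate { slot, pool_price }).unwrap();
        }
    });

    // Execution side (here: the main thread) drains the channel and only
    // runs decision logic on flagged updates.
    let mut flagged = Vec::new();
    for update in rx {
        if is_opportunity(&update, 1.0, 0.01) {
            flagged.push(update.slot);
        }
    }
    ingest.join().unwrap();
    println!("opportunities at slots: {:?}", flagged); // slot 101 only
}
```

The same shape scales to separate processes by swapping the channel for shared memory or a local socket; the decision logic does not change.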

End-to-end execution path: from data to block inclusion

Understanding the full execution path reveals where time is spent and where optimizations have the most impact. Here is the complete flow from on-chain event to confirmed transaction:

Step 1: shred arrival via ShredStream

The execution path begins before a transaction confirms. Jito ShredStream delivers block data at the shred level—raw fragments of transactions produced by the slot leader before they assemble into a full block. An agent connected to ShredStream sees incoming swap activity 50–100ms earlier than an agent waiting on Yellowstone gRPC account updates. On a 400ms slot, that head start represents 12–25% of the total window.

Step 2: account state update via Yellowstone gRPC

For strategies that react to confirmed state changes rather than pending transactions, Yellowstone gRPC provides push-based account updates directly from validator memory. Latency from state change to agent notification runs under 50ms on a properly configured dedicated node, versus 100–300ms on standard setups that rely on gossip propagation.

Step 3: opportunity detection and route computation

The signal engine receives the account update, recalculates prices across all monitored pool pairs, and evaluates whether a profitable route exists after fees and the required Jito tip. For simple two-pool atomic arbitrage, this computation completes in microseconds on optimized Rust code. Triangular routes across three pools require slightly more computation but still complete well under 1ms.
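As a rough illustration of the route check, here is a two-pool calculation in Rust. It assumes both pools follow the constant-product rule (x·y = k) with a flat fee in basis points, and it ignores concentrated-liquidity math and transfer fees; the reserve numbers and tip are made up:

```rust
// Output of swapping `amount_in` into a constant-product pool (x*y=k)
// with reserves (reserve_in, reserve_out) and a fee in basis points.
fn swap_out(amount_in: u128, reserve_in: u128, reserve_out: u128, fee_bps: u128) -> u128 {
    let amount_in_after_fee = amount_in * (10_000 - fee_bps) / 10_000;
    reserve_out * amount_in_after_fee / (reserve_in + amount_in_after_fee)
}

// Profit of a two-pool atomic arb: buy on pool A, sell on pool B.
// Returns None when the route is unprofitable after the tip.
fn two_pool_arb_profit(
    amount_in: u128,
    pool_a: (u128, u128), // (reserve_in, reserve_out)
    pool_b: (u128, u128),
    fee_bps: u128,
    tip: u128,
) -> Option<u128> {
    let mid = swap_out(amount_in, pool_a.0, pool_a.1, fee_bps);
    let out = swap_out(mid, pool_b.0, pool_b.1, fee_bps);
    let gross = out.checked_sub(amount_in)?; // None if the round trip loses
    gross.checked_sub(tip)                   // None if the tip eats the profit
}

fn main() {
    // Pool A is mispriced relative to pool B.
    let profit = two_pool_arb_profit(
        1_000_000,
        (1_000_000_000, 2_100_000_000), // pool A: cheap
        (2_000_000_000, 1_000_000_000), // pool B
        30,    // 0.30% fee per swap
        5_000, // fixed tip in base units
    );
    println!("net profit after tip: {:?}", profit);
}
```

The real hot path precomputes reserves from streamed account updates, so the evaluation per candidate route is just this handful of integer operations.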

Step 4: transaction construction

The agent constructs the transaction: swap instruction sequence, ComputeBudgetProgram.setComputeUnitPrice instruction with a calibrated priority fee, and a Jito tip instruction. The ComputeUnitPrice targets the 75th–90th percentile of recent fees for the same program, calculated from getRecentPrioritizationFees on a live polling loop.
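The percentile selection itself is a few lines. This sketch assumes the agent already holds a recent sample of compute-unit prices from its polling loop (here a plain `Vec<u64>` of synthetic values) and applies the nearest-rank method:

```rust
// Pick the p-th percentile (nearest-rank) from a sample of recent
// prioritization fees, e.g. gathered from a getRecentPrioritizationFees poll.
fn percentile_fee(fees: &mut Vec<u64>, p: usize) -> Option<u64> {
    if fees.is_empty() || p == 0 || p > 100 {
        return None;
    }
    fees.sort_unstable();
    // Nearest-rank: 1-based rank = ceil(p/100 * n).
    let rank = (p * fees.len() + 99) / 100;
    Some(fees[rank - 1])
}

fn main() {
    // Synthetic fee samples (micro-lamports per compute unit).
    let mut fees = vec![100, 250, 400, 400, 800, 1_200, 5_000, 9_000, 20_000, 50_000];
    let target = percentile_fee(&mut fees, 90).unwrap();
    println!("90th percentile compute-unit price: {target}"); // 20_000 on this sample
}
```

In practice the sample is refreshed continuously and filtered to the programs the strategy actually touches, since fee markets on Solana are per-account, not global.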

Step 5: bundle submission to Jito Block Engine

The transaction is wrapped in a Jito bundle and submitted to multiple Block Engine regional endpoints in parallel—US East, EU, Tokyo—because the current slot leader's geographic proximity varies across the leader schedule. Parallel submission reduces p99 submission latency by hedging against regional variance. The Jito tip (typically 50–60% of estimated profit) determines bundle priority in the block engine auction.
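Hedged submission can be sketched as racing all regional sends and keeping the first acknowledgement. The endpoints below are stubbed with simulated latencies rather than real Block Engine calls; the structure is what matters:

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Hypothetical result of submitting a bundle to one regional endpoint.
#[derive(Debug, Clone)]
struct SubmitResult {
    region: &'static str,
    latency_ms: u64,
}

// Submit to every region in parallel and return the first response.
// A real submitter would issue HTTP/gRPC calls; each region here is
// simulated by sleeping for its configured latency.
fn hedged_submit(regions: &[(&'static str, u64)]) -> SubmitResult {
    let (tx, rx) = mpsc::channel();
    for &(region, latency_ms) in regions {
        let tx = tx.clone();
        thread::spawn(move || {
            thread::sleep(Duration::from_millis(latency_ms)); // stand-in for the network call
            let _ = tx.send(SubmitResult { region, latency_ms }); // losers' sends are ignored
        });
    }
    drop(tx);
    rx.recv().expect("at least one region responded")
}

fn main() {
    let first = hedged_submit(&[("us-east", 40), ("eu", 15), ("tokyo", 120)]);
    println!("first ack from {} in {}ms", first.region, first.latency_ms);
}
```

Because bundle submission is idempotent per signature, sending the same bundle to several regions wastes a little bandwidth but cannot double-execute, which is what makes this hedge safe.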

Step 6: confirmation at processed commitment

The agent monitors confirmation via getSignatureStatuses at processed commitment—not confirmed or finalized, which add 400–800ms of unnecessary latency. On successful execution, the agent logs route, profit, tip paid, and slot delta for tip calibration. On failure, it logs the failure reason and adjusts parameters accordingly.

Where latency appears in the pipeline

Latency is not distributed evenly across the pipeline. Understanding where it actually accumulates helps prioritize optimization effort.

| Pipeline stage | Public endpoint latency | Dedicated colocated node | Latency source |
|---|---|---|---|
| Shred arrival (ShredStream) | N/A—not available | 0–50ms from leader | Gossip hop count; ShredStream bypasses this |
| Account update (gRPC) | 100–300ms | 10–50ms | Standard nodes rely on gossip propagation; dedicated nodes receive directly |
| Opportunity computation | N/A—code dependent | <1–5ms | Algorithm complexity; Rust vs TypeScript matters here |
| Transaction construction | N/A—code dependent | <1ms | Signing overhead; key management architecture |
| sendTransaction to leader | 100–400ms | 5–25ms | Network hops; physical distance to current leader |
| Jito bundle confirmation | Variable / unreliable | Predictable, <400ms | Tip calibration; block engine proximity |
| Failover on node drop | Manual / minutes | <50ms automated | Infrastructure monitoring and rerouting |

The cumulative difference between a public endpoint and a dedicated colocated node can exceed 300–500ms on the full pipeline. For strategies operating inside a 400ms slot, this difference is not a performance gap—it's the difference between competing and not competing.
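The budget arithmetic is worth making explicit. Using worst-case public-endpoint figures and typical dedicated-node figures (drawn from the ranges above; the specific per-stage numbers chosen here are illustrative), a quick check shows one pipeline overflowing the 400ms slot while the other leaves most of it unused:

```rust
// Sum per-stage latencies and report how much of the 400ms slot remains.
const SLOT_MS: u64 = 400;

fn remaining_budget(stages: &[(&str, u64)]) -> i64 {
    let total: u64 = stages.iter().map(|&(_, ms)| ms).sum();
    SLOT_MS as i64 - total as i64
}

fn main() {
    // Worst-case figures for a public endpoint.
    let public = [("account update", 300), ("compute", 5), ("construct", 1), ("send", 400)];
    // Typical figures for a dedicated colocated node.
    let dedicated = [("account update", 50), ("compute", 5), ("construct", 1), ("send", 25)];
    println!("public endpoint slack: {}ms", remaining_budget(&public));   // -306ms: outside the slot
    println!("dedicated node slack: {}ms", remaining_budget(&dedicated)); // 319ms: inside the slot
}
```

A negative slack means the transaction cannot land in the slot where the opportunity appeared, no matter how good the strategy is.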

Why most Solana AI HFT agents fail

The tooling is public, the strategies are documented, and the infrastructure is accessible. Yet the overwhelming majority of HFT agents on Solana produce no consistent profit. The failure patterns are predictable:

Wrong commitment level

Using confirmed or finalized commitment instead of processed adds 400–800ms of latency to every data update. Processed commitment is the only option that provides data within the current slot. Teams that accept library defaults often miss this entirely.

Public RPC under load

Public endpoints apply rate limits during high-traffic events—exactly when HFT opportunities appear. A token launch, a liquidation cascade, a large market order: all of them spike RPC traffic while creating pricing inefficiencies. An agent that hits a rate limit during a meme coin launch misses every opportunity that launch generates.

Static tip calibration

The Jito bundle auction clears at a price that evolves with competition. A tip calibrated during development and never revisited gradually falls below the clearing price as more sophisticated agents enter. Bundle acceptance rate drops quietly. The agent fires, pays compute fees, and captures nothing.
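One simple way to keep the tip tracking the clearing price is a feedback rule on the observed acceptance rate. The step size, bounds, and target below are illustrative, not tuned production values:

```rust
// Feedback controller for the tip fraction: nudge the tip up when bundle
// acceptance falls below target, and down when it is comfortably above it.
fn adjust_tip_bps(current_bps: u64, acceptance_rate: f64, target_rate: f64) -> u64 {
    const STEP_BPS: u64 = 100;  // adjust by 1% of estimated profit per window
    const MIN_BPS: u64 = 1_000; // never tip below 10% of profit
    const MAX_BPS: u64 = 9_000; // never tip above 90% of profit
    let next = if acceptance_rate < target_rate {
        current_bps + STEP_BPS
    } else if acceptance_rate > target_rate + 0.05 {
        // Deadband above target avoids oscillating on noise.
        current_bps.saturating_sub(STEP_BPS)
    } else {
        current_bps
    };
    next.clamp(MIN_BPS, MAX_BPS)
}

fn main() {
    let mut tip_bps = 5_500; // start at 55% of estimated profit
    // Acceptance rate sampled over successive observation windows.
    for rate in [0.52, 0.61, 0.78, 0.93] {
        tip_bps = adjust_tip_bps(tip_bps, rate, 0.85);
        println!("acceptance {rate:.2} -> tip {tip_bps} bps");
    }
}
```

The essential property is that the tip is re-derived from live acceptance data on every window, so competition entering the auction shows up as a falling acceptance rate and triggers an adjustment instead of silently zeroing out capture.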

No ShredStream

Agents relying solely on Yellowstone gRPC miss the pre-confirmation signal window. ShredStream provides 50–100ms of additional lead time—enough to detect an incoming large swap and respond before it confirms. Without it, the agent reacts to events rather than anticipating them.

No colocation

The physical distance between the agent's server and the nearest Solana validator adds latency that no software optimization eliminates. A server in a cloud region 200ms from the nearest validator loses 200ms on every submission. In a 400ms window, that's half the slot, gone before the transaction is even constructed. Colocated bare-metal setups in Frankfurt, London, or NY achieve sub-30ms full-path latency including signing, forwarding, and confirmation tracking.

Public RPC vs dedicated infrastructure for HFT

The performance difference between public shared endpoints and dedicated colocated infrastructure is not incremental—it's structural. Here is what changes at each level:

| Factor | Public shared RPC | Managed SaaS RPC | Dedicated bare-metal (colocated) |
|---|---|---|---|
| Latency (p50) | 50–150ms | 20–80ms | 4–25ms |
| Latency (p99) | 500ms–2s under load | 100–400ms | <50ms |
| Rate limits | Applied during congestion | Tier-based, may throttle | None on dedicated plans |
| ShredStream | Not available | Add-on, extra cost | Included by default |
| SWQoS paths | Not available | Available on premium tiers | Native, validator-peered |
| Yellowstone gRPC | Not available | Available on higher tiers | Included, filtered streams |
| Failover | None / manual | Automated, 100–500ms | Automated, <50ms |
| Noisy neighbor effect | Severe | Moderate | None—isolated hardware |
| Transaction landing rate (congestion) | 40–60% | 65–80% | 90–99%+ |

The transaction landing rate difference under congestion—40–60% on public endpoints versus 90%+ on dedicated infrastructure—is the single most important number for HFT profitability. An agent with a 45% landing rate captures less than half its opportunities regardless of how good the strategy is.

Required infrastructure stack for Solana AI HFT agents

Based on production deployments, here is the complete infrastructure stack that competitive Solana HFT AI agents run in 2026:

| Layer | Component | Specification / notes |
|---|---|---|
| Hardware | Bare-metal server | AMD EPYC TURIN 9005 or GENOA 9354, 512GB–1.5TB DDR5 RAM, enterprise NVMe (Samsung PM9A3 or equivalent) |
| Location | Colocation near validators | Frankfurt FR5, London, NY5 (Equinix, OVH, Latitude, TeraSwitch). Sub-30ms to nearest leader cluster. |
| Data feed (pre-confirm) | Jito ShredStream | Direct shred delivery from leader. 50–100ms faster than gRPC. Default on RPC Fast dedicated nodes. |
| Data feed (account state) | Yellowstone gRPC | Filtered push streams for monitored accounts. <50ms from state change. Eliminates polling overhead. |
| Supplementary feed | bloxRoute OFR (optional) | 30–50ms improvement vs default propagation. Best used as redundancy alongside Jito. |
| Transaction submission | Jito Block Engine (bundles) | Atomic execution, tip-based priority auction. Parallel submission to US East, EU, Tokyo endpoints. |
| Transaction submission | bloxRoute Trading API | 83% first-block hit rate via SWQoS (bloxRoute / RPC Fast partnership data). |
| Priority routing | SWQoS-enabled paths | Staked validator peering. Transactions enter reserved bandwidth lane; not competing on public 20% pool. |
| Tip calibration | getRecentPrioritizationFees loop | Real-time fee oracle. Target 75th–90th percentile for target programs. Adjust based on bundle acceptance rate. |
| Execution language | Rust (production) | No GC pauses. Compiled binary. Full memory control. TypeScript acceptable for prototyping only. |
| Monitoring | Grafana + Prometheus | Slot lag, landing rate, bundle acceptance rate, account update latency. Real-time dashboards. |
| Failover | Automated <50ms rerouting | Pre-synced spare nodes. Node swap time ~15 minutes with zero strategy downtime. |

Real-world performance comparison (micro case)

The following comparison draws from a reconstructed case profile of a three-person prop trading desk running market-making and arbitrage strategies on Solana. The team ran on a premium shared endpoint for eight months before moving to dedicated infrastructure.

| Metric | Shared premium endpoint | Dedicated colocated node (RPC Fast) | Change |
|---|---|---|---|
| p50 account update latency | ~80ms | ~18ms | -78% |
| p99 account update latency | ~600ms during load | ~45ms | -92% |
| Transaction landing rate (normal) | ~72% | ~96% | +24pp |
| Transaction landing rate (congestion) | ~43% | ~91% | +48pp |
| Bundle acceptance rate | ~55% | ~89% | +34pp |
| Node downtime (monthly) | ~3.5 hours | ~0 minutes | Automated failover |
| Engineering hours on infra incidents | ~8h/week | ~0.5h/week | -94% |

The most significant gain was not average latency but p99 latency under congestion—the metric that determines whether the agent can compete during the high-volatility events that generate the most HFT opportunities. A 92% reduction in worst-case latency effectively moved the agent from the non-competitive tier to the competitive tier for the strategies it runs.

Key takeaways: what actually wins in Solana HFT

After working through the architecture, latency sources, failure modes, and infrastructure comparisons, the competitive advantages in Solana HFT reduce to a clear set of principles:

  • ShredStream is not optional for competitive strategies. Sub-slot signal detection requires pre-confirmation data. Without ShredStream, the agent reacts to confirmed events while competitors act on pending ones.
  • p99 latency under congestion is the only metric that matters. Average latency looks good on shared endpoints. p99 under load reveals the actual competitive position. Always benchmark during high-traffic windows, not quiet periods.
  • Processed commitment, not confirmed. Every millisecond counts. Confirmed and finalized commitment add hundreds of milliseconds for no benefit in atomic strategies that revert on failure anyway.
  • Parallel bundle submission across regions. Leader proximity varies with the rotation schedule. Sending to US East, EU, and Tokyo simultaneously hedges this variance and reduces p99 submission latency.
  • Dynamic tip calibration is a continuous process. Static tips degrade as competition evolves. Track bundle acceptance rate in real time and adjust the tip relative to observed clearing prices.
  • Colocation is physics, not preference. 200ms geographic distance cannot be optimized away in software. Leader-adjacent bare metal is the only path to sub-30ms full-path latency.
  • Rust for execution, TypeScript for prototyping. GC pauses on TypeScript runtimes are unpredictable. For any strategy competing at the sub-slot level, Rust is required for the execution engine.

Solana HFT AI agents are not primarily an AI problem. They're an infrastructure problem that AI tools help solve. The decision logic—whether rule-based, ML-driven, or LLM-orchestrated—runs on top of a data and execution pipeline that either delivers the right information at the right time or doesn't. When it doesn't, no AI model compensates for a 300ms pipeline gap in a 400ms slot.

The infrastructure stack described in this article—Jito ShredStream, Yellowstone gRPC, SWQoS-enabled transaction paths, bare-metal colocation, dynamic tip calibration, automated failover—is what production Solana HFT agents actually run in 2026. The teams that compete consistently aren't running smarter models. They're running smarter models on infrastructure that doesn't lose slots, drop bundles, or serve stale state during the moments that determine P&L.

Building an HFT AI agent on Solana?

RPC Fast provides dedicated bare-metal nodes colocated with Solana validators, Jito ShredStream by default, Yellowstone gRPC with filtered streams, SWQoS transaction paths, and automated failover under 50ms. Whether you're at SaaS scale today or evaluating dedicated infrastructure for production HFT, the team at RPC Fast has configured over 100 trading bots on Solana and can review your current execution pipeline.

Get a free infrastructure review → rpcfast.com
