We built a Solana RPC that does 500k requests per second on one box

How light-indexer replaces a full validator, history backend, and DAS service with a single Rust binary.

Taimoor

Thursday, April 16, 2026 (5 min read)

If you've tried running Solana infrastructure, you already know the pain. A Yellowstone gRPC node degrades past 100 connections. Standard RPC needs a 768 GB validator that still gets throttled under moderate load. And nobody gives you raw decoded market data without charging enterprise pricing for it.

We got tired of working around these limits. light-indexer is a single Rust binary that replaces the validator, the RPC node, the history backend, and the DAS service. One process, one box, 35 JSON-RPC methods.

The pipeline

Everything runs in one OS process. A Yellowstone gRPC stream comes in, gets decoded and classified, then written to tiered storage. A broadcast channel keeps in-memory caches in sync. The RPC server reads from caches first, falls back to disk when needed.
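To make the cache-sync step concrete, here is a minimal sketch of broadcast-driven invalidation. It uses only the standard library, with one `mpsc` channel per subscriber standing in for a real broadcast channel. The names `AccountUpdate`, `Broadcast`, and `AccountCache` are illustrative assumptions, not light-indexer's actual types.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical event the ingester emits after it writes an account.
#[derive(Clone, Debug)]
pub struct AccountUpdate {
    pub pubkey: [u8; 32],
    pub slot: u64,
}

/// Minimal fan-out: every subscriber receives its own copy of each event.
#[derive(Default)]
pub struct Broadcast {
    subscribers: Vec<Sender<AccountUpdate>>,
}

impl Broadcast {
    pub fn subscribe(&mut self) -> Receiver<AccountUpdate> {
        let (tx, rx) = channel();
        self.subscribers.push(tx);
        rx
    }

    /// Sends to all live subscribers and forgets the ones that hung up.
    pub fn publish(&mut self, update: &AccountUpdate) {
        self.subscribers.retain(|tx| tx.send(update.clone()).is_ok());
    }
}

/// A cache that evicts an entry when an update for its key arrives.
pub struct AccountCache {
    entries: HashMap<[u8; 32], Vec<u8>>,
    updates: Receiver<AccountUpdate>,
}

impl AccountCache {
    pub fn new(updates: Receiver<AccountUpdate>) -> Self {
        Self { entries: HashMap::new(), updates }
    }

    pub fn insert(&mut self, key: [u8; 32], value: Vec<u8>) {
        self.entries.insert(key, value);
    }

    pub fn get(&self, key: &[u8; 32]) -> Option<&Vec<u8>> {
        self.entries.get(key)
    }

    /// Drains pending events and returns how many entries were evicted.
    pub fn apply_invalidations(&mut self) -> usize {
        let mut evicted = 0;
        while let Ok(update) = self.updates.try_recv() {
            if self.entries.remove(&update.pubkey).is_some() {
                evicted += 1;
            }
        }
        evicted
    }
}
```

Each cache subscribes once at startup. A single `publish` from the ingester is then enough to evict the stale entry everywhere.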

[Figure: System architecture]

If the gRPC endpoint goes down, we automatically cycle to the next one. If all endpoints die, we recover from local snapshot files. No RPC calls back to the network. The process never hangs because every stage has timeouts on connect, subscribe, and stream.
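The failover order described above can be sketched as a small selection function. This is an assumption-laden illustration: `pick_source`, `Source`, and the `try_connect` callback are hypothetical names, and the real connect, subscribe, and stream timeouts are collapsed into a single `Duration` parameter.

```rust
use std::time::Duration;

/// Where the ingester ends up reading from (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum Source {
    /// Index of the gRPC endpoint that accepted the connection.
    Grpc(usize),
    /// Every endpoint failed, so recover from local snapshot files.
    LocalSnapshot,
}

/// Tries each endpoint once, starting at `start` and wrapping around.
/// `try_connect` is expected to give up by itself once `timeout` elapses,
/// so a dead endpoint can never stall the loop.
pub fn pick_source<F>(
    endpoints: &[&str],
    start: usize,
    timeout: Duration,
    mut try_connect: F,
) -> Source
where
    F: FnMut(&str, Duration) -> Result<(), String>,
{
    let n = endpoints.len();
    for offset in 0..n {
        let idx = (start + offset) % n;
        if try_connect(endpoints[idx], timeout).is_ok() {
            return Source::Grpc(idx);
        }
    }
    // No network fallback: the only remaining source is local disk.
    Source::LocalSnapshot
}
```

Passing the connect step in as a closure keeps the cycling logic testable without a live endpoint.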

Tiered storage

Data lives in three places, each with a different job.

RocksDB handles the hot path. Seven column families: slots, transactions, signatures-for-address, current account state, program indexes, token owner ATAs, and mint top holders. The first three get pruned on a configurable schedule (default two days for txs, seven for signatures). Account state is never pruned.

ClickHouse gets everything permanently via dual-write. Every block the ingester processes goes to both RocksDB and ClickHouse. When a query misses in RocksDB because the data has aged past retention, ClickHouse answers it. The materialized views (gsfa_mv, gsfa_hot_mv, sig_status_mv) keep historical signature lookups fast.

PostgreSQL handles tokens, DAS tables, and a few fallback paths. It's not on the hot path for any of the high-throughput methods.

Recent queries resolve sub-millisecond from RocksDB. Older stuff hits ClickHouse, 1-10ms. The client never knows which tier answered.
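A rough sketch of the dual-write and fallback behaviour, with two in-memory maps standing in for RocksDB and ClickHouse. `TieredStore` and `Tier` are illustrative names. The tier is returned here only so the behaviour is observable; a real RPC handler would drop it before responding.

```rust
use std::collections::HashMap;

/// Which tier answered (exposed only for illustration).
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Tier {
    Hot,
    Cold,
}

pub struct TieredStore {
    hot: HashMap<String, Vec<u8>>,  // stand-in for RocksDB
    cold: HashMap<String, Vec<u8>>, // stand-in for ClickHouse
}

impl TieredStore {
    pub fn new() -> Self {
        Self { hot: HashMap::new(), cold: HashMap::new() }
    }

    /// Dual-write: every record goes to both tiers at ingest time.
    pub fn write(&mut self, signature: &str, tx: &[u8]) {
        self.hot.insert(signature.to_string(), tx.to_vec());
        self.cold.insert(signature.to_string(), tx.to_vec());
    }

    /// Retention pruning only ever touches the hot tier.
    pub fn prune_hot(&mut self, signature: &str) {
        self.hot.remove(signature);
    }

    /// Check the fast tier first, fall back to the permanent one on a miss.
    pub fn get(&self, signature: &str) -> Option<(&[u8], Tier)> {
        if let Some(tx) = self.hot.get(signature) {
            return Some((tx.as_slice(), Tier::Hot));
        }
        self.cold.get(signature).map(|tx| (tx.as_slice(), Tier::Cold))
    }
}
```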

Caching

Seven caches, all invalidated via the same broadcast channel. Two matter most:

encoded_account_lru stores pre-serialized JSON responses as Arc<Box<RawValue>>. When getAccountInfo hits this cache, jsonrpsee emits the bytes verbatim to the wire. No serialization, no deep clone. Just a refcount bump. This single optimization took getAccountInfo from 7.3k rps to 509k rps.
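The refcount-bump idea can be shown with the standard library alone. In this sketch `Arc<str>` stands in for `Arc<Box<RawValue>>`, and a plain `HashMap` stands in for the LRU. `EncodedCache` is a hypothetical name, and eviction is left out.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Cache of already-serialized JSON responses keyed by account pubkey.
pub struct EncodedCache {
    entries: HashMap<[u8; 32], Arc<str>>,
}

impl EncodedCache {
    pub fn new() -> Self {
        Self { entries: HashMap::new() }
    }

    /// Serialize once at write time and store the finished bytes.
    pub fn put(&mut self, key: [u8; 32], json: String) {
        self.entries.insert(key, Arc::from(json));
    }

    /// A hit clones the Arc, which increments a counter and copies a pointer.
    /// The JSON bytes themselves are never copied or re-encoded.
    pub fn get(&self, key: &[u8; 32]) -> Option<Arc<str>> {
        self.entries.get(key).cloned()
    }
}
```

Two concurrent readers of the same account end up holding pointers to the same allocation, which `Arc::ptr_eq` can confirm.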

sig_index is a DashMap of recent transaction signatures to their slot and position. getTransaction checks this before touching RocksDB. If the sig is in the last ~64 slots, the disk lookup is skipped entirely.
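Here is one way such a recent-signature index could look. This is a single-threaded sketch using `HashMap` and `VecDeque` rather than a DashMap. The 64-slot window comes from the "~64 slots" figure above. `SigIndex`, `TxLocation`, and the eviction rule are assumptions.

```rust
use std::collections::{HashMap, VecDeque};

/// Window size, taken from the "~64 slots" figure in the text.
pub const WINDOW_SLOTS: u64 = 64;

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct TxLocation {
    pub slot: u64,
    pub index: u32,
}

#[derive(Default)]
pub struct SigIndex {
    by_sig: HashMap<String, TxLocation>,
    /// Insertion order; assumes slots arrive in non-decreasing order.
    order: VecDeque<(u64, String)>,
}

impl SigIndex {
    pub fn insert(&mut self, signature: &str, slot: u64, index: u32) {
        self.by_sig.insert(signature.to_string(), TxLocation { slot, index });
        self.order.push_back((slot, signature.to_string()));
        // Evict entries that fell out of the window ending at `slot`.
        while let Some(&(oldest, _)) = self.order.front() {
            if oldest + WINDOW_SLOTS > slot {
                break;
            }
            let (_, old_sig) = self.order.pop_front().unwrap();
            self.by_sig.remove(&old_sig);
        }
    }

    /// A hit means the caller can skip the disk lookup entirely.
    pub fn lookup(&self, signature: &str) -> Option<TxLocation> {
        self.by_sig.get(signature).copied()
    }
}
```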

All caches are 32-way sharded with pubkey[0] & 31. No single mutex bottleneck under high concurrency.
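As an illustration of the sharding rule, the sketch below picks a shard with `pubkey[0] & 31` and gives each shard its own lock. The `ShardedCache` type and the use of `Mutex<HashMap>` per shard are assumptions. Only the shard function itself comes from the text.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

pub const SHARDS: usize = 32;

/// The low five bits of the first pubkey byte select one of 32 shards.
pub fn shard_of(pubkey: &[u8; 32]) -> usize {
    (pubkey[0] & 31) as usize
}

pub struct ShardedCache {
    shards: Vec<Mutex<HashMap<[u8; 32], Vec<u8>>>>,
}

impl ShardedCache {
    pub fn new() -> Self {
        Self {
            shards: (0..SHARDS).map(|_| Mutex::new(HashMap::new())).collect(),
        }
    }

    /// Only the owning shard is locked, so writers to other shards proceed.
    pub fn insert(&self, key: [u8; 32], value: Vec<u8>) {
        self.shards[shard_of(&key)].lock().unwrap().insert(key, value);
    }

    pub fn get(&self, key: &[u8; 32]) -> Option<Vec<u8>> {
        self.shards[shard_of(key)].lock().unwrap().get(key).cloned()
    }
}
```

Two requests contend only when their pubkeys map to the same shard.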

The numbers

Measured with wrk -t8 -c128 on a 48-core box with NVMe (bm82). These are sustained-throughput numbers, not burst:

| Method | RPS | Avg latency |
| --- | --- | --- |
| getSlot | 502k | 261 µs |
| getAccountInfo (USDC, 165 bytes) | 509k | 253 µs |
| getAccountInfo (Token program, 17KB) | 128k | 695 µs |
| getSignaturesForAddress (100 signatures) | 142k | 940 µs |

Single-shot latency with keep-alive: p50 94 µs for getAccountInfo. Sub-100 microseconds.

We use SO_REUSEPORT with 32 listeners (one per CPU), mimalloc as the global allocator, TCP_NODELAY, and singleflight coalescing on getAccountInfo and getBlock. The RPC server is axum + jsonrpsee with tower-http compression.
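Singleflight coalescing is the least obvious item in that list, so here is a minimal blocking sketch of the technique using `Mutex` and `Condvar` from the standard library. The actual server is async and almost certainly implements this differently. `SingleFlight`, `Call`, and `run` are hypothetical names. The sketch does not handle a panicking leader, which would leave followers waiting.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Condvar, Mutex};

/// One in-flight fetch that any number of callers can wait on.
struct Call {
    result: Mutex<Option<Arc<str>>>,
    ready: Condvar,
}

#[derive(Default)]
pub struct SingleFlight {
    inflight: Mutex<HashMap<String, Arc<Call>>>,
}

impl SingleFlight {
    /// Concurrent callers with the same key share a single `fetch`.
    /// The first caller (the leader) runs it; the rest block for its result.
    pub fn run<F>(&self, key: &str, fetch: F) -> Arc<str>
    where
        F: FnOnce() -> String,
    {
        let (call, leader) = {
            let mut map = self.inflight.lock().unwrap();
            let existing = map.get(key).cloned();
            match existing {
                Some(call) => (call, false),
                None => {
                    let call = Arc::new(Call {
                        result: Mutex::new(None),
                        ready: Condvar::new(),
                    });
                    map.insert(key.to_string(), Arc::clone(&call));
                    (call, true)
                }
            }
        };

        if leader {
            let value: Arc<str> = Arc::from(fetch());
            *call.result.lock().unwrap() = Some(Arc::clone(&value));
            // Remove the entry so later requests trigger a fresh fetch.
            self.inflight.lock().unwrap().remove(key);
            call.ready.notify_all();
            value
        } else {
            let mut guard = call.result.lock().unwrap();
            while guard.is_none() {
                guard = call.ready.wait(guard).unwrap();
            }
            Arc::clone(guard.as_ref().unwrap())
        }
    }
}
```

This is coalescing, not caching: once the leader finishes, the key is removed and the next request runs its own fetch.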

getTransactionsForAddress

On standard Solana RPC, getting all transactions for a wallet means calling getSignaturesForAddress, then getTransaction for each result, then doing the same for every associated token account, then stitching the results together with pagination and retry logic. Hundreds of calls. Seconds of latency. Per wallet.

[Figure: getTransactionsForAddress]

We replaced that with one call. getTransactionsForAddress scans the owner_atas column family in RocksDB to find every ATA the wallet owns, runs parallel signature lookups across all of them via spawn_blocking, sort-merges by slot descending, and hydrates from tx_index.
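The fan-out and merge steps can be sketched roughly as follows. Scoped OS threads stand in for `spawn_blocking`, the `lookup` closure stands in for the per-account signature scan, and a flatten-then-sort replaces a true k-way merge for brevity. The dedup step is an assumption: a transaction that touches two of the wallet's accounts would otherwise appear twice. Hydration from tx_index is omitted.

```rust
use std::thread;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SigEntry {
    pub slot: u64,
    pub signature: String,
}

/// `accounts` is the wallet plus every ATA it owns (already resolved).
/// `lookup` is a hypothetical per-account signature scan.
pub fn transactions_for_address<F>(
    accounts: &[String],
    limit: usize,
    lookup: F,
) -> Vec<SigEntry>
where
    F: Fn(&str) -> Vec<SigEntry> + Sync,
{
    let lookup = &lookup;

    // One lookup per account, all running in parallel.
    let per_account: Vec<Vec<SigEntry>> = thread::scope(|s| {
        let mut handles = Vec::with_capacity(accounts.len());
        for account in accounts {
            handles.push(s.spawn(move || lookup(account.as_str())));
        }
        handles
            .into_iter()
            .map(|h| h.join().expect("lookup thread panicked"))
            .collect()
    });

    // Merge newest-first; ties break on signature so duplicates are adjacent.
    let mut merged: Vec<SigEntry> = per_account.into_iter().flatten().collect();
    merged.sort_by(|a, b| {
        b.slot
            .cmp(&a.slot)
            .then_with(|| a.signature.cmp(&b.signature))
    });
    merged.dedup();
    merged.truncate(limit);
    merged
}
```

If each per-account list is already sorted, a heap-based k-way merge would avoid the full sort and could stop as soon as `limit` entries are produced.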

9,775 rps. p99 of 9.4 ms. 32x faster than the previous Postgres-based implementation. One client call instead of hundreds.

Triton announced getTransactionsForAddress as a future feature in April 2026. We already ship it.

How queries resolve

Each RPC method has a defined path through the storage tiers. The pattern is the same: check the fastest layer first, fall back on miss.

[Figure: Read-path routing]

getSlot and getLatestBlockhash read from atomic values in MemoryCache. They never touch disk.
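For the slot case, an atomic read path might look like the sketch below. Only the `MemoryCache` name comes from the text. The field, the methods, and the memory orderings are assumptions. The blockhash is left out because 32 bytes do not fit in a single atomic integer and would need a different mechanism.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

pub struct MemoryCache {
    slot: AtomicU64,
}

impl MemoryCache {
    pub fn new(initial_slot: u64) -> Self {
        Self { slot: AtomicU64::new(initial_slot) }
    }

    /// Called by the ingester. `fetch_max` ignores out-of-order updates,
    /// so the published slot never moves backwards.
    pub fn advance(&self, new_slot: u64) {
        self.slot.fetch_max(new_slot, Ordering::Release);
    }

    /// Called by the RPC handler: one atomic load, no lock, no disk.
    pub fn get_slot(&self) -> u64 {
        self.slot.load(Ordering::Acquire)
    }
}
```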

getAccountInfo checks the encoded response cache, then the account LRU, then RocksDB, then Postgres as a last resort.

getTransaction goes through sig_index, tx_cache, RocksDB, then ClickHouse for anything pruned past retention.

getSignaturesForAddress does a RocksDB prefix scan on sfa_index (bincode-encoded, early termination). ClickHouse's gsfa materialized view fills in older history.
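A prefix scan with early termination can be illustrated with an ordered map. `BTreeMap` stands in for a RocksDB iterator here, and the key layout (address, then inverted big-endian slot and transaction index so newer entries sort first) is an assumption for the sketch, not the actual sfa_index encoding.

```rust
use std::collections::BTreeMap;

pub type Key = Vec<u8>;

/// Assumed layout: 32-byte address | !slot (BE) | !tx_index (BE).
/// Inverting the integers makes newer entries sort first within a prefix.
pub fn make_key(address: &[u8; 32], slot: u64, tx_index: u32) -> Key {
    let mut key = Vec::with_capacity(32 + 8 + 4);
    key.extend_from_slice(address);
    key.extend_from_slice(&(!slot).to_be_bytes());
    key.extend_from_slice(&(!tx_index).to_be_bytes());
    key
}

/// Seeks to the first key with the address prefix and walks forward.
/// Iteration is lazy, so the scan stops after `limit` hits or at the
/// first key that belongs to a different address.
pub fn signatures_for_address<'a>(
    index: &'a BTreeMap<Key, String>,
    address: &[u8; 32],
    limit: usize,
) -> Vec<&'a str> {
    index
        .range(address.to_vec()..)
        .take_while(|(key, _)| key.starts_with(&address[..]))
        .take(limit)
        .map(|(_, signature)| signature.as_str())
        .collect()
}
```

With this layout a request for the 100 newest signatures reads at most 101 keys, however long the address's history is.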

getBlock checks a sharded block cache, then MemoryCache for recent slots, then reads compressed block files from disk.

What's next

35 methods ship today. A handful of niche ones (getInflationReward, getRecentPrioritizationFees) are still missing. ClickHouse historical backfill via Jetstreamer isn't wired yet. Weeks of work, not months.

We haven't decided whether to open-source it. The code lives at sodaeio/light-indexer-private for now.
