They typically load all data in memory, so you still need a persistent store to recover from crashes (effectively two setups to run). And since the data is typically huge, you need servers with lots of expensive RAM.
It's still loaded from a file, but it relies heavily on memory-mapping and caching, so it stays fast without pulling everything into RAM up front. And in production, multiple worker processes can share that memory thanks to the memory mapping.
Granted, it's read-only, so it might not be exactly what you're looking for.
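To make the memory-mapping point concrete, here is a minimal sketch, not any particular library's implementation. It assumes a hypothetical flat file of packed float32 vectors and does exact brute-force search by dot product. The file is mapped read-only with Python's `mmap`, so the OS pages data in on demand, and several worker processes mapping the same file share the same physical pages through the page cache. The names `DIM`, `write_index`, `open_index` and `nearest` are made up for illustration.

```python
import mmap
import os
import struct
import tempfile

DIM = 4  # hypothetical vector dimensionality for this sketch
RECORD = struct.Struct(f"<{DIM}f")  # one vector = DIM little-endian float32 values


def write_index(path, vectors):
    """Build step: write vectors to disk in a hypothetical flat float32 layout."""
    with open(path, "wb") as f:
        for vec in vectors:
            f.write(RECORD.pack(*vec))


def open_index(path):
    """Map the file read-only; the OS loads pages lazily and shares them between processes."""
    with open(path, "rb") as f:
        # The mapping stays valid after the file object is closed.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


def nearest(mm, query):
    """Exact nearest neighbour by dot product, reading vectors straight from the mapping."""
    best_id, best_score = -1, float("-inf")
    for i, vec in enumerate(RECORD.iter_unpack(mm)):
        score = sum(a * b for a, b in zip(vec, query))
        if score > best_score:
            best_id, best_score = i, score
    return best_id


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "vectors.bin")
    write_index(path, [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0)])
    index = open_index(path)
    print(nearest(index, (0.1, 0.9, 0.0, 0.0)))  # → 1
    index.close()
```

Because the mapping is opened with `ACCESS_READ`, any attempt to write through it raises `TypeError`. That is the same trade-off as above: you rebuild the file to change the data, and in return every process can share one cached copy.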
How about a vector-oriented 'database' instead? Pinecone (https://www.pinecone.io/) does both exact and approximate search, and it's fully managed, so you don't have to worry about reliability, availability, etc.