You have it backwards. Zip files have a table of contents letting you unzip indi...

sheetjs · on March 18, 2018

The table of contents in a zip file (the central directory) is at the end, not the beginning, so you need random access to the full file if you want to jump straight to the right location. Some zip files record the compressed length in each local file header, so you can skip ahead in the file without processing the data, but many leave the length out of the local header, so you have to work through the DEFLATE stream to find the end of the first file (and thus where the second file starts).
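
To make the "directory is at the end" point concrete, here is a minimal Python sketch that locates the end-of-central-directory record by searching backwards from the tail of an archive. The helper name `find_central_directory` is made up for illustration, and the sketch ignores ZIP64 and multi-disk archives.

```python
import io
import struct
import zipfile

EOCD_SIG = b"PK\x05\x06"   # end-of-central-directory signature
EOCD_FMT = "<4sHHHHIIH"    # fixed 22-byte part of the record


def find_central_directory(data: bytes):
    """Return (entry_count, cd_size, cd_offset) read from the EOCD record."""
    # The record sits at the very end, followed only by an optional comment
    # of at most 65535 bytes, so it is enough to search the tail backwards.
    pos = data.rfind(EOCD_SIG, max(0, len(data) - 22 - 65535))
    if pos < 0:
        raise ValueError("no end-of-central-directory record found")
    (_sig, _disk, _cd_disk, _entries_here, entries_total,
     cd_size, cd_offset, _comment_len) = struct.unpack_from(EOCD_FMT, data, pos)
    return entries_total, cd_size, cd_offset


# Build a small archive in memory to try it on.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("a.txt", "alpha " * 100)
    zf.writestr("b.txt", "beta " * 100)
archive = buf.getvalue()

count, cd_size, cd_offset = find_central_directory(archive)
print(count, cd_size, cd_offset, len(archive))
```

Nothing before `cd_offset` has to be read to learn what the archive contains, which is why a reader wants to seek to the end first.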

skybrian · on March 18, 2018

Yes, you can and should use random file access to load things from a zip file. For example, Go's zip package takes an io.ReaderAt interface, not an io.Reader. Using sequential I/O for this would be inefficient.
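
The same idea can be sketched in Python, whose `zipfile` module also expects a seekable file object. The `CountingReader` class below is a hypothetical helper that records how many bytes are actually read, so the cost of random access can be compared with a rough estimate of what a forward-only reader would have to consume to reach the same member.

```python
import io
import zipfile


class CountingReader(io.BytesIO):
    """In-memory file that records how many bytes are actually read."""

    def __init__(self, data: bytes):
        super().__init__(data)
        self.bytes_read = 0

    def read(self, size=-1):
        chunk = super().read(size)
        self.bytes_read += len(chunk)
        return chunk


# Ten stored (uncompressed) members of 100 kB each.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
    for i in range(10):
        zf.writestr(f"file_{i}.bin", bytes([i]) * 100_000)
archive = buf.getvalue()

src = CountingReader(archive)
with zipfile.ZipFile(src) as zf:
    info = zf.getinfo("file_9.bin")   # the last member in the archive
    payload = zf.read(info)

random_access_cost = src.bytes_read
# Rough count of bytes a forward-only reader consumes up to the end of that
# member: everything before it, plus its 30-byte local header, name, and data.
sequential_cost = (info.header_offset + 30 + len(info.filename)
                   + info.compress_size)
print(random_access_cost, sequential_cost)
```

With seeking, the reader touches the directory and the one member it wants; the sequential estimate is close to the size of the whole archive.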

jlouis · on March 18, 2018

The problem here is that you mostly have a streaming interface in S3.

skybrian · on March 18, 2018

I'm not familiar with S3, but it seems like if you want random access for a key-value database, you should store each file under a separate key? Storing an entire file archive as a blob (in any format) and downloading the whole thing just to access one file seems like a weird way to do it.

Apparently there is "S3 Select" in preview [1], but it only supports gzipped CSV or JSON.

[1] https://aws.amazon.com/blogs/aws/s3-glacier-select/

etaioinshrdlu · on March 18, 2018

S3 supports reading ranges of bytes. It might not always be a great idea for performance.
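
As a rough sketch of how byte-range reads could back a zip reader, the Python below wraps a range-fetching callable in a seekable file object and hands it to `zipfile`. Both `RangeFile` and `fetch_range` are hypothetical names; here `fetch_range` slices an in-memory blob, whereas against S3 it would presumably issue a GET with a `Range` header. The log of fetches hints at one possible performance cost: every read turns into its own round trip.

```python
import io
import zipfile


class RangeFile(io.RawIOBase):
    """Seekable, read-only file whose bytes come from ranged fetches."""

    def __init__(self, size, fetch_range):
        super().__init__()
        self._size = size
        self._fetch = fetch_range   # hypothetical: (start, end_exclusive) -> bytes
        self._pos = 0

    def readable(self):
        return True

    def seekable(self):
        return True

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            new_pos = offset
        elif whence == io.SEEK_CUR:
            new_pos = self._pos + offset
        elif whence == io.SEEK_END:
            new_pos = self._size + offset
        else:
            raise ValueError(f"invalid whence: {whence}")
        if new_pos < 0:
            raise ValueError("negative seek position")
        self._pos = new_pos
        return self._pos

    def read(self, size=-1):
        if size is None or size < 0:
            size = self._size - self._pos
        end = min(self._pos + size, self._size)
        if end <= self._pos:
            return b""
        data = self._fetch(self._pos, end)   # one "request" per read
        self._pos += len(data)
        return data


# Stand-in for the remote object: ten stored members of 100 kB each.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
    for i in range(10):
        zf.writestr(f"file_{i}.bin", bytes([i]) * 100_000)
blob = buf.getvalue()

fetch_log = []   # (start, end) of every ranged fetch


def fetch_range(start, end):
    fetch_log.append((start, end))
    return blob[start:end]


with zipfile.ZipFile(RangeFile(len(blob), fetch_range)) as zf:
    payload = zf.read("file_3.bin")

fetched = sum(end - start for start, end in fetch_log)
print(len(fetch_log), "fetches,", fetched, "of", len(blob), "bytes")
```

Only a fraction of the blob is transferred, but it takes several separate fetches (end record, directory, local header, data), so a real client would probably want to batch or cache them.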

fulafel · on March 18, 2018

But this does not work for streaming since the directory is at the end. So you have to temporarily store the whole zip file.
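
A small Python sketch of what a forward-only reader is up against: when an archive is written to a stream that cannot seek, the writer cannot go back and fill in sizes, so the local file header carries zeros and sets general-purpose flag bit 3 (sizes follow the data in a data descriptor). `ForwardOnlySink` is a made-up class for illustration, and the behaviour shown is that of CPython's `zipfile`; other writers may differ.

```python
import io
import struct
import zipfile

LOCAL_HEADER_FMT = "<4sHHHHHIIIHH"   # fixed 30-byte local file header
FLAG_DATA_DESCRIPTOR = 0x0008        # general-purpose flag bit 3


class ForwardOnlySink:
    """Write-only stream with no seek() or tell(), like a pipe or socket."""

    def __init__(self):
        self.data = bytearray()

    def write(self, chunk):
        self.data += chunk
        return len(chunk)

    def flush(self):
        pass


sink = ForwardOnlySink()
with zipfile.ZipFile(sink, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("a.txt", "alpha " * 100)
    zf.writestr("b.txt", "beta " * 100)

# What a streaming reader sees first: the local header of the first member.
(sig, _version, flags, _method, _mtime, _mdate,
 crc, comp_size, raw_size, name_len, extra_len) = struct.unpack_from(
    LOCAL_HEADER_FMT, sink.data, 0)
print(hex(flags), comp_size, raw_size)

# The real sizes are available from the central directory at the end.
with zipfile.ZipFile(io.BytesIO(bytes(sink.data))) as zf:
    info = zf.getinfo("a.txt")
print(info.compress_size, info.file_size)
```

With the sizes missing up front, the reader has to decode the DEFLATE stream to find where the first member ends, or buffer the archive until the directory arrives.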

skybrian · on March 18, 2018

Yes, you need random-access file I/O.

But if the zip file isn't stored locally, sending the entire thing over the network to access one subfile is very inefficient. Instead you'd want a network protocol where you request the files you actually want and the remote server just sends those.