
You have it backwards. Zip files have a table of contents letting you unzip individual files without decompressing the whole thing, and you don't need to load it all into memory.


The table of contents in a zip file is at the end, not the beginning, so you need random access to the full file if you want to jump straight to the right location. Some zip files record each entry's compressed size in its local file header, so you can skip ahead in the file without processing the data, but many leave the size out, so you have to work through the DEFLATE stream to find the end of the first file (and therefore where the second file starts).
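To make "the table of contents is at the end" concrete, here is a minimal Go sketch that finds the end-of-central-directory record by scanning backward from the end of the archive. The names `endRecord`, `findEndRecord`, and `sampleArchive` are made up for illustration, the sample archive is generated in memory, and ZIP64 archives are deliberately ignored.

```go
package main

import (
	"archive/zip"
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
)

// endRecord holds the fields of the end-of-central-directory (EOCD) record
// that are needed to locate the table of contents. Hypothetical type.
type endRecord struct {
	Entries uint16 // total number of entries in the archive
	Size    uint32 // size of the central directory in bytes
	Offset  uint32 // offset of the central directory from the start of the file
}

// findEndRecord scans backward from the end of the archive for the EOCD
// signature "PK\x05\x06". ZIP64 archives are not handled in this sketch.
func findEndRecord(archive []byte) (endRecord, error) {
	const fixedLen = 22 // EOCD length without the trailing comment
	sig := []byte{'P', 'K', 0x05, 0x06}
	for i := len(archive) - fixedLen; i >= 0; i-- {
		if !bytes.Equal(archive[i:i+4], sig) {
			continue
		}
		rec := archive[i:]
		commentLen := int(binary.LittleEndian.Uint16(rec[20:22]))
		if i+fixedLen+commentLen != len(archive) {
			continue // signature-like bytes that do not end the file correctly
		}
		return endRecord{
			Entries: binary.LittleEndian.Uint16(rec[10:12]),
			Size:    binary.LittleEndian.Uint32(rec[12:16]),
			Offset:  binary.LittleEndian.Uint32(rec[16:20]),
		}, nil
	}
	return endRecord{}, errors.New("end of central directory record not found")
}

// sampleArchive builds a small two-entry zip in memory (illustrative data only).
func sampleArchive() ([]byte, error) {
	var buf bytes.Buffer
	w := zip.NewWriter(&buf)
	for _, name := range []string{"a.txt", "b.txt"} {
		f, err := w.Create(name)
		if err != nil {
			return nil, err
		}
		if _, err := f.Write([]byte("contents of " + name)); err != nil {
			return nil, err
		}
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	data, err := sampleArchive()
	if err != nil {
		panic(err)
	}
	rec, err := findEndRecord(data)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d entries, central directory at offset %d of %d bytes\n",
		rec.Entries, rec.Offset, len(data))
}
```

The scan needs the tail of the file before anything else can be located, which is why a purely forward-streaming reader cannot start from the table of contents.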


Yes, you can and should use random file access to load things from a zip file. For example, Go's zip package takes an io.ReaderAt interface, not an io.Reader. Using sequential I/O for this would be inefficient.
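A minimal sketch of what that looks like with the standard library, assuming an in-memory archive built just for the example: `zip.NewReader` takes an `io.ReaderAt` plus the total size, and only the requested entry is decompressed. The helpers `readEntry`, `sampleArchive`, and `readFromSample` are hypothetical names.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
)

// readEntry opens one named entry through random access and returns its
// decompressed contents. Other entries are never decompressed.
func readEntry(src io.ReaderAt, size int64, name string) (string, error) {
	zr, err := zip.NewReader(src, size)
	if err != nil {
		return "", err
	}
	for _, f := range zr.File {
		if f.Name != name {
			continue
		}
		rc, err := f.Open()
		if err != nil {
			return "", err
		}
		defer rc.Close()
		body, err := io.ReadAll(rc)
		return string(body), err
	}
	return "", fmt.Errorf("entry %q not found", name)
}

// sampleArchive builds a small two-entry zip in memory (illustrative data only).
func sampleArchive() ([]byte, error) {
	var buf bytes.Buffer
	w := zip.NewWriter(&buf)
	for _, name := range []string{"a.txt", "b.txt"} {
		f, err := w.Create(name)
		if err != nil {
			return nil, err
		}
		if _, err := f.Write([]byte("contents of " + name)); err != nil {
			return nil, err
		}
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// readFromSample reads one entry from the sample archive via bytes.Reader,
// which satisfies io.ReaderAt.
func readFromSample(name string) (string, error) {
	data, err := sampleArchive()
	if err != nil {
		return "", err
	}
	return readEntry(bytes.NewReader(data), int64(len(data)), name)
}

func main() {
	body, err := readFromSample("b.txt")
	if err != nil {
		panic(err)
	}
	fmt.Println(body)
}
```

An `*os.File` also satisfies `io.ReaderAt`, so the same `readEntry` would work on a file on disk without loading it into memory.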


The problem here is that you mostly have a streaming interface in S3.


I'm not familiar with S3, but it seems like if you want random access in a key-value store, you should store each file under a separate key? Storing an entire file archive as a blob (in any format) and downloading the whole thing just to access one file seems like a weird way to do it.

Apparently there is "S3 Select" in preview [1], but it only supports gzipped CSV or JSON.

[1] https://aws.amazon.com/blogs/aws/s3-glacier-select/


S3 supports reading ranges of bytes. It might not always be a great idea for performance.
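As a rough sketch of how byte-range reads can stand in for random access, the Go code below wraps plain HTTP `Range` requests in an `io.ReaderAt` and hands it to `zip.NewReader`. This is not the S3 SDK: a local `httptest` server stands in for the object store, and `rangeReaderAt`, `remoteSize`, and `fetchEntry` are hypothetical names.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// rangeReaderAt implements io.ReaderAt with one HTTP Range request per read.
type rangeReaderAt struct{ url string }

func (r rangeReaderAt) ReadAt(p []byte, off int64) (int, error) {
	if len(p) == 0 {
		return 0, nil
	}
	req, err := http.NewRequest(http.MethodGet, r.url, nil)
	if err != nil {
		return 0, err
	}
	req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", off, off+int64(len(p))-1))
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusPartialContent {
		return 0, fmt.Errorf("range request failed: %s", resp.Status)
	}
	n, err := io.ReadFull(resp.Body, p)
	if err == io.ErrUnexpectedEOF {
		err = io.EOF // short read at the end of the object
	}
	return n, err
}

// remoteSize asks for the object length with a HEAD request.
func remoteSize(url string) (int64, error) {
	resp, err := http.Head(url)
	if err != nil {
		return 0, err
	}
	resp.Body.Close()
	if resp.StatusCode != http.StatusOK || resp.ContentLength < 0 {
		return 0, fmt.Errorf("cannot determine size: %s", resp.Status)
	}
	return resp.ContentLength, nil
}

// fetchEntry serves a sample archive from a local test server (a stand-in for
// the object store) and reads one entry using only ranged requests.
func fetchEntry(name string) (string, error) {
	var buf bytes.Buffer
	w := zip.NewWriter(&buf)
	for _, n := range []string{"a.txt", "b.txt"} {
		f, err := w.Create(n)
		if err != nil {
			return "", err
		}
		if _, err := f.Write([]byte("contents of " + n)); err != nil {
			return "", err
		}
	}
	if err := w.Close(); err != nil {
		return "", err
	}
	data := buf.Bytes()
	srv := httptest.NewServer(http.HandlerFunc(func(rw http.ResponseWriter, req *http.Request) {
		http.ServeContent(rw, req, "archive.zip", time.Time{}, bytes.NewReader(data))
	}))
	defer srv.Close()

	size, err := remoteSize(srv.URL)
	if err != nil {
		return "", err
	}
	zr, err := zip.NewReader(rangeReaderAt{url: srv.URL}, size)
	if err != nil {
		return "", err
	}
	for _, f := range zr.File {
		if f.Name == name {
			rc, err := f.Open()
			if err != nil {
				return "", err
			}
			defer rc.Close()
			body, err := io.ReadAll(rc)
			return string(body), err
		}
	}
	return "", fmt.Errorf("entry %q not found", name)
}

func main() {
	body, err := fetchEntry("b.txt")
	if err != nil {
		panic(err)
	}
	fmt.Println(body)
}
```

Every `ReadAt` call here is a separate round trip, so an archive with many small reads pays a lot of latency. That is one way ranged reads can hurt performance unless reads are batched or cached.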


But this does not work for streaming since the directory is at the end. So you have to temporarily store the whole zip file.


Yes, you need random-access file I/O.

But if the zip file isn't stored locally, sending the entire thing over the network to access one subfile is very inefficient. Instead you'd want a network protocol where you request the files you actually want and the remote server just sends those.



