What’s your take on Parquet?
I’m still reading up on it. Why is it so closely tied to Apache? Does only Apache push it? Meaning, if Apache dropped it, would nobody else be interested in developing it further?
It’s published under the Apache License 2.0, which is a permissive license. Is there a drawback to the license?
Do you use it? When?
I assume CSV is sufficient for sharing small data. I also assume CSV is more accessible than Parquet.
Yeah, it depends on what you’re using it for. CSV is terrible in many, many ways, but it is widely supported and much less complex.
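To make one of those "terrible" points concrete, here is a minimal sketch using only Python's standard `csv` module. The sample table (`rows`) is made up for illustration. It shows that CSV carries no type information: numbers and missing values all come back as plain strings after a round trip.

```python
import csv
import io

# Hypothetical sample table: an integer id, a float price, and a missing note.
rows = [["id", "price", "note"], [1, 9.5, None]]

# Write the table to an in-memory CSV "file".
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)

# Read it back.
buffer.seek(0)
header, first = list(csv.reader(buffer))

# Every field is now a string, and None has become an empty string.
print(first)  # ['1', '9.5', '']
```

The reader has to guess or be told that `'1'` is an integer and that `''` means "missing". A format like Parquet stores a schema alongside the data, so that guessing step goes away.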
I would guess if you’re considering Parquet then your use case is probably one where you should use it.
JSON is another option, but I would only use it if you can guarantee that you’ll never have more than like 100MB of data. Large JSON files are extremely painful.
Since the data is tabular, JSONL works better than JSON for large files.
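A rough sketch of why that is, using only the standard library. The file name, the sample records, and the `iter_jsonl` helper are all made up for illustration. With JSONL each line is a complete JSON value, so you can process one row at a time instead of parsing the whole file into memory first.

```python
import json
import os
import tempfile

# Hypothetical sample records standing in for a large table.
records = [{"id": i, "value": i * 2} for i in range(5)]

path = os.path.join(tempfile.mkdtemp(), "data.jsonl")

# Write one JSON object per line.
with open(path, "w", encoding="utf-8") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")


def iter_jsonl(path):
    """Yield one parsed record per non-empty line, without loading the whole file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


# Aggregate while streaming; only one record is held in memory at a time.
total = sum(rec["value"] for rec in iter_jsonl(path))
print(total)  # 20
```

With a single big JSON array, `json.load` has to read and parse everything before you get the first row, which is where the pain with large files tends to come from.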