I built an open source self-hosted web application designed to make archiving to S3 Deep Archive simpler and more accessible.

Madman200@alien.top · 1 year ago

I built an open source self-hosted web application designed to make archiving to S3 Deep Archive simpler and more accessible.

patronus6285@alien.top · 1 year ago

Have you thought about using DynamoDB to store a catalog of the data?

Madman200@alien.top · 1 year ago

Currently the catalog is a simple H2 database, but using DynamoDB could be interesting.

I would personally be more inclined to update the application configuration to allow hosting your own relational database, so you could use the default H2, run a DB in the cloud, or self-host something.

Supporting DynamoDB would be difficult because the data layer is currently designed to be relational, although you could definitely argue that a relational DB is overkill for the simplicity of my schemas. Either way, switching to NoSQL would be a significant refactor.

Fair-Equivalent-8651@alien.top · 1 year ago

AWS is not a consumer tool, it’s an enterprise tool. I work with AWS in my day to day, so I had no problem quickly hacking together some scripts to upload my data, even if it wasn’t the most elegant setup. But some of the comments on my original post were saying that they would be interested in S3, but didn’t know where to start. Uploading in bulk to Glacier requires, at a bare minimum, familiarity with the CLI, but more likely it requires coding knowledge and familiarity with the AWS SDK.
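For a sense of what the "coding knowledge and familiarity with the AWS SDK" part can look like, here is a minimal sketch of a bulk upload using Python and boto3. This is not the application's code: the function names `build_upload_plan` and `upload_all`, the bucket name, and the key prefix are all made up for illustration, and it assumes credentials are already configured for boto3.

```python
from pathlib import Path


def build_upload_plan(root, bucket, prefix=""):
    """List every file under `root` as (local_path, bucket, key, extra_args).

    Hypothetical helper: the S3 key is the file's path relative to `root`,
    with an optional prefix, and every object is tagged for Deep Archive.
    """
    root = Path(root)
    plan = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        key = prefix + path.relative_to(root).as_posix()
        plan.append((str(path), bucket, key, {"StorageClass": "DEEP_ARCHIVE"}))
    return plan


def upload_all(plan):
    """Upload each planned file; requires boto3 and configured AWS credentials."""
    import boto3  # imported here so the planning step works without the SDK

    s3 = boto3.client("s3")
    for local_path, bucket, key, extra_args in plan:
        # upload_file switches to multipart uploads for large files on its own
        s3.upload_file(local_path, bucket, key, ExtraArgs=extra_args)
```

Calling `upload_all(build_upload_plan("/data/photos", "my-archive-bucket", "photos/"))` would walk the folder and send everything straight to the Deep Archive storage class. Retries, checksums, and a catalog of what was uploaded are all left out here.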

Someone chime in if I’m mistaken, but I know platforms like TrueNAS and QNAP QTS (and presumably the OSes from Synology, Asustor, and TerraMaster) have built-in functionality for automatically uploading folders or datasets to B2 / S3 / etc. At least on TrueNAS and QNAP, you just give it credentials, point to a bucket, and go.

It’s great that you’re doing this, but are you focusing more on people trying to DIY a solution? Or are you saying what you’ve put together is better than what TrueNAS et al. have built in?

lightweaver@alien.top · 1 year ago

/u/Madman200 have you tried using the 1TB monthly free egress that CloudFront offers to handle downloading the exports?

I’ve been experimenting with using Deep Archive myself, and I suspect that if I:

1. Restore a Deep Archive object
2. Copy that object to a new S3 bucket
3. Set up a CloudFront distribution with an S3 origin
4. Download the object through the CloudFront distribution

the download would consume the “Always Free” 1TB bandwidth instead of being considered normal data egress.

I’m pretty sure 1TB out to a single IP address is an unintended use, but CloudFront + S3 looks like a normal CDN-type use to me.
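The restore and copy steps above could be sketched roughly like this with boto3. Treat it as an untested outline under a few assumptions: the helper names (`restore_request`, `restore_finished`, `restore_then_copy`) are hypothetical, the 7-day restore window and Bulk tier are arbitrary choices, and the CloudFront distribution is assumed to be set up separately. Whether the billing actually works out as hoped is exactly the open question.

```python
import re


def restore_request(days=7, tier="Bulk"):
    """Build the RestoreRequest argument for restore_object.

    Deep Archive accepts the Standard and Bulk retrieval tiers.
    """
    if tier not in ("Standard", "Bulk"):
        raise ValueError("Deep Archive restores use the Standard or Bulk tier")
    return {"Days": days, "GlacierJobParameters": {"Tier": tier}}


def restore_finished(restore_header):
    """Interpret the 'Restore' field returned by head_object.

    The field looks like: ongoing-request="false", expiry-date="..."
    """
    if not restore_header:
        return False
    match = re.search(r'ongoing-request="(true|false)"', restore_header)
    return bool(match) and match.group(1) == "false"


def restore_then_copy(src_bucket, key, dst_bucket):
    """Request a restore if needed; copy once the restored copy is available.

    Returns True when the copy was made, False if the restore is still pending.
    """
    import boto3  # needs the SDK and configured AWS credentials

    s3 = boto3.client("s3")
    head = s3.head_object(Bucket=src_bucket, Key=key)
    if "Restore" not in head:
        # No restore requested yet, so start one and check back later
        s3.restore_object(Bucket=src_bucket, Key=key,
                          RestoreRequest=restore_request())
        return False
    if not restore_finished(head["Restore"]):
        return False
    # Managed copy, which uses multipart copy for large objects
    s3.copy({"Bucket": src_bucket, "Key": key}, dst_bucket, key,
            ExtraArgs={"StorageClass": "STANDARD"})
    return True
```

Since restores from Deep Archive can take many hours, `restore_then_copy` is meant to be called again later (for example from a scheduled job) until it returns True.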

I’m just waiting for a restore to complete before trying this and seeing what shows up on my AWS bill.

patronus6285@alien.top · 1 year ago

I’m surprised this hasn’t been modded down more, since it seems like it’s forbidden to mention anything other than Backblaze on this subreddit.