As I was browsing Lemmy and the fediverse at large, this question kept popping into my head.
Since multimedia files have a much bigger footprint than raw text, I worry that, as time goes on, massive resources will be needed to keep up with all the data coming in.
I do wonder whether instances have gone the cloud route and just put everything in something like AWS S3. Or maybe they use self-hosted object storage with something like MinIO?
Edit: I am partially wrong. (See below)
They’re stored on their host instance. Only text is copied across instances.
That is not true. As long as a user on your instance is subscribed to a community, the media content of posts [Edit: only posts linking to outside sources, e.g. imgur] of that community is stored locally on your instance as well.
This, of course, only applies to media which is uploaded to Lemmy; links to media hosted externally are not downloaded.
See this issue for more context.
Edit: I want to clarify that I was partially wrong. Lemmy only locally caches content which is hosted on outside sites. It does (should?) not cache content that was directly uploaded to a Lemmy instance; for that, it just embeds the source media.
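To make the rule in the edit above concrete, here is a rough Python sketch of the decision as described in this comment. It is not Lemmy's actual code: the function name `should_cache_locally` and the `KNOWN_LEMMY_HOSTS` set are hypothetical, and it assumes an instance can tell whether a media URL points at another Lemmy instance just by looking at the hostname.

```python
from urllib.parse import urlparse

# Hypothetical list of hostnames treated as Lemmy instances (illustrative only).
KNOWN_LEMMY_HOSTS = {"lemmy.ml", "lemmy.world"}


def should_cache_locally(media_url: str, lemmy_hosts: set[str] = KNOWN_LEMMY_HOSTS) -> bool:
    """Return True if the media would be cached locally under the rule above.

    Assumption: media hosted on an outside site (e.g. imgur) is cached,
    while media uploaded directly to a Lemmy instance is only embedded.
    """
    host = urlparse(media_url).hostname
    if host is None:
        # Not an absolute URL, so there is nothing to fetch or cache.
        return False
    return host not in lemmy_hosts


if __name__ == "__main__":
    # The URLs below are made up for illustration.
    print(should_cache_locally("https://i.imgur.com/example.png"))
    print(should_cache_locally("https://lemmy.ml/some/uploaded/image.png"))
```

Under these assumptions the imgur link would be cached and the link to the other instance would only be embedded.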
I think this could be a ticking DoS time bomb.
Someone who manages to spam-upload massive files to the largest Lemmy instances could wipe out a ton of smaller ones.
Not to mention that scalability-wise this seems like a nightmare… eventually the largest Lemmy instances will have petabytes of media data, with hundreds of GB coming in per day, giving other instances no chance to sync with them.
I think the system architecture needs a significant review. This won’t scale.
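For a rough sense of the numbers mentioned above, here is a back-of-the-envelope sketch in Python. The 300 GB/day ingest rate is an assumed stand-in for "hundreds of GB per day", not a measured figure, and the helper names are made up for illustration.

```python
GB = 10**9   # decimal gigabyte, in bytes
PB = 10**15  # decimal petabyte, in bytes
SECONDS_PER_DAY = 86_400


def days_to_reach(target_bytes: float, ingest_bytes_per_day: float) -> float:
    """Days needed to accumulate target_bytes at a constant daily ingest rate."""
    return target_bytes / ingest_bytes_per_day


def sustained_mbit_per_s(bytes_per_day: float) -> float:
    """Average bandwidth (Mbit/s) needed to copy bytes_per_day within one day."""
    return bytes_per_day * 8 / SECONDS_PER_DAY / 10**6


if __name__ == "__main__":
    ingest = 300 * GB  # assumed daily media ingest
    print(f"Days to reach 1 PB: {days_to_reach(1 * PB, ingest):.0f}")
    print(f"Bandwidth to keep up: {sustained_mbit_per_s(ingest):.1f} Mbit/s")
```

With these assumed inputs, it takes about 3,333 days (roughly nine years) to accumulate one petabyte, and an instance mirroring everything would need to sustain about 27.8 Mbit/s on average just to keep up, before counting storage cost or traffic bursts.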