As I was browsing Lemmy and the fediverse at large, this question kept popping into my head.

Multimedia files have a much bigger footprint than raw text, so I worry that, as time goes on, massive resources will be needed to keep up with all the data coming in.

I wonder whether instances have gone the cloud route and put everything in something like AWS S3, or whether they self-host object storage with something like MinIO.

  • laenurd@lemmy.lemist.de
    1 year ago

    This will differ greatly from instance to instance. The people running lemmy.world have published some info on their infrastructure. My instance runs on a fairly small VPS with 100 GB of storage, but I will have to rethink my setup soon, as images and videos from my subscribed communities [Edit: which are stored on outside sites] are eating around a gigabyte per day, and I expect this to increase.

    Edit: I want to clarify that I was partially wrong - Lemmy only locally caches content which is hosted on outside sites (e.g. imgur). It does not cache content that was directly uploaded to another Lemmy instance and just embeds the source media.
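
    For a rough sense of how long a disk like that lasts, here is a minimal sketch. The 100 GB capacity and roughly 1 GB per day come from the figures above; the 40 GB already in use is a made-up number for illustration, and the function name is hypothetical. It assumes growth stays constant, which is probably optimistic.

```python
import math


def days_until_full(capacity_gb: float, used_gb: float, daily_growth_gb: float) -> int:
    """Whole days until the disk is full at a constant daily growth rate."""
    if daily_growth_gb <= 0:
        raise ValueError("daily_growth_gb must be positive")
    # Clamp at zero so an already-full disk reports 0 days rather than a negative number.
    free_gb = max(capacity_gb - used_gb, 0.0)
    return math.floor(free_gb / daily_growth_gb)


# 100 GB and ~1 GB/day are from the comment; 40 GB used is an assumed figure.
print(days_until_full(capacity_gb=100, used_gb=40, daily_growth_gb=1.0))
```

    If the daily growth itself keeps rising, the real runway is shorter than this estimate.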

    • Blurker@programming.dev (OP)
      1 year ago

      Thank you for your work for the community! I think that, as more people use Lemmy, we users should also look out for the infrastructure we rely on, because the admins are not a mega-corporation ready to spin up infinite resources.

      • laenurd@lemmy.lemist.de
        1 year ago

        No need to thank me, currently I am the only non-bot user of my instance and do not allow registrations 😅

        Many of the bigger instances have links to donate to their operators, but I am doubtful that relying solely on donations will be enough in the long run.

        • Nat@apollo.town
          1 year ago

          Since you’re the only one, you might consider setting an expiration on the media so your local storage serves as more of a cache. Like, I’m sure you’re far more likely to revisit a recent thread than a super old one, and as long as the original instance is still around you could redownload the media. This might require software patches though idk
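
          To make the idea concrete, here is a minimal sketch of age-based pruning. It is not a Lemmy or pict-rs feature: the function name, the directory argument, and the 30-day threshold in the usage note are all hypothetical, and it assumes the cached media are plain files whose modification time reflects when they were fetched. Deleting files behind the media server's back could leave its database pointing at missing files, so it defaults to a dry run.

```python
import time
from pathlib import Path


def prune_old_media(cache_dir, max_age_days, dry_run=True, now=None):
    """List files under cache_dir older than max_age_days; delete them unless dry_run."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86_400  # seconds per day

    # Collect first, delete afterwards, so the directory walk is not disturbed.
    expired = sorted(
        path
        for path in Path(cache_dir).rglob("*")
        if path.is_file() and path.stat().st_mtime < cutoff
    )
    if not dry_run:
        for path in expired:
            path.unlink()
    return expired
```

          Calling `prune_old_media(some_dir, 30)` only reports what would be removed; passing `dry_run=False` actually deletes the files.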

    • Blamemeta@lemmy.world
      1 year ago

      Maybe have users use an outside image provider, like imgchest or gfycat or whatever?

  • RCMaehl [Any]@lemmy.world
    1 year ago

    Edit: I am partially wrong. (See below)

    They’re stored on their host instance. Only text is copied across instances.

    • laenurd@lemmy.lemist.de
      1 year ago

      That is not true. As long as a user on your instance is subscribed to a community, the media content of that community's posts [Edit: only posts linking to outside sources, e.g. imgur] is stored locally on your instance as well.

      This, of course, only applies to media which is uploaded to Lemmy; links to media hosted externally are not downloaded.

      See this issue for more context.

      Edit: I want to clarify that I was partially wrong - Lemmy only locally caches content which is hosted on outside sites. It does (should?) not cache content that was directly uploaded to a Lemmy instance and just embeds the source media.

      • Trifictional@lemmy.ca
        1 year ago

        I think this could be a ticking DoS time bomb.

        If someone manages to spam-upload massive files to the largest Lemmy instances, it could wipe out a ton of smaller ones.

        Not to mention that, scalability-wise, this seems like a nightmare… eventually the largest Lemmy instances will have petabytes of media data with hundreds of GB coming in per day, giving other instances no chance to sync with them.

        I think the system architecture needs a significant review. This won’t scale.
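
        To put "hundreds of GB per day" in bandwidth terms, here is a back-of-the-envelope sketch. The function name and the sample volumes are illustrative assumptions, it uses decimal gigabytes, and it assumes an instance really would have to mirror everything, which the edits earlier in this thread suggest is not the case for directly uploaded media.

```python
def sustained_mbit_per_s(gb_per_day: float) -> float:
    """Average bit rate in Mbit/s needed to move gb_per_day decimal gigabytes in 24 hours."""
    bits_per_day = gb_per_day * 1e9 * 8
    return bits_per_day / 86_400 / 1e6  # 86,400 seconds per day


# Sample daily volumes are assumptions, not measurements from any instance.
for gb in (100, 500, 1000):
    print(f"{gb} GB/day ~ {sustained_mbit_per_s(gb):.1f} Mbit/s sustained")
```

        This is only an average over a full day; real traffic is bursty, and storage rather than bandwidth may turn out to be the tighter limit for a small VPS.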