OpenAI now tries to hide that ChatGPT was trained on copyrighted books, including J.K. Rowling’s Harry Potter series::A new research paper laid out ways in which AI developers should try and avoid showing LLMs have been trained on copyrighted material.

  • Technoguyfication@lemmy.ml
    English
    arrow-up
    35
    arrow-down
    15
    ·
    1 year ago

    People are acting like ChatGPT is storing the entire Harry Potter series in its neural net somewhere. It’s not storing or reproducing text in a 1:1 manner from the original material. Certain material, like very popular books, has likely been seen tens of thousands of times during training because of how often it was reposted online (and therefore how often it appeared in the training data).

    Just because it can recite certain passages almost perfectly doesn’t mean it’s redistributing copyrighted books. How many quotes do you know perfectly from books you’ve read before? I would guess quite a few. LLMs are doing the same thing, but on mega steroids with a nearly limitless capacity for information retention.

    • abbotsbury@lemmy.world
      English
      arrow-up
      17
      arrow-down
      8
      ·
      1 year ago

      “but on mega steroids with a nearly limitless capacity for information retention.”

      That sounds like redistributing copyrighted books.

    • Hup!@lemmy.world
      English
      arrow-up
      15
      arrow-down
      7
      ·
      edit-2
      1 year ago

      Nope, people are just acting like ChatGPT is making commercial use of the content. Knowing a quote from a book isn’t copyright infringement. Selling that quote is. Also, the content doesn’t need to be stored 1:1 somewhere for it to be infringement. That misses the point. If you’re making money off a synopsis you wrote based on imperfect memory and in your own words, it’s still copyright infringement until you sign a licensing agreement with JK.

      Now mull that over and tell us what you think about modern copyright laws.

      • Ronath@lemmy.world
        English
        arrow-up
        4
        ·
        1 year ago

        Just adding that, outside of Rowling, who I believe has a different contract than most authors due to the expanded Wizarding World and Pottermore, most authors cannot quote their own novels online, because that would be publishing part of the novel digitally and that’s a right they’ve sold to their publisher. The publisher usually ignores this as it creates hype for the work, but authors are careful not to abuse it.

        • Corkyskog@sh.itjust.works
          English
          arrow-up
          3
          arrow-down
          1
          ·
          1 year ago

          Yeah, I don’t see how that’s true. If that were true, wouldn’t every boardwalk T-shirt shop be sued into oblivion by Nickelodeon over SpongeBob?

    • Teritz@feddit.de
      English
      arrow-up
      8
      arrow-down
      18
      ·
      1 year ago

      Using copyrighted work, art for example, still influences the AI that they make a profit from.

      If they use my works, then they need to pay. That’s it.

      • coheedcollapse@lemmy.world
        English
        arrow-up
        38
        arrow-down
        9
        ·
        1 year ago

        Still kinda blows my mind how like the most socialist people I know (fellow artists) turned super capitalist the second a tool showed like an inkling of potential to impact their bottom line.

        Personally, I’m happy to have my work scraped and permutated by systems that are open to the public. My biggest enemy isn’t the existence of software scraping an open internet, it’s the huge companies who see it as a way to cut us out of the picture.

        If we go all copyright crazy on the models for looking at stuff we’ve already posted openly on the internet, the only companies with access to the tools will be those who already control huge amounts of data.

        I mean, for real, it’s just mind-blowing seeing the entire artistic community pretty much go full-blown “Metallica with the RIAA” after decades of making the “you wouldn’t download a car” joke.

        • Sir_Kevin@lemmy.dbzer0.com
          English
          arrow-up
          16
          arrow-down
          6
          ·
          1 year ago

          Fuckin preach! I feel like I’m surrounded by children that didn’t live through the many other technologies that have come along and changed things. People lost their shit when Photoshop became mainstream, when music started using samples, etc. AI is here to stay. These same people are probably listening to autotuned music all day while they complain on the internet about AI looking at their art.

        • angstylittlecatboy@reddthat.com
          English
          arrow-up
          11
          arrow-down
          5
          ·
          edit-2
          1 year ago

          I feel like a lot of internet people (not even just socialists) go from seeing copyright as at best a compromise that allows the arts to have value under capitalism to treating it like a holy doctrine when the subject of LLMs comes up.

          Like, people who will say “piracy is always okay” will also say “ban AI, period” (and misrepresent organizations that want regulations on its use as wanting a full ban).

          Like, growing up with an internet full of technically illegal content (or grey area at best) like fangames and YouTube Poops made me a lifelong copyright skeptic. It’s outright confusing to me when people take copyright as seriously as this.

        • dx1@lemmy.world
          English
          arrow-up
          8
          arrow-down
          8
          ·
          edit-2
          1 year ago

          Nobody would defend copyright if it wasn’t already in place; it’s a sick idea. They ask us to cut up the field of human knowledge for private benefit. Now they want to destroy a new technology in its name. Greed knows no bounds.

          • Hildegarde@lemmy.world
            English
            arrow-up
            10
            arrow-down
            2
            ·
            1 year ago

            I defend the idea of copyright. The first copyright law was in 1710, to protect authors from the printing press. Without copyright, whoever owned the printing press would sell copies of books with no obligation to pay the author. When copying art is trivial, the artist needs copyright protection in order to make a living creating art.

            There are major problems with modern copyrights. Like all things in capitalism it has been subverted to benefit the rich, but the core idea behind copyright is sound.

            These lawsuits are not to stop the development of generative AI. These lawsuits are to stop the unlicensed use of copyrighted works as AI training data.

            There are AI models that are only trained with licensed data. This doesn’t stop the development of AI.

            Artists should have the right to choose whether their work is used as training data. And they should be compensated fairly for it. That will be the case if these lawsuits succeed.

          • voluble@lemmy.world
            English
            arrow-up
            6
            ·
            1 year ago

            “Nobody would defend copyright if it wasn’t already in place”

            I don’t know about that. Say you take a few years to write a handful of poems, and it turns out people in your neighborhood really like them. You compile the poems into a book and sell it for $5, and it sells well. Seeing this, your neighbor buys one, copies it, and starts selling it one neighborhood over for $2 while representing themself as the author. I would think most people in that situation would want to say, ‘hey, that’s not fair’. I don’t think that’s sick or rooted in greed; copyright can be a check on greed.

          • assassin_aragorn@lemmy.world
            English
            arrow-up
            3
            ·
            edit-2
            1 year ago

            So the people who generate and curate that knowledge don’t deserve to be compensated? Are you going to be a full-time Wikipedia editor then? Or does your “greed know no bounds”?

          • BURN@lemmy.world
            English
            arrow-up
            3
            arrow-down
            1
            ·
            1 year ago

            I defend copyright. The original intent was to protect creators in order to foster more creativity. Most artists will have no incentive to create if their work can be reappropriated by a larger group to leverage it for monetary gain, which is directly being taken from the original creator.

            I’m a photographer. I’ve removed all my pictures from the internet and plan to never post more. I don’t want my work being used to train AI. Right now we have no choice in that matter, so the only option is to no longer share our work.

        • Teritz@feddit.de
          English
          arrow-up
          3
          arrow-down
          4
          ·
          1 year ago

          For a private individual, pirating is no problem, but it’s different when it’s a company that behaves like it owns its neural network 100%.

          Piracy is gonna live as long as services are bad for the average Joe, but these US corps can afford to pay for this.