• 7fb2adfb45bafcc01c80@lemmy.world
    3 hours ago

    Again, isn’t that the site’s prerogative?

    I think there should at least be a recognized way to opt out that archive.org actually follows. For years they told people to put

    User-agent: ia_archiver
    Disallow: /
    
    in robots.txt, but they still archived content from those sites. They refuse to publish the IP addresses they pull content from, even though that would be trivial to do. They refuse to use a User-Agent that you can filter on.
    
    If you want to be a library, be open and honest about it.  There's no need to sneak around.
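
    For anyone who wants to check what that rule actually says, here is a rough sketch using Python's standard urllib.robotparser. The helper name is_allowed and the example.com URL are made up for illustration, and this only shows how a compliant parser reads the rule; whether a given crawler honors it is a separate question.

```python
from urllib.robotparser import RobotFileParser

# The opt-out rule quoted above.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a compliant parser would let user_agent fetch url.

    Hypothetical helper for illustration; it parses the rules from a
    string instead of fetching a live robots.txt.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# example.com is a placeholder URL, not a site from the discussion.
blocked = is_allowed(ROBOTS_TXT, "ia_archiver", "https://example.com/page")
others = is_allowed(ROBOTS_TXT, "SomeOtherBot", "https://example.com/page")
print(blocked, others)
```

    Note that the slash matters: a bare "Disallow:" with nothing after it means nothing is disallowed, so the rule only works as an opt-out with "Disallow: /".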