How do you expect an archive to happen if they are not allowed to archive while it is still up? How are you supposed to track changes or see how the world has shifted? This is a very narrow and, in my opinion, selfish way to view the world.
I don’t want them publishing their archive while it’s up. If they archive but don’t republish while the site exists then there’s less damage.
I support the concept of archiving and screenshotting. I have my own Linkwarden server set up and I use it all the time.
But I don’t republish anything that I archive because that dilutes the value of the original creator.
A couple of good examples are lifehacker.com and lifehack.org. Both sites used to have excellent content. The sites are still up and running, but the first one has turned into a collection of listicles and the second is an ad for an “AI-powered life coach”. All of that old content is gone and is only accessible through the Internet Archive.
In fact, many domains never shut down; they just change owners or change direction.
Again, isn’t that the site’s prerogative?
I think there should at least be a recognized way to opt out that archive.org actually follows. For years they told people to put
User-agent: ia_archiver
Disallow: /
in robots.txt, but they still archived content from those sites. They refuse to publish which IP addresses they pull content from, even though that would be trivial to do. They also refuse to use a User-Agent that you can filter on.
If you want to be a library, be open and honest about it. There’s no need to sneak around.
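For anyone who wants to check what that robots.txt rule means to a crawler that does honor it, here is a minimal sketch using Python's standard `urllib.robotparser`. It says nothing about how archive.org's crawler actually behaves; the `allowed` helper, the `SomeOtherBot` name, and the example.com URL are made up for illustration. One thing it does show: a `Disallow:` line with an empty value permits everything, so the blocking form needs `Disallow: /`.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a compliant crawler with this User-Agent may fetch url.

    Hypothetical helper for illustration; it parses the robots.txt text
    directly instead of downloading it.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Blocks the whole site for the ia_archiver User-Agent only.
BLOCK_IA = "User-agent: ia_archiver\nDisallow: /\n"

# An empty Disallow value means "nothing is disallowed".
EMPTY_DISALLOW = "User-agent: ia_archiver\nDisallow:\n"

URL = "https://example.com/post/1"  # placeholder URL

print(allowed(BLOCK_IA, "ia_archiver", URL))        # blocked agent
print(allowed(BLOCK_IA, "SomeOtherBot", URL))       # agent with no matching rule
print(allowed(EMPTY_DISALLOW, "ia_archiver", URL))  # empty Disallow
```

This only models a crawler that identifies itself honestly and chooses to obey the file. robots.txt is advisory, so it cannot stop a crawler that ignores it or uses a different User-Agent.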