Had this thought the other day and tbh it’s horrifying to think about the implications of one, or God forbid all, of them going down.
Stackoverflow too but that only applies to nerds haha
One of those is not a non-profit foundation, and that’s a Problem.
And that one is not really comparable to the library of Alexandria.
I was thinking about how much human effort has gone into making instructional videos on how to do things, and how all that content exists almost solely in the hands of Alphabet Corp
I think it’s a bit ironic that Wikipedia hasn’t succumbed to the modern era of misinformation the way other information sources have, particularly given all the warnings about it over the years. Not saying those warnings aren’t warranted, just that the way things have played out runs counter to those expectations.
There’s an obvious reason for that. Wikipedia is owned by a nonprofit foundation and does not accept advertising.
There are people who watch the most popular articles, it’s not really misinformation.
Wikipedia essentially can’t be destroyed without a global catastrophe that would mean we have way worse problems. Wikipedia is downloadable. Meaning the ENTIRE Wikipedia. And so there are many copies of it stored all around the planet.
If you have an extra 150 GB of space available then you can download a personal copy for yourself
https://www.howtogeek.com/260023/how-to-download-wikipedia-for-offline-at-your-fingertips-reading/
https://en.wikipedia.org/wiki/Wikipedia:Size_of_Wikipedia
It’s under 25 GB, too.
But that’s just for the text version without media files
25gb of text is a lot dang!
I assume that contains all the different languages. So most articles will repeat the same information like 10 times or whatever for all the different common languages. Still a huge amount of text though!
Nope, 25 GB is just English-language Wikipedia compressed, no images. All the other languages are smaller.
Ahh compressed so it’s like… a lot times a lot of text
Is that compressed? I assume they let you download zip files?
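They’re bz2-compressed XML dumps from dumps.wikimedia.org. A minimal sketch of grabbing one, assuming the standard “latest” dump layout (the exact file name can change between dump runs):

```python
# Minimal sketch: check the size of the latest English Wikipedia text-only dump
# (no images), then stream it to disk. Assumes the standard dumps.wikimedia.org
# layout; the file name may differ between dump runs.
import requests

DUMP_URL = ("https://dumps.wikimedia.org/enwiki/latest/"
            "enwiki-latest-pages-articles.xml.bz2")

# HEAD request first, so you can see the ~20-25 GB compressed size before committing.
head = requests.head(DUMP_URL, allow_redirects=True)
size = head.headers.get("Content-Length")
if size:
    print(f"Compressed dump is about {int(size) / 1e9:.1f} GB")

# Stream the download in chunks so it never has to fit in memory.
with requests.get(DUMP_URL, stream=True) as resp, open("enwiki.xml.bz2", "wb") as f:
    resp.raise_for_status()
    for chunk in resp.iter_content(chunk_size=1 << 20):
        f.write(chunk)
```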
With scraping, you can fully download YouTube, too.
You just need an additional 10 EB of storage space, millions of different IP addresses, a law firm to defend against Alphabet, lots of time and energy, …
I think i have an old thumb drive with 10 exabytes free on it
You left that in your other pajamas, Professor Farnsworth.
Alexandria was important in its time, but in terms of the volume and quality of information we keep on Wikipedia alone, it is a mosquito in the Taj Mahal.
You can’t rely on YouTube videos staying up over time.
Better download whatever you might want to look up again
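yt-dlp is the usual tool for that. A rough sketch using its Python API (the video URL below is just a placeholder):

```python
# Rough sketch: keep local copies of videos you care about with yt-dlp.
# pip install yt-dlp
import yt_dlp

urls = ["https://www.youtube.com/watch?v=EXAMPLE_ID"]  # placeholder URL

options = {
    "outtmpl": "%(uploader)s/%(title)s [%(id)s].%(ext)s",  # readable file names
    "download_archive": "downloaded.txt",  # skip anything already fetched
    "writesubtitles": True,                # keep captions too, where available
}

with yt_dlp.YoutubeDL(options) as ydl:
    ydl.download(urls)
```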
Can’t count on the library of Alexandria staying up over time either
I think we also overestimate the value of what would have been at Alexandria.
Considering everything would have been hand copied/transcribed back then, and how expensive that would have been, the selection bias would be massive.
I doubt it could compare to Wikipedia.
Add Wikibooks https://en.m.wikibooks.org/wiki/Main_Page
Libretexts https://commons.libretexts.org/
And Openstax https://openstax.org/subjects
wikibooks is cool, had no idea that existed. I’m sure next time I get curious at 3am I’ll end up there reading about the history of ‘vectors’ or some other random stuff lol
There was a video I saw (I think it was Hank or John Green) where they talked about the implications of Twitter being deleted at the start of the Elon era. They pulled out a joke book they’d bought of “1000 twitter posts” and said it would be the only recorded proof they (personally) had of what Twitter was.
It’s terrifying thinking of just how much information is being put in the hands of companies that don’t care, or is sitting on old hard drives about to give out for lack of funding. I wish there was a way to back up a random part of the information automatically, like an “I’ll give you a terabyte of backup, make the most of it” system that automatically chooses what isn’t backed up already.
Add Reddit too: the amount of times I’ve searched a question, waded through 2024 website crap, then gone back to the search, added “site:reddit” in DuckDuckGo, and got an answer instantly.
The heroes at !datahoarder@lemmy.ml have got our backs, y’all.
.ml
ಠ_ಠ
There ought to be a decentralized archive of YT. …and the Archive
The problem with YouTube is the sheer amount of storage required. Just going by the 10 Exabyte figure mentioned elsewhere in the thread, there are about 25,000 fediverse servers across all services in total IIRC, so even if you evenly split that 10EB across all of them, they would still need 400TB each just to cover what we have today.
Famously YouTube needs a petabyte of fresh storage every day, so each of those servers would need to be able to accept an additional 40GB a day.
Realistically though, any kind of decentralised archive wouldn’t start with 25,000 servers, so the operational needs are going to be significantly higher in reality
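For anyone who wants to play with the numbers, the back-of-the-envelope math (using the rough 10 EB and 1 PB/day figures from above):

```python
# Back-of-the-envelope figures from the thread: ~10 EB total, ~1 PB of new
# uploads per day, spread evenly over ~25,000 fediverse servers.
TOTAL_BYTES = 10e18   # 10 EB
DAILY_BYTES = 1e15    # 1 PB per day
SERVERS = 25_000

per_server = TOTAL_BYTES / SERVERS        # 4e14 bytes = 400 TB each
per_server_daily = DAILY_BYTES / SERVERS  # 4e10 bytes = 40 GB each per day

print(f"Each server stores ~{per_server / 1e12:.0f} TB "
      f"and takes in ~{per_server_daily / 1e9:.0f} GB per day")
```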
I know it’s totally subjective, but I wonder how much “non-trash” YouTube is uploaded each day?
Hmm good point. If this was to be anywhere near realistic, there would need to be a way to triage videos by whether they are actually worth archiving
I wish that the Internet Archive would focus on allowing the public to store data. Distribute the network over the world.
In theory this could be true. In practice, data would be ripe for poisoning. It’s like the idea of turning every router into a last mile CDN with a 20TB hard drive.
Then you have to think about security and not letting the data change from what was originally given. Idk. I’m sure something is possible, but without a real ‘oomph’ nothing big happens.
The data would be hashed so any changes would be thrown out.
Hashed by whom? Who has the source of truth for the hashes? How would you prevent it from being poisoned? … or are you saying a non-distributed (centralized) hash store?
If centralized: you have a similar problem to IA today. If not centralized: How would you prevent poisoning? If enough distributed nodes say different things, the truth can be lost.
This is a topic that is pretty well tested. Basically the data is validated when received.
For instance in IPFS data is tracked by its hash. You request something by a CID which is just a hash.
There are other distributed networks and they all have their own ways of protecting against attacks. Usually an attack requires a huge amount of resources.
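The core idea in miniature, using plain SHA-256 rather than IPFS’s actual CID format (which wraps the hash in multihash/multibase encoding):

```python
# Content-addressing sketch: the "address" of a blob is the hash of its bytes,
# so an untrusted peer can't silently substitute different data.
import hashlib

def content_id(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def fetch_and_verify(expected_id: str, fetch) -> bytes:
    """fetch() is whatever untrusted peer/transport hands you the bytes."""
    data = fetch()
    if content_id(data) != expected_id:
        raise ValueError("data does not match its content ID, discarding")
    return data

# Usage: publish the ID, then anyone can validate whatever a peer returns.
original = b"some archived page"
cid = content_id(original)
assert fetch_and_verify(cid, lambda: original) == original
```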
Even in IPFS, I don’t understand discoverability. Sort of sounds like it still needs a centralized list mapping metadata to content IDs, etc.
Nah, that’s the easy part. Checksum technology has been around for many decades
Huh? The public can store data on IA just fine. I’ve uploaded dozens of public-domain books there.
If we’re going to stick to ancient Greek references, one of these is closer to the modern day Augean stables.
AnnasArchive.org is good at backing up knowledge on a large scale. They also have torrents to spread it around a bit.
One of them isn’t like the others.