Reposted from c/politics since it violated their rule about needing to have a link:
Now that the fascists have taken over, what books, academic studies, and pieces of knowledge should take priority in personal/private archival? I’m thinking about what happened in Nazi Germany, especially the burning of the Institute for Sexual Science (Institut für Sexualwissenschaft) and what was lost completely in the burnings.
Some of us should consider saving stuff digitally or physically. Redundancies will help preserve stuff.
Samesies…
It’s very difficult, though. Sourcing material is difficult enough; archiving it and making it actually useful and valuable is even more so. It takes a lot of intelligent processing.
LLMs can reduce that effort a lot on the searching side, but that’s very expensive, either in hardware or in API costs. And either way, it would likely take a team to pull off.
Honestly I think the hardest part is identifying it and locating it.
Probably need to start a community around it. People link to stuff. Paywalls would have to be dealt with; in the vast majority of cases that’s not too difficult. Now, it does start getting less available around video sources. IP restrictions will be a pain in the ass.
Recording the video is messy. yt-dlp could work for some things, but honestly, with the level of what’s coming, I’d be afraid of leaving fingerprints on anything. Probably throwing a full-screen player up in a 1080p window and using OBS on it would be the safest. Could probably get away with using an Elgato to capture the HDMI signal up to 4K.
Speech-to-text models are light enough to run on a Raspberry Pi. Their output will need to be vetted, since they’re not highly accurate. Captioning is a great community task.
Organizing and indexing the captions: there’s no shortage of free database software. I’d probably start with SQLite to keep things portable and fluid, moving to MariaDB or Postgres when things get too slow. But then we’re going to have to host it. Anonymous hosting is a completely different ball of wax.
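For what it’s worth, here’s a rough sketch of what that SQLite starting point could look like, using only Python’s built-in sqlite3 module. The table layout and the names (captions, source, start_sec, open_index, add_caption, search) are my own assumptions for illustration, not anything settled, and the search is a plain LIKE match rather than a proper full-text index.

```python
import sqlite3


def open_index(path=":memory:"):
    """Open (or create) a caption index. Schema is a hypothetical starting point."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS captions (
               id        INTEGER PRIMARY KEY,
               source    TEXT NOT NULL,   -- file name or label of the recording
               start_sec REAL NOT NULL,   -- where the caption starts, in seconds
               text      TEXT NOT NULL    -- the caption text itself
           )"""
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_captions_source ON captions(source)"
    )
    return conn


def add_caption(conn, source, start_sec, text):
    """Store one caption line for a given recording."""
    conn.execute(
        "INSERT INTO captions (source, start_sec, text) VALUES (?, ?, ?)",
        (source, start_sec, text),
    )
    conn.commit()


def search(conn, term):
    """Return (source, start_sec, text) rows whose text contains the term."""
    rows = conn.execute(
        "SELECT source, start_sec, text FROM captions "
        "WHERE text LIKE ? ORDER BY source, start_sec",
        (f"%{term}%",),
    )
    return rows.fetchall()
```

Pass a file path to open_index instead of the in-memory default and the whole index is a single file you can copy around, which is the portability I was getting at.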
Storing the data could get out of hand quickly. It’s trivial enough to buy a single 20 TB hard drive and store more than we’d need for years. But then hosting it anonymously would be difficult, to say the least. Even the markers of these conversations would be traceable enough for us to be located. Paying for a node private enough to be safe is going to be pricey over time.
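To put a rough number on the 20 TB drive, here’s a back-of-the-envelope calculation. The 5 Mbps bitrate is purely my assumption for a 1080p recording; actual captures vary a lot, so treat the result as an order of magnitude, not a measurement.

```python
def hours_of_video(capacity_tb, bitrate_mbps):
    """Estimate hours of video that fit on a drive.

    capacity_tb:  drive size in decimal terabytes (1 TB = 1e12 bytes)
    bitrate_mbps: assumed average video bitrate in megabits per second
    """
    capacity_bits = capacity_tb * 1e12 * 8
    seconds = capacity_bits / (bitrate_mbps * 1e6)
    return seconds / 3600


# Assumed 5 Mbps average for 1080p on a 20 TB drive
print(round(hours_of_video(20, 5)))
```

Under that assumption it comes out to several thousand hours of footage on one drive, before counting any redundant copies.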
I’m sure archive.org would take it, but honestly I wouldn’t put $5 on them surviving a couple of years into the new administration. What they’re doing is too inconvenient to too many corporations with deep pockets.
IPFS would work, well about as well as it works anyway, but that’s the opposite of anonymous.
Edit: come to think of it, it would be a hoot to run it on the federated short-video platform. Just keep the database somewhere else. Again, not anonymous enough for my tastes, but it would add a little bit of fun to the project.