Any good options to prune out similar-but-variable bits of text from a large document?

Sterile_Technique@lemmy.world · edited 2 years ago

Any good options to prune out similar-but-variable bits of text from a large document?

LANA_DEL_KARENINA@lemmy.world · 2 years ago

The good news: there is a tool built to solve this exact problem, regular expressions (aka regex). A regex is a pattern that describes text which is mostly the same but varies in places, like a page footer where only the number changes.

The bad news: regular expressions are famously frustrating to read and write.

Depending on how badly you want the problem solved and how patient you are, using online resources to craft a few regular expressions would be the ticket.
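To give a feel for what that looks like, here is a minimal sketch in Python. The sample text and the "Page N of M" footer are made up for illustration; your document's repeated bits will need their own pattern.

```python
import re

# Hypothetical sample: the footer is "similar but variable" because the
# page number changes on every occurrence.
text = "Chapter one.\nPage 12 of 340\nMore text.\nPage 13 of 340\nThe end.\n"

# \d+ means "one or more digits". With re.MULTILINE, ^ matches at the
# start of every line, so only whole footer lines are removed.
footer = re.compile(r"^Page \d+ of \d+\n", re.MULTILINE)

# Replace every match with nothing, i.e. delete it.
cleaned = footer.sub("", text)
print(cleaned)
```

The same idea carries over to most regex-capable editors: one pattern stands in for every variation of the footer, instead of one search per page number.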

Sterile_Technique@lemmy.world · 2 years ago

hmmm “famously frustrating”, presumably to people who know what they’re doing, very likely translates to “WAY outside of my skill level”. Worth some digging though, especially now that I have a keyword! Thank you!!

BearOfaTime@lemm.ee · 2 years ago

There are regex tutorials online, and online testers where you can try your regex against sample text before running it on the real document.

I’d say, since you’re learning, this could be an opportunity that may be useful later.

Just start with one relatively simple thing, like maybe copyright stuff. Work on getting regex to match that properly throughout a doc, and enjoy the improvement. Then when ready, tackle the next thing.
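As a rough sketch of that first step, assuming the copyright notices each sit on their own line and look something like "© 2019 ..." or "Copyright 2021 ..." (the function name and sample text here are hypothetical, so adjust the pattern to whatever your doc actually contains):

```python
import re

def strip_copyright_lines(doc):
    """Remove whole lines that contain a copyright marker followed by a year.

    Returns (cleaned_text, number_of_lines_removed). This is an illustrative
    pattern, not a complete one: it would also remove a body line that
    happens to contain something like "copyright 1999".
    """
    pattern = re.compile(
        r"^.*(?:©|\(c\)|copyright)\s*\d{4}.*\n?",
        re.IGNORECASE | re.MULTILINE,
    )
    # subn works like sub but also reports how many replacements were made.
    return pattern.subn("", doc)

sample = (
    "Intro text.\n"
    "© 2019 Example Press. All rights reserved.\n"
    "Body text.\n"
    "Copyright 2021 Example Press\n"
    "Closing text.\n"
)

cleaned_doc, removed = strip_copyright_lines(sample)
print(removed)
print(cleaned_doc)
```

Checking the count of removed lines against what you expect is a cheap way to catch a pattern that matches too much or too little before you trust it on the full document.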

Sterile_Technique@lemmy.world · edited 1 year ago

I wish I had asked this sooner. I don’t really know any code at all, but this might be the thing that pushes me to learn some. This looks crazy useful. Time is the enemy right now though - I’ve only got a few free evenings left before class starts, and I don’t trust that I’d know it well enough not to shoot myself in the foot.

When the next break rolls around though, I think regex will be my project. Any foundation you’d recommend learning first? From the bit of searching I’ve done, regex seems to feed straight into conversations about Python or Java - I don’t know any of that. Would it even make sense to try to learn regex without first knowing the basics of a coding language?

I did manage to fine-tune MS Word’s find and replace commands… I’ve got a list of 10 or so find-and-replace searches that do close-enough-for-now to what I want.

otter@lemmy.dbzer0.com · 2 years ago

IMHO, this is one of the applications wherein “AI” in its current form can really shine. Even the low monthly cost of GPT could be worth it if only to be able to train your own bot on specifics for your own use-case. Hell, there might even be one already made that’s close enough? If you’d like me to give a quick look, LMK. 🖖🏽