A comprehensive guide to the dangers of Regular Expressions in JavaScript

philnash@programming.dev · 1 year ago

Alexstarfire@lemmy.world · 1 year ago

Am I the only one shocked to learn that to find something at the end of a string it starts at the beginning? Perhaps it’s because of the simplicity of the example but I expected it to start at the end.
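To make the surprise concrete, here is a rough sketch that counts how many characters get inspected when looking for a trailing run of whitespace, once by trying every start position from the left and once by walking backwards from the end. Both `scanFromStart` and `scanFromEnd` are hypothetical helpers written for this illustration. They are a simplified model of the behaviour, not the code of any real regex engine.

```javascript
// Simplified model (an assumption, not how any real engine is written):
// try every start position from the left and count inspected characters
// while looking for a whitespace run that reaches the end of the input.
function scanFromStart(input) {
  let inspected = 0;
  for (let start = 0; start < input.length; start++) {
    let end = start;
    while (end < input.length) {
      inspected++;
      if (!/\s/.test(input[end])) break;
      end++;
    }
    if (end > start && end === input.length) {
      return { index: start, inspected };
    }
  }
  return { index: -1, inspected };
}

// The approach the comment expected: walk backwards from the end.
function scanFromEnd(input) {
  let inspected = 0;
  let index = input.length;
  while (index > 0) {
    inspected++;
    if (!/\s/.test(input[index - 1])) break;
    index--;
  }
  // No trailing whitespace at all means there is nothing to report.
  return { index: index === input.length ? -1 : index, inspected };
}

// 1000 spaces followed by a non-space character: no trailing whitespace.
const tricky = " ".repeat(1000) + "x";
console.log(scanFromStart(tricky).inspected); // 501501
console.log(scanFromEnd(tricky).inspected);   // 1
```

In this model the left-to-right search does work that grows quadratically with the length of the whitespace run, while the backwards walk stops after one character. A real backtracking engine does extra work inside each attempt, so treat the counts as an illustration of the growth rather than a measurement.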

cgtjsiwy@programming.dev · 1 year ago

Regular expressions are great and can always be matched in linear time with respect to the input string length.

The problem is that the RegExps in the JS standard library aren’t actually regular expressions, but a much broader language for which no efficient general matching algorithm is known. If RegExp switched to proper regular expressions, matching would be much faster, but supporting backreferences like /(.*)x\1/ would be impossible.
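For anyone unfamiliar with backreferences, a short sketch of what the pattern above does in JavaScript. The `anchored` variant and the sample strings are additions for illustration and do not come from the comment.

```javascript
// \1 must repeat exactly the text captured by the first group.
const original = /(.*)x\1/;
const match = original.exec("abcxabc");
console.log(match[0]); // abcxabc
console.log(match[1]); // abc

// Unanchored, the group may capture the empty string, so any input
// that contains an "x" matches.
console.log(original.test("abcxabd")); // true

// Anchored variant (an illustrative addition): the whole input must be
// some text, an "x", then the same text again.
const anchored = /^(.*)x\1$/;
console.log(anchored.test("abcxabc")); // true
console.log(anchored.test("abcxabd")); // false
```

The engine has to remember what the group captured and compare it later in the input, which is the part that a finite automaton cannot do in general.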

jeffhykin@lemm.ee · 1 year ago

This is why we need regex licenses https://regexlicensing.org/

/s

philnash@programming.dev · 1 year ago

That’s brilliant!

recursive_recursion [they/them]@programming.dev · edit-2 1 year ago

Although I haven’t fully read this article, feel free to crosspost in:

Programming.dev - Regex

philnash@programming.dev · 1 year ago

Ah, I didn’t realise there was a regex channel here. Thanks!

sebsch@discuss.tchncs.de · 1 year ago

Is there one thing not screwed up in this language? I mean it’s regex, there are so many good implementations for it.

philnash@programming.dev · 1 year ago

JavaScript’s regex engine isn’t the only one to have these problems. There certainly are other implementations, like RE2 and Rust’s regex implementation, that don’t have this issue. But they also lack some of the features of the JS implementation.

sebsch@discuss.tchncs.de · 1 year ago

Ok thanks for the clarification.

I would argue the gold standard of regex would be perlre, or even re from Python. I’ve never heard anyone discourage using them. Do you know something I don’t?

burntsushi@programming.dev · 1 year ago

Both Perl and Python use backtracking regex engines and are thus susceptible to similar problems as discussed in the OP.
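A rough way to see the problem on any backtracking engine, sketched in JavaScript to stay in the thread's language. The pattern `/^(a+)+$/` is a commonly cited example of nested quantifiers and is not taken from the article, and `timeFailedMatch` is a hypothetical helper. Timings depend on the engine and the machine, so no expected numbers are given.

```javascript
// Nested quantifiers: a classic trigger for catastrophic backtracking.
const nested = /^(a+)+$/;

// Hypothetical helper: time one failed match against n "a"s followed by a "b".
function timeFailedMatch(n) {
  const input = "a".repeat(n) + "b";
  const started = performance.now();
  const matched = nested.test(input);
  return { n, matched, ms: performance.now() - started };
}

// Sizes are kept small on purpose; much larger values of n can hang the process.
for (const n of [10, 14, 18, 22]) {
  console.log(timeFailedMatch(n));
}
```

On a backtracking engine the time should grow roughly exponentially with `n`, because the trailing `b` forces the engine to try every way of splitting the run of `a`s between the inner and outer `+` before it can report failure.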