My First Regular Expressions

harsh3466@lemmy.world · edit-2 1 year ago

danc4498@lemmy.world · 1 year ago

I relearn regex from scratch every time I need to use it.

Nis@feddit.dk · 1 year ago

This is the way.

Tetsuo@jlai.lu · edit-2 1 year ago

Good job!

I highly recommend trying out the various online regex editors.

These WYSIWYG-style editors are great because you immediately see what the regex is catching and why.

I took the first one in my search results but try different ones.

https://regex101.com/

Also I used GPT to get some regex for some specific strings and it can be helpful to get a quickstart at building a specific regex.

In that case I was building a regex for a specific log from postfix.

PS: just make sure to select the correct flavor of regex you are using in these online tools.

Edit: Also one of my favorite YT channels has pretty cool videos on RegEx : https://youtu.be/6gddK-cOxYc?si=0bnNkSDzifjdxwjU

virku@lemmy.world · 1 year ago

Wait. Are there flavors of regex? Every time I have to use regex it hurts my brain, and I never need it often enough to actually sit down and learn it properly like OP is doing. Just knowing there are different ways of doing the same things in an already mind-baffling language blows me away even more.

remotelove@lemmy.ca · edit-2 1 year ago

Yeah. The only one you really need to care about (especially under Linux) is PCRE, the good ol' Perl Compatible Regular Expressions. For the most part, every other flavor is a derivative of that. Microsoft had a weird version for a while, but that may be completely dead now, thankfully.

Learning the syntax of regex is fairly easy. Hell, I still have to use this cheat sheet more often now that my perl skills are no longer needed or even relevant.

Regex isn’t that hard. The challenge is identifying and understanding patterns in the data that you are filtering. Here is a brain hack: if you have pages and pages of logs that you need to filter, open up one of the log files, stare at the screen and hold the page down key for several dozen pages. Patterns can be easily seen in the blur of text that is quickly scrolling across the screen. (Our brains love to find patterns in noise, btw.) The patterns that you see will give you focus points for developing regular expressions to match, i.e., you start breaking strings into chunks, and seeing the ebb and flow of data streaming across a screen helps. Anomalies in the data “stream” are easy to spot as well.

From a security and efficiency standpoint, you should also understand where the most processing takes place so you don’t kill whatever platform you are working on.

Sorry for the rambling, but I am getting older and feel the need to pass on a ton of tips and tricks whenever I can for these “archaic” languages.

harsh3466@lemmy.world · 1 year ago

That screen scrolling tip is gold. I’ve often used that trick to spot anomalies in data. Hadn’t considered using it to spot the patterns for regex.

bizdelnick@lemmy.ml · edit-2 1 year ago

The only one you really need to care about (especially under Linux) is PCRE,

Well, no. sed, grep, awk, vi etc. use POSIX regexes. GNU implementations also provide a Perl-compatible mode via an unportable option. In modern programming languages like Go and Rust, the standard regex engines are compatible with RE2, a relatively new dialect developed at Google that is not described in Friedl’s book (you may think of it as an extension of the extended POSIX dialect). Even Raku has its own dialect, incompatible with Perl’s as well as with the others.

Nowadays it is common to move away from Perl-like engines. However, they are still widely used in PCRE-based software and in software written in Python, JS etc.
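To make the flavor difference concrete, here is a minimal sketch, assuming a POSIX-style grep is on the PATH. The sample lines and function names are made up for illustration. It shows one small difference between the basic and extended POSIX dialects; the Perl-compatible mode (grep -P in GNU grep) is left out because it is not portable.

```shell
# Two sample lines: one is only letters, one contains a literal plus sign.
sample() { printf '%s\n' 'aaa' 'a+'; }

# POSIX basic regex (plain grep): "+" is an ordinary character here,
# so this only matches the line containing the literal text "a+".
bre_plus() { sample | grep 'a+'; }

# POSIX extended regex (grep -E): "+" means "one or more of the previous item",
# so every line containing at least one "a" matches.
ere_plus() { sample | grep -E 'a+'; }

bre_plus
ere_plus
```

The same pattern text gives different results depending on the dialect, which is why picking the right flavor in an online tester matters.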

remotelove@lemmy.ca · 1 year ago

POSIX? Never heard of her.

While you are likely 100% correct, the legacy perl developer side of me is making nasty comments to you with illegible syntax.

bizdelnick@lemmy.ml · 1 year ago

Perl introduced powerful backtracking regexes that were widely adopted. However, they can be damn slow in some cases, which is why RE2 avoids backtracking while keeping some Perl-like elements. Both basic and extended POSIX regexes are also non-backtracking because they are older than Perl.

virku@lemmy.world · 1 year ago

Thanks for the comprehensive reply! I have only used it for quite simple things like getting the id’s out of log lines where this and this key word exist. Great tip about pattern searching!

Merry Christmas
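For the "ids out of log lines with certain keywords" case, a rough sketch with sed could look like the following. The log format, the keywords ERROR and timeout, and the function names are all hypothetical, so adjust them to the real logs.

```shell
# Hypothetical log format: "<date> <LEVEL> request id=<digits> <message>"
sample_log() {
  printf '%s\n' \
    '2024-01-01 INFO request id=101 ok' \
    '2024-01-01 ERROR request id=102 timeout' \
    '2024-01-02 ERROR request id=103 refused'
}

# Keep only lines that contain both keywords, then replace the whole
# line with the captured digits after "id=" and print it.
error_timeout_ids() {
  sample_log | sed -En '/ERROR/{/timeout/s/.*id=([0-9]+).*/\1/p;}'
}

error_timeout_ids
```

Only the second sample line contains both keywords, so only its id is printed.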

fuckwit_mcbumcrumble@lemmy.world · 1 year ago

Regex101 is amazing. It tends to balk at backtracking, which we rely on a lot for work, but it’s such a good visual.

ChatGPT can also save a lot of time writing regex, but it tends to write very unreadable regex because it thinks it’s being clever when it really isn’t.

Regex is an art form, and writing readable regex is another step above that.

harsh3466@lemmy.world · 1 year ago

Computerphile! I’ll check those out.

malijaffri@feddit.ch · 1 year ago

Piggybacking onto this to mention my go-to online RegEx editor: RegExr. It lets you test the regex as you type, explains the particular symbols used, as well as has a sidebar where you can see different pattern types categorically. I’ve been using it for almost 2 years now, and haven’t had any reason to use much else (after I discovered this).

harsh3466@lemmy.world · edit-2 1 year ago

Thank you very much. I will definitely check out the regex builders. That’ll be super useful

Edit: fix stupid autocorrect turning regex into Reyes.

bizdelnick@lemmy.ml · edit-2 1 year ago

It is a great book, although a bit outdated. In particular, egrep is no longer recommended nowadays. grep -E is a more portable synonym.

Some notes on your script:

You don’t need to escape slashes in a grep regex. In the sed s/// command it is better to use another delimiter character, like s###, so you can also leave slashes unescaped.
You usually don’t need to pipe grep into sed; sed -n with a regex address and an explicit print command gives the same result as grep.
You could omit the leading slash in your egrep regex, so you won’t need to remove it later.

So I would do the same with

tar -tzvf file.tar.gz | sed -En '/\.(mp4|mkv)$/{s#^.*/##; s#\.\[.*##; s#[^a-zA-Z0-9()&-]# #g; s/ +/ /g; p}'
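As a rough illustration of the notes above, the sketch below feeds made-up file names through the same sed script instead of a real tar listing. The sample paths and function names are assumptions for demonstration only, and a sed that supports -E is assumed.

```shell
# Made-up paths standing in for the output of the tar listing.
sample() {
  printf '%s\n' \
    'movies/Some.Movie.(2019).[1080p].mkv' \
    'movies/notes.txt' \
    'shows/A.Show.&.Tell.[x264].mp4'
}

# Filtering with grep -E ...
with_grep() { sample | grep -E '\.(mp4|mkv)$'; }

# ... and the same filter using only sed: -n suppresses automatic printing,
# the regex address selects lines, and p prints them.
with_sed() { sample | sed -En '/\.(mp4|mkv)$/p'; }

# The full script from the command above: strip the directory part,
# drop everything from ".[" onward, turn other punctuation into spaces,
# collapse runs of spaces, then print.
clean_names() {
  sample | sed -En '/\.(mp4|mkv)$/{s#^.*/##; s#\.\[.*##; s#[^a-zA-Z0-9()&-]# #g; s/ +/ /g; p}'
}

with_grep
with_sed
clean_names
```

Using # as the s command delimiter is what lets the slash in the first substitution stay unescaped.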

ShittyBeatlesFCPres@lemmy.world · 1 year ago

Nice! Learning regular expressions is one of those things where it’s absurd but once you do it, you can solve problems that bedevil whole industries.

harsh3466@lemmy.world · 1 year ago

Thanks!

And it still kinda breaks my brain when I look at an expression. When I just look at it, it looks like utter gibberish, but when I say to myself, “okay, what’s this doing?” and go through it character by character, it turns into something I can comprehend.

FaceDeer@kbin.social · 1 year ago

Just to chip in because I haven’t seen it mentioned yet: I find LLMs like ChatGPT or Microsoft Copilot are really good at making regexes and also at explaining them. So if you’re learning them, or just want to get the darned thing to work so you can go to bed, those are a good resource.

harsh3466@lemmy.world · 1 year ago

You know, I haven’t yet used ChatGPT for anything, I might check it out for this reason.

spittingimage@lemmy.world · 1 year ago

I use it to tell me which page of the Pathfinder 1e manual I should look on for the rules I need.

Trent@lemmy.ml · 1 year ago

Just adding my congrats. Good job, OP. Regex is super useful stuff.

harsh3466@lemmy.world · 1 year ago

Thank you!!!

davel [he/him]@lemmy.ml · 1 year ago

“regex” means “regular expression”, so “regex expression” means “regular expression expression”.

harsh3466@lemmy.world · 1 year ago

Dang! I read through my post three times to make sure I didn’t do that and completely missed that I did it right in the title. (Now fixed).

prowess2956@kbin.social · 1 year ago

I think the most impressive part of this is that your wife cares.

…does she have a sister?

sab@kbin.social · 1 year ago

I’m currently seeing a girl I started dating after she had problems with her regex and I helped her out.

So far so good.

immibis@social.immibis.com · 1 year ago

@sab @prowess2956 @harsh3466 now you have two problems, but you don’t know it yet

harsh3466@lemmy.world · 1 year ago

She does, but I’d stay away from the sister. 🤣

mindlessLump@lemmy.world · 1 year ago

I’ll have to check out this book. Just remember HTML cannot be parsed with regex

bizdelnick@lemmy.ml · edit-2 1 year ago

Well, technically it is possible with a regex dialect that has lookarounds, but it is overcomplicated. There’s really no reason to do it.

Alex@feddit.ro · 1 year ago

Thanks for that link.

Em Adespoton@lemmy.ca · 1 year ago

I highly recommend https://alf.nu/RegexGolf?world=regex&level=r00

harsh3466@lemmy.world · 1 year ago

That looks like a great way to practice

Em Adespoton@lemmy.ca · 1 year ago

It’s definitely a way to get your regex-fu to the next level, especially if you have people to compete against.

harsh3466@lemmy.world · 1 year ago

Oh gosh. There are regex competitions out there, aren’t there.

Em Adespoton@lemmy.ca · 1 year ago

Yup, including for the largest “in production” regular expression….

rustyricotta@lemmy.ml · 1 year ago

I stumbled upon this regex crossword puzzle a while back. I was never good enough to get it, but it seems like it could be fun.

mcepl@lemmy.world · edit-2 1 year ago

Give a man a regular expression and he’ll match a string… teach him to make his own regular expressions and you’ve got a man with problems. – yakugo in http://regex.info/blog/2006-09-15/247#comment-3022 (and yes, it is http:// never https:// for this domain)

harsh3466@lemmy.world · 1 year ago

Guess I’ve got problems!

juli@programming.dev · 1 year ago

That’s cool! Kudos!

My biggest project was to remove leading and trailing whitespaces but I think I failed twice 😅
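For what it is worth, one common way to trim both ends with sed is sketched below. It assumes a sed that supports -E, and the function name trim is made up for the example.

```shell
# Remove leading whitespace, then trailing whitespace, on every input line.
# [[:space:]] is the POSIX character class covering spaces and tabs.
trim() {
  sed -E 's/^[[:space:]]+//; s/[[:space:]]+$//'
}

# Example input with two leading spaces and a trailing space plus tab.
printf '  hello world \t\n' | trim
```

Whitespace inside the line is left alone, since both patterns are anchored to the start or the end of the line.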

harsh3466@lemmy.world · 1 year ago

🤣

I went through about 20 iterations to get all of this to work correctly.

NegativeLookBehind@kbin.social · 1 year ago

Why spend 20 minutes manually changing text in a file, when you can spend 90 minutes figuring out a single RegEx to do it?

harsh3466@lemmy.world · 1 year ago

So much truth here.

Deluxe0293 · 1 year ago

this is definitely something to be proud of. great work, keep it up!

harsh3466@lemmy.world · 1 year ago

Thank you! I plan to, this has given me more motivation!

OpenStars@kbin.social · 1 year ago

Regexps are awesome! And also not at the same time:-P. 🎉 Congrats👏!:-)

harsh3466@lemmy.world · 1 year ago

Thank you!