Regex pattern that excludes whitespaces

andi · April 13, 2024, 7:41am

Hello, I’m using this regex pattern to extract this text successfully:
Regex Pattern: tailored_bullet_point_1(.+?)explanation_1

Text: tailored_bullet_point_1": "Collaborated seamlessly across teams, ensuring detailed integration of academic and social-emotional aspects into comprehensive learning plans, showcasing @strong communication@ and commitment to personalized experiences for stakeholders.",\n "explanation_1\

There are 4 whitespace characters after the \n at the end of the text, right before "explanation_1. The number of whitespace characters can vary, but they’re always in the same place (between the line break and "explanation_1).

I’ve been experimenting and haven’t been able to figure out how to adapt the regex pattern to exclude whitespaces after the \n and before the "explanation_1\

Any suggestions on what to try are greatly appreciated!

vladimir.pak · April 13, 2024, 8:28am

try
tailored_bullet_point_1(.+?)[\x20\xa0]*explanation_1
Include other white space chars, if they might be present in your texts.

andi · April 13, 2024, 9:01am

Thank you but my goal is to EXCLUDE the white spaces.

I’m trying variations of tailored_bullet_point_1(.+?)\s*explanation_1
where \s* matches zero or more whitespace characters (spaces, tabs, newlines). When used after a capturing group, it should keep any whitespace immediately after the captured content out of that group.
So I’m not sure why it’s not working.
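
One way to see what the group really captures is to run the pattern outside Bubble. The TypeScript sketch below is only an illustration: the sample string is shortened and invented, and it assumes the \n in the API text is a literal backslash followed by n, with four spaces after it. It suggests that the double quote sitting between the spaces and explanation_1 is what stops \s* from absorbing them.

```typescript
// Shortened, invented sample. Assumption: "\n" is a literal backslash + n,
// followed by 4 spaces and then the double quote before explanation_1.
const text = 'tailored_bullet_point_1": "Sample bullet.",\\n    "explanation_1';

// Pattern from the post: \s* sits directly before explanation_1.
const withoutQuote = /tailored_bullet_point_1(.+?)\s*explanation_1/;
// Variant that also accounts for the double quote before explanation_1.
const withQuote = /tailored_bullet_point_1(.+?)\s*"explanation_1/;

const groupWithoutQuote = text.match(withoutQuote)?.[1] ?? "";
const groupWithQuote = text.match(withQuote)?.[1] ?? "";

// The first group still ends with the 4 spaces and the quote,
// the second one stops before the spaces.
console.log(JSON.stringify(groupWithoutQuote));
console.log(JSON.stringify(groupWithQuote));
```

Note that even in the second case the group still ends with the literal \n characters, and this only inspects the capture group, not the whole match.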

vladimir.pak · April 13, 2024, 9:50am

It works well. Consider explaining your setup in more detail.

andi · April 13, 2024, 8:02pm

Hello Vladimir, thanks for your willingness to help out. I think this image explains it better. I have applied the regex you graciously provided and I’m still getting the whitespace that I want to exclude (see green underline):

The whitespace comes after the \n. I’ve tried multiple ways to account for it in my patterns. In tools like regexr they capture the group, but they do not exclude the whitespace between the capture group, which always ends with “.”, and “explanation_1”.

Here’s the original text as returned by the API
tailored_bullet_point_1": "Collaborated effectively across teams to enhance stakeholder experiences through seamless integration of academic, social-emotional, and logistical aspects, showcasing exceptional @communication@ skills and attention to personalized learning plans.",\n "explanation_1

Here is the pattern that I think should work but does not:
tailored_bullet_point_1(.+?)\s*[\s\S]*?explanation_1

Here’s the ChatGPT interpretation of the pattern, which is exactly what I’m looking for, and again it captures the group successfully in regexr:
In this modified pattern:

[\s\S]*? matches any whitespace or non-whitespace character (including line breaks) zero or more times, lazily.
This ensures that any whitespace characters, including line breaks, between the end of the capture group and “explanation_1” are included in the match, and will be effectively eliminated.

vladimir.pak · April 14, 2024, 9:51am

Extract with regex returns the portion that matches the whole expression, not the group inside it, so you need to use positive look-ahead and look-behind assertions.
And AFAIK (I could be wrong here), the newline modifier is enabled by default, so you might need to transform the original text by escaping newlines, then extract and un-escape afterwards.
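
The difference between the whole match and the capture group can be reproduced in plain TypeScript, which uses the JavaScript regex engine. This is only a sketch: it takes the description above (the operator returns the full match) as given, the sample string is invented, and the line break is assumed to be a real newline followed by four spaces.

```typescript
// Invented sample. Assumption: the line break is a real newline + 4 spaces.
const sample = 'tailored_bullet_point_1": "Sample bullet.",\n    "explanation_1';

// With a capture group, the whole match (index 0) still spans both anchor words.
const grouped = sample.match(/tailored_bullet_point_1(.+?)\s*"explanation_1/);
const wholeMatch = grouped?.[0] ?? "";
const groupOnly = grouped?.[1] ?? "";

// With look-behind and look-ahead the anchors are checked but not consumed,
// so the whole match is just the text between them.
const lookaround = sample.match(/(?<=tailored_bullet_point_1).+?(?=\s*"explanation_1)/);
const lookaroundMatch = lookaround?.[0] ?? "";

console.log(JSON.stringify(wholeMatch));      // includes both anchors and the whitespace
console.log(JSON.stringify(lookaroundMatch)); // only the text between the anchors
console.log(groupOnly === lookaroundMatch);
```

A tool that only hands back index 0 would therefore need the look-around form to get the same text the capture group holds.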

vladimir.pak · April 14, 2024, 10:16am

This one works within Extract with Regex with your text, but I still didn’t get whether I should decode \n and replace it with a new line or whether it’s part of the text as it is. I chose the former.
Note that if there’s a line break within the value text Collaborated ... plans., the pattern will fail. I remember there’s some trick like the one I described in my post above, but I don’t remember the cause. It doesn’t seem to be about the newline modifier, but I forgot what it is. Check the Bubble docs about this operator, or maybe there are some related posts here in the forum.

P.S.: got it, use ((.|\s)+?) instead of simply (.+?) as dotAll flag is not enabled by default, so you finally get: (?<=^\s*tailored_bullet_point_1)((.|\s)+?)(?=\s*\"explanation_1\s*$)
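
To illustrate the dotAll point, here is a small TypeScript sketch. The sample is invented and contains a real line break inside the value. It assumes the two anchors sit at the very start and end of the input, as the ^ and $ in the pattern require. The [\s\S] variant is a common alternative to (.|\s) and is not something tested in Bubble in this thread.

```typescript
// Invented sample: the value itself contains a real line break.
const input = 'tailored_bullet_point_1": "First line,\nsecond line.",\n    "explanation_1';

// Plain dot: "." does not match a newline without dotAll, so nothing is found.
const plainDot = /(?<=^\s*tailored_bullet_point_1)(.+?)(?=\s*"explanation_1\s*$)/;
// Pattern from the post: (.|\s) lets the group cross line breaks.
const dotOrSpace = /(?<=^\s*tailored_bullet_point_1)((.|\s)+?)(?=\s*"explanation_1\s*$)/;
// Alternative spelling of "any character, including newlines".
const anyChar = /(?<=^\s*tailored_bullet_point_1)([\s\S]+?)(?=\s*"explanation_1\s*$)/;

const plainResult = input.match(plainDot);
const dotOrSpaceResult = input.match(dotOrSpace)?.[0] ?? "";
const anyCharResult = input.match(anyChar)?.[0] ?? "";

console.log(plainResult === null);              // the plain-dot pattern finds nothing
console.log(JSON.stringify(dotOrSpaceResult));  // value text without trailing whitespace
console.log(dotOrSpaceResult === anyCharResult);
```

Because the group is lazy and the look-ahead begins with \s*, the match stops at the last non-whitespace character before "explanation_1.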

andi · April 15, 2024, 5:52am

Vladimir! Thank you very much. This was extremely helpful to me and I learned a lot. You’re a champion! Thank you!!!

vladimir.pak · April 15, 2024, 8:27pm

No problem, you’re welcome.
