Regex - how to extract a list of items from a numbered list

Any Regex expert here who can help with this?
Thanks!

EXAMPLE TEXT:

aaa
aaaaaa

  1. bbb
  2. ccc
    cccccc

LIST OF 3 ITEMS:

aaa
aaaaaa

bbb

ccc
cccccc

What are you asking? Your example text has no explanation. I understand you want to extract values, but how do you want to extract them? What is the original text, and what should the output be?

It looks like you might simply want to remove the number, the full stop, and the space…

So, for example:

  1. aaa

Should be extracted as

aaa

Is that what you are trying to achieve?

Yes, the text is a numbered list and I want to extract items from it, sans the numbering.

  • We can assume each numbering starts on a new line and is followed by a full stop (e.g. "1.", "2.", etc.).
  • An item may span one or more lines.

Example:

  1. aaa
  2. bbb
    bbbbbb

Should be extracted as a list of 2 items:

aaa

bbb
bbbbbb

So there is the possibility that not every line will have a number and a full stop in front of it?

When a single item has multiple lines, is it clear that a line missing the number and full stop is a sub-item of the numbered item above it (i.e. bbbbbb is item number two of item number two of the original list)?

Do the entries themselves ever have a number or a full stop or are they always text characters only?

Basically, what I’m trying to do is split the text, with the delimiters as an integer on a new line that is followed by a full stop, i.e. "\nN." where \n is a new line and N is an integer.

Yes, not every line will start with a number and full stop. A line without them just continues the current item, so there’s no need to split the text there.

There can be multiple lines in a single item, so there’s no need to split the text.
e.g.

1.aaa
aaaaaa

should just return as one item:

aaa
aaaaaa

There is no nested list, only 1 list of items.

The items themselves may contain numbers and full stops, but they would not be split here because the preceding character is not a new line.

Sorry if I’m making it sound more complicated than it is, but intuitively what I’m trying to do is simply this - split a numbered list into a list of items. It would have been pretty straightforward in code, but I guess regex is my only option here? Let me know if there are other options I’m missing out on. Thanks!
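For reference, the splitting rule described above (an integer at the start of a line, followed by a full stop) can be sketched in Python. This is only a minimal sketch under stated assumptions: the function name `split_numbered_list` is made up for illustration, leading indentation before a number is allowed, and any text before the first number is kept as its own item, as in the 3-item example at the top. The same pattern should carry over to most regex engines that support a multiline flag, but that is not verified here.

```python
import re

# A marker is: start of a line, optional indentation, an integer, a full stop,
# then optional spaces or tabs. re.MULTILINE lets ^ match after every newline.
MARKER = re.compile(r"^[ \t]*\d+\.[ \t]*", re.MULTILINE)

def split_numbered_list(text):
    """Split text at each line-leading 'N.' marker (hypothetical helper)."""
    items = []
    for chunk in MARKER.split(text):
        # Trim blank lines around the chunk and indentation on each line.
        lines = [line.strip() for line in chunk.strip().splitlines()]
        cleaned = "\n".join(lines)
        if cleaned:  # skip the empty chunk before a leading "1."
            items.append(cleaned)
    return items

example = "aaa\naaaaaa\n\n  1. bbb\n  2. ccc\n    cccccc\n"
print(split_numbered_list(example))
# ['aaa\naaaaaa', 'bbb', 'ccc\ncccccc']
```

Numbers and full stops inside an item are left alone because the pattern is anchored to the start of a line. One caveat: a continuation line that itself begins with something like "2.5" would be treated as a new item.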

There are other operators available, such as :split by, which would let you split the list into chunks based on a delimiter. I’m not sure you could simply use the full stop and a space as the delimiter, though, because the numbers might still end up as part of the list.

Another option is :find and replace. If you already know the number of items in the list, you know the highest number to be removed, and therefore all preceding numbers as well, so you could set up find and replace to strip them. That would be more work than simply using a regex, though.
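If the find and replace step accepts a regex pattern (an assumption, not something confirmed in this thread), the numbering can be stripped without knowing how many items there are. A rough Python sketch of that idea, with `strip_numbering` as a made-up name:

```python
import re

def strip_numbering(text):
    """Remove a line-leading integer plus full stop (hypothetical helper)."""
    # ^ matches at the start of every line because of re.MULTILINE.
    return re.sub(r"^[ \t]*\d+\.[ \t]*", "", text, flags=re.MULTILINE)

numbered = "  1. aaa\n  2. bbb\n    bbbbbb"
print(strip_numbering(numbered))
```

Note that this only removes the numbering. It does not mark where one item ends and the next begins, so it would not be enough on its own to produce a list of items.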

I am not a regex expert, nor even an intermediate regex user, but there are various sites that allow you to construct regex patterns for your needs.

Any time I need a regex, I search the forum first to see if somebody has already posted a working example. If not, I post in the forum to see if somebody can share one, and I also search Google for examples. Mostly through Google searches, I’ve been able to find working examples for various needs I’ve had.

I think if you google search you will likely find something on the wider web that would work.


Thanks for the reply! Will try to look deeper for it.
