Regex to get all the h1, h2, h3, h4 headings from an HTML page

I am working on a simple scraper.
I have managed to build the workflow that scrapes the entire article in HTML format from a URL.
However, I am not interested in the entire article. I only want the subheadings, such as h1, h2, h3, h4, etc.

I would then like to arrange the subheadings, maybe in a repeating group, like this:

will handle some more cases. It will return the tag as well.
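
For illustration, here is a minimal Python sketch of a heading regex that returns the tag along with the text. The pattern and the names `HEADING_RE`, `TAG_RE`, and `find_headings` are my own assumptions, not something taken from this thread, and a regex like this will still miss malformed or unusually nested markup.

```python
import re

# Assumed pattern: an opening h1-h4 tag (with optional attributes),
# the shortest possible inner content, and the matching closing tag.
HEADING_RE = re.compile(r"<(h[1-4])\b[^>]*>(.*?)</\1\s*>", re.IGNORECASE | re.DOTALL)

# Used to strip inline tags such as <em> or <a> from the heading text.
TAG_RE = re.compile(r"<[^>]+>")


def find_headings(html):
    """Return a list of (tag, text) pairs in document order."""
    results = []
    for tag, inner in HEADING_RE.findall(html):
        # Drop nested tags, then collapse runs of whitespace.
        text = " ".join(TAG_RE.sub("", inner).split())
        results.append((tag.lower(), text))
    return results


if __name__ == "__main__":
    sample = '<h1 class="t">Title</h1><p>x</p><H2>Sub <em>one</em></H2>'
    for tag, text in find_headings(sample):
        print(tag, text)
```

Returning the tag next to the text makes it possible to indent or group the headings by level later.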

Usually, what you want is better done with an HTML parser.

Do you know of an HTML parser that I can use for this use case?

I know that htmlparser2 is a very fast library that many projects use as a starting point.
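
htmlparser2 is a JavaScript library, so the sketch below is not its API. It shows the same callback-style approach using Python's built-in `html.parser` module instead, just to make the idea concrete. The names `HeadingCollector`, `HEADING_TAGS`, and `extract_headings` are made up for this example.

```python
from html.parser import HTMLParser

# Assumption: only h1-h4 are of interest, as in the question.
HEADING_TAGS = {"h1", "h2", "h3", "h4"}


class HeadingCollector(HTMLParser):
    """Collects (tag, text) pairs for every heading it sees."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.headings = []
        self._current = None  # heading tag currently open, if any
        self._parts = []      # text fragments inside that heading

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag names before calling this.
        if tag in HEADING_TAGS and self._current is None:
            self._current = tag
            self._parts = []

    def handle_data(self, data):
        # Keep text only while inside a heading.
        if self._current is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            # Join fragments and collapse runs of whitespace.
            text = " ".join("".join(self._parts).split())
            self.headings.append((tag, text))
            self._current = None


def extract_headings(html):
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    return collector.headings


if __name__ == "__main__":
    page = '<h1>Intro</h1><p>body</p><h3 id="a">Part &amp; <b>more</b></h3>'
    for tag, text in extract_headings(page):
        print(tag, text)
```

Compared with a regex, a parser copes with attributes, entities, and inline tags inside headings without extra patterns.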

Are you using any scraping API service to scrape the articles, or have you created your own scraper?

