Remove HTML from text before saving to DB

Hi All,

I’m receiving some text data from an API call, but it comes with tons of HTML included in the results. I’m trying to figure out a way to extract only the text, stripping out the HTML, before saving it into the Bubble database.

For example, I would want to change this:

Into this:
This is a LEFT HANDED Iron Set…

I haven’t been able to find a way to do this inside Bubble, and I’ve looked for a 3rd party service I can connect to via API to remove HTML but have come up short.

I’ve read that many folks use something called Jsoup to perform this action. However, I’m not familiar with Java and am not sure how I could actually implement this within Bubble. Any thoughts or ideas would be greatly appreciated!
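Jsoup’s approach is to parse the HTML into a document tree and read back only the text. As a rough illustration of that same parse-then-extract idea (this is not Jsoup, and it does not run inside Bubble; it would have to live in some external service that Bubble calls over an API, which is an assumption), here is a minimal sketch using Python’s standard-library html.parser. The names TextExtractor, html_to_text and BLOCK_TAGS are hypothetical, and the sample HTML is made up.

```python
from html.parser import HTMLParser

# Hypothetical set of tags treated as paragraph/line breaks.
BLOCK_TAGS = {"br", "p", "div", "li"}


class TextExtractor(HTMLParser):
    """Collects the text between tags and drops the tags themselves."""

    def __init__(self):
        # convert_charrefs=True turns entities like &amp; into plain characters
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append(" ")  # keep paragraphs from running together

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_data(self, data):
        # Ignore script/style contents; keep every other run of text
        if self.skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    # Collapse all runs of whitespace into single spaces
    return " ".join("".join(parser.parts).split())


sample = "<p>This is a <b>LEFT HANDED</b> Iron Set.</p><p>Irons &amp; wedges included.</p>"
print(html_to_text(sample))  # This is a LEFT HANDED Iron Set. Irons & wedges included.
```

Because a real parser understands quoted attributes and entities, it avoids most of the edge cases a single find-and-replace has to guess at; the trade-off is needing somewhere to run it.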

Hey,

I got really bored, so I put this together for you. All you need is regex. Please review the editor for the general regex theory. It would take too long to explain regex, so you will have to learn it, but the building blocks are there.
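The core idea is a find-and-replace that matches anything from a "<" to the next ">" and replaces it with nothing. A minimal sketch with Python’s re module (the pattern and the strip_tags name are illustrative, not the exact expression from the demo):

```python
import re

# Naive pattern: "<", then any characters that are not ">", then ">".
TAG_PATTERN = re.compile(r"<[^>]*>")


def strip_tags(html):
    # Replace every match with an empty string, leaving the text between tags
    return TAG_PATTERN.sub("", html)


print(strip_tags("<p>This is a <b>LEFT HANDED</b> Iron Set</p>"))  # This is a LEFT HANDED Iron Set
print(strip_tags('<a title="x > y">link</a>'))
```

The second call shows where this simple version breaks: the ">" inside the quoted attribute ends the match early, so the fragment  y">link is left behind.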

If you have any specific questions, I will do my best, but I will be unavailable after 11am EST tomorrow, for a day or two.

Good luck!

Regex tester / learning site: https://regex101.com/


Troy,

Appreciate you giving this a go! I took a look at your demo site and was able to get the basic gist of what you are doing with regex (that 101 site you provided was very helpful as well - thank you).

This works great for the example I provided, but as the HTML values and tags change with each result, it throws off the regex logic. I did some additional searching on the interwebz, and it does look like some folks have created regular expressions that handle a number of HTML tags/values (although they still aren’t foolproof). Figured I would include one here in case anyone else has a similar need.

Find and Replace: <(?:"[^"]*"['"]*|'[^']*'['"]*|[^'">])+>

(Copying and pasting the expression seems to hide the asterisks, so double-check that all four are present.)
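What this expression adds over a plain "<[^>]*>" is that a quoted attribute value ("..." or '...') is consumed as a whole, so a ">" inside quotes does not end the tag. A sketch of the same pattern in Python’s re module, assuming the asterisks are restored as [^"]*, ['"]*, [^']* and ['"]* (the strip_tags_quoted name is hypothetical):

```python
import re

# Quote-aware pattern: inside a tag, a quoted value ("..." or '...') is consumed
# as a unit, so a ">" inside the quotes does not end the match.
TAG_PATTERN = re.compile(r"""<(?:"[^"]*"['"]*|'[^']*'['"]*|[^'">])+>""")


def strip_tags_quoted(html):
    # Remove every tag-shaped match, keeping the text between tags
    return TAG_PATTERN.sub("", html)


print(strip_tags_quoted('<a title="x > y">link</a>'))  # link
print(strip_tags_quoted("Loft < 10 and > 8 degrees"))
```

The second call shows why it still isn’t foolproof: plain text containing a "<" followed later by a ">" looks like a tag, so "< 10 and >" is removed and the result is "Loft  8 degrees".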

In my use of it so far, I’ve found that it tends to squish independent sentences together (the next sentence begins right after the previous period with no space). To fix that, I also did a second find and replace that turns every period into a period followed by a space.
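A sketch of that second find and replace, plus a narrower variant for comparison. Both function names and the sample text are hypothetical, and the variant rests on an assumption: that a new sentence always starts with a capital letter.

```python
import re


def space_after_periods(text):
    # The fix described above: every "." becomes ". "
    return text.replace(".", ". ")


def space_before_capitals(text):
    # Narrower variant (assumes sentences start with A-Z):
    # only add a space when a period is immediately followed by a capital letter.
    return re.sub(r"\.(?=[A-Z])", ". ", text)


squished = "Great set.Left handed. Loft is 9.5 degrees."
print(space_after_periods(squished))
print(space_before_capitals(squished))  # Great set. Left handed. Loft is 9.5 degrees.
```

Replacing every period also splits decimals ("9.5" becomes "9. 5") and doubles spaces that were already there; the lookahead version avoids both, though it would still turn an abbreviation like "U.S." into "U. S.".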


Good stuff!

Yeah you can do all kinds of things, just depends how much you want to work on it.

Are you all fixed up then, or are you still not 100%?

This actually works for about 95% of items. Anything that falls through the cracks, I’ll just change manually. Thanks!

@troy.roberge, are you free to talk me through your solution? I can’t seem to figure it out.