I am playing with Blockspring to scrape websites.
The problem I am facing is that many APIs return arrays of text, usually sorted by column. The Blockspring plugin lets you choose a column as a list to bring into Bubble, but there seems to be no easy way to tokenize the text within one entry of that list (which is just a text string). So I get the string that contains the answer, but I need some piece of it.
Is there anything to do a simple tokenization of text? A full regex interface is not needed; it's enough to be able to cut out space-separated words. I know it can be done by piping APIs, but that seems like overkill for such a simple use case.
For example, can text be treated as a list of words using a modifier such as :#?
We don’t have this, but I guess we could add it. You could also use a custom block if you want something more tailored; it’s literally one line of code. If you want us to add this natively, email us and we’ll chat.
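To illustrate the "one line of code" point, here is a minimal Python sketch of the tokenizing logic such a custom block would wrap. The function names and the 1-based `nth_word` helper (mirroring the `:#` modifier idea above) are my own, purely for illustration; they are not part of Blockspring or Bubble:

```python
def words(text):
    # The "one line": split on any whitespace, collapsing runs of spaces.
    return text.split()

def nth_word(text, n):
    # 1-based word lookup, echoing a hypothetical ":item #" style modifier.
    # Returns "" when n is out of range instead of raising.
    parts = text.split()
    return parts[n - 1] if 1 <= n <= len(parts) else ""
```

A custom block would just expose `words` over the plugin's input/output interface.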
I sort of have a way to solve this by switching to the import.io connector (it allows a regexp to be permanently attached). The wrinkle is that there is a bug in Blockspring when using the import.io connector block, and the Blockspring devs are not replying to e-mails.
So if you have a better idea, I am willing to use it for now.
But more generally, I would think that splitting a sentence into a list of words is a common function, so I would expect Bubble to handle it natively.
Adding my voice to this: I needed something like this a while back but simply abandoned my ideal workflow because this string manipulation wasn’t available. In my case I wanted to extract a substring up to the first instance of a certain character in the text, something like ‘truncate to’ except delimited by a specific character rather than a number of characters. A more generic tokeniser would have been nice.
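The character-delimited truncation described here is a one-liner in most languages; a Python sketch (the name `truncate_to` is mine, chosen to echo the ‘truncate to’ operator mentioned above):

```python
def truncate_to(text, delimiter):
    # Return everything before the first occurrence of `delimiter`.
    # str.partition returns (head, sep, tail); if the delimiter is
    # absent, head is the whole string, which is a sensible fallback.
    head, _, _ = text.partition(delimiter)
    return head
```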
Many of my apps require scraping web pages. I’ll work more in Bubble if this capability is added.
Just for your information: efficient web scraping requires only a few commands.
1- Place the raw text into a field: viewer
2- Search the viewer field for the first occurrence of a string, returning: position x
3- Delete characters 1 to position x of viewer
4- Then grab the required field, e.g. position x + 6 for a length of 5
Depending on how many fields I need to grab, this process is repeated over and over again. Since the raw data contains SO much unnecessary content, the trick is finding unique strings to search for: markers that get you close to the data field you need. Often it takes two iterations of the above process to localize and grab the exact field needed.
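One iteration of the steps above can be sketched as a small Python function. The name `grab_field`, the `(viewer, field)` return pair, and the marker/offset parameters are my own framing of the described routine, not anything the poster actually built:

```python
def grab_field(viewer, marker, skip, length):
    """One find/trim/grab iteration of the scraping routine.

    Finds the first occurrence of `marker` in `viewer` (step 2), drops
    everything before it (step 3, "delete characters 1 to position x"),
    then grabs `length` characters starting `skip` characters past the
    marker start (step 4). Returns (remaining_viewer, extracted_field).
    """
    x = viewer.find(marker)
    if x == -1:
        return viewer, ""        # marker not found; viewer unchanged
    viewer = viewer[x:]          # trim the leading clutter
    field = viewer[skip:skip + length]
    return viewer, field
```

Repeated calls with successive markers walk the viewer through the page, just as the numbered steps describe.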
Oh… one last thing. When developing such a scrape routine (which may grab anywhere from one to a half-dozen fields), the main viewer field is a visible text field that I can actually watch step by step as my routine chops away. Then, when the scrape routine is done, I just convert the on-screen text box (viewer) to an off-screen variable.