Webscraping - Return HTML of external page?

Hello,

I’m trying to fetch, parse, and save HTML from an external page.

Fetch the URL, parse the returned HTML, and return the src from all link elements.

Save all values to database.
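For anyone who wants to see those three steps outside of Bubble, here is a rough sketch in Python using only the standard library. It assumes "src from all link elements" means any `href` or `src` attribute on the page. The names `AttrCollector`, `extract_urls`, `save_urls`, `fetch_html` and the `links` table are made up for illustration, and SQLite stands in for whatever database you actually use.

```python
import sqlite3
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class AttrCollector(HTMLParser):
    """Collects href/src attribute values (assumption: that is what is wanted)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative paths against the page URL.
                self.urls.append(urljoin(self.base_url, value))


def fetch_html(url):
    """Fetch a page and decode it as text (no retries, no JavaScript rendering)."""
    with urlopen(url, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def extract_urls(html, base_url):
    """Parse the HTML and return every href/src value as an absolute URL."""
    parser = AttrCollector(base_url)
    parser.feed(html)
    return parser.urls


def save_urls(conn, page_url, urls):
    """Store one row per extracted URL in a hypothetical 'links' table."""
    conn.execute("CREATE TABLE IF NOT EXISTS links (page TEXT, url TEXT)")
    conn.executemany(
        "INSERT INTO links (page, url) VALUES (?, ?)",
        [(page_url, u) for u in urls],
    )
    conn.commit()
```

You would chain them as `save_urls(conn, url, extract_urls(fetch_html(url), url))`. This only sees the HTML the server sends back, so content that is filled in later by JavaScript will not show up.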

Does Bubble have a Fetch URL plugin?

I’m interested in knowing if this is possible as well.

I use Octoparse for this, then use Parabola to push the data into Bubble.

Check out Apify. I use it for a lot of scraping jobs, and when you combine it with Integromat or Zapier it can work really well with Bubble.

You can use Integromat, or the API Connector with Regex. Set the API Connector call to GET and the data type to text, and then you can use Regex on the response to get a list of URLs.
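To show what the Regex step might look like, here is a small sketch in Python. The pattern is only an assumption about what you want to match (quoted `href` values), and `HREF_RE` and `extract_hrefs` are made-up names. How capture groups are handled depends on the tool you run the pattern in, so treat this as a starting point, not a drop-in expression.

```python
import re

# Simplistic pattern: matches href="..." or href='...' and captures the value.
# It will miss unquoted attributes and is not a real HTML parser.
HREF_RE = re.compile(r'href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def extract_hrefs(html_text):
    """Return the list of href values found in the raw HTML text."""
    return HREF_RE.findall(html_text)
```

For example, `extract_hrefs('<a href="/pricing">Pricing</a>')` gives back a list containing `/pricing`. Regex is fine for a quick list of URLs, but it gets brittle on messy markup.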

The problem with a basic API GET and Regex is if you get blocked, you’re in trouble.

Professional scrapers rotate IPs, etc.
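As a rough idea of what rotating IPs means in practice, here is a minimal round-robin sketch in Python. It assumes you already have a list of proxy addresses from some provider; the function name `make_proxy_rotator` is hypothetical and the addresses in the usage note are placeholders. Real scraping services do much more than this (health checks, backoff, header rotation).

```python
import itertools
import urllib.request


def make_proxy_rotator(proxies):
    """Return a function that hands out (proxy, opener) pairs in round-robin order."""
    pool = itertools.cycle(proxies)

    def next_opener():
        proxy = next(pool)
        # Route both http and https traffic through the chosen proxy.
        handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
        return proxy, urllib.request.build_opener(handler)

    return next_opener
```

Usage would look like `next_opener = make_proxy_rotator(["http://10.0.0.1:8080", "http://10.0.0.2:8080"])`, then `proxy, opener = next_opener()` followed by `opener.open(url, timeout=10)` for each request, so consecutive requests leave from different addresses.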


I’m currently attempting this with a site that has a strange page refresh for the content to become viewable…is there any way around this? In my API call I get the page HTML returned but none of the content, which is what I am after. Pinging the URL twice by using the call twice in a row doesn’t seem to do anything different.

It won’t work, because the second call is just another separate request. You may need to use another tool that can add a delay before it starts to scrape the data.
