Webscraping - Return HTML of external page?

joeslindsay · February 1, 2021, 1:45am

Hello,

I’m trying to fetch, parse, and save HTML from an external page.

Fetch URL, Parse URL, Return src from all link elements.

Save all values to database.

Does Bubble have a Fetch URL plugin?
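
For reference, the fetch / parse / extract steps above could look roughly like this outside Bubble. This is only a sketch in Python using the standard library. The names `LinkCollector`, `extract_urls`, and `fetch_html` are made up for illustration, and it assumes you want every `href` and `src` attribute value on the page. Saving to a database is left out.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects href/src attribute values from every tag that has one."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag.
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.urls.append(value)


def extract_urls(html):
    """Parse an HTML string and return the href/src values in document order."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.urls


def fetch_html(url):
    """Download a page and decode it to text (assumes the page is publicly reachable)."""
    with urlopen(url, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Usage would be something like `extract_urls(fetch_html(page_url))`, where `page_url` is the page you want to scrape. The resulting list is what you would then save to your database.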

eLPDev · March 28, 2021, 7:40pm

I’m interested in knowing if this is possible as well.

richard10 · March 28, 2021, 8:01pm

I use Octoparse for this, then use Parabola to push the data into Bubble.

AustinAllen · March 28, 2021, 8:26pm

Check out Apify. I use it for a lot of scraping, and when you combine it with Integromat or Zapier it can work really well with Bubble.

Jici · March 28, 2021, 8:41pm

You can use Integromat, or the API Connector with Regex. Set the API Connector call to GET with the data type set to text, then use Regex on the response to get a list of URLs.
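
To give an idea of the Regex step, here is a rough sketch in Python. The pattern and the function name `urls_from_text` are my own assumptions, not something built into Bubble, and the regex flavor Bubble uses may need small adjustments. It pulls quoted `href` and `src` values out of the raw HTML text.

```python
import re

# Matches href="..." or src='...' and captures the quoted value.
# This is a simple pattern and will not handle unquoted attributes.
URL_ATTR_PATTERN = re.compile(
    r'(?:href|src)\s*=\s*["\']([^"\']+)["\']',
    re.IGNORECASE,
)


def urls_from_text(html_text):
    """Return every quoted href/src value found in the raw HTML text."""
    return URL_ATTR_PATTERN.findall(html_text)
```

Regex is fine for a quick list of URLs, but a real HTML parser is more reliable on messy pages.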

richard10 · March 30, 2021, 6:29am

The problem with a basic API GET and Regex is if you get blocked, you’re in trouble.

Professional scrapers will rotate IPs, etc.

boston85719 · June 1, 2023, 6:30pm

I’m currently attempting this with a site that has a strange page refresh for the content to become viewable…is there any way around this? In my API call I get the page HTML returned but none of the content, which is what I am after. Pinging the URL twice by using the call twice in a row doesn’t seem to do anything different.

Jici · June 1, 2023, 7:50pm

It won’t work because the second call is just another, separate request. You may need to use another tool that can add a delay before it starts scraping the data.