Hi @lindsay_knowcode, I found a link that I know is working, but the Broken Link Checker is reporting “Exists: no”.
I’m using the Check URL Response Codes on a backend workflow and tested with Check URL as well.
The url: https://www.zara.com/br/pt/calca-chino-bainha-com-dobra-p07218986.html?v1=266327115&utm_campaign=productmultishare&utm_medium=mobile_sharing_ios&utm_source=red_social_movil
I’ve also tested on your page: https://broken-link-checker.bubbleapps.io/version-test and got the following results
Thanks for letting me know. That website is returning a 403 (Forbidden) response, meaning that the web servers at zara.com decided not to let the URL checker get the content of that page.
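As a rough illustration of why a 403 is different from a missing page, here is a minimal Python sketch that sorts status codes into "exists", "missing" and "blocked". This is not the plugin's code; the `LinkStatus` names and the exact grouping of codes are assumptions made for the example.

```python
from enum import Enum


class LinkStatus(Enum):
    EXISTS = "exists"
    MISSING = "missing"
    BLOCKED = "blocked"   # server refused the checker; the page may still exist
    UNKNOWN = "unknown"


def classify_status(code: int) -> LinkStatus:
    """Map an HTTP status code to a coarse link status (hypothetical grouping)."""
    if 200 <= code < 300:
        return LinkStatus.EXISTS
    if code in (404, 410):          # Not Found, Gone
        return LinkStatus.MISSING
    if code in (401, 403, 429):     # Unauthorized, Forbidden, Too Many Requests
        return LinkStatus.BLOCKED
    return LinkStatus.UNKNOWN
```

With a grouping like this, a 403 from zara.com would be treated as "blocked, cannot tell" rather than as proof that the link is broken.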
Websites (content owners), especially shopping websites (Amazon is a good example), do not like programs accessing their content (i.e. web scraping), as people are often trying to mimic or share the page content in a way that the content owners don’t want.
Some websites (zara.com is an example) will not even return a 404 (Not Found) code if a page doesn’t exist, but will instead redirect you to a search results page.
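One way to catch this kind of "soft 404" is to compare the URL that was requested with the URL the request finally landed on. The sketch below is only a heuristic under stated assumptions: the `SEARCH_HINTS` keywords and the function name are made up for illustration, and real sites will need their own rules.

```python
from urllib.parse import urlsplit

# Assumption: path fragments that suggest a search results page.
SEARCH_HINTS = ("search", "busca", "pesquisa")


def looks_like_soft_404(requested_url: str, final_url: str) -> bool:
    """Guess whether a redirect replaced a missing page with a search or home page."""
    requested = urlsplit(requested_url)
    final = urlsplit(final_url)
    # Same host and path (ignoring scheme and query string): not a soft 404.
    if (requested.netloc, requested.path) == (final.netloc, final.path):
        return False
    final_path = final.path.lower()
    landed_on_search = any(hint in final_path for hint in SEARCH_HINTS)
    landed_on_home = final.path in ("", "/")
    return landed_on_search or landed_on_home
```

A redirect to some other ordinary page returns `False` here, so this check only flags the two cases it knows about and will miss anything else.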
There are limits to what the URL checker can do. Some websites try hard not to be screen scraped. Because this is a programmatic request coming from Bubble’s AWS servers in Oregon, US, some websites will detect that this is a non-human request (screen scraping) and block it in some way, for example by putting up a Captcha page or returning an HTTP response code other than 200.
The plugin tries hard to emulate a real web browser with the appropriate request headers, but it can’t change the fact that the request is coming from AWS in Oregon. This also means that the site being checked will ‘know’ this is a US-originated request and may serve geography-specific content.
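To show what "browser-like request headers" means in practice, here is a small Python sketch using the standard library. The header values are assumptions chosen for the example and are not the ones the plugin actually sends.

```python
import urllib.request

# Assumption: illustrative header values resembling a desktop Chrome browser.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "pt-BR,pt;q=0.9,en;q=0.8",
}


def build_request(url: str) -> urllib.request.Request:
    """Build a GET request that carries browser-like headers (nothing is sent yet)."""
    return urllib.request.Request(url, headers=BROWSER_HEADERS, method="GET")
```

Passing the result to `urllib.request.urlopen(request, timeout=10)` would send it; a blocked request raises `urllib.error.HTTPError` with the status code in its `code` attribute. Headers like these do not hide the originating IP address, so a site can still block the request on that basis.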
Also I’ve removed the Client Side check - it was an experiment that didn’t work. My apologies.
If you feel you have been mis-sold, let’s work something out.
Thanks @lindsay_knowcode. That’s fine, the plugin is good and I’ll keep using it. Maybe I’ll try a backup method when the URL checker finds an error. Do you have any suggestions? Maybe an API for that?
And for extracting metadata, is it possible to extract other data besides the page title and description? Here’s all the metadata I’d like to extract:
The plugin pulls out the metadata from the HTML (e.g. description, author, title, etc.), but it doesn’t scrape the page itself - that’s a different kind of problem. However, you’ve given me an idea: it could extract the JSON-LD, which any web content worth its salt will publish if it wants Google to index its content.
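For anyone curious what pulling meta tags and JSON-LD out of a page could look like, here is a minimal Python sketch using only the standard library. It is an illustration of the idea, not the plugin's implementation; the `MetadataParser` and `extract_metadata` names are made up for the example.

```python
import json
from html.parser import HTMLParser


class MetadataParser(HTMLParser):
    """Collect <meta> name/property pairs and JSON-LD <script> blocks."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.json_ld = []
        self._in_json_ld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            # Standard tags use name=..., Open Graph tags use property=...
            key = attrs.get("name") or attrs.get("property")
            if key and attrs.get("content") is not None:
                self.meta[key] = attrs["content"]
        elif tag == "script" and (attrs.get("type") or "").lower() == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.json_ld.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed JSON-LD rather than fail the whole page


def extract_metadata(html: str):
    """Return (meta tag dict, list of parsed JSON-LD objects) for an HTML string."""
    parser = MetadataParser()
    parser.feed(html)
    parser.close()
    return parser.meta, parser.json_ld
```

On shopping pages, the JSON-LD block (a `<script type="application/ld+json">` element) often holds structured product data such as name and price, which is why it can be a richer source than the meta tags alone.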