Broken Link Checker Plugin reporting a broken link I know is working

betosg · April 19, 2023, 3:13am

Hi @lindsay_knowcode, I found a link that I know is working, but Broken Link Checker is reporting “Exists: no”.
I’m using Check URL Response Codes in a backend workflow and tested with Check URL as well.
The url: https://www.zara.com/br/pt/calca-chino-bainha-com-dobra-p07218986.html?v1=266327115&utm_campaign=productmultishare&utm_medium=mobile_sharing_ios&utm_source=red_social_movil

I’ve also tested on your page, https://broken-link-checker.bubbleapps.io/version-test, and got the following results:

check HTTP response codes using your page → Exists: no

check using client side action → Exists: yes

Please help.
Thanks,
Roberto

lindsay_knowcode · April 19, 2023, 10:31am

Thanks for letting me know. That website is returning a 403 (Forbidden) response, meaning that the web servers at zara.com decided not to let the URL checker fetch the content of that page.

Websites (content owners), especially shopping websites (Amazon is a good example), do not like programs accessing their content, i.e. web scraping, as often people are trying to mimic or somehow share the page content in a way that the content owners don’t want.

Some websites (zara.com is an example) will not even return a 404 (Not Found) code if a page doesn’t exist, but will instead redirect you to a search results page.

There are limits to what the URL checker can do. Some websites try hard not to be screen scraped. Because this is a programmatic request coming from Bubble’s AWS servers in Oregon, US, some websites will detect that this is a non-human request (screen scraping) and block it in some way, for example by putting up a Captcha page or by returning an HTTP response code other than 200.

The plugin tries hard to emulate a real web browser with the appropriate request headers, but it can’t change the fact that the request is coming from AWS in Oregon. This also means that the site being checked will ‘know’ this is a US-originated request and may serve geography-specific content.
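
To make the distinction between “blocked”, “missing” and “redirected” concrete, here is a minimal sketch of how a check result could be interpreted. It is not the plugin’s actual code: the function name `classify_check`, the verdict labels and the path comparison are all assumptions made for illustration, and the path comparison is only a rough heuristic for the redirect-to-search behaviour described above.

```python
from urllib.parse import urlsplit

# Hypothetical helper, not part of the plugin.
def classify_check(requested_url: str, status: int, final_url: str) -> str:
    """Map an HTTP check result to a coarse verdict.

    requested_url: the URL that was checked
    status:        the HTTP status code of the final response
    final_url:     the URL that was actually served after any redirects
    """
    # The server refused the checker. The page may well exist for a human visitor.
    if status in (401, 403, 429):
        return "blocked"
    # The server says the page is gone.
    if status in (404, 410):
        return "missing"
    if 200 <= status < 300:
        requested_path = urlsplit(requested_url).path.rstrip("/")
        final_path = urlsplit(final_url).path.rstrip("/")
        # A success code on a different path may be a "soft 404",
        # e.g. a redirect to a search results page. Query strings are ignored.
        if requested_path != final_path:
            return "redirected"
        return "exists"
    # Anything else (5xx, unexpected codes) is inconclusive.
    return "unknown"
```

With this reading, a 403 like the one above would come back as “blocked” rather than “does not exist”, which a workflow could treat as “needs a manual check” instead of as a broken link.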

Also I’ve removed the Client Side check - it was an experiment that didn’t work. My apologies.

If you feel you have been mis-sold, let’s work something out.

betosg · April 19, 2023, 2:59pm

Thanks @lindsay_knowcode. That’s fine, the plugin is good and I’ll keep using it. Maybe I’ll try a backup method when the URL checker finds an error. Do you have any suggestions? Maybe an API for that?

And for extracting metadata, is it possible to extract other data besides the page title and description? Here’s all the metadata I’d like to extract:

Title
Description
Headline
Keywords
Lang
Price
Product
Sku
Author
Publisher
Logo
Image
Video
Currency
Feed
Date

Thanks
Roberto

lindsay_knowcode · April 19, 2023, 6:55pm

The plugin pulls out the metadata from the HTML (e.g. description, author, title, etc.), but it doesn’t scrape the page itself - that’s a different kind of problem. However, you’ve given me an idea: it could extract the JSON-LD, which any web content worth its salt will publish if it wants Google to index its content.
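
As a rough sketch of that idea (not something the plugin does today), structured data published in `<script type="application/ld+json">` blocks can be pulled out of a page’s HTML with the Python standard library alone. The names `JsonLdParser` and `extract_json_ld` and the sample product markup are made up for illustration; real pages vary in which fields they publish.

```python
import json
from html.parser import HTMLParser

# Hypothetical helper class, for illustration only.
class JsonLdParser(HTMLParser):
    """Collect the parsed contents of every application/ld+json script tag."""

    def __init__(self):
        super().__init__()
        self._in_json_ld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip blocks that are not valid JSON


def extract_json_ld(html: str) -> list:
    """Return every JSON-LD document found in the given HTML string."""
    parser = JsonLdParser()
    parser.feed(html)
    parser.close()
    return parser.items


if __name__ == "__main__":
    # Made-up sample markup; not taken from any real page.
    sample = """
    <html><head>
    <script type="application/ld+json">
    {"@type": "Product", "name": "Chino trousers", "sku": "123",
     "offers": {"price": "199.00", "priceCurrency": "BRL"}}
    </script>
    </head></html>
    """
    for item in extract_json_ld(sample):
        print(item.get("@type"), item.get("sku"), item.get("offers", {}).get("price"))
```

Fields such as price, SKU, currency and image from the list above would only be available when the site chooses to publish them this way, and a page that blocks the checker with a 403 will not return this markup either.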

mustrunet · March 24, 2024, 5:32pm

Such plugins are usually slow and can overload your server. For such tasks, it’s better to use cloud solutions like: https://www.deadbrokenlinkchecker.com/
