Is this possible through Bubble’s built-in features or through plugins (I have not come across any such plugins)? Or does it require an API? Something like crawling, maybe? Any ideas?
I am looking for a way that can work through the workflow.
Since different websites store them in different places, there’s no one-size-fits-all fix.
The easiest way to do it at scale is probably to find the robots.txt file and extract its text in the workflow. There’s usually a sitemap link in there, on a line starting with "Sitemap:". Then use a GET URL call to retrieve the sitemap itself.
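To make the robots.txt step concrete, here is a minimal sketch in Python rather than in Bubble itself. It assumes the sitemap is declared on "Sitemap:" lines, and the function names (extract_sitemap_urls, fetch_robots_txt) are made up for illustration. In a workflow, the same two steps would have to be rebuilt with whatever GET call and text-extraction tools are available there.

```python
from urllib.parse import urljoin
from urllib.request import urlopen


def extract_sitemap_urls(robots_txt: str) -> list[str]:
    """Return the URLs listed on 'Sitemap:' lines of a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        # Drop trailing comments, then split on the first colon only,
        # so the "https://" inside the value stays intact.
        line = line.split("#", 1)[0].strip()
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls


def fetch_robots_txt(site_url: str, timeout: float = 10.0) -> str:
    """GET <site>/robots.txt and return its text (hypothetical helper)."""
    with urlopen(urljoin(site_url, "/robots.txt"), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

A call such as extract_sitemap_urls(fetch_robots_txt("https://example.com")) would then give the list of sitemap URLs to request next. An empty list means the site does not declare one in robots.txt.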
Search Console also has a tool that could help, but you need the sitemap URL.
Another option, which is not as robust and won’t work as widely, is to check the common locations like
/sitemap.xml
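A rough sketch of that fallback in Python, for illustration only. Only /sitemap.xml comes from this thread. The other two paths in the list are assumptions about what some sites use, and both function names are hypothetical.

```python
from typing import Optional
from urllib.parse import urljoin
from urllib.request import Request, urlopen

# Only "/sitemap.xml" is from the thread; the other two are assumed guesses.
COMMON_SITEMAP_PATHS = ["/sitemap.xml", "/sitemap_index.xml", "/sitemap-index.xml"]


def candidate_sitemap_urls(site_url: str, paths=COMMON_SITEMAP_PATHS) -> list[str]:
    """Build absolute URLs for the usual sitemap locations on a site."""
    return [urljoin(site_url, path) for path in paths]


def first_reachable(urls: list[str], timeout: float = 10.0) -> Optional[str]:
    """Return the first URL that answers a HEAD request with 200, else None."""
    for url in urls:
        try:
            with urlopen(Request(url, method="HEAD"), timeout=timeout) as resp:
                if resp.status == 200:
                    return url
        except OSError:
            # Covers HTTP errors (404 etc.), DNS failures, and timeouts.
            continue
    return None
```

Some servers reject HEAD requests, so a miss here does not prove the sitemap is absent. That is part of why this route is less reliable than reading robots.txt.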
I am actually looking for a one-size-fits-all solution, by crawling maybe, or something like that.
Scraping robots.txt is going to be the route you want to go then. It’s the same route search engines use to locate the sitemap.
You’ll have to scrape robots.txt first. Extract the URL of the sitemap, then scrape that.
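For the second half, scraping the sitemap once you have its URL, a small Python sketch is below. It assumes the file follows the standard sitemaps.org XML format, and parse_sitemap is a made-up name. A sitemap can be either a plain list of pages ("urlset") or an index pointing at further sitemaps ("sitemapindex"), so the sketch reports which one it found.

```python
import xml.etree.ElementTree as ET

# Namespace used by the standard sitemap protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def parse_sitemap(xml_text):
    """Return (kind, locs) for a sitemap document given as str or bytes.

    kind is the root element name without its namespace, normally
    "urlset" (locs are page URLs) or "sitemapindex" (locs are the
    URLs of further sitemap files that need to be fetched in turn).
    """
    root = ET.fromstring(xml_text)
    kind = root.tag.rsplit("}", 1)[-1]
    locs = [
        el.text.strip()
        for el in root.iter("{%s}loc" % SITEMAP_NS)
        if el.text and el.text.strip()
    ]
    return kind, locs
```

If kind comes back as "sitemapindex", each returned URL is another sitemap file, so the same fetch-and-parse step has to be repeated for every entry.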
If there is a way to scrape websites that can be used in a workflow, that would be great.