Hello everyone, what I’m trying to do is not simple (for me, anyway).
First, some quick context. A client asked me to develop an application that uses a scraping API and outputs a CSV with all the data.
It would have been simple if the API we use didn’t return its data in batches of 50. In the end, I managed to call the API in a loop until the maximum number of pages available from the API was reached.
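Outside of Bubble, the “call the API page by page until there are no more pages” loop can be sketched like this. The `fetch_page` function, its page-number argument and the `rows`/`has_more` fields are hypothetical stand-ins for whatever the scraping API actually returns; the `max_pages` cap is only a safety net against an endless loop.

```python
from typing import Callable

# Hypothetical page shape: {"rows": [...up to 50 dicts...], "has_more": bool}
Page = dict


def fetch_all_rows(fetch_page: Callable[[int], Page], max_pages: int = 1000) -> list:
    """Call a paginated API page by page and accumulate every row.

    Stops when the (hypothetical) `has_more` flag is False, or after
    `max_pages` calls as a safeguard.
    """
    rows: list = []
    page = 1
    while page <= max_pages:
        result = fetch_page(page)
        rows.extend(result["rows"])
        if not result["has_more"]:
            break
        page += 1
    return rows


# Fake API used for illustration: 120 rows served in pages of 50.
def fake_fetch(page: int) -> Page:
    total, size = 120, 50
    start = (page - 1) * size
    rows = [{"id": i} for i in range(start, min(start + size, total))]
    return {"rows": rows, "has_more": start + size < total}


all_rows = fetch_all_rows(fake_fetch)
print(len(all_rows))  # 120
```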
But my question now is: how do I export all of that to an Excel file, a CSV, or even JSON? Keep in mind that the API scrapes 60 different pieces of information (name, address, email, whether the place is open, etc.). Basically, each API call returns 50 rows and 60 columns (if we think in terms of an Excel table).
Here is a quick overview of the setup:
The user presses the blue button, which launches the process:
- Clicking the blue button creates a command (which I called “Download”).
- It schedules an API workflow that carries all the information needed to call the external scraping API. It also passes the ID of the command I just created, so that I can refer to it later and download the holy grail (the CSV).
In these last two workflows, I’m experimenting:
- I’m trying to schedule an API workflow dedicated solely to downloading the CSV, but it doesn’t work.
- In the last workflow, I try to create a CSV from a JSON (there is a plug-in that allows this, but I don’t know how to go about it). The data returned by the external scraping API isn’t JSON… or maybe it is, but I don’t know how to use it. What does work is downloading a CSV directly with the “download data as CSV” action, but as I said, that only lets me make a CSV with a maximum of 50 rows. So I would have to combine these CSVs afterwards, and with 1,000 rows that isn’t practical. I need a way to combine the data upstream and then produce a single CSV.
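If you do end up with several 50-row CSV exports, combining them afterwards is mechanical: keep the header of the first file and skip the header row of every other file. A minimal Python sketch, assuming every chunk has exactly the same columns in the same order (it raises an error otherwise rather than silently misaligning columns):

```python
import csv
import io


def merge_csv_chunks(chunks: list) -> str:
    """Concatenate several CSV texts that share one header into a single CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    first_header = None
    for chunk in chunks:
        reader = csv.reader(io.StringIO(chunk))
        header = next(reader, None)
        if header is None:  # skip empty chunks
            continue
        if first_header is None:
            first_header = header
            writer.writerow(header)
        elif header != first_header:
            raise ValueError(f"column mismatch: {header} != {first_header}")
        for row in reader:
            writer.writerow(row)
    return out.getvalue()


page1 = "name,email\nAlice,a@example.com\n"
page2 = "name,email\nBob,b@example.com\n"
print(merge_csv_chunks([page1, page2]))
```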
So, here is my backend workflow:
When I click the blue button, it creates, as we have seen, a command that I called “Download”, which schedules API workflow “A” at the current date/time:
Workflow “A” creates a page of 50 records, each containing 60 pieces of information (in an Excel table, that’s roughly 50 rows and 60 columns, so you follow me).
It then modifies my “Download”, that is, my command: I attach the new Excel command to it. My “Download” has a field that is a list of Excel commands.
So, as we can see, I added to my “Download” (the one created when I clicked the blue button) an Excel page of 50 rows and 60 columns, retrieved through the famous scraping API. Great! Now I schedule API workflow “B”, whose only job is to restart API workflow “A” when a condition is met, namely that the scrape still has additional pages to fetch:
You can see the condition I just described in the “Only when” field. When it is met, workflow “B” restarts API workflow “A”, and so on.
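The A/B chain above can be modelled outside Bubble to check the logic. In this sketch, a queue stands in for Bubble’s scheduler (each “schedule” pushes a task, and tasks run one after another), `Download` mimics the data type with its list of Excel commands, and `fetch_page` with its `rows`/`has_more` fields is a hypothetical placeholder for the scraping API. None of these names come from Bubble itself.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Download:
    """Hypothetical stand-in for the “Download” thing: one entry per page."""
    excel_commands: list = field(default_factory=list)


def run_chain(fetch_page: Callable[[int], dict]) -> Download:
    download = Download()
    scheduler: deque = deque()  # stands in for “schedule API workflow”

    def workflow_a(page: int) -> None:
        # Fetch one page and attach it to the Download, then schedule B.
        result = fetch_page(page)
        download.excel_commands.append(result["rows"])
        scheduler.append(lambda: workflow_b(page, result["has_more"]))

    def workflow_b(page: int, has_more: bool) -> None:
        # The “Only when” condition: restart A only while pages remain.
        if has_more:
            scheduler.append(lambda: workflow_a(page + 1))

    scheduler.append(lambda: workflow_a(1))  # the blue button
    while scheduler:
        scheduler.popleft()()
    return download


def fake_fetch(page: int) -> dict:
    total, size = 120, 50
    start = (page - 1) * size
    rows = [{"id": i} for i in range(start, min(start + size, total))]
    return {"rows": rows, "has_more": start + size < total}


result = run_chain(fake_fetch)
print([len(p) for p in result.excel_commands])  # [50, 50, 20]
```

Because each step is queued rather than called directly, the chain stops cleanly once `has_more` is false, which matches the “when there are no more pages, stop” behaviour.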
That’s just for context. When there are no more pages, the loop stops. And that’s where I’m stuck!
What should I do?
How do I get the 10, 20, or 30 “Excel commands” stored in my “Download” out as a CSV?
As you can see, my “Download” contains a list of “commandexcel”: this is what I want to turn into a CSV (don’t pay attention to the rest; I’m experimenting). I don’t know whether anyone has the answer or has already done this, but it would save my life. Ideally, I wouldn’t have to type all 60 scraped fields into the JSON by hand.
In short, I need a single Excel file or CSV combining all the data in my “Download”:
See all these Excel commands, of which we only see the IDs: they are Excel sheets waiting to be combined, so to speak. But for now, they are only a list of text entries in my “Download” type.
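Once each Excel command’s rows are available as key/value records (for example parsed from the API’s JSON response), combining them and writing one CSV is short, and the column names can be taken from the records themselves so the 60 fields never have to be typed by hand. This is a minimal Python sketch of the idea, assuming a list of pages where each page is a list of dicts; it does not show how to run it from Bubble.

```python
import csv
import io


def download_to_csv(excel_commands: list) -> str:
    """Flatten pages of records into one CSV, deriving columns automatically."""
    # Every page (up to 50 records) becomes part of one big list of rows.
    records = [record for page in excel_commands for record in page]
    # Column names in order of first appearance across all records.
    fieldnames = list(dict.fromkeys(key for record in records for key in record))
    out = io.StringIO()
    # restval="" writes an empty cell when a record lacks a field.
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


pages = [
    [{"name": "Alice", "email": "a@example.com"}],
    [{"name": "Bob", "open": True}],
]
print(download_to_csv(pages))
```

Collecting the header from the union of keys, rather than from the first record only, means a field that is missing on page 1 but present later still gets its own column.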
Whoever finds the answer for me is a genius <3