Best way to return 200+ items through the data API?

In my app, I have a datatype called “fruits”.

I want an external website to fetch on a daily basis a list of fruits. This list would always be 200+ items.

At first, I built a backend API workflow, using “return data from API”, and “do a search for”, but I’ve noticed that the response is limited to 50 items.

I then explored the data API with pagination, but since my datatype “fruits” needs to be fully public in my app, anyone can call my endpoint and get the data. That part is fine, but I am concerned about abuse. How can I prevent someone from calling this endpoint 1000x per day and making my WU costs skyrocket?
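For reference, here is a minimal sketch of how the external website could page through the data API. It assumes the response body has the shape `{"response": {"results": [...], "remaining": n}}` and that pages are requested with `cursor` and `limit` parameters; treat both as assumptions to check against your own app. `fake_fetch_page` is a hypothetical stand-in for the real HTTP call, so the sketch runs without a network.

```python
from typing import Callable, Dict, List

# Hypothetical type: takes (cursor, limit) and returns one parsed page body.
PageFetcher = Callable[[int, int], Dict]


def fetch_all(fetch_page: PageFetcher, limit: int = 100) -> List[dict]:
    """Collect every item by advancing the cursor until nothing remains."""
    items: List[dict] = []
    cursor = 0
    while True:
        body = fetch_page(cursor, limit)["response"]
        results = body.get("results", [])
        items.extend(results)
        # Stop when the server reports nothing left, or sends an empty page.
        if body.get("remaining", 0) <= 0 or not results:
            break
        cursor += len(results)
    return items


def fake_fetch_page(cursor: int, limit: int) -> Dict:
    """Stand-in for the HTTP request: serves 230 made-up fruits in pages."""
    data = [{"name": f"fruit-{i}"} for i in range(230)]
    page = data[cursor:cursor + limit]
    remaining = len(data) - cursor - len(page)
    return {"response": {"cursor": cursor, "results": page,
                         "count": len(page), "remaining": remaining}}


all_fruits = fetch_all(fake_fetch_page)
```

In a real client, `fetch_page` would wrap the HTTP request to the endpoint; every page is one more call, which is part of why the call-count concern matters.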

What is the best approach to make this work?

Thank you for your help :folded_hands:

API Tokens

But I have noticed that anyone can call my data API endpoint, even without an authorization header.

Yes, without an authorization header, anybody would be able to access. If you use an authorization header, only calls with correct authorization headers would have access.

Not true

Data API is accessible to anyone and they can access all data their privacy rules permit.

@thibautranger can’t you use return data from a backend workflow, but instead of returning the actual Bubble things, return JSON and format the things appropriately? I’d be surprised if there’s a limit there…

The cheapest and safest method is to handle your request through middleware like Cloudflare workers. Firstly you can add bearer tokens for auth from the external site. Secondly any calls to a Cloudflare worker can be routed through a CF domain proxy that has DDOS protection. On top of that you can set up rate limits.

It’s an extra step but if you are concerned about security then Cloudflare is your best bet. If your traffic isn’t high the free Worker plan will more than suffice. You’ll need to pony up for a CF managed domain but you can buy some cheap domain just for this.
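To make the idea concrete, here is a rough sketch of the two checks such a middleware would run before forwarding anything to Bubble: a bearer token comparison and a per-client rate limit. It is written in Python purely as an illustration of the logic (a real Worker would typically be JavaScript, and Cloudflare has its own rate limiting features). `FixedWindowLimiter`, `handle`, and the token value are hypothetical names, not part of any Cloudflare or Bubble API.

```python
import hmac
import time
from typing import Callable, Dict, Tuple


class FixedWindowLimiter:
    """Allow at most max_calls per client in each window_seconds window."""

    def __init__(self, max_calls: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self.clock = clock
        # client id -> (window start time, calls made in that window)
        self._windows: Dict[str, Tuple[float, int]] = {}

    def allow(self, client_id: str) -> bool:
        now = self.clock()
        start, calls = self._windows.get(client_id, (now, 0))
        if now - start >= self.window_seconds:
            start, calls = now, 0  # the old window expired, start a new one
        if calls >= self.max_calls:
            self._windows[client_id] = (start, calls)
            return False
        self._windows[client_id] = (start, calls + 1)
        return True


def handle(headers: Dict[str, str], client_id: str,
           limiter: FixedWindowLimiter, expected_token: str) -> int:
    """Return the HTTP status the middleware would answer with."""
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    # Constant-time comparison so the token cannot be guessed via timing.
    token_ok = hmac.compare_digest(token.encode(), expected_token.encode())
    if scheme != "Bearer" or not token_ok:
        return 401  # rejected before Bubble is ever called
    if not limiter.allow(client_id):
        return 429  # too many requests in this window
    return 200  # at this point the request would be forwarded to Bubble
```

Rejected calls never reach the Bubble app in this arrangement, so they should not consume WUs.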

Would you mind explaining the solution you’re imagining in a bit more detail?

I am currently doing this, returning data from API and setting content-type as structured JSON.

With this setup, the list is limited to 50 items.

Set the content-type to custom and use application/json

So authorization headers or api tokens in bubble api do nothing?

In the data API, it affects what data you can access, not whether you can call the endpoint.

In the workflow API/backend workflows, it does the same if “ignore privacy rules” is unchecked. If the backend workflow requires authentication, it means the user must be logged in, or an admin API token is required.

Okay, so the Admin API Token is needed in the Authorization Header I believe based on my testing and what I am seeing, but I may just not understand the security aspect very well.

If I have an API call set to take data from my app’s database, in the settings I have each data type in my database available to be checked on or off, so as to make them accessible or not. When I try to initialize the API call with the data type unchecked, so as to not be available, whether or not I send in an Authorization Header with an Admin API token, I get the following error from Bubble.

But if I do check the box for that data type, so as to make it accessible, and I try to initialize my api call without an authorization header present, I get the following successful initialization.

But if I do check the box for that data type, so as to make it accessible, and I try to initialize my api call with an authorization header present, but not a valid Admin API Token, I get the following error message from Bubble.

In both situations above with the data type api endpoint enabled, I did not have Privacy Rules on the data type. So once I add Privacy Rules that allow if the User Type is Admin to find in searches but everybody else can not, then I get the following.

Without Authorization Header Present - Successful call with no response of data fields values

With authorization Header Present but wrong Admin Token - Failed Call

With authorization header present and correct Admin Token - successful call

So what does this all mean?

This is true that only calls with correct authorization header would have access so long as there are privacy rules that do not allow ‘everybody else’ to find in searches. But it is not true that without an Authorization header present anybody would be able to access ONLY if there are Privacy Rules that restrict access to ‘everybody else’.

This is true ONLY if there are no Admin API Tokens. But if the developer sets up Admin Tokens and the Authorization Header is not present, it doesn’t matter what Privacy Rules are in place; without a correct Admin Token in the Authorization header, the data is only available to those whose privacy rules permit and whose API Calls had a valid Authorization Header present.

This is not true. The Authorization Header and correct API Token affect only whether or not you can access the data via the api endpoint, and it is the Privacy Rules that affect what data you can access.

This is similar to why I suggested, for the new feature of Turn off File Upload API endpoint, that a better, more robust solution is to set the File Upload API endpoint to require an authorization header that uses an Admin Token. The way it is set up now, it is simply a checkbox to turn on or off, which is JUST LIKE the checkbox to turn on or off access to a specific data type’s API endpoint. When it is not checked, it is not accessible to anybody, whether there is a valid authorization header with a valid API Token or not. And once it is checked to be accessible, we then need to have a valid authorization header with a valid API Token.

I don’t know, I might just not fully understand it well enough to know if there are security implications or not.

Not at all.

In no uncertain terms:

Data API

  • can be called unauthenticated, authenticated as user, or authenticated with admin token
  • when unauthenticated / logged out, the type will return all Things and fields the logged out user can access through privacy rules
  • when authenticated as a user / logged in, the type will return all Things and fields the logged in user can access through privacy rules (the data the privacy rules permit them to access)
  • when authenticated with admin token, the data API will bypass privacy rules and return all data

You can see this for yourself here:

  • https://bubble.io/api/1.1/obj/user - here are the public user data for Meta, which consist of bootcamp instructors.
  • You will see partial data for most of the users visible (just because they’re a bootcamp instructor doesn’t mean we see all their data)
  • You will see that you see full data for your own User record (you are authenticated as a user, yourself, assuming you’re logged in to Bubble)
  • If you log out of Bubble, then visit the same URL, you will still see the user data that is intended to be public, despite being unauthenticated.

Workflow API

  • can be called unauthenticated if the ‘This workflow does not require authentication’ checkbox is checked
  • can always be called when logged in, or with an admin API token
  • will respect privacy rules (assuming ignore privacy rules is not checked) when unauthenticated or authenticated as a user. When using an Admin API token, it will ignore privacy rules

Yes, this would be expected behaviour. That’s because authentication logic happens before the request reaches the data API, hence if there’s an error, it returns it because it cannot authenticate you.

I’m not sure what you mean by restrict access - it’s probably just a language thing but we can’t restrict access, only permit it. What I intend to say is that through the Data API (and any Bubble searches), whether unauthenticated or authenticated as a user, the user can access all data their privacy rules permit them to access. If using an Admin API token, they can access all data irrespective of privacy rules.

You can disprove this for yourself just by visiting the Bubble data API: https://bubble.io/api/1.1/obj/user or https://bubble.io/api/1.1/obj/organization. Bubble has admin API tokens set up. I can still access the data API.

Similarly, I could call Bubble’s backend workflows as a logged in user, without an admin token. Bubble protects all of those within the workflows themselves (for example, with permission tokens passed as a parameter rather than header which is used as a condition) for this reason.

What would this even mean? Let’s say you did have to provide an admin API token to access the data API (you don’t) - okay, great. That bypasses privacy rules. My user authentication is now meaningless!

I’m not sure if we’re talking about different things, but if we’re talking about the same thing then you’re way off.

This is what I am experiencing in testing…is this the same as what you experience?

I have the following setup

  1. Data type is exposed via API and there is an Admin Token

  2. Privacy Rules make it so that ‘everybody else’ can not find in searches or view fields

What I would expect with this setup is that if there is NOT an authorization Header present, Bubble would treat it the SAME WAY as if the authorization Header was not formed properly, had the wrong Admin Token, had no admin Token, BUT, that is not exactly the case. What happens is without an Authorization Header even present, the call succeeds, but no data fields are returned other than the ID that was passed into the call.

A. If Authorization Header has correct Admin Token - Can access the data and see all fields permitted by Privacy Rule
B. If Authorization Header has wrong token - get error and call fails completely with status code of 401
C. If Authorization Header has no token - get error and call fails completely with status code of 400
D. If Authorization Header is not formed properly (ie: has no Bearer to begin) - get error undefined that says value for Authorization is not correct undefined
E. If Authorization Header is not present at all - Call succeeds and can get response of the unique ID but can not see any other data fields values.

So I think the difference in ideas is that you are using the terminology of ‘can be called’, whereas what I am talking about is access to data, and to which data fields, unless you have different experiences than I do when testing in that setup.

This is all what we’d expect. If you provide an authorization header, and that header is invalid, it means you’ve attempted to authenticate, but failed, hence error.

If you don’t provide an authorization header, you have not attempted to authenticate and therefore there is no error - you have chosen to make the API call as a non-authenticated user. Compare that to providing an invalid authentication header - you have attempted to authenticate but have done so incorrectly, hence error.

There are other ways to authenticate beyond the authorization header, mind you (e.g. a cookie).

  • if you attempt to authenticate and do so incorrectly, an error is returned
  • if you do not attempt to authenticate, you make the request (logically) as a non-authenticated user and data is restricted as such
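As a rough mental model of the two rules above, the decision can be sketched like this. It is only an illustration of the behaviour reported in this thread, not Bubble's actual code: `resolve_caller` and `ADMIN_TOKEN` are hypothetical, and user tokens and cookie-based sessions are left out to keep it short.

```python
from typing import Optional

ADMIN_TOKEN = "example-admin-token"  # hypothetical value for illustration


def resolve_caller(authorization: Optional[str]) -> str:
    """Classify a request by its Authorization header.

    Returns "anonymous", "admin", or "error".
    """
    if authorization is None:
        # No attempt to authenticate: the call goes ahead as a logged-out
        # user and only sees what privacy rules permit (case E).
        return "anonymous"
    scheme, _, token = authorization.partition(" ")
    if scheme != "Bearer":
        return "error"  # malformed header (case D)
    if not token:
        return "error"  # no token supplied; reported as a 400 (case C)
    if token != ADMIN_TOKEN:
        return "error"  # wrong token; reported as a 401 (case B)
    # Valid admin token: privacy rules are bypassed (case A).
    return "admin"
```

The key point is the first branch: a missing header is not a failed login, it is simply no login at all.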

Okay so your experience for A, B, C, D, E is the same.

Thanks @georgecollier !

This solved it :white_check_mark:

  1. Backend workflow is called
  2. I am using a custom event to generate my list of things, then returning the list as a return value
  3. I am using “Return data from API”, setting the content-type as “application/json”, and I format my JSON manually.

Indeed, there doesn’t seem to be a limit with that. I just tried with random test data with 15’000 lines of JSON and it works well.
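For anyone reading later, here is a small sketch of the kind of JSON body this produces, written in Python only to show the shape. The field names (`name`, `color`) and the `count` wrapper are made up for the example. The thing to watch when formatting JSON by hand is escaping: a quote or line break inside a value must be escaped or the body will not parse, which `json.dumps` takes care of here.

```python
import json
from typing import Dict, List


def fruits_to_json(fruits: List[Dict[str, str]]) -> str:
    """Serialize the full list in one response body, with no paging."""
    return json.dumps({"count": len(fruits), "fruits": fruits},
                      ensure_ascii=False)


# 250 made-up records, one of them with characters that need escaping.
fruits = [{"name": f"fruit {i}", "color": "red"} for i in range(249)]
fruits.append({"name": 'the "dragon" fruit', "color": "pink"})

body = fruits_to_json(fruits)
parsed = json.loads(body)  # what the external website would do on its side
```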
