What is Rate Limiting?

Harold Bell
Rate limiting is a mechanism used to control the amount of data or requests that can be transmitted between two systems within a specified time period. It helps prevent abuse, protect system resources, and ensure fair usage for all users. By implementing rate limiting, organizations can mitigate the risk of server overload, improve network performance, and enhance overall security.

A digital resource, such as an application programming interface (API), has finite capacity. Assuming it’s deployed on a fixed amount of infrastructure, e.g., one virtual server, it can only handle so many requests per minute or hour. To prevent overloading such a resource, its owners often apply “rate limiting,” a practice that restricts how many requests the resource will handle for each user in a given period of time. Rate limiting is also sometimes called “throttling,” because it is the digital equivalent of narrowing a pipe to restrict the flow of air.

Rate limiting serves multiple purposes. It ensures reliable access to digital resources at a level that meets service quality expectations. With rate limiting, system owners do not have to invest in infrastructure they don’t need. Rate limiting is also an important tool for protecting digital assets from malicious or unauthorized use, and for understanding your traffic well enough to decide how to scale your environment.

What is rate limiting?

The best way to understand rate limiting is as a policy: a set of rules that control the rate at which a user can access a digital resource. For example, how many times can a user attempt to log into a website in a minute? Without a rate limiting policy, the answer is “as many times as they want.” With rate limiting, the answer might be “up to three times in a given minute.” A fourth attempt within that minute will be blocked.

Rate limiting occurs when a rate limiting policy is enforced by some sort of hardware or software solution. Both the policy and its enforcement are necessary. Without the policy, the enforcement is meaningless. Without the policy enforcement point, the policy is meaningless. Rate limiting also implies some kind of monitoring tool that tracks the rate of resource usage and flags problematic situations, such as a server “running hot” or suspicious activity indicating that a cyberattack is under way.
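To make the split between policy and enforcement concrete, here is a minimal Python sketch of the three-attempts-per-minute login rule. The policy is plain data (`RatePolicy`), and the enforcement point (`SlidingLogLimiter`) applies it by keeping a log of each user’s recent accepted attempts. The class names and the sliding-log technique are illustrative assumptions, not a reference to any particular product.

```python
import time
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class RatePolicy:
    """The policy: at most max_requests per window_seconds."""
    max_requests: int
    window_seconds: float


class SlidingLogLimiter:
    """The enforcement point: applies a RatePolicy to each user's recent attempts."""

    def __init__(self, policy, clock=time.monotonic):
        self.policy = policy
        self.clock = clock  # injectable so tests can control time
        self.logs = defaultdict(deque)  # user_id -> timestamps of accepted attempts

    def allow(self, user_id):
        now = self.clock()
        log = self.logs[user_id]
        # Forget attempts that are older than the policy window.
        while log and now - log[0] >= self.policy.window_seconds:
            log.popleft()
        if len(log) >= self.policy.max_requests:
            return False  # out of policy: block this attempt
        log.append(now)
        return True


# Hypothetical login policy from the example: three attempts per minute.
LOGIN_POLICY = RatePolicy(max_requests=3, window_seconds=60)
```

Because the policy is separate data, changing the rule to five attempts per minute means changing one line, not the enforcement code. This sketch only logs accepted attempts; a real login limiter might also count rejected ones so that a client hammering the endpoint stays blocked.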

Why is rate limiting important?

Rate limiting is important because it supports three operational concerns: service quality, cybersecurity, and finance. For service quality, rate limiting prevents resources from being overwhelmed by an excessive number of requests, a situation that can lead to slowdowns or outages. With well-designed rate limiting policies, all eligible users can enjoy a similar quality of service (QoS). Relatedly, rate limiting helps ensure fairness by preventing any single user from monopolizing a digital resource.

For cybersecurity, rate limiting serves as a countermeasure against several threats. A denial of service (DoS) attack, for instance, floods a resource with requests until it becomes unavailable. By capping the requests each client can make, rate limiting can stop a single-source DoS attacker from achieving that goal, although a distributed DoS (DDoS) attack spread across many sources usually calls for additional defenses. Rate limiting can similarly mitigate brute force attacks, which involve rapid-fire guessing of passwords. It also works against credential stuffing, a variant of brute force in which the attacker quickly tries stolen username/password pairs to gain unauthorized access to a resource, e.g., a banking website.

Rate limiting can help a business with its finances, as well. A digital asset costs money, so the more efficiently it’s utilized, the better its financial outcome will be. Rate limiting keeps usage of digital assets within predictable bounds. To see why this matters, consider a business that has to purchase additional servers to keep up with a high volume of traffic from users with unlimited access. That’s not a worthwhile investment.

Looking at this issue from a different perspective, each service request carries a cost. Once bandwidth, data center expenses, and software/hardware depreciation are taken into account, that cost might be small, perhaps a fraction of a penny. However, if millions of unwanted service requests are choking the system, those fractions add up to real waste.

Rate limiting may also be part of the monetization of a digital asset. For example, a company might allow a user 5,000 API calls per week for $100. The API owner needs rate limiting to enforce this maximum number of API calls.
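A quota like the 5,000-calls-per-week plan can be enforced with a counter that resets each billing period. The sketch below assumes the period follows ISO calendar weeks in UTC and that callers are identified by a hypothetical API key string; a real plan might instead start its week on the subscription date.

```python
from datetime import datetime, timezone


class WeeklyQuota:
    """Allows each API key a fixed number of calls per ISO calendar week (UTC assumed)."""

    def __init__(self, calls_per_week=5000):
        self.calls_per_week = calls_per_week
        self.usage = {}  # api_key -> (iso_year, iso_week, calls_used)

    def try_consume(self, api_key, now=None):
        """Record one call and return True, or return False if the quota is spent."""
        if now is None:
            now = datetime.now(timezone.utc)
        year, week, _ = now.isocalendar()
        stored_year, stored_week, used = self.usage.get(api_key, (year, week, 0))
        if (stored_year, stored_week) != (year, week):
            used = 0  # a new week has started, so the quota resets
        if used >= self.calls_per_week:
            return False  # plan limit reached for this week
        self.usage[api_key] = (year, week, used + 1)
        return True
```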

How does rate limiting work?

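Rate limiting is usually keyed on an identifier for the requester, such as a user ID or the client’s Internet Protocol (IP) address. A rate limiting solution tracks the identifiers associated with service requests so it can attribute each request to a source and block out-of-policy behavior. IP-based limiting is simple because it requires no login, but an IP address is not a reliable per-user identifier: many users behind the same network address translation (NAT) gateway or proxy can share one address, and a single attacker can rotate through many.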

The actual process of rate limiting involves keeping track of the number of requests made by a given user or from a given IP address. The rate limiting solution compares that request activity with its policies, detects requesters that violate the rules, and stops them from continuing. In most cases, the solution rejects the offending request with an error message; for web APIs, this is typically an HTTP 429 Too Many Requests response.
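The track, compare, and reject steps can be sketched as a fixed-window counter keyed on client IP. This is a minimal single-process illustration with assumed names (`FixedWindowLimiter`, `check`); deployments with several servers often keep such counters in a shared store so that every server sees the same counts.

```python
import time


class FixedWindowLimiter:
    """Counts requests per client IP in fixed windows (e.g., each clock minute)."""

    def __init__(self, limit, window_seconds, clock=time.time):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable for testing
        self.counters = {}  # ip -> (window_index, count)

    def check(self, client_ip):
        """Return (allowed, message); the message stands in for an error response."""
        window = int(self.clock() // self.window_seconds)
        current_window, count = self.counters.get(client_ip, (window, 0))
        if current_window != window:
            count = 0  # a new window started, so the count resets
        if count >= self.limit:
            return False, "Rate limit exceeded; try again later."
        self.counters[client_ip] = (window, count + 1)
        return True, "OK"
```

For example, with `limit=2` and `window_seconds=60`, a third request from the same IP in the same minute is rejected, while a request from a different IP is still allowed.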

API pagination is a related tool for controlling load. Rather than limiting how many requests a client makes, it limits how much data each request returns by splitting large result sets into pages. This keeps the system from being overloaded by oversized responses and lets data be retrieved efficiently. Capping page size also limits how much data any single request can expose, which can reduce the impact of malicious requests such as bulk data scraping.
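A minimal sketch of server-side pagination, assuming offset-based paging and a hypothetical cap of 100 items per page. Whatever page size the client requests is clamped, so no single call can pull an entire collection.

```python
MAX_PAGE_SIZE = 100  # hypothetical server-side cap on items per response


def paginate(items, offset=0, limit=20):
    """Return one page of items and the offset of the next page (None at the end).

    The requested limit is clamped to [1, MAX_PAGE_SIZE], so a client asking
    for limit=1000000 still receives at most MAX_PAGE_SIZE items.
    """
    limit = max(1, min(limit, MAX_PAGE_SIZE))
    offset = max(0, offset)
    page = items[offset:offset + limit]
    next_offset = offset + limit if offset + limit < len(items) else None
    return page, next_offset
```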

What is rate limiting used for?

System owners use rate limiting for a variety of reasons, most of which have to do with QoS and security. The goals are almost always to keep systems running as expected, deliver a good user experience, and protect digital assets from malicious misuse. In particular:

  • Avoiding availability problems due to overuse of resources — Excessive demand on a resource, such as an API, whether it’s due to popularity or cybercrime, renders the resource less available to users who need it.
  • Protecting other areas of the IT estate — Rate limiting reduces the risk of network penetration or data breach, which can occur through a brute force or credential stuffing attack.
  • Managing quotas and contracts — A system owner may establish a quota for resource usage or require a contract that restricts the quantity and timing of access. Rate limiting makes it possible to enforce those agreements.
  • Controlling traffic and flow — Rate limiting keeps the flow of network traffic within a defined level, which helps avoid slowdowns and outages. It can also enable intelligent flow control with network traffic routed based on predictable volumes, e.g., by merging streams into a single device.
  • Controlling costs and CapEx — Rate limiting lets system owners hold traffic to the level they planned for at the procurement stage. For example, a server may have been approved for purchase on the prediction that it would handle 10,000 service requests per hour. If actual demand is higher, more servers may be needed, and those were not in the original budget.

Types of rate limits

This article has focused on rate limiting based on a user’s service requests in a defined time period. However, there are many other ways to limit requests. For example, rules can combine a short-term frequency limit with a cap on total request volume. A user may be forbidden from trying to log into a site more than 10 times per minute, and also from trying more than 100 times per day. Both rules apply at once. With only the per-minute rule, that user could make 10 attempts per minute for all 1,440 minutes in a 24-hour period, or 14,400 attempts per day, which is not ideal from a QoS perspective.
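Enforcing the per-minute and per-day rules together means a request must pass every rule, and it should count against the limits only once it has been accepted. The sketch below does this with one fixed-window counter per rule; the class name and the choice of fixed windows are assumptions for illustration.

```python
import time


class MultiWindowLimiter:
    """Enforces several fixed-window rules at once; a request must pass all of them."""

    def __init__(self, rules, clock=time.time):
        self.rules = rules  # list of (limit, window_seconds) pairs
        self.clock = clock
        self.counters = {}  # (user_id, window_seconds) -> (window_index, count)

    def allow(self, user_id):
        now = self.clock()
        pending = []
        for limit, window_seconds in self.rules:
            key = (user_id, window_seconds)
            window = int(now // window_seconds)
            stored_window, count = self.counters.get(key, (window, 0))
            if stored_window != window:
                count = 0  # this rule's window has rolled over
            if count >= limit:
                return False  # any single rule is enough to block
            pending.append((key, window, count + 1))
        # Count the attempt against every rule only after all rules accepted it.
        for key, window, count in pending:
            self.counters[key] = (window, count)
        return True


# The two login rules from the text: 10 per minute and 100 per day.
LOGIN_RULES = [(10, 60), (100, 86_400)]
```

If a client makes 11 attempts every minute for a full day, this limiter lets exactly 100 through (10 in each of the first 10 minutes), compared with the 14,400 that the per-minute rule alone would allow.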

It is also possible to apply rate limits based on location. Users from Germany might be allowed 100 login attempts per day, while those in France get 200. Alternatively, this kind of rate limiting policy can trigger re-routing of service requests, e.g., sending traffic from an overworked server in Germany to one in France.
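A location-based policy can be expressed as a lookup table from country to limit. This sketch assumes the request’s country has already been resolved (for example, by a geolocation lookup on the client IP, not shown) and uses a hypothetical default of 50 attempts for countries without a specific rule.

```python
# Hypothetical per-country daily login limits, keyed by ISO 3166-1 alpha-2 code.
DAILY_LOGIN_LIMITS = {"DE": 100, "FR": 200}
DEFAULT_DAILY_LIMIT = 50  # assumed fallback for countries without a specific rule


def daily_limit_for(country_code):
    """Look up the daily login limit that applies to a request's country."""
    return DAILY_LOGIN_LIMITS.get(country_code.upper(), DEFAULT_DAILY_LIMIT)


def is_allowed(country_code, attempts_today):
    """Allow the attempt only if the user is still under their country's daily limit."""
    return attempts_today < daily_limit_for(country_code)
```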


Rate limiting is an essential control and countermeasure for system owners who want to stay secure and prevent lapses in service. The practice also helps ensure the desired financial outcomes for investments in digital resources. It’s a simple concept (restrict access based on rules about requests per minute, and so forth), but implementation requires close attention to detail. With effective rate limiting in place, servers, APIs, and other infrastructure elements remain available to those who are entitled to use them.

Rate Limiting FAQs

How does rate limiting protect APIs from abuse?

Rate limiting protects APIs from abuse by controlling the number of requests a user can make within a specific time frame. It does this by setting a threshold for the maximum number of requests allowed from a single user (or IP address) over a given period of seconds, minutes, or hours. This prevents users from overwhelming the API with excessive requests, whether those requests simply degrade performance or are part of an attack such as denial of service (DoS) or API scraping.

What are common strategies for implementing rate limiting?

Common strategies for implementing rate limiting, found in most API security best practices, include:

  • Token buckets: Each user has a bucket that holds up to a fixed number of tokens, which refill at a constant rate. Each API request consumes one token, and when the bucket is empty, requests are rejected until more tokens arrive. Because unused tokens accumulate up to the bucket’s capacity, this approach tolerates short bursts while still enforcing an average rate.
  • Leaky buckets: Incoming requests enter a bucket (in effect, a queue) of fixed size that drains, or “leaks,” at a constant rate; requests that arrive when the bucket is full are dropped. Unlike a token bucket, this smooths traffic into a steady flow rather than letting bursts through, which prevents sudden spikes from reaching the backend.
  • Fixed window counters: This strategy sets a fixed window of time and a maximum number of allowed requests per window. Once the request count reaches the limit, subsequent requests are denied until the next window begins. While this strategy is easy to implement, it’s susceptible to bursts at window boundaries: a client can use its full quota at the end of one window and again at the start of the next, briefly doubling the effective rate.
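The two bucket algorithms can be sketched in a few lines each. The token bucket below starts full and refills continuously; the leaky bucket uses the queue formulation, holding requests and releasing them one per interval. Both take an injectable clock so they can be tested without waiting, and `drain()` assumes something like a worker loop calls it regularly. These are minimal single-process sketches with illustrative names, not a particular library’s API.

```python
import time
from collections import deque


class TokenBucket:
    """Holds up to `capacity` tokens, refilled at `rate` tokens per second."""

    def __init__(self, capacity, rate, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = float(capacity)  # start full, so an initial burst is allowed
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Credit the tokens earned since the last check, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1  # each request spends one token
            return True
        return False


class LeakyBucket:
    """Queues up to `capacity` requests and releases them at `rate` per second."""

    def __init__(self, capacity, rate, clock=time.monotonic):
        self.capacity = capacity
        self.interval = 1.0 / rate  # seconds between releases
        self.clock = clock
        self.queue = deque()
        self.next_release = clock()

    def offer(self, request):
        """Add a request to the bucket; return False if it overflows and is dropped."""
        if len(self.queue) >= self.capacity:
            return False
        if not self.queue:
            # The bucket was empty: don't let idle time bank up release slots.
            self.next_release = max(self.next_release, self.clock())
        self.queue.append(request)
        return True

    def drain(self):
        """Release every queued request whose slot has arrived, one per interval."""
        now = self.clock()
        released = []
        while self.queue and self.next_release <= now:
            released.append(self.queue.popleft())
            self.next_release += self.interval
        return released
```

With a capacity of 3 and a rate of 1 per second, the token bucket lets three back-to-back requests through at once, while the leaky bucket accepts the same three but releases them one second apart.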

Can rate limiting affect the user experience, and how?

Rate limiting an API can affect the user experience by causing delays or access issues, so limits need fine-tuning to find the sweet spot between “security” and “user experience.”

Limits that are set too low may result in slower response times or denied requests for legitimate users. Inconsistently applied limits can produce intermittent access issues and unreliable behavior. Tuning rate limits to expected traffic patterns reduces these risks. It’s also a good idea to design applications that degrade gracefully when they hit a limit, keeping core functionality intact.
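On the client side, graceful degradation often starts with retrying politely. This sketch retries on HTTP 429, preferring the server’s Retry-After value when it is given in seconds and otherwise backing off exponentially with random jitter. The `send` callable and its `(status, headers, body)` return shape are assumptions for illustration, and the HTTP-date form of Retry-After is not handled.

```python
import random
import time


def call_with_backoff(send, max_attempts=5, base_delay=0.5, max_delay=30.0, sleep=time.sleep):
    """Call `send()` and retry on HTTP 429, waiting longer after each rejection.

    `send` is a hypothetical callable returning (status_code, headers_dict, body).
    max_attempts must be at least 1.
    """
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, headers, body
        if attempt == max_attempts - 1:
            break  # out of retries; don't sleep again before giving up
        retry_after = headers.get("Retry-After")
        if retry_after is not None and retry_after.isdigit():
            delay = float(retry_after)  # server said exactly how long to wait
        else:
            # Exponential backoff, capped, with jitter so clients don't retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** attempt) * random.uniform(0.5, 1.0)
        sleep(delay)
    return status, headers, body
```

If the final result is still 429, the application can fall back to cached data or a reduced feature set instead of failing outright.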

What are some best practices for communicating rate limits to users?

Transparency and clarity are top priorities when communicating API rate limits to users. Responses to rejected requests should use the HTTP status code “429 Too Many Requests,” state that rate limiting is in effect, and give specific details such as the maximum number of requests allowed, how many requests remain, and when the limit will reset or tokens will be replenished.
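Here is a sketch of a 429 response that carries those details. `Retry-After` is a standard HTTP header; the `X-RateLimit-*` header names are a widely used convention rather than a standard, so they are an assumption here, as are the JSON body fields and the function name.

```python
import json
import math


def rate_limit_response(limit, remaining, reset_in_seconds):
    """Build the status, headers, and body for a request that hit its limit."""
    wait = max(0, math.ceil(reset_in_seconds))  # Retry-After takes whole seconds
    headers = {
        "Retry-After": str(wait),
        "X-RateLimit-Limit": str(limit),  # conventional, not standardized
        "X-RateLimit-Remaining": str(max(0, remaining)),
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "error": "rate_limited",
        "message": f"Limit of {limit} requests exceeded. Retry in {wait} seconds.",
    })
    return 429, headers, body
```

Successful responses can carry the same limit and remaining-count headers, so well-behaved clients can slow down before they are rejected.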



Harold Bell was the Director of Content Marketing at Noname Security. He has over a decade of experience in the IT industry with leading organizations such as Cisco, Nutanix, and Rubrik, and has been featured as an executive ghostwriter in Forbes Technology Council and Hacker News.

