
API-03 
Excessive Data Exposure


Excessive data exposure occurs when an API responds with additional data which is expected to be filtered or ignored by the client. This is particularly bad when the extra data includes elements that are sensitive or covered by a regulatory requirement.

How does this happen?

Two unfortunate assumptions are often made during API development. First, a developer might assume that only the company's own clients will talk to the API and that those clients will ignore any extra data. Regrettably, attackers won't confine themselves to the official client and will happily use the extra data. Second, developers often rely on programming functions that automatically turn internal API data into API responses. While this gives dev teams a productivity boost, it can also lead to extra data being sent in a response.
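The second assumption can be illustrated with a minimal Python sketch. The Customer record, its field names, and both helper functions are hypothetical and are not taken from any real retailer's code. The sketch contrasts automatic serialization of an internal object with an explicit allow-list of response fields.

```python
from dataclasses import dataclass, asdict


@dataclass
class Customer:
    # Hypothetical internal record; field names are illustrative only.
    username: str
    full_name: str
    loyalty_points: int
    cc_num: str
    cvv: str


def to_response_automatic(customer: Customer) -> dict:
    # Convenient, but every internal field ends up in the response.
    return asdict(customer)


# Explicit allow-list of the fields the client actually needs (an assumption).
PUBLIC_FIELDS = ("username", "full_name", "loyalty_points")


def to_response_allow_listed(customer: Customer) -> dict:
    # Only fields named in PUBLIC_FIELDS are copied, so a field added to
    # Customer later stays out of the response until someone lists it here.
    return {name: getattr(customer, name) for name in PUBLIC_FIELDS}


alice = Customer("alice", "Alice Example", 471, "4111111111111111", "763")
print(sorted(to_response_automatic(alice)))     # includes cc_num and cvv
print(sorted(to_response_allow_listed(alice)))  # public fields only
```

The allow-list version costs a few extra lines per response type, but it makes exposing a field a deliberate decision rather than a side effect of changing the internal model.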

Continuing the online retailer example, an attacker uses software tools to appear to be the retailer's mobile app. Since the attacker controls the client, they will see all the data sent in the API's response, including data that was expected to be ignored or was included inadvertently. In certain circumstances, an attacker can also intercept traffic between the API and the client, putting the full response in the attacker's possession. Once an attacker sees additional data in one response, they will make additional requests to 'scrape' as much data from the API as possible. The situation is similar to thinking you're sharing a single photo from your cell phone when you have actually shared all of your photos, including the rather private ones.

Why is it important?

It's never a good look to expose sensitive data, even more so when there are privacy or regulatory requirements for that data to be protected. Fundamentally, you have to know how your APIs are responding to secure them appropriately. Even if the data isn't considered sensitive today, it may be tomorrow. Plus, why send data that isn't truly needed? By removing all extraneous data from responses, you can drive down the data transferred and make both the API and its clients more efficient.

While a single response with sensitive data may not be an existential problem, the same assumptions that led to that single vulnerable response were likely used throughout the API, and attackers know this. Once they find one instance of excessive data exposure, they will look for more across all the available methods of the API. Like chumming the water, a smaller issue can lead to complete data compromise.

Finally, an attacker will take each discovered API method that has excessive data exposure and automate the request to pull all possible sensitive data from the API. Because these automated attacks produce what appear to be 'normal' responses, they can be difficult to discover without API-focused runtime monitoring.

 


 

Example

Continuing with the online retailer example, below is a valid request from Alice's mobile phone to obtain her profile from the retailer. The retailer's mobile app on her phone makes the following HTTPS request:

GET /api/v2/customer/profile

Her mobile app then displays her preferred shipping address, her loyalty points, and her site/app preferences.

However, Mallory uses his own client to request his profile data from the API to directly view the API's response:

GET /api/v2/customer/profile

GET https://example.com/api/v2/customer/profile HTTP/1.1
Host: example.com
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Firefox/102.0
Accept: */*
Accept-Language: en-US,en;q=0.5
Content-Type: application/json
Authorization: Bearer eyJhbGciOiJIUzUxMiJ9.eyJzdWIiOiJlZEBleGFtcGxlLmNvbSIsImlhdCI6MTY1ODQ2MDMzMCwiZXhwIjoxNjU4NTQ2NzMwfQ.m0ha0L89xuLpLlLlybcaD2SfKp23BZfe_Hjjs-PprCN2sDvxpmWAcemN5yOq-nhx78Iu8EzLIJbqqc1d-q10UA
Connection: keep-alive
Sec-Fetch-Dest: empty
Sec-Fetch-Mode: cors
Sec-Fetch-Site: same-origin

{
  "profile": {
    "username": "alice",
    "full_name": "Alice Example",
    "first_name": "Alice",
    "last_name": "Example",
    "cust_id": "3816859381",
    "profile": [
      {
        "loyalty_num": "99874391",
        "loyalty_points": "471",
        "tier": "silver"
      },
      {
        "payment": "visa",
        "cc_num": "4111111111111111",
        "exp": "07/25",
        "ccv": "763"
      },
      {
        "pref_shipping": "ground",
        "addr": "8dbfa042-33b7-11ed-99a0-ef8a4b7a255f"
      }
    ]
  }
}

Mallory quickly notices that the API response contains far more information than the app usually displays. By automating the request and iterating through customer IDs, Mallory is able to scrape the credit card numbers used by the online retailer's customers.

How to test

Attackers and penetration testers look for excessive data exposure vulnerabilities by first setting up a way to review the raw responses from the API. Typically, this is done using a local proxy.

What is a local proxy?

A local proxy is a piece of software which accepts API requests such as HTTPS requests, stores them and forwards them on to the API originally requested. Such software allows someone to not only watch the traffic to and from an API but also modify the data. A local proxy allows a tester to simulate a legitimate API client without having to write a program specifically for each API.

Once the API traffic can be inspected, testers should pay special attention to API calls which ask for data that is likely to include internal API data such as user profiles, group or linked users, and similar requests. For example, by requesting a user's friends on a social media app, the response may include the full profile data of each friend. When an instance of excessive data exposure is found, it must be evaluated to determine how broad the exposure is and if the attack could be automated. Automated attacks are especially damaging since APIs are, by design, made to respond quickly to programmatic requests and that is exactly what attack automation does.
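The comparison a tester makes by eye can be approximated in code. Below is a minimal Python sketch that lists the fields present in a raw JSON response but absent from the set the official client is known to display. The function names, the dotted-path notation, and the sample body are assumptions made for illustration and are not part of any particular proxy tool.

```python
import json


def collect_keys(value, prefix=""):
    """Return the set of dotted key paths found anywhere in a decoded JSON value."""
    keys = set()
    if isinstance(value, dict):
        for name, child in value.items():
            path = f"{prefix}.{name}" if prefix else name
            keys.add(path)
            keys |= collect_keys(child, path)
    elif isinstance(value, list):
        # List items share their parent's path; indexes are ignored on purpose.
        for child in value:
            keys |= collect_keys(child, prefix)
    return keys


def unexpected_fields(raw_body: str, displayed: set) -> set:
    """Fields in the response that the official client never shows."""
    return collect_keys(json.loads(raw_body)) - displayed


# Hypothetical captured response body and the fields the app displays.
raw_body = (
    '{"profile": {"username": "alice", "loyalty_points": "471", '
    '"cc_num": "4111111111111111"}}'
)
displayed = {"profile", "profile.username", "profile.loyalty_points"}

print(unexpected_fields(raw_body, displayed))  # {'profile.cc_num'}
```

Any path this reports is only a candidate for review. A tester still has to judge whether the field is sensitive and whether the request can be repeated for other users.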

How to defend

Unfortunately for defenders, excessive data exposure can be hard to detect, especially on an already deployed API, since the extra data is part of a normal response. Reviewing source code for functions that automatically convert internal API data into API responses can help, but this is language and framework dependent. Additionally, not all SAST tools can alert when sensitive data is included in an API response.

A strong secure SDLC program that includes API security-specific training is an important way to reduce instances of excessive data exposure in APIs. When developers understand the importance of sending only the required data in responses, and that client-side filtering is a false assumption, the vulnerability may be avoided altogether. While a proactive approach is preferred, it is vital to measure the effectiveness of such training rather than assume it.

One way to measure the effectiveness of training is to test the running API prior to deploying to production. Robust API security testing tools can spot sensitive data in responses and flag those API endpoints for review. By adding API security testing to CI/CD systems, gaps in training can be found early and often.
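As a rough illustration of what such a check might look like, the Python sketch below walks a decoded JSON response and reports string values that look like payment card numbers, using a length check plus the Luhn checksum. It is a simplified stand-in and not how any specific testing product works. The function names and the sample body are assumptions, and a real scanner would cover many more data types.

```python
import json
import re

# Payment card numbers are 13 to 19 digits long.
CARD_PATTERN = re.compile(r"^\d{13,19}$")


def luhn_valid(number: str) -> bool:
    """Return True when the digit string passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(number)):
        digit = int(char)
        if index % 2 == 1:  # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def find_card_numbers(value, path="$"):
    """Return the paths of string values that look like card numbers."""
    findings = []
    if isinstance(value, dict):
        for name, child in value.items():
            findings += find_card_numbers(child, f"{path}.{name}")
    elif isinstance(value, list):
        for i, child in enumerate(value):
            findings += find_card_numbers(child, f"{path}[{i}]")
    elif isinstance(value, str) and CARD_PATTERN.match(value) and luhn_valid(value):
        findings.append(path)
    return findings


# Hypothetical response body modeled on the example earlier in this article.
sample = json.loads(
    '{"profile": {"cust_id": "3816859381", "profile": ['
    '{"loyalty_num": "99874391"}, {"cc_num": "4111111111111111"}]}}'
)
print(find_card_numbers(sample))  # ['$.profile.profile[1].cc_num']
```

Run against recorded responses in a pre-production pipeline, a non-empty result could fail the build and point reviewers at the exact field to remove.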

Additionally, while a single request of a vulnerable API endpoint will appear to be 'normal' traffic, automated attacks will have multiple requests within a short period of time typically from a single client. Such data scraping will be visible to API-specific runtime monitoring. If the runtime monitoring includes deep enough inspection, sensitive data in API responses will be detected and provide alerts for corrective action.
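The idea of spotting many requests from one client in a short period can be sketched with a sliding-window counter. The Python example below is a toy model under stated assumptions: the ScrapeDetector class, its threshold, and its window size are hypothetical, and production monitoring would consider far more signals than request counts.

```python
from collections import defaultdict, deque


class ScrapeDetector:
    """Flag a client that calls one endpoint too often within a time window."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Request timestamps, kept per (client, endpoint) pair.
        self._seen = defaultdict(deque)

    def record(self, client_id: str, endpoint: str, timestamp: float) -> bool:
        """Record one request; return True when the threshold is exceeded."""
        times = self._seen[(client_id, endpoint)]
        times.append(timestamp)
        # Drop timestamps that have fallen out of the window.
        while times and timestamp - times[0] > self.window_seconds:
            times.popleft()
        return len(times) > self.max_requests


# Illustrative values: more than 3 requests in 10 seconds raises a flag.
detector = ScrapeDetector(max_requests=3, window_seconds=10.0)
flags = [
    detector.record("client-1", "/api/v2/customer/profile", t)
    for t in (0.0, 1.0, 2.0, 3.0)
]
print(flags)  # [False, False, False, True]
```

A legitimate mobile app rarely needs to fetch a profile endpoint many times per second, which is why a burst like this stands out even though each individual request looks normal.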

How Noname can help

Noname’s API security platform can help protect APIs with excessive data exposure in multiple ways. Noname provides a ‘shift left’ approach to security testing so that verbose API responses, especially those with sensitive data, can be surfaced early in the development process. It provides granular testing capabilities with templated configurations for many popular CI/CD platforms. Noname also allows for rapid and highly targeted re-testing of previously discovered vulnerabilities, so developer fixes can be confirmed quickly. Early testing in pre-production environments can reduce or eliminate excessive data exposure issues before attackers have a chance to find them.

Noname also provides runtime protection that includes deep inspection and machine learning to analyze all traffic going to or from APIs. It understands multiple API types such as REST, GraphQL, SOAP, and gRPC. By understanding API communication methods, deeply inspecting traffic, and utilizing ML, Noname's platform can notice that API responses contain sensitive data and even enforce policies against the inclusion of that data in responses. It also includes anomaly detection and can alert on a surge of requests for a specific API endpoint, which is typical of scraping data from an API. When an attack is discovered, Noname allows for a flexible range of responses to the alerts raised. At the most basic, it will provide alerts in the Noname web UI. More interesting are the available integrations with ticketing systems, which allow Noname alerts to be triaged by the security team. Additionally, Noname can provide fully automated responses by interacting with existing infrastructure to take actions like blocking an IP for a period of time or de-authenticating a client's token at the API gateway. Noname's runtime protection can also alert on the introduction of sensitive data in pre-production environments so the exposure can be addressed early in the development life cycle.

Finally, Noname's platform provides API posture management capabilities. Leveraging Noname's runtime monitoring, the platform will dynamically create an inventory of all APIs, including those that include sensitive data in their responses. Noname inventories APIs based on the hostname, path/URI, and HTTP method, allowing for granular controls to be established across the entire API inventory. Beyond creating an inventory, data received and sent are auto-classified. The platform also allows for custom data types to be created to accommodate business-specific data. Using the auto-classified data, policies can be created to restrict sensitive data in API communications. For example, a policy could be created to disallow credit card numbers from ever leaving an API. Any policies on data, including policies around sensitive data exposure, are enforced by the platform's runtime protection and can use any of the responses mentioned above: manual, semi-automatic, or fully automatic.