Authorizing 10 Million API Calls Per Second: LinkedIn's Scalable Approach


LinkedIn authorizes 10 million API calls/second with ACLs. Learn how they achieve fast authorization, quick changes, efficient data management, and robust monitoring at scale.

LinkedIn operates hundreds of microservices that communicate with one another at an average rate of tens of millions of API calls per second. At that volume, robust security is essential: without adequate authorization controls, a single compromised service can lead to a severe data breach.

Access Control Lists (ACLs) are a widely adopted way to implement these authorization controls. An ACL defines which users, groups, or processes can access a particular object, such as a file, directory, application, or network resource. Conceptually, an ACL is a table or list that spells out the permitted actions for a given object.

Consider this example of an ACL for a specific service:

Caller           Resource    GET      PUT
client-service   greeting    allow    deny
admin-service    greeting    allow    allow

In the example ACL:

The "client-service" is permitted to execute GET requests on the "greeting" resource but is explicitly denied from performing PUT requests.
The "admin-service" holds broader permissions, allowing it to perform both GET and PUT requests on the "greeting" resource.

For every incoming request, the service checks the ACL and grants or denies access according to the caller's permissions.
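
As a minimal sketch of that per-request check, the example ACL can be modeled as a lookup table keyed by caller, resource, and method. The names `ACL` and `is_allowed` are illustrative only, and denying anything that is not listed (default deny) is an assumption here, not something the source specifies.

```python
# Hypothetical in-memory form of the example ACL:
# (caller, resource, method) -> is the call allowed?
ACL = {
    ("client-service", "greeting", "GET"): True,
    ("client-service", "greeting", "PUT"): False,  # explicit deny
    ("admin-service", "greeting", "GET"): True,
    ("admin-service", "greeting", "PUT"): True,
}


def is_allowed(caller: str, resource: str, method: str) -> bool:
    """Return True only if the ACL explicitly allows the call.

    Assumption: any combination that is not listed is denied.
    """
    return ACL.get((caller, resource, method), False)
```

With this table, `is_allowed("client-service", "greeting", "PUT")` is rejected by the explicit deny entry, while a caller that does not appear at all is rejected by the default.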

The Challenge at LinkedIn's Scale

While the fundamental concept of ACLs appears straightforward, the sheer scale of LinkedIn's operations introduces significant complexities. LinkedIn faces four primary challenges:

Speed: Authorization checks must be executed with minimal latency.
Timely Updates: ACL changes need to be propagated promptly across the entire service stack.
Management: Handling an extensive and evolving number of ACLs requires efficient management tools and processes.
Monitoring: Comprehensive monitoring of ACL checks is essential for security, auditing, and debugging.

The diagram below illustrates LinkedIn's architectural approach to addressing each of these challenges:

[Diagram: LinkedIn's authorization architecture]

Let's delve into how LinkedIn tackles each of these critical issues.

Solutions for Scalable Authorization

Fast Authorization Checks

To ensure rapid authorization decisions, LinkedIn deploys an authorization client module on every service. This module maintains relevant ACL data in local memory, effectively bypassing the need for network calls during authorization checks, thereby significantly reducing latency.
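
The idea of answering checks from local memory can be sketched roughly as follows. `AuthorizationClient` and `fetch_allowed_rules` are hypothetical names, and the callable merely stands in for whatever backend call the real module makes. The point of the sketch is that `check` touches only an in-process set and never the network.

```python
from typing import Callable, FrozenSet, Tuple

Rule = Tuple[str, str, str]  # (caller, resource, method)


class AuthorizationClient:
    """Hypothetical per-service module that answers checks from local memory."""

    def __init__(self, fetch_allowed_rules: Callable[[], FrozenSet[Rule]]):
        # fetch_allowed_rules stands in for the (unspecified) ACL backend call.
        self._fetch = fetch_allowed_rules
        self._allowed: FrozenSet[Rule] = frozenset()

    def load(self) -> None:
        # The only place a remote call would happen; never on the request path.
        self._allowed = self._fetch()

    def check(self, caller: str, resource: str, method: str) -> bool:
        # Pure in-memory set lookup: no I/O is performed here.
        return (caller, resource, method) in self._allowed
```

A service would call `load()` once at startup and then call `check(...)` for each incoming request. Rules that are absent from the set are treated as denied, which is an assumption of this sketch.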

Delivering Prompt ACL Changes

ACL data is continuously refreshed in the background. The refresh rate is carefully calibrated to strike a balance between delivering timely updates and managing the load on the overall system infrastructure.
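
One way to picture a background refresh is a timer-driven thread that swaps in a fresh snapshot. The sketch below is an illustration under assumptions, not LinkedIn's implementation: `RefreshingAclStore`, `fetch`, and `interval_seconds` are hypothetical names, and keeping the last good snapshot after a failed refresh is a design choice made here for the example.

```python
import threading
from typing import Callable, FrozenSet, Tuple

Rule = Tuple[str, str, str]  # (caller, resource, method)


class RefreshingAclStore:
    """Hypothetical store that swaps in a fresh ACL snapshot on a timer."""

    def __init__(self, fetch: Callable[[], FrozenSet[Rule]], interval_seconds: float):
        self._fetch = fetch                # stand-in for the unspecified backend call
        self._interval = interval_seconds  # the knob trading freshness against load
        self._snapshot: FrozenSet[Rule] = fetch()  # initial load before serving
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()

    def _run(self) -> None:
        # Event.wait returns False on timeout and True once stop() is called.
        while not self._stop.wait(self._interval):
            self.refresh_once()

    def refresh_once(self) -> None:
        try:
            # Rebinding the attribute replaces the whole snapshot in one step.
            self._snapshot = self._fetch()
        except Exception:
            # Assumption: on a failed refresh, keep serving the last good snapshot.
            pass

    def is_allowed(self, caller: str, resource: str, method: str) -> bool:
        return (caller, resource, method) in self._snapshot
```

A shorter `interval_seconds` delivers ACL changes sooner but multiplies the number of fetches across every service instance, which is the trade-off the refresh rate has to balance.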

Efficient ACL Data Management

ACLs are persistently stored in LinkedIn’s proprietary Espresso database. To reduce latency and improve scalability, a look-aside Couchbase cache is used.

Maintaining consistency between the cache and the database is crucial. LinkedIn employs a change data capture (CDC) system, built on Brooklin, which notifies services whenever an ACL changes, triggering the clearance of stale cache entries.
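
The look-aside pattern with change-driven invalidation can be sketched as below. Plain dictionaries stand in for the database and the cache, and `LookAsideAclCache`, `on_change_event`, and `database_reads` are hypothetical names introduced for illustration. No real Espresso, Couchbase, or Brooklin API is used or implied.

```python
from typing import Dict, Optional


class LookAsideAclCache:
    """Hypothetical look-aside cache in front of a slower source-of-truth store."""

    def __init__(self, database: Dict[str, dict]):
        self._database = database          # stand-in for the database
        self._cache: Dict[str, dict] = {}  # stand-in for the cache
        self.database_reads = 0            # counts cache misses, for illustration

    def get(self, acl_id: str) -> Optional[dict]:
        if acl_id in self._cache:          # cache hit: no database read
            return self._cache[acl_id]
        self.database_reads += 1           # cache miss: read the source of truth
        record = self._database.get(acl_id)
        if record is not None:
            self._cache[acl_id] = dict(record)  # populate for later readers
        return record

    def on_change_event(self, acl_id: str) -> None:
        # Called for each change notification: drop the stale entry so that
        # the next get() re-reads the database.
        self._cache.pop(acl_id, None)
```

Until `on_change_event` fires for a given ACL, readers keep seeing the cached copy, which is why the change notifications matter for consistency.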

Furthermore, a REST API, exposed via a management interface and a command-line tool, empowers developers to efficiently manage ACL data.

Monitoring ACL Data

Every authorization check is logged asynchronously using LinkedIn’s Kafka message queue. This comprehensive logging provides invaluable data for debugging, traffic analysis, auditing, and security investigations. Engineers gain access to these insights through LinkedIn's internal inGraphs monitoring system.
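
Asynchronous logging of this kind can be approximated with an in-process queue drained by a background thread. In the sketch below, `AsyncAuditLogger` and `publish` are hypothetical: `publish` stands in for a message-queue producer, and no real Kafka client API is shown. Dropping events when the buffer is full is an assumption made so that the authorization check never blocks.

```python
import queue
import threading
from typing import Callable, Optional


class AsyncAuditLogger:
    """Hypothetical logger that keeps publishing off the request path."""

    def __init__(self, publish: Callable[[dict], None], max_pending: int = 10_000):
        self._publish = publish  # stand-in for a message-queue producer
        self._pending: "queue.Queue[Optional[dict]]" = queue.Queue(maxsize=max_pending)
        self.dropped = 0         # events discarded because the buffer was full
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def log_check(self, caller: str, resource: str, method: str, allowed: bool) -> None:
        event = {"caller": caller, "resource": resource,
                 "method": method, "allowed": allowed}
        try:
            self._pending.put_nowait(event)  # never blocks the authorization check
        except queue.Full:
            self.dropped += 1  # assumption: shed load rather than slow requests

    def _drain(self) -> None:
        while True:
            event = self._pending.get()
            if event is None:  # sentinel enqueued by close()
                return
            try:
                self._publish(event)
            except Exception:
                pass  # a failed publish must not kill the worker thread

    def close(self) -> None:
        # Enqueue the sentinel after all pending events, then wait for the worker.
        self._pending.put(None)
        self._worker.join()
```

The request path pays only for building a small dictionary and one non-blocking enqueue. Publishing, and any slowness in the downstream system, happens on the worker thread.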

References:

Authorization at LinkedIn's Scale