13 Jul 2023
Technical note

Simplifying cloud security policies with AI

Data-driven policies can help organizations protect their data and compute resources

As enterprise applications shift to cloud-based, containerized microservices, security has become increasingly important. Our team recently developed an efficient way to improve cluster security in Kubernetes by using AI and an Open Policy Agent server to build data-driven policies. The approach protects the cluster, one of the four Cs of cloud native security (cloud, cluster, container, and code), with policies that can generalize to any Kubernetes-based container orchestration system (Figure 1).

Figure 1: The four Cs of cloud native security.

A cluster has many types of security policies, ranging from those that define workload security standards to those that provide authentication services and rules. As cloud security grows more complex, organizations want to enforce these policies while minimizing the time administrators and developers spend defining and updating them, including the work of setting policy limits based on real-time usage.

The cloud container platform can process enterprise workloads at Formula 1 speeds, but like a race car, it needs constant maintenance to keep workloads safe and secure. For example, an organization’s chief information security officer may want to enforce a least privilege access policy for interservice calls made from one microservice to another. Enforcing a policy like this involves creating and updating a whitelist of valid calls between services. With the help of AI, we can simplify the process.
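
As a rough illustration, a least privilege allowlist for interservice calls can be kept as a set of (source, destination) pairs, with every unlisted call denied. This is a minimal Python sketch under stated assumptions: the service names are hypothetical, and `build_allowlist`, which derives the list from observed mesh traffic, is one simple data-driven approach assumed here for illustration, not the team's published method.

```python
from collections import Counter

# Hypothetical allowlist: each (source, destination) pair is a permitted call.
# Service names are illustrative only.
ALLOWED_CALLS: set[tuple[str, str]] = {
    ("frontend", "catalog"),
    ("frontend", "checkout"),
    ("checkout", "payments"),
}


def is_call_allowed(source: str, destination: str,
                    allowlist: set[tuple[str, str]] = ALLOWED_CALLS) -> bool:
    """Least privilege: a call is allowed only if its pair is explicitly listed."""
    return (source, destination) in allowlist


def build_allowlist(observed_calls: list[tuple[str, str]],
                    min_count: int = 1) -> set[tuple[str, str]]:
    """Assumed data-driven step: keep pairs seen at least `min_count` times in traffic."""
    counts = Counter(observed_calls)
    return {pair for pair, n in counts.items() if n >= min_count}
```

With this list, `is_call_allowed("frontend", "payments")` returns `False`, because the frontend never needs to call the payment service directly. Raising `min_count` filters out rare, possibly anomalous calls before they become permanent entries.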

Building data-driven policies with service mesh

Kubernetes supports policy definition and enforcement for various functionalities in the cluster, and a service mesh like Istio extends what these policies can do. But these out-of-the-box features require fine-grained configuration for each cluster or workload to provide good protection. To be practical, they must also be adapted to work from high-level guidance with minimal maintenance. Interservice authorization policies in a cluster are a good example.

Figure 2: Context-driven policy enforcement for interservice authorization.

With Istio, we can configure an external authorization service that sends incoming requests (1 and 2 in Figure 2) to an Open Policy Agent server for authorization. The policy agent's job is to separate policy decision-making from enforcement, so that policies for different layers of the stack can be defined and administered in a uniform way.
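
The split between deciding and enforcing can be sketched in a few lines. In Istio's external authorization flow, the mesh proxy acts as the enforcement point and the Open Policy Agent as the decision point; the Python below models only that division of responsibility with plain functions. The `Request` fields and the `enforce` function are hypothetical and do not reproduce Envoy's or the Open Policy Agent's actual request formats.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Request:
    source: str        # calling workload (hypothetical field)
    destination: str   # workload being called
    method: str
    path: str


# Policy decision point (PDP): answers allow/deny and never touches traffic.
Decision = Callable[[Request], bool]


def enforce(request: Request, decide: Decision,
            forward: Callable[[Request], tuple[int, str]]) -> tuple[int, str]:
    """Policy enforcement point (PEP): ask the PDP, then forward or reject."""
    if decide(request):
        return forward(request)     # allowed: pass the request to the workload
    return 403, "Forbidden"         # denied: reply with a forbidden-access error
```

Because `enforce` only calls `decide`, the decision logic can be swapped or updated without changing how requests are forwarded or rejected, which is the uniformity the policy agent provides.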

High-level security policies can be defined in the Open Policy Agent server, with the policies importing dynamic content from data files that collect all the updatable values the policies use, such as allowed service-to-service calls or thresholds for data-access rate limits. Because these data files are pushed into the Open Policy Agent automatically, the policies no longer need constant manual updates, and the agent can evaluate each authorization request against the cluster's current conditions (3 in Figure 2). The decision to allow or deny the request is then sent to the policy enforcement point, which either forwards the request to the workload or replies with a forbidden-access error.
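
To make the data-file idea concrete, here is a minimal sketch of pushing context data to the Open Policy Agent through its REST Data API (`PUT /v1/data/{path}`), which creates or overwrites a document that policies can then read under `data`. The document layout, the `mesh/context` path, and the `localhost:8181` address (the agent's default port) are assumptions for illustration; the request is only built here, not sent.

```python
import json
import urllib.request

# Hypothetical context data: the updatable values the policies read.
context_data = {
    "allowed_calls": [
        {"source": "frontend", "destination": "catalog"},
        {"source": "checkout", "destination": "payments"},
    ],
    "thresholds": {"payments": 120},  # illustrative max requests per hour
}


def build_data_push(base_url: str, path: str, document: dict) -> urllib.request.Request:
    """Build a PUT to the Data API; policies would see the document as data.<path>."""
    url = f"{base_url.rstrip('/')}/v1/data/{path.strip('/')}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="PUT",
        headers={"Content-Type": "application/json"},
    )


req = build_data_push("http://localhost:8181", "mesh/context", context_data)
# urllib.request.urlopen(req) would send it; the server replies 204 No Content on success.
```

Re-sending the document whenever a value changes is what lets the policy text itself stay fixed while its inputs track the cluster.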

Predicting service thresholds with AI

One of the difficulties of adopting policies is setting the right rate-limit thresholds for data access or an API. You might want to set a rate limit to improve performance, mitigate the impact of an attack, or prevent data leaks. But finding the best threshold can be challenging.

Today, the service mesh provides a wealth of detailed information that can be used to train AI models to predict these thresholds, including the size, source, destination, and time of each service call. For example, consider a security policy that limits how many computers can access a sensitive service at any given time. From the data collected by the service mesh, we can trace every request made to this sensitive service along with its timestamp. By training a long short-term memory (LSTM) model on this data, we’re able to predict hourly thresholds for the service.
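
Before an LSTM can be trained, the traced requests have to become a time series. A minimal sketch, assuming naive (timezone-free) timestamps and a hypothetical look-back `window`: bucket requests by hour, fill empty hours with zero, and slice the series into (previous hours, next hour) pairs that a sequence model would learn from. The LSTM itself is not shown.

```python
from collections import Counter
from datetime import datetime, timedelta


def hourly_counts(timestamps: list[datetime]) -> list[int]:
    """Bucket request timestamps into consecutive hourly counts (empty hours = 0)."""
    if not timestamps:
        return []
    buckets = Counter(ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps)
    start, end = min(buckets), max(buckets)
    hours = int((end - start).total_seconds() // 3600) + 1
    return [buckets.get(start + timedelta(hours=i), 0) for i in range(hours)]


def make_windows(series: list[int], window: int) -> list[tuple[list[int], int]]:
    """Turn a series into (previous `window` hours, next hour) training pairs."""
    return [(series[i:i + window], series[i + window])
            for i in range(len(series) - window)]
```

For example, requests at 09:05, 09:40, and 11:15 on the same day give hourly counts of `[2, 0, 1]`; keeping the zero for 10:00 matters, because a quiet hour is itself a signal the model should see.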

The context server computes these thresholds (Figure 2) and pushes the updated values to the Open Policy Agent server as dynamic context data. The policy definition can then simply state that any API or data access above the predicted threshold should be denied or audited. The policy author no longer has to specify the limit, and the threshold is set automatically and kept up to date with current cluster usage.
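
Once a predicted threshold has arrived as context data, the policy check itself is a simple comparison. The Python sketch below mirrors that logic; in the setup described here it would live in the policy definition instead. The `thresholds` mapping, the `mode` switch between denying and auditing, and the choice to allow services that have no prediction yet are all illustrative assumptions.

```python
from typing import Literal

RateDecision = Literal["allow", "audit", "deny"]


def decide_rate(service: str, observed_this_hour: int,
                thresholds: dict[str, int], mode: str = "deny") -> RateDecision:
    """Compare live usage with the predicted hourly threshold from context data.

    Services without a predicted threshold are allowed (an assumed fail-open
    choice); `mode` selects whether exceeding the threshold is denied or audited.
    """
    limit = thresholds.get(service)
    if limit is None or observed_this_hour <= limit:
        return "allow"
    return "audit" if mode == "audit" else "deny"
```

Allowing services with no prediction fails open; a stricter deployment might deny or audit them until the model has enough history.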

In summary, an Open Policy Agent server and AI can be used to improve a service mesh and build data-driven policies. In an increasingly cloud native world, techniques like this can make it easier for organizations to protect their data and compute resources.