Integrating Data Virtualization in Kerberos-authenticated scenarios

Kerberos has become the go-to authentication option in secure corporate environments. On the data side, it lets multiple data storage and processing systems share a consistent single sign-on infrastructure across the entire corporate network. This creates the need for data virtualization tools to integrate properly into these Kerberized environments.

The Massachusetts Institute of Technology (MIT) developed the first versions of Kerberos in the 1980s, and its successive evolutions (currently Kerberos v5) have turned it into one of the must-have tools in modern industries.

Kerberos is a ticket-based system that maintains a single point of authentication (the Authentication Server, or AS) without the need to send a password over the network. Instead, passwords (or public keys) are validated by the Kerberos client itself, which attempts to decrypt the response from the AS with a key derived from the password entered by the user. The AS encrypts that response with the password-derived key it stores in its database for that user.
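
The exchange described above can be sketched in a few lines of Python. This is only a toy model, assuming a made-up cipher built from the standard library; real Kerberos uses standardized encryption types and message formats. The names `derive_key`, `seal`, `unseal`, `as_reply`, `client_login`, `SALT` and `AS_DATABASE` are hypothetical and exist only for this illustration.

```python
import hashlib
import hmac
import os

# Toy construction for illustration only. Real Kerberos uses standardized
# encryption types, not this hand-rolled HMAC keystream.

def derive_key(password: str, salt: bytes) -> bytes:
    """Turn a password into a long-term key (the string-to-key step)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 10_000)

def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    blocks = [
        hmac.new(key, nonce + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        for counter in range((length + 31) // 32)
    ]
    return b"".join(blocks)[:length]

def seal(key: bytes, plaintext: bytes) -> bytes:
    """Encrypt and authenticate plaintext under key."""
    nonce = os.urandom(16)
    stream = _keystream(key, nonce, len(plaintext))
    ciphertext = bytes(p ^ s for p, s in zip(plaintext, stream))
    tag = hmac.new(key, nonce + ciphertext, hashlib.sha256).digest()
    return nonce + ciphertext + tag

def unseal(key: bytes, blob: bytes):
    """Return the plaintext, or None if the key is wrong or data was altered."""
    nonce, ciphertext, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(key, nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None
    stream = _keystream(key, nonce, len(ciphertext))
    return bytes(c ^ s for c, s in zip(ciphertext, stream))

# Hypothetical AS database: it stores the derived key, never the password.
SALT = b"EXAMPLE.COMalice"
AS_DATABASE = {"alice": derive_key("correct horse", SALT)}

def as_reply(user: str) -> bytes:
    """AS side: encrypt a fresh session key under the user's stored key."""
    session_key = os.urandom(32)
    return seal(AS_DATABASE[user], session_key)

def client_login(password: str, reply: bytes):
    """Client side: the typed password is valid only if decryption succeeds."""
    return unseal(derive_key(password, SALT), reply)

reply = as_reply("alice")
print(client_login("correct horse", reply) is not None)  # True
print(client_login("wrong password", reply) is None)     # True
```

Note that the password never travels to the server in this sketch: the client proves it knows the password simply by being able to read the reply.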

As a result of successful authentication, users are granted a Ticket-Granting Ticket (TGT) generated by the KDC (Key Distribution Center), which they can use from then on to request tickets for specific services from the Ticket-Granting Service (TGS). Kerberos is therefore a mechanism that centralizes authentication while also specializing it to a per-service level, so that the entire network of Kerberized corporate services is covered.
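
The two-step flow (TGT first, then per-service tickets) can be illustrated with a small simulation. This is a minimal sketch under strong assumptions: tickets here are HMAC-signed dictionaries rather than encrypted structures, and the class `ToyKDC`, its methods, and the service names are all hypothetical.

```python
import hashlib
import hmac
import json
import time

class ToyKDC:
    """Toy KDC: signs tickets instead of encrypting them (illustration only)."""

    def __init__(self, service_keys):
        self._tgs_key = b"tgs-secret"        # known only to the KDC
        self._service_keys = service_keys    # one key shared with each service

    @staticmethod
    def _sign(key, body):
        raw = json.dumps(body, sort_keys=True).encode()
        return {"body": body, "mac": hmac.new(key, raw, hashlib.sha256).hexdigest()}

    @staticmethod
    def verify(key, ticket):
        """Check that a ticket was issued under key and has not expired."""
        raw = json.dumps(ticket["body"], sort_keys=True).encode()
        mac = hmac.new(key, raw, hashlib.sha256).hexdigest()
        return hmac.compare_digest(mac, ticket["mac"]) and ticket["body"]["expires"] > time.time()

    def issue_tgt(self, user):
        """Step 1: after authentication, hand out a Ticket-Granting Ticket."""
        body = {"user": user, "service": "krbtgt", "expires": int(time.time()) + 36000}
        return self._sign(self._tgs_key, body)

    def issue_service_ticket(self, tgt, service):
        """Step 2: exchange a valid TGT for a ticket bound to one service."""
        if not self.verify(self._tgs_key, tgt):
            raise PermissionError("invalid or expired TGT")
        body = {"user": tgt["body"]["user"], "service": service,
                "expires": int(time.time()) + 300}
        return self._sign(self._service_keys[service], body)

kdc = ToyKDC({"dv/server": b"dv-key", "hive/server": b"hive-key"})
tgt = kdc.issue_tgt("alice")
dv_ticket = kdc.issue_service_ticket(tgt, "dv/server")

# Each service can validate only the tickets issued for it.
print(ToyKDC.verify(b"dv-key", dv_ticket))    # True
print(ToyKDC.verify(b"hive-key", dv_ticket))  # False
```

The point of the sketch is the per-service scoping: one TGT unlocks many service tickets, but each ticket is only meaningful to the service whose key it was issued under.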

But these are just the basics. Things become increasingly complex when tickets are proxied, so that services can act on behalf of the originally authenticated users, or forwarded, so that users authenticated on one machine can authenticate to further services from the services they first authenticated to. Protocol adapter mechanisms might also be needed in specific scenarios, such as SPNEGO for using Kerberos with HTTP-based services.

So the power and flexibility of Kerberos come at a cost. First, setup is highly complex and, as a consequence, the learning curve for IT teams is steep. Second, the participating Kerberos-authenticated services must actually implement Kerberos mechanisms deep within their internal security procedures, which again increases the complexity of the participating software itself.

Focusing on data virtualization (DV) systems, this complexity is compounded by the fact that DV software is middleware: its data integration engine sits midway between the clients using it and the data sources being accessed. A DV system adds a new entry to the collection of Kerberized services that might already be present in the corporate network. In addition, given its nature as a multi-origin data access point, it has to deal with a possible difference in cardinality between northbound (client) Kerberos-authenticated endpoints (1) and southbound (data source) Kerberos-authenticated endpoints (n).

Kerberos at the data sources

So DV systems can potentially access data from multiple data sources, and these data sources might themselves be secured with Kerberos authentication. Therefore, these DV systems will have to provide adequate credentials to the data sources for accessing them.

Note, however, that DV systems are services themselves, not users, so these credentials (or more specifically, Kerberos tickets) will first need to be obtained from the clients by means of proxying or forwarding. That way the DV systems can act on the data source on behalf of the authenticated users, restricted to those users' roles and applicable security policies. This makes southbound usage of Kerberos a fairly complicated mechanism to architect correctly.
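
One way to picture the southbound side is a per-user, per-source ticket cache inside the DV system. The sketch below is purely illustrative: `DelegatedTicketCache` and the `request_ticket` callable are hypothetical, the callable stands in for whatever delegation mechanism (proxying or forwarding) is actually used, and ticket expiry is deliberately ignored.

```python
from typing import Callable, Dict, Tuple

class DelegatedTicketCache:
    """Keeps one southbound ticket per (user, data source) pair.

    request_ticket is a hypothetical callable that obtains a ticket for a
    data source on behalf of a user; expiry handling is omitted here.
    """

    def __init__(self, request_ticket: Callable[[str, str], str]):
        self._request_ticket = request_ticket
        self._tickets: Dict[Tuple[str, str], str] = {}

    def ticket_for(self, user: str, source: str) -> str:
        key = (user, source)
        if key not in self._tickets:
            # Act on behalf of the user, never with the DV system's own identity.
            self._tickets[key] = self._request_ticket(user, source)
        return self._tickets[key]

# Stand-in for a real delegation call; it records what was requested.
requests_made = []

def fake_request_ticket(user: str, source: str) -> str:
    requests_made.append((user, source))
    return f"ticket:{user}@{source}"

cache = DelegatedTicketCache(fake_request_ticket)
for source in ["hive/server", "impala/server", "hive/server"]:
    cache.ticket_for("alice", source)

print(requests_made)
# [('alice', 'hive/server'), ('alice', 'impala/server')]
```

A single northbound login by `alice` fans out into one ticket per data source, which is the 1-to-n relationship between northbound and southbound endpoints.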

Kerberos for authenticating at the DV system

On the other hand, enabling Kerberos for authentication on the DV service itself can be a (relatively) simpler task, but it will still involve a variety of technologies, given the diversity of options DV systems offer for northbound (client) connectivity. For instance, interfaces such as JDBC or ODBC might need a platform-native approach to the handling of Kerberos tickets using specific Kerberos libraries, whereas REST APIs exposed through HTTP interfaces would need to make use of mechanisms such as SPNEGO.
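
For the HTTP case, SPNEGO carries the Kerberos token in standard HTTP headers: the server challenges with `WWW-Authenticate: Negotiate`, and the client answers with `Authorization: Negotiate <base64 token>`. The following sketch only shows that header handling; the function names are hypothetical, and validating the extracted token against Kerberos (normally done through a GSS-API library) is out of scope here.

```python
import base64
import binascii

def challenge_response():
    """Status and headers a server sends to ask the client for a SPNEGO token."""
    return 401, {"WWW-Authenticate": "Negotiate"}

def extract_spnego_token(authorization_header):
    """Return the raw token bytes from an Authorization header, or None.

    Only parses the header. The returned bytes would still have to be
    validated by a Kerberos/GSS-API library, which is not shown here.
    """
    if not authorization_header:
        return None
    scheme, _, encoded = authorization_header.strip().partition(" ")
    encoded = encoded.strip()
    if scheme.lower() != "negotiate" or not encoded:
        return None
    try:
        return base64.b64decode(encoded, validate=True)
    except (binascii.Error, ValueError):
        return None

# Usage with a placeholder token (not a real Kerberos token).
header = "Negotiate " + base64.b64encode(b"placeholder-token").decode()
print(extract_spnego_token(header))          # b'placeholder-token'
print(extract_spnego_token("Basic abc123"))  # None
```

A request without a usable `Negotiate` header would typically receive the challenge from `challenge_response()` and be retried by the client with a token attached.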

Also, as explained above, the presence of data sources that are themselves authenticated via Kerberos might require the DV system to connect its northbound and southbound Kerberos support processes correctly, so that once northbound authentication is achieved, the system can perform the ticket proxying/delegation operations that allow it to act on behalf of the user at the data sources.

So, in summary, Kerberos brings a lot of authentication flexibility to corporate networks and a fair amount of single sign-on comfort for end users, but its inherent complexity sometimes poses a technical challenge both for IT teams and, even more so, for the implementation of the software itself.

Denodo Labs