A Breakdown of Snowflake Container Services and Kubernetes

Overview

With the public preview release of container services, companies can now deploy containers directly on Snowflake. If you are using Snowflake as your data warehouse and developing green-field data-first applications, Snowflake’s container services can benefit your business and data infrastructure.

Snowflake’s container services allow us to bring applications closer to the data, so that no data has to leave the platform. The service is built on top of Kubernetes, an open-source container orchestration system for automating software deployment, scaling, and management.

Snowflake’s container services also hide some of the more complex parts of running software on Kubernetes, such as deploying and operating the cluster itself. In exchange for less fine-grained control, you get a much simpler setup, which makes container services a great choice for small-to-medium apps. Because the service builds on the same container images and concepts, migrating to self-hosted Kubernetes later is also straightforward if you come to require more control over your infrastructure.

Furthermore, container services integrate with Snowflake UDFs, so you can call functions that run in your container directly from SQL. This is key to enabling event-driven workflows that use Snowflake-native features, such as streams and tasks, to automatically kick off processes in your container.
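To make the SQL-to-container call more concrete, here is a minimal sketch of what the container side of such a function might look like. It assumes Snowflake posts rows to the service as a JSON batch of the form {"data": [[row_index, arg, ...], ...]} and expects one result per row index in return. The name handle_batch and the upper-casing logic are purely illustrative and not part of any Snowflake API.

```python
import json


def handle_batch(request_body: str) -> str:
    """Process one batch of rows sent by a SQL function call.

    Assumes the batch format {"data": [[row_index, arg1, ...], ...]} and
    returns {"data": [[row_index, result], ...]} in the same row order.
    """
    rows = json.loads(request_body)["data"]
    results = []
    for row in rows:
        row_index, text = row[0], row[1]
        # Illustrative "business logic": upper-case the single argument.
        results.append([row_index, str(text).upper()])
    return json.dumps({"data": results})


# Simulate the request body for a two-row call.
sample_request = json.dumps({"data": [[0, "hello"], [1, "snowflake"]]})
sample_response = json.loads(handle_batch(sample_request))
print(sample_response)
```

In a real service, this function would sit behind an HTTP POST route in the container, and a SQL function pointing at the service endpoint would invoke it once per batch of rows.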

Kubernetes’ influence on Snowflake container services is clear once you know where to look, but neither the similarities nor the differences between the two may be obvious to the uninitiated.

Below, 7Rivers takes a dive into these aspects of Snowflake’s container services and compares them to Kubernetes.

Understanding high-level container services architecture

Differences Between Kubernetes and Snowflake Container Services Config Files

Kubernetes object configuration files are used to manage resources running on a Kubernetes cluster, such as a pod that runs an application or an ingress that handles traffic routing. Snowflake container services specification files serve a similar function and closely resemble them.

Since we know Snowflake container services are running on Kubernetes, we can assume there is some sort of mapping or passthrough from Snowflake’s container services interface to Kubernetes. This mapping/passthrough mainly takes the form of the Snowflake service specification file. Below is a comparison of a Snowflake service spec YAML file and a Kubernetes pod definition YAML file.

If we compare the service specification file to a Kubernetes (K8s) pod definition file, the containers section of the Snowflake service spec matches up with the containers section of the K8s pod definition. This section is most likely a limited passthrough to the Kubernetes definition.
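The correspondence can be sketched in a few lines of Python. The spec below is a hypothetical, heavily trimmed Snowflake service spec written as a dict rather than YAML, and to_pod_manifest is our own illustrative helper, not anything Snowflake or Kubernetes provides. It only shows how the containers section could line up with a pod definition. How Snowflake actually performs the translation internally is not something we can see.

```python
import json

# Hypothetical Snowflake service spec, as a Python dict instead of YAML.
snowflake_spec = {
    "spec": {
        "containers": [
            {
                "name": "app",
                "image": "/my_db/my_schema/my_repo/app:latest",  # made-up image path
                "env": {"LOG_LEVEL": "info"},
            }
        ]
    }
}


def to_pod_manifest(service_name: str, spec: dict) -> dict:
    """Translate the containers section into a Kubernetes Pod manifest."""
    containers = []
    for c in spec["spec"]["containers"]:
        containers.append(
            {
                "name": c["name"],
                "image": c["image"],
                # Kubernetes expects env as a list of name/value pairs,
                # whereas the Snowflake spec uses a plain mapping.
                "env": [
                    {"name": k, "value": str(v)}
                    for k, v in c.get("env", {}).items()
                ],
            }
        )
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": service_name, "labels": {"app": service_name}},
        "spec": {"containers": containers},
    }


pod = to_pod_manifest("my-service", snowflake_spec)
print(json.dumps(pod, indent=2))
```

Note that on a self-hosted cluster the image would have to be pushed to a registry the cluster can pull from; the Snowflake-style image path is kept here only to show where the field lands.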

In Kubernetes, we typically do not deploy single pods, but rather a Deployment resource, which would look similar to the example below. The Snowflake service spec is certainly simpler than setting up a Kubernetes Deployment, while still letting the service run as multiple instances for high availability.
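For a sense of the extra boilerplate a Deployment involves, here is a minimal sketch that builds one as a Python dict. The helper make_deployment, the service name, and the registry URL are all hypothetical; the field names follow the standard apps/v1 Deployment layout.

```python
import json


def make_deployment(name: str, image: str, replicas: int = 2) -> dict:
    """Wrap a single-container pod template in a Deployment manifest."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            # The selector must match the labels on the pod template,
            # otherwise the Deployment cannot find its own pods.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


deployment = make_deployment(
    "my-service", "registry.example.com/app:latest", replicas=3
)
print(json.dumps(deployment, indent=2))
```

On the Snowflake side, the rough counterpart of replicas is the instance count chosen when the service is created, so the selector and template wiring above never has to be written by hand.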

If we continue to look through the spec file below, we can see an “endpoints” block that specifies an endpoint name, its port, and whether the endpoint is public. This makes the app available at a Snowflake-provided address. One of the most significant benefits of Snowflake’s container services is that security for that address is handled by Snowflake.

With Kubernetes, a Service of type LoadBalancer exposes an external IP by default, and anyone who can reach that IP can connect. To bring Kubernetes to a level of security similar to Snowflake’s, you must set up an additional security layer, such as an Identity-Aware Proxy. An example of the Snowflake endpoint (with security by default) compared to the Kubernetes LoadBalancer (no security by default) is shown below.
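As a rough illustration of that comparison, the sketch below turns a hypothetical Snowflake endpoint entry into the Kubernetes Service one might write by hand. The helper to_loadbalancer_service and the names used are assumptions for the example. Notice that nothing in the resulting manifest says who is allowed to connect.

```python
import json

# Hypothetical entry from the "endpoints" block of a Snowflake service spec.
snowflake_endpoint = {"name": "ui", "port": 8080, "public": True}


def to_loadbalancer_service(app_label: str, endpoint: dict) -> dict:
    """Build a Kubernetes Service manifest for one Snowflake-style endpoint."""
    # A public endpoint maps most closely to an externally reachable
    # LoadBalancer; a non-public one to an in-cluster ClusterIP service.
    service_type = "LoadBalancer" if endpoint.get("public") else "ClusterIP"
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": f"{app_label}-{endpoint['name']}"},
        "spec": {
            "type": service_type,
            "selector": {"app": app_label},
            "ports": [
                {"port": endpoint["port"], "targetPort": endpoint["port"]}
            ],
        },
    }


service = to_loadbalancer_service("my-service", snowflake_endpoint)
print(json.dumps(service, indent=2))
```

Authentication and authorization would have to be layered on separately in Kubernetes, which is exactly the part Snowflake takes care of for its endpoints.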

In this case, container services allow us to use the built-in Snowflake RBAC security, eliminating the need to set up a separate, redundant set of privileges for our application. This is one of the many benefits of the Snowflake platform: no additional security setup is required, which simplifies your implementation and leaves one less layer to manage. This is a game changer for any organization, but particularly for smaller teams who have the same security requirements as large organizations but a fraction of their budget.

Finally, if we want to persist data from our application in a volume, Snowflake container services offer several options. In the volumes block of the Snowflake spec, we can define a local, stage, or in-memory volume. This contrasts with Kubernetes, where the standard approach is to create a persistent volume and a persistent volume claim. While it is easier to stand up a simple volume in Snowflake container services, simple volumes such as local and in-memory ones do not share data between instances.
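One way to picture the difference is to map each Snowflake volume source to its nearest Kubernetes counterpart. The mapping below is our own approximation, not an official equivalence: local and in-memory volumes are treated as emptyDir volumes, and a stage volume is treated as a persistent volume claim whose name (here derived from the volume name) is hypothetical and would have to be created separately.

```python
def to_k8s_volume(volume: dict) -> dict:
    """Map a Snowflake-style volume entry to an approximate K8s pod volume."""
    name, source = volume["name"], volume["source"]
    if source == "local":
        # Scratch space that lives and dies with the pod.
        return {"name": name, "emptyDir": {}}
    if source == "memory":
        # Same idea, but backed by RAM instead of disk.
        return {"name": name, "emptyDir": {"medium": "Memory"}}
    if source.startswith("@"):
        # Stage volumes outlive the service instance; the closest K8s
        # analogue is a claim on separately provisioned storage.
        return {
            "name": name,
            "persistentVolumeClaim": {"claimName": f"{name}-claim"},
        }
    raise ValueError(f"unsupported volume source: {source!r}")


# Hypothetical volumes block from a Snowflake service spec.
snowflake_volumes = [
    {"name": "scratch", "source": "local"},
    {"name": "cache", "source": "memory"},
    {"name": "models", "source": "@my_stage"},
]

k8s_volumes = [to_k8s_volume(v) for v in snowflake_volumes]
for v in k8s_volumes:
    print(v)
```

The stage case is where most of the extra Kubernetes work hides: the claim referenced here still needs a PersistentVolumeClaim resource and backing storage defined elsewhere.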

Based on the spec and definition files we have examined so far, the Snowflake-to-Kubernetes resource mappings are roughly as follows:

Containers → K8s pods (simplified options)
Endpoints → Similar to simplified network K8s services like LoadBalancer
Volumes → Simple volume (on host)
Log Exporters → Automatically controlled by Kubernetes or user controlled with something like a logging-agent sidecar

To further break down the containers specification sections, we can look at the Snowflake spec reference and use kubectl (for example, kubectl explain pod.spec.containers) to examine the Kubernetes container specs.

Architecture Comparison Between Snowflake Services and Kubernetes Nodes

Snowflake container services are an excellent choice for new application development, especially when you’re not sure whether your application will reach the scale at which you require greater control over your Kubernetes cluster. From a high level, we can also compare the Snowflake service architecture to a generic Kubernetes architecture on Azure. At the root of both architectures are Kubernetes nodes that run the pods/services, along with the Kubernetes cluster itself, which is not shown in the Snowflake architecture.

Both architectures rely on the same base building blocks, including container images, to deploy new services to a node on the cluster. At any point, if we want to migrate a Snowflake service to a self-hosted Kubernetes cluster, we could create the additional resources it needs, such as an Azure load balancer or persistent volumes, and deploy to Azure or another cloud provider.

Snowflake Service Architecture

Conclusion

If you are using Snowflake as your data warehouse and developing green-field data-first applications, then Snowflake container services are a great choice for your infrastructure. You can be sure that your data stays secure within your Snowflake account, and if your needs outgrow Snowflake, there is minimal cost to moving to self-managed Kubernetes.

Reach out to us at 7Rivers to learn more about how we have deployed applications on Snowflake container services and how we can help you get started. You can contact us at contact@7riversinc.com or https://7riversinc.com/contact-us/.