Beyond Chatbots: LLMs for Data Processing in Snowflake

Preston Blackburn

Benefits of Integrating LLMs into Data Processing Pipelines

Large Language Models (LLMs) have been getting a lot of attention lately (for good reason!), but few practical implementations have gone beyond chatbots. A key reason for this trend is the lack of data availability for the LLMs. Since chatbots only require user input (and, potentially, a vector database), minimal data engineering is required. This means chatbots can be quick to implement, but they may ignore a company’s most valuable resource – its data.

7Rivers believes combining Snowflake container services with LLMs will accelerate the integration of LLMs in data processing pipelines and make data more readily accessible for the LLMs. By integrating data processing pipelines with LLMs we can:

  • Simplify data architectures
  • Enable data-first event driven architectures
  • Eliminate data egress
  • Improve collaboration between data engineers and ML engineers through a shared platform (Snowflake)
  • Centralize MLOps and data sources

A simple architecture can get you started on integrating LLMs and your data processing pipelines. This architecture takes advantage of Snowflake container services to run everything natively on Snowflake. This eliminates any data egress and helps centralize workflows on a single platform. In the architecture diagram below, additional Snowflake native features are being used as well.

These Snowflake native features include container services, an image registry, UDFs, streams, and tasks. The LLM development process is abstracted away, and the focus is on implementing the LLM natively in a Snowflake pipeline. Using streams and tasks for incremental data processing is a common Snowflake design pattern, and calling the LLM interface through a UDF fits naturally into this pattern. Keeping the LLM interface as a separate service from the LLM hosting service allows for greater flexibility, including implementing additional functionality without wasting GPU resources. Furthermore, a vector database service could be hosted alongside the LLM hosting service, which the interface service could query to perform RAG if required.
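As a rough sketch, the stream-and-task pattern described above might look like the following Snowflake SQL. All object names here (the tables, stream, task, warehouse, and the `CLASSIFY_REVIEW` function wrapping the containerized LLM service) are hypothetical placeholders, not the actual objects from this deployment:

```sql
-- Capture new rows as they land in the raw reviews table.
CREATE OR REPLACE STREAM reviews_stream ON TABLE raw_reviews;

-- Periodically classify any new reviews via the LLM interface service,
-- exposed here as the hypothetical function CLASSIFY_REVIEW.
CREATE OR REPLACE TASK classify_reviews_task
  WAREHOUSE = my_wh
  SCHEDULE = '1 minute'
  WHEN SYSTEM$STREAM_HAS_DATA('reviews_stream')
AS
  INSERT INTO classified_reviews (review_id, review_text, classification)
  SELECT review_id,
         review_text,
         CLASSIFY_REVIEW(review_text, 'product quality')
  FROM reviews_stream;

-- Tasks are created suspended; resume to start processing.
ALTER TASK classify_reviews_task RESUME;
```

Because the task only fires when the stream has data, the pipeline stays event-driven: reviews are processed incrementally as they arrive, with no external orchestrator.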

Example

In this example, product reviews are examined as they stream into Snowflake, loaded continuously using Snowpipe. Each review needs to be classified with a positive or negative sentiment regarding a specific metric, such as product quality, along with the reason behind the classification. The classified product reviews are then saved to a table where they can be reviewed later.
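The interface service's core logic for this kind of classification can be sketched in a few lines of Python. This is a simplified illustration, not the actual service: the prompt wording, the JSON response contract, and the `fake_llm` stub (standing in for the hosted model so the flow can be exercised locally) are all assumptions:

```python
import json

# Hypothetical prompt template asking for sentiment plus a reason,
# scoped to a specific metric such as "product quality".
PROMPT = (
    "Classify the sentiment of this product review as POSITIVE or NEGATIVE "
    "with respect to {metric}, and give a one-sentence reason. "
    'Respond only with JSON like {{"sentiment": "...", "reason": "..."}}.\n\n'
    "Review: {review}"
)

def classify_review(review: str, metric: str, llm_complete) -> dict:
    """Format the prompt, call the LLM interface (injected as a callable),
    and parse/validate the JSON answer."""
    raw = llm_complete(PROMPT.format(metric=metric, review=review))
    result = json.loads(raw)
    if result.get("sentiment") not in {"POSITIVE", "NEGATIVE"}:
        raise ValueError(f"unexpected sentiment: {result!r}")
    return result

# Stub standing in for the hosted LLM service during local testing.
def fake_llm(prompt: str) -> str:
    sentiment = "NEGATIVE" if "broke" in prompt else "POSITIVE"
    return json.dumps({"sentiment": sentiment, "reason": "stubbed reason"})

print(classify_review("The handle broke after two days.", "product quality", fake_llm))
```

Injecting the LLM call as a plain callable keeps the classification logic testable without GPU resources, and mirrors the separation between the interface service and the hosting service described above.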

A benefit of this pipeline is that there is no overhead pulling the data out of Snowflake into another service to do the inference. The pipeline is contained in Snowflake and executed in an event-driven way. This enables Data Scientists and ML Engineers to more easily deploy their pipelines in production because they don’t need to worry about setting up any infrastructure outside of Snowflake. Spinning up infrastructure outside of Snowflake would typically require the aid of a DevOps team member and slow delivery velocity.

Furthermore, since the endpoint is exposed as a SQL UDF, data engineers will have no trouble using the UDFs the ML engineers expose in their own pipelines. It is common for machine learning teams to pull data out of Snowflake into a separate pipeline running on a third-party service, which can leave that pipeline out of sync with the Snowflake source of truth. Keeping everything in Snowflake not only creates more opportunities to collaborate with the data engineering team, but also ensures the data the machine learning teams reference is the source of truth.
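Because the model is reachable as an ordinary SQL function, a data engineer can fold it into any existing query without touching the ML infrastructure. As a hypothetical usage sketch (the function and table names are placeholders):

```sql
-- Call the ML team's UDF like any built-in SQL function.
SELECT review_id,
       review_text,
       CLASSIFY_REVIEW(review_text, 'product quality') AS classification
FROM raw_reviews
WHERE load_date = CURRENT_DATE();
```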

Conclusion

Snowflake container services make it easy to use LLMs as part of your data pipelines, allowing you to unlock new insights from your data without worrying about complex architectures. If you’d like to learn more about implementing this architecture or how we deployed this pipeline, reach out to us at https://7riversinc.com/contact-us/.
