Beyond Chatbots: LLMs for Data Processing in Snowflake

Preston Blackburn

Benefits of Integrating LLMs into Data Processing Pipelines

Large Language Models (LLMs) have been getting a lot of attention lately (for good reason!), but few practical implementations go beyond chatbots. A key reason for this trend is the lack of data availability for the LLMs. Since chatbots only require user input (and, potentially, a vector database), they need minimal data engineering. This makes chatbots quick to implement, but it also means they may ignore a company's most valuable resource: its data.

7Rivers believes combining Snowflake container services with LLMs will accelerate the integration of LLMs in data processing pipelines and make data more readily accessible for the LLMs. By integrating data processing pipelines with LLMs we can:

  • Simplify data architectures
  • Enable data-first event driven architectures
  • Eliminate data egress
  • Improve collaboration between data engineers and ML engineers through a shared platform (Snowflake)
  • Centralize MLOps and data sources

A simple architecture can get you started on integrating LLMs into your data processing pipelines. This architecture takes advantage of Snowflake container services to run everything natively on Snowflake, which eliminates data egress and helps centralize workflows on a single platform. The architecture diagram below uses additional Snowflake native features as well.

These Snowflake native features include container services, an image registry, UDFs, streams, and tasks. The LLM development process is abstracted away; the focus is on implementing the LLM natively in a Snowflake pipeline. Using streams and tasks for incremental data processing is a common Snowflake design pattern, and calling the LLM interface through a UDF fits naturally into it. The LLM interface runs as a separate service from the LLM hosting service to allow for greater flexibility, such as adding functionality without wasting GPU resources. Furthermore, a vector database service could be hosted alongside the LLM hosting service, which the interface service could query to perform retrieval-augmented generation (RAG) if required.
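As a concrete sketch of the stream-and-task pattern, the DDL below (assembled as Python strings so the pieces can be checked together) creates a stream on a source table and a task that calls a UDF backed by the LLM interface on new rows. All object names (`raw_reviews`, `review_stream`, `classified_reviews`, `classify_review`, `my_wh`) are illustrative assumptions, not names from the original post.

```python
# Hedged sketch of the stream + task + UDF pattern described above.
# Every object name here is an illustrative placeholder.

CREATE_STREAM = """
CREATE OR REPLACE STREAM review_stream
  ON TABLE raw_reviews;
"""

CREATE_TASK = """
CREATE OR REPLACE TASK classify_reviews_task
  WAREHOUSE = my_wh
  SCHEDULE = '1 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('REVIEW_STREAM')
AS
  INSERT INTO classified_reviews (review_id, review_text, classification)
  SELECT review_id,
         review_text,
         classify_review(review_text)  -- UDF backed by the LLM interface service
  FROM review_stream;
"""

def pipeline_statements() -> list[str]:
    """Return the DDL statements in the order they would be executed."""
    return [CREATE_STREAM.strip(), CREATE_TASK.strip()]
```

Because the task's `WHEN` clause checks `SYSTEM$STREAM_HAS_DATA`, it only runs (and consumes warehouse credits) when new rows have actually landed, which is what makes the pipeline event-driven rather than purely scheduled.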

Example

In this case, product reviews are streamed into Snowflake and loaded continuously using Snowpipe. Each review needs to be classified as positive or negative with respect to a specific metric, such as product quality, along with the reason behind the classification. The classified reviews are then saved to a table where they can later be reviewed.
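A minimal sketch of what the classification logic behind such a UDF might look like. The prompt wording, the JSON response shape (`"sentiment"` and `"reason"` keys), and the injected `call_llm` callable are all assumptions made for illustration; the interface-service design above means the actual LLM backend can be swapped without changing this code.

```python
import json

METRIC = "product quality"  # the metric each review is judged against

def build_prompt(review: str, metric: str = METRIC) -> str:
    """Build the classification prompt sent to the LLM interface service."""
    return (
        f"Classify the following product review as POSITIVE or NEGATIVE "
        f"with respect to {metric}, and give a one-sentence reason. "
        f'Respond as JSON with keys "sentiment" and "reason".\n\n'
        f"Review: {review}"
    )

def classify_review(review: str, call_llm) -> dict:
    """Classify a review; `call_llm` is injected so the backend can vary."""
    raw = call_llm(build_prompt(review))
    result = json.loads(raw)
    return {"sentiment": result["sentiment"], "reason": result["reason"]}

# Example with a stubbed LLM backend standing in for the hosting service:
def fake_llm(prompt: str) -> str:
    return '{"sentiment": "NEGATIVE", "reason": "The product broke after one use."}'
```

Injecting the LLM call as a parameter also makes the classification logic testable without a running model, which fits the MLOps-centralization goal described earlier.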

A benefit of this pipeline is that there is no overhead from pulling data out of Snowflake into another service to run inference. The pipeline is contained in Snowflake and executed in an event-driven way. This lets data scientists and ML engineers deploy their pipelines to production more easily, because they don't need to set up any infrastructure outside of Snowflake, which would typically require help from a DevOps team member and slow delivery velocity.

Furthermore, since the endpoint is exposed as a SQL UDF, data engineers will have no trouble using the UDFs the ML engineers expose in their own pipelines. It is common for machine learning teams to pull data out of Snowflake into a separate pipeline running on a third-party service, which can leave that pipeline's data out of sync with the Snowflake source of truth. Keeping everything in Snowflake not only creates more opportunities to collaborate with the data engineering team, but also ensures the data the machine learning team references is the source of truth.

Conclusion

Snowflake container services make it easy to use LLMs as part of your data pipelines, allowing you to unlock new insights from your data without worrying about complex architectures. If you'd like to learn more about implementing this architecture or how we deployed this pipeline, reach us at https://7riversinc.com/contact-us/
