Snowflake Cortex AI: Running LLMs Directly on Your Data Warehouse
Genufy Team
Jun 3, 2025 · 8 min read
For years, running ML inference on your data warehouse meant one thing: ETL it out, run a model elsewhere, and ETL the results back. Snowflake Cortex AI eliminates that pipeline entirely.
What Cortex AI Actually Is
Cortex AI is a suite of LLM-powered functions — COMPLETE, SUMMARIZE, SENTIMENT, TRANSLATE, EXTRACT_ANSWER — that run natively inside Snowflake using SQL. There is no model to deploy, no Python environment to manage, no data to move.
"SELECT SNOWFLAKE.CORTEX.SENTIMENT(review_text) FROM customer_reviews" is the entire inference pipeline.
Real-World Use Cases We've Deployed
We've used Cortex AI for three production use cases: real-time sentiment scoring on support ticket streams, automatic summarisation of long-form sales call transcripts stored in Snowflake, and structured data extraction from unstructured contract text loaded via Snowpipe.
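To make those three use cases concrete, here is a minimal sketch of the SQL each one might boil down to. The mapping to SENTIMENT, SUMMARIZE, and EXTRACT_ANSWER is one plausible reading; the source does not say which functions were used in production. The table names, column names, and the example question are hypothetical. The helper only renders query text, which you would then submit through whatever Snowflake client you already use.

```python
import re

# Plain unquoted identifier, optionally qualified (db.schema.table).
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*(\.[A-Za-z_][A-Za-z0-9_$]*)*")


def ident(name: str) -> str:
    """Reject anything that is not a plain, unquoted SQL identifier."""
    if not _IDENT.fullmatch(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def literal(text: str) -> str:
    """Render a SQL string literal, escaping backslashes and single quotes."""
    escaped = text.replace("\\", "\\\\").replace("'", "''")
    return f"'{escaped}'"


def sentiment_sql(table: str, column: str) -> str:
    """Sentiment scoring, e.g. for support tickets."""
    col = ident(column)
    return (
        f"SELECT {col}, SNOWFLAKE.CORTEX.SENTIMENT({col}) AS sentiment "
        f"FROM {ident(table)}"
    )


def summarize_sql(table: str, column: str) -> str:
    """Summarisation of long text, e.g. call transcripts."""
    col = ident(column)
    return (
        f"SELECT {col}, SNOWFLAKE.CORTEX.SUMMARIZE({col}) AS summary "
        f"FROM {ident(table)}"
    )


def extract_answer_sql(table: str, column: str, question: str) -> str:
    """Question-driven extraction from unstructured text, e.g. contracts."""
    col = ident(column)
    return (
        f"SELECT {col}, "
        f"SNOWFLAKE.CORTEX.EXTRACT_ANSWER({col}, {literal(question)}) AS answer "
        f"FROM {ident(table)}"
    )


# Hypothetical tables and columns, for illustration only.
print(sentiment_sql("support_tickets", "body"))
print(summarize_sql("sales_call_transcripts", "transcript"))
print(extract_answer_sql("contracts", "contract_text", "What is the notice period?"))
```

Validating identifiers and escaping the question keeps the rendered SQL safe when table names or questions come from configuration rather than being hard-coded.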
Cost and Latency Profile
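Cortex functions are billed per token processed, not per compute hour, although the warehouse that issues the query still accrues its usual compute charges while it runs. For batch workloads this is efficient, because the Cortex charge tracks the volume of text processed rather than how long the warehouse stays up. For streaming or near-real-time use cases, model the token cost carefully: a table with many rows and verbose text fields can burn credits quickly. We recommend running Cortex functions in a separate, dedicated warehouse with auto-suspend set to 1 minute (AUTO_SUSPEND = 60).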
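A back-of-the-envelope estimate is usually enough to tell whether a workload is safe to run. The sketch below multiplies row count by average tokens per row and applies a per-million-token credit rate. Every number in the usage example is a placeholder, not a real Snowflake price: take the actual rate for your function and model from Snowflake's published consumption table, and the credit price from your own contract. The function and class names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CortexCostEstimate:
    total_tokens: int
    credits: float
    dollars: float


def estimate_cost(
    rows: int,
    avg_tokens_per_row: int,
    credits_per_million_tokens: float,
    dollars_per_credit: float,
) -> CortexCostEstimate:
    """Rough token-based cost of one pass over a table.

    Covers only the per-token Cortex charge; warehouse compute and any
    billed output tokens are not included.
    """
    if min(rows, avg_tokens_per_row) < 0:
        raise ValueError("rows and avg_tokens_per_row must be non-negative")
    if credits_per_million_tokens < 0 or dollars_per_credit < 0:
        raise ValueError("rates must be non-negative")
    total_tokens = rows * avg_tokens_per_row
    credits = total_tokens / 1_000_000 * credits_per_million_tokens
    return CortexCostEstimate(total_tokens, credits, credits * dollars_per_credit)


# Placeholder inputs: 2M rows, 500 tokens each, 0.5 credits per 1M tokens,
# 3.00 dollars per credit. None of these are real Snowflake rates.
estimate = estimate_cost(2_000_000, 500, 0.5, 3.0)
print(estimate)
```

Running the same calculation with the expected daily row volume of a streaming source gives a quick read on whether the per-token charge stays within budget before anything is deployed.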