Databricks SQL Translator

AI-powered T-SQL to Databricks migration

RAG + GPT-4o pipeline

T-SQL to Databricks SQL Migration

This tool combines GPT-4o with a custom RAG pipeline, grounded in Databricks documentation, to translate SQL Server views and stored procedures into production-ready Databricks SQL. Below are real migration results from the AdventureWorks2022 database.

Objects migrated: 6
Views: 3
Procedures: 3
Success rate: 90%

RAG Pipeline Architecture

The translation engine uses a multi-stage Retrieval-Augmented Generation (RAG) pipeline grounded in official Databricks documentation. Each SQL object passes through six stages, with deterministic validation between the two LLM passes.

Step 1: Split. Parse & classify SQL objects
Step 2: Detect. Identify object type & name
Step 3: Retrieve. RAG: embed + cosine search
Step 4: Translate. GPT-4o pass 1 (draft)
Step 5: Validate. Regex checks & signals
Step 6: Repair. GPT-4o pass 2 (fix)

Corpus

235 documentation chunks from the official Databricks SQL reference, pre-embedded with text-embedding-3-large (3,072 dimensions).
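As a rough illustration of how a pre-embedded corpus can be searched, here is a minimal sketch of cosine-similarity retrieval. The function names (cosine_similarity, top_k) and the toy 3-dimensional vectors are assumptions for the example; the real corpus would hold 3,072-dimensional vectors, and the tool's actual retrieval code may differ.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float],
    corpus: List[Tuple[str, Sequence[float]]],
    k: int = 3,
) -> List[Tuple[float, str]]:
    """Return the k (score, chunk_text) pairs most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy corpus: hypothetical chunk labels with 3-dim vectors standing in
# for real 3,072-dim embeddings.
corpus = [
    ("CREATE VIEW syntax", [1.0, 0.0, 0.0]),
    ("Date functions", [0.0, 1.0, 0.0]),
    ("CREATE PROCEDURE syntax", [1.0, 1.0, 0.0]),
]
results = top_k([1.0, 0.0, 0.0], corpus, k=2)
```

Because the chunk embeddings are computed ahead of time, only the query needs to be embedded at translation time; a linear scan is enough at this corpus size.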

Retrieval

Hybrid scoring weights cosine similarity at 70% and BM25 keyword score at 30%. Type-biased ranking then prioritizes documentation sections relevant to the object type being translated.
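The 70/30 blend can be sketched as follows. Only the weights and the idea of type bias come from the description above. The rest is assumed for illustration: the Candidate fields, dividing BM25 by the largest score in the candidate set (raw BM25 is unbounded, so some normalization is needed before blending), and the small additive type_bonus.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    chunk_id: str
    cosine: float   # cosine similarity to the query embedding
    bm25: float     # raw BM25 keyword score (non-negative, unbounded)
    section: str    # hypothetical doc-section label, e.g. "view" or "procedure"


def hybrid_rank(
    candidates: List[Candidate],
    object_type: str,
    type_bonus: float = 0.05,  # assumed value, not taken from the tool
) -> List[Candidate]:
    """Rank candidates by 0.7 * cosine + 0.3 * normalized BM25, plus a type bonus."""
    max_bm25 = max((c.bm25 for c in candidates), default=0.0)

    def score(c: Candidate) -> float:
        # Scale BM25 into [0, 1] so it is comparable with cosine similarity.
        keyword = c.bm25 / max_bm25 if max_bm25 > 0 else 0.0
        base = 0.7 * c.cosine + 0.3 * keyword
        # Nudge chunks whose section matches the object being translated.
        return base + (type_bonus if c.section == object_type else 0.0)

    return sorted(candidates, key=score, reverse=True)


candidates = [
    Candidate("a", cosine=0.9, bm25=1.0, section="procedure"),
    Candidate("b", cosine=0.8, bm25=4.0, section="view"),
    Candidate("c", cosine=0.5, bm25=2.0, section="view"),
]
ranked_ids = [c.chunk_id for c in hybrid_rank(candidates, object_type="view")]
```

In this toy data, chunk "b" overtakes "a" despite a lower cosine score because it matches strongly on keywords, which is the behavior a hybrid scheme is meant to provide for exact function and keyword names.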

Two-pass LLM

Pass 1 generates a draft using the retrieved RAG context. A deterministic validator then checks the draft for errors. If it finds any, pass 2 repairs them with targeted guidance.
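A minimal sketch of the draft, validate, repair loop might look like the following. The regex rules are illustrative examples of T-SQL constructs that should not survive translation, not the tool's actual rule set. The llm parameter is a hypothetical callable (prompt in, text out) standing in for the GPT-4o call, and the prompt wording is invented for the example.

```python
import re
from typing import Callable, List

# Illustrative rules only: each pattern flags a leftover T-SQL construct.
RULES = [
    (re.compile(r"\bSELECT\s+TOP\s*\(?\s*\d+", re.IGNORECASE),
     "TOP n found; use LIMIT n instead"),
    (re.compile(r"\bWITH\s*\(\s*NOLOCK\s*\)", re.IGNORECASE),
     "NOLOCK table hint found; remove it"),
    (re.compile(r"@@\w+"),
     "T-SQL system variable (@@...) found"),
    (re.compile(r"\bSET\s+NOCOUNT\s+(ON|OFF)\b", re.IGNORECASE),
     "SET NOCOUNT found; remove it"),
]


def validate(sql: str) -> List[str]:
    """Return one message per rule that matches; an empty list means clean."""
    return [message for pattern, message in RULES if pattern.search(sql)]


def translate_two_pass(tsql: str, context: str, llm: Callable[[str], str]) -> str:
    """Pass 1 drafts with RAG context; pass 2 runs only if validation fails."""
    draft = llm(
        "Translate this T-SQL to Databricks SQL.\n\n"
        f"Docs:\n{context}\n\nT-SQL:\n{tsql}"
    )
    issues = validate(draft)
    if not issues:
        return draft
    guidance = "\n".join(f"- {issue}" for issue in issues)
    return llm(
        "Fix these issues in the Databricks SQL draft:\n"
        f"{guidance}\n\nDraft:\n{draft}"
    )
```

Keeping the validator deterministic means the same draft always produces the same findings, so the repair prompt can name each problem explicitly instead of asking the model to review its own output.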

Migration Results

Click any item to see the source T-SQL alongside the generated Databricks SQL. Each translation includes RAG-retrieved documentation citations and deterministic validation.

Translate Your Own SQL

API Key required

To run live translations, you need an Azure OpenAI API key. This service uses Azure GPT-4o and embedding models, which have an associated compute cost. Please contact me to discuss access and pricing.

catyvaras19@gmail.com

Your key is sent directly to Azure and is never stored on our servers.
