New in Snowflake: Java UDFs (with a Kotlin NLP example)
Snowflake now let’s you easily create Java UDFs, which is an incredibly powerful and versatile feature. Let’s check it out with by running a library written in Kotlin — to detect written languages. Out of GitHub and into your SQL code, in 3 easy steps.
Quick example: Detect written languages
Snowflake has made it really easy to create Java UDFs. You just need to do something like this:
create function add(x integer, y integer)
returns integer
language java
handler='Test.add'
as
$$
class Test {
public static int add(int x, int y) {
return x + y;
}
}
$$;
Then you can use that function in your SQL statements:
select add(1, 3)// 4
That’s easy — and then the real power comes with the ability to load and use jar packages. For example, to detect written languages using Lingua:
create or replace function detect_lang(x string)
returns string
language java
imports = ('@~/lingua-1.1.0-with-dependencies.jar')
handler='MyClass.detect'
as
$$
import com.github.pemistahl.lingua.api.*;
import static com.github.pemistahl.lingua.api.Language.*;class MyClass {
static LanguageDetector detector = LanguageDetectorBuilder.fromLanguages(ENGLISH, FRENCH, GERMAN, SPANISH).build(); public static String detect(String x) {
return detector.detectLanguageOf(x).toString();
}
}
$$;
If you have the right jar staged into your Snowflake account, that’s all it takes:
select $1, detect_lang($1)
from values('languages are awesome'),('hola mi amigo'),('hallo Freunde'),('je ne parle pas');
Notes
- I love the ability to write custom Java code while defining UDFs within SQL (Snowflake takes care of compiling it). This allows you to handle all your glue code in one place, within your SQL scripts and dbt projects.
Setup
Lingua is “The most accurate natural language detection library for Java and the JVM, suitable for long and short text alike” (according to them). It’s written in Kotlin, which is a language that runs on the JVM.
To package this library into a single fat jar for Snowflake, I followed these steps:
- Clone the git project
- Build a jar with dependencies with Gradle.
git clone https://github.com/pemistahl/lingua.git
./gradlew jarWithDependencies
Then you can PUT
this jar into your Snowflake account with SnowSQL:
put 'file://build/libs/lingua-1.1.0-with-dependencies.jar' @~;
Read more
- “Welcome to Snowpark: New Data Programmability for the Data Cloud”
- “What JDK version does Snowflake support for Java UDFs?”
- How can I parse HTML with a Snowflake UDF?
- Let’s create fake data with few lines of SQL and Snowflake Java UDF
- Java UDF Snowflake docs
- pemistahl/lingua on GitHub
Acknowledgements
Java UDF support is in active development by a great team at Snowflake, including Elliott Brossard and Isaac Kunen. Stay tuned for more!
Thanks to Peter M. Stahl for the awesome Lingua library:
Want more?
- Try this out with a Snowflake free trial account — you only need an email address to get started.
- Find plenty of interesting datasets in the Snowflake Data Marketplace.
I’m Felipe Hoffa, Data Cloud Advocate for Snowflake. Thanks for joining me on this adventure. You can follow me on Twitter and LinkedIn, and check reddit.com/r/snowflake for the most interesting Snowflake news.