The Exciting Frontier of Custom KSQL Functions
Hi, I'm Mitch
- Data Systems Engineer @Mailchimp
- mitchseymour.com
- New dad, ❤ Thai food, retrowave
Agenda
- Motivation
- Terminology / Basics
- Remote services / models
- Embedded models
- Polyglot UDF experiment
- Summary
Why are custom KSQL functions important?
Why are custom KSQL functions exciting?
KSQL functions are shareable
They facilitate exploration of the current technological landscape
Agenda
✔ Motivation
- Terminology / Basics
- Remote services / models
- Embedded models
- Polyglot UDF experiment
- Summary
UDFs
- User-defined functions
- Operate on a single row
- Stateless
UDAFs
- User-defined aggregate functions
- Multiple inputs, one output (aggregation)
- Stateful
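To make the stateless/stateful distinction concrete, here is a small dependency-free Java sketch. The `initialize` / `aggregate` / `merge` hooks roughly mirror the lifecycle of KSQL's `Udaf` interface, but the classes below do not implement it, and the names `ScalarExample`, `plusTax` and `SumAggregate` are hypothetical.

```java
// Stateless UDF: the output depends only on the current row's arguments.
final class ScalarExample {
    // Hypothetical scalar function; null in, null out.
    static Double plusTax(Double amount, Double rate) {
        if (amount == null || rate == null) {
            return null;
        }
        return amount * (1.0 + rate);
    }
}

// Stateful UDAF: folds many input rows into one running aggregate.
final class SumAggregate {
    // Starting value of the aggregate for a new group or window.
    long initialize() {
        return 0L;
    }

    // Combine the current aggregate with one new input value (nulls are skipped).
    long aggregate(Long value, long current) {
        return value == null ? current : current + value;
    }

    // Combine two partial aggregates (e.g. when two session windows merge).
    long merge(long left, long right) {
        return left + right;
    }
}
```

The scalar function can be called on any row in isolation; the aggregate only makes sense together with the state it carries from row to row.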
Example I
Basic functions
The process of building custom KSQL functions is easy and repeatable
Start with the business logic
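As a minimal sketch of this step, assume a hypothetical function that extracts the domain from an email address. The business logic is plain Java and can be unit tested on its own. In a real KSQL 5.x project the class would also carry `@UdfDescription` and the method `@Udf` (both from `io.confluent.ksql.function.udf`); they are left out here so the snippet compiles without KSQL on the classpath.

```java
import java.util.Locale;

// Hypothetical UDF class. In a KSQL project it would be annotated, e.g.
// @UdfDescription(name = "email_domain", description = "...") on the class
// and @Udf on the method below.
final class EmailDomain {

    // Returns the lowercase domain of an email address, or null if the input
    // is null, has no '@', or has no non-blank text after the last '@'.
    String emailDomain(String email) {
        if (email == null) {
            return null;
        }
        int at = email.lastIndexOf('@');
        if (at < 0) {
            return null;
        }
        String domain = email.substring(at + 1).trim();
        return domain.isEmpty() ? null : domain.toLowerCase(Locale.ROOT);
    }
}
```

Once packaged and deployed, such a function would be called like a built-in one, e.g. `SELECT email_domain(email) FROM users;` (the stream and column names are hypothetical).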
Build and Deploy
(same as before)
Agenda
✔ Motivation
✔ Terminology / Basics
- Remote services / models
- Embedded models
- Polyglot UDF experiment
- Summary
Example II
Sentiment Analysis
Concepts
- Remote services
- Third party dependencies
Sentiment Analysis
- Product reception
- Outage impact
- Audience engagement
- Abusive content moderation
Natural Language API
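A remote-service UDF is mostly glue: create a client once, then call it per row. The sketch below assumes a hypothetical `SentimentClient` interface standing in for a third-party SDK such as a Natural Language API client; the real library's classes and methods are not reproduced. The label thresholds (±0.25) are illustrative. Injecting the client also lets the logic be tested with a stub.

```java
import java.util.Objects;

// Hypothetical abstraction over a remote sentiment API.
// Assumed contract: returns a score in [-1.0, 1.0] and throws an
// unchecked exception if the remote call fails.
interface SentimentClient {
    double score(String text);
}

// Hypothetical UDF wrapper around the remote client.
final class SentimentUdf {
    private final SentimentClient client;

    // The client is created once per UDF instance, not once per row,
    // so connection setup and authentication are not repeated per event.
    SentimentUdf(SentimentClient client) {
        this.client = Objects.requireNonNull(client);
    }

    // Maps the remote score to a coarse label; null input yields null.
    String sentiment(String text) {
        if (text == null) {
            return null;
        }
        double score = client.score(text);
        if (score > 0.25) {        // illustrative threshold
            return "positive";
        }
        if (score < -0.25) {       // illustrative threshold
            return "negative";
        }
        return "neutral";
    }
}
```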
Configs vs Environment Variables
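One way to handle credentials is to prefer a KSQL server property and fall back to an environment variable. KSQL can pass properties prefixed with `ksql.functions.<function name>.` to a UDF class that implements Kafka's `Configurable` interface; the sketch below only mimics that `configure(Map<String, ?>)` hook without the Kafka dependency. The property key `ksql.functions.sentiment.api.key` and the variable `SENTIMENT_API_KEY` are hypothetical, and the environment lookup is passed in as a function so the precedence rule can be tested.

```java
import java.util.Map;
import java.util.function.Function;

// Hypothetical config holder for a remote-service UDF.
final class ApiKeyConfig {
    // Hypothetical names; real ones depend on your function and deployment.
    static final String PROPERTY = "ksql.functions.sentiment.api.key";
    static final String ENV_VAR = "SENTIMENT_API_KEY";

    private final Function<String, String> env;
    private String apiKey;

    // Production code would pass System::getenv; tests can pass a lambda.
    ApiKeyConfig(Function<String, String> env) {
        this.env = env;
    }

    // Mirrors the shape of Kafka's Configurable#configure(Map<String, ?>).
    void configure(Map<String, ?> props) {
        Object fromConfig = props.get(PROPERTY);
        if (fromConfig != null && !fromConfig.toString().isEmpty()) {
            apiKey = fromConfig.toString();   // server config wins
        } else {
            apiKey = env.apply(ENV_VAR);      // otherwise fall back to the environment
        }
        if (apiKey == null || apiKey.isEmpty()) {
            // Fail at configuration time rather than on the first row.
            throw new IllegalStateException(
                "Missing API key: set " + PROPERTY + " or " + ENV_VAR);
        }
    }

    String apiKey() {
        return apiKey;
    }
}
```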
Example III
Conversational interfaces
Concepts
- Exceptions
- Evolutionary UDFs
Dialogflow
"Organizations report a reduction of up to 70 percent in call, chat and/or email inquiries after implementing a VCA" - Gartner research
Use cases
- Chat bots
- Virtual assistants
- Improved customer service
Example
input sourced from user
"I would like to book a room" - user123
response generated by Dialogflow via KSQL
"I can help with that. Where would you like to reserve a room?"
Hybrid training
- Pre-trained ML models
- User can also provide training data
How do we safely improve the model over time?
In event-driven architectures, this is easy
Error flows
- Fail fast
- Fail silently
- Dead letters
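The three strategies can be compared side by side in a small, dependency-free sketch. The `ErrorPolicy` enum and `SafeCall` wrapper are hypothetical, and the in-memory list stands in for what would normally be a dead-letter Kafka topic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical policies matching the three error flows above.
enum ErrorPolicy { FAIL_FAST, FAIL_SILENTLY, DEAD_LETTER }

// Hypothetical wrapper that applies one policy around a UDF's logic.
final class SafeCall<I, O> {
    private final Function<I, O> fn;
    private final ErrorPolicy policy;
    // Stand-in for a dead-letter topic; a real UDF might produce to Kafka instead.
    final List<String> deadLetters = new ArrayList<>();

    SafeCall(Function<I, O> fn, ErrorPolicy policy) {
        this.fn = fn;
        this.policy = policy;
    }

    O apply(I input) {
        try {
            return fn.apply(input);
        } catch (RuntimeException e) {
            switch (policy) {
                case FAIL_FAST:
                    throw e;                 // rethrow so the caller sees the failure
                case DEAD_LETTER:
                    deadLetters.add(input + " -> " + e.getMessage());
                    return null;             // keep going, but record the bad input
                case FAIL_SILENTLY:
                default:
                    return null;             // swallow the error; the row gets null
            }
        }
    }
}
```

Dead letters keep the failing inputs around, so they can be inspected later, for example as candidate training data when improving a model.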
Agenda
✔ Motivation
✔ Terminology / Basics
✔ Remote services / models
- Embedded models
- Polyglot UDF experiment
- Summary
Example IV
Spam detection
- Enron hid billions of dollars in debt from investors through accounting fraud
- Its emails were made public by the Federal Energy Regulatory Commission
- Let's build a spam detector
- Training models is easy
- Models can be exported as Java classes
Let's see how easy it is to build & export a model with H2O
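With an embedded model, scoring happens inside the KSQL server's JVM instead of over the network. The sketch below assumes a hypothetical `SpamModel` interface in place of an exported H2O class (with H2O, the generated POJO would typically be wrapped in `EasyPredictModelWrapper` from the h2o-genmodel library). The normalization step, the 0.5 threshold and all names are illustrative.

```java
import java.util.Locale;

// Hypothetical stand-in for an exported model class (e.g. an H2O POJO).
interface SpamModel {
    // Returns the estimated probability that the message is spam.
    double spamProbability(String normalizedText);
}

// Hypothetical embedded-model UDF: no network call per row, and it keeps
// working when external services are unavailable.
final class SpamUdf {
    private final SpamModel model;
    private final double threshold;

    SpamUdf(SpamModel model, double threshold) {
        this.model = model;
        this.threshold = threshold;
    }

    // Returns true/false for a message body, or null for a null body.
    Boolean isSpam(String body) {
        if (body == null) {
            return null;
        }
        String normalized = body.toLowerCase(Locale.ROOT).trim();
        return model.spamProbability(normalized) >= threshold;
    }
}
```

Because the model ships inside the UDF jar, updating it means rebuilding and redeploying that jar.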
Remote
− Higher latency
− Less predictable failures
− No offline support
+ Simple integration
+ Built-in model management
Agenda
✔ Motivation
✔ Terminology / Basics
✔ Remote services / models
✔ Embedded models
- Polyglot UDF experiment
- Summary
- Polyglot programming
- Democratize UDF development for non-Java developers
- This is experimental
Installing guest languages
GraalVM Updater (gu)
$ gu install ruby
$ gu available
ComponentId Version Component name
----------------------------------------------------------------
python 1.0.0-rc15 Graal.Python
R 1.0.0-rc15 FastR
ruby 1.0.0-rc15 TruffleRuby
Now, let's create a Polyglot UDF!
Gotchas
- Need benchmarks. Initial tests show a startup penalty for some languages
- Using libs in guest languages may not always work
- Encountered silent and hard-to-debug failures
Is full integration into KSQL possible?
POC
- Multilingual UDFs in interactive mode
- Experimental KSQL language extensions
Inline Python UDF POC only
The POC shows that polyglot UDFs are possible...
Agenda
✔ Motivation
✔ Terminology / Basics
✔ Remote services / models
✔ Embedded models
✔ Polyglot UDF experiment
- Summary
Recap
What did we learn through these examples?
- Bootstrapping new projects
- Building
- Deploying
- Configuring
- Error handling
We have the ingredients for a rich ecosystem
There should be a community for sharing KSQL functions
Then others can discover your function
Go build something exciting