A very quick Kafka refresher
Data is stored in topics
Topics are split into partitions
#!
Topic
_____________________________________
Partition 1 | | | | | | | | | | | |
_____________________________________
_____________________________________
Partition 2 | | | | | | | | | | | |
_____________________________________
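Which partition a record lands on is usually determined by its key. A simplified stand-in in plain Python (Kafka's default partitioner actually hashes the key bytes with murmur2, but the principle is the same: same key, same partition):

```python
def pick_partition(key: bytes, num_partitions: int) -> int:
    """Simplified stand-in for Kafka's default partitioner.
    The real one murmur2-hashes the key; here a toy hash suffices
    to show that equal keys always map to the same partition."""
    return sum(key) % num_partitions

# records with the same key always land on the same partition,
# which is what gives per-key ordering
p1 = pick_partition(b"user_42", 4)
p2 = pick_partition(b"user_42", 4)
print(p1 == p2)  # True
```

This per-key stickiness is why ordering guarantees in Kafka are per partition, not per topic.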
Producer
| _______ Consumer 1
| /
| ________________________________________ __ /
---> | | | | | | | | | | | |
________________________________________ <---------- Consumer 2
\
\
\ _______ Consumer 3
// simplified
Timestamp => int64
Key => bytes
Value => bytes
Stream processing in recent years
Kafka Streams is great, and we'll continue to use it for more complex topologies. But for simpler topologies, we can use...
The KSQL CLI
ksql> show properties ;
Property | Value
------------------------------------------------------------------------------------------
ksql.extension.dir | ext
ksql.output.topic.name.prefix |
ksql.persistent.prefix | query_
ksql.schema.registry.url | http://localhost:8081
ksql.service.id | dev
ksql.sink.partitions | 4
ksql.sink.replicas | 1
ksql.sink.window.change.log.additional.retention | 1000000
ksql.statestore.suffix | _ksql_statestore
ksql.streams.application.id | KSQL_REST_SERVER_DEFAULT_APP_ID
ksql.streams.auto.offset.reset | earliest
ksql.streams.bootstrap.servers | localhost:9092
ksql.streams.cache.max.bytes.buffering | 10000000
ksql.streams.commit.interval.ms | 2000
...
------------------------------------------------------------------------------------------
ksql> SET 'ksql.udfs.enabled' = 'false';
ksql> SET 'auto.offset.reset' = 'latest';
ksql> show topics ;
Kafka Topic | Registered | Partitions | Partition Replicas | Consumers | ConsumerGroups
-----------------------------------------------------------------------------------------
api-logs | false | 4 | 1 | 0 | 0
locations | false | 4 | 1 | 0 | 0
-----------------------------------------------------------------------------------------
ksql> PRINT 'api-logs' ;
Format:JSON
{"ROWTIME":1537726054423,"ROWKEY":"null","response_code":401,"username":"trouble_maker"}
{"ROWTIME":1537726079008,"ROWKEY":"null","response_code":401,"username":"trouble_maker"}
{"ROWTIME":1537726044944,"ROWKEY":"null","response_code":401,"username":"trouble_maker"}
{"ROWTIME":1537726478848,"ROWKEY":"null","response_code":200,"username":"trouble_maker"}
{"ROWTIME":1537726073925,"ROWKEY":"null","response_code":200,"username":"good_user"}
...
Queries can represent Kafka data as either:

Streams (every event):
.....................
campaign_1 => created
campaign_2 => created
campaign_1 => sent
campaign_2 => updated
campaign_2 => sent
.....................

Tables (the latest value per key):
campaign_1 => sent
campaign_2 => sent
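The table view can be derived from the stream by replaying it and keeping only the latest value per key — a quick sketch in plain Python (not KSQL's actual implementation):

```python
# the stream: every campaign event, in order
stream = [
    ("campaign_1", "created"),
    ("campaign_2", "created"),
    ("campaign_1", "sent"),
    ("campaign_2", "updated"),
    ("campaign_2", "sent"),
]

# the table: replay the stream, keeping only the latest value per key
table = {}
for key, value in stream:
    table[key] = value

print(table)  # {'campaign_1': 'sent', 'campaign_2': 'sent'}
```

This is the stream/table duality: a table is a snapshot of a stream, and a stream is a changelog of a table.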
ksql> CREATE STREAM api_logs (
response_code INT,
username VARCHAR
)
WITH (kafka_topic='api-logs', value_format='JSON');
Message
----------------
Stream created
----------------
ksql> SHOW STREAMS ;
Stream Name | Kafka Topic | Format
------------------------------------
API_LOGS | api-logs | JSON
------------------------------------
ksql> describe API_LOGS ;
Name : API_LOGS
Field | Type
-------------------------------------------
ROWTIME | BIGINT (system)
ROWKEY | VARCHAR(STRING) (system)
RESPONSE_CODE | INTEGER
USERNAME | VARCHAR(STRING)
ksql> DESCRIBE EXTENDED API_LOGS ;
Name : API_LOGS
Type : STREAM
Key field :
Key format : STRING
Timestamp field : Not set - using <ROWTIME>
Value format : JSON
Kafka topic : api-logs (partitions: 4, replication: 1)
Field | Type
-------------------------------------------
ROWTIME | BIGINT (system)
ROWKEY | VARCHAR(STRING) (system)
RESPONSE_CODE | INTEGER
USERNAME | VARCHAR(STRING)
-------------------------------------------
Local runtime statistics
------------------------
messages-per-sec: 14.38
total-messages: 1208
last-message: 09/24/18 2:10:12 PM GMT
failed-messages: 0
last-failed: n/a
(Statistics of the local Ksql Server interaction with the Kafka topic API_LOGS)
The topic is registered now, which means it can be queried
ksql> show topics ;
Kafka Topic | Registered | Partitions | Partition Replicas | Consumers | ConsumerGroups
-----------------------------------------------------------------------------------------
api-logs | true | 4 | 1 | 0 | 0
locations | false | 4 | 1 | 0 | 0
-----------------------------------------------------------------------------------------
ksql> SELECT username, response_code
FROM API_LOGS ;
# output
trouble_maker | 401
trouble_maker | 401
good_user | 200
# query continues to run.
# output after 5s
mitch | 200
good_user | 200
It depends...
Queries can be either:
Non-persistent
ksql> SELECT username, response_code
FROM API_LOGS ;
Persistent
ksql> CREATE STREAM response_codes_only AS SELECT response_code FROM API_LOGS ;
ksql> show topics ;
Kafka Topic | Registered | Partitions | Partition Replicas | Consumers | ConsumerGroups
-------------------------------------------------------------------------------------------------
api-logs | true | 4 | 1 | 4 | 1
locations | false | 4 | 1 | 0 | 0
RESPONSE_CODES_ONLY | true | 4 | 1 | 0 | 0
------------------------------------------------------------------------------------------------
ksql> SELECT username, response_code
FROM API_LOGS
WHERE response_code = 401;
# output
trouble_maker | 401
trouble_maker | 401
trouble_maker | 401
# output after 20s
trouble_maker | 401
chaos_monkey | 401
ksql> SELECT username, count(*)
FROM API_LOGS
GROUP BY username;
# output
good_user | 1
trouble_maker | 2
ksql> SELECT username, count(*)
FROM API_LOGS
WINDOW TUMBLING (SIZE 1 MINUTE)
GROUP BY username;
# output
trouble_maker | 2
trouble_maker | 1
good_user | 1
WINDOW HOPPING (SIZE 20 SECONDS, ADVANCE BY 5 SECONDS)
WINDOW TUMBLING (SIZE 20 SECONDS)
WINDOW SESSION (300 SECONDS)
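The windowed counts above bucket each record by its timestamp. A sketch of how tumbling and hopping windows map a timestamp to window start times (sizes in ms; plain Python, not KSQL internals):

```python
def tumbling_window(ts_ms: int, size_ms: int) -> int:
    """A tumbling window assigns each timestamp to exactly one window."""
    return ts_ms - ts_ms % size_ms

def hopping_windows(ts_ms: int, size_ms: int, advance_ms: int) -> list:
    """A hopping window can assign a timestamp to several overlapping
    windows: every window start that still covers the timestamp."""
    starts = []
    w = ts_ms - ts_ms % advance_ms  # latest window that could contain ts
    while w > ts_ms - size_ms and w >= 0:
        starts.append(w)
        w -= advance_ms
    return sorted(starts)

print(tumbling_window(65_000, 60_000))            # 60000
print(hopping_windows(12_000, 20_000, 5_000))     # [0, 5000, 10000]
```

Session windows work differently: they have no fixed size and close only after a gap of inactivity (300 seconds in the example above).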
// use the timestamp in the message
Timestamp => int64
Key => bytes
Value => bytes
ksql> CREATE STREAM api_logs (
response_code INT,
username VARCHAR,
request_time BIGINT
)
WITH (KAFKA_TOPIC='api-logs',
VALUE_FORMAT='JSON',
TIMESTAMP='request_time');
ksql> CREATE TABLE locations (
username VARCHAR,
city VARCHAR
) WITH (kafka_topic='locations', key='username', value_format='JSON') ;
Message
---------------
Table created
---------------
ksql> show tables ;
Table Name | Kafka Topic | Format | Windowed
-----------------------------------------------------
LOCATIONS | locations | JSON | false
-----------------------------------------------------
ksql> describe LOCATIONS ;
Name : LOCATIONS
Field | Type
--------------------------------------
ROWTIME | BIGINT (system)
ROWKEY | VARCHAR(STRING) (system)
USERNAME | VARCHAR(STRING)
CITY | VARCHAR(STRING)
--------------------------------------
[ LEFT | FULL | INNER ] JOIN
ksql> CREATE STREAM api_logs_enriched AS
SELECT api_logs.username, api_logs.response_code, locations.city
FROM api_logs
LEFT JOIN locations ON api_logs.username = locations.username;
Message
----------------------------
Stream created and running
----------------------------
ksql> SELECT *
FROM api_logs_enriched ;
# output
1537747980497 | trouble_maker | trouble_maker | 401 | Atlanta
1537748149782 | good_user | good_user | 401 | null
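In a stream-table join, each stream record looks up the current value for its key in the table; with a LEFT JOIN, a missing key still produces a row (with nulls), which is why good_user's city comes back null above. A rough sketch in plain Python (not KSQL's join machinery):

```python
# the table: latest known city per username
locations = {"trouble_maker": "Atlanta"}

# the stream: api log events as (username, response_code)
api_logs = [
    ("trouble_maker", 401),
    ("good_user", 401),
]

# LEFT JOIN: every stream record produces a row;
# .get returns None (null) when the key is absent from the table
enriched = [(user, code, locations.get(user)) for user, code in api_logs]

print(enriched)
# [('trouble_maker', 401, 'Atlanta'), ('good_user', 401, None)]
```

An INNER JOIN would instead drop the good_user row entirely.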
ksql> show functions ;
Function Name | Type
-------------------------------------
ABS | SCALAR
ARRAYCONTAINS | SCALAR
CEIL | SCALAR
CONCAT | SCALAR
COUNT | AGGREGATE
EXTRACTJSONFIELD | SCALAR
FETCH_FIELD_FROM_STRUCT | SCALAR
FLOOR | SCALAR
GEO_DISTANCE | SCALAR
IFNULL | SCALAR
LCASE | SCALAR
LEN | SCALAR
MASK | SCALAR
MASK_KEEP_LEFT | SCALAR
MASK_KEEP_RIGHT | SCALAR
MASK_LEFT | SCALAR
MASK_RIGHT | SCALAR
MAX | AGGREGATE
MIN | AGGREGATE
RANDOM | SCALAR
ROUND | SCALAR
STRINGTOTIMESTAMP | SCALAR
SUBSTRING | SCALAR
SUM | AGGREGATE
TIMESTAMPTOSTRING | SCALAR
TOPK | AGGREGATE
TOPKDISTINCT | AGGREGATE
TRIM | SCALAR
UCASE | SCALAR
-------------------------------------
ksql> describe function UCASE ;
Name : UCASE
Author : confluent
Version :
Overview :
Type : scalar
Jar : internal
Variations :
Arguments : VARCHAR
Returns : VARCHAR
Description :
import io.confluent.ksql.function.udf.Udf;
import io.confluent.ksql.function.udf.UdfDescription;

@UdfDescription(name = "synthwave_name", description = "get your synthwave name")
public class SynthwaveName {

  @Udf(description = "get your synthwave name")
  public String synthwaveName(String name) {
    // do something
    return "";
  }
}
ksql> SELECT name, synthwave_name(name)
FROM users;
# output
anthony | synth neo
coty | sunset laser
erin | force grid
garret | drive synth
kale | night synth
kasa | night synth
matt | cruise synth
mitch | cruise summer
peony | terror force
ksql> show queries ;
Query ID | Kafka Topic | Query String
-----------------------------------------------------------------
CSAS_RESPONSE_CODES_ONLY_1 | RESPONSE_CODES_ONLY | ...
------------------------------------------------------------------
ksql> explain CSAS_RESPONSE_CODES_ONLY_1 ;
...
Processing topology
-------------------
Topologies:
Sub-topology: 0
Source: KSTREAM-SOURCE-0000000000 (topics: [api-logs])
--> KSTREAM-MAPVALUES-0000000001
Processor: KSTREAM-MAPVALUES-0000000001 (stores: [])
--> KSTREAM-TRANSFORMVALUES-0000000002
<-- KSTREAM-SOURCE-0000000000
...
Sink: KSTREAM-SINK-0000000005 (topic: RESPONSE_CODES_ONLY)
<-- KSTREAM-MAPVALUES-0000000004
Consumers maintain their group membership by
sending heartbeats back to the cluster
#!
_
;#'#'. ~'#'#; |-|
#' `/ #' _ |=|
#. .# |-| |_| _
'#. .#' °º¤ø,¸¸,ø¤º°`°º¤ø,¸,ø¤°º¤ø,¸¸,ø¤ |=| |-|
'#. .#' |_| |=|
'#. .#' |_|
'#'
#!
-------------------------------------------------------
______
|. |°º¤ø,¸¸,ø¤º°`°º¤ø,¸,ø¤°º¤ø,¸¸,ø¤ Partition 1
| KSQL |
| '-, |
|______|°º¤ø,¸¸,ø¤º°`°º¤ø,¸,ø¤°º¤ø,¸¸,ø¤ Partition 2
------------------------------------------------------|
______
|. |°º¤ø,¸¸,ø¤º°`°º¤ø,¸,ø¤°º¤ø,¸¸,ø¤ Partition 3
| KSQL |
| '-, |
|______|°º¤ø,¸¸,ø¤º°`°º¤ø,¸,ø¤°º¤ø,¸¸,ø¤ Partition 4
-------------------------------------------------------
#!
-------------------------------------------------------
______
|. |°º¤ø,¸¸,ø¤º°`°º¤ø,¸,ø¤°º¤ø,¸¸,ø¤ Partition 1
| KSQL |
| '-, |
|______|°º¤ø,¸¸,ø¤º°`°º¤ø,¸,ø¤°º¤ø,¸¸,ø¤ Partition 2
------------------------------------------------------|
_ ._ _ , _ ._
(_ ' ( ` )_ .__)
( ( ( ) `) ) _)
(__ (_ (_ . _) _) ,__)
`~~`\ ' . /`~~`
; ;
/ \
_____________/_ __ \_____________
#!
-------------------------------------------------------
______
|. |°º¤ø,¸¸,ø¤º°`°º¤ø,¸,ø¤°º¤ø,¸¸,ø¤ Partition 1
| KSQL |°º¤ø,¸¸,ø¤º°`°º¤ø,¸,ø¤°º¤ø,¸¸,ø¤ Partition 2
| '-, |°º¤ø,¸¸,ø¤º°`°º¤ø,¸,ø¤°º¤ø,¸¸,ø¤ Partition 3
|______|°º¤ø,¸¸,ø¤º°`°º¤ø,¸,ø¤°º¤ø,¸¸,ø¤ Partition 4
------------------------------------------------------|
+-------+
| ˥ΌSʞ |
._ _ , _ .`~~ +-------+ /`~~` _ .`~~`\ '_ .`~~_ _
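When an instance dies, its partitions are redistributed across the survivors — the same consumer-group rebalancing that Kafka Streams (and therefore KSQL) gets for free. A simplified round-robin sketch in plain Python (the real assignors are more sophisticated, considering stickiness and local state):

```python
def assign(partitions: list, instances: list) -> dict:
    """Round-robin partitions over the live instances (simplified;
    Kafka's actual partition assignors also weigh stickiness and
    where standby state already lives)."""
    assignment = {instance: [] for instance in instances}
    for idx, partition in enumerate(partitions):
        assignment[instances[idx % len(instances)]].append(partition)
    return assignment

# two live KSQL servers share four partitions...
print(assign([1, 2, 3, 4], ["ksql-a", "ksql-b"]))
# {'ksql-a': [1, 3], 'ksql-b': [2, 4]}

# ...and if ksql-b dies, a rebalance hands everything to ksql-a
print(assign([1, 2, 3, 4], ["ksql-a"]))
# {'ksql-a': [1, 2, 3, 4]}
```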
# clone the repo
$ git clone git@github.com:magicalpipelines/ksql-helm-chart.git
# deploy locally
$ ./script/build && ./script/deploy
# start the CLI
$ kubectl exec -it <POD_NAME> -- ksql
#!
===========================================
= _ __ _____ ____ _ =
= | |/ // ____|/ __ \| | =
= | ' /| (___ | | | | | =
= | < \___ \| | | | | =
= | . \ ____) | |__| | |____ =
= |_|\_\_____/ \___\_\______| =
= =
= Streaming SQL Engine for Apache Kafka® =
===========================================
Copyright 2017-2018 Confluent Inc.
CLI v5.0.0, Server v5.0.0 located at http://localhost:8088
Having trouble? Type 'help' (case-insensitive) for a rundown of how things work!
ksql>