Kafka connect many to many tables in MSSQL

Kafka connect many to many tables in MSSQL



I'm currently looking into Kafka Connect to stream some of our databases to a data lake. To test out Kafka Connect I've setup a database with one of our project databases in. So far so good.



Next step I configured Kafka Connect with mode following properties:



"connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
"timestamp.column.name": "updated_at,created_at",
"incrementing.column.name": "id",
"dialect.name": "SqlServerDatabaseDialect",
"validate.non.null": "false",
"tasks.max": "1",
"mode": "timestamp+incrementing",
"topic.prefix": "mssql-jdbc-",
"poll.interval.ms": "10000",



While this works for the majority of my tables where I got an ID and a created_at / updated_at field, it won't work for my tables where I solved my many-to-many relationships with a table in between and a composite key. Note that I'm using the generic JDBC configuration with a JDBC driver from Microsoft.



Is there a way to configure Kafka Connect for these special cases?






Have you tried using the query configuration options to write a join, for example? Otherwise, the documented Kafka pattern is using Kafka Streams to join your "lookup table" topics with the "data table" topics

– cricket_007
Sep 9 '18 at 15:23



query






@cricket_007 I don't quite understand. I'm under the impression that the query parameter can be used when you want to track specific queries. In my case I want to track the complete database. It could be that I'm wrong though. :-)

– Vince V.
Sep 10 '18 at 6:51






The incrementing timestamp options are also "specific queries", no I'm not sure I understand your concern

– cricket_007
Sep 10 '18 at 13:04




1 Answer
1



Instead of one connector to pull all of your tables, you may need to create multiple ones. This would be the case if you want to use different methods for fetching the data, or different ID/timestamp columns.
As @cricket_007 says, you can use the query option to pull back the results of a query—which could be a SELECT expressing your multi-table join. Even when pulling data from a single table object, the JDBC connector itself is just issuing a SELECT * from the given table, with a WHERE predicate to restrict the rows selected based on the incrementing ID/timestamp.


query


SELECT


SELECT *


WHERE



The alternative is to use log-based change data capture (CDC), and stream all changes directly from the database into Kafka.



Whether you use JDBC or log-based CDC, you can use stream processing to resolve joins in Kafka itself. An example of this is Kafka Streams or KSQL. I've written about the latter a lot here.



You might also find this article useful describing in detail your options for integrating databases with Kafka.



Disclaimer: I work for Confluent, the company behind the open-source KSQL project.



Thanks for contributing an answer to Stack Overflow!



But avoid



To learn more, see our tips on writing great answers.



Required, but never shown



Required, but never shown




By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.

Popular posts from this blog

𛂒𛀶,𛀽𛀑𛂀𛃧𛂓𛀙𛃆𛃑𛃷𛂟𛁡𛀢𛀟𛁤𛂽𛁕𛁪𛂟𛂯,𛁞𛂧𛀴𛁄𛁠𛁼𛂿𛀤 𛂘,𛁺𛂾𛃭𛃭𛃵𛀺,𛂣𛃍𛂖𛃶 𛀸𛃀𛂖𛁶𛁏𛁚 𛂢𛂞 𛁰𛂆𛀔,𛁸𛀽𛁓𛃋𛂇𛃧𛀧𛃣𛂐𛃇,𛂂𛃻𛃲𛁬𛃞𛀧𛃃𛀅 𛂭𛁠𛁡𛃇𛀷𛃓𛁥,𛁙𛁘𛁞𛃸𛁸𛃣𛁜,𛂛,𛃿,𛁯𛂘𛂌𛃛𛁱𛃌𛂈𛂇 𛁊𛃲,𛀕𛃴𛀜 𛀶𛂆𛀶𛃟𛂉𛀣,𛂐𛁞𛁾 𛁷𛂑𛁳𛂯𛀬𛃅,𛃶𛁼

Edmonton

Crossroads (UK TV series)