Concept
At first it seemed really “intuitive.” In the previous articles, we introduced exporting data from a SQL database to Elasticsearch through Logstash and querying it with Kibana. That simply showed that structured data can flow transparently into unstructured data storage. So loading unstructured data into a structured database should be even easier, right?
Why?
We want to be able to analyze the unstructured data and connect it with other related data. For that purpose, we transform it into structured data.
Before everything
Well… I thought it was that easy at first. Of course, we are ignoring the data preparation in Elasticsearch here: the index settings, the Logstash parsing rules, and the data insights we want to explore. We assume those are already confirmed against our business logic. That is the most important part, and it is totally not about the technology we leverage. Please be really careful about this. I don’t want to hear any feedback like “the data lake is useless”… and so on. Buying weapons is not a big deal if you have enough budget; aiming at the target is what you need to train yourself for. Good luck. Now I’m going to introduce how we do it.
How?
I used Logstash to achieve this data transparency, because its performance is good and the setup is really easy. As a data pipeline trigger, Logstash is like our dream tool: we can send data anywhere and use “SQL” as the bridge. Through the Logstash bridge, we can input and output the data no matter whether it is unstructured or structured.
Environment & Setting
AWS EC2 instance
Windows Server 2016
JRE jre-8u211-windows-x64
ELK 7.5.0
It would be easier to use Linux, but nothing comes without a trade-off. Use whatever you have compared and trust.
I assume you like my articles… and have set things up as follows.
OK, now we need to install the Logstash JDBC output plug-in. You DO NOT need an input plug-in, because the Elasticsearch input ships with Logstash by default, but you do need the JDBC output plug-in. It’s really important.
> bin\logstash-plugin.bat install logstash-output-jdbc
If you already have the plug-in, make sure it’s up to date.
> bin\logstash-plugin.bat update logstash-output-jdbc
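To double-check that the plug-in really is installed, you can list the installed plug-ins and filter for it (the findstr filter is just a Windows convenience):

> bin\logstash-plugin.bat list | findstr "jdbc"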
We use Elasticsearch as the input and SQL Server 2017 as the output.
We created an ELK.conf with the following config:
input {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "yourindex"
  }
}
output {
  jdbc {
    driver_jar_path => "C:\MSSQLJDBC\sqljdbc_4.2\enu\jre8\sqljdbc42.jar"
    connection_string => "jdbc:sqlserver://localhost:1433;databaseName=databasename;user=username;password=userpassword"
    statement => [ "INSERT INTO DB.TMP.samsamt (w, test1, test2, kkk) VALUES(?,?,?,?)", "client_num", "consignee_name", "@timestamp", "host" ]
  }
}
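By default the elasticsearch input pulls every document in the index. If you only want a subset, the input plug-in also accepts a query option that takes standard Elasticsearch query DSL. A minimal sketch, assuming you only want the last day of data (the @timestamp range here is just an example):

input {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "yourindex"
    query => '{ "query": { "range": { "@timestamp": { "gte": "now-1d/d" } } } }'
  }
}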
The statement in the output is where the transformation design happens: you map the Elasticsearch fields to the columns you want in the SQL statement.
(w, test1, test2, kkk)
are the columns of the structured SQL Server table.
"client_num", "consignee_name", "@timestamp", "host"
are the corresponding fields in the Elasticsearch documents, in the same order.
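Of course the target table has to exist before the pipeline runs. A minimal sketch of what it could look like for the mapping above (run it in the target database; the TMP schema and the column types are my assumptions, so use whatever fits your actual fields):

CREATE TABLE TMP.samsamt (
    w     NVARCHAR(50),   -- receives client_num
    test1 NVARCHAR(200),  -- receives consignee_name
    test2 DATETIME2,      -- receives @timestamp
    kkk   NVARCHAR(100)   -- receives host
);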
What’s the importance of this “EASY” skill? So what?
We now commonly use MongoDB as our production database to relieve the I/O bottleneck and keep online performance up.
Imagine connecting MongoDB to ELK, querying the streaming production data in Kibana, and then analyzing it in a structured database. Boom! It’s no longer just imagination.
https://www.itread01.com/content/1545202813.html
You’re welcome.
Operation
> bin\logstash.bat -f config\elk.conf
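By the way, if you just want to validate the config syntax without actually starting the pipeline, Logstash supports a test flag:

> bin\logstash.bat -f config\elk.conf --config.test_and_exit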
Check the result
You’ll know the output succeeded from the Logstash console log.
Then check count(*) on the SQL Server side.
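A quick sanity check, assuming the same table the INSERT statement writes to:

SELECT COUNT(*) FROM TMP.samsamt;

The number should match the document count of your Elasticsearch index, which you can get from the _count API on the same index.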
Well… that’s all. Free tools and huge benefits. What more could you expect? From me? Haha…
Happy new year 2020!
Originally published at http://datamansamxiao.wordpress.com on December 25, 2019.