PowerShell module optimised for fast bulk import into Elasticsearch. Note that this script supports only the index action, which creates the indices automatically.
The input file should therefore contain one JSON document per line, like this:
{ "field1" : "value1" }
{ "field1" : "value1" }
{ "field1" : "value1" }
The script adds { "index":{}} before each line, so the action lines do not need to be stored in the file, which keeps the file size smaller.
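As an illustration of the step described above, here is a minimal Python sketch of how an empty index action line can be placed before every document line to form a bulk request body. It is not the module's actual code: the function name `build_bulk_body` is hypothetical, and the sketch assumes the target index is supplied in the request URL, which is what allows the action line to stay empty.

```python
import json
from typing import Iterable


def build_bulk_body(doc_lines: Iterable[str]) -> str:
    """Prefix every non-empty document line with an empty index action line."""
    action = json.dumps({"index": {}})
    parts = []
    for line in doc_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines so they do not produce empty documents
        parts.append(action)
        parts.append(line)
    # A bulk request body must end with a newline character.
    return "\n".join(parts) + "\n" if parts else ""


body = build_bulk_body(['{ "field1" : "value1" }', '{ "field1" : "value2" }'])
print(body)
```

Because the action lines are generated at send time, the exported file only carries the documents themselves.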
The script was tested on 12,800,000 rows with a file size of 4 GB.
Total completion time was 12 minutes on the following machine:
Windows 10 64 Bit
Intel Xeon CPU E3-1535M v5 @ 2.90GHz
32 GB RAM
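For a rough sense of scale, the test figures above can be turned into a throughput estimate. This is a back-of-the-envelope calculation only, assuming 1 GB = 1024 MB and that the 12 minutes cover the whole import.

```python
rows = 12_800_000      # rows in the test file
file_size_gb = 4       # file size in GB
minutes = 12           # total completion time

seconds = minutes * 60
rows_per_second = rows / seconds
mb_per_second = file_size_gb * 1024 / seconds  # assumes 1 GB = 1024 MB

print(f"{rows_per_second:,.0f} rows/s, {mb_per_second:.1f} MB/s")
# prints "17,778 rows/s, 5.7 MB/s"
```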
To generate the file, run the bcp command below from a command prompt. bcp is a fast way to export large data sets.
bcp "SELECT (SELECT Name, Surname FOR JSON PATH, WITHOUT_ARRAY_WRAPPER, INCLUDE_NULL_VALUES) FROM [Database].[dbo].[Table];" queryout C:\export.json -c -S ".\SQLExpress" -d master -U "sa" -P "XXXX" -e c:\error_out.log -o c:\output_out.log -T
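Before importing, it can be worth checking that every exported line is a standalone JSON object, since a single malformed line makes that document fail during the bulk request. The following Python sketch shows one way to do that; the function name `find_invalid_lines` is hypothetical and the check is not part of the module.

```python
import json
from typing import Iterable, List


def find_invalid_lines(lines: Iterable[str]) -> List[int]:
    """Return 1-based numbers of lines that are not a single JSON object."""
    invalid = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # blank lines are ignored rather than reported
        try:
            parsed = json.loads(line)
        except json.JSONDecodeError:
            invalid.append(number)
            continue
        if not isinstance(parsed, dict):
            invalid.append(number)  # arrays or scalars are not documents
    return invalid


sample = ['{"Name": "Ada", "Surname": null}', 'not json', '[1, 2]']
print(find_invalid_lines(sample))  # [2, 3]
```

For a real export, the lines could come from an opened file, for example `open(path, encoding=...)` with whatever encoding the export actually uses.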
Before the import, run the following API call in the Kibana Dev Tools console (or any HTTP client) to improve bulk index API performance
Reference: https://www.elastic.co/guide/en/elasticsearch/reference/master/tune-for-indexing-speed.html
PUT /_settings
{
  "index" : {
    "refresh_interval" : "-1",
    "number_of_replicas" : "0"
  }
}
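For readers who prefer a script to the console, the same settings call can be prepared with Python's standard library. This is a sketch under assumptions: the helper name `build_settings_request` is hypothetical, the base URL `http://localhost:9200` is taken from the usage example in this document, and no authentication is added. The request is only built here, not sent, so the example runs without a cluster.

```python
import json
import urllib.request


def build_settings_request(base_url: str, refresh_interval: str, replicas: str) -> urllib.request.Request:
    """Build a PUT /_settings request with the given index settings."""
    body = json.dumps(
        {"index": {"refresh_interval": refresh_interval, "number_of_replicas": replicas}}
    ).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


request = build_settings_request("http://localhost:9200", "-1", "0")
print(request.get_method(), request.full_url)
# Sending it would be: urllib.request.urlopen(request)
```

Calling the same helper with `"1s"` and `"1"` produces the request that restores the settings after the import.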
To import the generated JSON file, use the PowerShell module file BulkElasticSearchImport.psm1
- Import the module with the following command line: Import-Module .\BulkElasticSearchImport.psm1
- Remove the module with the following command line: Remove-Module BulkElasticSearchImport
- Set the max line count to 10000 for optimum performance. The import takes approximately 12 minutes for 12 million rows
Usage: Bulk-Import ".\export.json" 10000 "http://localhost:9200/indexname/doc/" "username" "password"
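The max line count in the usage above controls how many document lines go into one bulk request. The module's internals are not shown in this document, so the following Python sketch only illustrates the general batching idea; the function name `batch_lines` is hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batch_lines(lines: Iterable[str], max_lines: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most max_lines lines."""
    if max_lines < 1:
        raise ValueError("max_lines must be at least 1")
    iterator = iter(lines)
    while True:
        batch = list(islice(iterator, max_lines))
        if not batch:
            return  # input exhausted
        yield batch


# 25 small documents split into batches of 10 lines each.
documents = (f'{{"n": {i}}}' for i in range(25))
batches = list(batch_lines(documents, 10))
print([len(b) for b in batches])  # [10, 10, 5]
```

Reading the file lazily and sending one batch at a time keeps memory use bounded regardless of the file size, at the cost of one HTTP request per batch.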
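After the import, run the following API calls in the Kibana Dev Tools console (or any HTTP client) to restore the refresh interval and replica count and to refresh the index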
PUT /_settings
{
  "index" : {
    "refresh_interval" : "1s",
    "number_of_replicas" : "1"
  }
}
POST /indexname/_refresh