[Recipes] Elasticsearch Bulk API for Batch Updates (Elastic Cloud)

Problem: 

Explore the Elasticsearch bulk API for performing batch operations.

Solution Summary: 

The _bulk API lets us index, create, update, and delete many documents in a single request.

Prerequisites: 

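You need a working Elastic Stack setup with Elasticsearch and Kibana. You may also use any other HTTP client instead of Kibana. The examples include the _doc type in the request path, as in Elasticsearch 6.x. Types in the path are deprecated in 7.x and removed in 8.x, where the path becomes <index>/_bulk.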

Solution Steps: 

Case 1.a - Upload accounts.json to Elastic Cloud from Terminal (Mac)

We will upload the sample JSON dataset (accounts.json) provided by elastic.co.

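  1. Save the JSON file as accounts.json.

  2. Run the following from the Terminal on a Mac: curl -u elastic:<password> -H "Content-Type: application/json" -X POST "<http-endpoint>/accounts/_doc/_bulk?pretty" --data-binary "@accounts.json"

    1. The URL is quoted so that the shell does not try to interpret the ? character. The --data-binary option makes curl send the file with its newlines preserved; the _bulk API expects newline-delimited JSON, and the last line must also end with a newline.

    2. Non-Mac users can either install curl or use any HTTP client they are familiar with.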

 

Each account record in the file spans two lines, an action line followed by the document source:

{"index":{"_id":"1"}}

{"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"amberduke@pyrami.com","city":"Brogan","state":"IL"}
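The two-line structure above can also be generated from code. The following is a minimal Python sketch, assuming the records are available as a list of dictionaries and that account_number is used as the document _id, as in the sample file. The function name build_index_bulk_body is hypothetical and is not part of any Elasticsearch client library.

```python
import json


def build_index_bulk_body(records, id_field="account_number"):
    """Return an NDJSON bulk body with one "index" action per record.

    Assumption: each record is a dict and record[id_field] becomes the _id.
    """
    lines = []
    for record in records:
        # Action line: tells _bulk what to do and which _id to use.
        lines.append(json.dumps({"index": {"_id": str(record[id_field])}}))
        # Source line: the document itself, serialized on a single line.
        lines.append(json.dumps(record))
    # The bulk body must end with a newline character.
    return "\n".join(lines) + "\n"


accounts = [
    {"account_number": 1, "balance": 39225, "firstname": "Amber", "lastname": "Duke"},
]
body = build_index_bulk_body(accounts)
print(body, end="")
```

The resulting string can be written to a file and sent with the curl command from step 2, or passed as the request body by any HTTP client.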

 

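Tip (Hack): If you use curl frequently to work with Elasticsearch, you can place a wrapper script named curl in your home folder's bin directory that always adds -H "Content-Type: application/json". The wrapper takes precedence over the system curl only if that bin directory appears earlier in your PATH. Inside the wrapper, call the real curl by its full path (for example /usr/bin/curl) so that the script does not invoke itself.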

 

Case 1.b - Add opening_date to 10 documents through a bulk update from Kibana Dev Tools

POST accounts/_doc/_bulk
{ "update" : {"_id" : 1}}
{ "doc" : {"opening_date" : "2018/01/01"} }
{ "update" : {"_id" : 2}}
{ "doc" : {"opening_date" : "2018/02/04"} }
{ "update" : {"_id" : 3}}
{ "doc" : {"opening_date" : "2018/03/09"} }
{ "update" : {"_id" : 4}}
{ "doc" : {"opening_date" : "2018/04/16"} }
{ "update" : {"_id" : 5}}
{ "doc" : {"opening_date" : "2018/05/25"} }
{ "update" : {"_id" : 6}}
{ "doc" : {"opening_date" : "2018/01/01"} }
{ "update" : {"_id" : 7}}
{ "doc" : {"opening_date" : "2018/02/04"} }
{ "update" : {"_id" : 8}}
{ "doc" : {"opening_date" : "2018/03/09"} }
{ "update" : {"_id" : 9}}
{ "doc" : {"opening_date" : "2018/04/16"} }
{ "update" : {"_id" : 10}}
{ "doc" : {"opening_date" : "2018/05/25"} }

Note: 

  1. For an update action, the second line must contain a doc (partial document) or a script, as shown above. For index and create actions, the second line is the document itself (see the next case).
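Writing the update pairs by hand becomes tedious for more than a few documents. As a rough sketch, the same body can be generated in Python from a mapping of _id to the fields to merge. The helper name build_update_bulk_body is hypothetical, and only three of the ten dates are shown to keep the example short.

```python
import json


def build_update_bulk_body(partial_docs):
    """Return an NDJSON bulk body with one "update" action per document.

    partial_docs maps a document _id to the fields to merge into it.
    """
    lines = []
    for doc_id, fields in partial_docs.items():
        # Action line: which document to update.
        lines.append(json.dumps({"update": {"_id": doc_id}}))
        # An update needs a second line holding "doc" (or "script").
        lines.append(json.dumps({"doc": fields}))
    # The bulk body must end with a newline character.
    return "\n".join(lines) + "\n"


opening_dates = {
    1: "2018/01/01",
    2: "2018/02/04",
    3: "2018/03/09",
}
body = build_update_bulk_body(
    {doc_id: {"opening_date": date} for doc_id, date in opening_dates.items()}
)
print(body, end="")
```

The printed lines can be pasted under the POST line in Kibana Dev Tools or sent with any HTTP client.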

 

Case 2 - Create / Insert with _bulk API from Console

POST /student/_doc/_bulk?pretty
{"index":{"_id":"1"}}
{"name": "John Doe", "age" : 20}
{"index":{"_id":"2"}}
{"name": "Doe John", "age" : 30}
{"index":{"_id":"3"}}
{"name": "Doe Doe", "age" : 40}
{"index":{"_id":"4"}}
{"name": "Doe Doe", "age" : 50}

 

Case 3 - Delete with _bulk API from Console

POST /student/_doc/_bulk?pretty
{"delete":{"_id":"2"}}
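Unlike index and update, a delete action has no second line. A bulk request can also succeed as a whole while individual actions fail, which the response reports through a top-level errors flag and a per-action items list. The sketch below, in Python, builds a delete body and collects failed actions from a response. Both function names are hypothetical, and the sample response is hand-written for illustration, not captured from a real cluster.

```python
import json


def build_delete_bulk_body(doc_ids):
    """Return an NDJSON bulk body with one "delete" action per _id."""
    # A delete action is a single line; there is no source line after it.
    lines = [json.dumps({"delete": {"_id": str(doc_id)}}) for doc_id in doc_ids]
    return "\n".join(lines) + "\n"


def failed_items(bulk_response):
    """Return (action, _id, status) for every failed action in a bulk response."""
    failures = []
    if not bulk_response.get("errors"):
        return failures
    for item in bulk_response.get("items", []):
        # Each item has a single key naming the action (index, create, update, delete).
        for action, result in item.items():
            if "error" in result:
                failures.append((action, result.get("_id"), result.get("status")))
    return failures


body = build_delete_bulk_body(["2"])
print(body, end="")

# Hand-written sample response (an assumption for illustration only).
sample_response = {
    "took": 3,
    "errors": True,
    "items": [
        {"delete": {"_id": "2", "status": 200, "result": "deleted"}},
        {"update": {"_id": "99", "status": 404,
                    "error": {"type": "document_missing_exception",
                              "reason": "document missing"}}},
    ],
}
print(failed_items(sample_response))
```

Checking the errors flag first avoids walking the items list when every action succeeded.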
