A document is the smallest unit of data you store in Elasticsearch. Think of it as a JSON object, like:
```json
{
  "id": "123",
  "title": "Lord of the Rings",
  "author": "J.R.R. Tolkien",
  "genre": "Fantasy"
}
```
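An index is a collection of documents, often compared to a database in SQL terms (a table is the closer analogy in current Elasticsearch versions). Each index maintains an inverted index for super-fast searching. An inverted index is essentially a mapping from each term to the documents that contain it. If the document above is numbered 1 and a second book, The Hobbit, is numbered 2, part of that mapping looks like this: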
| Term | Document IDs |
|---|---|
| "lord" | [1] |
| "rings" | [1] |
| "fantasy" | [1] |
| "hobbit" | [2] |
Now when you search for "lord rings", Elasticsearch doesn't scan every document. It looks up each term in the inverted index and jumps straight to the documents that contain it. ⚡
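The following minimal sketch shows a toy inverted index in Python. The two-document corpus is hypothetical: doc 1 is the Lord of the Rings document and doc 2 stands in for The Hobbit, matching the table. The regex tokenizer is only a crude stand-in for a real analyzer. The names `build_inverted_index` and `search` are illustrative and are not Elasticsearch APIs.

```python
import re
from collections import defaultdict

# Hypothetical two-document corpus: doc 1 is the Lord of the Rings document,
# doc 2 is an illustrative second book (title + genre only, for brevity).
DOCS = {
    1: "Lord of the Rings Fantasy",
    2: "The Hobbit",
}


def tokenize(text):
    """Crude analyzer stand-in: split on non-alphanumerics and lowercase."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]


def build_inverted_index(docs):
    """Map every term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return dict(index)


def search(index, query):
    """Return IDs of documents containing ANY query term (OR semantics)."""
    hits = set()
    for term in tokenize(query):
        hits |= index.get(term, set())  # direct lookup, no document scan
    return hits


index = build_inverted_index(DOCS)
print(sorted(index["lord"]))                # [1]
print(sorted(search(index, "lord rings")))  # [1]
```

The OR semantics mirror the default `operator` of Elasticsearch's `match` query. A real engine also scores and ranks the hits, which this sketch does not attempt.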
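To scale horizontally, an index is split into shards. Each shard is a self-contained Lucene index that holds a subset of the index's documents. You can imagine shards like slices of pizza 🍕: each stores a portion of the documents, and together they make the full pie (the index). You can configure two things. The first is the number of primary shards, which is set when the index is created. The second is the number of replica shards. Replicas are copies of each primary that add redundancy and read capacity, and their number can be changed at any time.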
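How does a document end up in a particular slice? A simplified form of Elasticsearch's routing formula is `shard_num = hash(_routing) % num_primary_shards`. Here `_routing` defaults to the document's `_id`. The sketch below assumes three primary shards. It uses `zlib.crc32` in place of the Murmur3 hash that Elasticsearch actually uses. The helpers `route_to_shard` and `distribute` are hypothetical.

```python
import zlib

NUM_PRIMARY_SHARDS = 3  # assumed value of the index's number_of_shards setting


def route_to_shard(routing, num_primary_shards):
    """Pick a primary shard: hash(_routing) % num_primary_shards.

    Elasticsearch uses Murmur3; zlib.crc32 is only a stand-in here.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


def distribute(doc_ids, num_primary_shards):
    """Group document IDs (used as the routing value) by target shard."""
    shards = {n: [] for n in range(num_primary_shards)}
    for doc_id in doc_ids:
        shards[route_to_shard(doc_id, num_primary_shards)].append(doc_id)
    return shards


shards = distribute([str(i) for i in range(10)], NUM_PRIMARY_SHARDS)
for shard, ids in shards.items():
    print(f"shard {shard}: {ids}")
```

The shard is chosen with a modulo over the primary shard count. Changing that count would therefore send existing documents to different shards, which is why the number of primary shards is fixed when an index is created.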
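A cluster is a group of nodes (running Elasticsearch instances), and the shards of each index are spread across those nodes. So the cluster is like a beehive, nodes are bees, and shards are the honeycombs storing nectar (data). 🍯🐝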
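The following hypothetical round-robin placement makes the beehive picture concrete. It assumes each primary shard has a configurable number of replica copies. It also enforces one rule that Elasticsearch really does apply: two copies of the same shard never share a node. Copies that cannot be placed stay unassigned, much as Elasticsearch leaves replicas unassigned (yellow cluster health) on a single-node cluster. The real allocator also weighs disk usage, shard balance, and allocation awareness, and none of that is modeled here. The `allocate` function is illustrative only.

```python
def allocate(num_primaries, num_replicas, nodes):
    """Round-robin shard copies onto nodes, never two copies of one shard per node.

    Returns (allocation, unassigned): allocation maps node -> shard labels,
    where "P<n>" is primary n and "R<n>" is a replica of primary n.
    """
    allocation = {node: [] for node in nodes}
    unassigned = []
    for shard in range(num_primaries):
        for copy in range(num_replicas + 1):
            label = f"P{shard}" if copy == 0 else f"R{shard}"
            if copy >= len(nodes):
                # Every node already holds a copy of this shard.
                unassigned.append(label)
                continue
            # Offsetting by the copy number puts each copy on a different node.
            node = nodes[(shard + copy) % len(nodes)]
            allocation[node].append(label)
    return allocation, unassigned


allocation, unassigned = allocate(3, 1, ["node-a", "node-b", "node-c"])
for node, labels in allocation.items():
    print(node, labels)
print("unassigned:", unassigned)
```

With three nodes, every primary and its replica land on different bees. With a single node, `allocate(2, 1, ["solo"])` places both primaries and reports both replicas as unassigned.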
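Before text is stored or queried, Elasticsearch runs it through an analyzer. The analyzer splits the text into terms and normalizes them (for example, by lowercasing) so they can be written to, or looked up in, the inverted index.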
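Conceptually, an Elasticsearch analyzer is a pipeline: character filters run first, then a tokenizer, then token filters. The built-in standard analyzer, for instance, pairs the standard tokenizer with a `lowercase` token filter. The Python below is a hypothetical sketch of that pipeline. It uses regex stand-ins for the real `html_strip` character filter, standard tokenizer, and `lowercase` token filter, and these functions are not Elasticsearch APIs.

```python
import re


def strip_html(text):
    """Rough stand-in for the html_strip character filter."""
    return re.sub(r"<[^>]+>", " ", text)


def tokenize(text):
    """Rough stand-in for the standard tokenizer: runs of word characters."""
    return re.findall(r"\w+", text)


def lowercase(tokens):
    """Stand-in for the lowercase token filter."""
    return [token.lower() for token in tokens]


def analyze(text, char_filters, tokenizer, token_filters):
    """Apply character filters, then the tokenizer, then token filters, in order."""
    for char_filter in char_filters:
        text = char_filter(text)
    tokens = tokenizer(text)
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens


print(analyze("<b>Lord</b> of the Rings", [strip_html], tokenize, [lowercase]))
# ['lord', 'of', 'the', 'rings']
```

Running a query through the same pipeline turns "LORD Rings" into `['lord', 'rings']`, exactly the terms stored in the inverted index. This is why case does not matter when searching a field analyzed this way.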