How to manage cluster lifecycle with the REST API

Guide

Monitor node startup, initialize a cluster, and verify health using REST endpoints on Apache Ignite 3 or GridGain 9.

Apache Ignite 3 | GridGain 9
Foundational | 15 min | operations
Tested on: Apache Ignite 3.1.0, GridGain 9.1.8

Overview

Every Apache Ignite 3 and GridGain 9 node exposes a REST management API on port 10300. The API is available as soon as the node's HTTP server starts, before the cluster is initialized. This guide walks through each REST endpoint in the order you use them during startup: check node state, verify discovery, initialize the cluster, and confirm health. Both products share the same REST API paths and response formats. The differences are the license field in the GridGain 9 initialization request and the product name and version number in the version response.

Use this guide when you need scriptable cluster management, want to diagnose startup issues, or are building automation around cluster lifecycle.

Already initialized?

If you completed How to Start a Cluster with Docker Compose and your cluster is already running, every step in this guide still works. You will see "state": "STARTED" instead of "STARTING" when you check node state, and the initialization request returns 200 OK as a safe no-op. There is no need to tear down and rebuild. Follow along as-is to learn the REST endpoints.

Check node state

Query the node state endpoint to confirm the REST API is reachable. This is the most reliable boot indicator across all product versions.

curl -s http://localhost:10300/management/v1/node/state | jq .

Expected response (before initialization):

{
"name": "node1",
"state": "STARTING"
}

A node reports one of three states:

State | Meaning
STARTING | The node process is running but has not joined a cluster
STARTED | The node has joined a cluster and is fully operational
STOPPING | The node is shutting down

The REST server starts partway through the boot sequence, before all components are ready. If the connection is refused, the HTTP server has not started yet. Retry after a few seconds.

STARTING is expected at this point. It confirms the node is running and ready for further interaction.

Checkpoint: The endpoint returns 200 OK with "state": "STARTING". If the connection is refused, the node is still booting. Retry after 3-5 seconds.
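
The retry advice above can be scripted. The following is a minimal sketch, assuming curl is on the PATH; the function name wait_for_rest and its defaults (20 attempts, 3 seconds apart) are illustrative choices, not part of either product.

```shell
# Hypothetical helper: poll the node state endpoint until the REST server answers.
# Usage: wait_for_rest [base_url] [max_attempts] [delay_seconds]
wait_for_rest() {
  local base_url="${1:-http://localhost:10300}"
  local max_attempts="${2:-20}"
  local delay="${3:-3}"
  local attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # --fail makes curl exit non-zero on HTTP errors as well as refused connections.
    if curl -s --fail -o /dev/null "$base_url/management/v1/node/state"; then
      return 0
    fi
    sleep "$delay"
    attempt=$((attempt + 1))
  done
  echo "REST API at $base_url not reachable after $max_attempts attempts" >&2
  return 1
}
```

Calling wait_for_rest with no arguments before the other steps keeps a startup script from failing on a node that is still booting.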

Verify node identity and version

Confirm you are talking to the expected node and check the product version.

curl -s http://localhost:10300/management/v1/node/info | jq .

Expected response:

{
"name": "node1",
"jdbcPort": 10800
}

Check the product version:

curl -s http://localhost:10300/management/v1/node/version | jq .

Expected response:

{
"version": "3.1.0",
"product": "Apache Ignite"
}

Confirm node discovery

Before initializing, verify that all nodes have discovered each other through the internal cluster network.

curl -s http://localhost:10300/management/v1/cluster/topology/physical | jq .

Expected response (abbreviated):

[
{
"name": "node1",
"address": {
"host": "172.18.0.2",
"port": 3344
},
"metadata": {
"restHost": "172.18.0.2",
"httpPort": 10300,
"httpsPort": -1
}
},
{
"name": "node2",
"address": {
"host": "172.18.0.3",
"port": 3344
},
"metadata": {
"restHost": "172.18.0.3",
"httpPort": 10300,
"httpsPort": -1
}
},
{
"name": "node3",
"address": {
"host": "172.18.0.4",
"port": 3344
},
"metadata": {
"restHost": "172.18.0.4",
"httpPort": 10300,
"httpsPort": -1
}
}
]

The physical topology lists every node that has announced itself over port 3344, regardless of whether a cluster has been initialized. Each entry includes the node's name (persistent across restarts), internal communication address, and REST API metadata. The id field (UUID) changes on every restart and is omitted above for brevity.

Wait until all expected nodes appear before proceeding. Node names are case-sensitive and must match exactly when referenced in the initialization request. Nodes that have not discovered each other cannot participate in cluster formation.

Checkpoint: The response lists all three nodes (node1, node2, node3). If a node is missing, check that its container is running with docker compose ps and wait for discovery to complete (typically 10-30 seconds after container start).
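
Waiting for discovery can also be automated by counting the entries in the physical topology. This is a sketch under the assumption that curl and jq are installed; discovered_count and wait_for_discovery are hypothetical names, and the default of 15 attempts at 2-second intervals simply mirrors the 10-30 second window mentioned above.

```shell
# Hypothetical helpers; assume curl and jq are installed.
# Print how many nodes the physical topology currently lists.
discovered_count() {
  curl -s "${1:-http://localhost:10300}/management/v1/cluster/topology/physical" \
    | jq 'length' 2>/dev/null
}

# Poll until at least the expected number of nodes has been discovered.
# Usage: wait_for_discovery expected_nodes [base_url] [max_attempts] [delay_seconds]
wait_for_discovery() {
  local expected="$1" base_url="${2:-http://localhost:10300}"
  local max_attempts="${3:-15}" delay="${4:-2}" attempt=1 count
  while [ "$attempt" -le "$max_attempts" ]; do
    count="$(discovered_count "$base_url")"
    # An empty count means curl or jq failed; treat it as zero nodes.
    if [ "${count:-0}" -ge "$expected" ]; then
      return 0
    fi
    sleep "$delay"
    attempt=$((attempt + 1))
  done
  return 1
}
```

For the three-node cluster in this guide, wait_for_discovery 3 returns 0 once node1, node2, and node3 are all listed.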

Initialize the cluster

Cluster initialization is a one-time operation that creates the Cluster Management Group (CMG) and MetaStorage RAFT groups, assigns a cluster name, and opens the cluster for client connections. Send the request to any single node. It propagates to all discovered nodes.

curl -X POST http://localhost:10300/management/v1/cluster/init \
-H "Content-Type: application/json" \
-d '{
"metaStorageNodes": ["node1", "node2", "node3"],
"cmgNodes": ["node1", "node2", "node3"],
"clusterName": "my-cluster"
}'

A successful initialization returns HTTP 200 with an empty response body. The endpoint blocks until the receiving node has fully joined the cluster, so the response itself confirms that at least one node has transitioned to STARTED.

The request fields:

Field | Required | Description
metaStorageNodes | No | Nodes hosting the MetaStorage RAFT group. Use an odd number (1, 3, or 5) for quorum.
cmgNodes | No | Nodes hosting the CMG RAFT group. Defaults to metaStorageNodes if omitted.
clusterName | Yes | Human-readable cluster name. Must not be blank.
clusterConfiguration | No | Initial cluster configuration in HOCON format.
license | GridGain 9 only | License key content. Not applicable for Apache Ignite 3.

If you omit both metaStorageNodes and cmgNodes, the cluster auto-selects nodes: all nodes for clusters of 1-3, three nodes for clusters of 4, and five nodes for clusters of 5 or more. Nodes are selected alphabetically by name.
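
The auto-selection rule can be expressed as a small calculation, which is handy for predicting which nodes will host the RAFT groups. The sketch below only restates the rule from the paragraph above; the function names are hypothetical, and plain byte-order sorting (LC_ALL=C) is an assumption about what "alphabetically" means here.

```shell
# Hypothetical helper: print how many nodes are auto-selected for a cluster of the given size.
auto_group_size() {
  local cluster_size="$1"
  if [ "$cluster_size" -le 3 ]; then
    echo "$cluster_size"   # clusters of 1-3: all nodes
  elif [ "$cluster_size" -eq 4 ]; then
    echo 3                 # clusters of 4: three nodes
  else
    echo 5                 # clusters of 5 or more: five nodes
  fi
}

# Hypothetical helper: print the node names that would be picked, one per line.
# Assumes byte-order sorting matches the product's alphabetical ordering.
auto_select_nodes() {
  printf '%s\n' "$@" | LC_ALL=C sort | head -n "$(auto_group_size "$#")"
}
```

For example, auto_select_nodes node4 node2 node1 node3 prints node1, node2, and node3, leaving node4 out of both groups.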

note

Initialization is idempotent. Calling the endpoint on an already-initialized cluster returns 200 OK. The CMG RAFT group applies only the first initialization; subsequent calls are safe no-ops.

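Checkpoint: The curl command prints nothing and the server responds with HTTP 200. An exit code of 0 alone is not proof of success, because curl exits 0 for any HTTP response unless --fail is set. If the request hangs, verify that all three nodes appear in the physical topology (previous step).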
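
One way to make the HTTP status visible in a script is to have curl print it. This is a minimal sketch: init_cluster_status is a hypothetical name, and the request body carries only clusterName, relying on the auto-selection of metaStorageNodes and cmgNodes described above.

```shell
# Hypothetical helper: send the init request and print only the HTTP status code.
# Usage: init_cluster_status cluster_name [base_url]
init_cluster_status() {
  local cluster_name="$1" base_url="${2:-http://localhost:10300}"
  # -o /dev/null discards the body; -w prints the status code after the transfer.
  curl -s -o /dev/null -w '%{http_code}' \
    -X POST "$base_url/management/v1/cluster/init" \
    -H "Content-Type: application/json" \
    -d "{\"clusterName\": \"$cluster_name\"}"
}
```

A script can then compare the result against 200, for example: status="$(init_cluster_status my-cluster)" followed by a test on "$status". The helper does no JSON escaping, so it suits only simple cluster names.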

Verify cluster health

After initialization completes, verify that the cluster is operational and all nodes have joined.

Query the cluster state:

curl -s http://localhost:10300/management/v1/cluster/state | jq .

Expected response:

{
"cmgNodes": [
"node1",
"node2",
"node3"
],
"msNodes": [
"node1",
"node2",
"node3"
],
"igniteVersion": {
"major": 3,
"minor": 1,
"maintenance": 0,
"patch": ""
},
"clusterTag": {
"clusterName": "my-cluster",
"clusterId": "..."
}
}

The cmgNodes field lists nodes hosting the Cluster Management Group and msNodes lists nodes hosting MetaStorage. Both should contain all three nodes for a standard 3-node development cluster. The igniteVersion field reflects the product version: Apache Ignite 3 reports 3.1.0 and GridGain 9 reports its own version (e.g., 9.1.8).

Verify that each node has transitioned to STARTED:

for port in 10300 10301 10302; do
  echo "Port $port: $(curl -s http://localhost:$port/management/v1/node/state | jq -r .state)"
done

Expected output:

Port 10300: STARTED
Port 10301: STARTED
Port 10302: STARTED

note

On GridGain 9.1.18+ and Apache Ignite 3.2.0+, health probe endpoints provide an additional readiness signal. Query /health/readiness for a 200 UP response after initialization, or 503 DOWN before. On older versions, these endpoints return 404. The node state endpoint is the reliable cross-version indicator.
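
A script that must work on both older and newer versions can try the readiness probe first and fall back to the node state endpoint on 404. The sketch below assumes the probe is served on the same host and port as the management API, which the text above does not state; node_ready is a hypothetical name.

```shell
# Hypothetical helper: return 0 if the node is ready, 1 otherwise.
# Assumes /health/readiness is served on the same base URL as the management API.
node_ready() {
  local base_url="${1:-http://localhost:10300}" status
  status="$(curl -s -o /dev/null -w '%{http_code}' "$base_url/health/readiness")"
  case "$status" in
    200) return 0 ;;
    404)
      # Older versions lack the probe: fall back to the node state endpoint.
      curl -s "$base_url/management/v1/node/state" | grep -q '"STARTED"'
      ;;
    *) return 1 ;;
  esac
}
```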

Checkpoint: The cluster state response lists all three nodes in both cmgNodes and msNodes, and every node reports STARTED. The cluster is fully operational and accepting client connections on ports 10800-10802.
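
For automation, the per-node check is more useful as a function that fails when any node is not yet STARTED. This is a sketch assuming curl and jq are installed and that the REST ports are mapped to localhost as in this guide; node_state and all_nodes_started are hypothetical names.

```shell
# Hypothetical helpers; assume curl and jq are installed.
# Print the state reported by the node whose REST API is mapped to the given local port.
node_state() {
  curl -s "http://localhost:$1/management/v1/node/state" | jq -r '.state'
}

# Return 0 only if every listed REST port reports STARTED.
# Usage: all_nodes_started 10300 10301 10302
all_nodes_started() {
  local port state
  for port in "$@"; do
    state="$(node_state "$port")"
    if [ "$state" != "STARTED" ]; then
      # Report the first node that is not ready and stop.
      echo "Port $port: ${state:-unreachable}" >&2
      return 1
    fi
  done
  return 0
}
```

The non-zero return code lets a CI step or health-check script stop early instead of parsing printed output.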

Explore post-initialization endpoints

Before initialization, most REST endpoints return 409 CONFLICT. After initialization, the full API becomes available. Query the logical topology to see nodes that are participating in distributed operations:

curl -s http://localhost:10300/management/v1/cluster/topology/logical | jq .

The response has the same format as the physical topology but includes only nodes that have successfully joined the cluster. Endpoints available after initialization include:

  • /management/v1/cluster/state - Cluster metadata (CMG nodes, MetaStorage nodes, version)
  • /management/v1/configuration/cluster - Read and update cluster-wide configuration (HOCON)
  • /management/v1/cluster/topology/logical - Nodes participating in distributed operations

Checkpoint: The logical topology returns the same three nodes as the physical topology. All post-initialization endpoints return 200 OK instead of 409 CONFLICT.

Troubleshooting

Connection refused on port 10300

The REST server has not started yet. The HTTP server starts partway through the node boot sequence. Wait 5-10 seconds after container start and retry. Verify the container is running with docker compose ps.

Node state stuck on STARTING

STARTING is the expected state before cluster initialization. The node is waiting for an init command. If you have already initialized, the node may not have discovered the existing cluster. Check the physical topology endpoint to verify that the node can see the other cluster members. If the node is isolated, check Docker network connectivity.

409 CONFLICT on cluster state or configuration endpoints

The cluster has not been initialized. Before initialization, only a subset of endpoints are available (node state, node info, node version, physical topology, and cluster init). All other endpoints return 409 CONFLICT with the message "Cluster is not initialized." Complete the initialization step first.

400 Bad Request: "Cluster name must not be empty"

The clusterName field in the init request is blank or missing. Provide a non-empty cluster name in the JSON body.

500 Internal Server Error: "Node is not present in the physical topology"

A node name in metaStorageNodes or cmgNodes does not match any node in the physical topology. Node names are case-sensitive and must match exactly. Query the physical topology endpoint to verify the correct node names before retrying.

409 CONFLICT: "REST temporarily unavailable" after initialization

The node is briefly transitioning through its post-initialization startup phase. During this window (typically a few seconds), most endpoints are temporarily restricted. Retry after 3-5 seconds. If it persists beyond 30 seconds, check the node logs with docker compose logs node1.

Init request hangs without returning

The init endpoint blocks until the receiving node joins the cluster. If it hangs, one or more nodes referenced in metaStorageNodes may not be reachable. Verify all nodes appear in the physical topology from the node receiving the request. Check Docker network connectivity and ensure no firewall rules block port 3344 between containers.

GridGain 9: License error during initialization

If you pass the license through a $(cat gridgain-license.json) command substitution, the shell reads the file relative to your current working directory. Verify that gridgain-license.json exists in the directory where you are running curl (the same directory as docker-compose.yml). Check the file content with cat gridgain-license.json to confirm it contains valid JSON. Licenses expire; download a fresh one from gridgain.com/tryfree if yours has lapsed.
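
The HTTP statuses covered in this troubleshooting section can be folded into one case statement for use in scripts. This is an illustrative sketch: suggest_action and the action labels it prints are hypothetical, and the mapping only restates the entries above. The 000 case relies on curl printing 000 for %{http_code} when no HTTP response was received.

```shell
# Hypothetical helper: map an HTTP status from the REST API to a suggested action.
# Usage: suggest_action "$(curl -s -o /dev/null -w '%{http_code}' URL)"
suggest_action() {
  case "$1" in
    200) echo "ok" ;;
    400) echo "fix-request" ;;      # e.g. blank or missing clusterName
    409) echo "retry-or-init" ;;    # not initialized, or REST temporarily unavailable
    500) echo "check-node-names" ;; # node name not in the physical topology
    000) echo "wait-for-rest" ;;    # no HTTP response: REST server not started yet
    *)   echo "unknown" ;;
  esac
}
```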

These endpoints form the foundation for scripted cluster management. You can extend them into health-check scripts, CI/CD pipeline steps, or monitoring integrations that poll node state and cluster topology.