Thursday, December 20, 2018

Recovering a search head cluster where some members have a broken kvstore

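The following is Ansible-style pseudocode for fixing your search head cluster when some members have a bad kvstore or are orphaned. Tasks without a "when" run on every member; tasks with "when: good_search_head" run only on the member whose kvstore is still good.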
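The tasks rely on three variables that the post never defines: splunk_home, splunk_backup and good_search_head. A minimal sketch of how they might be set is shown here. The file names, the backup path and the host name sh1.example.com are assumptions for illustration, not part of the original playbook.

```yaml
# group_vars/all.yml (hypothetical file name)
splunk_home: /opt/splunk            # same path the restart task uses
splunk_backup: /opt/splunk_backup   # assumed backup destination
good_search_head: false             # default: treat a member as not the good one

# host_vars/sh1.example.com.yml (hypothetical host with the healthy kvstore)
# good_search_head: true
```

Defaulting good_search_head to false keeps the "when" conditions from failing on members where the variable would otherwise be undefined.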

- name: Restart splunk
  command: "sudo -H -u splunk /opt/splunk/bin/splunk restart"

- name: back up folders
  command: "cp -Rp {{ splunk_home }} {{ splunk_backup }}"

- name: Stop splunk
  command: "{{ splunk_home }}/bin/splunk stop"

- name: splunk clean raft
  command: "{{ splunk_home }}/bin/splunk clean raft --answer-yes -auth admin:changeme"

- name: clean kvstore clustering
  command: "{{ splunk_home }}/bin/splunk clean kvstore --cluster --answer-yes -auth admin:changeme"

- name: start the search head with the good kvstore
  command: "{{ splunk_home }}/bin/splunk start"
  when: "good_search_head"

- name: bootstrap shc with good_search_head
  command: '{{ splunk_home }}/bin/splunk bootstrap shcluster-captain -servers_list "https://<member_ip>:8089" -auth admin:changeme'
  when: "good_search_head"

- name: verify status
  command: "{{ splunk_home }}/bin/splunk show shcluster-status -auth admin:changeme"
  when: "good_search_head"

- name: Stop splunk
  command: "{{ splunk_home }}/bin/splunk stop"

- name: splunk clean raft on good_search_head
  command: "{{ splunk_home }}/bin/splunk clean raft --answer-yes -auth admin:changeme"
  when: "good_search_head"

- name: start all search heads in cluster
  command: "{{ splunk_home }}/bin/splunk start"

- name: bootstrap shc with good_search_head, listing all members
  command: '{{ splunk_home }}/bin/splunk bootstrap shcluster-captain -servers_list "https://<member_ip>:8089,https://<member_ip>:8089,https://<member_ip>:8089,https://<member_ip>:8089" -auth admin:changeme'
  when: "good_search_head"
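The post verifies the cluster after the first bootstrap but not after the final one. As a rough sketch, a closing check could reuse the same show shcluster-status command and add show kvstore-status, printing both reports for manual review. The registered variable names shc_status and kv_status are made up for this example, and the tasks only display the output; they do not assert on its format.

```yaml
- name: show search head cluster status on every member
  command: "{{ splunk_home }}/bin/splunk show shcluster-status -auth admin:changeme"
  register: shc_status      # hypothetical variable name
  changed_when: false       # read-only command, never reports a change

- name: show kvstore status on every member
  command: "{{ splunk_home }}/bin/splunk show kvstore-status -auth admin:changeme"
  register: kv_status       # hypothetical variable name
  changed_when: false

- name: print both reports for manual review
  debug:
    msg: "{{ shc_status.stdout_lines + kv_status.stdout_lines }}"
```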

References:
https://answers.splunk.com/answers/482209/why-is-the-kv-store-status-is-showing-as-starting.html
* Note: splunk clean raft is a better way to clear the raft state.
