I ran into this error today while rebuilding my Ceph cluster. After removing all Ceph packages, I was working through the Red Hat Ceph Storage 1.3 install guide and ran the command below.
# salt '*' state.highstate
The output of the command above indicated that one of the minions was not healthy.
osd01.lab.localdomain:
Minion did not return. [Not connected]
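For anyone following along, a quick way to see which minions are reachable is Salt's standard connectivity check, run from the master:

```shell
# From the Salt master: verify connectivity to all minions.
# A healthy minion answers "True"; a dead one reports
# "Minion did not return. [Not connected]", as above.
salt '*' test.ping
```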
So I logged into the problematic minion and attempted to start salt-minion manually.
# systemctl start salt-minion
Salt-minion failed to start and barfed out the errors below.
salt-minion.service - The Salt Minion
Loaded: loaded (/usr/lib/systemd/system/salt-minion.service; enabled)
Active: failed (Result: exit-code) since Wed 2015-08-19 09:17:08 EDT; 35min ago
Process: 841 ExecStart=/usr/bin/salt-minion (code=exited, status=1/FAILURE)
Main PID: 841 (code=exited, status=1/FAILURE)
CGroup: /system.slice/salt-minion.service
Aug 19 09:17:08 osd01.lab.localdomain salt-minion[841]: File "/usr/lib/python2.7/site-packages/salt/payload.py", line 204, in send_auto
Aug 19 09:17:08 osd01.lab.localdomain salt-minion[841]: return self.send(enc, load, tries, timeout)
Aug 19 09:17:08 osd01.lab.localdomain salt-minion[841]: File "/usr/lib/python2.7/site-packages/salt/payload.py", line 196, in send
Aug 19 09:17:08 osd01.lab.localdomain salt-minion[841]: return self.serial.loads(self.socket.recv())
Aug 19 09:17:08 osd01.lab.localdomain salt-minion[841]: File "/usr/lib/python2.7/site-packages/salt/payload.py", line 95, in loads
Aug 19 09:17:08 osd01.lab.localdomain salt-minion[841]: return msgpack.loads(msg, use_list=True)
Aug 19 09:17:08 osd01.lab.localdomain salt-minion[841]: File "msgpack/_unpacker.pyx", line 142, in msgpack._unpacker.unpackb (msgpack/_unpacker.cpp:142)
Aug 19 09:17:08 osd01.lab.localdomain salt-minion[841]: msgpack.exceptions.ExtraData: unpack(b) received extra data.
Aug 19 09:17:08 osd01.lab.localdomain systemd[1]: salt-minion.service: main process exited, code=exited, status=1/FAILURE
Aug 19 09:17:08 osd01.lab.localdomain systemd[1]: Unit salt-minion.service entered failed state.
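For context on that traceback: msgpack raises ExtraData when a buffer contains trailing bytes after one complete serialized object, which is consistent with the minion deserializing a truncated or corrupted cache file. A minimal sketch of the exception (assumes the msgpack Python package is installed; the payload here is made up for illustration):

```python
import msgpack

# One valid msgpack object followed by stray trailing bytes,
# mimicking a corrupted cache file.
blob = msgpack.packb({"ok": 1}) + b"\x00garbage"

try:
    msgpack.unpackb(blob)
except msgpack.exceptions.ExtraData as exc:
    # unpackb refuses to silently ignore the trailing data.
    print("raised:", type(exc).__name__)
```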
After a bit of googling, I figured out that the issue was related to the local Salt cache files, so I took the scorched-earth approach and removed them. Note that this was performed on the minion, osd01.
# cd /var/cache/salt/
# rm -Rf minion
Now let's restart salt-minion.
# systemctl restart salt-minion
And check its status.
# systemctl status salt-minion
salt-minion.service - The Salt Minion
Loaded: loaded (/usr/lib/systemd/system/salt-minion.service; enabled)
Active: active (running) since Wed 2015-08-19 09:59:06 EDT; 6s ago
Main PID: 2501 (salt-minion)
CGroup: /system.slice/salt-minion.service
├─2501 /usr/bin/python /usr/bin/salt-minion
└─2729 /usr/bin/python /usr/bin/salt-minion
Aug 19 09:59:06 osd01.lab.localdomain systemd[1]: Starting The Salt Minion…
Aug 19 09:59:06 osd01.lab.localdomain systemd[1]: Started The Salt Minion.
Excellent, issue resolved. Apparently other problems, such as mismatched Salt RPMs between the master and the minion, can also cause this error; however, that was not the case for me.
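If clearing the cache doesn't fix it for you, comparing package versions is a quick sanity check (a sketch; the osd01 hostname is from my lab, adjust to yours):

```shell
# On the master:
rpm -q salt salt-master

# On each minion (e.g. osd01):
rpm -q salt salt-minion

# The version of the "salt" package should match across
# the master and all minions.
```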