Unable to Decode FSMap during Pacific Upgrade
Problem
While upgrading from Nautilus (v14) to Pacific (v16), a user reports that the newly upgraded v16 mon fails to start and crashes with this error:
unable to decode FSMap: void FSMap::decode(ceph::buffer::v15_2_0::list::const_iterator&) no longer understand old encoding version v < 7: Malformed input
The cluster does not currently have any MDS running or CephFS configured, but may have had a CephFS configured in the past.
Solution
The mon database likely contains old, incompatible fsmap data that v16 ceph-mon daemons cannot read. This data must be cleaned up before upgrading to Pacific.
- First, downgrade the crashing mon back to Nautilus, and confirm that all MONs are running v14 and have quorum.
- Use these commands to temporarily create then remove a CephFS:
# ceph osd pool create data 32 replicated
# ceph osd pool create meta 32 replicated
# ceph fs new cephfs meta data
# ceph fs fail cephfs
# ceph fs rm cephfs --yes-i-really-mean-it
# ceph config set global mon_allow_pool_delete true
# ceph osd pool rm data data --yes-i-really-really-mean-it
# ceph osd pool rm meta meta --yes-i-really-really-mean-it
# ceph config set global mon_allow_pool_delete false
- Next, trim the old incompatible fsmap objects as follows:
# ceph fs dump
e4 <--- use whatever number you get
...
# echo epoch is 4
epoch is 4
# ceph config set mon mon_mds_force_trim_to 3 # one less than 4 <---- use your own epoch minus one (X-1)
# ceph config set mon paxos_service_trim_min 1
# ceph fs dump 2 # repeat until the old epoch (2 here) is no longer accessible
Error ENOENT: <---- the epoch has been trimmed and is no longer reachable
# ceph config rm mon mon_mds_force_trim_to
# ceph config rm mon paxos_service_trim_min
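The X-1 arithmetic in the trim step can be scripted to avoid a transcription error. The following is a minimal sketch, not part of the original procedure: the function name fsmap_trim_target is hypothetical, and it assumes that the output of "ceph fs dump" contains a line of the form "e<number>", as in the example above.

```shell
#!/bin/sh
# Hypothetical helper (not a Ceph command): read "ceph fs dump" output on
# stdin and print the value to use for mon_mds_force_trim_to (epoch - 1).
# Assumes the dump contains a line such as "e4", as shown in the example.
fsmap_trim_target() {
    # Keep only the digits from the first line that looks like "e<number>".
    epoch=$(sed -n 's/^e\([0-9][0-9]*\)$/\1/p' | head -n 1)
    # Fail instead of guessing when no epoch line was found.
    [ -n "$epoch" ] || return 1
    echo $((epoch - 1))
}
```

It could be used as "ceph fs dump | fsmap_trim_target", with the printed number then passed to "ceph config set mon mon_mds_force_trim_to". Check the value by eye before applying it, since the helper only parses text and does not verify the cluster state.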