Unable to Decode FSMap during Pacific Upgrade

Problem

While upgrading from Nautilus (v14) to Pacific (v16), a user reports that the new v16 mon will not start, crashing with this error:

unable to decode FSMap: void FSMap::decode(ceph::buffer::v15_2_0::list::const_iterator&) no longer understand old encoding version v < 7: Malformed input

The cluster does not currently have any MDS running or CephFS configured, but may have had a CephFS configured in the past.
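
To confirm that a failed mon start matches this problem, the error signature can be searched for in the mon log. This is a minimal sketch: the function name `has_fsmap_decode_error` is hypothetical, and because the log location varies by deployment, the log text is read from standard input.

```shell
# Hypothetical helper: succeeds (exit status 0) if the log text on stdin
# contains the FSMap decode failure quoted above.
has_fsmap_decode_error() {
  grep -q 'unable to decode FSMap.*no longer understand old encoding version'
}

# Example usage (the log path is an assumption; adjust for your deployment):
#   has_fsmap_decode_error < /var/log/ceph/ceph-mon.myhost.log && echo "matches this issue"
```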

Solution

The mon database likely contains old, incompatible fsmap data that v16 ceph-mon daemons cannot read. This data must be cleaned up before upgrading to Pacific.

  1. First, downgrade the crashing mon back to Nautilus, and confirm that all MONs are running v14 and have quorum.
  2. Use these commands to create a temporary CephFS and then remove it:
# ceph osd pool create data 32 replicated
# ceph osd pool create meta 32 replicated
# ceph fs new cephfs meta data
# ceph fs fail cephfs
# ceph fs rm cephfs --yes-i-really-mean-it
# ceph config set global mon_allow_pool_delete true
# ceph osd pool rm data data --yes-i-really-really-mean-it
# ceph osd pool rm meta meta --yes-i-really-really-mean-it
# ceph config set global mon_allow_pool_delete false
  3. Next, trim out the old incompatible fsmap objects as follows:
# ceph fs dump
e4 <--- the current fsmap epoch (X); use whatever number you get
...
# echo epoch is 4
epoch is 4
# ceph config set mon mon_mds_force_trim_to 3 # one less than the epoch (X-1); here 4 - 1 = 3, use whatever number you get
# ceph config set mon paxos_service_trim_min 1
# ceph fs dump 2 # epoch X-2; repeat until this epoch can no longer be accessed
Error ENOENT: <---- the epoch has been trimmed and is no longer reachable
# ceph config rm mon mon_mds_force_trim_to
# ceph config rm mon paxos_service_trim_min
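
The epoch arithmetic and the repeat-until-trimmed check in the last step can be scripted. The following is a sketch under stated assumptions, not a tested procedure: the function names and the `POLL_SECONDS` variable are hypothetical, and it assumes that the first line of `ceph fs dump` output has the form `e<N>`, as in the listing above.

```shell
# Extract the epoch number N from `ceph fs dump` output whose first line is "eN".
fsmap_epoch() {
  sed -n '1s/^e\([0-9][0-9]*\).*$/\1/p'
}

# The value for mon_mds_force_trim_to is one less than the current epoch (X-1).
trim_target() {
  echo $(( $1 - 1 ))
}

# Poll until `ceph fs dump <epoch>` fails. Any non-zero exit status ends the
# loop, so confirm by hand that the failure really is "Error ENOENT".
# POLL_SECONDS is an illustrative shell variable, not a Ceph setting.
wait_until_trimmed() {
  while ceph fs dump "$1" >/dev/null 2>&1; do
    sleep "${POLL_SECONDS:-5}"
  done
}

# Example usage, run as root on a cluster where all mons are on Nautilus:
#   epoch=$(ceph fs dump 2>/dev/null | fsmap_epoch)
#   ceph config set mon mon_mds_force_trim_to "$(trim_target "$epoch")"
#   ceph config set mon paxos_service_trim_min 1
#   wait_until_trimmed "$(( epoch - 2 ))"
#   ceph config rm mon mon_mds_force_trim_to
#   ceph config rm mon paxos_service_trim_min
```

Keeping the parsing and arithmetic in small functions makes them easy to check in isolation before any setting on the cluster is changed.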