
50 posts tagged with "operation"


ceph-mgr recreate sqlite database for devicehealth module

· One min read
Joachim Kraftmayer
Managing Director at Clyso

If you had to recreate the device_health or .mgr pool, the devicehealth module is missing its sqlite3 database structure. You have to recreate the structure manually.

crash events

backtrace": [
" File \"/usr/share/ceph/mgr/devicehealth/module.py\", line 373, in serve\n self.scrape_all()",
" File \"/usr/share/ceph/mgr/devicehealth/module.py\", line 425, in scrape_all\n self.put_device_metrics(device, data)",
" File \"/usr/share/ceph/mgr/devicehealth/module.py\", line 500, in put_device_metrics\n self._create_device(devid)",
" File \"/usr/share/ceph/mgr/devicehealth/module.py\", line 487, in _create_device\n cursor = self.db.execute(SQL, (devid,))",
"sqlite3.InternalError: unknown operation"
apt install libsqlite3-mod-ceph libsqlite3-mod-ceph-dev

create database

```bash
clyso@compute-21:~$ sqlite3 -cmd '.load libcephsqlite.so' -cmd '.open file:///.mgr:devicehealth/main.db?vfs=ceph'
main: "" r/w
SQLite version 3.39.1 2022-07-13 19:41:41
Enter ".help" for usage hints.
sqlite>
```

list databases

```bash
clyso@compute-21:~$ sqlite3 -cmd '.load libcephsqlite.so' -cmd '.databases'
main: "" r/w
SQLite version 3.39.1 2022-07-13 19:41:41
Enter ".help" for usage hints.
sqlite>
```

create table

```bash
clyso@compute-21:~$ sqlite3 -cmd '.load libcephsqlite.so' -cmd '.open file:///.mgr:devicehealth/main.db?vfs=ceph'
SQLite version 3.39.1 2022-07-13 19:41:41
Enter ".help" for usage hints.
sqlite> CREATE TABLE IF NOT EXISTS MgrModuleKV (
  key TEXT PRIMARY KEY,
  value NOT NULL
) WITHOUT ROWID;
sqlite> INSERT OR IGNORE INTO MgrModuleKV (key, value) VALUES ('__version', 0);
sqlite> .tables
Device  DeviceHealthMetrics  MgrModuleKV
sqlite>
```
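Once the tables exist, it may help to restart the active mgr so the devicehealth module reconnects to the database and resumes scraping; a minimal verification sketch (not part of the original procedure):

```bash
# fail over to a standby mgr so the devicehealth module starts fresh
ceph mgr fail
# check that no new devicehealth crashes are reported and devices are tracked again
ceph crash ls-new
ceph device ls
```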

sources

https://ceph.io/en/news/blog/2021/new-in-pacific-sql-on-ceph

https://docs.ceph.com/en/latest/rados/api/libcephsqlite/

https://docs.ceph.com/en/latest/rados/api/libcephsqlite/#usage

https://github.com/ceph/ceph/blob/main/src/pybind/mgr

https://github.com/ceph/ceph/blob/main/src/pybind/mgr/devicehealth/module.py

ceph osd set-group

· One min read
Joachim Kraftmayer
Managing Director at Clyso

If you don't want to set flags such as noout or noup for the whole cluster, you can use ceph osd set-group and ceph osd unset-group to set the appropriate flag for a group of OSDs or even whole hosts.

```bash
ceph osd set-group <flags> <who>
ceph osd unset-group <flags> <who>
```

For example, set noout for a whole host with OSDs:

```bash
ceph osd set-group noout clyso-ceph-node3
```

```bash
root@clyso-ceph-node1:~# ceph health detail
HEALTH_WARN 1 OSDs or CRUSH {nodes, device-classes} have {NOUP,NODOWN,NOIN,NOOUT} flags set
[WRN] OSD_FLAGS: 1 OSDs or CRUSH {nodes, device-classes} have {NOUP,NODOWN,NOIN,NOOUT} flags set
host clyso-ceph-node3 has flags noout
root@clyso-ceph-node1:~# ceph osd unset-group noout clyso-ceph-node3
root@clyso-ceph-node1:~# ceph health detail
HEALTH_OK
root@clyso-ceph-node1:~#
```
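Flags and targets can also be combined; a hedged example (the OSD ids and the device class hdd are placeholders):

```bash
# set two flags on two specific OSDs
ceph osd set-group noup,noout osd.11 osd.12
# set noout for every OSD of a CRUSH device class
ceph osd set-group noout hdd
# clear the flags again
ceph osd unset-group noup,noout osd.11 osd.12
ceph osd unset-group noout hdd
```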

Sources:

docs.ceph.com/en/quincy/rados/operations/health-checks/#osd-flags

ceph mds increase memory without downtime

· One min read
Joachim Kraftmayer
Managing Director at Clyso

Get the config (default: 4G):

```bash
ceph daemon mds.<mds-id> config get mds_cache_memory_limit
ceph daemon /var/run/ceph/<fsid>/ceph-mds.<mds-id>.asok config get mds_cache_memory_limit
ceph tell mds.storefs-a config show | grep mds_cache_memory_limit
```

Set the config on the fly, not persistent (to 64 GB):

```bash
ceph daemon mds.<mds-id> config set mds_cache_memory_limit 68719476736
ceph daemon /var/run/ceph/<fsid>/ceph-mds.<mds-id>.asok config set mds_cache_memory_limit 68719476736
ceph tell mds.storefs-a injectargs --mds_cache_memory_limit 68719476736
```

Persist the config (to 64 GB):

```bash
ceph config set mds mds_cache_memory_limit 68719476736
```
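The limit is given in bytes; a quick sanity check of the value and of what the MDS daemons will pick up (a sketch, not from the original post):

```bash
# 64 GiB expressed in bytes
echo $((64 * 1024 * 1024 * 1024))    # 68719476736
# confirm the persistent setting
ceph config get mds mds_cache_memory_limit
```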

ceph iscsi - get the global gateway.conf

· One min read
Joachim Kraftmayer
Managing Director at Clyso

We have two options to get the gateway.conf:

gwcli

```bash
gwcli export mode=copy
```

or

rados

```bash
rados -p iscsi get gateway.conf /root/gateway.conf
```

At the moment there is no way to update or write the gateway.conf via the gwcli command, so the only option is to use the rados command line tool.
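If you need to write a modified configuration back, a hedged sketch with the same rados tool (pool and object name as in the example above; keep a backup and mind the note below):

```bash
# keep a backup of the current object
rados -p iscsi get gateway.conf /root/gateway.conf.bak
# write the edited file back to the gateway.conf object
rados -p iscsi put gateway.conf /root/gateway.conf
```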

note

Be careful when editing the content manually, it requires care and expertise.

sources

docs.ceph.com/en/latest/man/8/rados

manpages.ubuntu.com/manpages/jammy/man8/gwcli.8.html

ceph warning - HEALTH_WARN pools have many more objects per pg than average

· 2 min read
Joachim Kraftmayer
Managing Director at Clyso

You might have wondered how to get rid of the warning "pools have many more objects per pg than average" because you want to see your cluster in HEALTH_OK status. The option to change the threshold for this warning is mon_pg_warn_max_object_skew.

Especially when bringing a Ceph cluster or a new pool into production, you can set the threshold high. After a certain time you should always check the value and adjust it if necessary.

An important note: the option must be set on the ceph mgr. You will often find posts that set the option on the ceph mon and then see no effect on the ceph status.

The cluster status commands

```bash
ceph status
```

or

```bash
ceph health detail
```

show the following warning:

```bash
[WRN] MANY_OBJECTS_PER_PG: 1 pools have many more objects per pg than average
    pool test objects per pg (2079) is more than 11.6798 times cluster average (178)
```
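To see which pool skews the average, a rough per-pool check can look like this (a sketch assuming jq is installed; the pool name test comes from the warning above):

```bash
# object count per pool
ceph df --format json | jq -r '.pools[] | "\(.name) \(.stats.objects)"'
# pg_num of the offending pool, to work out objects per PG by hand
ceph osd pool get test pg_num
```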

note

To disable the warning completely, the value of mon_pg_warn_max_object_skew must be set to 0 or a negative number.

Verify the default value:

```bash
ceph config get mgr mon_pg_warn_max_object_skew
10.000000
```

Inject the value:

```bash
ceph tell mgr.a injectargs '--mon_pg_warn_max_object_skew 50'
```

Verify the value:

```bash
ceph tell mgr.a config get mon_pg_warn_max_object_skew
{
    "mon_pg_warn_max_object_skew": "50.000000"
}
```

Set the value persistently, e.g. 50 times higher:

```bash
ceph config set mgr mon_pg_warn_max_object_skew 50
ceph config get mgr mon_pg_warn_max_object_skew
50.000000
```

ceph-volume - create WAL/DB on separate device for existing OSD

· 2 min read
Joachim Kraftmayer
Managing Director at Clyso

ceph-volume can be used to create a new WAL/DB on a faster device for an existing OSD, without the need to recreate the OSD.

```bash
ceph-volume lvm new-db --osd-id 15 --osd-fsid FSID --target cephdb/cephdb1
--> NameError: name 'get_first_lv' is not defined
```

This is a bug in ceph-volume v16.2.7 that will be fixed in v16.2.8: https://github.com/ceph/ceph/pull/44209

First create a new Logical Volume on the device that will hold the new WAL/DB:

```bash
vgcreate cephdb /dev/sdb
Volume group "cephdb" successfully created
lvcreate -L 100G -n cephdb1 cephdb
Logical volume "cephdb1" created.
```

Now stop the running OSD and, if it was deactivated (cephadm), activate it on the host:

```bash
systemctl stop ceph-FSID@osd.0.service
ceph-volume lvm activate --all --no-systemd
```

Create the new WAL/DB on the new device:

```bash
ceph-volume lvm new-db --osd-id 0 --osd-fsid OSD-FSID --target cephdb/cephdb1
--> Making new volume at /dev/cephdb/cephdb1 for OSD: 0 (/var/lib/ceph/osd/ceph-0)
Running command: /bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-0/block.db
Running command: /bin/chown -R ceph:ceph /dev/dm-1
--> New volume attached.
```

Migrate the existing WAL/DB to the new device:

```bash
ceph-volume lvm migrate --osd-id 0 --osd-fsid OSD-FSID --from data --target cephdb/cephdb1
--> Migrate to existing, Source: ['--devs-source', '/var/lib/ceph/osd/ceph-0/block'] Target: /var/lib/ceph/osd/ceph-0/block.db
--> Migration successful.
```

Deactivate the OSD and start it again:

```bash
ceph-volume lvm deactivate 0
Running command: /bin/umount -v /var/lib/ceph/osd/ceph-0
 stderr: umount: /var/lib/ceph/osd/ceph-0 unmounted
systemctl start ceph-FSID@osd.0.service
```
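To confirm that the restarted OSD really uses the separate DB device, one option (not shown in the original post) is to check the OSD metadata:

```bash
# the bluefs_* fields should now list the dedicated DB device
ceph osd metadata 0 | grep -i bluefs
```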

ceph osd migrate DB to larger ssd/flash device

· 2 min read
Joachim Kraftmayer
Managing Director at Clyso

First we wanted to use ceph-bluestore-tool bluefs-bdev-new-wal. However, it turned out that it is not possible to ensure that the second DB is actually used. For this reason, we decided to migrate the entire bluefs of the osd to an ssd/flash.

bluestore

Verify the current osd bluestore setup

```bash
ceph-bluestore-tool show-label --dev <device> [...]
```

Verify the current size of the osd bluestore DB

```bash
ceph-bluestore-tool bluefs-bdev-sizes --path <osd path>
ceph-bluestore-tool bluefs-bdev-migrate --path <osd path> --dev-target <new-device> --devs-source <device1> [--devs-source <device2>]
```

Verify the size of the osd bluestore DB after the migration

```bash
ceph-bluestore-tool bluefs-bdev-sizes --path <osd path>
```

If the size does not correspond to the new target size, execute the following command:

```bash
ceph-bluestore-tool bluefs-bdev-expand --path <osd path>
```

> Instruct BlueFS to check the size of its block devices and, if they have expanded, make use of the additional space. Please note that only the new files created by BlueFS will be allocated on the preferred block device if it has enough free space, and the existing files that have spilled over to the slow device will be gradually removed when RocksDB performs compaction. In other words, if there is any data spilled over to the slow device, it will be moved to the fast device over time.
>
> https://docs.ceph.com/en/octopus/man/8/ceph-bluestore-tool/#commands

Verify the new osd bluestore setup

```bash
ceph-bluestore-tool show-label --dev <device> [...]
```
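Putting the steps together, a hedged end-to-end sketch for a non-cephadm OSD (OSD id 7, the paths and the LV cephdb/cephdb7 are placeholders; the OSD must be stopped while the tool runs):

```bash
systemctl stop ceph-osd@7
# check sizes before the move
ceph-bluestore-tool bluefs-bdev-sizes --path /var/lib/ceph/osd/ceph-7
# move BlueFS data from the existing DB device to the new LV
ceph-bluestore-tool bluefs-bdev-migrate \
    --path /var/lib/ceph/osd/ceph-7 \
    --dev-target /dev/cephdb/cephdb7 \
    --devs-source /var/lib/ceph/osd/ceph-7/block.db
# check sizes after the move
ceph-bluestore-tool bluefs-bdev-sizes --path /var/lib/ceph/osd/ceph-7
systemctl start ceph-osd@7
```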

Update

You might be interested in a migration method on a higher layer with ceph-volume lvm.

docs.clyso.com/blog/ceph-volume-ceph-osd-migrate-db-to-larger-ssd-flash-device/

Appendix

> I'm trying to figure out the appropriate process for adding a separate SSD block.db to an existing OSD. From what I gather the two steps are: 1. Use ceph-bluestore-tool bluefs-bdev-new-db to add the new db device 2. Migrate the data with ceph-bluestore-tool bluefs-bdev-migrate. I followed this and got both executed fine without any error. Yet when the OSD got started up, it keeps on using the integrated block.db instead of the new db. The block.db link to the new db device was deleted. Again, no error, just not using the new db.
>
> www.spinics.net/lists/ceph-users/msg62357.html

Sources

docs.ceph.com/en/octopus/man/8/ceph-bluestore-tool

tracker.ceph.com/attachments/download/4478/bluestore.png

www.suse.com/support/kb/doc/?id=000020276

ceph-volume - ceph osd migrate DB to larger ssd/flash device

· One min read
Joachim Kraftmayer
Managing Director at Clyso

But, as I already mentioned (for a bit different case), in newer versions there is ceph-volume lvm migrate [1], which I think allows doing the same in a much simpler way. I have not tried it yet and the documentation is not very clear to me, so one needs to experiment with this before writing exact instructions. We might also need to use the new-db [2] and new-wal [3] commands before running migrate, but I am not sure they are needed for this particular case.

[1] https://docs.ceph.com/en/latest/ceph-volume/lvm/migrate/

[2] https://docs.ceph.com/en/latest/ceph-volume/lvm/newdb/

[3] https://docs.ceph.com/en/latest/ceph-volume/lvm/newwal/
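Based on the linked documentation, the higher-level flow would look roughly like this (untested, as noted above; OSD id, fsid and LV name are placeholders):

```bash
# stop the OSD first (cephadm: systemctl stop ceph-<fsid>@osd.3.service)
# move the existing DB onto a new, larger LV; the old DB volume is replaced by the target
ceph-volume lvm migrate --osd-id 3 --osd-fsid <OSD-FSID> --from db --target cephdb/newdb3
```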

How To Identify OSD(s) affected by PG Dup Bug

· One min read
Joachim Kraftmayer
Managing Director at Clyso

Follow-up to: OSD(s) with unlimited RAM growth

There is a way to check a Ceph cluster for OSDs affected by the "PG Dup Bug" by running the following command:

```bash
ceph tell osd.\* perf dump | grep 'osd_pglog\|^osd\.[0-9]'
```

This will provide you with a list of all OSDs in the cluster, containing two parameters:

  1. osd_pglog_bytes
  2. osd_pglog_items

"osd_pglog_items" counter is a sum of "normal" log entries, dup entries and some other things. Taking that osd_target_pg_log_entries_per_osd is 300.000 by default, we may assume that about 300.000 items are "normal" pg log entries, and if "osd_pglog_items" counter is much higher than this it is most likely due to dups. Example:

```bash
osd.32: { "osd_pglog_bytes": 1925908608, "osd_pglog_items": 17418324 }
```

osd_pglog_items = 17.418.324 - 300.000 = probably about 17 Million PG Dups

Running a manual check against this OSD with the commands from the "OSD(s) with unlimited RAM growth" post revealed 1 PG with 17.090.093 entries. So this is a quick and easy way to identify problematic OSD(s) without the need to stop all OSDs and manually run commands.
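To drill into a single suspicious OSD from that list, the same perf counters can be queried directly (osd.32 is the example OSD from above):

```bash
# dump only the pglog counters of one OSD
ceph tell osd.32 perf dump | grep -A 2 '"osd_pglog"'
```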

Sources:

github.com/ceph/ceph/blob/master/src/common/options/global.yaml.in#L2951