
48 posts tagged with "operation"


· One min read
Joachim Kraftmayer

As I already mentioned (for a slightly different case), newer versions provide ceph-volume lvm migrate [1], which I think allows you to do the same in a much simpler way. I have not tried it yet and the documentation is not very clear to me, so one needs to experiment with this before writing exact instructions. We might also need to use the new-db [2] and new-wal [3] commands before running migrate, but I am not sure they are needed for this particular case.

[1] https://docs.ceph.com/en/latest/ceph-volume/lvm/migrate/

[2] https://docs.ceph.com/en/latest/ceph-volume/lvm/newdb/

[3] https://docs.ceph.com/en/latest/ceph-volume/lvm/newwal/
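
An untested sketch of how that might look, using placeholder OSD id, fsid, and LV names (see [1]-[3] for the exact syntax; the OSD has to be stopped while running these):

```bash
# attach a new (empty) DB volume to the OSD (placeholder VG/LV names; OSD stopped)
ceph-volume lvm new-db --osd-id 1 --osd-fsid <osd-fsid> --target vg_nvme/db-osd1

# move the BlueFS data from the main device onto the newly attached DB volume
ceph-volume lvm migrate --osd-id 1 --osd-fsid <osd-fsid> --from data --target vg_nvme/db-osd1
```

ceph-volume lvm new-wal works analogously if a separate WAL volume is wanted.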

· One min read
Joachim Kraftmayer

Ceph cluster health state

root@clyso-ceph:/home/ceph# ceph -s
  cluster:
    id:     FEBB01CC-4AA5-4C1D-80C4-9D91901467C8
    health: HEALTH_WARN
            256 pgs not deep-scrubbed in time
            266 pgs not scrubbed in time

The admin has noticed that deep-scrubbing has fallen behind.

Possible actions

If impacting Ceph performance is not a concern, osd_max_scrubs and osd_scrub_load_threshold can be carefully adapted. Keep in mind that aggressive values can have a huge impact on Ceph cluster performance.

Show current config

root@clyso-ceph:/home/ceph# ceph config show osd.1 osd_max_scrubs
1
root@clyso-ceph:/home/ceph#
root@clyso-ceph:/home/ceph# ceph config get osd osd_scrub_load_threshold
0.500000
root@clyso-ceph:/home/ceph#

Set osd_max_scrubs

ceph config set osd osd_max_scrubs 2

Verify setting

Ceph config database

root@ceph-vm-az1-1:/home/kraftmayerj# ceph config get osd osd_max_scrubs
2
root@ceph-vm-az1-1:/home/kraftmayerj#

Ceph osd active settings (osd.1)

root@clyso-ceph:/home/ceph# ceph config show osd.1 osd_max_scrubs
2
root@clyso-ceph:/home/ceph#
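
osd_scrub_load_threshold, mentioned above, can be adjusted and verified the same way. A hedged example; the value 0.8 is purely illustrative, not a recommendation:

```bash
# allow scrubbing up to a higher system load (illustrative value)
ceph config set osd osd_scrub_load_threshold 0.8
ceph config get osd osd_scrub_load_threshold
```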

Sources

osd_max_scrubs

osd_scrub_load_threshold

osd_scrub_during_recovery

· One min read
Joachim Kraftmayer

A customer asked us to adapt the performance and capacity of their existing Ceph cluster to the growth of their ongoing production workload.

As a result, we expanded the Ceph cluster by 3700% during operation, without maintenance windows and without service interruption.

· One min read
Joachim Kraftmayer

On behalf of the customer, we provided the optimal Kubernetes platform for ONAP as a managed service.

ONAP is a comprehensive platform for orchestration, management, and automation of network and edge computing services for network operators, cloud providers, and enterprises. Real-time, policy-driven orchestration and automation of physical and virtual network functions enables rapid automation of new services and the complete lifecycle management that is critical for 5G and next-generation networks.

· One min read
Joachim Kraftmayer

After a reboot of the MDS server, it can happen that the CephFS filesystem becomes read-only:

HEALTH_WARN 1 MDSs are read only
[WRN] MDS_READ_ONLY: 1 MDSs are read only
mds.XXX(mds.0): MDS in read-only mode

In the MDS log you will find the following entry:

log_channel(cluster) log [ERR] : failed to commit dir 0x1 object, errno -22
mds.0.11963 unhandled write error (22) Invalid argument, force readonly...
mds.0.cache force file system read-only
log_channel(cluster) log [WRN] : force file system read-only
mds.0.server force_clients_readonly

https://tracker.ceph.com/issues/58082

This is a known upstream issue, though the fix is not yet merged.

As a workaround you can use the following steps:

ceph config set mds mds_dir_max_commit_size 80
ceph fs fail <fs_name>
ceph fs set <fs_name> joinable true

If this is not successful, you may need to increase mds_dir_max_commit_size further, e.g. to 160.
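
A minimal sketch of that retry, reusing the <fs_name> placeholder from above and assuming 160 is sufficient in your case:

```bash
# raise the commit size further (example value) and let the MDS rejoin
ceph config set mds mds_dir_max_commit_size 160
ceph fs fail <fs_name>
ceph fs set <fs_name> joinable true
```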

· One min read
Joachim Kraftmayer

If you had to recreate the device_health or .mgr pool, the devicehealth module is missing its SQLite3 database structure. You have to recreate the structure manually.

crash events

backtrace": &#91;
" File \"/usr/share/ceph/mgr/devicehealth/module.py\", line 373, in serve\n self.scrape_all()",
" File \"/usr/share/ceph/mgr/devicehealth/module.py\", line 425, in scrape_all\n self.put_device_metrics(device, data)",
" File \"/usr/share/ceph/mgr/devicehealth/module.py\", line 500, in put_device_metrics\n self._create_device(devid)",
" File \"/usr/share/ceph/mgr/devicehealth/module.py\", line 487, in _create_device\n cursor = self.db.execute(SQL, (devid,))",
"sqlite3.InternalError: unknown operation"

install packages

apt install libsqlite3-mod-ceph libsqlite3-mod-ceph-dev

create database

clyso@compute-21:~$ sqlite3 -cmd '.load libcephsqlite.so' -cmd '.open file:///.mgr:devicehealth/main.db?vfs=ceph'
main: "" r/w
SQLite version 3.39.1 2022-07-13 19:41:41
Enter ".help" for usage hints.
sqlite>

list databases

clyso@compute-21:~$ sqlite3 -cmd '.load libcephsqlite.so' -cmd '.databases'
main: "" r/w
SQLite version 3.39.1 2022-07-13 19:41:41
Enter ".help" for usage hints.
sqlite>

create table

clyso@compute-21:~$ sqlite3 -cmd '.load libcephsqlite.so' -cmd '.open file:///.mgr:devicehealth/main.db?vfs=ceph'
SQLite version 3.39.1 2022-07-13 19:41:41
Enter ".help" for usage hints.
sqlite> CREATE TABLE IF NOT EXISTS MgrModuleKV (
key TEXT PRIMARY KEY,
value NOT NULL
) WITHOUT ROWID;
sqlite> INSERT OR IGNORE INTO MgrModuleKV (key, value) VALUES ('__version', 0);
sqlite> .tables
Device DeviceHealthMetrics MgrModuleKV
sqlite>
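
As a quick sanity check afterwards (not part of the original steps), you can restart the mgr and trigger a scrape; if the schema is in place, the devicehealth module should log no further sqlite3 errors:

```bash
# restart the active mgr so the devicehealth module reconnects to its database
ceph mgr fail
# trigger a manual health-metrics scrape and list the devices the module knows about
ceph device scrape-health-metrics
ceph device ls
```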

sources

https://ceph.io/en/news/blog/2021/new-in-pacific-sql-on-ceph
https://docs.ceph.com/en/latest/rados/api/libcephsqlite/
https://docs.ceph.com/en/latest/rados/api/libcephsqlite/#usage
https://github.com/ceph/ceph/blob/main/src/pybind/mgr
https://github.com/ceph/ceph/blob/main/src/pybind/mgr/devicehealth/module.py

· One min read
Joachim Kraftmayer

If you don't want to set flags like noout or noup for the whole cluster, you can use ceph osd set-group and ceph osd unset-group to set the appropriate flag for a group of OSDs or even whole hosts.

ceph osd set-group <flags> <who>
ceph osd unset-group <flags> <who>

For example, set noout for a whole host with OSDs:

```bash
ceph osd set-group noout clyso-ceph-node3
```

```bash
root@clyso-ceph-node1:~# ceph health detail
HEALTH_WARN 1 OSDs or CRUSH {nodes, device-classes} have {NOUP,NODOWN,NOIN,NOOUT} flags set
[WRN] OSD_FLAGS: 1 OSDs or CRUSH {nodes, device-classes} have {NOUP,NODOWN,NOIN,NOOUT} flags set
host clyso-ceph-node3 has flags noout
root@clyso-ceph-node1:~# ceph osd unset-group noout clyso-ceph-node3
root@clyso-ceph-node1:~# ceph health detail
HEALTH_OK
root@clyso-ceph-node1:~#
```
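
The same mechanism works for individual OSDs and for combinations of flags; a short sketch with placeholder OSD ids:

```bash
# set noup and noout for two specific OSDs (placeholder ids)
ceph osd set-group noup,noout osd.0 osd.1
# clear the flags again
ceph osd unset-group noup,noout osd.0 osd.1
```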

Sources:

docs.ceph.com/en/quincy/rados/operations/health-checks/#osd-flags