
3 posts tagged with "pg"


One min read
Joachim Kraftmayer

Our bugfix from earlier this year was published in the ceph quincy release 17.2.4.

Trimming of PGLog dups is now controlled by size instead of the version. This fixes the PGLog inflation issue that was happening when online (in OSD) trimming jammed after a PG split operation. Also, a new offline mechanism has been added: ceph-objectstore-tool now has a trim-pg-log-dups op that targets situations where an OSD is unable to boot due to those inflated dups. If that is the case, in OSD logs the “You can be hit by THE DUPS BUG” warning will be visible. Relevant tracker: https://tracker.ceph.com/issues/53729
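If an affected OSD no longer boots, the offline trimming can be run against the stopped OSD with ceph-objectstore-tool. A minimal sketch, assuming the OSD id 32, its data path and the PG id are placeholders you replace with your own values:

# stop the affected OSD first (id 32 is a placeholder)
systemctl stop ceph-osd@32

# trim the dup entries of one PG offline (data path and pgid are placeholders)
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-32 --pgid 2.1a --op trim-pg-log-dups

# start the OSD again afterwards
systemctl start ceph-osd@32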

osds with unlimited ram growth

how to identify osds affected by pg dup bug

Sources

https://docs.ceph.com/en/latest/releases/quincy/#v17-2-4-quincy

2 min read
Joachim Kraftmayer

You might have wondered how to get rid of the warning "pools have many more objects per pg than average", because you want to see your cluster in HEALTH_OK status. The option that controls the threshold for this warning is mon_pg_warn_max_object_skew.

Especially when a Ceph cluster or a new pool goes into production, you can set the threshold high. After a certain time you should check the value again and adjust it if necessary.

An important note about this option: it must be set on the ceph mgr. You can often find posts that set the option on the ceph mon and then see no effect on the ceph status.

The cluster status commands 

ceph status

or

ceph health detail

show the following warning:

[WRN] MANY_OBJECTS_PER_PG: 1 pools have many more objects per pg than average
    pool test objects per pg (2079) is more than 11.6798 times cluster average (178)
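The reported factor is simply the pool's objects-per-PG value divided by the cluster-wide average: 2079 / 178 ≈ 11.68. Since this exceeds the default mon_pg_warn_max_object_skew of 10, the warning is raised.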

Note

To disable the warning completely, the value of mon_pg_warn_max_object_skew must be set to 0 or a negative number.
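For example, to disable it persistently:

ceph config set mgr mon_pg_warn_max_object_skew 0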

Verify the default value:

ceph config get mgr mon_pg_warn_max_object_skew
10.000000

Inject the value at runtime:

ceph tell mgr.a injectargs '--mon_pg_warn_max_object_skew 50'

Verify the value:

ceph tell mgr.a config get mon_pg_warn_max_object_skew
{
"mon_pg_warn_max_object_skew": "50.000000"
}

Set the value persistently, e.g. 50 times higher than the default:

ceph config set mgr mon_pg_warn_max_object_skew 50
ceph config get mgr mon_pg_warn_max_object_skew
50.000000
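If you want to return to the default of 10 later, you can remove the override from the central configuration again:

ceph config rm mgr mon_pg_warn_max_object_skew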

One min read
Joachim Kraftmayer

Follow-up to osd(s) with unlimited ram growth

There is a way to check a Ceph cluster for OSDs affected by the "PG dup bug" by running the following command:

ceph tell osd.\* perf dump | grep 'osd_pglog\|^osd\.[0-9]'

This will provide you with a list of all OSDs in the cluster containing two parameters:

  1. osd_pglog_bytes
  2. osd_pglog_items

"osd_pglog_items" counter is a sum of "normal" log entries, dup entries and some other things. Taking that osd_target_pg_log_entries_per_osd is 300.000 by default, we may assume that about 300.000 items are "normal" pg log entries, and if "osd_pglog_items" counter is much higher than this it is most likely due to dups. Example:

osd.32: { "osd_pglog_bytes": 1925908608, "osd_pglog_items": 17418324 }

osd_pglog_items = 17,418,324 - 300,000 ≈ 17,100,000, i.e. probably about 17 million PG dups

Running a manual check against this OSD with the commands from osd(s) with unlimited ram growth revealed one PG with 17,090,093 entries. So this is a quick and easy way to identify problematic OSD(s) without the need to stop all OSDs and run the commands manually.
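If you want to automate this check, a minimal sketch building on the perf dump command above can flag suspicious OSDs directly; the 1,000,000 item threshold is an assumption, not a Ceph default, so adjust it to your cluster:

# print every OSD whose osd_pglog_items count is far above the ~300,000 expected entries
ceph tell osd.\* perf dump 2>/dev/null \
  | grep 'osd_pglog_items\|^osd\.[0-9]' \
  | awk '/^osd\./ { osd = $1 }
         /osd_pglog_items/ { gsub(/[",]/, "");
           if ($2 + 0 > 1000000) printf "%s %d pglog items\n", osd, $2 }'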

Sources:

github.com/ceph/ceph/blob/master/src/common/options/global.yaml.in#L2951