50 posts tagged with "operation"

Kubernetes upgrade 1.31

March 12, 2025 · 2 min read

Head of Kubernetes at Clyso

When upgrading some Kubernetes clusters from 1.30 to 1.31, we see the following error for cilium, coredns, kube-proxy and other pods on the Control Plane nodes:

Warning  Failed     15s (x3 over 12s)  kubelet            Error: services have not yet been read at least once, cannot construct envvars

The pods do not start on the upgraded Control Plane node, so a small workaround is needed to keep the upgrade seamless.

Upgrade the Cluster without errors

If you have already run into this problem, don't worry: you can easily roll back kubeadm and kubelet and then upgrade the cluster as described below. You can also replace the kubectl binary, but that is not required.

Rollback of kubeadm (Debian based OS)

Download the old kubeadm and kubelet binaries again and restart kubelet with systemctl:

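# Rollback to 1.30 (stop kubelet first; a running binary cannot be overwritten)
systemctl stop kubelet
# -O overwrites the existing file; -P would save a second copy as kubeadm.1
wget -O /usr/local/bin/kubeadm https://dl.k8s.io/release/v1.30.X/bin/linux/amd64/kubeadm
wget -O /usr/local/sbin/kubelet https://dl.k8s.io/release/v1.30.X/bin/linux/amd64/kubelet
chmod +x /usr/local/bin/kubeadm /usr/local/sbin/kubelet
systemctl start kubelet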
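To confirm that the rollback took effect, you can compare the versions the binaries report with the release you rolled back to. This is a sketch that is not part of the original post: EXPECTED_MINOR and the helper names are hypothetical, and it assumes kubeadm and kubelet are on the PATH of the node.

```shell
#!/usr/bin/env bash
# Sketch: check that kubeadm and kubelet report the rolled-back minor release.
# EXPECTED_MINOR, is_minor and verify_binaries are hypothetical names.
set -euo pipefail

EXPECTED_MINOR="${EXPECTED_MINOR:-v1.30}"

# Succeeds if a version string such as "v1.30.10" or "Kubernetes v1.30.10"
# belongs to the given minor release.
is_minor() {
  local version="${1##* }"   # strip a leading "Kubernetes " if present
  local minor="$2"
  [[ "$version" == "$minor".* ]]
}

verify_binaries() {
  local kubeadm_v kubelet_v
  kubeadm_v="$(kubeadm version -o short)"   # e.g. "v1.30.10"
  kubelet_v="$(kubelet --version)"          # e.g. "Kubernetes v1.30.10"
  is_minor "$kubeadm_v" "$EXPECTED_MINOR" || { echo "kubeadm reports $kubeadm_v"; return 1; }
  is_minor "$kubelet_v" "$EXPECTED_MINOR" || { echo "kubelet reports $kubelet_v"; return 1; }
  echo "kubeadm and kubelet are on $EXPECTED_MINOR"
}

# Only run the check on a node that actually has both binaries installed.
if command -v kubeadm >/dev/null 2>&1 && command -v kubelet >/dev/null 2>&1; then
  verify_binaries || echo "Rollback incomplete: check the binary paths"
fi
```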

Upgrade the Cluster before updating the binaries

Download the new kubeadm to your home directory and rename it to avoid confusion:

wget https://dl.k8s.io/release/v1.31.X/bin/linux/amd64/kubeadm
mv kubeadm kubeadm-v1.31.X
chmod +x kubeadm-v1.31.X
./kubeadm-v1.31.X upgrade apply -y v1.31.X

If your cluster has more than one Control Plane node, upgrade all Control Plane nodes first; only then exchange the binaries and reboot the nodes.
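For a cluster with several Control Plane nodes, this order can be sketched as follows. The sketch rests on assumptions that are not in the original post: the first node runs `kubeadm upgrade apply`, the remaining Control Plane nodes run `kubeadm upgrade node` (the kubeadm subcommand for additional nodes), kubelet lives in /usr/local/sbin as in the rollback commands, and each node is drained before its binaries are exchanged. The `run` helper and the DRY_RUN switch are hypothetical; by default the script only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch of the upgrade order on a cluster with several Control Plane nodes.
# Assumptions: the renamed kubeadm is in the current directory, ROLE is
# "first" on the first Control Plane node and "other" on the rest, and
# DRY_RUN=1 (the default) only prints the commands instead of running them.
set -euo pipefail

KUBEADM_NEW="./kubeadm-v1.31.X"
TARGET="v1.31.X"
DRY_RUN="${DRY_RUN:-1}"

# Print a command and execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [[ "$DRY_RUN" == 0 ]]; then "$@"; fi
}

# Phase 1, on every Control Plane node in turn: upgrade the control plane
# components with the new kubeadm while the old kubelet keeps running.
upgrade_control_plane() {
  local role="$1"
  if [[ "$role" == first ]]; then
    run "$KUBEADM_NEW" upgrade plan "$TARGET"
    run "$KUBEADM_NEW" upgrade apply -y "$TARGET"
  else
    run "$KUBEADM_NEW" upgrade node
  fi
}

# Phase 2, only after phase 1 has finished on all Control Plane nodes:
# exchange the binaries one node at a time.
# kubectl drain/uncordon need a kubeconfig with cluster-admin rights.
swap_binaries() {
  local node="$1"
  run kubectl drain "$node" --ignore-daemonsets
  run systemctl stop kubelet
  run install -m 0755 "$KUBEADM_NEW" /usr/local/bin/kubeadm
  run wget -O /usr/local/sbin/kubelet "https://dl.k8s.io/release/$TARGET/bin/linux/amd64/kubelet"
  run chmod +x /usr/local/sbin/kubelet
  run systemctl start kubelet   # or reboot the node instead
  run kubectl uncordon "$node"
}

upgrade_control_plane "${ROLE:-first}"
```

Run it with ROLE=first on the first node and ROLE=other on the remaining Control Plane nodes, and switch to DRY_RUN=0 only once the printed commands match your setup. Splitting the two phases into separate functions keeps the exchange of the binaries from happening before every Control Plane node has finished phase 1.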

And that's it!

If you still have problems upgrading your Kubernetes clusters, let us know. We can help you!

CLYSO: Kubernetes Analyzer

February 24, 2025 · 3 min read

Dominik Rieder

Head of Kubernetes at Clyso

In 2023, Clyso released the Ceph Analyzer, giving your operations teams a great tool for inspecting the health of your Ceph clusters, offering in-depth reporting and recommendations to fix many non-trivial issues. Two years later, we are pleased to announce the release of Clyso Kubernetes Analyzer!

Get your 30 days of Kubernetes Analyzer now!

What does Clyso Kubernetes Analyzer do?

Is your cluster in good shape, or do you just think it is? We will check it for you!

Features:

Comprehensive Cluster Inspection: Perform a full inspection of all cluster components with a single command.
Pod Inspection: Retrieve detailed information about pods, including container statuses and logs from the last restart.
Node Inspection: Gather system information and statuses from all nodes in the cluster.
Component Inspection: Inspect critical Kubernetes components like CoreDNS, etcd, CNI, and CSI.
Certificate Expiration Verification: Check the expiration dates of Kubernetes certificates to prevent unexpected outages.
Health Checks: Perform health checks on cluster components to ensure they are functioning correctly.

The Analyzer will check your cluster and give you a report with recommendations on any problems it found.

Required Permissions:

To run the Full System Analysis Tool, the following Kubernetes permissions are required. You can create a ClusterRole with the necessary permissions using the following YAML configuration:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: full-system-analysis-role
rules:
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["get", "list"]
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "create", "delete"]
- apiGroups: [""]
  resources: ["pods/exec", "pods/log"]
  verbs: ["create", "get"]
- apiGroups: [""]
  resources: ["pods/portforward"]
  verbs: ["create"]
- apiGroups: [""]
  resources: ["services", "endpoints"]
  verbs: ["get", "list"]
- apiGroups: [""]
  resources: ["namespaces"]
  verbs: ["get", "list", "create", "delete"]
- apiGroups: [""]
  resources: ["persistentvolumes", "persistentvolumeclaims"]
  verbs: ["get", "list", "create", "delete"]
- apiGroups: ["storage.k8s.io"]
  resources: ["storageclasses", "csidrivers"]
  verbs: ["get", "list"]
- apiGroups: ["apps"]
  resources: ["deployments", "statefulsets", "daemonsets"]
  verbs: ["get", "list"]

This ClusterRole grants the necessary permissions to inspect and manage various Kubernetes resources.
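A ClusterRole on its own grants nothing until it is bound to an identity. The sketch below applies the role, binds it, and spot-checks a few permissions with `kubectl auth can-i`. The file name, the binding name and FSA_USER are hypothetical placeholders, the binding is not part of the original post, and `--as` requires that your own identity is allowed to impersonate users.

```shell
#!/usr/bin/env bash
# Sketch: apply the ClusterRole, bind it to the identity that runs fsa, and
# spot-check a few of the granted permissions with "kubectl auth can-i".
# The file name, binding name and FSA_USER are hypothetical placeholders.
set -euo pipefail

ROLE_FILE="${ROLE_FILE:-full-system-analysis-role.yaml}"
FSA_USER="${FSA_USER:-fsa-operator}"
DRY_RUN="${DRY_RUN:-1}"

# Print a command and execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [[ "$DRY_RUN" == 0 ]]; then "$@"; fi
}

setup_fsa_rbac() {
  run kubectl apply -f "$ROLE_FILE"
  run kubectl create clusterrolebinding full-system-analysis-binding \
    --clusterrole=full-system-analysis-role --user="$FSA_USER"
  # Subresources such as pods/exec are checked with --subresource.
  # "kubectl auth can-i" exits non-zero on "no", which stops the script there.
  run kubectl auth can-i list nodes --as="$FSA_USER"
  run kubectl auth can-i create pods --subresource=exec --as="$FSA_USER"
  run kubectl auth can-i get pods --subresource=log --as="$FSA_USER"
  run kubectl auth can-i list csidrivers.storage.k8s.io --as="$FSA_USER"
}

setup_fsa_rbac
```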

How to use it

Get a free demo here.
You will receive a .zip file that you have to unzip:

$ unzip fsa_tool_clyso_0.16.0_linux_amd64.zip
 LICENSE
 config
 fsa

Now you have the fsa binary and a config file. Copy both files to a control-plane node where you can execute kubectl and kubeadm or, if that is not possible (e.g. on Talos Linux), to a machine that can reach the cluster and has a configured kubectl.
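Before copying the files, a quick check can tell whether a machine meets these requirements. This is a hypothetical helper, not something fsa ships; it only tests for the tools named here and whether kubectl can reach the API server.

```shell
#!/usr/bin/env bash
# Sketch: check whether the current machine meets the requirements for
# running fsa before copying the files. check_fsa_host is a hypothetical
# helper; fsa itself does not ship it.
set -uo pipefail

check_fsa_host() {
  if ! command -v kubectl >/dev/null 2>&1; then
    echo "kubectl not found: pick another machine"
    return 1
  fi
  # A configured kubectl must be able to reach the API server.
  if ! kubectl cluster-info >/dev/null 2>&1; then
    echo "kubectl cannot reach the cluster: check the kubeconfig"
    return 1
  fi
  if command -v kubeadm >/dev/null 2>&1; then
    echo "kubectl and kubeadm available (control-plane node)"
  else
    echo "kubectl only (remote machine, e.g. for Talos Linux)"
  fi
}

check_fsa_host || echo "this machine is not suitable for fsa yet"
```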
Create a config directory in "$HOME" (or the home directory of your chosen user) and copy the config file into it:

## Create the directory
$ mkdir -p $HOME/.fsa

## Copy the config
$ cp config $HOME/.fsa/config

Also copy the fsa binary to the location where you want to execute it:

## Copy the fsa
$ cp fsa $HOME/fsa
$ cd $HOME

Now execute the binary. This generates a file called report.json:

$ ./fsa inspect all -o json
...
JSON report written to report.json

Upload the report.json to https://analyzer.clyso.com/#/analyzer/kubernetes
Check your report and improve your Cluster!

Let us know if you are missing something or find improvements!

ceph-volume - ceph osd migrate DB to larger ssd/flash device

September 9, 2024 · One min read

Joachim Kraftmayer

Managing Director at Clyso

UPDATE docs.clyso.com/blog/ceph-volume-ceph-osd-migrate-db-to-larger-ssd-flash-device/

ceph-volume is the proper way to migrate the DB. It uses ceph-bluestore-tool for the actual data movement and some other low-level tasks, but the LVM metadata updates are performed exclusively by ceph-volume.
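A possible sequence for moving an OSD's DB to a larger flash device with `ceph-volume lvm migrate` [1] is sketched below. It is not a tested procedure: it assumes a non-containerized OSD managed by the ceph-osd@ systemd unit, a new empty device /dev/nvme1n1, and the hypothetical VG/LV names ceph-db-new/db-osd-N. On cephadm deployments the commands would need to be adapted to the OSD's container environment. DRY_RUN=1 (the default) only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: move the BlueStore DB of one OSD to a new, larger LV with
# "ceph-volume lvm migrate". Assumptions: non-containerized OSD, new device
# NEW_DEV, hypothetical VG/LV names. DRY_RUN=1 (the default) only prints.
set -euo pipefail

OSD_ID="${OSD_ID:-1}"
NEW_DEV="${NEW_DEV:-/dev/nvme1n1}"
DB_SIZE="${DB_SIZE:-300G}"
VG="ceph-db-new"
LV="db-osd-${OSD_ID}"
DRY_RUN="${DRY_RUN:-1}"

# Print a command and execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [[ "$DRY_RUN" == 0 ]]; then "$@"; fi
}

migrate_db() {
  local osd_fsid="$1"   # the "osd fsid" shown by "ceph-volume lvm list"
  run ceph osd set noout
  run systemctl stop "ceph-osd@${OSD_ID}"
  run vgcreate "$VG" "$NEW_DEV"
  run lvcreate -L "$DB_SIZE" -n "$LV" "$VG"
  # Move the BlueFS DB data to the new LV, which then replaces the old DB.
  # Add "wal" after "db" if a separate WAL device should move as well.
  run ceph-volume lvm migrate --osd-id "$OSD_ID" --osd-fsid "$osd_fsid" \
    --from db --target "$VG/$LV"
  run systemctl start "ceph-osd@${OSD_ID}"
  run ceph osd unset noout
}
```

Call it as `migrate_db <osd-fsid>` for one OSD at a time, and review the printed commands before switching to DRY_RUN=0.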

[1] https://docs.ceph.com/en/latest/ceph-volume/lvm/migrate/

[2] https://docs.ceph.com/en/latest/ceph-volume/lvm/newdb/

[3] https://docs.ceph.com/en/latest/ceph-volume/lvm/newwal/

How to Configure Deep-Scrubbing

January 4, 2024 · One min read

Joachim Kraftmayer

Managing Director at Clyso

Ceph cluster health state

root@clyso-ceph:/home/ceph# ceph -s
  cluster:
    id:     FEBB01CC-4AA5-4C1D-80C4-9D91901467C8
    health: HEALTH_WARN
            256 pgs not deep-scrubbed in time
            266 pgs not scrubbed in time

The admin has noticed that deep scrubbing has fallen behind.

Possible actions

If an impact on Ceph performance is acceptable, osd_max_scrubs and osd_scrub_load_threshold can be carefully adjusted. Be aware that this can have a huge impact on Ceph cluster performance.

Show current config

root@clyso-ceph:/home/ceph# ceph config show osd.1 osd_max_scrubs
1
root@clyso-ceph:/home/ceph#

root@clyso-ceph:/home/ceph# ceph config get osd osd_scrub_load_threshold
0.500000
root@clyso-ceph:/home/ceph#
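The osd_scrub_load_threshold value is compared with the system load normalized by the number of online CPUs. The sketch below approximates that comparison on an OSD host using the 1-minute load average from /proc/loadavg and nproc. It is a simplification (Ceph has additional conditions under which it still scrubs), and load_allows_scrub is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: would the current host load allow scrubbing under
# osd_scrub_load_threshold? Approximation only; load_allows_scrub is hypothetical.
set -euo pipefail

# Value reported by "ceph config get osd osd_scrub_load_threshold".
THRESHOLD="${THRESHOLD:-0.5}"

# Args: 1-minute load average, number of online CPUs, threshold.
# Succeeds when load / cpus is below the threshold.
load_allows_scrub() {
  awk -v load="$1" -v cpus="$2" -v limit="$3" 'BEGIN { exit !(load / cpus < limit) }'
}

if [[ -r /proc/loadavg ]]; then
  read -r load1 _ < /proc/loadavg
  cpus="$(nproc)"
  if load_allows_scrub "$load1" "$cpus" "$THRESHOLD"; then
    echo "load $load1 on $cpus CPUs is below $THRESHOLD: load does not block scrubs"
  else
    echo "load $load1 on $cpus CPUs is at or above $THRESHOLD: scrubs may be deferred"
  fi
fi
```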

Set osd_max_scrubs

ceph config set osd osd_max_scrubs 2

Verify setting

Ceph config database

root@ceph-vm-az1-1:/home/kraftmayerj# ceph config get osd osd_max_scrubs
2
root@ceph-vm-az1-1:/home/kraftmayerj#

Ceph osd active settings (osd.1)

root@clyso-ceph:/home/ceph# ceph config show osd.1 osd_max_scrubs
2
root@clyso-ceph:/home/ceph#
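Once the new value is active, the "pgs not deep-scrubbed in time" count in `ceph -s` should go down. A minimal sketch for following it: the helper names and the INTERVAL default (seconds) are assumptions, and the parsing relies on the warning format shown in the health output above.

```shell
#!/usr/bin/env bash
# Sketch: follow the "pgs not deep-scrubbed in time" count from "ceph -s"
# after raising osd_max_scrubs. overdue_deep_scrubs and watch_deep_scrubs are
# hypothetical helpers; INTERVAL (seconds) is an assumption.
set -euo pipefail

# Reads "ceph -s" output on stdin and prints the number of PGs whose deep
# scrub is overdue, or 0 when the warning is absent.
overdue_deep_scrubs() {
  awk '/pgs? not deep-scrubbed in time/ { n = $1; found = 1 }
       END { print (found ? n : 0) }'
}

watch_deep_scrubs() {
  local n
  while true; do
    n="$(ceph -s | overdue_deep_scrubs)"
    echo "$(date '+%F %T') PGs not deep-scrubbed in time: $n"
    if [[ "$n" -eq 0 ]]; then
      # Deep scrubbing has caught up; consider removing the override.
      echo "to revert: ceph config rm osd osd_max_scrubs"
      return 0
    fi
    sleep "${INTERVAL:-600}"
  done
}

if command -v ceph >/dev/null 2>&1; then
  watch_deep_scrubs
fi
```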

Sources

osd_max_scrubs

osd_scrub_load_threshold

osd_scrub_during_recovery

Backup Kubernetes with Velero

July 27, 2023 · One min read

Joachim Kraftmayer

Managing Director at Clyso

We have backed up Kubernetes clusters before, using Ark.

The Ark project later came to VMware through its acquisition of Heptio and was renamed Velero in 2019.

Currently, we are supporting another customer in backing up data stored in multiple Kubernetes environments to be ready for disaster recovery.

Factor 37 – Ceph Cluster Scaling

July 27, 2023 · One min read

Joachim Kraftmayer

Managing Director at Clyso

A customer asked us to adapt the performance and capacity of their existing Ceph cluster to the ongoing growth in production.

As a result, we scaled the Ceph cluster up by a factor of 37 during operation, without maintenance windows and without service interruption.

Managed Ceph with Kubernetes

July 27, 2023 · One min read

Joachim Kraftmayer

Managing Director at Clyso

A customer approached us to help them build their new IaaS platform.

We helped them evaluate the current situation, worked out a concept together with the customer, from the choice of location and the necessary hardware through to operation, and implemented it together within a month.

ONAP in Managed Kubernetes

July 27, 2023 · One min read

Joachim Kraftmayer

Managing Director at Clyso

On behalf of the customer, we provided the optimal Kubernetes platform for ONAP as a managed service.

ONAP is a comprehensive platform for orchestration, management, and automation of network and edge computing services for network operators, cloud providers, and enterprises. Real-time, policy-driven orchestration and automation of physical and virtual network functions enables rapid automation of new services and complete lifecycle management critical for 5G and next-generation networks.

Scale Out 150+ – kubernetes with ceph in the hyperscalers

July 27, 2023 · One min read

Joachim Kraftmayer

Managing Director at Clyso

Since 2021, we have supported one of our customers in the architecture and operation of more than 150 Kubernetes clusters with Ceph worldwide in all hyperscalers, such as GCP, AWS, Microsoft Azure and AliCloud.

Fix CephFS Filesystem Read-Only

January 6, 2023 · One min read

Joachim Kraftmayer

Managing Director at Clyso

After a reboot of the MDS server, the CephFS file system can become read-only:

HEALTH_WARN 1 MDSs are read only
[WRN] MDS_READ_ONLY: 1 MDSs are read only
    mds.XXX(mds.0): MDS in read-only mode

In the MDS log you will find the following entries:

log_channel(cluster) log [ERR] : failed to commit dir 0x1 object, errno -22
mds.0.11963 unhandled write error (22) Invalid argument, force readonly...
mds.0.cache force file system read-only
log_channel(cluster) log [WRN] : force file system read-only
mds.0.server force_clients_readonly

https://tracker.ceph.com/issues/58082

This is a known upstream issue, though at the time of writing the fix has not been merged yet.

As a workaround, use the following steps:

ceph config set mds mds_dir_max_commit_size 80
ceph fs fail <fs_name>
ceph fs set <fs_name> joinable true

If this is not successful, you may need to increase mds_dir_max_commit_size further, e.g. to 160.
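The workaround and the retry with a larger value can be wrapped in a small function. This is a sketch: the helper names and the DRY_RUN switch are hypothetical, and the health check simply looks for MDS_READ_ONLY in the output of `ceph health detail`.

```shell
#!/usr/bin/env bash
# Sketch: apply the MDS read-only workaround with a given commit size and
# check whether the warning is gone. Helper names are hypothetical;
# DRY_RUN=1 (the default) only prints the commands.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"

# Print a command and execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [[ "$DRY_RUN" == 0 ]]; then "$@"; fi
}

apply_mds_workaround() {
  local fs_name="$1" commit_size="${2:-80}"
  run ceph config set mds mds_dir_max_commit_size "$commit_size"
  run ceph fs fail "$fs_name"               # mark the fs not joinable, fail its MDS ranks
  run ceph fs set "$fs_name" joinable true  # let the MDS daemons take the ranks again
}

# Succeeds while the MDS_READ_ONLY health warning is still reported.
# The output is captured first so a closed pipe cannot hide a match.
still_read_only() {
  local health
  health="$(ceph health detail)"
  [[ "$health" == *MDS_READ_ONLY* ]]
}
```

For example, run `DRY_RUN=0 apply_mds_workaround <fs_name> 80`, wait until an MDS is active again, and if `still_read_only` succeeds, repeat with 160.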
