
50 posts tagged with "operation"


Kubernetes upgrade 1.31

· 2 min read
Dominik Rieder
Head of Kubernetes at Clyso

On some Kubernetes clusters upgrading from 1.30 -> 1.31, we see the following error on the cilium, coredns, kube-proxy, ... pods on the Control Planes:

Warning  Failed     15s (x3 over 12s)  kubelet            Error: services have not yet been read at least once, cannot construct envvars

The pods will not start on the updated Control Plane, so we need a small workaround to ensure a seamless upgrade.

Upgrade the Cluster without errors

First of all, if you face this problem, no worries: you can easily roll back kubeadm/kubelet and patch the cluster afterwards. You can also exchange kubectl, but it is not needed.

Rollback of kubeadm/kubelet (Debian-based OS)

Redownload the old kubeadm/kubelet and restart with systemctl:

# Rollback to 1.30
systemctl stop kubelet
wget -O /usr/local/bin/kubeadm https://dl.k8s.io/release/v1.30.X/bin/linux/amd64/kubeadm
wget -O /usr/local/sbin/kubelet https://dl.k8s.io/release/v1.30.X/bin/linux/amd64/kubelet
chmod +x /usr/local/bin/kubeadm /usr/local/sbin/kubelet
systemctl restart kubelet
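
After the restart, a quick sanity check (optional; the exact output depends on your patch version) confirms that the node is back on 1.30:

# Verify the rollback
kubeadm version -o short
kubelet --version
kubectl get nodes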

Upgrade the Cluster before updating the binaries

Download the new kubeadm to your home directory and rename it to avoid confusion:

wget https://dl.k8s.io/release/v1.31.X/bin/linux/amd64/kubeadm
mv kubeadm kubeadm-v1.31.X
chmod +x kubeadm-v1.31.X
./kubeadm-v1.31.X upgrade apply -y v1.31.X

If you have a cluster with more than one Control Plane, update all Control Plane nodes first before exchanging the binaries and rebooting the nodes, as sketched below.
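
As a sketch of that order (assuming the renamed kubeadm-v1.31.X binary prepared as above and the same binary locations as in the rollback section), the remaining Control Plane nodes are upgraded with kubeadm upgrade node before any binaries are exchanged:

# On every additional Control Plane node (kubeadm-v1.31.X prepared as above):
./kubeadm-v1.31.X upgrade node

# Only after ALL Control Planes are upgraded: exchange the binaries and reboot the node
systemctl stop kubelet
wget -O /usr/local/bin/kubeadm https://dl.k8s.io/release/v1.31.X/bin/linux/amd64/kubeadm
wget -O /usr/local/sbin/kubelet https://dl.k8s.io/release/v1.31.X/bin/linux/amd64/kubelet
chmod +x /usr/local/bin/kubeadm /usr/local/sbin/kubelet
reboot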

And that's it!

If you still have problems upgrading your Kubernetes clusters, let us know. We can help you!

CLYSO: Kubernetes Analyzer

· 3 min read
Dominik Rieder
Head of Kubernetes at Clyso

In 2023, Clyso released the Ceph Analyzer, giving your operations teams a great tool for inspecting the health of your Ceph clusters, offering in-depth reporting and recommendations to fix many non-trivial issues. Two years later, we are pleased to announce the release of Clyso Kubernetes Analyzer!

Get your free 30-day demo of the Kubernetes Analyzer now!

What does Clyso Kubernetes Analyzer do?

Is your cluster in good shape, or do you just think it is? We will check it for you!

Features:

  • Comprehensive Cluster Inspection: Perform a full inspection of all cluster components with a single command.
  • Pod Inspection: Retrieve detailed information about pods, including container statuses and logs from the last restart.
  • Node Inspection: Gather system information and statuses from all nodes in the cluster.
  • Component Inspection: Inspect critical Kubernetes components like CoreDNS, etcd, CNI, and CSI.
  • Certificate Expiration Verification: Check the expiration dates of Kubernetes certificates to prevent unexpected outages.
  • Health Checks: Perform health checks on cluster components to ensure they are functioning correctly.

The Analyzer will check your cluster and give you a report with recommendations on any problems it found.
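
To give an idea of what such a check covers: the certificate expiration verification corresponds roughly to what you would otherwise run by hand on a kubeadm-based control plane (an illustration only, not how the Analyzer is implemented internally):

# Manual equivalent of a certificate expiration check on a kubeadm cluster
kubeadm certs check-expiration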

Required Permissions:

To run the Full System Analysis Tool, the following Kubernetes permissions are required. You can create a ClusterRole with the necessary permissions using the following YAML configuration:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: full-system-analysis-role
rules:
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get", "list"]
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "create", "delete"]
  - apiGroups: [""]
    resources: ["pods/exec", "pods/log"]
    verbs: ["create", "get"]
  - apiGroups: [""]
    resources: ["pods/portforward"]
    verbs: ["create"]
  - apiGroups: [""]
    resources: ["services", "endpoints"]
    verbs: ["get", "list"]
  - apiGroups: [""]
    resources: ["namespaces"]
    verbs: ["get", "list", "create", "delete"]
  - apiGroups: [""]
    resources: ["persistentvolumes", "persistentvolumeclaims"]
    verbs: ["get", "list", "create", "delete"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses", "csidrivers"]
    verbs: ["get", "list"]
  - apiGroups: ["apps"]
    resources: ["deployments", "statefulsets", "daemonsets"]
    verbs: ["get", "list"]

This ClusterRole grants the necessary permissions to inspect and manage various Kubernetes resources.
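
How you attach this role depends on how you run the tool. As a hedged example (the binding and user names below are placeholders, not something shipped with the Analyzer), you could apply the ClusterRole and bind it to the user that runs the analysis:

# Apply the ClusterRole above (saved e.g. as full-system-analysis-role.yaml)
kubectl apply -f full-system-analysis-role.yaml

# Bind it to the user that will run the analysis (placeholder subject)
kubectl create clusterrolebinding full-system-analysis-binding \
  --clusterrole=full-system-analysis-role \
  --user=<analysis-user>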

How to use it

  1. Grab a free demo from here.
  2. You will receive a .zip file that you have to unzip:
$ unzip fsa_tool_clyso_0.16.0_linux_amd64.zip
LICENSE
config
fsa
  3. You now see an fsa binary and a config file. Copy these two files to a control-plane node that can execute kubectl and kubeadm, or, if that is not possible (e.g. on Talos Linux), to a machine that can reach the cluster and has a configured kubectl.
  4. Create a config directory in the $HOME of your chosen user and copy the config file into it:
## Create the directory
$ mkdir $HOME/.fsa

## Copy the config
$ cp config $HOME/.fsa/config
  5. Also copy the fsa binary to the place where you want to execute it:
## Copy the fsa
$ cp fsa $HOME/fsa
  6. Now execute the binary; this will generate a file called report.json:
$ ./fsa inspect all -o json
...
JSON report written to report.json
  7. Upload the report.json to https://analyzer.clyso.com/#/analyzer/kubernetes
  8. Check your report and improve your cluster!

Let us know if you are missing something or see room for improvement!

ceph-volume - ceph osd migrate DB to larger ssd/flash device

· One min read
Joachim Kraftmayer
Managing Director at Clyso

UPDATE: docs.clyso.com/blog/ceph-volume-ceph-osd-migrate-db-to-larger-ssd-flash-device/

ceph-volume is the proper way of doing a DB migration. It uses ceph-bluestore-tool for the actual data movement and some other low-level work, but LVM metadata updates are performed exclusively by ceph-volume.

[1] https://docs.ceph.com/en/latest/ceph-volume/lvm/migrate/

[2] https://docs.ceph.com/en/latest/ceph-volume/lvm/newdb/

[3] https://docs.ceph.com/en/latest/ceph-volume/lvm/newwal/
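
As a rough sketch of such a migration (the OSD id, FSID placeholder, and LVM names below are examples; see [1] and [2] for the authoritative syntax), moving an OSD's DB to a new, larger LV on a flash device could look like this:

# Stop the OSD before touching its devices (example: osd.1)
systemctl stop ceph-osd@1

# If osd.1 has no separate DB device yet, attach a new one (new LV on the larger flash device)
ceph-volume lvm new-db --osd-id 1 --osd-fsid <osd-fsid> --target vg_flash/db_osd1

# If osd.1 already has a too-small DB device, migrate it to the new LV instead
ceph-volume lvm migrate --osd-id 1 --osd-fsid <osd-fsid> --from db --target vg_flash/db_osd1

systemctl start ceph-osd@1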

How to Configure Deep-Scrubbing

· One min read
Joachim Kraftmayer
Managing Director at Clyso

Ceph cluster health state

root@clyso-ceph:/home/ceph# ceph -s
  cluster:
    id:     FEBB01CC-4AA5-4C1D-80C4-9D91901467C8
    health: HEALTH_WARN
            256 pgs not deep-scrubbed in time
            266 pgs not scrubbed in time

The admin has noticed that deep-scrubbing has fallen behind.

Possible actions

If impacting Ceph performance is not a concern, osd_max_scrubs and osd_scrub_load_threshold can be carefully adapted. Keep in mind, however, that this can have a huge impact on Ceph cluster performance.

Show current config

root@clyso-ceph:/home/ceph# ceph config show osd.1 osd_max_scrubs
1
root@clyso-ceph:/home/ceph#
root@clyso-ceph:/home/ceph# ceph config get osd osd_scrub_load_threshold
0.500000
root@clyso-ceph:/home/ceph#

Set osd_max_scrubs

ceph config set osd osd_max_scrubs 2

Verify setting

Ceph config database

root@ceph-vm-az1-1:/home/kraftmayerj# ceph config get osd osd_max_scrubs
2
root@ceph-vm-az1-1:/home/kraftmayerj#

Ceph osd active settings (osd.1)

root@clyso-ceph:/home/ceph# ceph config show osd.1 osd_max_scrubs
2
root@clyso-ceph:/home/ceph#
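
osd_scrub_load_threshold, mentioned above, can be adapted in the same way if scrubbing still cannot keep up; the value below is only an example and should be chosen with the actual load of your OSD hosts in mind:

## Example only: allow scrubs up to a higher host load average
ceph config set osd osd_scrub_load_threshold 1.0
ceph config get osd osd_scrub_load_threshold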

Sources

osd_max_scrubs

osd_scrub_load_threshold

osd_scrub_during_recovery

Factor 37 – Ceph Cluster Scaling

· One min read
Joachim Kraftmayer
Managing Director at Clyso

A customer asked us to adapt the performance and capacity of their existing Ceph cluster to the growth of their ongoing production.

As a result, we grew the Ceph cluster by 3700% during operation, without maintenance windows and without service interruption.

ONAP in Managed Kubernetes

· One min read
Joachim Kraftmayer
Managing Director at Clyso

On behalf of the customer, we provided the optimal Kubernetes platform for ONAP as a managed service.

ONAP is a comprehensive platform for orchestration, management, and automation of network and edge computing services for network operators, cloud providers, and enterprises. Real-time, policy-driven orchestration and automation of physical and virtual network functions enables rapid automation of new services and complete lifecycle management critical for 5G and next-generation networks.

Fix CephFS Filesystem Read-Only

· One min read
Joachim Kraftmayer
Managing Director at Clyso

After a reboot of the MDS server, it can happen that the CephFS filesystem becomes read-only:

HEALTH_WARN 1 MDSs are read only
[WRN] MDS_READ_ONLY: 1 MDSs are read only
mds.XXX(mds.0): MDS in read-only mode

In the MDS log you will find the following entry:

log_channel(cluster) log [ERR] : failed to commit dir 0x1 object, errno -22
mds.0.11963 unhandled write error (22) Invalid argument, force readonly...
mds.0.cache force file system read-only
log_channel(cluster) log [WRN] : force file system read-only
mds.0.server force_clients_readonly

https://tracker.ceph.com/issues/58082

This is a known upstream issue, though the fix is not yet merged.

As a workaround you can use the following steps:

ceph config set mds mds_dir_max_commit_size 80
ceph fs fail <fs_name>
ceph fs set <fs_name> joinable true

If this is not successful, you may need to increase mds_dir_max_commit_size further, e.g. to 160.
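
A sketch of that retry, reusing the same commands as above with the larger value (<fs_name> remains your filesystem name):

ceph config set mds mds_dir_max_commit_size 160
ceph fs fail <fs_name>
ceph fs set <fs_name> joinable true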