Since 2021, we have supported one of our customers with the architecture and operation of more than 150 Kubernetes clusters with Ceph worldwide, across all major hyperscalers such as GCP, AWS, Microsoft Azure, and AliCloud.
Terraform in Multi-Cloud Deployment
Use of programming languages to define customised infrastructure environments using Terraform. Use of language constructs and tools that follow Infrastructure as Code (IaC) design patterns.
Use of functions and libraries within programming languages to develop complex infrastructure projects.
Use of programming languages such as TypeScript, Python and Go to map multi-cloud environments, with a modular and open architecture that includes hundreds of providers and thousands of module definitions.
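This workflow corresponds to CDK for Terraform (CDKTF). A minimal sketch of how such a project could be scaffolded is shown below; the project name, providers and exact CLI flags are illustrative assumptions and depend on the cdktf-cli version.

```bash
# Hedged sketch: scaffold a CDK for Terraform (CDKTF) project in TypeScript.
# Project name, providers and flags are illustrative and version-dependent.
npm install --global cdktf-cli@latest
mkdir multi-cloud-demo && cd multi-cloud-demo
cdktf init --template=typescript
cdktf provider add "hashicorp/aws" "hashicorp/google"
cdktf synth    # generate the Terraform configuration from the TypeScript code
cdktf deploy   # plan and apply the generated configuration
```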
Timeseries Monitoring
After more than 10 years of experience with different evolutionary stages of timeseries monitoring systems, and with the growing need for metrics, we decided to replace database- and file-based solutions.
To store metrics for the long term, we rely on object storage technology to provide almost unlimited storage capacity in the backend.
The object storage is provided by multiple Ceph clusters. We are now also able to dynamically connect alternative storage locations, such as AWS or GCP, as needed.
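As a rough illustration, such a long-term metrics backend on Ceph can be prepared by creating an S3 user and bucket on the RADOS Gateway. The user, bucket and endpoint names below are assumptions, and the monitoring stack that writes to them is not named in this text.

```bash
# Hedged sketch: provision an S3-compatible metrics bucket on Ceph RGW.
# User, bucket and endpoint names are illustrative assumptions.
radosgw-admin user create --uid=metrics --display-name="Long-term metrics"

# Create the bucket with any S3-compatible client, e.g. the AWS CLI pointed at the RGW endpoint:
aws --endpoint-url https://rgw.example.com s3 mb s3://metrics-longterm
```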
TISAX compliant storage solution with Ceph
*TISAX* (Trusted Information Security Assessment Exchange) is a standard for information security defined by the automotive industry. Since 2017, a large number of manufacturers and suppliers in the German automotive industry have required their business partners to hold a TISAX certification.
[https://de.wikipedia.org/wiki/TISAX](https://de.wikipedia.org/wiki/TISAX)
We support the customer in replacing its existing storage solution and in introducing and commissioning Ceph as a future-proof, TISAX-compliant storage solution for its internal processes and data volumes.
The customer decided to connect its existing environment via NFS with Kerberos authentication, and its private cloud via RBD.
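A hedged sketch of the two connection paths looks like this; pool, image and export names are illustrative, the NFS export syntax differs between Ceph releases, and the Kerberos configuration itself lives on the NFS gateway rather than in these commands.

```bash
# Hedged sketch: illustrative names only, not taken from the customer environment.

# RBD for the private cloud:
ceph osd pool create rbd-private 128
rbd pool init rbd-private
rbd create rbd-private/vm-disk-01 --size 100G
rbd map rbd-private/vm-disk-01          # on a client with the rbd kernel module loaded

# CephFS exported via NFS (flags vary by release; Kerberos is configured on the NFS-Ganesha gateway):
ceph nfs export create cephfs --cluster-id nfs-prod --pseudo-path /shared --fsname cephfs
```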
Reddit Challenge Accepted - Is 10k IOPs achievable with NVMes?
Hello Ceph community! It's that time again for another blog post! Recently, a user on the ceph subreddit asked whether Ceph could deliver 10K IOPS in a combined random read/write fio workload from a single client. The setup consists of 6 nodes with 2x 4TB FireCuda NVMe drives each. They wanted to know if anyone would mind benchmarking a similar setup and reporting the results. Here at Clyso we are actively working on improving the Ceph code to achieve higher performance. We have our own tests and configurations for evaluating our changes to the code, but it just so happens that one of the places we do our work (the upstream Ceph community performance lab!) appears to be a good match for testing this user's request. We decided to sit down for a couple of hours and give it a try. u/DividedbyPi, one of our friends over at 45drives.com, wrote that they are also going to give it a shot and report the results on their YouTube channel in the coming weeks. We figure this could be a fun way to get results from multiple vendors. Let's see what happens!
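For reference, a combined random read/write fio job against an RBD image might look like the hedged sketch below; the pool and image names, block size, queue depth and the 70/30 read/write mix are assumptions, as this excerpt does not give the exact workload parameters from the Reddit thread.

```bash
# Hedged sketch: mixed random read/write fio workload against an RBD image.
# All names and tuning values here are illustrative assumptions.
fio --name=randrw-test \
    --ioengine=rbd --clientname=admin --pool=testpool --rbdname=testimage \
    --rw=randrw --rwmixread=70 --bs=4k \
    --iodepth=64 --numjobs=1 \
    --direct=1 --time_based --runtime=300 \
    --group_reporting
```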
Cephalocon 2023
Cephalocon 2023 in Amsterdam saw the entire Ceph community out in force. And Clyso had a big presence, giving six talks and hosting the whole event!
Ceph Day NYC 2023
The Ceph Community hosted its first post-pandemic event at the Bloomberg offices in New York City. Ceph Day NYC was a great success!
Ceph Reef vs Quincy RBD Performance
Clyso's Mark Nelson has written the first part in a series looking at performance testing of the upcoming Ceph Reef release vs the previous Quincy release. See the blog post here!
Please feel free to contact us if you are interested in Ceph support or performance consulting!
ceph - how to disable the mclock scheduler
After more than 4 years of development, mclock is the default scheduler in Ceph Quincy (version 17). If you don't want to use this scheduler, you can switch back with the option osd_op_queue.
WPQ was the default before Ceph Quincy, and the change requires a restart of the OSDs.
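The switch itself is a single config change; the restart can then be done per OSD via the orchestrator or systemd. A brief sketch:

```bash
# Switch the OSD op queue back to WPQ (the pre-Quincy default):
ceph config set osd osd_op_queue wpq

# The change only takes effect after each OSD has been restarted, e.g.:
ceph orch daemon restart osd.0    # repeat per OSD, or restart the OSD services via systemd
```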
Sources:
https://docs.ceph.com/en/quincy/rados/configuration/osd-config-ref/#confval-osd_op_queue
https://docs.ceph.com/en/quincy/rados/configuration/osd-config-ref/#qos-based-on-mclock
Fix CephFS Filesystem Read-Only
After a reboot of the MDS server, it can happen that the CephFS filesystem becomes read-only:
HEALTH_WARN 1 MDSs are read only
[WRN] MDS_READ_ONLY: 1 MDSs are read only
mds.XXX(mds.0): MDS in read-only mode
[https://tracker.ceph.com/issues/58082](https://tracker.ceph.com/issues/58082)
In the MDS log you will find the following entries:
log_channel(cluster) log [ERR] : failed to commit dir 0x1 object, errno -22
mds.0.11963 unhandled write error (22) Invalid argument, force readonly...
mds.0.cache force file system read-only
log_channel(cluster) log [WRN] : force file system read-only
mds.0.server force_clients_readonly
This is a known upstream issue, though the fix is not yet merged.
As a workaround, you can use the following steps:
ceph config set mds mds_dir_max_commit_size 80
ceph fs fail <fs_name>
ceph fs set <fs_name> joinable true
If this is not successful, you may need to increase mds_dir_max_commit_size further, e.g. to 160.
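For example, repeating the same steps with the higher value:
ceph config set mds mds_dir_max_commit_size 160
ceph fs fail <fs_name>
ceph fs set <fs_name> joinable true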