Running a Distributed Ceph Cluster on a Kubernetes Cluster


This guide automates installing a native Ceph cluster inside a running Kubernetes cluster. It requires creating three additional hard disk images and attaching one to each of three Kubernetes cluster VMs (tested on CentOS 7). It assumes your Kubernetes VMs run on KVM and are managed with virsh, which is used for the attach-disk commands (tested with Kubernetes 1.13.3).

By default, the disk images are created at /cephdata/m[123]/k8-centos-m[123]. The disks are partitioned and formatted automatically using ceph zap, which formats each disk with the recommended XFS filesystem.


This is a work in progress and things will likely change. This guide will be updated as progress proceeds.


This installer was built to replace Rook-Ceph after running into cluster stability issues at around 30 days of uptime in 2019. The steps are taken from the Ceph Helm installer:

Add the Ceph Mon Cluster Service FQDN to /etc/hosts

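Before starting, please ensure each Kubernetes VM has the following entries in /etc/hosts. Pipe through sudo tee -a rather than using sudo echo ... >> /etc/hosts, because the >> redirect runs in your unprivileged shell, not under sudo:


echo "    ceph-mon.ceph.svc.cluster.local" | sudo tee -a /etc/hosts


echo "    ceph-mon.ceph.svc.cluster.local" | sudo tee -a /etc/hosts


echo "    ceph-mon.ceph.svc.cluster.local" | sudo tee -a /etc/hosts


Skipping this step can make persistent volume mounts fail with "server name not found: ceph-mon.ceph.svc.cluster.local", which takes some debugging to track down (see The ceph-tester failed to start below).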
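Re-running those commands appends duplicate lines. A minimal sketch of an idempotent version, assuming you supply each monitor IP yourself (the addresses are not shown in this guide); add_host_entry and the SUDO variable are hypothetical names, not part of the installer:

```shell
# Hypothetical helper: append "<ip> <hostname>" to a hosts file only if that
# exact line is not already present.
# Usage: SUDO=sudo add_host_entry "$MON_IP" ceph-mon.ceph.svc.cluster.local [hosts_file]
add_host_entry() {
    local ip="$1" name="$2" file="${3:-/etc/hosts}"
    local line="$ip $name"
    # -q quiet, -x whole-line match, -F fixed string (dots are not regex wildcards)
    if grep -qxF "$line" "$file" 2>/dev/null; then
        return 0
    fi
    # tee performs the write, so prefixing it with $SUDO grants root access to the file
    printf '%s\n' "$line" | ${SUDO:-} tee -a "$file" > /dev/null
}
```

MON_IP is a placeholder; run the helper once per monitor address on every node.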

Build KVM HDD Images

Change to the ceph directory.

cd ceph

Generate one 100 GB qcow2 hdd image for each of the three VMs:


The files are saved here:

├── m1
│   └── k8-centos-m1
├── m2
│   └── k8-centos-m2
└── m3
    └── k8-centos-m3
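The generation script itself is not shown. A minimal sketch of equivalent qemu-img calls, assuming qemu-img is installed on the KVM host and the default paths above; CEPH_DATA_DIR, IMAGE_SIZE and DRY_RUN are hypothetical knobs:

```shell
# Sketch (not the guide's script): one qcow2 image per VM at the default paths.
CEPH_DATA_DIR="${CEPH_DATA_DIR:-/cephdata}"
IMAGE_SIZE="${IMAGE_SIZE:-100G}"

# Print the command when DRY_RUN=1, otherwise execute it
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

create_ceph_images() {
    for node in m1 m2 m3; do
        run mkdir -p "$CEPH_DATA_DIR/$node"
        # qcow2 allocates space on demand, so the file starts small and grows up to IMAGE_SIZE
        run qemu-img create -f qcow2 "$CEPH_DATA_DIR/$node/k8-centos-$node" "$IMAGE_SIZE"
    done
}
```

Running DRY_RUN=1 create_ceph_images first shows exactly which files would be created.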

Attach KVM Images to VMs

Attach each 100 GB image to its VM (m1, m2 or m3):
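A sketch of the attach step with virsh attach-disk, assuming the libvirt domains are named m1, m2 and m3 and the disk should appear as vdb inside each VM (the OSD pod names later in this guide reference dev-vdb). This is an illustration, not the guide's script:

```shell
# ASSUMPTION: domain names are m1, m2, m3; check yours with `virsh list --all`.
CEPH_DATA_DIR="${CEPH_DATA_DIR:-/cephdata}"

# Print the command when DRY_RUN=1, otherwise execute it
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

attach_ceph_images() {
    for node in m1 m2 m3; do
        # --persistent: attach to the running VM and keep it in the saved domain config
        run virsh attach-disk "$node" "$CEPH_DATA_DIR/$node/k8-centos-$node" vdb \
            --driver qemu --subdriver qcow2 --targetbus virtio --persistent
    done
}
```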


Format Disks in VM

With passwordless SSH login as root on each VM, you can run this to partition, format and mount each of the new disks:


Please be careful running this, as it can delete any previously saved data.


Please be aware that fdisk can also hang, and orphaned fdisk processes that get stuck require a hard reboot of the cluster. Please let me know if you have a way around this. There are many discussions of this issue online, such as "the process that would not die".
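The partitioning script is not shown. One non-interactive alternative, offered as an untested assumption, is parted in script mode, which never prompts on stdin; whether that avoids the fdisk hang has not been verified. This sketch assumes the new disk is /dev/vdb and mounts it at /var/lib/ceph to match the /etc/fstab entry in the debugging guide:

```shell
# Sketch (not the guide's script). ASSUMPTIONS: /dev/vdb is the new disk, parted and
# xfsprogs are installed, and the mount point is /var/lib/ceph.
# DESTRUCTIVE: relabels and reformats $DISK. DRY_RUN=1 prints the commands instead.
DISK="${DISK:-/dev/vdb}"
MOUNT_POINT="${MOUNT_POINT:-/var/lib/ceph}"

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

prepare_ceph_disk() {
    # -s (script mode): parted never stops to ask a question
    run parted -s "$DISK" mklabel gpt mkpart primary xfs 0% 100%
    run mkfs.xfs -f "${DISK}1"
    run mkdir -p "$MOUNT_POINT"
    run mount "${DISK}1" "$MOUNT_POINT"
}
```

Add the /etc/fstab entry shown in the debugging guide so the mount survives a reboot.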


Install Ceph on All Kubernetes Nodes

Please install ceph-common, centos-release-ceph-luminous and lsof on every Kubernetes node VM before deploying Ceph.

For additional setup, please refer to the official Ceph docs:

On CentOS 7 you can run the ./ceph/ script or these commands:

sudo rpm --import ""
sudo yum install -y ceph-common centos-release-ceph-luminous lsof

Deploy Ceph Cluster

Ceph requires running a local Helm repo server (just like the Redis cluster does), then building and installing the chart to get the cluster pods running.
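The deployment script is not shown. A sketch of the Helm 2 flow from the upstream ceph-helm instructions, assuming Helm 2, a ceph-helm checkout at ./ceph-helm and an overrides file at ./ceph-overrides.yaml (both hypothetical paths); node labelling and RBAC setup are left out:

```shell
# ASSUMPTION: Helm 2 (helm serve and --name do not exist in Helm 3).
# DRY_RUN=1 prints the commands instead of running them.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

deploy_ceph() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "helm serve &"
    else
        helm serve > /dev/null 2>&1 &   # local chart repo, localhost:8879 by default
        sleep 2                          # give the repo server a moment to start
    fi
    run helm repo add local http://localhost:8879/charts
    run make -C ceph-helm/ceph           # builds the ceph chart (see the ceph-helm Makefile)
    run helm install --name=ceph local/ceph --namespace=ceph -f ceph-overrides.yaml
}
```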


Watch all Ceph Logs with Kubetail

With kubetail installed you can watch all the ceph pods at once with:


or manually with:

kubetail ceph -c cluster-log-tailer -n ceph

Show Pods

View the ceph cluster pods with:

Getting Ceph pods with:
kubectl get pods -n ceph

NAME                                        READY   STATUS      RESTARTS   AGE
ceph-mds-85b4fbb478-wjmxb                   1/1     Running     1          4m38s
ceph-mds-keyring-generator-pvh4l            0/1     Completed   0          4m38s
ceph-mgr-588577d89f-w8p8v                   1/1     Running     1          4m38s
ceph-mgr-keyring-generator-76l5r            0/1     Completed   0          4m38s
ceph-mon-429mk                              3/3     Running     0          4m39s
ceph-mon-6fvv6                              3/3     Running     0          4m39s
ceph-mon-75n4t                              3/3     Running     0          4m39s
ceph-mon-check-549b886885-cb64q             1/1     Running     0          4m38s
ceph-mon-keyring-generator-q26p2            0/1     Completed   0          4m38s
ceph-namespace-client-key-generator-bbvt2   0/1     Completed   0          4m38s
ceph-osd-dev-vdb-96v7h                      1/1     Running     0          4m39s
ceph-osd-dev-vdb-g9zkg                      1/1     Running     0          4m39s
ceph-osd-dev-vdb-r5fxr                      1/1     Running     0          4m39s
ceph-osd-keyring-generator-6pg77            0/1     Completed   0          4m38s
ceph-rbd-provisioner-5cf47cf8d5-kbfvt       1/1     Running     0          4m38s
ceph-rbd-provisioner-5cf47cf8d5-pwj4s       1/1     Running     0          4m38s
ceph-rgw-7b9677854f-8d7s5                   1/1     Running     1          4m38s
ceph-rgw-keyring-generator-284kp            0/1     Completed   0          4m38s
ceph-storage-keys-generator-bc6dq           0/1     Completed   0          4m38s

Check Cluster Status

With the cluster running you can quickly check the cluster status with:

Getting Ceph cluster status:

kubectl -n ceph exec -ti ceph-mon-check-549b886885-cb64q -c ceph-mon -- ceph -s
    id:     aa06915f-3cf6-4f74-af69-9afb41bf464d
    health: HEALTH_OK

    mon: 3 daemons, quorum,,
    mds: cephfs-1/1/1 up  {0=mds-ceph-mds-85b4fbb478-wjmxb=up:active}
    osd: 3 osds: 3 up, 3 in
    rgw: 1 daemon active

    pools:   7 pools, 148 pgs
    objects: 208 objects, 3359 bytes
    usage:   325 MB used, 284 GB / 284 GB avail
    pgs:     148 active+clean
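The pod name in that command changes on every install. Two small hypothetical helpers (not part of the installer) that look up a running pod by name prefix and test the reported health:

```shell
# first_running_pod <prefix>: read `kubectl get pods` output on stdin and print the
# first pod whose NAME starts with <prefix> and whose STATUS is Running.
first_running_pod() {
    awk -v p="$1" 'NR > 1 && index($1, p) == 1 && $3 == "Running" { print $1; exit }'
}

# health_ok: read `ceph -s` output on stdin; exit 0 only if it reports HEALTH_OK.
health_ok() {
    grep -q 'health: HEALTH_OK'
}

# Usage:
#   pod=$(kubectl -n ceph get pods | first_running_pod ceph-mon-check)
#   kubectl -n ceph exec "$pod" -c ceph-mon -- ceph -s | health_ok && echo "cluster is healthy"
```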

Validate a Pod can Mount a Persistent Volume on the Ceph Cluster in Kubernetes

Run these steps to integration test that your Kubernetes cluster can host persistent volumes for pods on the Ceph cluster running inside Kubernetes. The data is stored on the disk images attached to the host VMs, which live in:

ls /cephdata/*/*
/cephdata/m1/k8-centos-m1  /cephdata/m2/k8-centos-m2  /cephdata/m3/k8-centos-m3

If any of these steps fail, please refer to the Kubernetes Ceph Cluster Debugging Guide below.

Create PVC

kubectl apply -f test/pvc.yml

Verify PVC is Bound

kubectl get pvc | grep test-ceph
test-ceph-pv-claim        Bound    pvc-a715256d-38c3-11e9-8e7c-525400275ad4   1Gi        RWO            ceph-rbd          46s
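The contents of test/pvc.yml are not shown. A manifest consistent with the bound claim above (name test-ceph-pv-claim, 1Gi, ReadWriteOnce, storage class ceph-rbd) could look like this hypothetical reconstruction:

```shell
# Hypothetical reconstruction of test/pvc.yml; print it and pipe it to kubectl.
pvc_manifest() {
    cat <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-ceph-pv-claim
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ceph-rbd
  resources:
    requests:
      storage: 1Gi
EOF
}

# Usage: pvc_manifest | kubectl apply -f -
```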

Create Pod using PVC as a mounted volume

kubectl apply -f test/mount-pv-in-pod.yml
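The contents of test/mount-pv-in-pod.yml are not shown either. A hypothetical reconstruction inferred from the tester logs further down: a busybox pod named ceph-tester that lists /testing, runs df, appends the date to /testing/updated, prints the file and then stays up so kubectl logs -f keeps working:

```shell
# Hypothetical reconstruction of test/mount-pv-in-pod.yml (image and sleep are assumptions).
tester_pod_manifest() {
    cat <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: ceph-tester
spec:
  containers:
    - name: ceph-tester
      image: busybox
      command: ["/bin/sh", "-c"]
      args:
        - >-
          ls -l /testing;
          df -h /testing;
          date >> /testing/updated;
          echo "last update:";
          cat /testing/updated;
          sleep 3600
      volumeMounts:
        - name: storage
          mountPath: /testing
  volumes:
    - name: storage
      persistentVolumeClaim:
        claimName: test-ceph-pv-claim
EOF
}

# Usage: tester_pod_manifest | kubectl apply -f -
```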

Verify Pod has Mounted Volume inside Container

kubectl describe pod ceph-tester

Verify Ceph is Handling Data


Getting Ceph osd status:
kubectl -n ceph exec -it ceph-rgw-7b9677854f-lcr77 -- ceph osd status
| id |         host        |  used | avail | wr ops | wr data | rd ops | rd data |   state   |
| 0  | |  141M | 94.8G |    0   |     0   |    1   |    16   | exists,up |
| 1  | |  141M | 94.8G |    0   |     0   |    0   |     0   | exists,up |
| 2  | |  141M | 94.8G |    0   |     0   |    0   |     0   | exists,up |

Delete Ceph Tester Pod

kubectl delete -f test/mount-pv-in-pod.yml

Recreate Ceph Tester Pod

kubectl apply -f test/mount-pv-in-pod.yml

View Logs from Previous Pod

kubectl logs -f $(kubectl get po | grep ceph-tester | awk '{print $1}')

The last entries in the log show that the recreated pod found the timestamp written by the previous pod and appended a new one, like:

kubectl logs -f $(kubectl get po | grep ceph-tester | awk '{print $1}')
total 20
drwx------    2 root     root         16384 Feb 25 07:31 lost+found
-rw-r--r--    1 root     root            29 Feb 25 07:33 updated
Filesystem                Size      Used Available Use% Mounted on
/dev/rbd0               975.9M      2.5M    957.4M   0% /testing
last update:
Mon Feb 25 07:33:34 UTC 2019
Mon Feb 25 08:29:27 UTC 2019
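A hypothetical check that turns this into a pass/fail signal by counting the non-empty lines printed after last update:. Two or more means the recreated pod found the previous pod's data on the Ceph volume:

```shell
# count_updates: read the tester pod's log on stdin and print how many timestamps
# follow the "last update:" marker.
count_updates() {
    awk 'found && NF { n++ } /^last update:/ { found = 1 } END { print n + 0 }'
}

# Usage (without -f so the command terminates):
#   pod=$(kubectl get po | grep ceph-tester | awk '{print $1}')
#   [ "$(kubectl logs "$pod" | count_updates)" -ge 2 ] && echo "data persisted"
```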

Cleanup Ceph Tester Pod

kubectl delete -f test/mount-pv-in-pod.yml
kubectl delete -f test/pvc.yml

Kubernetes Ceph Cluster Debugging Guide

Confirm Ceph OSD pods are using the KVM Mounted Disks

If the cluster is in a HEALTH_WARN state with a message that mons are low on available space:

Getting Ceph cluster status:

kubectl -n ceph exec -ti ceph-mon-kjcqq -c ceph-mon -- ceph -s
    id:     747d4fc1-2d18-423a-96fe-43419f8fe9cd
    health: HEALTH_WARN
            mons, are low on available space

Then please confirm that every VM mounted the correct storage disk for Ceph. A likely cause is /etc/fstab entries failing to mount (for example after a cluster reboot), which you can quickly check with:


If you see something like:

Checking Ceph OSD Pod Mountpoints for /dev/vdb1:

checking: ceph-osd-dev-vdb-5dv8l
kubectl -n ceph exec -it ceph-osd-dev-vdb-5dv8l -- df -h /var/lib/ceph/
failed: ceph-osd-dev-vdb-5dv8l is using /dev/mapper/centos-root
checking: ceph-osd-dev-vdb-s77lh
kubectl -n ceph exec -it ceph-osd-dev-vdb-s77lh -- df -h /var/lib/ceph/
failed: ceph-osd-dev-vdb-s77lh is using /dev/mapper/centos-root
checking: ceph-osd-dev-vdb-vxvd7
kubectl -n ceph exec -it ceph-osd-dev-vdb-vxvd7 -- df -h /var/lib/ceph/
failed: ceph-osd-dev-vdb-vxvd7 is using /dev/mapper/centos-root
detected at least one Ceph OSD mount failure
Please review the Ceph debugging guide: for more details on how to fix this issue

Then the correct storage disk(s) failed to mount, and Ceph is keeping its persistent data on the VM's root disk (/dev/mapper/centos-root) instead. This can put your Ceph cluster into the HEALTH_WARN state shown above in the cluster status output.
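The mount check script is not shown. A hypothetical sketch of the same idea, assuming the OSD pods keep the ceph-osd-dev-vdb- name prefix and the correct backing device is /dev/vdb1:

```shell
# mount_source: read `df -P` output on stdin and print the backing device.
# -P keeps each filesystem on one line even when the device name is long.
mount_source() {
    awk 'NR > 1 { dev = $1 } END { print dev }'
}

# check_osd_mounts: exec df in every ceph-osd-dev-vdb-* pod and flag any pod whose
# /var/lib/ceph is not backed by /dev/vdb1. Needs kubectl access to the ceph namespace.
check_osd_mounts() {
    local failed=0 pod dev
    for pod in $(kubectl -n ceph get pods -o name | sed 's|^pod/||' | grep '^ceph-osd-dev-vdb'); do
        dev=$(kubectl -n ceph exec "$pod" -- df -P /var/lib/ceph/ | mount_source)
        if [ "$dev" = "/dev/vdb1" ]; then
            echo "ok: $pod is using $dev"
        else
            echo "failed: $pod is using $dev"
            failed=1
        fi
    done
    return "$failed"
}
```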

To fix this error, either run ./ (if you are OK with reformatting all previous Ceph data on the disks) or fix it manually with the following steps:

  1. Fix /etc/fstab on all vms


    Only run these steps when the cluster can be taken down, as they will interrupt services.

    Confirm the /etc/fstab entry has the correct value:

    grep vdb1 /etc/fstab
    /dev/vdb1 /var/lib/ceph  xfs     defaults    0 0

    On any VM that is missing the /etc/fstab entry, run these commands as root to set it up manually:

  2. Delete the bad mountpoint and recreate it as an empty directory (mount needs the directory to exist): /var/lib/ceph

    rm -rf /var/lib/ceph
    mkdir -p /var/lib/ceph
  3. Add the /dev/vdb1 entry to /etc/fstab

    echo "/dev/vdb1 /var/lib/ceph  xfs     defaults    0 0" | sudo tee -a /etc/fstab
  4. Mount the disk

    mount /dev/vdb1 /var/lib/ceph
  5. Uninstall Ceph


    Running ./ will impact any pods using the ceph-rbd storageClass

  6. Reinstall Ceph or reboot all impacted VMs

  7. Confirm the Mounts Worked
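Steps 1 to 4 can be made safe to re-run. A hypothetical sketch (ensure_fstab_entry and remount_ceph_disk are illustrative names, not installer scripts), assuming it runs as root on each VM while the cluster is down:

```shell
FSTAB_LINE="/dev/vdb1 /var/lib/ceph  xfs     defaults    0 0"

# Append the Ceph entry unless a /dev/vdb1 -> /var/lib/ceph line already exists.
# Fields are compared with awk so different spacing does not create a duplicate.
ensure_fstab_entry() {
    local fstab="${1:-/etc/fstab}"
    if awk '$1 == "/dev/vdb1" && $2 == "/var/lib/ceph" { found = 1 } END { exit !found }' "$fstab"; then
        return 0
    fi
    printf '%s\n' "$FSTAB_LINE" >> "$fstab"
}

# Mount /var/lib/ceph from fstab if it is not already a mountpoint.
# DESTRUCTIVE: removes whatever Ceph wrote to the root disk at /var/lib/ceph.
remount_ceph_disk() {
    if mountpoint -q /var/lib/ceph; then
        return 0
    fi
    rm -rf /var/lib/ceph
    mkdir -p /var/lib/ceph   # mount(8) needs an existing directory
    mount /var/lib/ceph      # resolves the device from the /etc/fstab entry
}
```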


The ceph-tester failed to start

If your integration test fails to mount the test persistent volume, follow these steps to debug the issue:

Check if the ceph-mon service is missing a ClusterIP:

kubectl get svc -n ceph
ceph-mon   ClusterIP   None            <none>        6789/TCP   11m
ceph-rgw   ClusterIP   <none>        8088/TCP   11m

See if the ceph-tester pod's events show the error:

kubectl describe po ceph-tester

They may show why it failed, for example:

server name not found: ceph-mon.ceph.svc.cluster.local

If ceph-mon.ceph.svc.cluster.local is not found, manually add it to /etc/hosts on all nodes.

m1 node:

# on m1 /etc/hosts add:    ceph-mon.ceph.svc.cluster.local

Confirm connectivity

telnet ceph-mon.ceph.svc.cluster.local 6789

m2 node:

# on m2 /etc/hosts add:    ceph-mon.ceph.svc.cluster.local

Confirm connectivity

telnet ceph-mon.ceph.svc.cluster.local 6789

m3 node:

# on m3 /etc/hosts add:    ceph-mon.ceph.svc.cluster.local

Confirm connectivity

telnet ceph-mon.ceph.svc.cluster.local 6789
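If telnet is not installed, a hypothetical alternative is bash's /dev/tcp redirection wrapped in a timeout, assuming bash and coreutils timeout are present on the node (check_mon_port is an illustrative name):

```shell
# check_mon_port [host] [port]: succeed if a TCP connection opens within 3 seconds.
# A name that does not resolve also reports "unreachable".
check_mon_port() {
    local host="${1:-ceph-mon.ceph.svc.cluster.local}" port="${2:-6789}"
    if timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
        echo "reachable: $host:$port"
    else
        echo "unreachable: $host:$port"
        return 1
    fi
}

# Usage on each node: check_mon_port
```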

If connectivity is fixed on all the Kubernetes nodes, please uninstall with ./ and then reinstall with ./

If not please continue to the next debugging section below.

Orphaned fdisk Processes

If you have to use ./ -f to uninstall and re-partition the disk images, there is a chance that the partitioning tool fdisk hangs. If this happens, ./ -f should hang too and (hopefully) be noticed by the user or the script.

When my cluster hits this issue, I have to reboot the server.


This guide does not handle single kubernetes vm outages at the moment.

For the record, here are some attempts to kill this process:

root@master3:~# ps auwwx | grep fdisk
root     18516  0.0  0.0 112508   976 ?        D    06:33   0:00 fdisk /dev/vdb
root     21957  0.0  0.0 112704   952 pts/1    S+   06:37   0:00 grep --color fdisk
root@master3:~# kill -9 18516
root@master3:~# ps auwwx | grep fdisk
root     18516  0.0  0.0 112508   976 ?        D    06:33   0:00 fdisk /dev/vdb
root     22031  0.0  0.0 112704   952 pts/1    S+   06:37   0:00 grep --color fdisk
root@master3:~# strace -p 18516
strace: Process 18516 attached
# no more logs after waiting +60 seconds
strace: Process 18516 attached
[1]+  Stopped                 strace -p 18516
# so did strace just die by touching that pid?

What is fdisk using on the filesystem?

Notice that fdisk's stdin, stdout and stderr (file descriptors 0, 1 and 2) are all pipes below. Speculation: is fdisk waiting on a prompt over an ssh session that has since closed (I am guessing, but who knows)?

root@master3:~# lsof -p 18516
fdisk   18516 root  cwd    DIR  253,0       271 100663361 /root
fdisk   18516 root  rtd    DIR  253,0       285        64 /
fdisk   18516 root  txt    REG  253,0    200456  33746609 /usr/sbin/fdisk
fdisk   18516 root  mem    REG  253,0 106070960      1831 /usr/lib/locale/locale-archive
fdisk   18516 root  mem    REG  253,0   2173512  33556298 /usr/lib64/
fdisk   18516 root  mem    REG  253,0     20112  33556845 /usr/lib64/
fdisk   18516 root  mem    REG  253,0    261488  33556849 /usr/lib64/
fdisk   18516 root  mem    REG  253,0    164240  33556291 /usr/lib64/
fdisk   18516 root    0r  FIFO    0,9       0t0    847143 pipe
fdisk   18516 root    1w  FIFO    0,9       0t0    845563 pipe
fdisk   18516 root    2w  FIFO    0,9       0t0    845564 pipe
fdisk   18516 root    3u   BLK 252,16     0t512      1301 /dev/vdb

Stop the strace first, since a process can only have one tracer and strace would prevent gdb from attaching next:

root@master3:~# ps auwwx | grep 26177
root     14082  0.0  0.0 112704   952 pts/0    S+   07:02   0:00 grep --color 26177
root     26177  0.0  0.0   7188   600 ?        S    06:41   0:00 strace -p 18516
root@master3:~# kill -9 26177

gdb also hangs when trying the approach from this Stack Overflow question:

gdb -p 18516
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-110.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
Attaching to process 18516

If a VM gets to this point, the server has to be rebooted.

Below are other operational debugging tools that were used during cluster startup:

Check osd pods

When setting up new devices with Kubernetes you will likely see the OSD pods failing. Here is a tool to quickly describe one of the pods:
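The describe tool is not shown. A hypothetical sketch that picks the pods whose STATUS is neither Running nor Completed (it does not catch Running pods that are not ready):

```shell
# unhealthy_pods: read `kubectl get pods` output on stdin and print the names of
# pods whose STATUS column is neither Running nor Completed.
unhealthy_pods() {
    awk 'NR > 1 && $3 != "Running" && $3 != "Completed" { print $1 }'
}

# Usage:
#   pod=$(kubectl -n ceph get pods | unhealthy_pods | grep '^ceph-osd' | head -n 1)
#   [ -n "$pod" ] && kubectl -n ceph describe pod "$pod"
```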


Watch the Ceph Mon Logs with Kubetail

kubetail ceph-mon -c cluster-log-tailer -n ceph

Attach Successful but Mounting a Ceph PVC fails

Even if the cluster is stable, your PVs can attach but then fail to mount with events like:

Type     Reason                  Age                 From                          Message
----     ------                  ----                ----                          -------
Normal   Scheduled               3m25s               default-scheduler             Successfully assigned default/busybox-mount to
Normal   SuccessfulAttachVolume  3m25s               attachdetach-controller       AttachVolume.Attach succeeded for volume "pvc-907ae639-3880-11e9-85a5-525400275ad4"
Warning  FailedMount             82s                 kubelet,  Unable to mount volumes for pod "busybox-mount_default(24ac4333-3881-11e9-85a5-525400275ad4)": timeout expired waiting for volumes to attach or mount for pod "default"/"busybox-mount". list of unmounted volumes=[storage]. list of unattached volumes=[storage default-token-6f9vj]
Warning  FailedMount             45s (x8 over 109s)  kubelet,  MountVolume.WaitForAttach failed for volume "pvc-907ae639-3880-11e9-85a5-525400275ad4" : fail to check rbd image status with: (executable file not found in $PATH), rbd output: ()

To fix this please:

  1. Install ceph-common on each kubernetes node.

  2. Uninstall the ceph cluster with:

    ./ -f
  3. Delete remaining PVs

    kubectl delete --ignore-not-found pv $(kubectl get pv | grep ceph-rbd | grep -v rook | awk '{print $1}')
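The missing executable in that event is rbd, which the in-tree Kubernetes RBD volume plugin runs on the node and which ships in ceph-common. A hypothetical pre-flight check to run on each node:

```shell
# check_rbd_binary: succeed if the rbd CLI is on PATH, otherwise explain what is missing.
check_rbd_binary() {
    if command -v rbd > /dev/null 2>&1; then
        echo "rbd found: $(command -v rbd)"
    else
        echo "rbd not found in PATH; install ceph-common on this node" >&2
        return 1
    fi
}
```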

Previous Cluster Cleanup Failed

Please run the if you see this kind of error when running the

Getting Ceph cluster status:

kubectl -n ceph exec -ti ceph-mon-p9tvw -c ceph-mon -- ceph -s
2019-02-24 06:02:12.468777 7f90f6509700  0 librados: client.admin authentication error (1) Operation not permitted
[errno 1] error connecting to the cluster
command terminated with exit code 1

OSD Issues

When debugging ceph osd issues, please start by reviewing the pod logs with:


OSD Pool Failed to Initialize

Depending on the number of disks and the capacity of the Ceph cluster, creating the OSD pool for the first time may fail during this command:

kubectl -n ceph exec -ti ${pod_name} -c ceph-mon -- ceph osd pool create rbd 256

With an error like:

creating osd pool
Error ERANGE:  pg_num 256 size 3 would mean 840 total pgs, which exceeds max 600 (mon_max_pg_per_osd 200 * num_in_osds 3)
command terminated with exit code 34
initializing osd
rbd: error opening default pool 'rbd'
Ensure that the default pool has been created or specify an alternate pool name.
command terminated with exit code 2

Please reduce the pg_num at the end of ceph osd pool create rbd 256 so the total stays within mon_max_pg_per_osd * num_in_osds, for example:

kubectl -n ceph exec -ti ${pod_name} -c ceph-mon -- ceph osd pool create rbd 100
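The arithmetic behind the ERANGE message, worked through with the numbers it reports (max_new_pg_num is a hypothetical helper):

```shell
# The error message implies this check:
#   existing + pg_num * size must not exceed mon_max_pg_per_osd * num_in_osds
# From the message: existing = 840 - 256 * 3 = 72, limit = 200 * 3 = 600.
# max_new_pg_num prints the largest pg_num that still fits.
max_new_pg_num() {
    local max_per_osd="$1" num_osds="$2" existing="$3" size="$4"
    echo $(( (max_per_osd * num_osds - existing) / size ))
}

# Example: max_new_pg_num 200 3 72 3 prints 176, so 100 fits (72 + 300 = 372)
# and 128 is the largest power of two that fits (72 + 384 = 456).
```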

OSD Pod Prepare is Unable to Zap

To fix the error below, make sure ceph-overrides.yaml points at the whole /dev/vdb device rather than the /dev/vdb1 partition, since ceph-disk cannot zap a partition:

Traceback (most recent call last):
File "/usr/sbin/ceph-disk", line 9, in <module>
    load_entry_point('ceph-disk==1.0.0', 'console_scripts', 'ceph-disk')()
File "/usr/lib/python2.7/dist-packages/ceph_disk/", line 5717, in run
File "/usr/lib/python2.7/dist-packages/ceph_disk/", line 5668, in main
File "/usr/lib/python2.7/dist-packages/ceph_disk/", line 4737, in main_zap
File "/usr/lib/python2.7/dist-packages/ceph_disk/", line 1681, in zap
    raise Error('not full block device; cannot zap', dev)
ceph_disk.main.Error: Error: not full block device; cannot zap: /dev/vdb1
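A hypothetical ceph-overrides.yaml fragment in the osd_devices layout used by the ceph-helm chart, plus a small check that a device path is a whole disk. The name and device values are assumptions inferred from the ceph-osd-dev-vdb-* pod names; zap: "1" wipes the disk:

```shell
# Print the assumed osd_devices fragment (whole device, not a partition).
osd_devices_fragment() {
    cat <<'EOF'
osd_devices:
  - name: dev-vdb
    device: /dev/vdb
    zap: "1"
EOF
}

# is_whole_disk: succeed for /dev/sdX or /dev/vdX paths without a partition number.
is_whole_disk() {
    case "$1" in
        /dev/[sv]d[a-z]) return 0 ;;
        *) return 1 ;;
    esac
}
```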

OSD unable to find IP Address

To fix the error below, remove the network definitions from ceph-overrides.yaml.

+ exec /usr/bin/ceph-osd --cluster ceph -f -i 2 --setuser ceph --setgroup disk
2019-02-24 08:53:40.592021 7f4313687e00 -1 unable to find any IP address in networks '' interfaces ''

Cluster Status Tools

Show All
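The show-all script is not shown. A hypothetical helper that runs the four status commands in this section through one pod, assuming that pod (an rgw pod in the examples) has the ceph and rados CLIs and an admin keyring:

```shell
# DRY_RUN=1 prints the kubectl commands instead of running them.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

show_all() {
    local pod="${1:?usage: show_all <pod-name>}" cmd
    for cmd in "ceph status" "ceph df" "ceph osd status" "rados df"; do
        echo "Getting ${cmd}:"
        # $cmd is left unquoted on purpose so it splits into separate arguments
        run kubectl -n ceph exec "$pod" -- $cmd
    done
}
```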


Show Cluster Status

Getting Ceph status:
kubectl -n ceph exec -it ceph-rgw-7b9677854f-k6hj7 -- ceph status
    id:     384880f1-23f3-4a83-bff8-93624120a4cf
    health: HEALTH_OK

    mon: 3 daemons, quorum,,
    mds: cephfs-1/1/1 up  {0=mds-ceph-mds-85b4fbb478-9fhf4=up:active}
    osd: 3 osds: 3 up, 3 in
    rgw: 1 daemon active

    pools:   6 pools, 48 pgs
    objects: 208 objects, 3359 bytes
    usage:   324 MB used, 284 GB / 284 GB avail
    pgs:     48 active+clean

Show Ceph DF

Getting Ceph df:
kubectl -n ceph exec -it ceph-rgw-7b9677854f-k6hj7 -- ceph df
GLOBAL:
    SIZE     AVAIL     RAW USED     %RAW USED
    284G      284G         323M          0.11
POOLS:
    NAME                    ID     USED     %USED     MAX AVAIL     OBJECTS
    .rgw.root               1      1113         0        92261M           4
    cephfs_data             2         0         0        92261M           0
    cephfs_metadata         3      2246         0        92261M          21
    default.rgw.control     4         0         0        92261M           8
    default.rgw.meta        5         0         0        92261M           0
    default.rgw.log         6         0         0        92261M           0

Show Ceph OSD Status

Getting Ceph osd status:
kubectl -n ceph exec -it ceph-rgw-7b9677854f-k6hj7 -- ceph osd status
| id |         host        |  used | avail | wr ops | wr data | rd ops | rd data |   state   |
| 0  | |  107M | 94.8G |    1   |    18   |    0   |    13   | exists,up |
| 1  | |  107M | 94.8G |    3   |   337   |    0   |     0   | exists,up |
| 2  | |  108M | 94.8G |    5   |   315   |    1   |   353   | exists,up |

Show Ceph Rados DF

Getting Ceph rados df:
kubectl -n ceph exec -it ceph-rgw-7b9677854f-k6hj7 -- rados df
POOL_NAME          USED OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS   RD WR_OPS   WR
.rgw.root           1113       4      0     12                  0       0        0     12 8192      4 4096
cephfs_data            0       0      0      0                  0       0        0      0    0      0    0
cephfs_metadata     2246      21      0     63                  0       0        0      0    0     42 8192
default.rgw.control    0       8      0     24                  0       0        0      0    0      0    0

total_objects    33
total_used       323M
total_avail      284G
total_space      284G


Uninstall Ceph

To uninstall the Ceph cluster and leave the mounted KVM disks /dev/vdb untouched:


Uninstall and Reformat KVM Images

To uninstall the ceph cluster and reformat the mounted KVM disks /dev/vdb:


Running this will destroy all data across the cluster by reformatting the /dev/vdb block devices in each vm

./ -f