Update 2024-08-10-recovering-ceph-cluster.md

0x3bb 2024-08-10 14:28:24 +00:00
parent df00d13f84
commit 9ee96d37a5

@@ -33,8 +33,7 @@ the old OSDs from the tree and letting replication work its magic.
 I noticed although the block pool was replicated, we had lost all our RADOS
 object storage.
+![](/b/images/ec-2-1.png)
-![](./images/ec-2-1.png)
 The erasure-coding profile was `k=2, m=1`. That meant we could only lose 2
 OSDs, which had already happened.
@@ -76,7 +75,7 @@ pods, I saw (but did not understand) the problem:
 `osd/ECUtil.h: 34: FAILED ceph_assert(stripe_width % stripe_size == 0)`
-![](./images/ec-3-2.png)
+![](/b/images/ec-3-2.png)
 With the _"fixed"_ configuration, what I had actually done is split the object
 store pool across _5_ OSDs. We had _3_.
@@ -534,6 +533,7 @@ I rebuilt the mon data, using the existing RocksDB kv store.
 This would have worked without the backup, but I was interested to see the
 `osdmaps` trimmed due to the other 2 removed OSDs.
+```
 [root@he-prod-k3s-controlplane-ch-a-1 ceph]# ceph-objectstore-tool --type bluestore --data-path /var/lib/ceph/osd/ceph-0/ --op update-mon-db --mon-store-path /tmp/mon-a/data/
 osd.0 : 3099 osdmaps trimmed, 635 osdmaps added.
 ```