(1) Check the cluster status: two OSDs are reported down
[root@node140 /]# ceph -s
  cluster:
    id:     58a12719-a5ed-4f95-b312-6efd6e34e558
    health: HEALTH_ERR
            noout flag(s) set
            2 osds down
            1 scrub errors
            Possible data damage: 1 pg inconsistent
            Degraded data redundancy: 1633/10191 objects degraded (16.024%), 84 pgs degraded, 122 pgs undersized

  services:
    mon: 2 daemons, quorum node140,node142 (age 3d)
    mgr: admin(active, since 3d), standbys: node140
    osd: 18 osds: 16 up (since 3d), 18 in (since 5d)
         flags noout

  data:
    pools:   2 pools, 384 pgs
    objects: 3.40k objects, 9.8 GiB
    usage:   43 GiB used, 8.7 TiB / 8.7 TiB avail
    pgs:     1633/10191 objects degraded (16.024%)
             261 active+clean
             84  active+undersized+degraded
             38  active+undersized
             1   active+clean+inconsistent
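To see exactly which OSDs are down and which PG is inconsistent, the HEALTH_ERR summary can be broken down per OSD and per PG (output omitted here, since the listing depends on the cluster state at that moment):
ceph health detail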
(2) Check the OSD tree
[root@node140 /]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 9.80804 root default
-2 3.26935 host node140
0 hdd 0.54489 osd.0 up 1.00000 1.00000
1 hdd 0.54489 osd.1 up 1.00000 1.00000
2 hdd 0.54489 osd.2 up 1.00000 1.00000
3 hdd 0.54489 osd.3 up 1.00000 1.00000
4 hdd 0.54489 osd.4 up 1.00000 1.00000
5 hdd 0.54489 osd.5 up 1.00000 1.00000
-3 3.26935 host node141
12 hdd 0.54489 osd.12 up 1.00000 1.00000
13 hdd 0.54489 osd.13 up 1.00000 1.00000
14 hdd 0.54489 osd.14 up 1.00000 1.00000
15 hdd 0.54489 osd.15 up 1.00000 1.00000
16 hdd 0.54489 osd.16 up 1.00000 1.00000
17 hdd 0.54489 osd.17 up 1.00000 1.00000
-4 3.26935 host node142
6 hdd 0.54489 osd.6 up 1.00000 1.00000
7 hdd 0.54489 osd.7 down 1.00000 1.00000
8 hdd 0.54489 osd.8 down 1.00000 1.00000
9 hdd 0.54489 osd.9 up 1.00000 1.00000
10 hdd 0.54489 osd.10 up 1.00000 1.00000
11 hdd 0.54489 osd.11 up 1.00000 1.00000
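If it is not obvious from the tree which host a down OSD lives on, ceph osd find reports its host and CRUSH location (a quick cross-check; the IDs here match the down OSDs above):
ceph osd find 7
ceph osd find 8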
(3) Check osd.7 and osd.8: the daemons have already failed and will not come back up after a restart
[root@node140 /]# systemctl status ceph-osd@8.service
● ceph-osd@8.service - Ceph object storage daemon osd.8
Loaded: loaded (/usr/lib/systemd/system/ceph-osd@.service; enabled; vendor preset: disabled)
Active: failed (Result: start-limit) since Fri 2019-08-30 17:36:50 CST; 1min 20s ago
Process: 433642 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id %i (code=exited, status=1/FAILURE)
Aug 30 17:36:50 node140 systemd[1]: Failed to start Ceph object storage daemon osd.8.
Aug 30 17:36:50 node140 systemd[1]: Unit ceph-osd@8.service entered failed state.
Aug 30 17:36:50 node140 systemd[1]: ceph-osd@8.service failed.
Aug 30 17:36:50 node140 systemd[1]: ceph-osd@8.service holdoff time over, scheduling restart.
Aug 30 17:36:50 node140 systemd[1]: Stopped Ceph object storage daemon osd.8.
Aug 30 17:36:50 node140 systemd[1]: start request repeated too quickly for ceph-osd@8.service
Aug 30 17:36:50 node140 systemd[1]: Failed to start Ceph object storage daemon osd.8.
Aug 30 17:36:50 node140 systemd[1]: Unit ceph-osd@8.service entered failed state.
Aug 30 17:36:50 node140 systemd[1]: ceph-osd@8.service failed.
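The systemd status alone does not say why the prestart script failed; the unit's journal, the OSD's own log, and the kernel log usually do. A typical check (log path assumes the default /var/log/ceph location) looks like:
journalctl -u ceph-osd@8.service --no-pager -n 50
tail -n 50 /var/log/ceph/ceph-osd.8.log
dmesg | grep -iE 'sd[a-z]|i/o error'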
(4) How the OSD state changes after a disk failure
When an OSD's disk fails, the OSD is marked down. After the interval set by mon osd down out interval elapses, Ceph marks the OSD out and starts migrating and recovering its data. To limit the impact, this can be held off until the disk has been replaced, and then re-enabled.
[root@node140 /]# cat /etc/ceph/ceph.conf
[global]
mon osd down out interval = 900
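The value in ceph.conf is only read when a daemon starts; the currently active value can be read from the monitor's admin socket (run on the mon host; mon.node140 is this cluster's monitor, adjust the name for your own):
ceph daemon mon.node140 config get mon_osd_down_out_interval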
(5) Pause rebalancing and recovery
[root@node140 /]# for i in noout nobackfill norecover noscrub nodeep-scrub;do ceph osd set $i;done
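Whether the flags actually took effect can be confirmed from the osdmap (they also appear in the flags line of ceph -s):
ceph osd dump | grep flags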
(6) Locate the failed disks
[root@node140 /]# ceph osd tree | grep -i down
7 hdd 0.54489 osd.7 down 0 1.00000
8 hdd 0.54489 osd.8 down 0 1.00000
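The REWEIGHT column of 0 shows both OSDs have already been marked out (either automatically after mon osd down out interval or by hand). If they were still in, they could be marked out explicitly before removal:
ceph osd out 7
ceph osd out 8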
(7) Unmount the failed OSDs' data directories (on node142)
[root@node142 ~]# umount /var/lib/ceph/osd/ceph-7
[root@node142 ~]# umount /var/lib/ceph/osd/ceph-8
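If the failed ceph-osd units are still enabled, it can also help to stop and disable them so systemd stops retrying and does not bring them back after a reboot (a sketch, run on node142):
systemctl stop ceph-osd@7.service ceph-osd@8.service
systemctl disable ceph-osd@7.service ceph-osd@8.service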
(8) Remove the OSDs from the CRUSH map
[root@node142 ~]# ceph osd crush remove osd.7
removed item id 7 name 'osd.7' from crush map
[root@node142 ~]# ceph osd crush remove osd.8
removed item id 8 name 'osd.8' from crush map
(9) Delete the failed OSDs' authentication keys
[root@node142 ~]# ceph auth del osd.7
updated
[root@node142 ~]# ceph auth del osd.8
updated
(10) Remove the failed OSDs from the cluster
[root@node142 ~]# ceph osd rm 7
removed osd.7
[root@node142 ~]# ceph osd rm 8
removed osd.8
[root@node142 ~]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 8.71826 root default
-2 3.26935 host node140
0 hdd 0.54489 osd.0 up 1.00000 1.00000
1 hdd 0.54489 osd.1 up 1.00000 1.00000
2 hdd 0.54489 osd.2 up 1.00000 1.00000
3 hdd 0.54489 osd.3 up 1.00000 1.00000
4 hdd 0.54489 osd.4 up 1.00000 1.00000
5 hdd 0.54489 osd.5 up 1.00000 1.00000
-3 3.26935 host node141
12 hdd 0.54489 osd.12 up 1.00000 1.00000
13 hdd 0.54489 osd.13 up 1.00000 1.00000
14 hdd 0.54489 osd.14 up 1.00000 1.00000
15 hdd 0.54489 osd.15 up 1.00000 1.00000
16 hdd 0.54489 osd.16 up 1.00000 1.00000
17 hdd 0.54489 osd.17 up 1.00000 1.00000
-4 2.17957 host node142
6 hdd 0.54489 osd.6 up 1.00000 1.00000
9 hdd 0.54489 osd.9 up 1.00000 1.00000
10 hdd 0.54489 osd.10 up 1.00000 1.00000
11 hdd 0.54489 osd.11 up 1.00000 1.00000
[root@node142 ~]#
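On Luminous and later, steps (8) through (10) can also be collapsed into a single command per OSD: ceph osd purge removes the OSD from the CRUSH map, deletes its auth key, and removes it from the osdmap in one go. An alternative to the manual steps above:
ceph osd purge 7 --yes-i-really-mean-it
ceph osd purge 8 --yes-i-really-mean-it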
(11) Replace the failed disks, confirm the new device names (see the check below), then recreate the OSDs
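Before creating the new OSDs, it is worth confirming which device names the replacement disks received and, if they carry leftover partitions or LVM metadata, wiping them first (assuming the new disks show up as /dev/sdc and /dev/sdd, as used below):
lsblk
ceph-volume lvm zap /dev/sdc --destroy
ceph-volume lvm zap /dev/sdd --destroy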
[root@node142 ~]# ceph-volume lvm create --data /dev/sdd
[root@node142 ~]# ceph-volume lvm create --data /dev/sdc
(12) Confirm the new OSDs were created
[root@node142 ~]# ceph-volume lvm list
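ceph-volume lvm list should now show new OSD entries backed by /dev/sdc and /dev/sdd; whether the new OSDs also started and rejoined the cluster can be double-checked with:
ceph osd tree
ceph -s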
(13) Once the new OSDs have been added to the CRUSH map, unset the cluster flags
for i in noout nobackfill norecover noscrub nodeep-scrub;do ceph osd unset $i;done
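After the flags are cleared, backfill and recovery start; progress can be followed until the cluster returns to HEALTH_OK. The "1 pg inconsistent" from the original health output is a separate issue: once recovery settles, ceph health detail names the affected PG, and it can then be repaired (here <pgid> is a placeholder for that PG's id):
ceph -s
ceph health detail
ceph pg repair <pgid>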