How Linux Detects Processes Deadlocked in the D State


A Linux process can be in one of several states, such as TASK_RUNNING (running or runnable), EXIT_DEAD (exited), and TASK_INTERRUPTIBLE (sleeping but able to receive signals); see include/linux/sched.h. One of the sleeping states, TASK_UNINTERRUPTIBLE, is known as the D state. A process in this state does not respond to signals and can only be woken by wake_up. Many code paths put a process into this state: a mutex may do so, and so may a wait for an I/O resource to become ready (the wait_event mechanism). Normally a process stays in this state only briefly, but if an I/O device fails or processes deadlock, a process can remain in it indefinitely and never return to TASK_RUNNING. To make such cases easy to spot, the kernel provides the hung task mechanism, which detects processes that stay in the D state for a long time and issues a warning. This article analyzes the source code of the hung task mechanism and gives a demonstration.

I. Analysis of the hung task mechanism

The hung task mechanism was introduced into the kernel very early on. This article analyzes the Linux 4.1.15 source as an example. The code is short and lives in kernel/hung_task.c.

First, the overall flow and the design idea:

[Figure: flow chart of D-state deadlock detection]

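The core idea is to create a kernel monitoring thread that periodically scans every process (task) in the D state and looks at how many times each was scheduled between two scans. If a task was not scheduled at all between two scans, it has been in the D state the whole time and has very likely deadlocked. The thread then prints a warning to the log with the basic information of the process, its stack backtrace, and its saved register contents, so that kernel developers can locate the problem.

The implementation is analyzed in detail below: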

    static int __init hung_task_init(void)
    {
        atomic_notifier_chain_register(&panic_notifier_list, &panic_block);
        watchdog_task = kthread_run(watchdog, NULL, "khungtaskd");

        return 0;
    }
    subsys_initcall(hung_task_init);

If the mechanism is enabled in the kernel configuration, hung_task_init() is called during the subsys initcall stage of kernel initialization. It first registers a callback on the kernel's panic_notifier_list notifier chain:

    static struct notifier_block panic_block = {
        .notifier_call = hung_task_panic,
    };

When the kernel panics, hung_task_panic() is called; its role is covered later. Initialization then calls kthread_run() to create a kernel thread named khungtaskd, which runs watchdog() and is made runnable immediately. This is the background kernel thread dedicated to detecting processes deadlocked in the D state.

    /*
     * kthread which checks for tasks stuck in D state
     */
    static int watchdog(void *dummy)
    {
        set_user_nice(current, 0);

        for ( ; ; ) {
            unsigned long timeout = sysctl_hung_task_timeout_secs;

            while (schedule_timeout_interruptible(timeout_jiffies(timeout)))
                timeout = sysctl_hung_task_timeout_secs;

            if (atomic_xchg(&reset_hung_task, 0))
                continue;

            check_hung_uninterruptible_tasks(timeout);
        }

        return 0;
    }

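The thread first sets its nice value to 0, i.e. normal priority, so that it does not interfere with other processes. It then enters the main loop, which runs once per timeout period. Each iteration first sleeps for sysctl_hung_task_timeout_secs seconds; the default comes from the kernel configuration option CONFIG_DEFAULT_HUNG_TASK_TIMEOUT and is 120 s. If the sleep is cut short, the timeout is re-read and the thread sleeps again. Once the sleep completes, the thread tests the atomic flag reset_hung_task; if it is set, the thread clears it and skips this round of detection. The flag is set through reset_hung_task_detector() (according to the author, no other kernel code uses this interface yet):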

    void reset_hung_task_detector(void)
    {
        atomic_set(&reset_hung_task, 1);
    }
    EXPORT_SYMBOL_GPL(reset_hung_task_detector);
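The set-then-consume behaviour of this flag can be tried out in userspace. The following is a minimal sketch using C11 atomics rather than the kernel's atomic_t API; the names reset_flag, request_skip() and should_skip_round() are hypothetical and only mirror the roles of reset_hung_task, reset_hung_task_detector() and the atomic_xchg() test in watchdog().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-in for the kernel's reset_hung_task flag (hypothetical name). */
static atomic_int reset_flag;

/* Plays the role of reset_hung_task_detector(): ask for one round to be skipped. */
static void request_skip(void)
{
    atomic_store(&reset_flag, 1);
}

/*
 * Plays the role of the atomic_xchg() test in watchdog(): read the flag and
 * clear it in a single atomic step, so each request is consumed exactly once.
 */
static bool should_skip_round(void)
{
    return atomic_exchange(&reset_flag, 0) != 0;
}
```

Because the read and the clear happen in one exchange, a request made while the watchdog is asleep skips only the next round, and a request cannot be lost between a separate read and write.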

The last step of the loop is the detection function check_hung_uninterruptible_tasks(), whose argument is the detection timeout.

    /*
     * Check whether a TASK_UNINTERRUPTIBLE does not get woken up for
     * a really long time (120 seconds). If that happens, print out
     * a warning.
     */
    static void check_hung_uninterruptible_tasks(unsigned long timeout)
    {
        int max_count = sysctl_hung_task_check_count;
        int batch_count = HUNG_TASK_BATCHING;
        struct task_struct *g, *t;

        /*
         * If the system crashed already then all bets are off,
         * do not report extra hung tasks:
         */
        if (test_taint(TAINT_DIE) || did_panic)
            return;

        rcu_read_lock();
        for_each_process_thread(g, t) {
            if (!max_count--)
                goto unlock;
            if (!--batch_count) {
                batch_count = HUNG_TASK_BATCHING;
                if (!rcu_lock_break(g, t))
                    goto unlock;
            }
            /* use "==" to skip the TASK_KILLABLE tasks waiting on NFS */
            if (t->state == TASK_UNINTERRUPTIBLE)
                check_hung_task(t, timeout);
        }
     unlock:
        rcu_read_unlock();
    }

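The function first checks whether the kernel has already died (TAINT_DIE) or panicked. If so, the kernel has already crashed, further detection is pointless, and the function returns immediately. The did_panic flag is set in hung_task_panic(), the panic notifier callback seen earlier: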

    static int
    hung_task_panic(struct notifier_block *this, unsigned long event, void *ptr)
    {
        did_panic = 1;

        return NOTIFY_DONE;
    }

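If the kernel has not crashed, the function walks every task in the system. The walk runs under the RCU read lock, so to avoid holding the lock for too long when there are many tasks, a batch_count is used: after every HUNG_TASK_BATCHING tasks, rcu_lock_break() is called so that the lock is not held continuously. The user can also cap the total number of tasks checked with max_count = sysctl_hung_task_check_count, which defaults to the maximum number of PIDs, PID_MAX_LIMIT, and can be changed with the sysctl command.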
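The interplay of the two counters is easy to get wrong by one, so here is a small userspace sketch of the same loop shape. It is an illustration only: scan_tasks() and struct scan_result are hypothetical names, the task list is replaced by a plain count, and the lock break is merely counted instead of performed.

```c
/* Result of one simulated scan (hypothetical type, for illustration only). */
struct scan_result {
    int checked;      /* tasks that reached the state check */
    int lock_breaks;  /* times the kernel would have called rcu_lock_break() */
};

/*
 * Walk ntasks tasks with the same counter handling as
 * check_hung_uninterruptible_tasks(): stop once max_count is used up,
 * and note a lock break at the end of every batch.
 * batch must be at least 1.
 */
static struct scan_result scan_tasks(int ntasks, int max_count, int batch)
{
    struct scan_result r = { 0, 0 };
    int batch_count = batch;

    for (int i = 0; i < ntasks; i++) {
        if (!max_count--)          /* overall budget exhausted */
            break;
        if (!--batch_count) {      /* end of a batch */
            batch_count = batch;
            r.lock_breaks++;
        }
        r.checked++;               /* kernel: test t->state here */
    }
    return r;
}
```

With 10 tasks, a budget of 100 and a batch size of 4, all 10 tasks are checked and the lock is broken twice (at the 4th and 8th task); with a budget of 5, the scan stops after 5 tasks.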

The function iterates over all tasks with for_each_process_thread() and applies the timeout check only to tasks whose state is TASK_UNINTERRUPTIBLE, calling check_hung_task() with the task_struct and the timeout (120 s):

    static void check_hung_task(struct task_struct *t, unsigned long timeout)
    {
        unsigned long switch_count = t->nvcsw + t->nivcsw;

        /*
         * Ensure the task is not frozen.
         * Also, skip vfork and any other user process that freezer should skip.
         */
        if (unlikely(t->flags & (PF_FROZEN | PF_FREEZER_SKIP)))
            return;

        /*
         * When a freshly created task is scheduled once, changes its state to
         * TASK_UNINTERRUPTIBLE without having ever been switched out once, it
         * musn't be checked.
         */
        if (unlikely(!switch_count))
            return;

        if (switch_count != t->last_switch_count) {
            t->last_switch_count = switch_count;
            return;
        }

        trace_sched_process_hang(t);

        if (!sysctl_hung_task_warnings)
            return;

        if (sysctl_hung_task_warnings > 0)
            sysctl_hung_task_warnings--;

The sum of t->nvcsw and t->nivcsw is the total number of context switches the task has undergone since it was created: t->nvcsw counts the times the task voluntarily gave up the CPU, and t->nivcsw counts the times it was preempted. The function then applies two filters: (1) a frozen task is skipped; (2) a task whose switch count is 0 is skipped.
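The same two counters are visible from userspace in /proc/<pid>/status as the voluntary_ctxt_switches and nonvoluntary_ctxt_switches fields. As a rough sketch, the helper below extracts them from status text that the caller has already read into a string and adds them up the way the kernel does; status_field() and switch_count() are hypothetical helper names, not kernel or libc functions.

```c
#include <stdio.h>
#include <string.h>

/*
 * Find a line starting with "<key>:" in /proc/<pid>/status-style text and
 * parse the unsigned number after it. Returns 0 on success, -1 otherwise.
 */
static int status_field(const char *text, const char *key, unsigned long *out)
{
    size_t klen = strlen(key);
    const char *line = text;

    while (line && *line) {
        if (strncmp(line, key, klen) == 0 && line[klen] == ':')
            return sscanf(line + klen + 1, "%lu", out) == 1 ? 0 : -1;
        line = strchr(line, '\n');   /* advance to the next line, if any */
        if (line)
            line++;
    }
    return -1;
}

/* Userspace counterpart of t->nvcsw + t->nivcsw. */
static int switch_count(const char *text, unsigned long *total)
{
    unsigned long voluntary, involuntary;

    if (status_field(text, "voluntary_ctxt_switches", &voluntary) ||
        status_field(text, "nonvoluntary_ctxt_switches", &involuntary))
        return -1;
    *total = voluntary + involuntary;
    return 0;
}
```

Sampling this sum twice, one timeout apart, and finding it unchanged for a task that is in the D state gives the same signal from userspace that the kernel thread relies on.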

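Next, the function compares the switch count saved at the previous check with the current one. If they differ, the task was scheduled during this timeout period (120 s), so the saved count is updated and the function returns. If they are equal, the task has not been scheduled for a full timeout period and has been in the D state the whole time. trace_sched_process_hang() then fires the sched_process_hang tracepoint. After that the function checks sysctl_hung_task_warnings, the number of warnings still to be issued, which the user can also set with the sysctl command and which defaults to 10. So if a task stays in the D state, by default a warning is issued every 2 minutes, 10 times in total, and then no more. The warning code follows: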

        /*
         * Ok, the task did not get scheduled for more than 2 minutes,
         * complain:
         */
        pr_err("INFO: task %s:%d blocked for more than %ld seconds.\n",
            t->comm, t->pid, timeout);
        pr_err("      %s %s %.*s\n",
            print_tainted(), init_utsname()->release,
            (int)strcspn(init_utsname()->version, " "),
            init_utsname()->version);
        pr_err("\"echo 0 > /proc/sys/kernel/hung_task_timeout_secs\""
            " disables this message.\n");
        sched_show_task(t);
        debug_show_held_locks(t);

        touch_nmi_watchdog();

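This prints to the console and the log the name of the hung task, its PID, the timeout, the kernel taint information, the kernel release and version, and the kernel stack backtrace with register information. If lock debugging is enabled, the locks held by the task are printed as well. touch_nmi_watchdog() is called so that the NMI watchdog does not time out (in my ARM environment the NMI watchdog is not a concern).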

        if (sysctl_hung_task_panic) {
            trigger_all_cpu_backtrace();
            panic("hung_task: blocked tasks");
        }

Finally, if sysctl_hung_task_panic is set, the kernel panics immediately (the value can be set in the kernel configuration file or through sysctl).
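The decision made by check_hung_task() before it prints anything can be condensed into a few lines. The sketch below is a userspace model under simplifying assumptions: struct fake_task and looks_hung() are hypothetical, the frozen field stands in for the PF_FROZEN | PF_FREEZER_SKIP test, and the caller is assumed to pass only tasks that are already in the D state.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the few task_struct fields the check uses. */
struct fake_task {
    unsigned long nvcsw;             /* voluntary context switches */
    unsigned long nivcsw;            /* involuntary context switches */
    unsigned long last_switch_count; /* value saved by the previous scan */
    bool frozen;                     /* models PF_FROZEN | PF_FREEZER_SKIP */
};

/* Returns true when this scan would report the task as hung. */
static bool looks_hung(struct fake_task *t)
{
    unsigned long switch_count = t->nvcsw + t->nivcsw;

    if (t->frozen)                   /* frozen tasks are never reported */
        return false;
    if (!switch_count)               /* never switched out yet: skip */
        return false;
    if (switch_count != t->last_switch_count) {
        /* The task ran since the last scan: remember the new count. */
        t->last_switch_count = switch_count;
        return false;
    }
    return true;                     /* no switch for a whole period */
}
```

Note that a task is never reported on the first scan that sees it, because last_switch_count has to be recorded first. A report therefore needs two consecutive scans with the same count.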

II. Demonstration

Environment: Raspberry Pi b (Linux 4.1.15)

1. First check the kernel configuration options to confirm that the hung task mechanism is enabled

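    /* The header names were stripped from the original listing; the four below are assumed. */
    #include <linux/module.h>
    #include <linux/kernel.h>
    #include <linux/init.h>
    #include <linux/mutex.h>

    DEFINE_MUTEX(dlock);

    static int __init dlock_init(void)
    {
        mutex_lock(&dlock);
        mutex_lock(&dlock);    /* second lock on the same mutex never returns */

        return 0;
    }

    static void __exit dlock_exit(void)
    {
        return;
    }

    module_init(dlock_init);
    module_exit(dlock_exit);
    MODULE_LICENSE("GPL");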

This example defines a mutex and locks it twice in the module's init function, deliberately creating a deadlock (mutex_lock() calls __mutex_lock_slowpath(), which puts the process into the TASK_UNINTERRUPTIBLE state). Once in the D state, the process cannot exit. This can be seen with the ps command:

    root@apple:~# busybox ps
    PID USER TIME COMMAND
    ......
    521 root 0:00 insmod dlock.ko
    ......

Then check the state of the process; it has entered the D state.

    root@apple:~# cat /proc/521/status
    Name: insmod
    State: D (disk sleep)
    Tgid: 521
    Ngid: 0
    Pid: 521
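The same check can be scripted. As a small sketch, the helper below pulls the state letter out of status text like the output above; task_state() is a hypothetical helper name, and reading the file into a string is left to the caller.

```c
#include <string.h>

/*
 * Return the state letter from /proc/<pid>/status-style text,
 * e.g. 'D' for "State: D (disk sleep)", or '?' if there is no State line.
 */
static char task_state(const char *text)
{
    const char *line = text;

    while (line && *line) {
        if (strncmp(line, "State:", 6) == 0) {
            const char *p = line + 6;

            while (*p == ' ' || *p == '\t')   /* skip the separator */
                p++;
            return (*p && *p != '\n') ? *p : '?';
        }
        line = strchr(line, '\n');            /* advance to the next line */
        if (line)
            line++;
    }
    return '?';
}
```

A monitoring script could call this on every /proc/<pid>/status file and list the processes whose letter is D.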

After waiting two minutes, the debug serial console prints the following, and the message repeats every two minutes:

    [ 360.625466] INFO: task insmod:521 blocked for more than 120 seconds.
    [ 360.631878] Tainted: G O 4.1.15 #5
    [ 360.637042] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [ 360.644986] [] (__schedule) from [] (schedule+0x40/0xa4)
    [ 360.652129] [] (schedule) from [] (schedule_preempt_disabled+0x18/0x1c)
    [ 360.660570] [] (schedule_preempt_disabled) from [] (__mutex_lock_slowpath+0x6c/0xe4)
    [ 360.670142] [] (__mutex_lock_slowpath) from [] (mutex_lock+0x44/0x48)
    [ 360.678432] [] (mutex_lock) from [] (dlock_init+0x20/0x2c [dlock])
    [ 360.686480] [] (dlock_init [dlock]) from [] (do_one_initcall+0x90/0x1e8)
    [ 360.694976] [] (do_one_initcall) from [] (do_init_module+0x6c/0x1c0)
    [ 360.703170] [] (do_init_module) from [] (load_module+0x1690/0x1d34)
    [ 360.711284] [] (load_module) from [] (SyS_init_module+0xdc/0x130)
    [ 360.719239] [] (SyS_init_module) from [] (ret_fast_syscall+0x0/0x54)
    [ 480.725351] INFO: task insmod:521 blocked for more than 120 seconds.
    [ 480.731759] Tainted: G O 4.1.15 #5
    [ 480.736917] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [ 480.744842] [] (__schedule) from [] (schedule+0x40/0xa4)
    [ 480.752029] [] (schedule) from [] (schedule_preempt_disabled+0x18/0x1c)
    [ 480.760479] [] (schedule_preempt_disabled) from [] (__mutex_lock_slowpath+0x6c/0xe4)
    [ 480.770066] [] (__mutex_lock_slowpath) from [] (mutex_lock+0x44/0x48)
    [ 480.778363] [] (mutex_lock) from [] (dlock_init+0x20/0x2c [dlock])
    [ 480.786402] [] (dlock_init [dlock]) from [] (do_one_initcall+0x90/0x1e8)
    [ 480.794897] [] (do_one_initcall) from [] (do_init_module+0x6c/0x1c0)
    [ 480.803085] [] (do_init_module) from [] (load_module+0x1690/0x1d34)
    [ 480.811188] [] (load_module) from [] (SyS_init_module+0xdc/0x130)
    [ 480.819113] [] (SyS_init_module) from [] (ret_fast_syscall+0x0/0x54)
    [ 600.825353] INFO: task insmod:521 blocked for more than 120 seconds.
    [ 600.831759] Tainted: G O 4.1.15 #5
    [ 600.836916] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [ 600.844865] [] (__schedule) from [] (schedule+0x40/0xa4)
    [ 600.852005] [] (schedule) from [] (schedule_preempt_disabled+0x18/0x1c)
    [ 600.860445] [] (schedule_preempt_disabled) from [] (__mutex_lock_slowpath+0x6c/0xe4)
    [ 600.870014] [] (__mutex_lock_slowpath) from [] (mutex_lock+0x44/0x48)
    [ 600.878303] [] (mutex_lock) from [] (dlock_init+0x20/0x2c [dlock])
    [ 600.886339] [] (dlock_init [dlock]) from [] (do_one_initcall+0x90/0x1e8)
    [ 600.894835] [] (do_one_initcall) from [] (do_init_module+0x6c/0x1c0)
    [ 600.903023] [] (do_init_module) from [] (load_module+0x1690/0x1d34)
    [ 600.911133] [] (load_module) from [] (SyS_init_module+0xdc/0x130)
    [ 600.919059] [] (SyS_init_module) from [] (ret_fast_syscall+0x0/0x54)

URL: http://weahome.cn/article/pjjedj.html