gdb调试多线程程序

在Linux中可以使用GDB（GNU调试器）工具来调试多线程程序。这里的线程应当更多指的内核级线程。这里记录一些线程相关的使用指令，像一些单步调试命令，堆栈，信息等命令都是通用的。

测试程序

在main程序中创建两个线程，线程如果没有断点，每个1s打印一行，5次后退出。注意，该程序运行中一共会有3个线程，因为程序本身就是一个线程了，还有再创建2个线程，一共3个线程。调试中，在线程函数打印处加断点（break 9）。

  
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <unistd.h> // 包含sleep函数的头文件

void *thread_function(void *arg) {
    char *thread_name = (char *)arg;
    for (int i = 0; i < 5; i++) {
        printf("Running %s: print-count %d\n", thread_name, i+1);
        sleep(1); // 休眠1秒
    }
    pthread_exit(NULL);
}

int main() {
    pthread_t thread1, thread2;
    char *thread1_name="my thread-AAA";
    char *thread2_name="my thread-BBB";

    // 创建线程1
    if (pthread_create(&thread1, NULL, thread_function, (void *)thread1_name) != 0) {
        fprintf(stderr, "Error creating thread 1\n");
        return 1;
    }

    // 创建线程2
    if (pthread_create(&thread2, NULL, thread_function, (void *)thread2_name) != 0) {
        fprintf(stderr, "Error creating thread 2\n");
        return 1;
    }

    // 等待线程1和线程2结束
    if (pthread_join(thread1, NULL) != 0) {
        fprintf(stderr, "Error joining thread 1\n");
        return 1;
    }

    if (pthread_join(thread2, NULL) != 0) {
        fprintf(stderr, "Error joining thread 2\n");
        return 1;
    }

    printf("All threads have completed\n");
    return 0;
}

查看所有线程情况

打好断点后，run运行程序，创建的两个线程都会在断点处暂停下来。先查看线程情况。gdb中使用指令info threads。

(gdb) info threads

运行示例：

(gdb) b 9
Breakpoint 1 at 0x124a: file ./mttest.c, line 9.
(gdb) r
Starting program: /home/user/Templates/tempdir/mttest 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff7d90700 (LWP 624602)]
[New Thread 0x7ffff758f700 (LWP 624603)]
[Switching to Thread 0x7ffff7d90700 (LWP 624602)]

Thread 2 "mttest" hit Breakpoint 1, thread_function (arg=0x555555556020) at ./mttest.c:9
9	        printf("Running %s: print-count %d\n", thread_name, i+1);

(gdb) info threads 
  Id   Target Id                                   Frame 
  1    Thread 0x7ffff7d91740 (LWP 624598) "mttest" __pthread_clockjoin_ex (threadid=140737351583488, thread_return=0x0, 
    clockid=<optimized out>, abstime=<optimized out>, block=<optimized out>) at pthread_join_common.c:145
* 2    Thread 0x7ffff7d90700 (LWP 624602) "mttest" thread_function (arg=0x555555556020) at ./mttest.c:9
  3    Thread 0x7ffff758f700 (LWP 624603) "mttest" thread_function (arg=0x55555555602e) at ./mttest.c:9

可以看到，有3个线程，第一个是main线程，2,3是创建的线程，当前程序暂停在线程2上。

切换线程

如果是直接运行程序，A，B两个线程的每次先后打印是随机的，取决于调度情况，但是在调试器中，线程可以暂停下来，可以手动控制GDB调试器先后运行顺序。

上例中，程序首先在第一个创建的线程处停止（AAA线程），这里切换到BBB线程，先运行BBB线程。

gdb中使用命令 thread <threadid>可以切换线程。<threadid>即为 info threads命令中显示的。

(gdb) thread 3
[Switching to thread 3 (Thread 0x7ffff758f700 (LWP 624603))]
#0  thread_function (arg=0x55555555602e) at ./mttest.c:9
9	        printf("Running %s: print-count %d\n", thread_name, i+1);

(gdb) info threads 
  Id   Target Id                                   Frame 
  1    Thread 0x7ffff7d91740 (LWP 624598) "mttest" __pthread_clockjoin_ex (threadid=140737351583488, thread_return=0x0, 
    clockid=<optimized out>, abstime=<optimized out>, block=<optimized out>) at pthread_join_common.c:145
  2    Thread 0x7ffff7d90700 (LWP 624602) "mttest" thread_function (arg=0x555555556020) at ./mttest.c:9
* 3    Thread 0x7ffff758f700 (LWP 624603) "mttest" thread_function (arg=0x55555555602e) at ./mttest.c:9

之后可以执行一些调试命令，如step，next，continue等命令，默认情况下是仅对当前线程有效的，具体和调度器锁定设置有关，参考下文。

特定线程断点

可以对特定的某一个线程设置断点，其他线程不影响，属于break指令的扩展。

(gdb) break [LOCATION] [thread THREADNUM] [if CONDITION]

使用break命令时，加上thread选项指定线程ID即可。这样只对特定的线程生效。如果未填写thread选项，相当于为全部线程设置，即thread all。

指定线程执行命令

对线程执行调试命令时，需要切换到对应的线程，再执行。其实也可以指定线程去执行特定命令。使用 thread apply命令即可，具体查看帮助。

(gdb) help thread apply 
Apply a command to a list of threads.
Usage: thread apply ID... [OPTION]... COMMAND

## 如查看所有线程的堆栈情况
(gdb) thread apply all bt

调度器锁定选项

多线程程序，有一个重要的选项 scheduler-locking，用于控制线程调度锁定方式。比如程序当前暂停在某个线程上，其他线程是否会执行，还是暂停。

set scheduler-locking指令可以影响内核调度器如何调度线程，该选项默认为on，表示启用调试器的线程调度锁定。此设置下，调度器会通过发送信号（如SIGSTOP）来暂停正在执行的线程，并在GDB调试器的控制下重新调度线程。线程被暂停后，GDB调试器可以使用continue命令继续运行线程，或使用next,step等单步调试命令控制线程的执行。

效果就是，设置为on时，暂停后，其他线程不会继续执行，一直暂停在那里，需要用户通过GDB调试器对它执行continue或next等命令才会动。同时，调试命令仅对当前线程有效，其他线程不受影响。这种模式，可以方便的调试某一个线程内部的情况，也可以很容易的让用户自主控制线程间的同步等。设置为off时，相反，其他线程照常运行，只有当前线程是暂停的，可由用户单步调试等。虽然其他线程照常运行，但也是受断点影响的，只要触发了断点，其他执行的线程也还是会因为断点暂停下来。

注意：该命令需要在创建线程后，且暂停后才能设置。

(gdb) set scheduler-locking on
(gdb) set scheduler-locking off

## 详细说明及其他选项参考帮助
(gdb) help set scheduler-locking

补充:该选项有还有其他值，默认也不一定是on，其他还有 step模式，replay模式。有点属于是中间模式了，其他线程会暂停，但又不像off模式那样完全一直暂停。replay模式会跟随执行，就是当前调试的线程执行，就跟着执行，暂停下来，就跟着暂停下来，这种模式和真实的运行场景最相近。step模式，仅对于单步调试指令，其他线程是暂停的，其他命令也会跟着执行，跟着停下。

多进程调试相关选项

有时程序会fork自身创建子进程，对于这种多个进程的，也有相关的控制选项，类似的切换操作。这种属于多进程程序，和多线程有些区别。

## 控制fork之后，调试器跟踪父还是子进程
(gdb) help set follow-fork-mode

## 在父子多个进程之间切换
(gdb) help inferior

...
...

gdb调试多线程程序

Further Reading

gdb介绍

gdb常用指令一

gdb常用指令二