C++11 Multithreading Tutorial (Part 1)

2016-09-28 00:00:00 Guangzhou Ruifengde Information Technology Co., Ltd. (廣州睿豐德信息科技有限公司)


The code for this tutorial is available on GitHub: https://github.com/sol-prog/threads.

In previous tutorials I presented some of the newest C++11 language features:

1. Regular expressions (http://solarianprogrammer.com/2011/10/12/cpp-11-regex-tutorial/)
2. Raw strings (http://solarianprogrammer.com/2011/10/16/cpp-11-raw-strings-literals-tutorial/)
3. Lambdas (http://solarianprogrammer.com/2011/11/01/cpp-11-lambda-tutorial/)

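Threading support is arguably one of the biggest changes in the C++ language. Previously, a C++ program could use multiple cores only through operating system facilities (the pthreads library on Unix-like systems) or through libraries such as OpenMP and MPI.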

This tutorial is meant to get you started with C++11 threads, not to tediously restate the language standard.

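Creating and launching a thread in C++11 is as simple as including the thread header in your C++ source. Let's see how to create a simple Hello World with a thread: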

#include <iostream>
#include <thread>

//This function will be called from a thread
void call_from_thread() {
    std::cout << "Hello, World" << std::endl;
}

int main() {
    //Launch a thread
    std::thread t1(call_from_thread);

    //Join the thread with the main thread
    t1.join();

    return 0;
}

On Linux, the code above can be compiled with g++:

g++ -std=c++0x -pthread file_name.cpp

On a Mac with Xcode 4.x installed, the code can be compiled with clang++:

clang++ -std=c++0x -stdlib=libc++ file_name.cpp

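On Windows, you can use a commercial library, just::thread, to compile multithreaded code. Unfortunately, they don't offer a trial version of the library, so I could not test it.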

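In a real-world application, the function call_from_thread would do some work independently of main. In the code above, main creates a thread and waits at t1.join() for t1 to finish. If you forget to wait for a thread to finish, the main thread may finish first, and the program will end while the other thread is still running, whether or not call_from_thread has completed. Strictly speaking, destroying a std::thread object that is still joinable (neither join() nor detach() has been called on it) makes its destructor call std::terminate, so every thread must be either joined or detached.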

Compared with the equivalent code using POSIX threads, the C++11 Hello World is somewhat more concise. Here is the POSIX version:

#include <iostream>
#include <pthread.h>

//This function will be called from a thread
void *call_from_thread(void *) {
    std::cout << "Launched by thread" << std::endl;
    return NULL;
}

int main() {
    pthread_t t;

    //Launch a thread
    pthread_create(&t, NULL, call_from_thread, NULL);

    //Join the thread with the main thread
    pthread_join(t, NULL);

    return 0;
}

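We usually want to launch several threads at once and have them work in parallel. To do this, we can create a group of threads instead of the single thread of the previous example. In the following example, main creates a group of ten threads and waits for them to finish their work (a POSIX version of this example is also included in the GitHub repository):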

...

static const int num_threads = 10;

...

int main() {
    std::thread t[num_threads];

    //Launch a group of threads
    for (int i = 0; i < num_threads; ++i) {
        t[i] = std::thread(call_from_thread);
    }

    std::cout << "Launched from the main\n";

    //Join the threads with the main thread
    for (int i = 0; i < num_threads; ++i) {
        t[i].join();
    }

    return 0;
}

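Remember that main is also a thread, usually called the main thread, so the code above actually runs 11 threads. After launching the group of threads and before joining them with main, we are free to do other work in the main thread; we will illustrate this at the end of the tutorial with an image processing example.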

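What about using a function with parameters in a thread? C++11 lets us pass any number of arguments to the thread call. To illustrate, we can modify the code above so that the thread function takes an integer argument (a POSIX version of this example is also included in the GitHub repository):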

#include <iostream>
#include <thread>

static const int num_threads = 10;

//This function will be called from a thread
void call_from_thread(int tid) {
    std::cout << "Launched by thread " << tid << std::endl;
}

int main() {
    std::thread t[num_threads];

    //Launch a group of threads
    for (int i = 0; i < num_threads; ++i) {
        t[i] = std::thread(call_from_thread, i);
    }

    std::cout << "Launched from the main\n";

    //Join the threads with the main thread
    for (int i = 0; i < num_threads; ++i) {
        t[i].join();
    }

    return 0;
}

On my system, running the code above gives:

Sol$ ./a.out
Launched by thread 0
Launched by thread 1
Launched by thread 2
Launched from the main
Launched by thread 3
Launched by thread 5
Launched by thread 6
Launched by thread 7
Launched by thread Launched by thread 4
aunched by thread 9
Sol$

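As you can see, once the threads are created there is no particular order in which they run. It is the programmer's job to make sure that a group of threads does not block or interfere while accessing shared data. The mangled output on the last lines shows that thread 4 had not finished writing to stdout when thread 8 started. In fact, if you run this code on your own machine you will probably get completely different results, possibly even garbled characters. The reason is that all 11 threads in the program compete for the same shared resource, stdout (this is known as a race condition).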

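To avoid these problems, you can use synchronization primitives such as std::mutex to control how a group of threads accesses a shared resource, or, when possible, give each thread its own private data structures and avoid shared resources altogether. Thread synchronization, including atomic types and mutexes, will be covered in a future tutorial.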

In principle, the examples so far cover all the concepts needed to write more complex parallel code.

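In the next example I want to show the power of parallel programming on a slightly more complex problem: removing noise from an image with a blur filter. The idea is to replace each pixel with some form of weighted average of the pixel and its neighbours, which smooths out the noise.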

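This tutorial is not about optimized image processing, and I am not an expert in that field, so we will take a fairly naive approach. Our goal is to outline how to write parallel code; efficient image access and the convolution with the filter are not the point. For illustration I use only the definition of spatial convolution, rather than the more performant, but slightly harder to implement, convolution in the frequency domain through the fast Fourier transform.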

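For simplicity, we will use a simple uncompressed image format, PPM. Below is the header of a simplified C++ class that reads and writes PPM images and stores the image in memory as three unsigned char arrays (one each for R, G and B):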

#include <string>

class ppm {
    bool flag_alloc;
    void init();
    //info about the PPM file (height and width)
    unsigned int nr_lines;
    unsigned int nr_columns;

public:
    //arrays for storing the R,G,B values
    unsigned char *r;
    unsigned char *g;
    unsigned char *b;
    unsigned int height;
    unsigned int width;
    unsigned int max_col_val;
    //total number of elements (pixels)
    unsigned int size;

    ppm();
    //create a PPM object and fill it with data stored in fname
    ppm(const std::string &fname);
    //create an "empty" PPM image with a given width and height; the R,G,B arrays are filled with zeros
    ppm(const unsigned int _width, const unsigned int _height);
    //free the memory used by the R,G,B vectors when the object is destroyed
    ~ppm();
    //read the PPM image from fname
    void read(const std::string &fname);
    //write the PPM image in fname
    void write(const std::string &fname);
};

One possible approach is:

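1. Load the image into memory.
2. Split the image into a number of parts equal to the maximum number of threads the system supports; for example, a quad-core computer with Hyper-Threading can use 8 threads.
3. Launch the threads, each one processing its own part of the image.
4. Use the main thread to process the last part.
5. Join all threads with the main thread and wait for them to finish.
6. Save the processed image.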

Next is the main function, which implements the steps listed above (thanks to wiched for suggesting code improvements):

#include <ctime>
#include <iostream>
#include <string>
#include <thread>
#include <vector>
//ppm, bounds() and tst() are defined elsewhere in the tutorial's code

int main() {
    std::string fname = std::string("your_file_name.ppm");

    ppm image(fname);
    ppm image2(image.width, image.height);

    //Number of threads to use (the image will be divided between threads)
    int parts = 8;

    std::vector<int> bnd = bounds(parts, image.size);

    std::thread *tt = new std::thread[parts - 1];

    time_t start, end;
    time(&start);

    //Launch parts-1 threads
    for (int i = 0; i < parts - 1; ++i) {
        tt[i] = std::thread(tst, &image, &image2, bnd[i], bnd[i + 1]);
    }

    //Use the main thread to do part of the work !!!
    for (int i = parts - 1; i < parts; ++i) {
        tst(&image, &image2, bnd[i], bnd[i + 1]);
    }

    //Join parts-1 threads
    for (int i = 0; i < parts - 1; ++i)
        tt[i].join();

    time(&end);
    std::cout << difftime(end, start) << " seconds" << std::endl;

    //Save the result
    image2.write("test.ppm");

    //Clear memory and exit
    delete [] tt;

    return 0;
}

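Please ignore the hard-coded image file name and number of threads. In a real application, the user should be able to enter these parameters interactively.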

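To see the parallel code at work, we need to give it enough work; otherwise the overhead of creating and destroying threads would distort the measurements and make the test meaningless. The input image must be large enough to show the performance gain of the parallel code. For this I used a 16000x10626 pixel PPM image, about 512 MB in size: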

[Image: the original 16000x10626 input image]

I used Gimp to add some noise to the image. The noisy image looks like this:

[Image: the input image with added noise]

The result of running the code above:

[Image: the filtered image produced by the program]

As you can see, the noise in the image has been reduced.

Results of the sample code on a dual-core MacBook Pro:

Compiler   Optimization   Threads   Time   Speed up
clang++    none           1         40s
clang++    none           4         20s    2x
clang++    -O4            1         12s
clang++    -O4            4         6s     2x

On the dual-core machine, the parallel version achieves a perfect 2x speedup over the serial (single-threaded) version.

I also ran the test on a quad-core Intel i7 Linux machine, with these results:

Compiler   Optimization   Threads   Time   Speed up
g++        none           1         33s
g++        none           8         13s    2.54x
g++        -O4            1         9s
g++        -O4            8         3s     3x

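Apparently Apple's clang++ does a better job of scaling the parallel program. However, this could just as well be a combined effect of compiler and machine characteristics, and the MacBook Pro has 8GB of RAM while the Linux machine has only 6GB.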

If you are interested in learning the new C++11 syntax, I recommend reading Professional C++ or C++ Primer Plus. For C++11 multithreading, I recommend C++ Concurrency in Action, which is a good book.

from:http://article.yeeyan.org/view/234235/268247

