How to Use Regular Expressions in a C Project

This article explains how to use regular expressions in a C project, working through small test programs along the way.


A regular expression (often abbreviated in code as regex, regexp, or RE) is a single string that describes, and is used to match, a set of strings that follow some syntactic rule.

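In C, regular expressions are handled with the POSIX functions regcomp, regexec, regfree, and regerror, declared in <regex.h>. Processing a regular expression takes three steps:

Compile the regular expression: regcomp;
Match against the regular expression: regexec;
Free the regular expression: regfree.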

Function prototypes

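/*
Description: regcomp compiles the regular expression string regex into a regex_t, which regexec then uses for searching.
Parameters:
  preg: pointer to a regex_t structure that receives the compiled pattern.
  regex: the regular expression string.
  cflags: 0, or the bitwise OR (|) of any of the following four flags.
    REG_EXTENDED: interpret regex using POSIX extended regular expression syntax. If not set, POSIX basic regular expression syntax is used.
    REG_ICASE: ignore letter case.
    REG_NOSUB: do not store match positions; regexec then ignores nmatch and pmatch.
    REG_NEWLINE: give newline characters special treatment; explained in detail later.
Return value:
  0: compilation succeeded;
  nonzero: compilation failed; use regerror to obtain the error message.
*/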
int regcomp(regex_t *preg, const char *regex, int cflags);
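/*
Description: regexec matches a string against a compiled regular expression.
Parameters:
  preg: pointer to a regex_t structure compiled by regcomp.
  string: the string to match against.
  nmatch: number of elements in the regmatch_t array.
  pmatch: array of regmatch_t structures that receives the positions of the matched substrings.
  regmatch_t is defined as follows:
    typedef struct {
      regoff_t rm_so;
      regoff_t rm_eo;
    } regmatch_t;
    rm_so, if it is not -1, is the offset in the string where the matched substring starts; rm_eo is the offset of the first character after the end of the matched substring.
  eflags: 0, or REG_NOTBOL, REG_NOTEOL, or the bitwise OR (|) of the two; described later.
Return value:
  0: a match was found;
  nonzero: REG_NOMATCH if nothing matched, or an error code; use regerror to obtain the message.
*/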
int regexec(const regex_t *preg, const char *string, size_t nmatch, regmatch_t pmatch[], int eflags);
/*
Description: frees the memory that regcomp allocated for the compiled pattern.
Parameters:
  preg: pointer to a regex_t structure compiled by regcomp.
*/
void regfree(regex_t *preg);
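/*
Description: when regcomp or regexec fails, it returns a nonzero error code; regerror turns that code into a readable message.
Parameters:
  errcode: the value returned by the failed regcomp or regexec call.
  preg: pointer to the regex_t structure that was passed to regcomp.
  errbuf: buffer that receives the error message.
  errbuf_size: size of errbuf in bytes.
Return value:
  the buffer size needed to hold the whole message, including the terminating null byte.
*/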
size_t regerror(int errcode, const regex_t *preg, char *errbuf, size_t errbuf_size);

Example 1

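#include <stdio.h>
#include <regex.h>

int main(void)
{
  char ebuff[256];
  int ret;
  int cflags;
  regex_t reg;
  const char *test_str = "Hello World";
  const char *reg_str = "H.*";

  /* Extended syntax, case-insensitive, no match positions needed. */
  cflags = REG_EXTENDED | REG_ICASE | REG_NOSUB;
  ret = regcomp(&reg, reg_str, cflags);
  if (ret)
  {
    regerror(ret, &reg, ebuff, sizeof(ebuff));
    fprintf(stderr, "%s\n", ebuff);
    return 1; /* compilation failed, so there is nothing to free */
  }
  /* REG_NOSUB is set, so nmatch and pmatch are ignored. */
  ret = regexec(&reg, test_str, 0, NULL, 0);
  if (ret)
  {
    regerror(ret, &reg, ebuff, sizeof(ebuff));
    fprintf(stderr, "%s\n", ebuff);
    goto end;
  }
  /* ret is 0 here; glibc's regerror reports that as "Success". */
  regerror(ret, &reg, ebuff, sizeof(ebuff));
  fprintf(stderr, "result is:\n%s\n", ebuff);
end:
  regfree(&reg);
  return 0;
}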

Compile and run; the output is:

[root@zxy regex]# ./test
result is:
Success

The match succeeded.

Example 2

What if I want to keep what was matched? That calls for the regmatch_t structure. The code above is rewritten below, and this time the REG_NOSUB option can no longer be used:

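/* _GNU_SOURCE makes strndup visible on older glibc versions. */
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <regex.h>

int main(void)
{
  int i;
  char ebuff[256];
  int ret;
  int cflags;
  regex_t reg;
  regmatch_t rm[5];
  char *part_str = NULL;
  const char *test_str = "Hello World";
  const char *reg_str = "e(.*)o";

  /* No REG_NOSUB this time: the match positions are needed. */
  cflags = REG_EXTENDED | REG_ICASE;
  ret = regcomp(&reg, reg_str, cflags);
  if (ret)
  {
    regerror(ret, &reg, ebuff, sizeof(ebuff));
    fprintf(stderr, "%s\n", ebuff);
    return 1; /* compilation failed, so there is nothing to free */
  }
  ret = regexec(&reg, test_str, 5, rm, 0);
  if (ret)
  {
    regerror(ret, &reg, ebuff, sizeof(ebuff));
    fprintf(stderr, "%s\n", ebuff);
    goto end;
  }
  regerror(ret, &reg, ebuff, sizeof(ebuff));
  fprintf(stderr, "result is:\n%s\n\n", ebuff);
  /* Unused slots have rm_so set to -1. */
  for (i = 0; i < 5; i++)
  {
    if (rm[i].rm_so > -1)
    {
      /* Copy the matched substring out of test_str so it can be printed. */
      part_str = strndup(test_str + rm[i].rm_so, rm[i].rm_eo - rm[i].rm_so);
      if (part_str == NULL)
        break;
      fprintf(stderr, "%s\n", part_str);
      free(part_str);
      part_str = NULL;
    }
  }
end:
  regfree(&reg);
  return 0;
}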

Compile and run; the output is:

[root@zxy regex]# ./test
result is:
Success

ello Wo
llo W

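Wait, I only asked for one captured result, so why are two printed?
The first element of the regmatch_t array has a special meaning: it holds the start and end offsets of the substring matched by the whole regular expression. The parenthesized groups start at index 1. So when sizing the regmatch_t array, remember that it needs the number of groups you want to keep plus one.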

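REG_NEWLINE, REG_NOTBOL, and REG_NOTEOL

That covers basic usage. Now for REG_NEWLINE, REG_NOTBOL, and REG_NOTEOL, three flags that confuse many people, me included. Yesterday someone asked about them, I passed on my own wrong understanding, and an expert promptly set me straight. I had always believed that using the ^ and $ anchors required REG_NEWLINE. It does not.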

REG_NEWLINE

First, here is what the man page says about REG_NEWLINE:

REG_NEWLINE
  Match-any-character operators don't match a newline.
  A non-matching list ([^...]) not containing a newline does not match a newline.
  Match-beginning-of-line operator (^) matches the empty string immediately after a newline, regardless of whether eflags, the execution flags of regexec(), contains REG_NOTBOL.
  Match-end-of-line operator ($) matches the empty string immediately before a newline, regardless of whether eflags contains REG_NOTEOL.

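Restated point by point, with REG_NEWLINE set:

1. Operators that match any character (such as .) do not match a newline ('\n').
2. A non-matching list ([^...]) that does not itself contain a newline does not match a newline.
3. The beginning-of-line operator (^) matches the empty string immediately after a newline, whether or not the eflags passed to regexec() contains REG_NOTBOL.
4. The end-of-line operator ($) matches the empty string immediately before a newline, whether or not the eflags passed to regexec() contains REG_NOTEOL.

That is still abstract, so let's test each point with a program.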

Point 1

The code:

#include <stdio.h>
#include <regex.h>

int main(void)
{
  char ebuff[256];
  int ret;
  int cflags;
  regex_t reg;
  const char *test_str = "Hello World\n";
  const char *reg_str = "Hello World.";

  /* First pass: compile without REG_NEWLINE. */
  cflags = REG_EXTENDED | REG_ICASE | REG_NOSUB;
  ret = regcomp(&reg, reg_str, cflags);
  if (ret)
  {
    regerror(ret, &reg, ebuff, sizeof(ebuff));
    fprintf(stderr, "1. %s\n", ebuff);
    return 1;
  }
  ret = regexec(&reg, test_str, 0, NULL, 0);
  regerror(ret, &reg, ebuff, sizeof(ebuff));
  fprintf(stderr, "2. %s\n", ebuff);
  regfree(&reg); /* release the first compilation before reusing reg */

  /* Second pass: same pattern with REG_NEWLINE added. */
  cflags |= REG_NEWLINE;
  ret = regcomp(&reg, reg_str, cflags);
  if (ret)
  {
    regerror(ret, &reg, ebuff, sizeof(ebuff));
    fprintf(stderr, "3. %s\n", ebuff);
    return 1;
  }
  ret = regexec(&reg, test_str, 0, NULL, 0);
  regerror(ret, &reg, ebuff, sizeof(ebuff));
  fprintf(stderr, "4. %s\n", ebuff);
  regfree(&reg);
  return 0;
}

Compile and run; the result is:

[root@zxy regex]# ./test
2. Success
4. No match

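The result is clear: without REG_NEWLINE the match succeeds, and with it the match fails. In other words, without REG_NEWLINE the any-character operator (.) matches '\n'; with REG_NEWLINE it does not.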

Point 2

The code:

#include <stdio.h>
#include <regex.h>

int main(void)
{
  char ebuff[256];
  int ret;
  int cflags;
  regex_t reg;
  const char *test_str = "Hello\nWorld";
  const char *reg_str = "Hello[^ ]";

  /* First pass: compile without REG_NEWLINE. */
  cflags = REG_EXTENDED | REG_ICASE | REG_NOSUB;
  ret = regcomp(&reg, reg_str, cflags);
  if (ret)
  {
    regerror(ret, &reg, ebuff, sizeof(ebuff));
    fprintf(stderr, "1. %s\n", ebuff);
    return 1;
  }
  ret = regexec(&reg, test_str, 0, NULL, 0);
  regerror(ret, &reg, ebuff, sizeof(ebuff));
  fprintf(stderr, "2. %s\n", ebuff);
  regfree(&reg); /* release the first compilation before reusing reg */

  /* Second pass: same pattern with REG_NEWLINE added. */
  cflags |= REG_NEWLINE;
  ret = regcomp(&reg, reg_str, cflags);
  if (ret)
  {
    regerror(ret, &reg, ebuff, sizeof(ebuff));
    fprintf(stderr, "3. %s\n", ebuff);
    return 1;
  }
  ret = regexec(&reg, test_str, 0, NULL, 0);
  regerror(ret, &reg, ebuff, sizeof(ebuff));
  fprintf(stderr, "4. %s\n", ebuff);
  regfree(&reg);
  return 0;
}

Compile and run; the result is:

[root@zxy regex]# ./test
2. Success
4. No match

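Result: without REG_NEWLINE, a non-matching list that does not contain '\n' (here [^ ]) matches '\n', so "Hello" followed by '\n' matches. With REG_NEWLINE, the same list no longer matches '\n', so the match fails.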

Point 3

The code:

#include <stdio.h>
#include <regex.h>

int main(void)
{
  char ebuff[256];
  int ret;
  int cflags;
  regex_t reg;
  const char *test_str = "\nHello World";
  const char *reg_str = "^Hello";

  /* First pass: compile without REG_NEWLINE. */
  cflags = REG_EXTENDED | REG_ICASE | REG_NOSUB;
  ret = regcomp(&reg, reg_str, cflags);
  if (ret)
  {
    regerror(ret, &reg, ebuff, sizeof(ebuff));
    fprintf(stderr, "1. %s\n", ebuff);
    return 1;
  }
  ret = regexec(&reg, test_str, 0, NULL, 0);
  regerror(ret, &reg, ebuff, sizeof(ebuff));
  fprintf(stderr, "2. %s\n", ebuff);
  regfree(&reg); /* release the first compilation before reusing reg */

  /* Second pass: same pattern with REG_NEWLINE added. */
  cflags |= REG_NEWLINE;
  ret = regcomp(&reg, reg_str, cflags);
  if (ret)
  {
    regerror(ret, &reg, ebuff, sizeof(ebuff));
    fprintf(stderr, "3. %s\n", ebuff);
    return 1;
  }
  ret = regexec(&reg, test_str, 0, NULL, 0);
  regerror(ret, &reg, ebuff, sizeof(ebuff));
  fprintf(stderr, "4. %s\n", ebuff);
  regfree(&reg);
  return 0;
}

Compile and run; the result is:

[root@zxy regex]# ./test
2. No match
4. Success

Result: without REG_NEWLINE, '^' matches only at the very start of the string, so a string that begins with '\n' does not match "^Hello". With REG_NEWLINE, '^' also matches immediately after a '\n', so the match succeeds.

Point 4

The code:

#include <stdio.h>
#include <regex.h>

int main(void)
{
  char ebuff[256];
  int ret;
  int cflags;
  regex_t reg;
  const char *test_str = "Hello World\n";
  const char *reg_str = "d$";

  /* First pass: compile without REG_NEWLINE. */
  cflags = REG_EXTENDED | REG_ICASE | REG_NOSUB;
  ret = regcomp(&reg, reg_str, cflags);
  if (ret)
  {
    regerror(ret, &reg, ebuff, sizeof(ebuff));
    fprintf(stderr, "1. %s\n", ebuff);
    return 1;
  }
  ret = regexec(&reg, test_str, 0, NULL, 0);
  regerror(ret, &reg, ebuff, sizeof(ebuff));
  fprintf(stderr, "2. %s\n", ebuff);
  regfree(&reg); /* release the first compilation before reusing reg */

  /* Second pass: same pattern with REG_NEWLINE added. */
  cflags |= REG_NEWLINE;
  ret = regcomp(&reg, reg_str, cflags);
  if (ret)
  {
    regerror(ret, &reg, ebuff, sizeof(ebuff));
    fprintf(stderr, "3. %s\n", ebuff);
    return 1;
  }
  ret = regexec(&reg, test_str, 0, NULL, 0);
  regerror(ret, &reg, ebuff, sizeof(ebuff));
  fprintf(stderr, "4. %s\n", ebuff);
  regfree(&reg);
  return 0;
}

Compile and run; the result is:

[root@zxy regex]# ./test
2. No match
4. Success

Result: without REG_NEWLINE, '$' matches only at the very end of the string, so a string that ends with '\n' does not match "d$". With REG_NEWLINE, '$' also matches immediately before a '\n', so the match succeeds.

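REG_NEWLINE summary

That concludes the REG_NEWLINE tests. To summarize:

With REG_NEWLINE set: 1. the any-character operator (.) does not match '\n'; 2. a non-matching list that does not contain '\n' does not match '\n'; 3. '^' also matches right after a '\n' and '$' also matches right before a '\n', so the anchors work on each line of a multi-line string rather than only on the string as a whole.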

REG_NOTBOL and REG_NOTEOL

Now for REG_NOTBOL and REG_NOTEOL. First, here is what the man page says about these two options:

REG_NOTBOL
  The match-beginning-of-line operator always fails to match (but see the compilation flag REG_NEWLINE above). This flag may be used when different portions of a string are passed to regexec() and the beginning of the string should not be interpreted as the beginning of the line.
REG_NOTEOL
  The match-end-of-line operator always fails to match (but see the compilation flag REG_NEWLINE above).

Restated:

REG_NOTBOL
  The beginning-of-line operator (^) never matches at the start of the string passed in (REG_NEWLINE still applies after a newline). Use this flag when regexec() is given a portion of a larger string, so that the start of that portion is not treated as the start of a line.
REG_NOTEOL
  The end-of-line operator ($) never matches at the end of the string passed in (REG_NEWLINE still applies before a newline). Use this flag when the string passed to regexec() is only part of a line, so that its end is not treated as the end of a line.

On to testing. The code for REG_NOTBOL:

#include <stdio.h>
#include <regex.h>

int main(void)
{
  char ebuff[256];
  int ret;
  int cflags;
  regex_t reg;
  const char *test_str = "Hello World\n";
  const char *reg_str = "^e";

  cflags = REG_EXTENDED | REG_ICASE | REG_NOSUB;
  ret = regcomp(&reg, reg_str, cflags);
  if (ret)
  {
    regerror(ret, &reg, ebuff, sizeof(ebuff));
    fprintf(stderr, "1. %s\n", ebuff);
    return 1;
  }
  /* Pass the string starting at its second character: "ello World\n". */
  ret = regexec(&reg, test_str + 1, 0, NULL, 0);
  regerror(ret, &reg, ebuff, sizeof(ebuff));
  fprintf(stderr, "2. %s\n", ebuff);
  /* Same substring, but tell regexec that it does not start a line. */
  ret = regexec(&reg, test_str + 1, 0, NULL, REG_NOTBOL);
  regerror(ret, &reg, ebuff, sizeof(ebuff));
  fprintf(stderr, "4. %s\n", ebuff);
  regfree(&reg);
  return 0;
}

Compile and run; the result is:

[root@zxy regex]# ./test
2. Success
4. No match

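Result: without REG_NOTBOL, '^' matches at the start of whatever portion of the string is passed to regexec (here test_str+1). With REG_NOTBOL, the start of that portion is not treated as the start of a line, so the match fails.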


Article title: How to Use Regular Expressions in a C Project
URL: http://weahome.cn/article/gigpgc.html
