
Commit 507f8b9

see #12: translate ch06
1 parent 9180aab commit 507f8b9

17 files changed, +294 -36 lines changed

.travis.yml

+3
@@ -11,5 +11,8 @@ addons:
1111
ssh_known_hosts:
1212
- changkun.de
1313

14+
script:
15+
- make build
16+
1417
after_success:
1518
scp -r website/public/modern-cpp/* [email protected]:$encrypted_server_path

book/en-us/05-pointers.md

+1-1
@@ -6,7 +6,7 @@ order: 5
66

77
# Chapter 05 Standard Library: Pointers
88

9-
[Table of Content](./toc.md) | [Previous Chapter](./04-containers.md) | [Next Chapter: Standard Library: Regular Expression](./06-regex.md)
9+
[Table of Content](./toc.md) | [Previous Chapter](./04-containers.md) | [Next Chapter: Regular Expression](./06-regex.md)
1010

1111
## Further Readings
1212

book/en-us/06-regex.md

+136-3
@@ -1,15 +1,148 @@
11
---
2-
title: "Chapter 06 Standard Library: Regular Expression"
2+
title: "Chapter 06 Regular Expression"
33
type: book-en-us
44
order: 6
55
---
66

7-
# Chapter 06 Standard Library: Regular Expression
7+
# Chapter 06 Regular Expression
88

9-
[Table of Content](./toc.md) | [Previous Chapter](./05-pointers.md) | [Next Chapter: Standard Library: Threads and Concurrency](./07-thread.md)
9+
[TOC]
10+
11+
## 6.1 Introduction
12+
13+
Regular expressions are not part of the C++ language itself, so we only introduce them
14+
briefly here.
15+
16+
A regular expression describes a pattern for matching strings.
17+
Regular expressions are mainly used for
18+
the following three purposes:
19+
20+
1. Check whether a string contains a substring of a certain form;
21+
2. Replace the matching substrings;
22+
3. Extract the matching substrings from a string.
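A minimal sketch of these three uses with the C++11 `<regex>` header follows. The helper names `contains_digits`, `mask_digits`, and `first_word` are hypothetical and were chosen only for illustration. Each one maps to one library call: `std::regex_search` for requirement 1, `std::regex_replace` for requirement 2, and `std::regex_search` with a `std::smatch` for requirement 3.

```cpp
#include <regex>
#include <string>

// 1. Check: does the string contain at least one run of digits?
//    std::regex_search succeeds if any part of the input matches.
bool contains_digits(const std::string &s) {
    static const std::regex digits("[0-9]+");
    return std::regex_search(s, digits);
}

// 2. Replace: every single digit is replaced by '#'.
std::string mask_digits(const std::string &s) {
    static const std::regex digit("[0-9]");
    return std::regex_replace(s, digit, "#");
}

// 3. Extract: return the first run of ASCII letters, or "" if there is none.
std::string first_word(const std::string &s) {
    static const std::regex word("[A-Za-z]+");
    std::smatch m;  // refers into s, which outlives it in this function
    if (std::regex_search(s, m, word))
        return m[0].str();
    return "";
}
```

For example, `mask_digits("tel 123-45")` yields `"tel ###-##"`, and `first_word("  42 apples and pears")` yields `"apples"`.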
23+
24+
A regular expression is a text pattern made up of ordinary characters (such as the letters a to z)
25+
and special characters (metacharacters). The pattern describes one or more strings to match when searching text.
26+
It acts as a template that is matched against the string being searched.
27+
28+
### Ordinary characters
29+
30+
Ordinary characters include all printable and non-printable characters that
31+
are not explicitly designated as metacharacters. This includes all uppercase
32+
and lowercase letters, all digits, punctuation marks, and some other symbols.
33+
34+
### Special characters
35+
36+
A special character (also called a metacharacter) has a special meaning in a regular expression
37+
and forms the core of the matching syntax. The table below lists them:
38+
39+
|Special characters|Description|
40+
|:---:|:------------------------------------------------------|
41+
|`$`| Matches the end position of the input string. |
42+
|`(`,`)`| Marks the start and end of a subexpression. The text matched by a subexpression can be captured for later use. |
43+
|`*`| Matches the previous subexpression zero or more times. |
44+
|`+`| Matches the previous subexpression one or more times. |
45+
|`.`| Matches any single character except the newline character `\n`. |
46+
|`[`| Marks the beginning of a bracket expression. |
47+
|`?`| Matches the preceding subexpression zero or one time. When it follows another quantifier, it makes that quantifier non-greedy. |
48+
| `\`| Marks the next character as a special character, a literal character, a backreference, or an octal escape. For example, `n` matches the character `n`, while `\n` matches a newline character. The sequence `\\` matches the `'\'` character, and `\(` matches the `'('` character.|
49+
|`^`| Matches the beginning of the input string. When it is the first character inside a bracket expression, it negates the set, so the bracket expression matches any character not listed in it. |
50+
|`{`| Marks the beginning of a qualifier expression. |
51+
|`\|`| Indicates a choice between two alternatives. |
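The sketch below combines several of these special characters in one pattern. It uses `std::regex`, whose default grammar is ECMAScript. The log-line format and the function names `is_log_line` and `has_non_digit` are invented for illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical log format: "INFO [<id>]: <message>" or "WARN [<id>]: <message>".
//   ^ ... $      anchor the pattern to the start and end of the input
//   (INFO|WARN)  a subexpression choosing between two alternatives
//   \\[ and \\]  in the C++ literal, "\\[" passes \[ to the regex engine,
//                i.e. a literal '[' instead of the start of a bracket expression
//   [0-9]+       a bracket expression (one digit) repeated one or more times
//   .+           one or more characters of any kind except a newline
bool is_log_line(const std::string &line) {
    static const std::regex pattern("^(INFO|WARN) \\[[0-9]+\\]: .+$");
    return std::regex_search(line, pattern);
}

// A leading ^ inside brackets negates the set: [^0-9] matches any
// single character that is not a digit.
bool has_non_digit(const std::string &s) {
    static const std::regex non_digit("[^0-9]");
    return std::regex_search(s, non_digit);
}
```

With these definitions, `"INFO [42]: server started"` is accepted. `"ERROR [1]: crash"` is rejected by the `(INFO|WARN)` alternative, and `"INFO [x]: bad id"` is rejected by the `[0-9]+` part.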
52+
53+
### Quantifiers
54+
55+
A quantifier specifies how many times a given component of a regular expression must occur for the match to succeed. See the table below:
56+
57+
|Character|Description|
58+
|:---:|:------------------------------------------------------|
59+
|`*`| matches the previous subexpression zero or more times. For example, `foo*` matches `fo` and `foooo`. `*` is equivalent to `{0,}`. |
60+
|`+`| matches the previous subexpression one or more times. For example, `foo+` matches `foo` and `foooo` but does not match `fo`. `+` is equivalent to `{1,}`. |
61+
|`?`| matches the previous subexpression zero or one time. For example, `Your(s)?` matches both `Your` and `Yours`. `?` is equivalent to `{0,1}`. |
62+
|`{n}`| `n` is a non-negative integer. Matches exactly `n` times. For example, `o{2}` cannot match the `o` in `for`, but matches the two `o`s in `foo`. |
63+
|`{n,}`| `n` is a non-negative integer. Matches at least `n` times. For example, `o{2,}` cannot match the `o` in `for`, but matches all the `o`s in `foooooo`. `o{1,}` is equivalent to `o+`. `o{0,}` is equivalent to `o*`. |
64+
|`{n,m}`| `m` and `n` are non-negative integers, where `n` is less than or equal to `m`. Matches at least `n` and at most `m` times. For example, `o{1,3}` matches the first three `o`s in `foooooo`. `o{0,1}` is equivalent to `o?`. Note that no spaces are allowed between the comma and the two numbers. |
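Several rows of the table can be checked directly with a short sketch. Two hypothetical wrappers are introduced only for this example. `full_match` is built on `std::regex_match`, which requires the whole string to match. `first_match` is built on `std::regex_search`, which returns the leftmost match.

```cpp
#include <regex>
#include <string>

// True only if the pattern matches the whole string s.
bool full_match(const std::string &s, const std::string &pattern) {
    return std::regex_match(s, std::regex(pattern));
}

// Returns the leftmost match of pattern inside s, or "" if there is none.
std::string first_match(const std::string &s, const std::string &pattern) {
    std::smatch m;
    const std::regex re(pattern);
    if (std::regex_search(s, m, re))
        return m[0].str();
    return "";
}
```

With these helpers, `full_match("fo", "foo*")` is `true` and `full_match("fo", "foo+")` is `false`. `first_match("foooooo", "o{1,3}")` returns `"ooo"`, while the non-greedy `o{1,3}?` returns just `"o"`. Compiling a new `std::regex` on every call keeps the demo short but would be wasteful in real code.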
65+
66+
With these two tables, we can usually read almost all regular expressions.
67+
68+
## 6.2 `std::regex` and Related Components
69+
70+
Regular expressions are the most common tool for matching string content. Unfortunately, traditional C++ had no language-level support for them, and they were not part of the standard library. Yet C++ is a high-performance language, and in back-end service development, using regular expressions to check URL resource links is the most mature and widespread practice in industry.
71+
72+
The usual workaround was Boost's regular expression library. C++11 officially brought regular expression processing into the standard library, providing standard support at the language level without relying on third-party libraries.
73+
74+
The C++11 regular expression library operates on `std::string` objects. A pattern is stored in a `std::regex` (a specialization of `std::basic_regex`) and matched with `std::regex_match`. The results are stored in a `std::smatch` (a specialization of `std::match_results`).
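These aliases can be checked at compile time. The sketch below simply restates the standard typedefs mentioned above as `static_assert`s. It should compile with any conforming C++11 standard library and produces no output.

```cpp
#include <regex>
#include <string>
#include <type_traits>

// std::regex is std::basic_regex specialized for char.
static_assert(std::is_same<std::regex, std::basic_regex<char>>::value,
              "std::regex is std::basic_regex<char>");

// std::smatch is std::match_results over std::string's const iterators.
static_assert(std::is_same<std::smatch,
                           std::match_results<std::string::const_iterator>>::value,
              "std::smatch is std::match_results<std::string::const_iterator>");

// Each element of a std::smatch is a std::ssub_match.
static_assert(std::is_same<std::smatch::value_type, std::ssub_match>::value,
              "std::smatch holds std::ssub_match elements");
```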
75+
76+
Let's introduce the library through a simple example. Consider the following regular expression:
77+
78+
- `[a-z]+\.txt`: In this regular expression, `[a-z]` matches one lowercase letter, and `+` matches the preceding expression one or more times, so `[a-z]+` matches a string of lowercase letters. In a regular expression, `.` matches any single character, whereas `\.` matches the literal character `.`. Finally, `txt` matches exactly the three letters `txt`. So this regular expression matches the names of text files whose base name consists only of lowercase letters.
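The difference between `.` and `\.` is easy to miss, so the sketch below compares the two patterns. It also shows a C++11 raw string literal `R"(...)"`, which avoids doubling the backslash. That spelling is an optional alternative, not what this chapter's own examples use.

```cpp
#include <regex>

// "\\." in an ordinary string literal hands \. to the regex engine,
// which then matches only a literal '.' character.
const std::regex escaped("[a-z]+\\.txt");

// The same pattern as a C++11 raw string literal: no double escaping.
const std::regex escaped_raw(R"([a-z]+\.txt)");

// Without the backslash, '.' matches any single character (except newline).
const std::regex unescaped("[a-z]+.txt");
```

Here `"fooxtxt"` matches only `unescaped`, because its `.` consumes the `x`. `"foo.txt"` matches all three patterns.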
79+
80+
`std::regex_match` matches a string against a regular expression and has many overloads. The simplest form takes a `std::string` and a `std::regex`. It returns `true` if the match succeeds and `false` otherwise. For example:
81+
82+
```cpp
83+
#include <iostream>
84+
#include <string>
85+
#include <regex>
86+
87+
int main() {
88+
    std::string fnames[] = {"foo.txt", "bar.txt", "test", "a0.txt", "AAA.txt"};
89+
    // In a C++ string literal `\` is itself an escape character, so passing `\.` to the regex engine requires escaping the backslash again: `\\.`
90+
    std::regex txt_regex("[a-z]+\\.txt");
91+
    for (const auto &fname: fnames)
92+
        std::cout << fname << ": " << std::regex_match(fname, txt_regex) << std::endl;
93+
}
94+
```
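Note that `std::regex_match` succeeds only when the pattern matches the entire string. When the pattern only needs to occur somewhere inside the string, `std::regex_search` is the usual choice. The sketch below contrasts the two on the same pattern. The wrapper names `whole_name_matches` and `contains_match` are hypothetical.

```cpp
#include <regex>
#include <string>

const std::regex txt_regex("[a-z]+\\.txt");

// regex_match: the entire string must have the form <lowercase>.txt
bool whole_name_matches(const std::string &s) {
    return std::regex_match(s, txt_regex);
}

// regex_search: succeeds if <lowercase>.txt occurs anywhere in s
bool contains_match(const std::string &s) {
    return std::regex_search(s, txt_regex);
}
```

`backup/foo.txt` is rejected by `regex_match` but accepted by `regex_search`, because its substring `foo.txt` matches. `a0.txt` is rejected by both, because no lowercase letter directly precedes `.txt`.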
95+
96+
Another common form takes three arguments: a `std::string`, a `std::smatch`, and a `std::regex`.
97+
`std::smatch` is a specialization of `std::match_results`:
98+
the standard library defines `std::smatch` as `std::match_results<std::string::const_iterator>`,
99+
that is, a `match_results` over `std::string` iterators.
100+
With `std::smatch`, the match results are easy to retrieve, for example:
101+
102+
```cpp
103+
std::regex base_regex("([a-z]+)\\.txt");
104+
std::smatch base_match;
105+
for (const auto &fname: fnames) {
106+
    if (std::regex_match(fname, base_match, base_regex)) {
107+
        // base_match[0] holds the match of the entire string
108+
        // base_match[1] holds the first parenthesized subexpression
109+
        if (base_match.size() == 2) {
110+
            std::string base = base_match[1].str();
111+
            std::cout << "sub-match[0]: " << base_match[0].str() << std::endl;
112+
            std::cout << fname << " sub-match[1]: " << base << std::endl;
113+
        }
114+
    }
115+
}
116+
```
117+
118+
The output of the above two code snippets is:
119+
120+
```
121+
foo.txt: 1
122+
bar.txt: 1
123+
test: 0
124+
a0.txt: 0
125+
AAA.txt: 0
126+
sub-match[0]: foo.txt
127+
foo.txt sub-match[1]: foo
128+
sub-match[0]: bar.txt
129+
bar.txt sub-match[1]: bar
130+
```
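Besides `operator[]` and `size()`, a `std::match_results` filled by `std::regex_search` also records where the match occurred. The sketch below uses the members `prefix()`, `suffix()`, and `position()`. The `MatchInfo` struct, the `describe` function, and the sample input are hypothetical. Because a `std::smatch` stores iterators into the searched string, that string has to outlive the match object.

```cpp
#include <regex>
#include <string>

// Hypothetical summary of a single search result.
struct MatchInfo {
    std::string whole, base, prefix, suffix;
    long position = -1;  // offset of the match in the input, -1 if none
};

MatchInfo describe(const std::string &text) {
    static const std::regex base_regex("([a-z]+)\\.txt");
    MatchInfo info;
    std::smatch m;  // holds iterators into `text`, which outlives it here
    if (std::regex_search(text, m, base_regex)) {
        info.whole = m[0].str();         // the entire match
        info.base = m[1].str();          // first parenthesized subexpression
        info.prefix = m.prefix().str();  // text before the match
        info.suffix = m.suffix().str();  // text after the match
        info.position = static_cast<long>(m.position(0));
    }
    return info;
}
```

For instance, `describe("logs/foo.txt.bak")` finds `foo.txt` at offset 5, with prefix `logs/` and suffix `.bak`.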
131+
132+
## Conclusion
133+
134+
This section briefly introduced regular expressions themselves
135+
and then, guided by the main requirements for using them,
136+
demonstrated the regular expression library
137+
through a practical example.
138+
139+
[Table of Content](./toc.md) | [Previous Chapter](./05-pointers.md) | [Next Chapter: Threads and Concurrency](./07-thread.md)
10140
11141
## Further Readings
12142
143+
1. [Comments from `std::regex`'s author](http://zhihu.com/question/23070203/answer/84248248)
144+
2. [Regular expression library documentation](http://en.cppreference.com/w/cpp/regex)
145+
13146
## Licenses
14147
15148
<a rel="license" href="http://creativecommons.org/licenses/by-nc-nd/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-nd/4.0/88x31.png" /></a><br />This work was written by [Ou Changkun](https://changkun.de) and licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-nd/4.0/">Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License</a>. The code of this repository is open sourced under the [MIT license](../../LICENSE).

book/en-us/toc.md

+5-4
@@ -30,6 +30,7 @@
3030
- Default template parameters
3131
- Variadic templates
3232
- Fold expression
33+
- Non-type template parameter deduction
3334
+ 2.6 Object-oriented
3435
- Delegate constructor
3536
- Inheritance constructor
@@ -68,11 +69,11 @@
6869
+ 5.2 `std::shared_ptr`
6970
+ 5.3 `std::unique_ptr`
7071
- [**Chapter 06 Standard Library: Regular Expression**](./06-regex.md)
71-
+ 6.1 Regular Expression Introduction
72-
+ Normal characters
72+
+ 6.1 Introduction
73+
+ Ordinary characters
7374
+ Special characters
74-
+ Determinative
75-
+ 6.2 `std::regex` and related
75+
+ Quantifiers
76+
+ 6.2 `std::regex` and its related
7677
+ `std::regex`
7778
+ `std::regex_match`
7879
+ `std::match_results`

book/zh-cn/05-pointers.md

+1-1
@@ -173,7 +173,7 @@ int main() {
173173
174174
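Smart pointers are not a novel technique; they are common in many languages. Modern C++ adopts this technique, which to some extent eliminates the abuse of `new`/`delete` and represents a more mature programming paradigm.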
175175
176-
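[Back to Contents](./toc.md) | [Previous Chapter](./04-containers.md) | [Next Chapter: Standard Library: Regular Expression](./06-regex.md)
176+
[Back to Contents](./toc.md) | [Previous Chapter](./04-containers.md) | [Next Chapter: Regular Expression](./06-regex.md)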
177177
178178
## Further Reading
179179

book/zh-cn/06-regex.md

+110-15
@@ -1,12 +1,10 @@
11
---
2-
title: Chapter 6 Standard Library: Regular Expression
2+
title: Chapter 6 Regular Expression
33
type: book-zh-cn
44
order: 6
55
---
66

7-
# Chapter 6 Standard Library: Regular Expression
8-
9-
> Content under revision
7+
# Chapter 6 Regular Expression
108

119
[TOC]
1210

@@ -59,21 +57,31 @@ order: 6
5957
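|`{n,}`| `n` is a non-negative integer. Matches at least `n` times. For example, `o{2,}` cannot match the `o` in `for`, but matches all the `o`s in `foooooo`. `o{1,}` is equivalent to `o+`, and `o{0,}` is equivalent to `o*`. |
6058
|`{n,m}`| `m` and `n` are non-negative integers, where `n` is less than or equal to `m`. Matches at least `n` times and at most `m` times. For example, `o{1,3}` matches the first three `o`s in `foooooo`. `o{0,1}` is equivalent to `o?`. Note that no spaces are allowed between the comma and the two numbers. |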
6159

62-
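With these three tables, we can usually read almost all regular expressions.
60+
With these two tables, we can usually read almost all regular expressions.
6361

6462
## 6.2 std::regex and Related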
6563

66-
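The most common way to match string content is to use regular expressions. Unfortunately, traditional C++ never had language-level support for regular expressions, and they were not part of the standard library. Yet C++ is a high-performance language, and in back-end service development, using regular expressions to check URL resource links is the most mature and common practice in industry.
64+
The most common way to match string content is to use regular expressions.
65+
Unfortunately, traditional C++ never had language-level support for regular expressions, and they were not part of the standard library,
66+
yet C++ is a high-performance language, and in back-end service development, when checking URL resource links,
67+
using regular expressions is the most mature and common practice in industry.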
6768

68-
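The general solution was to use Boost's regular expression library. C++11 officially brought regular expression processing into the standard library, providing standard language-level support without relying on third parties.
69+
The general solution was to use Boost's regular expression library.
70+
C++11 officially brought regular expression processing into the standard library, providing standard language-level support,
71+
without relying on third parties.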
6972

70-
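The regular expression library provided by C++11 operates on `std::string` objects: a pattern `std::regex` (essentially `std::basic_regex`) is initialized, matched with `std::regex_match`, and produces a `std::smatch` (essentially a `std::match_results` object).
73+
The regular expression library provided by C++11 operates on `std::string` objects:
74+
a pattern `std::regex` (essentially `std::basic_regex`) is initialized,
75+
matched with `std::regex_match`,
76+
and produces a `std::smatch` (essentially a `std::match_results` object).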
7177

72-
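We use a simple example to briefly introduce this library. Consider the following regular expression
78+
We use a simple example to briefly introduce this library. Consider the following regular expression: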
7379

7480
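- `[a-z]+\.txt`: In this regular expression, `[a-z]` matches one lowercase letter, and `+` lets the preceding expression match multiple times, so `[a-z]+` matches a string of lowercase letters. In a regular expression, a `.` matches any character, while `\.` matches the character `.` itself, and the final `txt` strictly matches the three letters `txt`. So this regular expression matches text file names made up purely of lowercase letters.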
7581

76-
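`std::regex_match` matches a string against a regular expression and has many different overloads. The simplest form takes a `std::string` and a `std::regex`; it returns `true` when the match succeeds and `false` otherwise. For example:
82+
`std::regex_match` matches a string against a regular expression and has many different overloads.
83+
The simplest form takes a `std::string` and a `std::regex`;
84+
it returns `true` when the match succeeds and `false` otherwise. For example: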
7785

7886
```cpp
7987
#include <iostream>
@@ -89,15 +97,19 @@ int main() {
8997
}
9098
```
9199

92-
Another common form passes the three arguments `std::string`/`std::smatch`/`std::regex` in order, where `std::smatch` is essentially `std::match_results`. In the standard library, `std::smatch` is defined as `std::match_results<std::string::const_iterator>`, that is, a `match_results` over substring iterators. `std::smatch` makes the match results easy to retrieve, for example:
100+
Another common form passes the three arguments `std::string`/`std::smatch`/`std::regex` in order,
101+
where `std::smatch` is essentially `std::match_results`.
102+
In the standard library, `std::smatch` is defined as `std::match_results<std::string::const_iterator>`,
103+
that is, a `match_results` over substring iterators.
104+
`std::smatch` makes the match results easy to retrieve, for example:
93105

94106
```cpp
95107
std::regex base_regex("([a-z]+)\\.txt");
96108
std::smatch base_match;
97109
for(const auto &fname: fnames) {
98110
if (std::regex_match(fname, base_match, base_regex)) {
99-
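// the first element of sub_match matches the entire string
100-
// the second element of sub_match matches the first parenthesized expression
111+
// the first element of std::smatch matches the entire string
112+
// the second element of std::smatch matches the first parenthesized expression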
101113
if (base_match.size() == 2) {
102114
std::string base = base_match[1].str();
103115
std::cout << "sub-match[0]: " << base_match[0].str() << std::endl;
@@ -126,9 +138,92 @@ bar.txt sub-match[1]: bar
126138
127139
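This section briefly introduced regular expressions themselves and then, guided by the main requirements for using them, demonstrated the regular expression library through a practical example.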
128140
129-
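> The material in this section is enough for us to implement the URL-matching feature of a simple web framework; see Exercise TODO
141+
## Exercises
142+
143+
1. In web server development, we usually want to serve routes that satisfy certain conditions. Regular expressions are one of the tools for achieving this.
144+
145+
Given the following request structure: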
146+
147+
```cpp
148+
struct Request {
149+
    // request method, POST, GET; path; HTTP version
150+
    std::string method, path, http_version;
151+
    // use smart pointer for reference counting of content
152+
    std::shared_ptr<std::istream> content;
153+
    // hash container, key-value dict
154+
    std::unordered_map<std::string, std::string> header;
155+
    // use regular expression for path match
156+
    std::smatch path_match;
157+
};
158+
```
159+
160+
The type of the requested resources:
161+
162+
```cpp
163+
typedef std::map<
164+
    std::string, std::unordered_map<
165+
        std::string, std::function<void(std::ostream&, Request&)>>> resource_type;
166+
```
167+
168+
And the server template:
169+
170+
```cpp
171+
template <typename socket_type>
172+
class ServerBase {
173+
public:
174+
    resource_type resource;
175+
    resource_type default_resource;
176+
177+
    void start() {
178+
        // TODO
179+
    }
180+
protected:
181+
    Request parse_request(std::istream& stream) const {
182+
        // TODO
183+
    }
184+
};
185+
```
186+
187+
Implement the member functions `start()` and `parse_request` so that users of the server template can specify routes as follows:
188+
189+
```cpp
190+
template<typename SERVER_TYPE>
191+
void start_server(SERVER_TYPE &server) {
192+
193+
    // handle GET requests for /match/[letters or digits], e.g. a GET request for /match/abc123 returns abc123
194+
    server.resource["^/match/([0-9a-zA-Z]+)/?$"]["GET"] = [](ostream& response, Request& request) {
195+
        string number = request.path_match[1];
196+
        response << "HTTP/1.1 200 OK\r\nContent-Length: " << number.length() << "\r\n\r\n" << number;
197+
    };
198+
199+
    // process default GET requests; this lambda is called if no other route matches
200+
    // serves files from the folder www/
201+
    // default: index.html
202+
    server.default_resource["^/?(.*)$"]["GET"] = [](ostream& response, Request& request) {
203+
        string filename = "www/";
204+
205+
        string path = request.path_match[1];
206+
207+
        // prevent using `..` to access content outside the folder www/
208+
        size_t last_pos = path.rfind(".");
209+
        size_t current_pos = 0;
210+
        size_t pos;
211+
        while ((pos = path.find('.', current_pos)) != string::npos && pos != last_pos) {
212+
            current_pos = pos;
213+
            path.erase(pos, 1);
214+
            last_pos--;
215+
        }
216+
217+
        // (...)
218+
    };
219+
220+
    server.start();
221+
}
222+
```
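The sketch below is a starting point for the exercise, not the reference solution. It shows the core routing step: matching a request path against route patterns with `std::regex_match` and capturing sub-matches into a `std::smatch`, as `Request::path_match` does. The `Route` alias and the `match_route` function are hypothetical simplifications of `resource_type`. A vector is used so the lookup order is explicit, whereas `std::map` iterates its keys in sorted order.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified routing table: (pattern, handler name) pairs.
using Route = std::pair<std::string, std::string>;

// Returns the handler name of the first route whose pattern matches the
// whole path, and stores the sub-matches in `path_match`. Like
// Request::path_match, `path_match` refers into `path`, so `path` must
// stay alive for as long as `path_match` is used.
std::string match_route(const std::vector<Route> &routes,
                        const std::string &path, std::smatch &path_match) {
    for (const auto &route : routes) {
        if (std::regex_match(path, path_match, std::regex(route.first)))
            return route.second;
    }
    return "";  // no route matched
}
```

Suppose the routes `^/match/([0-9a-zA-Z]+)/?$` and `^/?(.*)$` are listed in that order. The path `/match/abc123` then selects the first route with `path_match[1] == "abc123"`, and `/index.html` falls through to the second with `path_match[1] == "index.html"`. Compiling a `std::regex` on every call is costly, so a real server would more likely construct each pattern once.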
223+
224+
See the [reference solution](../../exercises/6).
130225

131-
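[Back to Contents](./toc.md) | [Previous Chapter](./05-pointers.md) | [Next Chapter: Standard Library: Threads and Concurrency](./07-thread.md)
226+
[Back to Contents](./toc.md) | [Previous Chapter](./05-pointers.md) | [Next Chapter: Threads and Concurrency](./07-thread.md)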
132227

133228
## Further Reading
134229

book/zh-cn/toc.md

+1
@@ -30,6 +30,7 @@
3030
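- Default template parameters
3131
- Variadic templates
3232
- Fold expression
33+
- Non-type template parameter deduction
3334
+ 2.6 Object-oriented
3435
- Delegate constructor
3536
- Inheritance constructor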
