<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<title>Just for Life.</title>
<subtitle>When will the bright moon come again</subtitle>
<link href="https://muyuuuu.github.io/atom.xml" rel="self"/>
<link href="https://muyuuuu.github.io/"/>
<updated>2024-12-26T15:58:27.037Z</updated>
<id>https://muyuuuu.github.io/</id>
<author>
<name>兰铃</name>
</author>
<generator uri="https://hexo.io/">Hexo</generator>
<entry>
<title>How to Read nndeploy</title>
<link href="https://muyuuuu.github.io/2024/12/26/nndeploy-1/"/>
<id>https://muyuuuu.github.io/2024/12/26/nndeploy-1/</id>
<published>2024-12-26T15:48:50.000Z</published>
<updated>2024-12-26T15:58:27.037Z</updated>
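<content type="html"><![CDATA[<link rel="stylesheet" class="aplayer-secondary-style-marker" href="/assets/css/APlayer.min.css"><script src="/assets/js/APlayer.min.js" class="aplayer-secondary-script-marker"></script><p>I have had a lot of free time over the past six months. To spend less time on my phone after work, I started learning <code>cuda</code> in July, studied <code>C++</code> in August and September, slacked off for all of October, studied advanced <code>cuda</code> in November, and in December I came to pester <a href="https://github.com/nndeploy/nndeploy"><code>nndeploy</code></a>.</p><p>For one thing, after learning <code>C++</code> I wanted to read an excellent open-source project that I could actually understand. For another, training models back in school felt dull, so I wanted to see what the engineering side of <code>AI</code> looks like.</p><span id="more"></span><blockquote><p>Personal background</p></blockquote><ol><li>My <code>C</code> and <code>C++</code> are weak: I only started learning them after I began working, and the same goes for <code>cmake</code>. I have little project experience and can only follow fairly simple code</li><li>In school I trained models with <code>pytorch</code> and used the common CV and NLP models. I know nothing about large models</li><li>I have a computer science background, so thread pools, memory pools, directed acyclic graphs, and multi-stage pipelines are familiar</li><li>One year into my job, I can write operators with <code>neon</code> and <code>OpenCL</code>. I taught myself <code>CUDA</code> after work</li><li>I know nothing about deployment or inference frameworks. This is driven purely by interest, so I will read the code bit by bit</li><li>Because a lot of background knowledge is involved, I give it as hyperlinks and do not explain language syntax again</li><li>From my experience reading the code, you need some grounding in <code>C++</code>, multithreading, data structures, and AI algorithms; otherwise the code is painful to read</li><li>Model inference is the core of model deployment, so I look at it closely. Even though the headings include an inference engine part, it is still just one node in the computation graph. Since I taught myself <code>CUDA</code> after work, I chose <code>tensorrt</code> as the inference engine.</li></ol><h1 id="从-main-函数开始"><a href="#从-main-函数开始" class="headerlink" title="从 main 函数开始"></a>Starting from the main function</h1><h2 id="获取参数"><a href="#获取参数" class="headerlink" title="获取参数"></a>Getting the arguments</h2><p>To be honest, when I opened the project there were so many folders that I could not find the entry point. In <code>cmake</code>, the command that produces an executable is <code>add_executable</code>; searching for that keyword led me to the <code>demo</code> folder. Taking detection as the example, I opened <code>demo/detect/demo.cc</code> and started reading.</p><p>In <code>main()</code> I ran into the unfamiliar <code>gflags</code>, which <code>vscode</code> could not even jump into. That usually means a third-party library, and a web search confirmed it. To learn how to use the library, see <a href="https://github.com/AngryHacker/articles/blob/master/src/open_source_components/google_gflags.md">here</a>.</p><p>The following code in <code>main()</code> simply reads the user's input and creates values of the corresponding types:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span 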
class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br></pre></td><td class="code"><pre><span class="line">gflags::<span class="built_in">ParseCommandLineNonHelpFlags</span>(&argc, &argv, <span class="literal">true</span>);</span><br><span class="line"><span class="keyword">if</span> (demo::FLAGS_usage) {</span><br><span class="line"> demo::<span class="built_in">showUsage</span>();</span><br><span class="line"> <span class="keyword">return</span> <span class="number">-1</span>;</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="comment">// Name of the detection model's DAG (graph), for example</span></span><br><span class="line"><span class="comment">// NNDEPLOY_YOLOV5/NNDEPLOY_YOLOV6/NNDEPLOY_YOLOV8</span></span><br><span class="line">std::string name = demo::<span class="built_in">getName</span>();</span><br><span class="line"><span class="comment">// Inference backend type, for example:</span></span><br><span class="line"><span class="comment">// kInferenceTypeOpenVino / kInferenceTypeTensorRt / 
kInferenceTypeOnnxRuntime</span></span><br><span class="line">base::InferenceType inference_type = demo::<span class="built_in">getInferenceType</span>();</span><br><span class="line"><span class="comment">// Inference device type, for example:</span></span><br><span class="line"></span><br><span class="line"><span class="comment">// kDeviceTypeCodeX86:0/kDeviceTypeCodeCuda:0/...</span></span><br><span class="line">base::DeviceType device_type = demo::<span class="built_in">getDeviceType</span>();</span><br><span class="line"></span><br><span class="line"><span class="comment">// Model type, for example:</span></span><br><span class="line"><span class="comment">// kModelTypeOnnx/kModelTypeMnn/...</span></span><br><span class="line">base::ModelType model_type = demo::<span class="built_in">getModelType</span>();</span><br><span class="line"></span><br><span class="line"><span class="comment">// Whether the model value is a path</span></span><br><span class="line"><span class="type">bool</span> is_path = demo::<span class="built_in">isPath</span>();</span><br><span class="line"></span><br><span class="line"><span class="comment">// Model path or model string</span></span><br><span class="line">std::vector<std::string> model_value = demo::<span class="built_in">getModelValue</span>();</span><br><span class="line"></span><br><span class="line"><span class="comment">// input path</span></span><br><span class="line">std::string input_path = demo::<span class="built_in">getInputPath</span>();</span><br><span class="line"></span><br><span class="line"><span class="comment">// codec flag</span></span><br><span class="line">base::CodecFlag codec_flag = demo::<span class="built_in">getCodecFlag</span>();</span><br><span class="line"><span class="comment">// output path</span></span><br><span class="line">std::string ouput_path = demo::<span class="built_in">getOutputPath</span>();</span><br><span class="line"><span class="comment">// base::kParallelTypePipeline / base::kParallelTypeSequential</span></span><br><span class="line">base::ParallelType pt = 
demo::<span class="built_in">getParallelType</span>();</span><br></pre></td></tr></table></figure><p>Take the following line as an example:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">base::ParallelType pt = demo::<span class="built_in">getParallelType</span>();</span><br></pre></td></tr></table></figure><p><code>ParallelType</code> is defined as:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">enum</span> <span class="title class_">ParallelType</span> : <span class="type">int</span> {</span><br><span class="line"> kParallelTypeNone = <span class="number">0x0001</span>,</span><br><span class="line"> kParallelTypeSequential = <span class="number">0x0001</span> << <span class="number">1</span>,</span><br><span class="line"> kParallelTypeTask = <span class="number">0x0001</span> << <span class="number">2</span>,</span><br><span class="line"> kParallelTypePipeline = <span class="number">0x0001</span> << <span class="number">3</span>,</span><br><span class="line">};</span><br></pre></td></tr></table></figure><p>Going only by the names, this parameter indicates whether the deployed task runs in parallel or sequentially.</p><h1 id="计算图"><a href="#计算图" class="headerlink" title="计算图"></a>Computation graph</h1><h2 id="计算图创建"><a href="#计算图创建" class="headerlink" title="计算图创建"></a>Creating the computation graph</h2><p>Anyone who has studied data structures knows graphs: nodes connected by edges.</p><h3 id="Edge-定义"><a href="#Edge-定义" class="headerlink" title="Edge 定义"></a>Edge definition</h3><p>Next, the edges of the graph are defined:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// 
Input edge (packet) of the DAG graph</span></span><br><span class="line"><span class="function">dag::Edge <span class="title">input</span><span class="params">(<span class="string">"detect_in"</span>)</span></span>;</span><br><span class="line"></span><br><span class="line"><span class="comment">// Output edge (packet) of the DAG graph</span></span><br><span class="line"><span class="function">dag::Edge <span class="title">output</span><span class="params">(<span class="string">"detect_out"</span>)</span></span>;</span><br></pre></td></tr></table></figure><p>A quick read of the <code>Edge</code> class shows that it inherits from <code>NonCopyable</code>:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">class</span> <span class="title class_">NNDEPLOY_CC_API</span> Edge : <span class="keyword">public</span> base::NonCopyable</span><br></pre></td></tr></table></figure><p>In short, copy construction and copy assignment, as well as move construction and move assignment, are disabled for this class; it is enough to remember that its data cannot be copied. In addition, <code>NNDEPLOY_CC_API</code> is a macro that appears throughout the project. Shared libraries are usually built in <code>release</code> mode, and <code>NNDEPLOY_CC_API</code> controls whether a symbol is visible outside the library, so that when a runtime error occurs the faulting address can be mapped back to a symbol, that is, to the function that failed.</p><p>Skimming the methods of <code>Edge</code>, they fall into two groups: memory-related, and node position or index-related:</p><ul><li>Memory: operates on <code>buffer</code>, <code>tensor</code>, and <code>param</code></li></ul><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br></pre></td><td class="code"><pre><span class="line"><span class="function">device::Buffer *<span class="title">create</span><span class="params">(device::Device *device, <span class="type">const</span> 
device::BufferDesc &desc,</span></span></span><br><span class="line"><span class="params"><span class="function"> <span class="type">int</span> index)</span></span>;</span><br><span class="line"><span class="function">base::Status <span class="title">set</span><span class="params">(device::Buffer &buffer, <span class="type">int</span> index)</span></span>;</span><br><span class="line"><span class="function">device::Buffer *<span class="title">getBuffer</span><span class="params">(<span class="type">const</span> Node *node)</span></span>;</span><br><span class="line"></span><br><span class="line"><span class="function">base::Status <span class="title">set</span><span class="params">(device::Tensor *tensor, <span class="type">int</span> index, <span class="type">bool</span> is_external = <span class="literal">true</span>)</span></span>;</span><br><span class="line"><span class="function">base::Status <span class="title">set</span><span class="params">(device::Tensor &tensor, <span class="type">int</span> index)</span></span>;</span><br><span class="line"><span class="function">device::Tensor *<span class="title">create</span><span class="params">(device::Device *device, <span class="type">const</span> device::TensorDesc &desc,</span></span></span><br><span class="line"><span class="params"><span class="function"> <span class="type">int</span> index)</span></span>;</span><br><span class="line"></span><br><span class="line"><span class="function">base::Status <span class="title">set</span><span class="params">(base::Param *param, <span class="type">int</span> index, <span class="type">bool</span> is_external = <span class="literal">true</span>)</span></span>;</span><br><span class="line"><span class="function">base::Status <span class="title">set</span><span class="params">(base::Param &param, <span class="type">int</span> index)</span></span>;</span><br><span class="line"><span class="function">base::Param *<span class="title">getParam</span><span 
class="params">(<span class="type">const</span> Node *node)</span></span>;</span><br><span class="line"><span class="function">base::Param *<span class="title">getGraphOutputParam</span><span class="params">()</span></span>;</span><br></pre></td></tr></table></figure><ul><li>Node position: these appear to return an index or a position</li></ul><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="type">int</span> <span class="title">getIndex</span><span class="params">(<span class="type">const</span> Node *node)</span></span>;</span><br><span class="line"><span class="function"><span class="type">int</span> <span class="title">getGraphOutputIndex</span><span class="params">()</span></span>;</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="type">int</span> <span class="title">getPosition</span><span class="params">(<span class="type">const</span> Node *node)</span></span>;</span><br><span class="line"><span class="function"><span class="type">int</span> <span class="title">getGraphOutputPosition</span><span class="params">()</span></span>;</span><br></pre></td></tr></table></figure><p>The rest can wait until it is needed.</p><h3 id="Graph-定义"><a href="#Graph-定义" class="headerlink" title="Graph 定义"></a>Graph definition</h3><p>The <code>Graph</code> class inherits from the <code>Node</code> class:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">class</span> <span class="title class_">NNDEPLOY_CC_API</span> Graph : <span class="keyword">public</span> Node</span><br></pre></td></tr></table></figure><p>My first guess is that a <code>graph</code> can itself be treated as a node and added to another <code>graph</code>. A quick look through the methods of the <code>Node</code> 
class shows that it can return its <code>Edge</code>s:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"><span class="function">std::vector<Edge *> <span class="title">getAllInput</span><span class="params">()</span></span>;</span><br><span class="line"><span class="function">std::vector<Edge *> <span class="title">getAllOutput</span><span class="params">()</span></span>;</span><br></pre></td></tr></table></figure><p>It can also set and query some runtime information:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="type">void</span> <span class="title">setDebugFlag</span><span class="params">(<span class="type">bool</span> flag)</span></span>;</span><br><span class="line"><span class="function"><span class="type">bool</span> <span class="title">getDebugFlag</span><span class="params">()</span></span>;</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="type">void</span> <span class="title">setRunningFlag</span><span class="params">(<span class="type">bool</span> flag)</span></span>;</span><br><span class="line"><span class="function"><span class="type">bool</span> <span class="title">isRunning</span><span class="params">()</span></span>;</span><br></pre></td></tr></table></figure><p>Take <code>setRunningFlag</code> as an example. When the <code>is_time_profile_</code> option is on and the <code>ENABLE_NNDEPLOY_TIME_PROFILER</code> macro is enabled, the <code>NNDEPLOY_TIME_POINT_START</code> <a href="https://muyuuuu.github.io/2024/02/03/define-macro/">macro</a> records the execution time of the <code>node</code>. In addition, the <code>NNDEPLOY_LOGE</code> logging function and the <code>NNDEPLOY_RETURN_ON_NEQ</code> return-status check are also implemented with <a href="https://muyuuuu.github.io/2024/02/03/define-macro/">macros</a> and the 
<code>do-while(0)</code> idiom.</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="type">void</span> <span class="title">Node::setRunningFlag</span><span class="params">(<span class="type">bool</span> flag)</span> </span>{</span><br><span class="line"> is_running_ = flag;</span><br><span class="line"> <span class="keyword">if</span> (is_time_profile_) {</span><br><span class="line"> <span class="keyword">if</span> (is_running_) {</span><br><span class="line"> <span class="built_in">NNDEPLOY_TIME_POINT_START</span>(name_ + <span class="string">" run()"</span>);</span><br><span class="line"> } <span class="keyword">else</span> {</span><br><span class="line"> <span class="built_in">NNDEPLOY_TIME_POINT_END</span>(name_ + <span class="string">" run()"</span>);</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">if</span> (is_debug_) {</span><br><span class="line"> <span class="keyword">if</span> (is_running_) {</span><br><span class="line"> <span class="built_in">NNDEPLOY_LOGE</span>(<span class="string">"%s run start.\n"</span>, name_.<span class="built_in">c_str</span>());</span><br><span class="line"> } <span class="keyword">else</span> {</span><br><span class="line"> <span class="built_in">NNDEPLOY_LOGE</span>(<span class="string">"%s run end.\n"</span>, name_.<span 
class="built_in">c_str</span>());</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>So a simple guess: <code>Edge</code> manages resources (allocation and so on), while <code>Node</code> schedules those resources and runs on them. I had never touched a deployment project before, so I am guessing as I read.</p><p>As for the <code>Graph</code> class, the author's comments are already detailed: it creates <code>Node</code>s and <code>Edge</code>s. How it is actually used becomes clear further down.</p><h3 id="Graph-注册"><a href="#Graph-注册" class="headerlink" title="Graph 注册"></a>Graph registration</h3><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">dag::Graph *graph = <span class="keyword">new</span> dag::<span class="built_in">Graph</span>(<span class="string">"demo"</span>, <span class="literal">nullptr</span>, &output);</span><br></pre></td></tr></table></figure><p>This creates a graph and passes in the output <code>Edge</code> created earlier. The more confusing part comes next:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// Create the detection model's DAG (graph)</span></span><br><span class="line">dag::Graph *detect_graph =</span><br><span class="line"> dag::<span class="built_in">createGraph</span>(name, inference_type, device_type, &input, &output,</span><br><span class="line"> model_type, is_path, model_value);</span><br><span class="line"><span class="keyword">if</span> (detect_graph == <span class="literal">nullptr</span>) {</span><br><span class="line"> <span class="built_in">NNDEPLOY_LOGE</span>(<span class="string">"detect_graph is nullptr"</span>);</span><br><span class="line"> <span class="keyword">return</span> <span class="number">-1</span>;</span><br><span 
class="line">}</span><br></pre></td></tr></table></figure><p>I jumped into <code>createGraph</code> and stared at it for ten minutes, sure I was not misreading it: it looked like it would return a null pointer straight away, and the program would log an error and exit. Then I noticed that two variables in <code>getGlobalGraphCreatorMap</code> are <a href="https://www.runoob.com/w3cnote/cpp-static-usage.html"><code>static</code></a>. Could this function already have been called somewhere else?</p><p>I then looked through the object detection header <code>yolo.h</code> and the <code>config.cmake</code> in this folder:</p><figure class="highlight cmake"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">file</span>(GLOB_RECURSE SOURCE</span><br><span class="line"> <span class="string">"${ROOT_PATH}/demo/detect/*.h"</span></span><br><span class="line"> <span class="string">"${ROOT_PATH}/demo/detect/*.cc"</span></span><br><span class="line">)</span><br><span class="line"><span class="keyword">file</span>(GLOB DEMO_SOURCE</span><br><span class="line"> <span class="string">"${ROOT_PATH}/demo/*.h"</span></span><br><span class="line"> <span class="string">"${ROOT_PATH}/demo/*.cc"</span></span><br><span class="line">)</span><br><span class="line"><span class="keyword">set</span>(SOURCE <span class="variable">${SOURCE}</span> <span class="variable">${DEMO_SOURCE}</span>)</span><br><span class="line"><span class="comment"># OBJECT</span></span><br><span class="line"><span class="comment"># BINARY</span></span><br><span class="line"><span class="keyword">add_executable</span>(<span class="variable">${BINARY}</span> <span class="variable">${SOURCE}</span> <span class="variable">${OBJECT}</span>)</span><br></pre></td></tr></table></figure><p>It turns out that the graphs are already registered in <code>yolo.cc</code>, 
in code inside the <code>nndeploy</code> namespace (which the demo brings in with <code>using namespace nndeploy</code>):</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// Coding convention: variables prefixed with g_ are globals</span></span><br><span class="line"><span class="function">dag::TypeGraphRegister <span class="title">g_register_yolov5_graph</span><span class="params">(NNDEPLOY_YOLOV5,</span></span></span><br><span class="line"><span class="params"><span class="function"> createYoloV5Graph)</span></span>;</span><br><span class="line"></span><br><span class="line"><span class="function">dag::TypeGraphRegister <span class="title">g_register_yolov6_graph</span><span class="params">(NNDEPLOY_YOLOV6,</span></span></span><br><span class="line"><span class="params"><span class="function"> createYoloV6Graph)</span></span>;</span><br><span class="line"></span><br><span class="line"><span class="function">dag::TypeGraphRegister <span class="title">g_register_yolov8_graph</span><span class="params">(NNDEPLOY_YOLOV8,</span></span></span><br><span class="line"><span class="params"><span class="function"> createYoloV8Graph)</span></span>;</span><br></pre></td></tr></table></figure><p>This was the first time I had seen code like this: global objects are used to register the graph-creation functions, since their constructors run before <code>main()</code>. A simplified version:</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span 
class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="keyword">include</span> <span class="string"><iostream></span></span></span><br><span class="line"><span class="meta">#<span class="keyword">include</span> <span class="string"><map></span></span></span><br><span class="line"></span><br><span class="line"><span class="built_in">std</span>::<span class="built_in">map</span><<span class="built_in">std</span>::<span class="built_in">string</span>, <span class="type">int</span>> <span class="built_in">map</span>;</span><br><span class="line"></span><br><span class="line"><span class="class"><span class="keyword">class</span> <span class="title">TypeGraphRegister</span> {</span></span><br><span class="line"> public:</span><br><span class="line"> <span class="comment">// explicit forbids implicit conversions</span></span><br><span class="line"> explicit <span class="title function_">TypeGraphRegister</span><span class="params">(<span class="type">const</span> <span class="built_in">std</span>::<span class="built_in">string</span> &name, <span class="type">int</span> v)</span> {</span><br><span class="line"> <span class="built_in">map</span>[name] = v;</span><br><span class="line"> }</span><br><span class="line">};</span><br><span class="line"></span><br><span class="line">namespace A{</span><br><span class="line"> TypeGraphRegister a{<span class="string">"a"</span>, <span class="number">1</span>};</span><br><span class="line">};</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="type">int</span> <span class="title function_">main</span><span class="params">()</span> {</span><br><span class="line"> using namespace 
A;</span><br><span class="line"> <span class="built_in">std</span>::<span class="built_in">cerr</span> << <span class="built_in">map</span>[<span class="string">"a"</span>] << <span class="built_in">std</span>::<span class="built_in">endl</span>;</span><br><span class="line"> <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>A global object is used because statements such as assignments cannot be executed at namespace scope, where only declarations and definitions are allowed. In other words, the following code is an error:</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="keyword">include</span> <span class="string"><iostream></span></span></span><br><span class="line"></span><br><span class="line">namespace A {</span><br><span class="line"> <span class="type">int</span> a = <span class="number">1</span>;</span><br><span class="line"> a += <span class="number">4</span>; <span class="comment">// error</span></span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="type">int</span> <span class="title function_">main</span><span class="params">()</span> {</span><br><span class="line"> using namespace A;</span><br><span class="line"> <span class="built_in">std</span>::<span class="built_in">cerr</span> << A::a << <span class="built_in">std</span>::<span class="built_in">endl</span>;</span><br><span class="line"> <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">}</span><br></pre></td></tr></table></figure><h3 id="创建目标检测图"><a href="#创建目标检测图" class="headerlink" 
title="创建目标检测图"></a>创建目标检测图</h3><p>以 <code>yolov5</code> 为例,会调用 <code>createYoloV5Graph</code> 函数,根据用户指定的 <code>inference_type</code> 推理类型,<code>device_type</code> 设备类型等信息创建目标检测的计算图。</p><p>先创建一个图:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">dag::Graph *graph = <span class="keyword">new</span> dag::<span class="built_in">Graph</span>(name, input, output);</span><br></pre></td></tr></table></figure><p>之后在输入边 <code>input</code> 边和推理边 <code>infer_input</code> 边直接增加节点 <code>pre</code>,完成颜色空间转换和 <code>resize</code>,印象中目标检测模型是要把输入的图像 <code>resize</code> 到固定的尺寸来。</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line">dag::Node *pre = graph-><span class="built_in">createNode</span><preprocess::CvtColorResize>(</span><br><span class="line"> <span class="string">"preprocess"</span>, input, infer_input);</span><br><span class="line">preprocess::CvtclorResizeParam *pre_param =</span><br><span class="line"> <span class="built_in">dynamic_cast</span><preprocess::CvtclorResizeParam *>(pre-><span class="built_in">getParam</span>());</span><br><span class="line">pre_param->src_pixel_type_ = base::kPixelTypeBGR;</span><br><span class="line">pre_param->dst_pixel_type_ = base::kPixelTypeRGB;</span><br><span class="line">pre_param->interp_type_ = base::kInterpTypeLinear;</span><br><span class="line">pre_param->h_ = <span class="number">640</span>;</span><br><span class="line">pre_param->w_ = <span class="number">640</span>;</span><br></pre></td></tr></table></figure><p>推理输入边 <code>infer_input</code> 和推理输出边 <code>infer_output</code> 之间增加推理节点 
<code>infer</code> is added to run model inference. Likewise, after inference a <code>post</code> node is added to handle <a href="https://github.com/luanshiyinyang/NMS"><code>nms</code> suppression</a>, confidence filtering, and so on:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">dag::Node *infer = graph-><span class="built_in">createInfer</span><infer::Infer>(</span><br><span class="line"> <span class="string">"infer"</span>, inference_type, infer_input, infer_output);</span><br><span class="line">dag::Node *post =</span><br><span class="line"> graph-><span class="built_in">createNode</span><YoloPostProcess>(<span class="string">"postprocess"</span>, infer_output, output);</span><br></pre></td></tr></table></figure><p>For calls of the form <code>createNode<YoloPostProcess></code>, take a look at the <code>createNode</code> method:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">template</span> <<span class="keyword">typename</span> T, <span class="keyword">typename</span>... 
Args,</span><br><span class="line"> <span class="keyword">typename</span> std::enable_if<std::is_base_of<Node, T>{}, <span class="type">int</span>>::type></span><br><span class="line"><span class="function">Node *<span class="title">Graph::createNode</span><span class="params">(<span class="type">const</span> std::string &name, Edge *input, Edge *output,</span></span></span><br><span class="line"><span class="params"><span class="function"> Args &...args)</span></span></span><br></pre></td></tr></table></figure><p>The template parameter list uses <a href="https://github.com/wuye9036/CppTemplateTutorial"><code>SFINAE</code></a>: <code>std::enable_if</code> restricts <code>T</code> to types derived from <code>Node</code>. See the link if you are interested.</p><p>In addition, the added nodes <code>pre, infer, post</code> all inherit from the <code>Node</code> class and implement the <code>run</code> method. When the computation graph starts, each <code>node</code>'s <code>run</code> method is called through the <code>Node</code> base class, which is how every <code>node</code> in the graph gets executed.</p><h4 id="目标检测图中的推理引擎"><a href="#目标检测图中的推理引擎" class="headerlink" title="目标检测图中的推理引擎"></a>The inference engine in the object detection graph</h4><p>Creating the inference node:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">dag::Node *infer = graph-><span class="built_in">createInfer</span><model::Infer>(</span><br><span class="line"> <span class="string">"infer"</span>, inference_type, infer_input, infer_output);</span><br></pre></td></tr></table></figure><p>First, the <code>Infer</code> class is constructed:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">Infer</span>(<span class="type">const</span> std::string &name, base::InferenceType type,</span><br><span class="line"> std::initializer_list<dag::Edge *> inputs,</span><br><span class="line"> std::initializer_list<dag::Edge *> outputs);</span><br></pre></td></tr></table></figure><p>Besides calling the <code>Node</code> 
constructor with the input and output edges, this constructor also creates the inference engine: <code>inference::createInference(type);</code>. The inference engine is created through the same global registration mechanism described earlier. For <code>tensorrt</code>, for example, a global variable is created:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">TypeInferenceRegister<TypeInferenceCreator<TensorRtInference>></span><br><span class="line"> <span class="built_in">g_tensorrt_inference_register</span>(base::kInferenceTypeTensorRt);</span><br></pre></td></tr></table></figure><p>When <code>TensorRtInference</code> is created, the <code>Inference</code> constructor is called to create its parameters:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">Inference::<span class="built_in">Inference</span>(base::InferenceType type) {</span><br><span class="line"> type_ = type;</span><br><span class="line"> inference_param_ = <span class="built_in">createInferenceParam</span>(type);</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>The code that creates the parameters is:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">TensorRtInferenceParam::<span class="built_in">TensorRtInferenceParam</span>() : <span class="built_in">InferenceParam</span>() {</span><br><span class="line"> model_type_ = base::kModelTypeOnnx;</span><br><span class="line"> device_type_.code_ = base::kDeviceTypeCodeCuda;</span><br><span class="line"> device_type_.device_id_ = <span class="number">0</span>;</span><br><span class="line"> gpu_tune_kernel_ = <span class="number">1</span>;</span><br><span 
class="line">}</span><br></pre></td></tr></table></figure><p>As shown, <code>tensorrt</code> runs on a single <code>cuda</code> card.</p><p>The detection model is then treated as a <code>node</code> and added to the <code>graph</code>: <code>graph->addNode(detect_graph);</code>.</p><p>Next come the decode and encode nodes. <code>createDecodeNode</code> and <code>createGraph</code> are similar in implementation logic; my guess is that they decode and encode the input images or video. Both live in the <code>codec</code> namespace. Between them sits a <code>drawbox</code> node, which calls <code>opencv</code> to draw the detection boxes on the image.</p><p>The flow of this part of the graph is shown in the figure below:</p><p><img data-src="https://s21.ax1x.com/2024/12/26/pAvD454.png" alt></p><h3 id="wrapper-相关"><a href="#wrapper-相关" class="headerlink" title="wrapper 相关"></a>Wrappers</h3><h4 id="EdgeWrapper"><a href="#EdgeWrapper" class="headerlink" title="EdgeWrapper"></a>EdgeWrapper</h4><p>When <code>createEdge</code> is called, each <code>edge</code> is wrapped in an <code>edge_wrapper</code> and stored in the current graph's <code>edge_repository_</code>. The <code>edge_wrapper</code> is allocated with <code>new</code>, so memory leaks if it is released improperly or if the program exits abnormally without running the destructor.</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line"><span class="function">Edge *<span class="title">Graph::createEdge</span><span class="params">(<span class="type">const</span> std::string &name)</span> </span>{</span><br><span class="line"> Edge *edge = <span class="keyword">new</span> <span class="built_in">Edge</span>(name);</span><br><span class="line"> EdgeWrapper *edge_wrapper = <span class="keyword">new</span> <span class="built_in">EdgeWrapper</span>();</span><br><span class="line"> edge_wrapper->is_external_ = <span class="literal">false</span>;</span><br><span class="line"> edge_wrapper->edge_ = edge;</span><br><span class="line"> edge_wrapper->name_ = name;</span><br><span class="line"> 
edge_repository_.<span class="built_in">emplace_back</span>(edge_wrapper);</span><br><span class="line"> <span class="keyword">return</span> edge;</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>The <code>EdgeWrapper</code> class is shown below. <code>producers_</code> and <code>consumers_</code> presumably track the edge's input nodes (the nodes that write to it) and output nodes (the nodes that read from it).</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">class</span> <span class="title class_">NNDEPLOY_CC_API</span> EdgeWrapper {</span><br><span class="line"> <span class="keyword">public</span>:</span><br><span class="line"> <span class="type">bool</span> is_external_;</span><br><span class="line"> Edge *edge_;</span><br><span class="line"> std::string name_;</span><br><span class="line"> std::vector<NodeWrapper *> producers_;</span><br><span class="line"> std::vector<NodeWrapper *> consumers_;</span><br><span class="line">};</span><br></pre></td></tr></table></figure><h4 id="NodeWrapper"><a href="#NodeWrapper" class="headerlink" title="NodeWrapper"></a>NodeWrapper</h4><p><code>addNode</code> and <code>createNode</code> are similar. Both take an input edge and an output edge, so they are a bit more involved than <code>createEdge</code> and contain the following extra code:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line">EdgeWrapper *input_wrapper = <span 
class="built_in">findEdgeWrapper</span>(edge_repository_, input);</span><br><span class="line"><span class="keyword">if</span> (input_wrapper == <span class="literal">nullptr</span>) {</span><br><span class="line"> input_wrapper = <span class="keyword">this</span>-><span class="built_in">addEdge</span>(input);</span><br><span class="line">}</span><br><span class="line">input_wrapper->consumers_.<span class="built_in">emplace_back</span>(node_wrapper);</span><br><span class="line">EdgeWrapper *output_wrapper = <span class="built_in">findEdgeWrapper</span>(edge_repository_, output);</span><br><span class="line"><span class="keyword">if</span> (output_wrapper == <span class="literal">nullptr</span>) {</span><br><span class="line"> output_wrapper = <span class="keyword">this</span>-><span class="built_in">addEdge</span>(output);</span><br><span class="line">}</span><br><span class="line">output_wrapper->producers_.<span class="built_in">emplace_back</span>(node_wrapper);</span><br><span class="line"></span><br><span class="line">node_repository_.<span class="built_in">emplace_back</span>(node_wrapper);</span><br></pre></td></tr></table></figure><p>First, <code>findEdgeWrapper</code> looks up the <code>wrapper</code> of the input edge; if the edge is not yet in the <code>graph</code>, it is added. This <code>node</code> is appended to the input edge's <code>consumers_</code>; likewise, it is appended to the output edge's <code>producers_</code>. Note that several edges may have the same node in their <code>consumers_</code>, and one node may appear in the <code>producers_</code> of several edges.</p><p><img data-src="https://s21.ax1x.com/2024/12/26/pAvDIPJ.png" alt></p><p>The <code>NodeWrapper</code> code is as follows:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">class</span> <span 
class="title class_">NNDEPLOY_CC_API</span> NodeWrapper {</span><br><span class="line"> <span class="keyword">public</span>:</span><br><span class="line"> <span class="type">bool</span> is_external_;</span><br><span class="line"> Node *node_;</span><br><span class="line"> std::string name_;</span><br><span class="line"> std::vector<NodeWrapper *> predecessors_;</span><br><span class="line"> std::vector<NodeWrapper *> successors_;</span><br><span class="line"> base::NodeColorType color_ = base::kNodeColorWhite;</span><br><span class="line">};</span><br></pre></td></tr></table></figure><p>Presumably <code>predecessors_</code> and <code>successors_</code> correspond to the <code>producers_</code> and <code>consumers_</code> of <code>EdgeWrapper</code>, respectively.</p><p>While reading the <code>createNode</code> code I ran into the unfamiliar <code>std::initializer_list</code>. After <a href="https://www.geeksforgeeks.org/std-initializer_list-in-cpp-11/">reading up on it</a>, my rough understanding is that it is a lightweight class template for iterating over objects of the same type.</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="function">Node *<span class="title">Graph::createNode</span><span class="params">(<span class="type">const</span> std::string &name,</span></span></span><br><span class="line"><span class="params"><span class="function"> std::initializer_list<Edge *> inputs,</span></span></span><br><span class="line"><span class="params"><span class="function"> std::initializer_list<Edge *> outputs, Args &...args)</span></span></span><br></pre></td></tr></table></figure><h2 id="计算图初始化"><a href="#计算图初始化" class="headerlink" title="计算图初始化"></a>Computation graph initialization</h2><p>Next come initialization, execution, and release of the computation graph. The code so far was <code>OK</code> in difficulty; from here on it gets much harder. The call <code>status = graph->init();</code> initializes the computation graph, so let's see what gets initialized.</p><p>First, the <code>this->construct();</code> function checks whether the <code>graph</code>'s <code>node, edge</code> entries are null, and whether the producers and consumers of each <code>edge_wrapper</code> are empty. If all of these are empty, the graph was built incorrectly.</p><h3 id="Node-处理"><a 
href="#Node-处理" class="headerlink" title="Node 处理"></a>Node handling</h3><p>Next the <code>Node</code>s are processed. Basic settings are applied to each <code>Node</code> first: how it runs, whether it is timed, and so on.</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">node-><span class="built_in">setDebugFlag</span>(is_debug_);</span><br><span class="line">node-><span class="built_in">setTimeProfileFlag</span>(is_time_profile_);</span><br><span class="line">node-><span class="built_in">setParallelType</span>(parallel_type_);</span><br><span class="line">node-><span class="built_in">setInnerFlag</span>(<span class="literal">true</span>);</span><br></pre></td></tr></table></figure><p>Then look at the following code:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br></pre></td><td class="code"><pre><span class="line">std::vector<Edge *> inputs = node-><span class="built_in">getAllInput</span>();</span><br><span class="line"><span class="keyword">for</span> (<span class="keyword">auto</span> input : inputs) {</span><br><span class="line"> EdgeWrapper *input_wrapper = <span class="built_in">findEdgeWrapper</span>(edge_repository_, input);</span><br><span class="line"> <span 
class="built_in">NNDEPLOY_CHECK_PARAM_NULL_RET_STATUS</span>(input_wrapper,</span><br><span class="line"> <span class="string">"input_wrapper is null!"</span>);</span><br><span class="line"></span><br><span class="line"> <span class="keyword">for</span> (<span class="keyword">auto</span> producer : input_wrapper->producers_) {</span><br><span class="line"> <span class="built_in">insertUnique</span>(node_wrapper->predecessors_, producer);</span><br><span class="line"> }</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line">std::vector<Edge *> outputs = node-><span class="built_in">getAllOutput</span>();</span><br><span class="line"><span class="keyword">for</span> (<span class="keyword">auto</span> output : outputs) {</span><br><span class="line"> EdgeWrapper *output_wrapper = <span class="built_in">findEdgeWrapper</span>(edge_repository_, output);</span><br><span class="line"> <span class="built_in">NNDEPLOY_CHECK_PARAM_NULL_RET_STATUS</span>(output_wrapper,</span><br><span class="line"> <span class="string">"output_wrapper is null!"</span>);</span><br><span class="line"></span><br><span class="line"> <span class="keyword">for</span> (<span class="keyword">auto</span> consumer : output_wrapper->consumers_) {</span><br><span class="line"> <span class="built_in">insertUnique</span>(node_wrapper->successors_, consumer);</span><br><span class="line"> }</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>First, the <code>getAllInput</code> method is called to fetch the inputs of the <code>Node</code>. Taking the <code>Infer</code> node created by <code>graph->createInfer<model::Infer></code> as an example, calling <code>getAllInput</code> invokes the <code>Node</code> class method, which returns the inputs <code>inputs_</code>:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="function">std::vector<Edge *> <span class="title">Node::getAllInput</span><span class="params">()</span> 
</span>{ <span class="keyword">return</span> inputs_; }</span><br></pre></td></tr></table></figure><p><code>inputs_</code> is determined by the constructor arguments of the subclass when the subclass is created. Here is the constructor of the <code>infer</code> class:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">Infer::<span class="built_in">Infer</span>(<span class="type">const</span> std::string &name, base::InferenceType type,</span><br><span class="line"> std::initializer_list<dag::Edge *> inputs,</span><br><span class="line"> std::initializer_list<dag::Edge *> outputs)</span><br><span class="line"> : dag::<span class="built_in">Node</span>(name, inputs, outputs)</span><br></pre></td></tr></table></figure><p>It calls the constructor of the parent class <code>dag::Node</code>:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line">Node::<span class="built_in">Node</span>(<span class="type">const</span> std::string &name, std::vector<Edge *> inputs,</span><br><span class="line"> std::vector<Edge *> outputs)</span><br><span class="line"> : <span class="built_in">name_</span>(name) {</span><br><span class="line"> device_type_ = device::<span class="built_in">getDefaultHostDeviceType</span>();</span><br><span class="line"> inputs_ = inputs;</span><br><span class="line"> outputs_ = outputs;</span><br><span class="line"> constructed_ = <span class="literal">true</span>;</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>In other words, the parent constructor stores the input edges in <code>inputs_</code>. Once a node's input edges are known, each edge's <code>producers_</code> are read, and the nodes that point to the current node are added to its <code>predecessors</code>. The outputs are handled the same way: the current node's 
<code>successors</code> receive the nodes that the current node points to. That is confusing in words, so here is a picture:</p><p><img data-src="https://s21.ax1x.com/2024/12/26/pAvDoG9.png" alt></p><p>For the yellow node, the blue nodes are its <code>predecessors</code> and the green nodes are its <code>successors</code>.</p><h3 id="Edge-处理"><a href="#Edge-处理" class="headerlink" title="Edge 处理"></a>Edge handling</h3><p>Once the nodes are processed, the edges are processed:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">for</span> (<span class="keyword">auto</span> edge_wrapper : edge_repository_) {</span><br><span class="line"> std::vector<Node *> producers;</span><br><span class="line"> <span class="keyword">for</span> (<span class="keyword">auto</span> producer : edge_wrapper->producers_) {</span><br><span class="line"> producers.<span class="built_in">emplace_back</span>(producer->node_);</span><br><span class="line"> }</span><br><span class="line"> std::vector<Node *> consumers;</span><br><span class="line"> <span class="keyword">for</span> (<span class="keyword">auto</span> consumer : edge_wrapper->consumers_) {</span><br><span class="line"> consumers.<span class="built_in">emplace_back</span>(consumer->node_);</span><br><span class="line"> }</span><br><span class="line"> base::Status status = edge_wrapper->edge_-><span 
class="built_in">setParallelType</span>(parallel_type);</span><br><span class="line"> <span class="built_in">NNDEPLOY_RETURN_ON_NEQ</span>(status, base::kStatusCodeOk,</span><br><span class="line"> <span class="string">"setParallelType failed!"</span>);</span><br><span class="line"> <span class="comment">// this field must be managed by abstract_edge</span></span><br><span class="line"> status = edge_wrapper->edge_-><span class="built_in">increaseProducers</span>(producers);</span><br><span class="line"> <span class="built_in">NNDEPLOY_RETURN_ON_NEQ</span>(status, base::kStatusCodeOk,</span><br><span class="line"> <span class="string">"increaseProducers failed!"</span>);</span><br><span class="line"> status = edge_wrapper->edge_-><span class="built_in">increaseConsumers</span>(consumers);</span><br><span class="line"> <span class="built_in">NNDEPLOY_RETURN_ON_NEQ</span>(status, base::kStatusCodeOk,</span><br><span class="line"> <span class="string">"increaseConsumers failed!"</span>);</span><br><span class="line"> status = edge_wrapper->edge_-><span class="built_in">construct</span>();</span><br><span class="line"> <span class="built_in">NNDEPLOY_RETURN_ON_NEQ</span>(status, base::kStatusCodeOk,</span><br><span class="line"> <span class="string">"construct edge failed!"</span>);</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>One question here: <code>std::vector<Node *> consumers</code> holds pointers, so if <code>abstract_edge</code> modifies a <code>Node</code>, the <code>Node</code> tracked by <code>edge_wrapper</code> changes as well. Could that cause problems?</p><p><code>setParallelType</code> is where <code>abstract_edge</code> gets created. That feels off: the function's job is to create <code>abstract_edge</code>, yet its name says it sets the parallel type. I assumed it only set the parallel type, as with <code>node_wrapper</code>, and it took me a long time to discover that the creation of <code>abstract_edge</code> is hidden inside the <code>setParallelType</code> method. After that, <code>abstract_edge</code> manages the edge's producers and consumers, and the <code>construct</code> method of <code>abstract_edge</code> is called.</p><p>Now for <code>abstract_edge</code> itself. It is created the same way as the <code>createYoloV5Graph</code> discussed earlier: it is registered through 
<code>TypeEdgeRegister</code>, and two kinds are supported: <code>FixedEdge</code> (sequential and task-parallel) and <code>PipelineEdge</code> (pipeline-parallel).</p><p>The <code>construct</code> method of <code>PipelineEdge</code> registers every consumer in the data-packet bookkeeping, so that during parallel execution a consumer can run as soon as its data is in place. Here is the method:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br></pre></td><td class="code"><pre><span class="line">std::list<PipelineDataPacket *> data_packets_;</span><br><span class="line"><span class="comment">// index of the next data packet each consumer will consume; it equals the index of the current packet below plus 1</span></span><br><span class="line">std::map<Node *, <span class="type">int</span>> to_consume_index_;</span><br><span class="line"><span class="comment">// the data packet each consumer is currently consuming</span></span><br><span class="line">std::map<Node *, PipelineDataPacket *> consuming_dp_;</span><br><span class="line"></span><br><span class="line"><span class="function">base::Status <span class="title">PipelineEdge::construct</span><span class="params">()</span> </span>{</span><br><span class="line"> consumers_size_ = consumers_.<span class="built_in">size</span>();</span><br><span class="line"> <span class="keyword">for</span> (<span class="keyword">auto</span> iter : consumers_) {</span><br><span class="line"> <span class="keyword">if</span> (to_consume_index_.<span class="built_in">find</span>(iter) == to_consume_index_.<span class="built_in">end</span>()) {</span><br><span class="line"> to_consume_index_.<span class="built_in">insert</span>({iter, <span 
class="number">0</span>});</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">if</span> (consuming_dp_.<span class="built_in">find</span>(iter) == consuming_dp_.<span class="built_in">end</span>()) {</span><br><span class="line"> consuming_dp_.<span class="built_in">insert</span>({iter, <span class="literal">nullptr</span>});</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">return</span> base::kStatusCodeOk;</span><br><span class="line">}</span><br></pre></td></tr></table></figure><h3 id="执行器初始化"><a href="#执行器初始化" class="headerlink" title="执行器初始化"></a>Executor initialization</h3><p>Next, <code>status = this->executor();</code> is called. It creates the executor <code>executor_</code> according to the execution type passed in by the user, choosing between sequential and parallel execution; parallel execution is further divided into task parallelism and pipeline parallelism. These three concepts are explained in detail in the project's <code>README</code>:</p><ol><li><p>Sequential: each node is executed in turn, following the topological order of the deployment's directed acyclic graph.</p></li><li><p>Pipeline parallelism: when processing multiple frames, the DAG-based deployment can bind the preprocessing <code>Node</code>, the inference <code>Node</code>, and the postprocessing <code>Node</code> to three different threads, and each thread can in turn be bound to a different hardware device, so the three <code>Node</code>s run as a pipeline. In complex scenarios with multiple models and multiple hardware devices, pipeline parallelism pays off even more and can significantly raise overall throughput.</p></li><li><p>Task parallelism: in complex scenarios with multiple models and multiple hardware devices, the DAG-based deployment can fully exploit the parallelism available in model deployment and shorten the end-to-end time of a single run.</p></li></ol><p>The executor <code>executor_</code> is then initialized:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">status = executor_-><span class="built_in">init</span>(edge_repository_, node_repository_);</span><br></pre></td></tr></table></figure><p>Let's look closely at these 3 executors. At this point we are still not done with the code triggered by <code>graph->init()</code>.</p><h4 id="SequentialExecutor-初始化"><a href="#SequentialExecutor-初始化" class="headerlink" title="SequentialExecutor 初始化"></a>SequentialExecutor initialization</h4><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span 
class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br></pre></td><td class="code"><pre><span class="line"><span class="function">base::Status <span class="title">SequentialExecutor::init</span><span class="params">(</span></span></span><br><span class="line"><span class="params"><span class="function"> std::vector<EdgeWrapper *> &edge_repository,</span></span></span><br><span class="line"><span class="params"><span class="function"> std::vector<NodeWrapper *> &node_repository)</span> </span>{</span><br><span class="line"> base::Status status = <span class="built_in">topoSortDFS</span>(node_repository, topo_sort_node_);</span><br><span class="line"> <span class="keyword">for</span> (<span class="keyword">auto</span> iter : topo_sort_node_) {</span><br><span class="line"> iter->node_-><span class="built_in">setInitializedFlag</span>(<span class="literal">false</span>);</span><br><span class="line"> status = iter->node_-><span class="built_in">init</span>();</span><br><span class="line"> <span class="keyword">if</span> (status != base::kStatusCodeOk) {</span><br><span class="line"> <span class="built_in">NNDEPLOY_LOGE</span>(<span class="string">"Node %s init failed\n"</span>, iter->node_-><span class="built_in">getName</span>().<span class="built_in">c_str</span>());</span><br><span class="line"> <span class="keyword">return</span> status;</span><br><span class="line"> }</span><br><span class="line"> iter->node_-><span class="built_in">setInitializedFlag</span>(<span class="literal">true</span>);</span><br><span class="line"> }</span><br><span class="line"> edge_repository_ = edge_repository;</span><br><span class="line"> <span class="keyword">return</span> status;</span><br><span 
class="line">}</span><br></pre></td></tr></table></figure><p>The <code>topoSortDFS</code> function performs a topological sort, and then every <code>Node</code> is initialized in the sorted order with <code>status = iter->node_->init();</code>. If the <code>Node</code> is an <code>Infer</code> node, for example, the <code>Infer</code> node's initialization is called, which initializes the inference engine. If the inference engine is <code>tensorrt</code>, <code>base::Status TensorRtInference::init()</code> is called.</p><p>A closer look at the <code>topoSortDFS</code> function:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br></pre></td><td class="code"><pre><span class="line"><span class="function">base::Status <span class="title">topoSortDFS</span><span class="params">(std::vector<NodeWrapper *> &node_repository,</span></span></span><br><span class="line"><span class="params"><span class="function"> std::vector<NodeWrapper *> &topo_sort_node)</span> </span>{</span><br><span class="line"> base::Status status = base::kStatusCodeOk;</span><br><span class="line"> std::vector<NodeWrapper *> start_nodes = <span class="built_in">findStartNodes</span>(node_repository);</span><br><span class="line"> <span class="keyword">if</span> (start_nodes.<span 
class="built_in">empty</span>()) {</span><br><span class="line"> <span class="built_in">NNDEPLOY_LOGE</span>(<span class="string">"No start node found in graph"</span>);</span><br><span class="line"> <span class="keyword">return</span> base::kStatusCodeErrorInvalidValue;</span><br><span class="line"> }</span><br><span class="line"> std::stack<NodeWrapper *> dst;</span><br><span class="line"> <span class="keyword">for</span> (<span class="keyword">auto</span> node_wrapper : start_nodes) {</span><br><span class="line"> <span class="keyword">if</span> (node_wrapper->color_ == base::kNodeColorWhite) {</span><br><span class="line"> status = <span class="built_in">TopoSortDFSRecursive</span>(node_wrapper, dst);</span><br><span class="line"> } <span class="keyword">else</span> <span class="keyword">if</span> (node_wrapper->color_ == base::kNodeColorGray) {</span><br><span class="line"> <span class="built_in">NNDEPLOY_LOGE</span>(<span class="string">"Cycle detected in graph"</span>);</span><br><span class="line"> status = base::kStatusCodeErrorInvalidValue;</span><br><span class="line"> } <span class="keyword">else</span> {</span><br><span class="line"> <span class="keyword">continue</span>;</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">while</span> (!dst.<span class="built_in">empty</span>()) {</span><br><span class="line"> topo_sort_node.<span class="built_in">emplace_back</span>(dst.<span class="built_in">top</span>());</span><br><span class="line"> dst.<span class="built_in">pop</span>();</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="built_in">checkUnuseNode</span>(node_repository);</span><br><span class="line"></span><br><span class="line"> <span class="keyword">return</span> base::kStatusCodeOk;</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>This has the familiar flavor of data structures and <code>leetcode</code>. First the root nodes are found: a node with no 
<code>predecessors_</code> 的就是根节点。其中的 <code>TopoSortDFSRecursive</code> 递归方法是<a href="https://leetcode.cn/problems/vEAB3K/solutions/1412180/er-fen-tu-by-leetcode-solution-dryu/">图染色算法</a>,也是经典数据结构和 <code>leetcode</code> 题,将白色的节点染成黑色,如果染色器件重复对灰色点染色,就说明计算图存在环路,报错退出。将拓扑排序后的节点放到 <code>topo_sort_node</code> 中。</p><h4 id="ParallelTaskExecutor-初始化"><a href="#ParallelTaskExecutor-初始化" class="headerlink" title="ParallelTaskExecutor 初始化"></a>ParallelTaskExecutor 初始化</h4><p>和 <code>kParallelTypeSequential</code> 相比,<code>DFS</code> 算法换成了 <code>BFS</code> 算法,这是因为 <a href="https://stackoverflow.com/questions/3332947/what-are-the-practical-factors-to-consider-when-choosing-between-depth-first-sea"><code>DFS</code> 和 <code>BFS</code></a> 算法得到的拓扑排序不同,后者适用于并行的情况,也就是由两个节点可以并行执行。将节点全部置回了白色,因为后面 <code>run</code> 的时候用来判断节点是否执行过,如果执行过,设置为黑色。</p><p>而且多了线程池的初始化,至于线程池,这个东西感觉没啥好讲的。网上很多线程池的代码,如果有兴趣,看看条件变量、互斥锁的用法,最多一天差不多能看完,我之前写过 C 版本的线程池,所以这里不在展开讲了。把他理解为一个任务执行器,可以同时执行很多任务并返回就可以了。</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br></pre></td><td class="code"><pre><span class="line"><span class="function">base::Status <span 
class="title">ParallelTaskExecutor::init</span><span class="params">(</span></span></span><br><span class="line"><span class="params"><span class="function"> std::vector<EdgeWrapper*>& edge_repository,</span></span></span><br><span class="line"><span class="params"><span class="function"> std::vector<NodeWrapper*>& node_repository)</span> </span>{</span><br><span class="line"> <span class="comment">// <span class="doctag">TODO:</span></span></span><br><span class="line"> <span class="comment">// compute the graph's maximum parallelism to decide the number of threads</span></span><br><span class="line"> thread_pool_ = <span class="keyword">new</span> thread_pool::<span class="built_in">ThreadPool</span>();</span><br><span class="line"> thread_pool_-><span class="built_in">init</span>();</span><br><span class="line"> start_nodes_ = <span class="built_in">findStartNodes</span>(node_repository);</span><br><span class="line"> base::Status status = <span class="built_in">topoSortBFS</span>(node_repository, topo_sort_node_);</span><br><span class="line"> all_task_count_ = topo_sort_node_.<span class="built_in">size</span>();</span><br><span class="line"> <span class="keyword">if</span> (start_nodes_.<span class="built_in">empty</span>()) {</span><br><span class="line"> <span class="built_in">NNDEPLOY_LOGE</span>(<span class="string">"No start node found in graph"</span>);</span><br><span class="line"> <span class="keyword">return</span> base::kStatusCodeErrorInvalidValue;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="keyword">for</span> (<span class="keyword">auto</span> iter : topo_sort_node_) {</span><br><span class="line"> iter->color_ = base::kNodeColorWhite;</span><br><span class="line"> iter->node_-><span class="built_in">setInitializedFlag</span>(<span class="literal">false</span>);</span><br><span class="line"> status = iter->node_-><span class="built_in">init</span>();</span><br><span class="line"> <span class="built_in">NNDEPLOY_RETURN_ON_NEQ</span>(status, 
base::kStatusCodeOk, <span class="string">"node init failure"</span>);</span><br><span class="line"> iter->node_-><span class="built_in">setInitializedFlag</span>(<span class="literal">true</span>);</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> edge_repository_ = edge_repository;</span><br><span class="line"> <span class="keyword">return</span> status;</span><br><span class="line">}</span><br></pre></td></tr></table></figure><h4 id="ParallelPipelineExecutor-初始化"><a href="#ParallelPipelineExecutor-初始化" class="headerlink" title="ParallelPipelineExecutor 初始化"></a>ParallelPipelineExecutor Initialization</h4><p>Besides initializing the thread pool, this executor also calls <code>this->commitThreadPool();</code>, which submits the tasks and starts executing them right away. The biggest difference from the sequential and task-parallel executors is that this executor does its work inside <code>init()</code> and does not have a <code>run</code> method, so I will come back to it later when discussing how the executors run.</p><p>The most important call inside <code>commitThreadPool</code> is <code>updataInput</code>, which calls the <code>update</code> method of the underlying edge. Because the pipeline-parallel executor can only use <code>PipelineEdge</code>, that class's <code>update</code> method is the one to look at.</p><h3 id="推理引擎初始化"><a href="#推理引擎初始化" class="headerlink" title="推理引擎初始化"></a>Inference Engine Initialization</h3><p>The inference engine is also wrapped in a <code>Node</code>, so it is initialized when the executor initializes its <code>Node</code>s. This particular <code>Node</code>'s initialization is comparatively important, so it deserves a closer look.</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line"><span class="function">base::Status <span class="title">Infer::init</span><span class="params">()</span> </span>{</span><br><span class="line"> base::Status status = base::kStatusCodeOk;</span><br><span class="line"> status = inference_-><span class="built_in">init</span>();</span><br><span class="line"> <span 
class="built_in">NNDEPLOY_RETURN_ON_NEQ</span>(status, base::kStatusCodeOk,</span><br><span class="line"> <span class="string">"abstract_inference init failed"</span>);</span><br><span class="line"> is_input_dynamic_ = inference_-><span class="built_in">isInputDynamic</span>();</span><br><span class="line"> is_output_dynamic_ = inference_-><span class="built_in">isOutputDynamic</span>();</span><br><span class="line"> can_op_input_ = inference_-><span class="built_in">canOpInput</span>();</span><br><span class="line"> can_op_output_ = inference_-><span class="built_in">canOpOutput</span>();</span><br><span class="line"> <span class="keyword">return</span> status;</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>It first initializes <code>inference_</code>, that is, it calls the <code>TensorRtInference::init()</code> method. The first chunk of that method appears to initialize the model (I have never used any inference engine, so I can only guess what the code means):</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br></pre></td><td class="code"><pre><span class="line">TensorRtInferenceParam *tensorrt_inference_param =</span><br><span class="line"> <span class="built_in">dynamic_cast</span><TensorRtInferenceParam *>(inference_param_);</span><br><span class="line"><span class="keyword">if</span> (tensorrt_inference_param->is_path_) {</span><br><span class="line"> model_buffer = 
base::<span class="built_in">openFile</span>(tensorrt_inference_param->model_value_[<span class="number">0</span>]);</span><br><span class="line">} <span class="keyword">else</span> {</span><br><span class="line"> model_buffer = tensorrt_inference_param->model_value_[<span class="number">0</span>];</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="keyword">if</span> (tensorrt_inference_param->model_type_ == base::kModelTypeOnnx) {</span><br><span class="line"> status = <span class="built_in">initWithOnnxModel</span>(model_buffer, tensorrt_inference_param);</span><br><span class="line"> <span class="built_in">NNDEPLOY_RETURN_ON_NEQ</span>(status, base::kStatusCodeOk,</span><br><span class="line"> <span class="string">"initWithOnnxModel failed"</span>);</span><br><span class="line">} <span class="keyword">else</span> <span class="keyword">if</span> (tensorrt_inference_param->model_type_ ==</span><br><span class="line"> base::kModelTypeTensorRt) {</span><br><span class="line"> status = <span class="built_in">initWithTensorRtModel</span>(model_buffer, tensorrt_inference_param);</span><br><span class="line"> <span class="built_in">NNDEPLOY_RETURN_ON_NEQ</span>(status, base::kStatusCodeOk,</span><br><span class="line"> <span class="string">"initWithTensorRtModel failed"</span>);</span><br><span class="line">} <span class="keyword">else</span> {</span><br><span class="line"> <span class="built_in">NNDEPLOY_LOGE</span>(<span class="string">"not support this model type(%d)!\n"</span>,</span><br><span class="line"> tensorrt_inference_param->model_type_);</span><br><span class="line"> <span class="keyword">return</span> base::kStatusCodeErrorInferenceTensorRt;</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>Next, it walks the bindings to find out how many bindings the model has for inputs, outputs, and intermediate buffers, so that memory can be allocated accurately. The two maps built here allow looking up an index by name and a name by index.</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span 
class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">for</span> (<span class="keyword">auto</span> i = <span class="number">0</span>; i < <span class="built_in">getNbBindings</span>(); ++i) {</span><br><span class="line"> std::string name = std::<span class="built_in">string</span>(<span class="built_in">getBindingName</span>(i));</span><br><span class="line"> io_name_index_[name] = i;</span><br><span class="line"> io_index_name_[i] = name;</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>Get the name and <code>shape</code> of each model input:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">for</span> (<span class="keyword">auto</span> i = <span class="number">0</span>; i < <span class="built_in">getNbBindings</span>(); ++i) {</span><br><span class="line"> <span class="keyword">if</span> (<span class="built_in">bindingIsInput</span>(i)) {</span><br><span class="line"> std::string name = std::<span class="built_in">string</span>(<span class="built_in">getBindingName</span>(i));</span><br><span class="line"> <span class="keyword">auto</span> shape = TensorRtConvert::<span class="built_in">convertToShape</span>(<span class="built_in">getBindingDimensions</span>(i));</span><br><span class="line"> current_shape.<span class="built_in">insert</span>({name, shape});</span><br><span class="line"> }</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>For the code below, I believe <code>max_shape_</code> is empty, because I could not find where it is populated, so the loop body would never run:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span 
class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">for</span> (<span class="keyword">auto</span> iter : tensorrt_inference_param->max_shape_) {</span><br><span class="line"> <span class="keyword">auto</span> tmp = current_shape.<span class="built_in">find</span>(iter.first);</span><br><span class="line"> <span class="keyword">if</span> (tmp != current_shape.<span class="built_in">end</span>()) {</span><br><span class="line"> <span class="keyword">auto</span> &shape = current_shape[iter.first];</span><br><span class="line"> <span class="keyword">if</span> (base::<span class="built_in">shapeEqual</span>(iter.second, shape)) {</span><br><span class="line"> <span class="keyword">continue</span>;</span><br><span class="line"> } <span class="keyword">else</span> {</span><br><span class="line"> <span class="type">int</span> idx = io_name_index_[iter.first];</span><br><span class="line"> nvinfer1::Dims dims = TensorRtConvert::<span class="built_in">convertFromShape</span>(iter.second);</span><br><span class="line"> <span class="built_in">setBindingDimensions</span>(idx, dims);</span><br><span class="line"> }</span><br><span class="line"> } <span class="keyword">else</span> {</span><br><span class="line"> <span class="built_in">NNDEPLOY_LOGE</span>(<span class="string">"reshape failed, not found input tensor(%s)!\n"</span>,</span><br><span class="line"> iter.first.<span class="built_in">c_str</span>());</span><br><span class="line"> 
<span class="keyword">return</span> base::kStatusCodeErrorInferenceTensorRt;</span><br><span class="line"> }</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>Next, it gets the device:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">device::Device *device = device::<span class="built_in">getDevice</span>(inference_param_->device_type_);</span><br></pre></td></tr></table></figure><p>Getting the device registers a <code>CudaArchitecture</code>, which is a class that manages <code>cuda</code> devices. The code that follows is:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br></pre></td><td class="code"><pre><span class="line"><span 
class="keyword">for</span> (<span class="keyword">auto</span> i = <span class="number">0</span>; i < num_binds; ++i) {</span><br><span class="line"> std::string name = std::<span class="built_in">string</span>(<span class="built_in">getBindingName</span>(i));</span><br><span class="line"> base::IntVector shape =</span><br><span class="line"> TensorRtConvert::<span class="built_in">convertToShape</span>(<span class="built_in">getBindingDimensions</span>(i));</span><br><span class="line"> base::DataType data_type =</span><br><span class="line"> TensorRtConvert::<span class="built_in">convertToDataType</span>(<span class="built_in">getBindingDataType</span>(i));</span><br><span class="line"> base::DataFormat data_format =</span><br><span class="line"> TensorRtConvert::<span class="built_in">convertToDataFormat</span>(<span class="built_in">getBindingFormat</span>(i));</span><br><span class="line"></span><br><span class="line"> <span class="keyword">if</span> (<span class="built_in">bindingIsInput</span>(i)) {</span><br><span class="line"> device::TensorDesc desc;</span><br><span class="line"> desc.data_type_ = data_type;</span><br><span class="line"> desc.data_format_ = data_format;</span><br><span class="line"> desc.shape_ = shape;</span><br><span class="line"> device::Tensor *max_input_tensor = <span class="keyword">new</span> device::<span class="built_in">Tensor</span>(device, desc, name);</span><br><span class="line"> max_input_tensors_.<span class="built_in">insert</span>({name, max_input_tensor});</span><br><span class="line"></span><br><span class="line"> device::Buffer *max_input_buffer = max_input_tensor-><span class="built_in">getBuffer</span>();</span><br><span class="line"> device::Tensor *current_input_tensor =</span><br><span class="line"> <span class="keyword">new</span> device::<span class="built_in">Tensor</span>(desc, max_input_buffer, name);</span><br><span class="line"> input_tensors_.<span class="built_in">insert</span>({name, 
current_input_tensor});</span><br><span class="line"></span><br><span class="line"> <span class="comment">// bindings_[i] = max_input_buffer->getData();</span></span><br><span class="line"> } <span class="keyword">else</span> {</span><br><span class="line"> device::TensorDesc desc;</span><br><span class="line"> desc.data_type_ = data_type;</span><br><span class="line"> desc.data_format_ = data_format;</span><br><span class="line"> desc.shape_ = shape;</span><br><span class="line"> device::Tensor *max_output_tensor =</span><br><span class="line"> <span class="keyword">new</span> device::<span class="built_in">Tensor</span>(device, desc, name);</span><br><span class="line"> max_output_tensors_.<span class="built_in">insert</span>({name, max_output_tensor});</span><br><span class="line"></span><br><span class="line"> device::Buffer *max_output_buffer = max_output_tensor-><span class="built_in">getBuffer</span>();</span><br><span class="line"> device::Tensor *current_output_tensor =</span><br><span class="line"> <span class="keyword">new</span> device::<span class="built_in">Tensor</span>(desc, max_output_buffer, name);</span><br><span class="line"> output_tensors_.<span class="built_in">insert</span>({name, current_output_tensor});</span><br><span class="line"></span><br><span class="line"> <span class="comment">// bindings_[i] = max_output_buffer->getData();</span></span><br><span class="line"> }</span><br><span class="line">}</span><br></pre></td></tr></table></figure><ol><li>Get the binding's <code>shape</code>, data type, and data format.</li><li>If the binding is an input, create the corresponding <code>tensor</code> and store it in both <code>input_tensors_</code> and <code>max_input_tensors_</code>. Why it is stored twice is unclear to me.</li><li>If the binding is not an input, store it in <code>max_output_tensors_</code> and <code>output_tensors_</code>.</li></ol><p>In addition, <code>TensorDesc</code> describes the data in memory, while a <code>tensor</code> uses <code>buffer_</code> to manage the allocated <code>Buffer</code>. To see how the memory is actually allocated, note that <code>new device::Tensor(device, desc, name)</code> calls:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span 
class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line">Tensor::<span class="built_in">Tensor</span>(Device *device, <span class="type">const</span> TensorDesc &desc, <span class="type">const</span> std::string &name,</span><br><span class="line"> <span class="type">const</span> base::IntVector &config)</span><br><span class="line"> : <span class="built_in">name_</span>(name), <span class="built_in">desc_</span>(desc), <span class="built_in">is_external_</span>(<span class="literal">false</span>) {</span><br><span class="line"> BufferDesc buffer_desc = device-><span class="built_in">toBufferDesc</span>(desc, config);</span><br><span class="line"> <span class="type">void</span> *ptr = device-><span class="built_in">allocate</span>(buffer_desc);</span><br><span class="line"> buffer_ = <span class="keyword">new</span> <span class="built_in">Buffer</span>(device, buffer_desc, ptr, base::kMemoryTypeAllocate);</span><br><span class="line"> ref_count_ = <span class="keyword">new</span> <span class="built_in">int</span>(<span class="number">1</span>);</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p><code>device->toBufferDesc(desc, config)</code> computes the required memory size, and <code>device->allocate(buffer_desc);</code> then calls <code>cudaMalloc</code> to allocate that much memory:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span 
class="line">14</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="type">void</span> *<span class="title">CudaDevice::allocate</span><span class="params">(<span class="type">const</span> BufferDesc &desc)</span> </span>{</span><br><span class="line"> <span class="type">void</span> *data = <span class="literal">nullptr</span>;</span><br><span class="line"> cudaError_t status = <span class="built_in">cudaMalloc</span>(&data, desc.size_[<span class="number">0</span>]);</span><br><span class="line"> <span class="keyword">if</span> (cudaSuccess != status) {</span><br><span class="line"> <span class="built_in">NNDEPLOY_LOGE</span>(<span class="string">"cuda alloc failed with size %lu for %p, status:%d\n"</span>,</span><br><span class="line"> desc.size_[<span class="number">0</span>], data, status);</span><br><span class="line"> <span class="keyword">return</span> <span class="literal">nullptr</span>;</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">if</span> (data == <span class="literal">nullptr</span>) {</span><br><span class="line"> <span class="built_in">NNDEPLOY_LOGE</span>(<span class="string">"cuda alloc got nullptr\n"</span>);</span><br><span class="line"> <span class="keyword">return</span> <span class="literal">nullptr</span>;</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">return</span> data;</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>Finally, <code>new Buffer</code> wraps the allocated memory in the <code>Buffer</code> class, which manages it from then on.</p><p>Summary: the engine's initialization gets the name, shape, and data type of every input and output, and creates the corresponding memory <code>buffer</code>.</p><h2 id="计算图执行"><a href="#计算图执行" class="headerlink" title="计算图执行"></a>Computation Graph Execution</h2><p>This corresponds to <code>graph->run()</code> in the code, which concretely calls the executor's <code>run</code> method: <code>status = executor_->run();</code>.</p><h3 id="kParallelTypeSequential-执行"><a href="#kParallelTypeSequential-执行" class="headerlink" title="kParallelTypeSequential 
执行"></a>kParallelTypeSequential Execution</h3><p>Fortunately, this is the simplest one:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br></pre></td><td class="code"><pre><span class="line"><span class="function">base::Status <span class="title">SequentialExecutor::run</span><span class="params">()</span> </span>{</span><br><span class="line"> base::Status status = base::kStatusCodeOk;</span><br><span class="line"> <span class="keyword">for</span> (<span class="keyword">auto</span> iter : topo_sort_node_) {</span><br><span class="line"> base::EdgeUpdateFlag edge_update_flag = iter->node_-><span class="built_in">updataInput</span>();</span><br><span class="line"> <span class="keyword">if</span> (edge_update_flag == base::kEdgeUpdateFlagComplete) {</span><br><span class="line"> iter->node_-><span class="built_in">setRunningFlag</span>(<span class="literal">true</span>);</span><br><span class="line"> status = iter->node_-><span class="built_in">run</span>();</span><br><span class="line"> <span class="built_in">NNDEPLOY_RETURN_ON_NEQ</span>(status, base::kStatusCodeOk,</span><br><span class="line"> <span class="string">"node execute failed!\n"</span>);</span><br><span class="line"> iter->node_-><span class="built_in">setRunningFlag</span>(<span class="literal">false</span>);</span><br><span class="line"> } <span class="keyword">else</span> <span class="keyword">if</span> (edge_update_flag == 
base::kEdgeUpdateFlagTerminate) {</span><br><span class="line"> ;</span><br><span class="line"> } <span class="keyword">else</span> {</span><br><span class="line"> <span class="built_in">NNDEPLOY_LOGE</span>(<span class="string">"Failed to node[%s] updataInput();\n"</span>,</span><br><span class="line"> iter->node_-><span class="built_in">getName</span>().<span class="built_in">c_str</span>());</span><br><span class="line"> <span class="keyword">return</span> base::kStatusCodeErrorDag;</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">return</span> status;</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p><code>iter->node_->updataInput();</code> calls <code>abstact_edge_->update(node);</code>. Because the computation graph has not finished executing at this point, <code>FixedEdge</code> reports <code>kEdgeUpdateFlagComplete</code> for the node. After that, the node's <code>run</code> method is called. For the <code>infer</code> node, if the inference engine is <code>tensorrt</code>, this calls the <code>TensorRtInference::run()</code> method.</p><h4 id="实例说明,目标检测的节点运行与数据"><a href="#实例说明,目标检测的节点运行与数据" class="headerlink" title="实例说明,目标检测的节点运行与数据"></a>Worked Example: Node Execution and Data in Object Detection</h4><h5 id="解码节点运行"><a href="#解码节点运行" class="headerlink" title="解码节点运行"></a>Decode Node Execution</h5><p>Take the object-detection computation graph as an example. It has 3 nodes: <code>CvtColorResize</code>, <code>Infer</code>, and <code>YoloPostProcess</code>. Remember that there is also a decode node in front of the <code>CvtColorResize</code> node? For object detection on a single image, the decode node reads the image, creates a <code>mat</code>, and puts the <code>mat</code> on its output edge, which completes the data hand-off:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="function">base::Status <span class="title">OpenCvImageDecodeNode::run</span><span class="params">()</span> </span>{</span><br><span class="line"> cv::Mat 
*mat = <span class="keyword">new</span> cv::<span class="built_in">Mat</span>(cv::<span class="built_in">imread</span>(path_));</span><br><span class="line"> width_ = mat->cols;</span><br><span class="line"> height_ = mat->rows;</span><br><span class="line"> outputs_[<span class="number">0</span>]-><span class="built_in">set</span>(mat, index_, <span class="literal">false</span>);</span><br><span class="line"> index_++;</span><br><span class="line"> <span class="keyword">return</span> base::kStatusCodeOk;</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>I don't know why it has to be <code>outputs_[0]</code>; what if there are multiple output edges? The <code>set</code> method used here corresponds to:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line"><span class="function">base::Status <span class="title">DataPacket::set</span><span class="params">(<span class="type">void</span> *anything, <span class="type">int</span> index, <span class="type">bool</span> is_external)</span> </span>{</span><br><span class="line"> base::Status status = base::kStatusCodeOk;</span><br><span class="line"> <span class="keyword">if</span> (anything != anything_) {</span><br><span class="line"> <span class="built_in">destory</span>();</span><br><span class="line"> }</span><br><span class="line"> is_external_ = is_external;</span><br><span class="line"> index_ = index;</span><br><span class="line"> flag_ = kFlagVoid;</span><br><span class="line"> written_ = <span class="literal">true</span>;</span><br><span class="line"> anything_ = anything;</span><br><span class="line"> <span 
class="keyword">return</span> status;</span><br><span class="line">}</span><br></pre></td></tr></table></figure><h5 id="CvtColorResize-运行"><a href="#CvtColorResize-运行" class="headerlink" title="CvtColorResize 运行"></a>CvtColorResize Execution</h5><p>Because the input of <code>CvtColorResize</code> is the output of the decode node, the decode node's output can be fetched directly inside the <code>run</code> method of <code>CvtColorResize</code>:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">cv::Mat *src = inputs_[<span class="number">0</span>]-><span class="built_in">getCvMat</span>(<span class="keyword">this</span>);</span><br></pre></td></tr></table></figure><p>It then gets the <code>host</code>-side device:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">device::Device *device = device::<span class="built_in">getDefaultHostDevice</span>();</span><br></pre></td></tr></table></figure><p>From the input parameters it builds the descriptor <code>desc</code> that describes the <code>tensor</code>:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line">device::TensorDesc desc;</span><br><span class="line">desc.data_type_ = tmp_param->data_type_;</span><br><span class="line">desc.data_format_ = tmp_param->data_format_;</span><br><span class="line"><span class="keyword">if</span> (desc.data_format_ == base::kDataFormatNCHW) {</span><br><span class="line"> desc.shape_ = {<span class="number">1</span>, <span class="built_in">getChannelByPixelType</span>(tmp_param->dst_pixel_type_),</span><br><span class="line"> tmp_param->h_, tmp_param->w_};</span><br><span 
class="line">} <span class="keyword">else</span> {</span><br><span class="line"> desc.shape_ = {<span class="number">1</span>, tmp_param->h_, tmp_param->w_,</span><br><span class="line"> <span class="built_in">getChannelByPixelType</span>(tmp_param->dst_pixel_type_)};</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>Using that memory description, it creates <code>dst</code>, which is the output on this edge:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">device::Tensor *dst =</span><br><span class="line"> outputs_[<span class="number">0</span>]-><span class="built_in">create</span>(device, desc, inputs_[<span class="number">0</span>]-><span class="built_in">getIndex</span>(<span class="keyword">this</span>));</span><br></pre></td></tr></table></figure><p>Here <code>getIndex</code> fetches the index recorded on the input edge for the current node. The decode node runs <code>index++</code> only after it has added the data, so judging from the <code>set(mat, index_, false)</code> call shown earlier, the first image appears to be stored with the pre-increment <code>index</code> (0 rather than 1). The <code>create</code> method creates this node's output <code>tensor</code>; take a look:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br></pre></td><td class="code"><pre><span class="line"><span 
class="function">device::Tensor *<span class="title">DataPacket::create</span><span class="params">(device::Device *device,</span></span></span><br><span class="line"><span class="params"><span class="function"> <span class="type">const</span> device::TensorDesc &desc, <span class="type">int</span> index,</span></span></span><br><span class="line"><span class="params"><span class="function"> <span class="type">const</span> std::string &name)</span> </span>{</span><br><span class="line"> base::Status status = base::kStatusCodeOk;</span><br><span class="line"> device::Tensor *tensor = <span class="literal">nullptr</span>;</span><br><span class="line"> <span class="keyword">if</span> (anything_ == <span class="literal">nullptr</span>) {</span><br><span class="line"> tensor = <span class="keyword">new</span> device::<span class="built_in">Tensor</span>(device, desc, name);</span><br><span class="line"> } <span class="keyword">else</span> {</span><br><span class="line"> <span class="keyword">if</span> (flag_ != kFlagTensor) {</span><br><span class="line"> <span class="built_in">destory</span>();</span><br><span class="line"> tensor = <span class="keyword">new</span> device::<span class="built_in">Tensor</span>(device, desc, name);</span><br><span class="line"> } <span class="keyword">else</span> {</span><br><span class="line"> tensor = (device::Tensor *)(anything_);</span><br><span class="line"> <span class="keyword">if</span> (tensor-><span class="built_in">getDesc</span>() != desc) {</span><br><span class="line"> <span class="built_in">destory</span>();</span><br><span class="line"> tensor = <span class="keyword">new</span> device::<span class="built_in">Tensor</span>(device, desc, name);</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> is_external_ = <span class="literal">false</span>;</span><br><span class="line"> index_ = index;</span><br><span class="line"> flag_ = 
kFlagTensor;</span><br><span class="line"> written_ = <span class="literal">false</span>;</span><br><span class="line"> anything_ = (<span class="type">void</span> *)(tensor);</span><br><span class="line"> <span class="keyword">return</span> tensor;</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p><code>anything_</code> points to the actual data. The input then goes through color-space conversion and a <code>resize</code>. This looks like standard digital image processing: for example, converting a <code>BGR</code> image to <code>RGB</code> and then <code>resize</code>-ing it to a fixed size.</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br></pre></td><td class="code"><pre><span class="line">cv::Mat tmp_cvt;</span><br><span class="line"><span class="keyword">if</span> (tmp_param->src_pixel_type_ != tmp_param->dst_pixel_type_) {</span><br><span class="line"> base::CvtColorType cvt_type = base::<span class="built_in">calCvtColorType</span>(</span><br><span class="line"> tmp_param->src_pixel_type_, tmp_param->dst_pixel_type_);</span><br><span class="line"> <span class="keyword">if</span> (cvt_type == base::kCvtColorTypeNotSupport) {</span><br><span class="line"> <span class="built_in">NNDEPLOY_LOGE</span>(<span class="string">"cvtColor type not support"</span>);</span><br><span 
class="line"> <span class="keyword">return</span> base::kStatusCodeErrorNotSupport;</span><br><span class="line"> }</span><br><span class="line"> <span class="type">int</span> cv_cvt_type = OpenCvConvert::<span class="built_in">convertFromCvtColorType</span>(cvt_type);</span><br><span class="line"> cv::<span class="built_in">cvtColor</span>(*src, tmp_cvt, cv_cvt_type);</span><br><span class="line">} <span class="keyword">else</span> {</span><br><span class="line"> tmp_cvt = *src;</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line">cv::Mat tmp_resize;</span><br><span class="line"><span class="keyword">if</span> (tmp_param->interp_type_ != base::kInterpTypeNotSupport) {</span><br><span class="line"> <span class="type">int</span> interp_type =</span><br><span class="line"> OpenCvConvert::<span class="built_in">convertFromInterpType</span>(tmp_param->interp_type_);</span><br><span class="line"> cv::<span class="built_in">resize</span>(tmp_cvt, tmp_resize, cv::<span class="built_in">Size</span>(w, h), <span class="number">0.0</span>, <span class="number">0.0</span>, interp_type);</span><br><span class="line">} <span class="keyword">else</span> {</span><br><span class="line"> tmp_resize = tmp_cvt;</span><br><span class="line">}</span><br><span class="line">OpenCvConvert::<span class="built_in">convertToTensor</span>(tmp_resize, dst, tmp_param->normalize_,</span><br><span class="line"> tmp_param->scale_, tmp_param->mean_,</span><br><span class="line"> tmp_param->std_);</span><br></pre></td></tr></table></figure><p>Finally, <code>outputs_[0]->notifyWritten(dst);</code> signals that the data in <code>dst</code> is ready.</p><h5 id="infer-节点运行"><a href="#infer-节点运行" class="headerlink" title="infer 节点运行"></a>Running the infer node</h5><p>First, collect the <code>tensor</code> and <code>index</code> of every input:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span 
class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">for</span> (<span class="keyword">auto</span> input : inputs_) {</span><br><span class="line"> device::Tensor *tensor = input-><span class="built_in">getTensor</span>(<span class="keyword">this</span>);</span><br><span class="line"> tensors.<span class="built_in">emplace_back</span>(tensor);</span><br><span class="line"> <span class="type">int</span> index = input-><span class="built_in">getIndex</span>(<span class="keyword">this</span>);</span><br><span class="line"> indexs.<span class="built_in">emplace_back</span>(index);</span><br><span class="line">}</span><br><span class="line"><span class="type">int</span> index = indexs[<span class="number">0</span>];</span><br><span class="line"><span class="keyword">for</span> (<span class="type">int</span> i = <span class="number">1</span>; i < indexs.<span class="built_in">size</span>(); i++) {</span><br><span class="line"> <span class="keyword">if</span> (index != indexs[i]) {</span><br><span class="line"> <span class="built_in">NNDEPLOY_LOGE</span>(<span class="string">"index not equal"</span>);</span><br><span class="line"> <span class="keyword">return</span> base::kStatusCodeErrorInvalidValue;</span><br><span class="line"> }</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p><code>getTensor</code> simply returns the <code>anything_</code> pointer held by the <code>data_packet</code>:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span 
class="line">7</span><br></pre></td><td class="code"><pre><span class="line"><span class="function">device::Tensor *<span class="title">DataPacket::getTensor</span><span class="params">()</span> </span>{</span><br><span class="line"> <span class="keyword">if</span> (flag_ != kFlagTensor) {</span><br><span class="line"> <span class="keyword">return</span> <span class="literal">nullptr</span>;</span><br><span class="line"> } <span class="keyword">else</span> {</span><br><span class="line"> <span class="built_in">return</span> (device::Tensor *)(anything_);</span><br><span class="line"> }</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p><code>getIndex</code> likewise returns the <code>index</code> of the data packet. If the input edges carry different <code>index</code> values, the node has received mismatched inputs and must report an error and exit. Next, the inputs are set on the inference engine:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">for</span> (<span class="keyword">auto</span> tensor : tensors) {</span><br><span class="line"> inference_-><span class="built_in">setInputTensor</span>(tensor-><span class="built_in">getName</span>(), 
tensor);</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="function">base::Status <span class="title">Inference::setInputTensor</span><span class="params">(<span class="type">const</span> std::string &name,</span></span></span><br><span class="line"><span class="params"><span class="function"> device::Tensor *input_tensor)</span> </span>{</span><br><span class="line"> base::Status status = base::kStatusCodeOk;</span><br><span class="line"></span><br><span class="line"> std::string new_name = <span class="string">""</span>;</span><br><span class="line"> <span class="keyword">if</span> (!name.<span class="built_in">empty</span>()) {</span><br><span class="line"> new_name = name;</span><br><span class="line"> } <span class="keyword">else</span> <span class="keyword">if</span> (!input_tensor-><span class="built_in">getName</span>().<span class="built_in">empty</span>()) {</span><br><span class="line"> new_name = input_tensor-><span class="built_in">getName</span>();</span><br><span class="line"> } <span class="keyword">else</span> {</span><br><span class="line"> new_name = <span class="built_in">getInputName</span>(<span class="number">0</span>);</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="keyword">if</span> (input_tensors_.<span class="built_in">count</span>(new_name) > <span class="number">0</span>) {</span><br><span class="line"> <span class="keyword">if</span> (input_tensor != input_tensors_[new_name]) {</span><br><span class="line"> external_input_tensors_[new_name] = input_tensor;</span><br><span class="line"> }</span><br><span class="line"> } <span class="keyword">else</span> {</span><br><span class="line"> <span class="built_in">NNDEPLOY_LOGI</span>(<span class="string">"input_tensor name: %s not exist!\n"</span>, new_name.<span class="built_in">c_str</span>());</span><br><span class="line"> }</span><br><span class="line"></span><br><span 
class="line"> <span class="keyword">return</span> status;</span><br><span class="line"> }</span><br></pre></td></tr></table></figure><p>As the code shows, when the name is found in <code>input_tensors_</code> (specified at initialization) but the <code>tensor</code> passed in is not the one stored there, it is treated as an external input; an unknown name is only logged. Once the data is ready, the inference engine runs:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">status = inference_-><span class="built_in">run</span>();</span><br></pre></td></tr></table></figure><p>Finally, the inference engine's outputs are set on the output edges:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">for</span> (<span class="keyword">auto</span> output : outputs_) {</span><br><span class="line"> std::string name = output-><span class="built_in">getName</span>();</span><br><span class="line"> base::ParallelType parallel_type = output-><span class="built_in">getParallelType</span>();</span><br><span class="line"> <span class="type">bool</span> flag = parallel_type == base::kParallelTypePipeline;</span><br><span class="line"> device::Tensor *tensor =</span><br><span class="line"> inference_-><span class="built_in">getOutputTensorAfterRun</span>(name, device_type_, flag);</span><br><span class="line"> <span class="keyword">if</span> (tensor == <span class="literal">nullptr</span>) {</span><br><span class="line"> <span class="built_in">NNDEPLOY_LOGE</span>(<span class="string">"can't getOutputTensorAfterRun[%s].\n"</span>, name.<span class="built_in">c_str</span>());</span><br><span class="line"> 
status = base::kStatusCodeErrorInvalidParam;</span><br><span class="line"> <span class="keyword">break</span>;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> output-><span class="built_in">set</span>(tensor, index, <span class="literal">false</span>);</span><br><span class="line">}</span><br></pre></td></tr></table></figure><h5 id="post-节点运行"><a href="#post-节点运行" class="headerlink" title="post 节点运行"></a>Running the post node</h5><p>The <code>run</code> method of the <code>YoloPostProcess</code> node is fairly simple: it applies <code>nms</code> and threshold filtering to the output <code>tensor</code>, so it is not covered in more detail here.</p><h3 id="ParallelTaskExecutor-执行"><a href="#ParallelTaskExecutor-执行" class="headerlink" title="ParallelTaskExecutor 执行"></a>ParallelTaskExecutor execution</h3><p>Starting from all root nodes in <code>start_nodes_</code>, the computation graph is executed in parallel.</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">for</span> (<span class="keyword">auto</span> iter : start_nodes_) {</span><br><span class="line"> <span class="built_in">process</span>(iter);</span><br><span class="line">}</span><br><span class="line"><span class="built_in">wait</span>();</span><br></pre></td></tr></table></figure><p>The <code>wait</code> method uses a condition variable to wait until the number of completed nodes is greater than or equal to the total number of tasks. Since several threads update that counter, it is an instance of the <a href="https://en.cppreference.com/w/cpp/atomic/atomic">atomic</a> class template: <code>std::atomic<int></code>.</p><p>Besides running the node, <code>process(iter)</code> also calls <code>afterNodeRun(node_wrapper)</code>, which deserves a closer look:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span 
class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br></pre></td><td class="code"><pre><span class="line">completed_task_count_++;</span><br><span class="line">node_wrapper->color_ = base::kNodeColorBlack;</span><br><span class="line"><span class="keyword">for</span> (<span class="keyword">auto</span> successor : node_wrapper->successors_) {</span><br><span class="line"> <span class="type">bool</span> all_pre_done = <span class="literal">true</span>;</span><br><span class="line"> <span class="keyword">for</span> (<span class="keyword">auto</span> iter : successor->predecessors_) {</span><br><span class="line"> all_pre_done &= (iter->color_ == base::kNodeColorBlack);</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">if</span> (all_pre_done && successor->color_ == base::kNodeColorWhite) {</span><br><span class="line"> <span class="keyword">if</span> (successor->predecessors_.<span class="built_in">size</span>() <= <span class="number">1</span>) {</span><br><span class="line"> <span class="built_in">process</span>(successor);</span><br><span class="line"> } <span class="keyword">else</span> {</span><br><span class="line"> <span class="built_in">submitTaskSynchronized</span>(successor);</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>First the completed-task count is incremented, then the finished node is colored black. The method then iterates over the node's successors in <code>successors_</code>; for each successor, <code>all_pre_done</code> is <code>true</code> only if all of its predecessors have finished. In that case:</p><ul><li>If the successor has only one predecessor, <code>process</code> is called on it directly.</li><li>If the successor has several predecessors, <code>process</code> is invoked under a lock, so that threads calling <code>process</code> concurrently do not race on the shared <code>node_wrapper</code>.</li></ul><p>Finally, after the tasks have been submitted, the waiting main thread is woken as follows:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span 
class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="function">std::lock_guard<std::mutex> <span class="title">lock</span><span class="params">(main_lock_)</span></span>;</span><br><span class="line"></span><br><span class="line"><span class="keyword">if</span> (completed_task_count_ >= all_task_count_) {</span><br><span class="line"> cv_.<span class="built_in">notify_one</span>();</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>The wake-up of the waiting main thread happens at <code>cv_.notify_one()</code>. At the end, the node colors are reset to white. A little extra <a href="https://muyuuuu.github.io/2021/02/19/process-synchronization/">background</a>:</p><blockquote><p>When using a condition variable, take the lock before checking the condition; the wait call releases it while the thread is blocked and reacquires it on wake-up. A woken thread must recheck whether the condition holds, because spurious wakeups can occur. Without the lock, a woken thread could carry on before another thread has finished modifying the condition, causing logic errors.</p><p>Spurious wakeups. Allowing them keeps the implementation efficient. Suppose 10 threads are waiting; when a single wake-up signal arrives, the operating system tries to wake all of them, which breaks the one-to-one relationship between signals and wake-ups. So only one thread can actually proceed, while the other nine keep waiting. To handle this case more flexibly, the operating system allows a waiting thread to wake up on its own regardless of whether the condition holds; this is called a spurious wakeup.</p></blockquote><h3 id="ParallelPipelineExecutor-执行"><a href="#ParallelPipelineExecutor-执行" class="headerlink" title="ParallelPipelineExecutor 执行"></a>ParallelPipelineExecutor execution</h3><p>To make the pipeline-parallel code easier to follow, I wrote a small simulation:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span 
class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="keyword">include</span> <span class="string"><iostream></span> <span class="comment">// std::cout</span></span></span><br><span class="line"><span class="meta">#<span class="keyword">include</span> <span class="string"><thread></span> <span class="comment">// std::thread</span></span></span><br><span class="line"><span class="meta">#<span class="keyword">include</span> <span class="string"><mutex></span> <span class="comment">// std::mutex, std::unique_lock</span></span></span><br><span class="line"><span class="meta">#<span class="keyword">include</span> <span class="string"><condition_variable></span> <span class="comment">// std::condition_variable</span></span></span><br><span class="line"></span><br><span class="line">std::mutex mtx;</span><br><span class="line">std::condition_variable cv;</span><br><span class="line"><span class="type">bool</span> ready = <span class="literal">false</span>;</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="type">void</span> <span class="title">run</span><span class="params">()</span> </span>{</span><br><span class="line"> <span class="function">std::unique_lock<std::mutex> <span class="title">lck</span><span class="params">(mtx)</span></span>;</span><br><span class="line"> ready = <span class="literal">true</span>;</span><br><span class="line"> cv.<span 
class="built_in">notify_all</span>();</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="type">void</span> <span class="title">func1</span><span class="params">(<span class="type">int</span> id)</span> </span>{</span><br><span class="line"> <span class="function">std::unique_lock<std::mutex> <span class="title">lock</span><span class="params">(mtx)</span></span>;</span><br><span class="line"> <span class="keyword">while</span> (!ready) {</span><br><span class="line"> cv.<span class="built_in">wait</span>(lock);</span><br><span class="line"> }</span><br><span class="line"> std::cerr << <span class="string">"Node Run : thread id = "</span> << id << <span class="string">"\n"</span>;</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="type">void</span> <span class="title">commit</span><span class="params">()</span> </span>{</span><br><span class="line"> std::thread threads[<span class="number">10</span>];</span><br><span class="line"> <span class="keyword">for</span> (<span class="type">int</span> i = <span class="number">0</span>; i < <span class="number">10</span>; i++) {</span><br><span class="line"> threads[i] = std::<span class="built_in">thread</span>(func1, i); <span class="comment">// simulates the update method</span></span><br><span class="line"> }</span><br><span class="line"> <span class="built_in">run</span>(); <span class="comment">// simulates node.run()</span></span><br><span class="line"> <span class="keyword">for</span> (<span class="keyword">auto</span>& th : threads) {</span><br><span class="line"> th.<span class="built_in">join</span>();</span><br><span class="line"> }</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="type">int</span> <span class="title">main</span><span class="params">()</span> </span>{</span><br><span class="line"> <span 
class="built_in">commit</span>(); <span class="comment">// simulates submitting tasks to the thread pool</span></span><br><span class="line"> <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">}</span><br></pre></td></tr></table></figure><blockquote><p>One thread suspends itself while waiting for the condition of a condition variable to hold; another thread makes that condition hold. To prevent races, the condition is checked under the protection of a mutex, and a thread must lock the mutex before changing the condition's state. If the condition is false, the thread blocks, enters the waiting state, and releases the mutex. When another thread changes the condition, it signals the associated condition variable, waking one or more waiting threads, which reacquire the mutex and re-evaluate the condition.</p></blockquote><p>With the code above understood, we can turn to the pipeline-parallel code. Unlike the previous two executors, <code>ParallelPipelineExecutor</code> submits all nodes to the thread pool during <code>init</code>, and they start executing right away:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="type">void</span> <span class="title">ParallelPipelineExecutor::commitThreadPool</span><span class="params">()</span> </span>{</span><br><span class="line"> <span class="comment">// NNDEPLOY_LOGE("ppe run Thread ID: %d.\n", std::this_thread::get_id());</span></span><br><span class="line"> <span class="keyword">for</span> (<span class="keyword">auto</span> iter : topo_sort_node_) {</span><br><span 
class="line"> <span class="keyword">auto</span> func = [iter]() -> base::Status {</span><br><span class="line"> base::Status status = base::kStatusCodeOk;</span><br><span class="line"> <span class="keyword">while</span> (<span class="literal">true</span>) {</span><br><span class="line"> base::EdgeUpdateFlag edge_update_flag = iter->node_-><span class="built_in">updataInput</span>();</span><br><span class="line"> <span class="keyword">if</span> (edge_update_flag == base::kEdgeUpdateFlagComplete) {</span><br><span class="line"> iter->node_-><span class="built_in">setRunningFlag</span>(<span class="literal">true</span>);</span><br><span class="line"> status = iter->node_-><span class="built_in">run</span>();</span><br><span class="line"> <span class="built_in">NNDEPLOY_RETURN_ON_NEQ</span>(status, base::kStatusCodeOk,</span><br><span class="line"> <span class="string">"node execute failed!\n"</span>);</span><br><span class="line"> iter->node_-><span class="built_in">setRunningFlag</span>(<span class="literal">false</span>);</span><br><span class="line"> } <span class="keyword">else</span> <span class="keyword">if</span> (edge_update_flag == base::kEdgeUpdateFlagTerminate) {</span><br><span class="line"> <span class="keyword">break</span>;</span><br><span class="line"> } <span class="keyword">else</span> {</span><br><span class="line"> <span class="built_in">NNDEPLOY_LOGE</span>(<span class="string">"Failed to node[%s] updataInput();\n"</span>,</span><br><span class="line"> iter->node_-><span class="built_in">getName</span>().<span class="built_in">c_str</span>());</span><br><span class="line"> status = base::kStatusCodeErrorDag;</span><br><span class="line"> <span class="keyword">break</span>;</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">return</span> status;</span><br><span class="line"> };</span><br><span class="line"> thread_pool_-><span class="built_in">commit</span>(std::<span 
class="built_in">bind</span>(func));</span><br><span class="line"> }</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>The most important piece above is the <code>updataInput</code> method, which calls <code>update</code> on every input edge:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">for</span> (<span class="keyword">auto</span> input : inputs_) {</span><br><span class="line"> flag = input-><span class="built_in">update</span>(<span class="keyword">this</span>);</span><br><span class="line"> <span class="keyword">if</span> (flag != base::kEdgeUpdateFlagComplete) {</span><br><span class="line"> <span class="keyword">break</span>;</span><br><span class="line"> }</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>The <code>decode</code> node has no input edges, so it skips this step and goes straight to <code>run</code>, which calls the <code>set</code> method of <code>PipelineEdge</code> to write its data:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br></pre></td><td class="code"><pre><span class="line"><span class="function">base::Status <span class="title">PipelineEdge::set</span><span class="params">(<span class="type">void</span> *anything, <span class="type">int</span> index, <span class="type">bool</span> is_external)</span> </span>{</span><br><span class="line"> <span class="comment">// 
lock</span></span><br><span class="line"> <span class="function">std::lock_guard<std::mutex> <span class="title">lock</span><span class="params">(mutex_)</span></span>;</span><br><span class="line"> PipelineDataPacket *dp = <span class="keyword">new</span> <span class="built_in">PipelineDataPacket</span>(consumers_size_);</span><br><span class="line"> <span class="built_in">NNDEPLOY_CHECK_PARAM_NULL_RET_STATUS</span>(dp, <span class="string">"PipelineDataPacket is null.\n"</span>);</span><br><span class="line"></span><br><span class="line"> data_packets_.<span class="built_in">push_back</span>(dp);</span><br><span class="line"> cv_.<span class="built_in">notify_all</span>();</span><br><span class="line"> <span class="comment">// set</span></span><br><span class="line"> base::Status status = dp-><span class="built_in">set</span>(anything, index, is_external);</span><br><span class="line"> <span class="built_in">NNDEPLOY_RETURN_ON_NEQ</span>(status, base::kStatusCodeOk,</span><br><span class="line"> <span class="string">"PipelineDataPacket set error.\n"</span>);</span><br><span class="line"> <span class="keyword">return</span> status;</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>I can't help feeling it would be more appropriate to set the data first and notify afterwards.</p><p>After the decode node finishes, the loop goes on to call <code>updataInput</code>, that is, the <code>update</code> method of <code>PipelineEdge</code>:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">cv_.<span class="built_in">wait</span>(lock, [<span class="keyword">this</span>, tmp_node] {</span><br><span class="line"> <span class="keyword">return</span> to_consume_index_[tmp_node] < data_packets_.<span class="built_in">size</span>() ||</span><br><span class="line"> terminate_flag_; <span class="comment">// the data the consumer needs already exists (otherwise wait for new data) ||</span></span><br><span 
class="line"> <span class="comment">// all data has been consumed</span></span><br><span class="line">});</span><br></pre></td></tr></table></figure><p>Because data has been inserted into <code>data_packets_</code>, the condition is satisfied and execution continues. I spent a long time running a mental <code>debug</code> session on the code that follows.</p><p>Roughly: after a node executes once, the index associated with that node is incremented. When the node executes a second time, the <code>data_packet</code> list managed by the edge holds two entries (put there by the upstream node), so the node has to find the entry it needs and delete the ones that are no longer used.</p><p>This pipeline parallelism achieves the effect shown in the figure below:</p><p><img data-src="https://s21.ax1x.com/2024/12/26/pAvDT2R.png" alt></p><p>When <code>yolo</code> performs object detection on video, there are many input frames. Each frame is a <code>node</code>, so there are multiple input nodes, one inference node, and multiple output nodes.</p><ul><li>Frame 1 preprocessing, frame 1 inference, frame 1 postprocessing</li><li>Frame 2 preprocessing, frame 2 inference, frame 2 postprocessing</li><li>…</li><li>Frame N preprocessing, frame N inference, frame N postprocessing</li></ul><p>This is how the work flows through the pipeline.</p><h2 id="计算图释放"><a href="#计算图释放" class="headerlink" title="计算图释放"></a>Releasing the computation graph</h2><p>This is the final <code>graph->deinit()</code>, which again delegates to the release routine of the corresponding executor.</p><h3 id="SequentialExecutor-释放"><a href="#SequentialExecutor-释放" class="headerlink" title="SequentialExecutor 释放"></a>SequentialExecutor release</h3><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br></pre></td><td class="code"><pre><span class="line"><span class="function">base::Status <span class="title">SequentialExecutor::deinit</span><span class="params">()</span> </span>{</span><br><span class="line"> base::Status status = base::kStatusCodeOk;</span><br><span class="line"> <span class="keyword">for</span> (<span class="keyword">auto</span> iter : edge_repository_) {</span><br><span class="line"> <span class="type">bool</span> flag = 
iter->edge_-><span class="built_in">requestTerminate</span>();</span><br><span class="line"> <span class="keyword">if</span> (!flag) {</span><br><span class="line"> <span class="built_in">NNDEPLOY_LOGE</span>(<span class="string">"failed iter->edge_->requestTerminate()!\n"</span>);</span><br><span class="line"> <span class="keyword">return</span> base::kStatusCodeErrorDag;</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">for</span> (<span class="keyword">auto</span> iter : topo_sort_node_) {</span><br><span class="line"> status = iter->node_-><span class="built_in">deinit</span>();</span><br><span class="line"> <span class="built_in">NNDEPLOY_RETURN_ON_NEQ</span>(status, base::kStatusCodeOk,</span><br><span class="line"> <span class="string">"failed iter->node_->deinit()"</span>);</span><br><span class="line"> iter->node_-><span class="built_in">setInitializedFlag</span>(<span class="literal">false</span>);</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">return</span> status;</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>The code is easy to follow: <code>requestTerminate</code> marks every edge as finished, and then the release operation of every node is run. For a <code>tensorrt</code> inference node, that means releasing the inference engine.</p><h3 id="ParallelTaskExecutor-释放"><a href="#ParallelTaskExecutor-释放" class="headerlink" title="ParallelTaskExecutor 释放"></a>Releasing ParallelTaskExecutor</h3><p>Compared with <code>SequentialExecutor</code>, there is one extra step: releasing the thread pool.</p><h3 id="ParallelPipelineExecutor-释放"><a href="#ParallelPipelineExecutor-释放" class="headerlink" title="ParallelPipelineExecutor 释放"></a>Releasing ParallelPipelineExecutor</h3><p>Same as releasing <code>ParallelTaskExecutor</code>.</p><p>On final exit, the graph is deleted manually with <code>delete graph</code>, which invokes the graph's destructor and deletes the <code>node wrapper</code> and <code>edge wrapper</code> objects that were previously allocated with <code>new</code>.</p><h3 id="推理引擎释放"><a href="#推理引擎释放" class="headerlink" 
title="推理引擎释放"></a>Releasing the inference engine</h3><p><code>SequentialExecutor</code>, <code>ParallelTaskExecutor</code> and <code>ParallelPipelineExecutor</code> all release their nodes when they are released. Let's look closely at how the <code>infer</code> node is released:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line"><span class="function">base::Status <span class="title">Infer::deinit</span><span class="params">()</span> </span>{</span><br><span class="line"> base::Status status = base::kStatusCodeOk;</span><br><span class="line"> status = inference_-><span class="built_in">deinit</span>();</span><br><span class="line"> <span class="built_in">NNDEPLOY_RETURN_ON_NEQ</span>(status, base::kStatusCodeOk, <span class="string">"deinit failed"</span>);</span><br><span class="line"> <span class="keyword">return</span> status;</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>In other words, it deinitializes the inference engine, which mainly comes down to freeing memory:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br></pre></td><td class="code"><pre><span class="line"><span 
class="function">base::Status <span class="title">TensorRtInference::deinit</span><span class="params">()</span> </span>{</span><br><span class="line"> base::Status status = base::kStatusCodeOk;</span><br><span class="line"> <span class="keyword">for</span> (<span class="keyword">auto</span> iter : input_tensors_) {</span><br><span class="line"> <span class="keyword">delete</span> iter.second;</span><br><span class="line"> }</span><br><span class="line"> input_tensors_.<span class="built_in">clear</span>();</span><br><span class="line"> <span class="keyword">for</span> (<span class="keyword">auto</span> iter : max_input_tensors_) {</span><br><span class="line"> <span class="keyword">delete</span> iter.second;</span><br><span class="line"> }</span><br><span class="line"> max_input_tensors_.<span class="built_in">clear</span>();</span><br><span class="line"> <span class="keyword">for</span> (<span class="keyword">auto</span> iter : output_tensors_) {</span><br><span class="line"> <span class="keyword">delete</span> iter.second;</span><br><span class="line"> }</span><br><span class="line"> output_tensors_.<span class="built_in">clear</span>();</span><br><span class="line"> <span class="keyword">for</span> (<span class="keyword">auto</span> iter : max_output_tensors_) {</span><br><span class="line"> <span class="keyword">delete</span> iter.second;</span><br><span class="line"> }</span><br><span class="line"> max_output_tensors_.<span class="built_in">clear</span>();</span><br><span class="line"> device::Device *device = device::<span class="built_in">getDevice</span>(inference_param_->device_type_);</span><br><span class="line"> <span class="keyword">if</span> (inner_forward_buffer_ != <span class="literal">nullptr</span>) {</span><br><span class="line"> device-><span class="built_in">deallocate</span>(inner_forward_buffer_);</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">return</span> status;</span><br><span 
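class="line">}</span><br></pre></td></tr></table></figure><p><code>delete iter.second</code> deletes the <code>tensor</code> manually, i.e. it calls the destructor of the <code>tensor</code> class, which frees the data <code>buffer</code> and clears the reference count.</p><h1 id="总结"><a href="#总结" class="headerlink" title="总结"></a>Summary</h1><p>That wraps up the walk-through of the computation graph code. As for the "combined parallelism of the modes above" mentioned in the <code>README</code>, I still don't know how the modes are supposed to be combined.</p><p>So far I know nothing about the inference engine itself, and I haven't figured out how to write my own high-performance <code>op</code>.</p><p>I want to deploy a large model! It seems I'm still a long way off.</p><ul><li>The <code>DAG</code> code is organized a bit messily, and there are memory leaks; memory is not managed very well</li><li>Why does <code>run</code> in <code>CvtColorResize</code> notify twice?</li><li><code>CvtColorResize</code> uses <code>inputs_[0]</code>, whereas <code>Infer</code> has to iterate over all <code>inputs</code>, which I find confusing</li></ul><p>Time to actually build it and run it to see.</p>]]></content>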
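<summary type="html"><p>I've had far too much free time over the past six months. To cut down on phone time after work, I started learning <code>cuda</code> in July, studied <code>C++</code> in August and September, slacked off for all of October, moved on to advanced <code>cuda</code> in November, and in December I'm here to pester <a href="https://github.com/nndeploy/nndeploy"><code>nndeploy</code></a>.</p>
<p>For one thing, having finished <code>C++</code>, I wanted to read a good open-source project that I could actually follow; for another, training models back at school never felt very interesting, so I wanted to see what the engineering side of <code>AI</code> looks like.</p></summary>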
<category term="AISystem" scheme="https://muyuuuu.github.io/tags/AISystem/"/>
</entry>
<entry>
<title>CUFX (CUDA Framework eXtended): A CUDA Compute Framework</title>
<link href="https://muyuuuu.github.io/2024/08/04/CUFX/"/>
<id>https://muyuuuu.github.io/2024/08/04/CUFX/</id>
<published>2024-08-04T06:31:33.000Z</published>
<updated>2024-08-04T13:19:17.948Z</updated>
<content type="html"><![CDATA[<link rel="stylesheet" class="aplayer-secondary-style-marker" href="/assets/css/APlayer.min.css"><script src="/assets/js/APlayer.min.js" class="aplayer-secondary-script-marker"></script><p>I finished learning CUDA in my after-work hours. Anyway, keeping busy really does relieve a lot of anxiety and lets me forget or ignore many worries. So I figured I'd pull together what I've learned over the past year and write a simple CUDA compute framework: <a href="https://github.com/muyuuuu/CUFX/tree/main/CUFX">CUFX</a>.</p><span id="more"></span><h1 id="目录结构"><a href="#目录结构" class="headerlink" title="目录结构"></a>Directory layout</h1><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br></pre></td><td class="code"><pre><span class="line">├── CMakeLists.txt <span class="comment"># top-level CMakeLists.txt</span></span><br><span class="line">├── README.md</span><br><span 
class="line">├── build_run.sh <span class="comment"># build-and-run script</span></span><br><span class="line">├── inc <span class="comment"># public interface, visible to users</span></span><br><span class="line">│ ├── data_type <span class="comment"># data types</span></span><br><span class="line">│ │ └── data_type.cuh</span><br><span class="line">│ ├── <span class="built_in">log</span> <span class="comment"># logging declarations</span></span><br><span class="line">│ │ └── log.cuh </span><br><span class="line">│ ├── matrix <span class="comment"># matrix declarations</span></span><br><span class="line">│ │ └── matrix.cuh </span><br><span class="line">│ └── operator <span class="comment"># cuda operators</span></span><br><span class="line">│ └── external.cuh</span><br><span class="line">├── priv <span class="comment"># internal interface, not exposed to users</span></span><br><span class="line">│ ├── runtime <span class="comment"># runtime information</span></span><br><span class="line">│ │ ├── inc</span><br><span class="line">│ │ │ └── runtime_info.cuh</span><br><span class="line">│ │ └── src</span><br><span class="line">│ │ └── runtime_info.cu</span><br><span class="line">│ └── time <span class="comment"># timing functions</span></span><br><span class="line">│ └── inc</span><br><span class="line">│ └── clock.cuh</span><br><span class="line">├── src <span class="comment"># implementation sources</span></span><br><span class="line">│ ├── CMakeLists.txt</span><br><span class="line">│ ├── <span class="built_in">log</span> <span class="comment"># logging implementation</span></span><br><span class="line">│ │ └── log.cu</span><br><span class="line">│ ├── matrix <span class="comment"># matrix implementation</span></span><br><span class="line">│ │ └── matrix.cu</span><br><span class="line">│ ├── reductsum <span class="comment"># reduction-sum operator</span></span><br><span class="line">│ │ ├── inc</span><br><span class="line">│ │ └── src</span><br><span class="line">│ │ └── reduct_sum.cu</span><br><span class="line">│ └── transpose</span><br><span class="line">└── <span class="built_in">test</span> <span class="comment"># tests</span></span><br><span class="line"> ├── 
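CMakeLists.txt</span><br><span class="line"> ├── inc <span class="comment"># test helper code</span></span><br><span class="line"> │ ├── compare.cuh</span><br><span class="line"> │ └── testcase.cuh</span><br><span class="line"> ├── main.cu <span class="comment"># program entry point, runs the test cases</span></span><br><span class="line"> └── testcase <span class="comment"># test cases</span></span><br><span class="line"> └── reduct_sum_testcase.cu</span><br></pre></td></tr></table></figure><p>As the directory layout above shows, the framework is split into a few major modules:</p><ul><li>inc holds the public header declarations: data types, matrix, logging, operators, and so on</li><li>priv holds the internal headers: runtime information, timing functions, and so on</li><li>src holds all the implementation code</li><li>test holds the test code, which compares GPU results against the CPU</li></ul><p>Operators such as Gaussian filtering or image flipping can be developed on top of it. The framework could then be used to ship and deploy high-performance computing workloads, for example:</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line"><span class="number">1.</span> Read the image</span><br><span class="line"><span class="number">2.</span> Color conversion</span><br><span class="line"><span class="number">3.</span> Image flip</span><br><span class="line"><span class="number">4.</span> Model inference</span><br><span class="line"><span class="number">5.</span> Post-processing</span><br><span class="line"><span class="number">6.</span> Output the results</span><br></pre></td></tr></table></figure><p>Apart from the model inference step, the rest (color conversion, post-processing) can all be accelerated and optimized by writing cuda operators. Alternatively, you can think of CUFX as a kind of OpenCV, although for now it has nowhere near as many features and interfaces.</p><h1 id="TODO"><a href="#TODO" class="headerlink" title="TODO"></a>TODO</h1><ol><li>A thread pool. There are plenty of examples online; it can be implemented and added later</li><li>A memory pool to manage CUFX's internal memory, avoid fragmentation, and print details such as memory usage</li><li>Some more elegant C++ implementations. I've been learning template specialization and partial specialization lately, so they may find their way in someday</li><li>Some high-performance cuda operators. They are the core and the soul of the framework, and writing them well takes a lot of time</li></ol><p><del>If only I had known all this back in school. I would have learned a little every day, added code bit by bit, and had something to brag about in interviews.</del></p>]]></content>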
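<summary type="html"><p>I finished learning CUDA in my after-work hours. Anyway, keeping busy really does relieve a lot of anxiety and lets me forget or ignore many worries. So I figured I'd pull together what I've learned over the past year and write a simple CUDA compute framework: <a href="https://github.com/muyuuuu/CUFX/tree/main/CUFX">CUFX</a>.</p></summary>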
<category term="CUDA" scheme="https://muyuuuu.github.io/tags/CUDA/"/>
</entry>
<entry>
<title>Back to C++: Type Erasure in C++</title>
<link href="https://muyuuuu.github.io/2024/07/26/type-ensure/"/>
<id>https://muyuuuu.github.io/2024/07/26/type-ensure/</id>
<published>2024-07-25T16:38:30.000Z</published>
<updated>2024-11-27T16:43:48.549Z</updated>
<content type="html">< {</span><br><span class="line"> std::cerr << x << <span class="string">"\n"</span>; </span><br><span class="line">};</span><br></pre></td></tr></table></figure><p>Here is the catch: at first I thought <code>std::function<void(int)></code> simply was the type of the lambda, and I made a fool of myself when I sent someone a <a href="https://github.com/parallel101/cppguidebook/pull/48"><code>PR</code> </a> on <code>github</code>.</p><p>In fact the two types are not the same. <code>function</code> is a type-erasing container, whereas the anonymous type of a <code>lambda</code> is, put simply, a class that overloads <code>operator()</code>. Because <code>std::function</code> has a converting constructor, the <code>lambda</code> expression can go through that constructor to build a <code>std::function</code> object, so this assignment involves an implicit type conversion.</p><p>In some code we cannot keep the original data type around; the lambda above is a typical example. In that situation we need one general type through which to use such objects, which means stripping away the object's original type. That is type erasure.</p><span id="more"></span><h1 id="常见的类型擦除"><a href="#常见的类型擦除" class="headerlink" title="常见的类型擦除"></a>Common forms of type erasure</h1><h2 id="Void-擦除"><a href="#Void-擦除" class="headerlink" title="Void* 擦除"></a>Void* erasure</h2><p>The type erasure technique in C is <code>void*</code>. A common example is <code>memset</code>, which takes a <code>void*</code> pointer and sets the given number of bytes to <code>value</code>.</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="type">void</span> * <span class="title function_">memset</span> <span class="params">( <span class="type">void</span> * ptr, <span class="type">int</span> value, <span class="type">size_t</span> num )</span>;</span><br></pre></td></tr></table></figure><p>But <code>void*</code> has a serious drawback: in some situations the data type has to be recovered before the data can be used. The developer must constantly keep track of what type a variable is supposed to have at each point of execution, which makes the code extremely hard to maintain.</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="type">void</span> <span class="title function_">func</span><span class="params">(<span class="type">void</span> *ctx)</span> { </span><br><span class="line"><span class="keyword">auto</span> sparse_ctx = (SparseData *)(ctx); 
<span class="comment">// you must remember the argument's real type to cast it back</span></span><br><span class="line">}</span><br></pre></td></tr></table></figure><h2 id="继承擦除"><a href="#继承擦除" class="headerlink" title="继承擦除"></a>Erasure via inheritance</h2><p>Class inheritance can also achieve the effect of type erasure, by pointing a base-class pointer at a derived object:</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="keyword">include</span> <span class="string"><iostream></span></span></span><br><span class="line"></span><br><span class="line"><span class="class"><span class="keyword">class</span> <span class="title">Shape</span> {</span></span><br><span class="line">public:</span><br><span class="line"> virtual <span class="type">void</span> <span class="title function_">show</span><span class="params">()</span> = <span class="number">0</span>;</span><br><span class="line">};</span><br><span class="line"></span><br><span class="line"><span class="class"><span 
class="keyword">class</span> <span class="title">Circle</span> :</span> public Shape {</span><br><span class="line">public:</span><br><span class="line"> <span class="type">void</span> <span class="title function_">show</span><span class="params">()</span> override {</span><br><span class="line"> <span class="built_in">std</span>::<span class="built_in">cerr</span> << <span class="string">"I am a circle\n"</span>; </span><br><span class="line"> }</span><br><span class="line">};</span><br><span class="line"></span><br><span class="line"><span class="class"><span class="keyword">class</span> <span class="title">Rect</span> :</span> public Shape {</span><br><span class="line">public:</span><br><span class="line"> <span class="type">void</span> <span class="title function_">show</span><span class="params">()</span> override {</span><br><span class="line"> <span class="built_in">std</span>::<span class="built_in">cerr</span> << <span class="string">"I am a rect\n"</span>; </span><br><span class="line"> }</span><br><span class="line">};</span><br><span class="line"></span><br><span class="line"><span class="type">void</span> <span class="title function_">func</span><span class="params">(Shape *obj)</span> { <span class="comment">// base-class pointer refers to a derived object</span></span><br><span class="line"> obj->show();</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="type">int</span> <span class="title function_">main</span><span class="params">()</span> {</span><br><span class="line"> Circle c;</span><br><span class="line"> Rect r;</span><br><span class="line"> func(&c);</span><br><span class="line"> func(&r);</span><br><span class="line"></span><br><span class="line"> <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>In many cases, though, this is hard to arrange, because it requires every concrete type, even completely unrelated ones, to inherit from a common base class.</p><h2 id="模板擦除"><a href="#模板擦除" class="headerlink" title="模板擦除"></a>Erasure via templates</h2><p>In <code>C++</code>,
instantiating a template with different type arguments offers a roundabout way to achieve type erasure:</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">template</span><<span class="keyword">typename</span> T></span></span><br><span class="line"><span class="function"><span class="type">void</span> <span class="title">Compare</span><span class="params">(T* data1, T* data2, std::<span class="type">size_t</span> size)</span> </span>{</span><br><span class="line"> <span class="keyword">for</span> (std::<span class="type">size_t</span> i = <span class="number">0</span>; i < size; i++) {</span><br><span class="line"> <span class="type">unsigned</span> <span class="type">char</span> *p1 = (<span class="type">unsigned</span> <span class="type">char</span> *)data1 + i;</span><br><span class="line"> <span class="type">unsigned</span> <span class="type">char</span> *p2 = (<span class="type">unsigned</span> <span class="type">char</span> *)data2 + i;</span><br><span class="line"> <span class="keyword">if</span> (*p1 != *p2) { <span class="comment">// compare byte by byte</span></span><br><span class="line"> std::cerr << <span class="string">" compare failed \n"</span>;</span><br><span class="line"> <span class="keyword">return</span>;</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> std::cerr << <span class="string">" compare success \n"</span>;</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>But a template does not erase the type completely: <code>T</code>
is still part of the function's signature. We are in fact holding on to the concrete element type the whole time. The advantages:</p><ul><li>Full type safety: type information is never lost at any point</li><li>As a result no dynamic binding is needed; everything is resolved statically, with no runtime performance cost</li></ul><p>There are downsides too:</p><ul><li>The template argument becomes part of the type of the function template or class template, so <code>vector<int></code> and <code>vector<double></code> are two distinct types that cannot be represented by a single type</li><li>Every instantiation with a different argument type generates a fresh copy of the code, which can make the compiled binary very large</li></ul><h1 id="Type-Erasure"><a href="#Type-Erasure" class="headerlink" title="Type Erasure"></a>Type Erasure</h1><p>In <code>C++</code> we can combine inheritance and templates to implement type erasure, so that different types are represented by one and the same type. Suppose we have a counter class <code>ClassA</code> that supports incrementing and decrementing:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">class</span> <span class="title class_">ClassA</span> {</span><br><span class="line"><span class="keyword">public</span>:</span><br><span class="line"> <span class="type">int</span> a;</span><br><span class="line"> <span class="built_in">ClassA</span>() {</span><br><span class="line"> a = <span class="number">0</span>;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="function"><span class="type">void</span> <span class="title">Increase</span><span class="params">()</span> </span>{</span><br><span class="line"> a += <span class="number">1</span>;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="function"><span 
class="type">void</span> <span class="title">Decrease</span><span class="params">()</span> </span>{</span><br><span class="line"> a -= <span class="number">1</span>;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="function"><span class="type">void</span> <span class="title">GetValue</span><span class="params">()</span> </span>{</span><br><span class="line"> std::cerr << <span class="string">"val = "</span> << a << <span class="string">"\n"</span>;</span><br><span class="line"> }</span><br><span class="line">};</span><br></pre></td></tr></table></figure><p>We want type erasure to work in <code>main</code>: when the value received is a <code>ClassA</code>, it can be incremented and decremented; when it is some other type, such as an <code>int</code> constant, the same operations work as well. Like this:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="type">int</span> <span class="title">main</span><span class="params">()</span> </span>{</span><br><span class="line"> CountContain c1 = ClassA{};</span><br><span class="line"> c1.<span class="built_in">Increase</span>();</span><br><span class="line"> c1.<span class="built_in">GetValue</span>();</span><br><span class="line"></span><br><span class="line"> CountContain c2 = <span class="number">5</span>;</span><br><span class="line"> c2.<span class="built_in">Increase</span>();</span><br><span class="line"> c2.<span class="built_in">GetValue</span>();</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>To that end, let's implement the <code>CountContain</code> class. It wraps a smart pointer that owns the object it was created from (the values behind <code>c1</code> and <code>c2</code> in <code>main</code>) and calls methods such as <code>Increase</code> on that object.</p><figure class="highlight 
cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">class</span> <span class="title class_">CountContain</span> {</span><br><span class="line"><span class="keyword">public</span>:</span><br><span class="line"> std::unique_ptr<???> m_ptr;</span><br><span class="line"></span><br><span class="line"> <span class="function"><span class="keyword">template</span><<span class="keyword">typename</span> T></span></span><br><span class="line"><span class="function"> <span class="title">CountContain</span><span class="params">(T t)</span> : m_ptr{</span><span class="keyword">new</span> ???(std::forward<T>(t))} {};</span><br><span class="line"></span><br><span class="line"> <span class="function"><span class="type">void</span> <span class="title">Increase</span><span class="params">()</span> </span>{</span><br><span class="line"> m_ptr-><span class="built_in">Increase</span>();</span><br><span class="line"> }</span><br><span class="line"> <span class="function"><span class="type">void</span> <span class="title">Decrease</span><span class="params">()</span> </span>{</span><br><span class="line"> m_ptr-><span class="built_in">Decrease</span>();</span><br><span class="line"> }</span><br><span class="line"> <span class="function"><span class="type">void</span> <span class="title">GetValue</span><span class="params">()</span> </span>{</span><br><span class="line"> m_ptr-><span 
class="built_in">GetValue</span>();</span><br><span class="line"> }</span><br><span class="line">};</span><br></pre></td></tr></table></figure><p>Now the <code>???</code> placeholders in the code above need filling in. <code>std::unique_ptr<???> m_ptr</code> has no template parameter <code>T</code> available; <code>T</code> exists only in the constructor, where it is used to create the concrete object. Remember the inheritance-based erasure from earlier?</p><p>Exactly: the <code>???</code> in the pointer should be a base class, and the <code>???</code> in the constructor should be a derived class that inherits from it. The derived class is a template and so keeps the type information, while the base class provides uniform storage and dispatch.</p><p>So let's implement such a base class, <code>CounterBase</code>, to call through:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">class</span> <span class="title class_">CounterBase</span> {</span><br><span class="line"><span class="keyword">public</span>:</span><br><span class="line"> <span class="keyword">virtual</span> ~<span class="built_in">CounterBase</span>() {};</span><br><span class="line"> <span class="function"><span class="keyword">virtual</span> <span class="type">void</span> <span class="title">Increase</span><span class="params">()</span> </span>= <span class="number">0</span>;</span><br><span class="line"> <span class="function"><span class="keyword">virtual</span> <span class="type">void</span> <span class="title">Decrease</span><span class="params">()</span> </span>= <span class="number">0</span>;</span><br><span class="line"> <span class="function"><span class="keyword">virtual</span> <span class="type">void</span> <span class="title">GetValue</span><span class="params">()</span> </span>= <span class="number">0</span>;</span><br><span class="line"> <span class="type">int</span> val;</span><br><span class="line">};</span><br></pre></td></tr></table></figure><p>Next, implement the corresponding derived class <code>CounterImpl</code>, which keeps the template information. Note that the final call is made by the derived class invoking the <code>Increase</code> method of the stored <code>m_impl</code> (that is,
<code>ClassA</code>).</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br></pre></td><td class="code"><pre><span class="line">template<typename T></span><br><span class="line">class CounterImpl : public CounterBase {</span><br><span class="line">public:</span><br><span class="line"> T m_impl; // move-constructed: the value argument is moved into m_impl</span><br><span class="line"> CounterImpl(T value) : m_impl(std::move(value)) {}</span><br><span class="line"> void Increase() override {</span><br><span class="line"> m_impl.Increase();</span><br><span class="line"> }</span><br><span class="line"> void Decrease() override {</span><br><span class="line"> m_impl.Decrease();</span><br><span class="line"> }</span><br><span class="line"> void GetValue() override {</span><br><span class="line"> m_impl.GetValue();</span><br><span class="line"> }</span><br><span class="line">};</span><br></pre></td></tr></table></figure><p>For types that lack the <code>Increase</code>, <code>Decrease</code> and <code>GetValue</code> interface, such as the built-in type <code>int</code>, we can additionally specialize the <code>CounterImpl</code> template to meet the requirement:<br><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span 
class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">template</span><></span><br><span class="line"><span class="keyword">class</span> <span class="title class_">CounterImpl</span><<span class="type">int</span>> : <span class="keyword">public</span> CounterBase {</span><br><span class="line"><span class="keyword">public</span>:</span><br><span class="line"> <span class="type">int</span> m_impl;</span><br><span class="line"> <span class="built_in">CounterImpl</span>(<span class="type">int</span> value) : <span class="built_in">m_impl</span>(value) {}</span><br><span class="line"> <span class="function"><span class="type">void</span> <span class="title">Increase</span><span class="params">()</span> <span class="keyword">override</span> </span>{</span><br><span class="line"> m_impl++;</span><br><span class="line"> }</span><br><span class="line"> <span class="function"><span class="type">void</span> <span class="title">Decrease</span><span class="params">()</span> <span class="keyword">override</span> </span>{</span><br><span class="line"> m_impl--;</span><br><span class="line"> }</span><br><span class="line"> <span class="function"><span class="type">void</span> <span class="title">GetValue</span><span class="params">()</span> <span class="keyword">override</span> </span>{</span><br><span class="line"> std::cerr << <span class="string">"val = "</span> << m_impl << <span class="string">"\n"</span>;</span><br><span class="line"> }</span><br><span class="line">};</span><br></pre></td></tr></table></figure></p><p>For <code>int</code>, the template specialization supplies <code>Increase</code> and the other methods, and its behavior matches that of <code>ClassA</code>, so we can regard <code>CounterImpl<int></code> as a duck type that looks like <code>ClassA</code>. </p><blockquote><p>If something walks like a duck and quacks like a duck, then it is a duck. In other words, if something satisfies everything we require of a duck, then it is a duck.</p></blockquote><p>The final code is as follows:</p><figure class="highlight cpp"><table><tr><td 
class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span 
class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br><span class="line">87</span><br><span class="line">88</span><br><span class="line">89</span><br><span class="line">90</span><br><span class="line">91</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="keyword">include</span> <span class="string"><iostream></span></span></span><br><span class="line"><span class="meta">#<span class="keyword">include</span> <span class="string"><memory></span></span></span><br><span class="line"></span><br><span class="line"><span class="keyword">class</span> <span class="title class_">ClassA</span>{</span><br><span class="line"><span class="keyword">public</span>:</span><br><span class="line"> <span class="type">int</span> a;</span><br><span class="line"> <span class="built_in">ClassA</span>() {</span><br><span class="line"> a = <span class="number">0</span>;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="function"><span class="type">void</span> <span class="title">Increase</span><span class="params">()</span> </span>{</span><br><span class="line"> a += <span class="number">1</span>;</span><br><span class="line"> }</span><br><span 
class="line"></span><br><span class="line"> <span class="function"><span class="type">void</span> <span class="title">Decrease</span><span class="params">()</span> </span>{</span><br><span class="line"> a -= <span class="number">1</span>;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="function"><span class="type">void</span> <span class="title">GetValue</span><span class="params">()</span> </span>{</span><br><span class="line"> std::cerr << <span class="string">"val = "</span> << a << <span class="string">"\n"</span>;</span><br><span class="line"> }</span><br><span class="line">};</span><br><span class="line"></span><br><span class="line"><span class="keyword">class</span> <span class="title class_">CounterBase</span> {</span><br><span class="line"><span class="keyword">public</span>:</span><br><span class="line"> <span class="keyword">virtual</span> ~<span class="built_in">CounterBase</span>() {};</span><br><span class="line"> <span class="function"><span class="keyword">virtual</span> <span class="type">void</span> <span class="title">Increase</span><span class="params">()</span> </span>= <span class="number">0</span>;</span><br><span class="line"> <span class="function"><span class="keyword">virtual</span> <span class="type">void</span> <span class="title">Decrease</span><span class="params">()</span> </span>= <span class="number">0</span>;</span><br><span class="line"> <span class="function"><span class="keyword">virtual</span> <span class="type">void</span> <span class="title">GetValue</span><span class="params">()</span> </span>= <span class="number">0</span>;</span><br><span class="line"> <span class="type">int</span> val;</span><br><span class="line">};</span><br><span class="line"></span><br><span class="line"><span class="keyword">template</span><<span class="keyword">typename</span> T></span><br><span class="line"><span class="keyword">class</span> <span class="title class_">CounterImpl</span> : 
<span class="keyword">public</span> CounterBase {</span><br><span class="line"><span class="keyword">public</span>:</span><br><span class="line"> T m_impl;</span><br><span class="line"> <span class="built_in">CounterImpl</span>(T &&value) : <span class="built_in">m_impl</span>(std::<span class="built_in">move</span>(value)) {}</span><br><span class="line"> <span class="function"><span class="type">void</span> <span class="title">Increase</span><span class="params">()</span> <span class="keyword">override</span> </span>{</span><br><span class="line"> m_impl.<span class="built_in">Increase</span>();</span><br><span class="line"> }</span><br><span class="line"> <span class="function"><span class="type">void</span> <span class="title">Decrease</span><span class="params">()</span> <span class="keyword">override</span> </span>{</span><br><span class="line"> m_impl.<span class="built_in">Decrease</span>();</span><br><span class="line"> }</span><br><span class="line"> <span class="function"><span class="type">void</span> <span class="title">GetValue</span><span class="params">()</span> <span class="keyword">override</span> </span>{</span><br><span class="line"> m_impl.<span class="built_in">GetValue</span>();</span><br><span class="line"> }</span><br><span class="line">};</span><br><span class="line"></span><br><span class="line"><span class="keyword">template</span><></span><br><span class="line"><span class="keyword">class</span> <span class="title class_">CounterImpl</span><<span class="type">int</span>> : <span class="keyword">public</span> CounterBase {</span><br><span class="line"><span class="keyword">public</span>:</span><br><span class="line"> <span class="type">int</span> m_impl;</span><br><span class="line"> <span class="built_in">CounterImpl</span>(<span class="type">int</span> value) : <span class="built_in">m_impl</span>(value) {}</span><br><span class="line"> <span class="function"><span class="type">void</span> <span class="title">Increase</span><span class="params">()</span> <span 
class="keyword">override</span> </span>{</span><br><span class="line"> m_impl++;</span><br><span class="line"> }</span><br><span class="line"> <span class="function"><span class="type">void</span> <span class="title">Decrease</span><span class="params">()</span> <span class="keyword">override</span> </span>{</span><br><span class="line"> m_impl--;</span><br><span class="line"> }</span><br><span class="line"> <span class="function"><span class="type">void</span> <span class="title">GetValue</span><span class="params">()</span> <span class="keyword">override</span> </span>{</span><br><span class="line"> std::cerr << <span class="string">"val = "</span> << m_impl << <span class="string">"\n"</span>;</span><br><span class="line"> }</span><br><span class="line">};</span><br><span class="line"></span><br><span class="line"><span class="keyword">class</span> <span class="title class_">CountContain</span> {</span><br><span class="line"><span class="keyword">public</span>:</span><br><span class="line"> std::unique_ptr<CounterBase> m_ptr;</span><br><span class="line"></span><br><span class="line"> <span class="function"><span class="keyword">template</span><<span class="keyword">typename</span> T></span></span><br><span class="line"><span class="function"> <span class="title">CountContain</span><span class="params">(T t)</span> : m_ptr{</span><span class="keyword">new</span> <span class="built_in">CounterImpl</span><T>(std::forward<T>(t))} {};</span><br><span class="line"></span><br><span class="line"> <span class="function"><span class="type">void</span> <span class="title">Increase</span><span class="params">()</span> </span>{</span><br><span class="line"> m_ptr-><span class="built_in">Increase</span>();</span><br><span class="line"> }</span><br><span class="line"> <span class="function"><span class="type">void</span> <span class="title">Decrease</span><span class="params">()</span> </span>{</span><br><span class="line"> m_ptr-><span 
class="built_in">Decrease</span>();</span><br><span class="line"> }</span><br><span class="line"> <span class="function"><span class="type">void</span> <span class="title">GetValue</span><span class="params">()</span> </span>{</span><br><span class="line"> m_ptr-><span class="built_in">GetValue</span>();</span><br><span class="line"> }</span><br><span class="line">};</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="type">int</span> <span class="title">main</span><span class="params">()</span> </span>{</span><br><span class="line"> CountContain c1 = ClassA{};</span><br><span class="line"> c1.<span class="built_in">Increase</span>();</span><br><span class="line"> c1.<span class="built_in">GetValue</span>();</span><br><span class="line"></span><br><span class="line"> CountContain c2 = <span class="number">5</span>;</span><br><span class="line"> c2.<span class="built_in">Increase</span>();</span><br><span class="line"> c2.<span class="built_in">GetValue</span>();</span><br><span class="line">}</span><br></pre></td></tr></table></figure><h1 id="总结"><a href="#总结" class="headerlink" title="总结"></a>Summary</h1><p>If you have either of the following two needs, Type Erasure may be the right tool:</p><ul><li>You need to handle different types in one uniform way</li><li>You need to store objects of different types in a single type or container</li></ul><h1 id="参考"><a href="#参考" class="headerlink" title="参考"></a>References</h1><ol><li><a href="https://fuzhe1989.github.io/2017/10/29/cpp-type-erasure/">C++: Type Erasure</a></li><li><a href="https://nihil.cc/posts/std_function/">std::function实现</a></li></ol>]]></content>
<summary type="html"><p>For anonymous functions (lambdas) in <code>C++</code>, besides declaring the variable with <code>auto</code>, you can also use <code>std::function</code> as the type that receives the lambda:</p>
<figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">std::function&lt;<span class="type">void</span>(<span class="type">int</span>)&gt; func = [](<span class="type">int</span> x) &#123;</span><br><span class="line"> std::cerr &lt;&lt; x &lt;&lt; <span class="string">&quot;\n&quot;</span>; </span><br><span class="line">&#125;;</span><br></pre></td></tr></table></figure>
<p>Here is the catch: at first I thought <code>std::function&lt;void(int)&gt;</code> was the type of the lambda itself, and I embarrassed myself when I sent someone a <a href="https://github.com/parallel101/cppguidebook/pull/48"><code>PR</code> </a> on <code>GitHub</code>.</p>
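<p>In fact the two types are different: <code>function</code> is a type-erasing container, while the anonymous type of a <code>lambda</code> is, simply put, a class that overloads <code>operator()</code>. Because <code>std::function</code> has a converting constructor, the <code>lambda</code> expression can be passed to that constructor to build a <code>std::function</code> object, so the initialization above involves an implicit type conversion.</p>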
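<p>The distinction can be checked at compile time. The following is a minimal sketch, assuming C++14 or later; the names <code>make_doubler</code>, <code>Closure</code>, <code>Erased</code>, <code>call_erased</code> and <code>demo</code> are invented for this illustration and do not come from the post.</p>

```cpp
#include <cassert>
#include <functional>
#include <type_traits>

// Hypothetical helper: returns a lambda, i.e. an object of a unique,
// unnamed closure type generated by the compiler.
inline auto make_doubler() {
    return [](int x) { return x * 2; };
}

using Closure = decltype(make_doubler());   // the lambda's own type
using Erased  = std::function<int(int)>;    // the type-erasing wrapper

// The closure type and std::function<int(int)> are distinct types...
static_assert(!std::is_same<Closure, Erased>::value,
              "a lambda's type is not std::function");
// ...but a closure converts implicitly through the converting constructor.
static_assert(std::is_convertible<Closure, Erased>::value,
              "a lambda converts implicitly to std::function");

// Hypothetical helper: stores the lambda in a std::function and calls it.
inline int call_erased(int x) {
    Erased f = make_doubler();  // implicit conversion happens here
    return f(x);
}

// Small usage check: the erased callable behaves like the original lambda.
inline void demo() {
    assert(call_erased(21) == 42);
}
```

<p>Calling <code>demo()</code> from <code>main</code> exercises the conversion at run time; the two <code>static_assert</code> lines are checked when the file is compiled.</p>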
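<p>In some code we cannot keep the original data type; the lambda above is a typical example. In such cases we need one common type through which to use these objects, which means stripping away the object's original type. This is type erasure (Type Erasure).</p></summary>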
<category term="Cpp" scheme="https://muyuuuu.github.io/tags/Cpp/"/>
</entry>
<entry>
<title>Back to C++: Digging from ref to Move Semantics, then from forward to Variadic Templates</title>
<link href="https://muyuuuu.github.io/2024/06/28/Cpp-1-ref-to-forward/"/>
<id>https://muyuuuu.github.io/2024/06/28/Cpp-1-ref-to-forward/</id>
<published>2024-06-28T14:32:24.000Z</published>
<updated>2024-11-27T16:32:12.841Z</updated>
<content type="html"><![CDATA[<link rel="stylesheet" class="aplayer-secondary-style-marker" href="/assets/css/APlayer.min.css"><script src="/assets/js/APlayer.min.js" class="aplayer-secondary-script-marker"></script><p>This is the first part of my C++ tour. It started when incorrect use of std::ref and std::forward in a project caused some baffling bugs. std::ref involves references, lvalue and rvalue references lead naturally to move semantics, and std::forward is mostly used in templates. So I took this as an opportunity to study these newer C++ features carefully.</p><span id="more"></span><h1 id="std-ref-用法"><a href="#std-ref-用法" class="headerlink" title="std::ref 用法"></a>Using std::ref</h1><p>It passes an object to a function or algorithm as a reference rather than by value.</p><h2 id="用于-bind"><a href="#用于-bind" class="headerlink" title="用于 bind"></a>With bind</h2><p><strong><code>std::bind</code></strong> <strong>stores copies of its arguments rather than references, so you must explicitly use</strong> <strong><code>std::ref</code></strong> <strong>to bind by reference</strong>.</p><ul><li>Capturing references: <code>std::bind</code> does not capture references by itself; it always copies its arguments, and it holds a reference only when combined with <code>std::ref</code>.</li></ul><blockquote><p>Without <code>std::ref</code>, the local variable <code>x</code> in <code>main</code> will not change! This is because <code>std::bind</code> has an annoying design: it captures by copy by default, copying the argument instead of keeping a reference. Interestingly, arguments supplied through a placeholder stay references without needing <code>std::ref</code>:</p></blockquote><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span 
class="line">27</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="keyword">include</span> <span class="string"><iostream></span></span></span><br><span class="line"><span class="meta">#<span class="keyword">include</span> <span class="string"><functional></span></span></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="type">void</span> <span class="title">func</span><span class="params">(<span class="type">int</span>& n1, <span class="type">int</span>& n2)</span> </span>{</span><br><span class="line"> std::cout << <span class="string">" ============ in function ============"</span> << std::endl;</span><br><span class="line"> n1 ++;</span><br><span class="line"> n2 ++;</span><br><span class="line"> std::cout << <span class="string">" n1 = "</span> << n1 << <span class="string">" n2 = "</span> << n2 << std::endl;</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="type">int</span> <span class="title">main</span><span class="params">()</span> </span>{</span><br><span class="line"> <span class="type">int</span> n1 = <span class="number">1</span>, n2 = <span class="number">2</span>;</span><br><span class="line"> std::function<<span class="type">void</span>()> f = std::<span class="built_in">bind</span>(func, n1, std::<span class="built_in">ref</span>(n2)); <span class="comment">// n1 is copied (bound by value) here; n2 is bound by reference</span></span><br><span class="line"> </span><br><span class="line"> n1 = <span class="number">10</span>;</span><br><span class="line"> n2 = <span class="number">12</span>;</span><br><span class="line"> </span><br><span class="line"> std::cout << <span class="string">" ============== before ============= "</span> << std::endl;</span><br><span class="line"> std::cout << <span class="string">" n1 = "</span> << n1 << <span class="string">" n2 = "</span> << n2 << std::endl;</span><br><span class="line"> </span><br><span 
class="line"> <span class="built_in">f</span>();</span><br><span class="line"> </span><br><span class="line"> std::cout << <span class="string">" ============== after ============= "</span> << std::endl;</span><br><span class="line"> std::cout << <span class="string">" n1 = "</span> << n1 << <span class="string">" n2 = "</span> << n2 << std::endl;</span><br><span class="line"></span><br><span class="line"> <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>Output:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">============== before ============= </span><br><span class="line">n1 = 10 n2 = 12</span><br><span class="line">============ in function ============</span><br><span class="line">n1 = 2 n2 = 13</span><br><span class="line">============== after ============= </span><br><span class="line">n1 = 10 n2 = 13</span><br></pre></td></tr></table></figure><h2 id="用于线程传参"><a href="#用于线程传参" class="headerlink" title="用于线程传参"></a>Passing arguments to threads</h2><p>The <code>std::thread</code> constructor treats its arguments much like <code>bind</code> does: it copies the supplied values into the new thread's storage instead of converting them to the expected parameter types. If a parameter is declared as a non-const lvalue reference and you pass the argument without <code>std::ref</code>, the code fails to compile.</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span 
class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="keyword">include</span> <span class="string"><iostream></span></span></span><br><span class="line"><span class="meta">#<span class="keyword">include</span> <span class="string"><thread></span></span></span><br><span class="line"><span class="meta">#<span class="keyword">include</span> <span class="string"><string></span></span></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="type">void</span> <span class="title">func</span><span class="params">(std::string& str, <span class="type">int</span> v)</span> </span>{</span><br><span class="line"> str = <span class="string">"func"</span>;</span><br><span class="line"> v = <span class="number">12</span>;</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="type">int</span> <span class="title">main</span><span class="params">()</span> </span>{</span><br><span class="line"> <span class="function">std::string <span class="title">str</span><span class="params">(<span class="string">"main"</span>)</span></span>;</span><br><span class="line"> <span class="type">int</span> v = <span class="number">-12</span>;</span><br><span class="line"></span><br><span class="line"> <span class="function">std::thread <span class="title">t</span><span class="params">(func, std::ref(str), v)</span></span>;</span><br><span class="line"></span><br><span class="line"> t.<span class="built_in">join</span>();</span><br><span class="line"></span><br><span class="line"> std::cout << str << std::endl;</span><br><span class="line"> std::cout << v << std::endl;</span><br><span class="line"></span><br><span class="line"> <span 
class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">}</span><br></pre></td></tr></table></figure><h1 id="左值和右值"><a href="#左值和右值" class="headerlink" title="左值和右值"></a>Lvalues and rvalues</h1><ul><li><p>Lvalue: an expression that can be addressed and refers to an object with a persistent storage location. In other words, it denotes a memory location and can appear on the left side of an assignment; it may be a variable, an array element, a reference, and so on. An lvalue has a definite storage location for which the compiler reserves memory, so at run time it can be accessed and modified through its address.</p></li><li><p>Rvalue: an expression that denotes a temporary value and whose address cannot be taken. It is usually a literal, a temporary object, or the result of a computation, and an rvalue of built-in type cannot appear on the left side of an assignment. Such a value may live only in a register or in a short-lived stack slot, so the program cannot rely on it having a fixed memory location.</p></li></ul><h2 id="左值示例"><a href="#左值示例" class="headerlink" title="左值示例"></a>Lvalue examples</h2><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="type">int</span> a = <span class="number">10</span>;</span><br><span class="line"><span class="type">int</span>& b = a;</span><br><span class="line">b = <span class="number">17</span>; <span class="comment">// a is changed to 17</span></span><br></pre></td></tr></table></figure><p>The following usages are wrong:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line"><span class="type">int</span> a = <span class="number">10</span>;</span><br><span class="line"><span class="type">const</span> <span class="type">int</span>& b = a;</span><br><span class="line">b = <span class="number">17</span>; <span class="comment">// error: b is a const reference and cannot be assigned through</span></span><br><span class="line"></span><br><span class="line"><span class="type">const</span> <span class="type">int</span> a = <span class="number">10</span>;</span><br><span class="line"><span class="type">int</span>& b = a; <span class="comment">// error: the reference must be const</span></span><br></pre></td></tr></table></figure><p>In the example below, the expression 
<code>a + b</code> is an rvalue: it denotes a temporary computation result with no persistent storage location.</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="type">int</span> a = <span class="number">42</span>; </span><br><span class="line"><span class="type">int</span> b = a; </span><br><span class="line"><span class="type">int</span> c = a + b; </span><br></pre></td></tr></table></figure><p>In the following example:</p><ul><li><code>int v = func()</code> creates a new lvalue <code>v</code> initialized with a copy of the referenced value, so modifying <code>v</code> does not modify the global variable <code>val</code></li><li><code>int& v = func()</code> makes <code>v</code> a reference to <code>val</code>, so modifying <code>v</code> also modifies the global variable <code>val</code></li></ul><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="keyword">include</span> <span class="string"><iostream></span></span></span><br><span class="line"><span class="meta">#<span class="keyword">include</span> <span class="string"><functional></span></span></span><br><span class="line"></span><br><span class="line"><span class="type">int</span> val = <span class="number">-1</span>;</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="type">int</span>& <span 
class="title">func</span><span class="params">()</span> </span>{</span><br><span class="line"> <span class="keyword">return</span> val;</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="type">int</span> <span class="title">main</span><span class="params">()</span></span></span><br><span class="line"><span class="function"></span>{</span><br><span class="line"> <span class="comment">// int v = func(); // alternative: v is a copy, so val is not modified</span></span><br><span class="line"> <span class="type">int</span>& v = <span class="built_in">func</span>(); <span class="comment">// v refers to val, so val is modified</span></span><br><span class="line"></span><br><span class="line"> v = <span class="number">1</span>;</span><br><span class="line"></span><br><span class="line"> std::cout << val << std::endl;</span><br><span class="line"></span><br><span class="line"> <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">}</span><br></pre></td></tr></table></figure><h2 id="右值引用与移动语义"><a href="#右值引用与移动语义" class="headerlink" title="右值引用与移动语义"></a>Rvalue references and move semantics</h2><p>In the example below, <code>a_ref * 2</code> is a temporary rvalue that binds to the rvalue reference, so the value of <code>b</code> is 26.</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line"><span class="type">int</span> a = <span class="number">1</span>;</span><br><span class="line"><span class="type">int</span>& a_ref = a;</span><br><span class="line"></span><br><span class="line">a = <span class="number">13</span>;</span><br><span class="line"><span class="type">int</span>&& b = a_ref * <span class="number">2</span>;</span><br><span class="line">std::cout << b << 
std::endl;</span><br></pre></td></tr></table></figure><p>Now that we understand rvalue references, let's look at move semantics. Below is a longer example that implements move semantics with rvalue references; its main goal is to transfer resources out of temporary objects efficiently and avoid unnecessary copies.</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="keyword">include</span> <span class="string"><iostream></span></span></span><br><span class="line"></span><br><span 
class="line"><span class="keyword">class</span> <span class="title class_">MyString</span>{</span><br><span class="line"><span class="keyword">public</span>:</span><br><span class="line"></span><br><span class="line"> <span class="built_in">MyString</span>() = <span class="keyword">default</span>;</span><br><span class="line"> <span class="built_in">MyString</span>(<span class="type">int</span>* d) : _data{d} {</span><br><span class="line"> std::cerr << <span class="string">"default construct"</span> << std::endl;</span><br><span class="line"> };</span><br><span class="line"></span><br><span class="line"> MyString& <span class="keyword">operator</span>=(<span class="type">const</span> MyString& other) <span class="keyword">noexcept</span> {</span><br><span class="line"> <span class="keyword">this</span>->_data = other._data;</span><br><span class="line"> std::cerr << <span class="string">"called copy assignment"</span> << std::endl;</span><br><span class="line"> <span class="keyword">return</span> *<span class="keyword">this</span>;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="built_in">MyString</span>(MyString &&other) <span class="keyword">noexcept</span> {</span><br><span class="line"> _data = other._data;</span><br><span class="line"> other._data = <span class="literal">nullptr</span>;</span><br><span class="line"> std::cerr << <span class="string">"called move construct"</span> << std::endl;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> MyString& <span class="keyword">operator</span>=(MyString&& other) <span class="keyword">noexcept</span> {</span><br><span class="line"> <span class="keyword">if</span> (<span class="keyword">this</span> != &other) {</span><br><span class="line"> <span class="keyword">delete</span> _data;</span><br><span class="line"> <span class="keyword">this</span>->_data = other._data;</span><br><span class="line"> other._data = <span 
class="literal">nullptr</span>;</span><br><span class="line"> }</span><br><span class="line"> std::cerr << <span class="string">"called move assignment"</span> << std::endl;</span><br><span class="line"> <span class="keyword">return</span> *<span class="keyword">this</span>;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"><span class="keyword">private</span>:</span><br><span class="line"> <span class="type">int</span> *_data = <span class="literal">nullptr</span>;</span><br><span class="line">};</span><br><span class="line"></span><br><span class="line"><span class="function">MyString <span class="title">func</span><span class="params">()</span></span></span><br><span class="line"><span class="function"></span>{</span><br><span class="line"> <span class="type">int</span>* d = <span class="keyword">new</span> <span class="built_in">int</span>();</span><br><span class="line"> MyString a{d};</span><br><span class="line"> <span class="keyword">return</span> a;</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="type">int</span> <span class="title">main</span><span class="params">()</span></span></span><br><span class="line"><span class="function"></span>{</span><br><span class="line"> MyString a1 = <span class="built_in">func</span>(); <span class="comment">// RVO: the MyString(int*) constructor runs once and builds the object directly in a1</span></span><br><span class="line"> MyString a2;</span><br><span class="line"> a2 = <span class="built_in">func</span>(); <span class="comment">// a2 is default-constructed first; the return value is a temporary (an rvalue), so move assignment is called</span></span><br><span class="line"> <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>On line 46, thanks to <code>RVO</code> (return value optimization), the return value is constructed directly in the <code>a1</code> object in <code>main</code>, instead of a temporary being created inside <code>func</code> and then copied or moved into <code>a1</code>. As a result, when <code>RVO</code> 
applies, neither the move constructor nor the copy constructor is called. Because <code>func</code> returns a named local variable, this is the named form of the optimization (NRVO), which compilers usually perform but are not required to; if it is not performed, the move constructor is called instead.</p><p>Also look at line 25 of the code: the move constructor does not delete its own <code>data</code> pointer, whereas the move assignment operator does. The reason is:</p><ul><li>The move assignment operator has to release resources because the object on the left side of the assignment usually already owns a resource. </li><li>The move constructor builds a new object, whose <code>data</code> pointer does not own any resource yet.</li></ul><h2 id="左右值重载"><a href="#左右值重载" class="headerlink" title="左右值重载"></a>Overloading on lvalues and rvalues</h2><p>Let's implement a function that is overloaded for lvalues and rvalues:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="type">void</span> <span class="title">printValue</span><span class="params">(<span class="type">const</span> <span class="type">int</span>& x)</span> </span>{</span><br><span class="line"> std::cout << <span class="string">"lvalue ref: "</span> << x << std::endl;</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="type">void</span> <span class="title">printValue</span><span class="params">(<span class="type">int</span>&& x)</span> </span>{</span><br><span class="line"> std::cout << <span class="string">"rvalue ref: "</span> << x << std::endl;</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="type">int</span> <span class="title">main</span><span class="params">()</span> </span>{</span><br><span class="line"> <span class="type">int</span> a = <span class="number">42</span>;</span><br><span class="line"> <span class="built_in">printValue</span>(a); <span class="comment">// lvalue ref</span></span><br><span 
class="built_in">printValue</span>(a * <span class="number">2</span>); <span class="comment">// rvalue ref</span></span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>Overloading in this way lets us take advantage of the difference between lvalues and rvalues and write code that is both more efficient and easier to maintain, optimizing each case separately where it matters. For example, <code>vector</code>'s <code>push_back</code> calls the copy constructor when given an lvalue and the move constructor when given an rvalue.</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">vector<Class> v;</span><br><span class="line"></span><br><span class="line">Class c; <span class="comment">// default construct</span></span><br><span class="line">v.<span class="built_in">push_back</span>(c); <span class="comment">// copy construct</span></span><br><span class="line">v.<span class="built_in">push_back</span>(std::<span class="built_in">move</span>(c)); <span class="comment">// move construct</span></span><br></pre></td></tr></table></figure><h1 id="完美转发"><a href="#完美转发" class="headerlink" title="完美转发"></a>Perfect forwarding</h1><p>The previous section implemented functions overloaded on lvalues and rvalues, but that code carries a risk. Consider the example below: when an intermediate layer <code>func</code> calls the overloaded <code>foo</code> (a very common pattern in thread pools), the lvalue overload is always chosen, even though <code>func</code> received an rvalue. <strong>A named rvalue reference such as <code>param</code> is itself an lvalue</strong>, so <code>foo(std::string& s)</code> is the function that gets called.</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span 
class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="keyword">include</span> <span class="string"><iostream></span></span></span><br><span class="line"><span class="meta">#<span class="keyword">include</span> <span class="string"><string></span></span></span><br><span class="line"><span class="meta">#<span class="keyword">include</span> <span class="string"><utility></span></span></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="type">void</span> <span class="title">foo</span><span class="params">(std::string& s)</span> </span>{</span><br><span class="line"> std::cout << <span class="string">" left value ref "</span> << s << std::endl;</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="type">void</span> <span class="title">foo</span><span class="params">(std::string&& s)</span> </span>{</span><br><span class="line"> std::cout << <span class="string">"right value ref "</span> << s << std::endl;</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="type">void</span> <span class="title">func</span><span class="params">(std::string&& param)</span> </span>{</span><br><span class="line"> <span class="built_in">foo</span>(param);</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="type">int</span> <span class="title">main</span><span class="params">()</span> </span>{</span><br><span class="line"> std::string s{<span class="string">"sad"</span>};</span><br><span class="line"> <span class="built_in">func</span>(std::<span class="built_in">move</span>(s)); <span class="comment">// func(s) would not compile: an lvalue cannot bind to std::string&&; prints " left value ref sad"</span></span><br><span class="line"> <span class="built_in">func</span>(<span class="string">"test"</span>); <span class="comment">// prints " left value ref test"</span></span><br><span class="line"> <span class="keyword">return</span> <span class="number">0</span>;</span><br><span 
class="line">}</span><br></pre></td></tr></table></figure><h2 id="万能引用"><a href="#万能引用" class="headerlink" title="万能引用"></a>Universal references</h2><p>You may have noticed that <code>func</code> declares its parameter as <code>std::string&& param</code>, which accepts only rvalues. To accept both lvalues and rvalues, every parameter needs an lvalue overload and an rvalue overload, so a <code>func</code> with <code>n</code> parameters would need 2^n overloads of <code>func()</code>, which is clearly not a good approach. Universal references exist for exactly this reason: with them a single function template is enough, as follows:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">template</span><<span class="keyword">typename</span> T></span></span><br><span class="line"><span class="function"><span class="type">void</span> <span class="title">func</span><span class="params">(T&& param)</span></span>;</span><br></pre></td></tr></table></figure><p>If a variable or parameter is declared with type <code>T&&</code> and <code>T</code> is a deduced type, that variable or parameter is a universal reference (the standard calls it a forwarding reference).</p><h2 id="引用折叠"><a href="#引用折叠" class="headerlink" title="引用折叠"></a>Reference collapsing</h2><p>Consider the following code:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">template</span><<span class="keyword">typename</span> T></span></span><br><span class="line"><span class="function"><span class="type">void</span> <span class="title">func</span><span class="params">(T&& param)</span> </span>{ <span class="comment">// param is a universal reference</span></span><br><span class="line"> <span class="comment">// do sth</span></span><br><span class="line">}</span><br><span class="line"></span><br><span 
class="line"><span class="function"><span class="type">int</span> <span class="title">main</span><span class="params">()</span> </span>{</span><br><span class="line"> <span class="type">int</span> a = <span class="number">1</span>;</span><br><span class="line"> <span class="type">int</span> &b = a;</span><br><span class="line"> <span class="built_in">func</span>(a); <span class="comment">// OK</span></span><br><span class="line"> <span class="built_in">func</span>(<span class="number">1</span>); <span class="comment">// OK</span></span><br><span class="line"> <span class="built_in">func</span>(b);</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>In the code above, <code>b</code> is an lvalue reference, <code>int &</code>. When it is passed to <code>func()</code>, <code>T</code> is deduced as <code>int &</code>, so without reference collapsing the type of <code>param</code> would be <code>int & &&</code>, a reference to a reference, which the compiler rejects when it is written out directly. In contexts such as template deduction and type aliases, however, the compiler lets these nested references form implicitly and folds them into a single one; this is <code>reference collapsing</code>. C++ has two kinds of references (lvalue references and rvalue references), so there are four combinations. If at least one of the two references is an lvalue reference, the result is an lvalue reference; otherwise the result is an rvalue reference.</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="type">int</span> n = <span class="number">0</span>; <span class="comment">// references must be initialized, so bind them to n</span></span><br><span class="line"><span class="keyword">using</span> T = <span class="type">int</span> &;</span><br><span class="line">T& r1 = n; <span class="comment">// int& & r1 -> int& r1 </span></span><br><span class="line">T&& r2 = n; <span class="comment">// int& && r2 -> int& r2 </span></span><br><span class="line"></span><br><span class="line"><span class="keyword">using</span> U = <span class="type">int</span> &&;</span><br><span class="line">U& r3 = n; <span class="comment">// int&& & r3 -> int& r3 </span></span><br><span class="line">U&& r4 = <span class="number">1</span>; <span class="comment">// int&& && r4 -> int&& r4</span></span><br></pre></td></tr></table></figure><h2 id="完美转发-1"><a href="#完美转发-1" class="headerlink" 
title="完美转发"></a>Perfect forwarding</h2><p>With that background, how do we make <code>func</code> work correctly? The answer is perfect forwarding with <code>forward</code>. <code>std::forward</code> preserves the value category of the argument that was bound to the parameter <code>param</code>. If the argument is an lvalue, as with <code>s</code> in <code>func(s);</code>, then <code>func</code> passes an lvalue to <code>foo</code>; if the argument is an rvalue, as with <code>s + "bar"</code> in <code>func(s + "bar");</code>, then <code>func</code> passes an rvalue to <code>foo</code>.</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="keyword">include</span> <span class="string"><iostream></span></span></span><br><span class="line"><span class="meta">#<span class="keyword">include</span> <span class="string"><string></span></span></span><br><span class="line"><span class="meta">#<span class="keyword">include</span> <span class="string"><utility></span></span></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="type">void</span> <span class="title">foo</span><span class="params">(std::string& s)</span> </span>{</span><br><span class="line"> std::cout << <span class="string">" left value ref "</span> << s << std::endl;</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="type">void</span> <span class="title">foo</span><span class="params">(std::string&& s)</span> </span>{</span><br><span class="line"> std::cout << <span class="string">"right value ref "</span> << s << std::endl;</span><br><span class="line">}</span><br><span class="line"></span><br><span 
class="line"><span class="function"><span class="keyword">template</span><<span class="keyword">typename</span> T></span></span><br><span class="line"><span class="function"><span class="type">void</span> <span class="title">func</span><span class="params">(T&& param)</span> </span>{</span><br><span class="line"> <span class="built_in">foo</span>(std::forward<T>(param));</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="type">int</span> <span class="title">main</span><span class="params">()</span> </span>{</span><br><span class="line"> std::string s{<span class="string">"val1"</span>};</span><br><span class="line"> <span class="built_in">func</span>(s); <span class="comment">// prints " left value ref val1"</span></span><br><span class="line"> <span class="built_in">func</span>(<span class="string">"val2"</span>); <span class="comment">// prints "right value ref val2"</span></span><br><span class="line"> <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>Perfect forwarding is subject to a few conditions:</p><ol><li><code>std::forward</code> is meant for parameters whose type is deduced (a template parameter or <code>auto</code>); with an ordinary, non-deduced type there is no value category to recover;</li><li><code>std::forward</code> is only meaningful inside a function template, because only there is the argument's type deduced, and that deduced type is what tells it whether to produce an lvalue or an rvalue;</li><li>the argument of <code>std::forward</code> should be a universal reference; applied to anything else it is just a cast, and trying to forward an rvalue as an lvalue is a compile error.</li></ol><p>In addition, <code>forward</code> looks confusing, with both angle brackets and parentheses. In practice it is always used in a single form, <code>forward<T>(t)</code>, where <code>T</code> is the deduced type in the declaration <code>T&& t</code>. When no such name is at hand, <code>decltype</code>, also introduced in <code>C++11</code>, recovers the type that <code>t</code> was declared with.</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="type">void</span> <span class="title">some_func</span><span class="params">(<span class="keyword">auto</span> &&arg)</span> </span>{ <span class="comment">// an auto parameter on a function needs C++20</span></span><br><span class="line"> <span class="built_in">other_func</span>(std::forward<<span 
class="keyword">decltype</span>(arg)>(arg));</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>So <code>std::forward<decltype(arg)>(arg)</code> is the general form of <code>forward</code>. Most of the time the parameter is declared as <code>Arg &&</code> with a template parameter <code>Arg</code>, so people take the shortcut of writing the already-deduced <code>Arg</code> in place of <code>decltype(arg)</code>; the two are equivalent. We can define a macro:</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="keyword">define</span> FWD(arg) std::forward<span class="string"><decltype(arg)></span>(arg)</span></span><br></pre></td></tr></table></figure><p>With it, the function simplifies to:</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="type">void</span> <span class="title function_">some_func</span><span class="params">(<span class="keyword">auto</span> &&arg)</span> {</span><br><span class="line"> other_func(FWD(arg));</span><br><span class="line">}</span><br></pre></td></tr></table></figure><h1 id="构造函数的扩展"><a href="#构造函数的扩展" class="headerlink" title="构造函数的扩展"></a>More on constructors</h1><p>That is all for perfect forwarding. With so many constructors and assignment operators flying around above, though, it is easy to make mistakes, or to incur unnecessary overhead, when combining them with containers such as <code>vector</code>. So here are a few extra notes:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span 
class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="keyword">include</span> <span class="string"><iostream></span></span></span><br><span class="line"><span class="meta">#<span class="keyword">include</span> <span class="string"><vector></span></span></span><br><span class="line"></span><br><span class="line"><span class="keyword">class</span> <span class="title class_">T</span> {</span><br><span class="line"><span class="keyword">public</span>:</span><br><span class="line"> <span class="type">int</span> a;</span><br><span class="line"> <span class="built_in">T</span>() {std::cerr << <span class="string">" default construct "</span> << std::endl;};</span><br><span class="line"> <span class="built_in">T</span>(<span class="type">const</span> T& t) {std::cerr << <span class="string">" copy construct "</span> << std::endl;}</span><br><span class="line"> <span class="built_in">T</span>(<span class="type">const</span> T&& t) {std::cerr << <span class="string">"move 
construct"</span> << std::endl;}</span><br><span class="line"> T& <span class="keyword">operator</span>=(<span class="type">const</span> T& t) {</span><br><span class="line"> <span class="keyword">if</span> (<span class="keyword">this</span> != &t) {</span><br><span class="line"> a = t.a;</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">return</span> *<span class="keyword">this</span>;</span><br><span class="line"> }</span><br><span class="line"> T& <span class="keyword">operator</span>=(T&& t) {</span><br><span class="line"> <span class="keyword">if</span> (<span class="keyword">this</span> != &t) {</span><br><span class="line"> a = t.a;</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">return</span> *<span class="keyword">this</span>;</span><br><span class="line"> }</span><br><span class="line">};</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="type">int</span> <span class="title">main</span><span class="params">()</span> </span>{</span><br><span class="line"> <span class="comment">// Write C++ code here</span></span><br><span class="line"> std::vector<T*> v1;</span><br><span class="line"> std::vector<T*> v2; <span class="comment">// pointer elements</span></span><br><span class="line"></span><br><span class="line"> T *t = <span class="keyword">new</span> <span class="built_in">T</span>(); <span class="comment">// calls the default constructor</span></span><br><span class="line"> v1.<span class="built_in">push_back</span>(t);</span><br><span class="line"></span><br><span class="line"> v2 = v1; <span class="comment">// copies the pointers only; no constructor of T is called </span></span><br><span class="line"></span><br><span class="line"> <span class="comment">// -------------------------</span></span><br><span class="line"></span><br><span class="line"> std::vector<T> v3; <span class="comment">// value elements</span></span><br><span class="line"> std::vector<T> v4;</span><br><span class="line"></span><br><span 
class="line"> v3.<span class="built_in">reserve</span>(<span class="number">10</span>);</span><br><span class="line"> v4.<span class="built_in">reserve</span>(<span class="number">10</span>);</span><br><span class="line"></span><br><span class="line"> T t1; <span class="comment">// calls the default constructor</span></span><br><span class="line"> v3.<span class="built_in">push_back</span>(t1); <span class="comment">// copies t1 into a new element of the vector: one copy construction</span></span><br><span class="line"> v3.<span class="built_in">emplace_back</span>(); <span class="comment">// one default construction in place; cheaper than the two lines above</span></span><br><span class="line"></span><br><span class="line"> v4 = v3; <span class="comment">// copies the elements of v3 into v4: two copy constructions</span></span><br><span class="line"></span><br><span class="line"> <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>It is therefore better to store pointers in a <code>vector</code>, or to use <code>emplace_back</code>, which constructs the object in place inside the container instead of copying or moving it in. This speeds up insertion, especially for objects that are expensive to copy or to move. Many projects, however, use <code>vector</code>'s <code>emplace_back</code> incorrectly: passing it a freshly built temporary still costs an extra copy or move construction for every element:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">vector<Mat> v;</span><br><span class="line"><span class="keyword">for</span> (<span class="type">int</span> i = <span class="number">0</span>; i < n; i++) {</span><br><span class="line"> v.<span class="built_in">emplace_back</span>(<span class="built_in">Mat</span>(size, elem_type, ...));</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>Correct usage:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span 
class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="keyword">include</span> <span class="string"><iostream></span></span></span><br><span class="line"><span class="meta">#<span class="keyword">include</span> <span class="string"><vector></span></span></span><br><span class="line"></span><br><span class="line"><span class="keyword">class</span> <span class="title class_">Test</span> {</span><br><span class="line"><span class="keyword">public</span>:</span><br><span class="line"> <span class="built_in">Test</span>() = <span class="keyword">delete</span>;</span><br><span class="line"> <span class="type">int</span> x, y;</span><br><span class="line"> <span class="built_in">Test</span>(<span class="type">int</span> a, <span class="type">int</span> b) : x{a}, y{b} {</span><br><span class="line"> std::cerr << <span class="string">"default cons"</span> << std::endl;</span><br><span class="line"> };</span><br><span class="line"></span><br><span class="line"> <span class="built_in">Test</span>(Test&& a) <span class="keyword">noexcept</span> {</span><br><span class="line"> std::cerr << <span class="string">" move cons "</span> << std::endl;</span><br><span class="line"> };</span><br><span class="line">};</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="type">int</span> <span class="title">main</span><span class="params">()</span> 
</span>{</span><br><span class="line"> std::vector<Test> v1;</span><br><span class="line"> v1.<span class="built_in">reserve</span>(<span class="number">2</span>);</span><br><span class="line"> v1.<span class="built_in">emplace_back</span>(<span class="number">1</span>, <span class="number">2</span>); <span class="comment">// correct: constructed in place</span></span><br><span class="line"> v1.<span class="built_in">emplace_back</span>(<span class="built_in">Test</span>(<span class="number">3</span>, <span class="number">4</span>)); <span class="comment">// wasteful: one extra move construction</span></span><br><span class="line"> <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>When the elements of a <code>vector</code> are not pointers, however, and the objects are expensive to copy, the move constructor should be marked <code>noexcept</code>; otherwise reallocation falls back to the copy constructor and pays for the copies. When <code>push_back</code>, <code>insert</code>, <code>reserve</code>, <code>resize</code> and similar functions trigger a reallocation, or when <code>insert</code> or <code>erase</code> shifts elements, <code>vector</code> tries to "move" the elements to their new location. Because <code>vector</code> generally provides the strong exception guarantee, it will normally use the copy constructor if the element type does not offer a move constructor that is guaranteed not to throw. So for custom element types that are expensive to copy, we should define a move constructor and mark it <code>noexcept</code>. One more note on the code above: if a class declares a move constructor but no copy constructor, the copy constructor is implicitly deleted.</p><h1 id="可变参数模板"><a href="#可变参数模板" class="headerlink" title="可变参数模板"></a>Variadic templates</h1><p>As you may have seen, perfect forwarding is usually used together with templates. My own knowledge of templates was limited to simple functions like this one:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">template</span> <<span class="keyword">typename</span> T></span><br><span class="line"><span class="function">T <span class="title">add</span><span class="params">(<span class="type">const</span> T& a, <span class="type">const</span> T& b)</span> </span>{</span><br><span 
class="line"> <span class="keyword">return</span> a + b;</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="type">int</span> <span class="title">main</span><span class="params">()</span> </span>{</span><br><span class="line"> <span class="type">int</span> result1 = <span class="built_in">add</span>(<span class="number">1</span>, <span class="number">2</span>); <span class="comment">// instantiates add for int</span></span><br><span class="line"> <span class="type">double</span> result2 = <span class="built_in">add</span>(<span class="number">1.5</span>, <span class="number">2.5</span>); <span class="comment">// instantiates add for double</span></span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>So, to close the article, let's learn one introductory template technique: variadic templates. Here is an example:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="keyword">include</span> <span class="string"><iostream></span> </span></span><br><span class="line"><span class="meta">#<span class="keyword">include</span> <span class="string"><utility></span></span></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">template</span><<span class="keyword">typename</span> T> </span></span><br><span class="line"><span class="function"><span class="type">void</span> <span class="title">print_sum</span><span class="params">(T a, T b)</span> </span>{ </span><br><span class="line"> std::cout << <span class="string">" a + b = "</span> << a + b << std::endl; </span><br><span 
class="line">}</span><br><span class="line"></span><br><span class="line"><span class="keyword">template</span> <<span class="keyword">typename</span> Func, <span class="keyword">typename</span>... Args> </span><br><span class="line"><span class="function"><span class="keyword">auto</span> <span class="title">perfect_forward</span><span class="params">(Func&& func, Args&&... args)</span> </span>{ </span><br><span class="line"> <span class="keyword">return</span> <span class="built_in">func</span>(std::forward<Args>(args)...); </span><br><span class="line">} </span><br><span class="line"></span><br><span class="line"><span class="function"><span class="type">int</span> <span class="title">main</span><span class="params">()</span> </span>{ </span><br><span class="line"> <span class="built_in">perfect_forward</span>(print_sum<<span class="type">int</span>>, <span class="number">2</span>, <span class="number">3</span>); <span class="comment">// prints " a + b = 5"</span></span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>At this point the code suddenly gets harder. The main difficulty is the <code>perfect_forward</code> function, but it shows up so often in thread pools and as an intermediate layer that it is worth studying.</p><h2 id="函数相关"><a href="#函数相关" class="headerlink" title="函数相关"></a>The function part</h2><p>First, <code>Func&& func</code> receives a function through a universal reference, which was covered in the previous section.</p><p><code>func(std::forward<Args>(args)...)</code> then calls the function and yields its return value. Here <code>std::forward<Args>(args)...</code> is a pack expansion: it applies <code>std::forward</code> to every argument in the pack, however many there are. In this example it passes the function's arguments, 2 and 3, to <code>func</code> by perfect forwarding and returns whatever the call returns. Next, let's look at what <code>args...</code> actually is.</p><h2 id="参数包"><a href="#参数包" class="headerlink" title="参数包"></a>Parameter packs</h2><p>A variadic template means the same thing as an ordinary template but is written differently: an ellipsis follows <code>typename</code>:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">template</span><<span class="keyword">typename</span> ... Args></span><br></pre></td></tr></table></figure><p>This declares <code>Args</code> as a template parameter pack, which may contain zero or more template parameters. The <code>Args&&... 
args</code> that follows is a function parameter pack, which receives the arguments as universal references. Here is a simple example:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="keyword">include</span> <span class="string"><iostream></span> </span></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">template</span><<span class="keyword">typename</span> ... Args> </span></span><br><span class="line"><span class="function"><span class="type">void</span> <span class="title">func</span><span class="params">(Args ... 
args)</span> </span>{</span><br><span class="line"> std::cout << <span class="keyword">sizeof</span>...(args) << std::endl; </span><br><span class="line">} </span><br><span class="line"></span><br><span class="line"><span class="function"><span class="type">int</span> <span class="title">main</span><span class="params">()</span> </span>{ </span><br><span class="line"> <span class="built_in">func</span>(); <span class="comment">// 0 </span></span><br><span class="line"> <span class="built_in">func</span>(<span class="number">1</span>); <span class="comment">// 1 </span></span><br><span class="line"> <span class="built_in">func</span>(<span class="number">1</span>, <span class="number">2</span>, <span class="number">3</span>, <span class="number">4</span>); <span class="comment">// 4 </span></span><br><span class="line"> <span class="built_in">func</span>(<span class="number">2</span>, <span class="string">"test"</span>); <span class="comment">// 2 </span></span><br><span class="line"> <span class="keyword">return</span> <span class="number">0</span>; </span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>Also, <code>...</code> accepts zero or any number of arguments, but adding one extra type parameter forces the template to take at least one argument:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">template</span><<span class="keyword">typename</span> Head, <span class="keyword">typename</span> ... 
Args> </span></span><br><span class="line"><span class="function"></span></span><br><span class="line"><span class="function"><span class="type">void</span> <span class="title">func</span><span class="params">(Head h, Args ... args)</span> </span>{ </span><br><span class="line"> std::cout << <span class="keyword">sizeof</span>...(args) << std::endl; </span><br><span class="line">} </span><br><span class="line"></span><br><span class="line"><span class="function"><span class="type">int</span> <span class="title">main</span><span class="params">()</span> </span>{</span><br><span class="line"> <span class="built_in">func</span>(<span class="number">1</span>); <span class="comment">// 0 </span></span><br><span class="line"> <span class="built_in">func</span>(<span class="number">1</span>, <span class="number">2</span>, <span class="number">3</span>, <span class="number">4</span>); <span class="comment">// 3 </span></span><br><span class="line"> <span class="built_in">func</span>(<span class="number">2</span>, <span class="string">"test"</span>); <span class="comment">// 1</span></span><br><span class="line"> <span class="keyword">return</span> <span class="number">0</span>; </span><br><span class="line">}</span><br></pre></td></tr></table></figure><h2 id="参数包展开"><a href="#参数包展开" class="headerlink" title="参数包展开"></a>参数包展开</h2><h3 id="递归展开"><a href="#递归展开" class="headerlink" title="递归展开"></a>递归展开</h3><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span 
class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="keyword">include</span> <span class="string"><iostream></span> </span></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">template</span><<span class="keyword">typename</span> T> </span></span><br><span class="line"><span class="function">T <span class="title">sum</span><span class="params">(T val)</span> </span>{ </span><br><span class="line"> <span class="keyword">return</span> val; </span><br><span class="line">} </span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">template</span><<span class="keyword">typename</span> T, <span class="keyword">typename</span> ... Args> </span></span><br><span class="line"><span class="function">T <span class="title">sum</span><span class="params">(T first, Args ... args)</span> </span>{ </span><br><span class="line"> <span class="keyword">return</span> first + <span class="built_in">sum</span><T>(args...); </span><br><span class="line">} </span><br><span class="line"></span><br><span class="line"><span class="function"><span class="type">int</span> <span class="title">main</span><span class="params">()</span> </span>{ </span><br><span class="line"> <span class="type">int</span> v = <span class="built_in">sum</span>(<span class="number">1</span>, <span class="number">2</span>, <span class="number">3</span>); </span><br><span class="line"> std::cout << v << std::endl; </span><br><span class="line"> </span><br><span class="line"> v = <span class="built_in">sum</span>(<span class="number">1</span>, <span class="number">2</span>, <span class="number">3</span>, <span class="number">4</span>, <span class="number">5</span>); </span><br><span class="line"> <span class="keyword">return</span> <span class="number">0</span>;</span><br><span 
class="line">}</span><br></pre></td></tr></table></figure><p>In the recursive function, we peel off the first element of the function parameter pack and call <code>sum</code> again with the remaining pack <code>args...</code>. Each call leaves one fewer argument in the pack. When only one argument is left, overload resolution selects the single-parameter template <code>sum</code>, which ends the recursion. The same technique gives a simple logging function that prints several values:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="keyword">include</span> <span class="string"><iostream></span> </span></span><br><span class="line"></span><br><span class="line"><span class="keyword">template</span> <<span class="keyword">class</span> <span class="title class_">T</span>> </span><br><span class="line"><span class="function"><span class="type">void</span> <span class="title">log</span><span class="params">(T t)</span> </span>{ </span><br><span class="line"> std::cout << t << std::endl; </span><br><span class="line">} </span><br><span class="line"></span><br><span class="line"><span class="keyword">template</span> <<span class="keyword">typename</span> T, <span class="keyword">typename</span> ... Args> </span><br><span class="line"><span class="function"><span class="type">void</span> <span class="title">log</span><span class="params">(T first, Args... 
args)</span> </span>{ </span><br><span class="line"> std::cout << first << <span class="string">" "</span>; </span><br><span class="line"> <span class="built_in">log</span>(args...); </span><br><span class="line">} </span><br><span class="line"></span><br><span class="line"><span class="function"><span class="type">int</span> <span class="title">main</span><span class="params">()</span> </span>{ </span><br><span class="line"> <span class="built_in">log</span>(<span class="string">"[warning]"</span>, <span class="string">"some thing wrong"</span>); </span><br><span class="line"> <span class="built_in">log</span>(<span class="string">"[ error]"</span>, <span class="string">"some thing fatal"</span>); </span><br><span class="line"> <span class="keyword">return</span> <span class="number">0</span>; </span><br><span class="line">}</span><br></pre></td></tr></table></figure><h3 id="逗号表达式展开"><a href="#逗号表达式展开" class="headerlink" title="逗号表达式展开"></a>逗号表达式展开</h3><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="keyword">include</span> <span class="string"><iostream></span> </span></span><br><span class="line"></span><br><span class="line"><span class="keyword">template</span> <<span class="keyword">class</span> <span class="title class_">T</span>> </span><br><span class="line"><span class="function"><span class="type">void</span> <span class="title">printarg</span><span 
class="params">(T t)</span> </span>{ </span><br><span class="line"> std::cout << t << std::endl; </span><br><span class="line">} </span><br><span class="line"></span><br><span class="line"><span class="keyword">template</span> <<span class="keyword">class</span> ...Args> </span><br><span class="line"><span class="function"><span class="type">void</span> <span class="title">expand</span><span class="params">(Args... args)</span> </span>{ </span><br><span class="line"> <span class="type">int</span> arr[] = {(<span class="built_in">printarg</span>(args), <span class="number">0</span>)...}; </span><br><span class="line">} </span><br><span class="line"></span><br><span class="line"><span class="function"><span class="type">int</span> <span class="title">main</span><span class="params">()</span> </span>{ </span><br><span class="line"> <span class="built_in">expand</span>(<span class="number">1</span>, <span class="number">2</span>, <span class="number">3</span>, <span class="number">4</span>); </span><br><span class="line"> <span class="keyword">return</span> <span class="number">0</span>; </span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>This way of expanding a parameter pack needs no recursion-terminating function: the pack is expanded directly in the body of <code>expand</code>. <code>printarg</code> is not a terminating function, just a function applied to every argument in the pack. The key to this in-place expansion is the comma expression, which evaluates its operands from left to right and yields the value of the last one. For example:</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">d = (a = b, c); </span><br></pre></td></tr></table></figure><p>This expression is evaluated in order: <code>b</code> is first assigned to <code>a</code>, then the parenthesized comma expression yields the value of <code>c</code>, so <code>d</code> ends up equal to <code>c</code>.</p><p>The comma expression in <code>expand</code>, <code>(printarg(args), 0)</code>, follows the same order: <code>printarg(args)</code> runs first, and the comma expression then yields <code>0</code>. It also relies on another <code>C++11</code> feature, the initializer list, which here initializes an array whose length is deduced from the number of initializers. <code>{(printarg(args), 0)...}</code> expands to:</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span 
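class="line">1</span><br></pre></td><td class="code"><pre><span class="line">{(printarg(arg1),<span class="number">0</span>), (printarg(arg2),<span class="number">0</span>), (printarg(arg3),<span class="number">0</span>), ...}</span><br></pre></td></tr></table></figure><p>The result is an array <code>int arr[sizeof...(Args)]</code> whose elements are all <code>0</code>. Because each initializer is a comma expression, its left operand <code>printarg(args)</code> runs and prints the argument while the array is being built. The elements of a braced initializer list are evaluated from left to right, so the arguments are printed in order. In other words, the parameter pack is expanded as a side effect of constructing the <code>int</code> array, and that is the array's only purpose.</p><ul><li>Recursive pack expansion:<ul><li>Pros: more flexible, because the terminating overload can do something different from the recursive function</li><li>Cons: each recursive call pushes and pops a stack frame, so it can cost more at run time unless the compiler inlines the calls</li></ul></li></ul><p>With the recursive approach, if the terminating overload is not declared before the recursive function, the recursion has nothing to stop it; fortunately, the compiler reports this as an error.</p><ul><li>Comma-expression expansion:<ul><li>Pros: runs faster than the recursive approach;</li><li>Cons: only fits cases where the same operation is applied to every argument in the pack;</li></ul></li></ul><p>The comma-expression approach also wastes a little memory: the array built from the initializer list serves no purpose beyond driving the expansion.</p><h1 id="参考"><a href="#参考" class="headerlink" title="参考"></a>References</h1><ul><li>Perfect forwarding: <a href="https://www.jianshu.com/p/af7c2314e2dc">https://www.jianshu.com/p/af7c2314e2dc</a></li><li>Reference collapsing: <a href="https://www.zhihu.com/question/40346748">https://www.zhihu.com/question/40346748</a></li></ul>]]></content>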
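<summary type="html"><p>Part one of a C++ tour. It started with some baffling bugs in a project, caused by misusing std::ref and std::forward. std::ref involves references, lvalue and rvalue references lead on to move semantics, and std::forward is mostly used in templates. So I took this as a chance to study the newer C++ features carefully.</p></summary>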
<category term="Cpp" scheme="https://muyuuuu.github.io/tags/Cpp/"/>
</entry>
<entry>
<title>Task pipelining: faster execution and lower memory usage, I want both</title>
<link href="https://muyuuuu.github.io/2024/05/07/multi-pipeline/"/>
<id>https://muyuuuu.github.io/2024/05/07/multi-pipeline/</id>
<published>2024-05-07T15:54:38.000Z</published>
<updated>2024-05-18T07:18:59.255Z</updated>
<content type="html"><![CDATA[<link rel="stylesheet" class="aplayer-secondary-style-marker" href="/assets/css/APlayer.min.css"><script src="/assets/js/APlayer.min.js" class="aplayer-secondary-script-marker"></script><p>For my 2018 Computer Organization course project I could not write a five-stage pipeline, and could not get a three-stage one working either. I never expected that years later I would be designing code around multi-stage pipelining.</p><span id="more"></span><h1 id="Python-线程池"><a href="#Python-线程池" class="headerlink" title="Python 线程池"></a>Python thread pool</h1><p>Before getting to pipelining, a quick note on using a <code>Python</code> thread pool: <code>submit()</code> returns a future, and calling its <code>result()</code> method waits for the task to finish. The call blocks the current thread until the task completes and returns its result; if the task returns nothing, <code>result()</code> yields <code>None</code>. A simple example:</p><figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> concurrent.futures </span><br><span class="line"><span class="keyword">def</span> <span class="title function_">task</span>(<span class="params">x</span>): </span><br><span class="line"> <span class="comment"># return x * x </span></span><br><span class="line"> <span class="built_in">print</span>(<span class="string">"1"</span>) </span><br><span class="line"> </span><br><span class="line"><span class="comment"># create the pool </span></span><br><span class="line">executor = concurrent.futures.ThreadPoolExecutor(max_workers=<span class="number">1</span>) </span><br><span class="line"><span class="comment"># submit the task </span></span><br><span class="line">wait_token = executor.submit(task, <span class="number">7</span>) </span><br><span class="line"><span class="comment"># wait for it to finish </span></span><br><span class="line"><span class="built_in">print</span>(wait_token.result())</span><br></pre></td></tr></table></figure><h1 
id="多级流水"><a href="#多级流水" class="headerlink" title="多级流水"></a>Multi-stage pipeline</h1><h2 id="适用场景"><a href="#适用场景" class="headerlink" title="适用场景"></a>When it applies</h2><p>The point of a multi-stage pipeline is to speed code up through asynchronous calls while needing less memory than plain multithreading. It fits best when several tasks have to run in sequence, many times over. Suppose tasks 1, 2 and 3 must run in a loop 100 times. Task 1 reads its input from outside, task 2 consumes the output of task 1, and task 3 consumes the output of task 2, so there is a clear sequential dependency.</p><figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">for</span> _ <span class="keyword">in</span> <span class="built_in">range</span>(<span class="number">100</span>): </span><br><span class="line"> val = read() </span><br><span class="line"> val = Task1(val) <span class="comment"># IO task, 0.1s </span></span><br><span class="line"> val = Task2(val) <span class="comment"># compute task, 0.2s, allocates a large block of memory </span></span><br><span class="line"> val = Task3(val) <span class="comment"># IO task, 0.1s </span></span><br><span class="line"> write(val)</span><br></pre></td></tr></table></figure><p>Assume tasks 1 and 3 are IO tasks taking 0.1 s each, and task 2 is a compute task that allocates a large block of memory and takes 0.2 s. Run serially, the loop takes 100 * (0.1 + 0.2 + 0.1) = 40 s. With plain multithreading, the data dependency forces tasks 1, 2 and 3 to be handled as one unit: for example, the first 50 iterations run on one thread and the last 50 on another, which takes 50 * (0.1 + 0.2 + 0.1) = 20 s. But there is a <strong>potential risk</strong>: if both threads happen to run task 2 at the same time, two large blocks of memory are allocated at once. I built a concrete example in <code>python</code>:</p><figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span 
class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> time </span><br><span class="line"><span class="keyword">import</span> concurrent.futures </span><br><span class="line"></span><br><span class="line"><span class="keyword">def</span> <span class="title function_">task1</span>(<span class="params">in_data, idx</span>): </span><br><span class="line"> in_data[idx] += <span class="number">1</span> </span><br><span class="line"> time.sleep(<span class="number">0.1</span>) </span><br><span class="line"> </span><br><span class="line"><span class="keyword">def</span> <span class="title function_">task2</span>(<span class="params">in_data, idx</span>): </span><br><span class="line"> in_data[idx] *= <span class="number">2</span> </span><br><span class="line"> time.sleep(<span class="number">0.2</span>) </span><br><span class="line"> </span><br><span class="line"><span class="keyword">def</span> <span class="title function_">task3</span>(<span class="params">in_data, idx</span>): </span><br><span class="line"> in_data[idx] -= <span class="number">1</span> </span><br><span class="line"> time.sleep(<span class="number">0.1</span>) </span><br><span class="line"> </span><br><span class="line"><span class="keyword">def</span> <span class="title function_">serial</span>(<span class="params">datas</span>): </span><br><span class="line"> start = time.time() </span><br><span class="line"> <span class="keyword">for</span> i <span class="keyword">in</span> <span class="built_in">range</span>(<span class="built_in">len</span>(datas)): </span><br><span class="line"> task1(datas, i) </span><br><span class="line"> 
task2(datas, i) </span><br><span class="line"> task3(datas, i) </span><br><span class="line"> end = time.time() </span><br><span class="line"> <span class="built_in">print</span>(<span class="string">" Serial Cost Time: {}"</span>.<span class="built_in">format</span>(end - start))</span><br><span class="line"></span><br><span class="line"><span class="keyword">if</span> __name__ == <span class="string">"__main__"</span>: </span><br><span class="line"> n_data_serial = [i <span class="keyword">for</span> i <span class="keyword">in</span> <span class="built_in">range</span>(<span class="number">100</span>)] </span><br><span class="line"> serial(n_data_serial) </span><br></pre></td></tr></table></figure><h2 id="多级流水-1"><a href="#多级流水-1" class="headerlink" title="多级流水"></a>Multi-stage pipeline</h2><p>The pipeline is built with asynchronous calls: tasks 1 and 3 run asynchronously, so their IO completes while task 2 is executing. In the figure below, dashed boxes run asynchronously, solid boxes run synchronously, and regions of the same color share a data dependency.</p><p><img data-src="https://s21.ax1x.com/2024/05/18/pkuDePJ.png" alt></p><p>The program is shown below. The pipelined run takes about 100 * (0.1 + 0.2 + 0.1) / 2 = 20 s, and two instances of task 2 never run at the same time, so <strong>the peak memory needed is in theory half that of the multithreaded version</strong>.</p><figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span 
class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">def</span> <span class="title function_">parall</span>(<span class="params">datas</span>): </span><br><span class="line"> start = time.time() </span><br><span class="line"> n_len = <span class="built_in">len</span>(datas) </span><br><span class="line"> executor = concurrent.futures.ThreadPoolExecutor(max_workers=<span class="number">1</span>) </span><br><span class="line"> wait_token = <span class="literal">None</span> </span><br><span class="line"> <span class="keyword">for</span> i <span class="keyword">in</span> <span class="built_in">range</span>(n_len): </span><br><span class="line"> <span class="keyword">if</span> <span class="number">0</span> == i: </span><br><span class="line"> task1(datas, i) </span><br><span class="line"> wait_token = executor.submit(task1, datas, i + <span class="number">1</span>) </span><br><span class="line"> task2(datas, i) </span><br><span class="line"> <span class="keyword">elif</span> i == n_len - <span class="number">1</span>: </span><br><span class="line"> wait_token.result() </span><br><span class="line"> wait_token = executor.submit(task3, datas, i - <span class="number">1</span>) </span><br><span class="line"> task2(datas, i) </span><br><span class="line"> wait_token.result() </span><br><span class="line"> task3(datas, i) </span><br><span class="line"> <span class="keyword">else</span>: </span><br><span class="line"> wait_token.result() </span><br><span class="line"> wait_token = executor.submit(<span class="keyword">lambda</span> a=i - <span class="number">1</span>, b=i + <span class="number">1</span>: (task3(datas, a), task1(datas, b))) <span class="comment"># task31 is undefined in the post; this runs task3(i - 1), then task1(i + 1)</span></span><br><span class="line"> task2(datas, i) </span><br><span class="line"> end = time.time()</span><br><span class="line"> <span class="built_in">print</span>(<span class="string">" 
Parallel Cost Time: {}"</span>.<span class="built_in">format</span>(end - start))</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="keyword">if</span> __name__ == <span class="string">"__main__"</span>: </span><br><span class="line"> n_data_serial = [i <span class="keyword">for</span> i <span class="keyword">in</span> <span class="built_in">range</span>(<span class="number">100</span>)] </span><br><span class="line"> n_data_parall = [i <span class="keyword">for</span> i <span class="keyword">in</span> <span class="built_in">range</span>(<span class="number">100</span>)] </span><br><span class="line"> </span><br><span class="line"> serial(n_data_serial) </span><br><span class="line"> parall(n_data_parall) </span><br><span class="line"> </span><br><span class="line"> <span class="built_in">print</span>(<span class="string">" Compare Res : {}"</span>.<span class="built_in">format</span>(n_data_serial == n_data_parall))</span><br></pre></td></tr></table></figure>]]></content>
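<summary type="html"><p>For my 2018 Computer Organization course project I could not write a five-stage pipeline, and could not get a three-stage one working either. I never expected that years later I would be designing code around multi-stage pipelining.</p></summary>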
<category term="Design" scheme="https://muyuuuu.github.io/tags/Design/"/>
</entry>
<entry>
<title>Mobile algorithm optimization</title>
<link href="https://muyuuuu.github.io/2024/03/03/mobile-algorithm-optimize/"/>
<id>https://muyuuuu.github.io/2024/03/03/mobile-algorithm-optimize/</id>
<published>2024-03-03T07:03:22.000Z</published>
<updated>2024-03-03T07:54:05.583Z</updated>
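<content type="html"><![CDATA[<link rel="stylesheet" class="aplayer-secondary-style-marker" href="/assets/css/APlayer.min.css"><script src="/assets/js/APlayer.min.js" class="aplayer-secondary-script-marker"></script><p>Algorithm optimization on mobile devices is a huge topic that runs wide and deep, from computer architecture down to individual instructions. This post uses common algorithms as examples to show how to speed up and optimize code in a single-threaded setting; <del>multithreading is just the finishing touch, nothing much to say about it</del>. For concrete workloads such as pyramids, filtering and denoising, the approach is always the same: reduce IO, and do as much computation as possible per IO.</p><p>This post uses <code>Neon, OpenCL</code> to optimize the algorithms, and may bring in the <code>DSP</code> where possible. It will be updated over time to collect optimization experience. Also make sure the <code>O3</code> compiler option is on and you are building in <code>release</code> mode, otherwise the measured run times will be affected.</p><span id="more"></span><h1 id="矩阵乘法"><a href="#矩阵乘法" class="headerlink" title="矩阵乘法"></a>Matrix multiplication</h1><p>Note: this post does not cover mathematical optimizations, such as rewriting the formula to get the same result. The floating-point matrix computation implemented is:</p><script type="math/tex; mode=display">C = A * B + \text{bias}</script><p>For simplicity, $A$ is $512\times 128$ and $B$ is $128 \times 256$. On a Qualcomm Snapdragon chip, the current results are:</p><div class="table-container"><table><thead><tr><th>Version</th><th>Time</th></tr></thead><tbody><tr><td>Naive matrix multiplication</td><td>59.84 ms</td></tr><tr><td>Neon version 1</td><td>12.90 ms</td></tr><tr><td>Neon version 2</td><td>3.85 ms</td></tr><tr><td>Cache-friendly matrix multiplication</td><td>2.52 ms</td></tr><tr><td>Neon version 3</td><td>2.77 ms</td></tr><tr><td>Neon version 4</td><td>2.01 ms</td></tr><tr><td>Neon version 5</td><td>1.09 ms</td></tr></tbody></table></div><p><del>Why no OpenCL? Because I haven't had time to write it; I seem to owe a lot of blog posts.</del></p><h2 id="常规矩阵乘法"><a href="#常规矩阵乘法" class="headerlink" title="常规矩阵乘法"></a>Naive matrix multiplication</h2><p><img data-src="https://s11.ax1x.com/2024/03/03/pFBMfR1.png" alt></p><p>Take matrix multiplication from linear algebra: element $(i, j)$ of the result is the sum of the element-wise products of row $i$ of $A$ and column $j$ of $B$. Writing the most direct code from this definition takes 59.84 ms:</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span 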
class="line">12</span><br></pre></td><td class="code"><pre><span class="line"><span class="type">void</span> <span class="title function_">sgemm_c</span><span class="params">(<span class="type">float</span> *C, <span class="type">float</span> *A, <span class="type">float</span> *B, <span class="type">float</span> *bias, <span class="type">int</span> d0, <span class="type">int</span> d1, <span class="type">int</span> d2)</span></span><br><span class="line">{</span><br><span class="line"> <span class="type">int</span> row, col, m;</span><br><span class="line"> <span class="keyword">for</span> (row = <span class="number">0</span>; row < d0; row++) {</span><br><span class="line"> <span class="keyword">for</span> (col = <span class="number">0</span>; col < d2; col++) {</span><br><span class="line"> <span class="keyword">for</span> (m = <span class="number">0</span>; m < d1; m++) {</span><br><span class="line"> C[row * d2 + col] += A[row * d1 + m] * B[m * d2 + col]; <span class="comment">// accumulates into C, so C must start zeroed</span></span><br><span class="line"> }</span><br><span class="line"> C[row * d2 + col] += bias[row * d2 + col];</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>Matrices here are stored in row-major order, so accessing $B[i, j]$ also pulls the neighboring elements $B[i, j+1], B[i, j+2], \dots$ into the CPU <code>cache</code>. When $B[i, j+1]$ is needed later, it is read from the <code>cache</code> rather than from main memory, which saves a lot of time.</p><p>So the performance bottleneck of the code above is:</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">for</span> (m = <span class="number">0</span>; m < d1; m++) {</span><br><span class="line"> C[row * d2 + col] += A[row * d1 + m] * B[m * d2 + col];</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>Because <code>m</code> increases in the innermost loop, $B$ is addressed with a stride of one full row. Behind the scenes, the data already in the <code>cache</code> cannot be reused, and almost every read of an element of $B$ has to fetch a new
<code>cache</code> line from memory, which makes this code slow.</p><h2 id="Neon-加速版本-1"><a href="#Neon-加速版本-1" class="headerlink" title="Neon 加速版本 1"></a>Neon version 1</h2><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br></pre></td><td class="code"><pre><span class="line"><span class="type">void</span> <span class="title function_">sgemm_neon1</span><span class="params">(<span class="type">float</span> *C, <span class="type">float</span> *A, <span class="type">float</span> *B, <span class="type">float</span> *bias, <span class="type">int</span> d0, <span class="type">int</span> d1, <span class="type">int</span> d2)</span></span><br><span class="line">{</span><br><span class="line"> <span class="type">int</span> row, col, m;</span><br><span class="line"> <span class="keyword">for</span> (row = <span class="number">0</span>; row < d0; row++) {</span><br><span class="line"> <span class="keyword">for</span> (col = <span 
class="number">0</span>; col < d2; col+=<span class="number">4</span>) {</span><br><span class="line"> <span class="type">float32x4_t</span> sum4 = vdupq_n_f32(<span class="number">0.0f</span>);</span><br><span class="line"></span><br><span class="line"> <span class="type">float</span> *pa = A + row * d1;</span><br><span class="line"> <span class="type">float</span> *pb = B + col;</span><br><span class="line"> <span class="type">float</span> *pc = C + row * d2 + col;</span><br><span class="line"> <span class="type">float</span> *pd = bias + row * d2 + col;</span><br><span class="line"></span><br><span class="line"> <span class="keyword">for</span> (m = <span class="number">0</span>; m < d1; m+=<span class="number">4</span>) {</span><br><span class="line"> <span class="type">float32x4_t</span> a4 = vld1q_f32(pa);</span><br><span class="line"> <span class="type">float32x4_t</span> b0 = vld1q_f32(pb + <span class="number">0</span> * d2);</span><br><span class="line"> <span class="type">float32x4_t</span> b1 = vld1q_f32(pb + <span class="number">1</span> * d2);</span><br><span class="line"> <span class="type">float32x4_t</span> b2 = vld1q_f32(pb + <span class="number">2</span> * d2);</span><br><span class="line"> <span class="type">float32x4_t</span> b3 = vld1q_f32(pb + <span class="number">3</span> * d2);</span><br><span class="line"></span><br><span class="line"> sum4 = vmlaq_lane_f32(sum4, b0, vget_low_f32(a4), <span class="number">0</span>);</span><br><span class="line"> sum4 = vmlaq_lane_f32(sum4, b1, vget_low_f32(a4), <span class="number">1</span>);</span><br><span class="line"> sum4 = vmlaq_lane_f32(sum4, b2, vget_high_f32(a4), <span class="number">0</span>);</span><br><span class="line"> sum4 = vmlaq_lane_f32(sum4, b3, vget_high_f32(a4), <span class="number">1</span>);</span><br><span class="line"></span><br><span class="line"> pa += <span class="number">4</span>;</span><br><span class="line"> pb += <span class="number">4</span> * d2;</span><br><span 
class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="type">float32x4_t</span> d4 = vld1q_f32(pd);</span><br><span class="line"> sum4 = vaddq_f32(sum4, d4);</span><br><span class="line"> vst1q_f32(pc, sum4);</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line">}</span><br></pre></td></tr></table></figure><h2 id="Neon-加速版本-2"><a href="#Neon-加速版本-2" class="headerlink" title="Neon 加速版本 2"></a>Neon version 2</h2><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span 
class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br></pre></td><td class="code"><pre><span class="line"><span class="type">void</span> <span class="title function_">sgemm_neon2</span><span class="params">(<span class="type">float</span> *C, <span class="type">float</span> *A, <span class="type">float</span> *B, <span class="type">float</span> *bias, <span class="type">int</span> d0, <span class="type">int</span> d1, <span class="type">int</span> d2)</span></span><br><span class="line">{</span><br><span class="line"> <span class="type">int</span> row, col, m;</span><br><span class="line"> <span class="keyword">for</span> (row = <span class="number">0</span>; row < d0; row+=<span class="number">4</span>) {</span><br><span class="line"> <span class="keyword">for</span> (col = <span class="number">0</span>; col < d2; col+=<span class="number">4</span>) {</span><br><span class="line"></span><br><span class="line"> <span class="type">float</span> *pa = A + row * d1;</span><br><span class="line"> <span class="type">float</span> *pb = B + col;</span><br><span class="line"> <span class="type">float</span> *pc = C + row * d2 + col;</span><br><span class="line"> <span class="type">float</span> *pd = bias + row * d2 + col;</span><br><span class="line"></span><br><span class="line"> <span class="type">float32x4_t</span> sum0 = vld1q_f32(pd + <span class="number">0</span> * d2);</span><br><span class="line"> <span class="type">float32x4_t</span> sum1 = vld1q_f32(pd + <span class="number">1</span> * d2);</span><br><span class="line"> <span class="type">float32x4_t</span> sum2 = vld1q_f32(pd + <span class="number">2</span> * 
d2);</span><br><span class="line"> <span class="type">float32x4_t</span> sum3 = vld1q_f32(pd + <span class="number">3</span> * d2);</span><br><span class="line"></span><br><span class="line"> <span class="keyword">for</span> (m = <span class="number">0</span>; m < d1; m+=<span class="number">4</span>) {</span><br><span class="line"> <span class="type">float32x4_t</span> b0 = vld1q_f32(pb + <span class="number">0</span> * d2);</span><br><span class="line"> <span class="type">float32x4_t</span> b1 = vld1q_f32(pb + <span class="number">1</span> * d2);</span><br><span class="line"> <span class="type">float32x4_t</span> b2 = vld1q_f32(pb + <span class="number">2</span> * d2);</span><br><span class="line"> <span class="type">float32x4_t</span> b3 = vld1q_f32(pb + <span class="number">3</span> * d2);</span><br><span class="line"></span><br><span class="line"> <span class="type">float32x4_t</span> a0 = vld1q_f32(pa + <span class="number">0</span> * d1);</span><br><span class="line"> <span class="type">float32x4_t</span> a1 = vld1q_f32(pa + <span class="number">1</span> * d1);</span><br><span class="line"> <span class="type">float32x4_t</span> a2 = vld1q_f32(pa + <span class="number">2</span> * d1);</span><br><span class="line"> <span class="type">float32x4_t</span> a3 = vld1q_f32(pa + <span class="number">3</span> * d1);</span><br><span class="line"></span><br><span class="line"> sum0 = vmlaq_lane_f32(sum0, b0, vget_low_f32(a0), <span class="number">0</span>);</span><br><span class="line"> sum0 = vmlaq_lane_f32(sum0, b1, vget_low_f32(a0), <span class="number">1</span>);</span><br><span class="line"> sum0 = vmlaq_lane_f32(sum0, b2, vget_high_f32(a0), <span class="number">0</span>);</span><br><span class="line"> sum0 = vmlaq_lane_f32(sum0, b3, vget_high_f32(a0), <span class="number">1</span>);</span><br><span class="line"></span><br><span class="line"> sum1 = vmlaq_lane_f32(sum1, b0, vget_low_f32(a1), <span class="number">0</span>);</span><br><span class="line"> sum1 = 
vmlaq_lane_f32(sum1, b1, vget_low_f32(a1), <span class="number">1</span>);</span><br><span class="line"> sum1 = vmlaq_lane_f32(sum1, b2, vget_high_f32(a1), <span class="number">0</span>);</span><br><span class="line"> sum1 = vmlaq_lane_f32(sum1, b3, vget_high_f32(a1), <span class="number">1</span>);</span><br><span class="line"></span><br><span class="line"> sum2 = vmlaq_lane_f32(sum2, b0, vget_low_f32(a2), <span class="number">0</span>);</span><br><span class="line"> sum2 = vmlaq_lane_f32(sum2, b1, vget_low_f32(a2), <span class="number">1</span>);</span><br><span class="line"> sum2 = vmlaq_lane_f32(sum2, b2, vget_high_f32(a2), <span class="number">0</span>);</span><br><span class="line"> sum2 = vmlaq_lane_f32(sum2, b3, vget_high_f32(a2), <span class="number">1</span>);</span><br><span class="line"></span><br><span class="line"> sum3 = vmlaq_lane_f32(sum3, b0, vget_low_f32(a3), <span class="number">0</span>);</span><br><span class="line"> sum3 = vmlaq_lane_f32(sum3, b1, vget_low_f32(a3), <span class="number">1</span>);</span><br><span class="line"> sum3 = vmlaq_lane_f32(sum3, b2, vget_high_f32(a3), <span class="number">0</span>);</span><br><span class="line"> sum3 = vmlaq_lane_f32(sum3, b3, vget_high_f32(a3), <span class="number">1</span>);</span><br><span class="line"></span><br><span class="line"> pa += <span class="number">4</span>;</span><br><span class="line"> pb += <span class="number">4</span> * d2;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> vst1q_f32(pc + <span class="number">0</span> * d2, sum0);</span><br><span class="line"> vst1q_f32(pc + <span class="number">1</span> * d2, sum1);</span><br><span class="line"> vst1q_f32(pc + <span class="number">2</span> * d2, sum2);</span><br><span class="line"> vst1q_f32(pc + <span class="number">3</span> * d2, sum3);</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line">}</span><br></pre></td></tr></table></figure><h2 
id="Cache-友好的矩阵乘法"><a href="#Cache-友好的矩阵乘法" class="headerlink" title="Cache 友好的矩阵乘法"></a>Cache-Friendly Matrix Multiplication</h2><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br></pre></td><td class="code"><pre><span class="line"><span class="type">void</span> <span class="title function_">rsgemm_c</span><span class="params">(<span class="type">float</span> *C, <span class="type">float</span> *A, <span class="type">float</span> *B, <span class="type">float</span> *bias, <span class="type">int</span> d0, <span class="type">int</span> d1, <span class="type">int</span> d2)</span></span><br><span class="line">{</span><br><span class="line"> <span class="type">int</span> row, col, m;</span><br><span class="line"> <span class="keyword">for</span>(row = <span class="number">0</span>; row < d0; row++) {</span><br><span class="line"> <span class="keyword">for</span>(m = <span class="number">0</span>; m < d1; m++) {</span><br><span class="line"> <span class="keyword">for</span>(col = <span class="number">0</span>; col < d2; col++) {</span><br><span class="line"> C[row * d2 + col] += A[row * d1 + m] * B[m * d2 + col];</span><br><span class="line"> <span class="keyword">if</span> (<span class="number">0</span> == m) {</span><br><span class="line"> C[row * d2 + col] += bias[row * d2 + col];</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line">}</span><br></pre></td></tr></table></figure><h2 id="Neon-加速版本-3"><a href="#Neon-加速版本-3" class="headerlink" 
title="Neon 加速版本 3"></a>Neon-Accelerated Version 3</h2><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br></pre></td><td class="code"><pre><span class="line"><span class="type">void</span> <span class="title function_">rsgemm_neon1</span><span class="params">(<span class="type">float</span> *C, <span class="type">float</span> *A, <span class="type">float</span> *B, <span class="type">float</span> *bias, <span class="type">int</span> d0, <span class="type">int</span> d1, <span class="type">int</span> d2)</span></span><br><span class="line">{</span><br><span class="line"> <span class="type">int</span> row, col, m;</span><br><span class="line"> <span class="keyword">for</span> (row = <span class="number">0</span>; row < d0; row++) {</span><br><span class="line"> <span class="keyword">for</span> (m = <span class="number">0</span>; m < d1; m++) {</span><br><span class="line"></span><br><span class="line"> <span class="type">float32x4_t</span> a4 = vdupq_n_f32(A[row * d1 + m]);</span><br><span class="line"> <span class="type">float</span> 
*pb = B + m * d2;</span><br><span class="line"> <span class="type">float</span> *pc = C + row * d2;</span><br><span class="line"> <span class="type">float</span> *pd = bias + row * d2;</span><br><span class="line"></span><br><span class="line"> <span class="keyword">for</span> (col = <span class="number">0</span>; col < d2; col+=<span class="number">4</span>) {</span><br><span class="line"> <span class="type">float32x4_t</span> b4 = vld1q_f32(pb);</span><br><span class="line"> <span class="type">float32x4_t</span> c4 = vld1q_f32(pc);</span><br><span class="line"> <span class="type">float32x4_t</span> val = vmulq_f32(a4, b4);</span><br><span class="line"> val = vaddq_f32(c4, val);</span><br><span class="line"></span><br><span class="line"> <span class="keyword">if</span> (<span class="number">0</span> == m) {</span><br><span class="line"> val = vaddq_f32(vld1q_f32(pd), val);</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> vst1q_f32(pc, val);</span><br><span class="line"></span><br><span class="line"> pb += <span class="number">4</span>;</span><br><span class="line"> pc += <span class="number">4</span>;</span><br><span class="line"> pd += <span class="number">4</span>;</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line">}</span><br></pre></td></tr></table></figure><h2 id="Neon-加速版本-4"><a href="#Neon-加速版本-4" class="headerlink" title="Neon 加速版本 4"></a>Neon-Accelerated Version 4</h2><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span 
class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br></pre></td><td class="code"><pre><span class="line"><span class="type">void</span> <span class="title function_">rsgemm_neon2</span><span class="params">(<span class="type">float</span> *C, <span class="type">float</span> *A, <span class="type">float</span> *B, <span class="type">float</span> *bias, <span class="type">int</span> d0, <span class="type">int</span> d1, <span class="type">int</span> d2)</span></span><br><span class="line">{</span><br><span class="line"> <span class="type">int</span> row, col, m;</span><br><span class="line"> <span class="keyword">for</span> (row = <span class="number">0</span>; row < d0; row++) {</span><br><span class="line"> <span class="keyword">for</span> (m = <span class="number">0</span>; m < d1; m+=<span class="number">4</span>) {</span><br><span class="line"></span><br><span class="line"> <span class="type">float</span> *pb0 = B + (m + <span class="number">0</span>) * d2;</span><br><span class="line"> <span 
class="type">float</span> *pb1 = B + (m + <span class="number">1</span>) * d2;</span><br><span class="line"> <span class="type">float</span> *pb2 = B + (m + <span class="number">2</span>) * d2;</span><br><span class="line"> <span class="type">float</span> *pb3 = B + (m + <span class="number">3</span>) * d2;</span><br><span class="line"></span><br><span class="line"> <span class="type">float</span> *pc = C + row * d2;</span><br><span class="line"> <span class="type">float</span> *pd = bias + row * d2;</span><br><span class="line"></span><br><span class="line"> <span class="type">float32x4_t</span> a4 = vld1q_f32(A + row * d1 + m);</span><br><span class="line"> <span class="type">float32x4_t</span> a0 = vdupq_n_f32(vgetq_lane_f32(a4, <span class="number">0</span>));</span><br><span class="line"> <span class="type">float32x4_t</span> a1 = vdupq_n_f32(vgetq_lane_f32(a4, <span class="number">1</span>));</span><br><span class="line"> <span class="type">float32x4_t</span> a2 = vdupq_n_f32(vgetq_lane_f32(a4, <span class="number">2</span>));</span><br><span class="line"> <span class="type">float32x4_t</span> a3 = vdupq_n_f32(vgetq_lane_f32(a4, <span class="number">3</span>));</span><br><span class="line"></span><br><span class="line"> <span class="keyword">for</span> (col = <span class="number">0</span>; col < d2; col+=<span class="number">4</span>) {</span><br><span class="line"> <span class="type">float32x4_t</span> c4 = vld1q_f32(pc);</span><br><span class="line"></span><br><span class="line"> c4 = vaddq_f32(c4, vmulq_f32(a0, vld1q_f32(pb0)));</span><br><span class="line"> c4 = vaddq_f32(c4, vmulq_f32(a1, vld1q_f32(pb1)));</span><br><span class="line"> c4 = vaddq_f32(c4, vmulq_f32(a2, vld1q_f32(pb2)));</span><br><span class="line"> c4 = vaddq_f32(c4, vmulq_f32(a3, vld1q_f32(pb3)));</span><br><span class="line"></span><br><span class="line"> <span class="keyword">if</span> (<span class="number">0</span> == m) {</span><br><span class="line"> c4 = vaddq_f32(vld1q_f32(pd), 
c4);</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> vst1q_f32(pc, c4);</span><br><span class="line"></span><br><span class="line"> pb0 += <span class="number">4</span>;</span><br><span class="line"> pb1 += <span class="number">4</span>;</span><br><span class="line"> pb2 += <span class="number">4</span>;</span><br><span class="line"> pb3 += <span class="number">4</span>;</span><br><span class="line"></span><br><span class="line"> pc += <span class="number">4</span>;</span><br><span class="line"> pd += <span class="number">4</span>;</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line">}</span><br></pre></td></tr></table></figure><h2 id="Neon-加速版本-5"><a href="#Neon-加速版本-5" class="headerlink" title="Neon 加速版本 5"></a>Neon-Accelerated Version 5</h2><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span 
class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br><span class="line">87</span><br><span class="line">88</span><br><span class="line">89</span><br><span class="line">90</span><br><span class="line">91</span><br><span class="line">92</span><br><span class="line">93</span><br><span class="line">94</span><br><span class="line">95</span><br><span 
class="line">96</span><br><span class="line">97</span><br><span class="line">98</span><br><span class="line">99</span><br><span class="line">100</span><br><span class="line">101</span><br><span class="line">102</span><br><span class="line">103</span><br><span class="line">104</span><br><span class="line">105</span><br><span class="line">106</span><br><span class="line">107</span><br></pre></td><td class="code"><pre><span class="line"><span class="type">void</span> <span class="title function_">rsgemm_neon3</span><span class="params">(<span class="type">float</span> *C, <span class="type">float</span> *A, <span class="type">float</span> *B, <span class="type">float</span> *bias, <span class="type">int</span> d0, <span class="type">int</span> d1, <span class="type">int</span> d2)</span></span><br><span class="line">{</span><br><span class="line"> <span class="type">int</span> row, col, m;</span><br><span class="line"> <span class="keyword">for</span> (row = <span class="number">0</span>; row < d0; row+=<span class="number">4</span>) {</span><br><span class="line"> <span class="keyword">for</span> (m = <span class="number">0</span>; m < d1; m+=<span class="number">4</span>) {</span><br><span class="line"></span><br><span class="line"> <span class="type">float</span> *pb0 = B + (m + <span class="number">0</span>) * d2;</span><br><span class="line"> <span class="type">float</span> *pb1 = B + (m + <span class="number">1</span>) * d2;</span><br><span class="line"> <span class="type">float</span> *pb2 = B + (m + <span class="number">2</span>) * d2;</span><br><span class="line"> <span class="type">float</span> *pb3 = B + (m + <span class="number">3</span>) * d2;</span><br><span class="line"></span><br><span class="line"> <span class="type">float</span> *pc0 = C + (<span class="number">0</span> + row) * d2;</span><br><span class="line"> <span class="type">float</span> *pc1 = C + (<span class="number">1</span> + row) * d2;</span><br><span class="line"> <span 
class="type">float</span> *pc2 = C + (<span class="number">2</span> + row) * d2;</span><br><span class="line"> <span class="type">float</span> *pc3 = C + (<span class="number">3</span> + row) * d2;</span><br><span class="line"></span><br><span class="line"> <span class="type">float</span> *pd0 = bias + (<span class="number">0</span> + row) * d2;</span><br><span class="line"> <span class="type">float</span> *pd1 = bias + (<span class="number">1</span> + row) * d2;</span><br><span class="line"> <span class="type">float</span> *pd2 = bias + (<span class="number">2</span> + row) * d2;</span><br><span class="line"> <span class="type">float</span> *pd3 = bias + (<span class="number">3</span> + row) * d2;</span><br><span class="line"></span><br><span class="line"> <span class="type">float32x4_t</span> a0 = vld1q_f32(A + (row + <span class="number">0</span>) * d1 + m);</span><br><span class="line"> <span class="type">float32x4_t</span> a1 = vld1q_f32(A + (row + <span class="number">1</span>) * d1 + m);</span><br><span class="line"> <span class="type">float32x4_t</span> a2 = vld1q_f32(A + (row + <span class="number">2</span>) * d1 + m);</span><br><span class="line"> <span class="type">float32x4_t</span> a3 = vld1q_f32(A + (row + <span class="number">3</span>) * d1 + m);</span><br><span class="line"></span><br><span class="line"> <span class="type">float32x4_t</span> a00 = vdupq_n_f32(vgetq_lane_f32(a0, <span class="number">0</span>));</span><br><span class="line"> <span class="type">float32x4_t</span> a01 = vdupq_n_f32(vgetq_lane_f32(a0, <span class="number">1</span>));</span><br><span class="line"> <span class="type">float32x4_t</span> a02 = vdupq_n_f32(vgetq_lane_f32(a0, <span class="number">2</span>));</span><br><span class="line"> <span class="type">float32x4_t</span> a03 = vdupq_n_f32(vgetq_lane_f32(a0, <span class="number">3</span>));</span><br><span class="line"></span><br><span class="line"> <span class="type">float32x4_t</span> a10 = vdupq_n_f32(vgetq_lane_f32(a1, 
<span class="number">0</span>));</span><br><span class="line"> <span class="type">float32x4_t</span> a11 = vdupq_n_f32(vgetq_lane_f32(a1, <span class="number">1</span>));</span><br><span class="line"> <span class="type">float32x4_t</span> a12 = vdupq_n_f32(vgetq_lane_f32(a1, <span class="number">2</span>));</span><br><span class="line"> <span class="type">float32x4_t</span> a13 = vdupq_n_f32(vgetq_lane_f32(a1, <span class="number">3</span>));</span><br><span class="line"></span><br><span class="line"> <span class="type">float32x4_t</span> a20 = vdupq_n_f32(vgetq_lane_f32(a2, <span class="number">0</span>));</span><br><span class="line"> <span class="type">float32x4_t</span> a21 = vdupq_n_f32(vgetq_lane_f32(a2, <span class="number">1</span>));</span><br><span class="line"> <span class="type">float32x4_t</span> a22 = vdupq_n_f32(vgetq_lane_f32(a2, <span class="number">2</span>));</span><br><span class="line"> <span class="type">float32x4_t</span> a23 = vdupq_n_f32(vgetq_lane_f32(a2, <span class="number">3</span>));</span><br><span class="line"></span><br><span class="line"> <span class="type">float32x4_t</span> a30 = vdupq_n_f32(vgetq_lane_f32(a3, <span class="number">0</span>));</span><br><span class="line"> <span class="type">float32x4_t</span> a31 = vdupq_n_f32(vgetq_lane_f32(a3, <span class="number">1</span>));</span><br><span class="line"> <span class="type">float32x4_t</span> a32 = vdupq_n_f32(vgetq_lane_f32(a3, <span class="number">2</span>));</span><br><span class="line"> <span class="type">float32x4_t</span> a33 = vdupq_n_f32(vgetq_lane_f32(a3, <span class="number">3</span>));</span><br><span class="line"></span><br><span class="line"> <span class="keyword">for</span> (col = <span class="number">0</span>; col < d2; col+=<span class="number">4</span>) {</span><br><span class="line"> <span class="type">float32x4_t</span> c04 = vld1q_f32(pc0);</span><br><span class="line"> <span class="type">float32x4_t</span> c14 = vld1q_f32(pc1);</span><br><span class="line"> 
<span class="type">float32x4_t</span> c24 = vld1q_f32(pc2);</span><br><span class="line"> <span class="type">float32x4_t</span> c34 = vld1q_f32(pc3);</span><br><span class="line"></span><br><span class="line"> <span class="type">float32x4_t</span> b0 = vld1q_f32(pb0);</span><br><span class="line"> <span class="type">float32x4_t</span> b1 = vld1q_f32(pb1);</span><br><span class="line"> <span class="type">float32x4_t</span> b2 = vld1q_f32(pb2);</span><br><span class="line"> <span class="type">float32x4_t</span> b3 = vld1q_f32(pb3);</span><br><span class="line"></span><br><span class="line"> c04 = vaddq_f32(c04, vmulq_f32(a00, b0));</span><br><span class="line"> c04 = vaddq_f32(c04, vmulq_f32(a01, b1));</span><br><span class="line"> c04 = vaddq_f32(c04, vmulq_f32(a02, b2));</span><br><span class="line"> c04 = vaddq_f32(c04, vmulq_f32(a03, b3));</span><br><span class="line"></span><br><span class="line"> c14 = vaddq_f32(c14, vmulq_f32(a10, b0));</span><br><span class="line"> c14 = vaddq_f32(c14, vmulq_f32(a11, b1));</span><br><span class="line"> c14 = vaddq_f32(c14, vmulq_f32(a12, b2));</span><br><span class="line"> c14 = vaddq_f32(c14, vmulq_f32(a13, b3));</span><br><span class="line"></span><br><span class="line"> c24 = vaddq_f32(c24, vmulq_f32(a20, b0));</span><br><span class="line"> c24 = vaddq_f32(c24, vmulq_f32(a21, b1));</span><br><span class="line"> c24 = vaddq_f32(c24, vmulq_f32(a22, b2));</span><br><span class="line"> c24 = vaddq_f32(c24, vmulq_f32(a23, b3));</span><br><span class="line"></span><br><span class="line"> c34 = vaddq_f32(c34, vmulq_f32(a30, b0));</span><br><span class="line"> c34 = vaddq_f32(c34, vmulq_f32(a31, b1));</span><br><span class="line"> c34 = vaddq_f32(c34, vmulq_f32(a32, b2));</span><br><span class="line"> c34 = vaddq_f32(c34, vmulq_f32(a33, b3));</span><br><span class="line"></span><br><span class="line"> <span class="keyword">if</span> (<span class="number">0</span> == m) {</span><br><span class="line"> c04 = 
vaddq_f32(vld1q_f32(pd0), c04);</span><br><span class="line"> c14 = vaddq_f32(vld1q_f32(pd1), c14);</span><br><span class="line"> c24 = vaddq_f32(vld1q_f32(pd2), c24);</span><br><span class="line"> c34 = vaddq_f32(vld1q_f32(pd3), c34);</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> vst1q_f32(pc0, c04);</span><br><span class="line"> vst1q_f32(pc1, c14);</span><br><span class="line"> vst1q_f32(pc2, c24);</span><br><span class="line"> vst1q_f32(pc3, c34);</span><br><span class="line"></span><br><span class="line"> pb0 += <span class="number">4</span>;</span><br><span class="line"> pb1 += <span class="number">4</span>;</span><br><span class="line"> pb2 += <span class="number">4</span>;</span><br><span class="line"> pb3 += <span class="number">4</span>;</span><br><span class="line"></span><br><span class="line"> pc0 += <span class="number">4</span>;</span><br><span class="line"> pc1 += <span class="number">4</span>;</span><br><span class="line"> pc2 += <span class="number">4</span>;</span><br><span class="line"> pc3 += <span class="number">4</span>;</span><br><span class="line"></span><br><span class="line"> pd0 += <span class="number">4</span>;</span><br><span class="line"> pd1 += <span class="number">4</span>;</span><br><span class="line"> pd2 += <span class="number">4</span>;</span><br><span class="line"> pd3 += <span class="number">4</span>;</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line">}</span><br></pre></td></tr></table></figure>]]></content>
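<summary type="html"><p>Algorithm optimization on mobile devices is a vast topic that reaches from computer architecture all the way down to individual instructions. This article uses common algorithms as examples to explain how to accelerate and optimize them in a single-threaded setting; <del>multithreading is just the finishing touch and there is not much to say about it</del>. For concrete workloads such as image pyramids, filtering, and denoising, the optimization idea is always the same: reduce IO, and complete as much computation as possible per IO.</p>
<p>This article uses <code>Neon, OpenCL</code> to optimize algorithms, and may also bring in <code>DSP</code> where possible. It is updated continuously as a collection of algorithm-optimization experience. In addition, make sure the <code>O3</code> compiler option is enabled and the build is in <code>release</code> mode; otherwise the measured execution time of the algorithms will be affected.</p></summary>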
<category term="Algorithm" scheme="https://muyuuuu.github.io/tags/Algorithm/"/>
</entry>
<entry>
<title>Black Magic in C: Macros</title>
<link href="https://muyuuuu.github.io/2024/02/03/define-macro/"/>
<id>https://muyuuuu.github.io/2024/02/03/define-macro/</id>
<published>2024-02-02T17:36:01.000Z</published>
<updated>2024-12-13T15:47:49.093Z</updated>
<content type="html"><![CDATA[<link rel="stylesheet" class="aplayer-secondary-style-marker" href="/assets/css/APlayer.min.css"><script src="/assets/js/APlayer.min.js" class="aplayer-secondary-script-marker"></script><p>My understanding of macro definitions in <code>C</code> used to be very basic and went little further than the following:</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="keyword">define</span> PI 3.14</span></span><br><span class="line"><span class="meta">#<span class="keyword">define</span> add(a, b) a + b</span></span><br></pre></td></tr></table></figure><p>The code above is straight out of a university textbook. When I saw how macros are used in real projects, I was completely lost, <del>so I too want to write that kind of fancy code that nobody else can quite follow</del>. Macros are far more powerful than I had imagined, so this article pairs each macro technique with a practical use case.</p><ul><li>The stringizing operator: implementing a simple automated test example</li><li>String concatenation: implementing a macro with timing functionality</li><li>X macros: executing different functions depending on the input</li><li>The special macro <code>__VA_ARGS__</code>: implementing a simple logging function</li></ul><span id="more"></span><h1 id="字符串化操作符"><a href="#字符串化操作符" class="headerlink" title="字符串化操作符"></a>The Stringizing Operator</h1><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="keyword">include</span> <span class="string"><iostream></span></span></span><br><span class="line"></span><br><span class="line"><span class="meta">#<span class="keyword">define</span> str(a) #a</span></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="type">int</span> <span class="title">main</span><span class="params">()</span> </span>{</span><br><span class="line"> std::cout << <span class="built_in">str</span>(FUNC); <span class="comment">// prints FUNC</span></span><br><span class="line"> <span 
class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>The macro <code>str</code> above uses a single <code>#</code>, the stringizing operator, to turn its argument into a string literal.</p><h2 id="简单测试框架"><a href="#简单测试框架" class="headerlink" title="简单测试框架"></a>A simple test framework</h2><p>C has some predefined macros: <code>__LINE__</code> is the current line number and <code>__FILE__</code> is the current file name. Building on these, we implement a simple test program that prints the test case, the file name, the line number, and whether the test passed.</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="keyword">include</span> <span 
class="string"><stdio.h></span></span></span><br><span class="line"></span><br><span class="line"><span class="meta">#<span class="keyword">define</span> LOG_INFO(format) printf(format)</span></span><br><span class="line"></span><br><span class="line"><span class="meta">#<span class="keyword">define</span> __TO_STR__(x) #x <span class="string">":"</span></span></span><br><span class="line"><span class="meta">#<span class="keyword">define</span> __TO_REAL__(x) __TO_STR__(x)</span></span><br><span class="line"><span class="comment">// file:line</span></span><br><span class="line"><span class="meta">#<span class="keyword">define</span> __FILE_LINE__ __FILE__ <span class="string">":"</span> __TO_REAL__(__LINE__)</span></span><br><span class="line"></span><br><span class="line"><span class="meta">#<span class="keyword">define</span> CHECK_VAL(val) \</span></span><br><span class="line"><span class="meta"> do { \</span></span><br><span class="line"><span class="meta"> LOG_INFO(__FILE_LINE__ <span class="string">":calling "</span> #val <span class="string">"\n"</span>); \</span></span><br><span class="line"><span class="meta"> <span class="keyword">if</span> (0 == (val)) { \</span></span><br><span class="line"><span class="meta"> LOG_INFO(__FILE_LINE__ <span class="string">":error \n"</span>); \</span></span><br><span class="line"><span class="meta"> goto fail; \</span></span><br><span class="line"><span class="meta"> } <span class="keyword">else</span> { \</span></span><br><span class="line"><span class="meta"> LOG_INFO(__FILE_LINE__ <span class="string">":passed \n"</span>); \</span></span><br><span class="line"><span class="meta"> } \</span></span><br><span class="line"><span class="meta"> } while(0)</span></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="type">int</span> <span class="title">test_func</span><span class="params">()</span> </span>{</span><br><span class="line"> <span class="keyword">return</span> <span 
class="number">1</span>;</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="type">int</span> <span class="title">main</span><span class="params">()</span> </span>{</span><br><span class="line"></span><br><span class="line"> <span class="type">int</span> n_total = <span class="number">2</span>;</span><br><span class="line"> <span class="type">int</span> n_passed = <span class="number">0</span>;</span><br><span class="line"></span><br><span class="line"> <span class="built_in">CHECK_VAL</span>(<span class="number">1</span> == <span class="built_in">test_func</span>());</span><br><span class="line"> n_passed ++;</span><br><span class="line"></span><br><span class="line"> <span class="built_in">CHECK_VAL</span>(<span class="number">2</span> == <span class="built_in">test_func</span>());</span><br><span class="line"> n_passed ++;</span><br><span class="line"></span><br><span class="line">fail:</span><br><span class="line"></span><br><span class="line"> <span class="built_in">printf</span>(<span class="string">"################ summary ###################\n"</span>);</span><br><span class="line"> <span class="built_in">printf</span>(<span class="string">"passed: %d\n"</span>, n_passed);</span><br><span class="line"> <span class="built_in">printf</span>(<span class="string">"total: %d\n"</span>, n_total);</span><br><span class="line"></span><br><span class="line"> <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">}</span><br></pre></td></tr></table></figure><ul><li><code>#val</code> prints the test case expression as text</li><li><code>__FILE_LINE__</code> prints the current file name and line number</li></ul><p>The output is:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span 
class="line">7</span><br></pre></td><td class="code"><pre><span class="line">demo.cpp:30::calling 1 == test_func()</span><br><span class="line">demo.cpp:30::passed </span><br><span class="line">demo.cpp:33::calling 2 == test_func()</span><br><span class="line">demo.cpp:33::error </span><br><span class="line">################ summary ###################</span><br><span class="line">passed: 1</span><br><span class="line">total: 2</span><br></pre></td></tr></table></figure><h2 id="为什么用-do-while-0-?"><a href="#为什么用-do-while-0-?" class="headerlink" title="为什么用 do-while(0) ?"></a>Why use do-while(0)?</h2><p>This idiom puzzled me when I first saw it, but <code>do-while(0)</code> is quite common. It is mostly used when a macro definition contains multiple statements, so let's analyze why. Suppose we define the macro like this:</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="keyword">define</span> SS \</span></span><br><span class="line"><span class="meta"> stmt1; \</span></span><br><span class="line"><span class="meta"> stmt2;</span></span><br></pre></td></tr></table></figure><p>and use it as follows:</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">if</span> (cond)</span><br><span class="line"> SS;</span><br><span class="line"> stmt3;</span><br></pre></td></tr></table></figure><p>After macro expansion, this becomes:</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">if</span> (cond)</span><br><span class="line"> stmt1;</span><br><span class="line"> stmt2;</span><br><span class="line"> ;</span><br><span 
class="line"> stmt3;</span><br></pre></td></tr></table></figure><p>So <code>stmt2</code> executes whether <code>cond</code> is true or false, whereas the intent was clearly that <code>stmt1</code> and <code>stmt2</code> run only when <code>cond</code> is true. Let's try wrapping the macro body in braces:</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="keyword">define</span> SS { \</span></span><br><span class="line"><span class="meta"> stmt1; \</span></span><br><span class="line"><span class="meta"> stmt2; \</span></span><br><span class="line"><span class="meta">}</span></span><br></pre></td></tr></table></figure><p>But this still breaks in the following case:</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">if</span> (cond)</span><br><span class="line"> SS;</span><br><span class="line"><span class="keyword">else</span></span><br><span class="line"> stmt3;</span><br></pre></td></tr></table></figure><p>The macro expands to:</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">if</span> (cond) { </span><br><span class="line"> stmt1; </span><br><span class="line"> stmt2; </span><br><span class="line">}</span><br><span class="line">;</span><br><span class="line"><span class="keyword">else</span></span><br><span class="line"> stmt3;</span><br></pre></td></tr></table></figure><p>This fails to compile, because there is an extra semicolon before <code>else</code>. Of course, one could simply omit the semicolon after <code>SS</code>,
but in C we habitually end every statement with a semicolon. For these reasons, people came up with the <code>do-while(0)</code> idiom, which turns the macro body into a single statement that consumes the trailing semicolon:</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="keyword">define</span> SS \</span></span><br><span class="line"><span class="meta"> do { \</span></span><br><span class="line"><span class="meta"> stmt1; \</span></span><br><span class="line"><span class="meta"> stmt2; \</span></span><br><span class="line"><span class="meta"> } while(0)</span></span><br></pre></td></tr></table></figure><h1 id="字符串连接"><a href="#字符串连接" class="headerlink" title="字符串连接"></a>Token pasting</h1><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="keyword">include</span> <span class="string"><iostream></span></span></span><br><span class="line"></span><br><span class="line"><span class="meta">#<span class="keyword">define</span> define_val(tag) \</span></span><br><span class="line"><span class="meta"> int a_##tag = 77</span></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="type">int</span> <span class="title">main</span><span class="params">()</span> </span>{</span><br><span class="line"> <span class="built_in">define_val</span>(MAX);</span><br><span class="line"> std::cout << a_MAX;</span><br><span class="line"> <span class="keyword">return</span> <span class="number">0</span>;</span><br><span 
class="line">}</span><br></pre></td></tr></table></figure><p>The code above pastes <code>a_</code> and the <code>tag</code> argument together with <code>##</code>, so the macro call is equivalent to <code>int a_MAX = 77;</code>. The identifier <code>a_MAX</code> is never defined literally in the source, yet we can still use it.</p><p>One small benefit of this: suppose 100 modules are scattered across a project and each needs to be timed for performance statistics. Defining a start time and an end time and computing the elapsed time for every module is pure repetition. This macro technique removes that repetition. In the code below, the macros can be placed in a header file; after including it, users need only two lines of code to time a module.</p><h2 id="测试函数执行时间的宏"><a href="#测试函数执行时间的宏" class="headerlink" title="测试函数执行时间的宏"></a>A macro for timing function execution</h2><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="keyword">include</span> <span 
class="string"><stdio.h></span></span></span><br><span class="line"><span class="meta">#<span class="keyword">include</span> <span class="string"><unistd.h></span></span></span><br><span class="line"><span class="meta">#<span class="keyword">include</span> <span class="string"><sys/time.h></span></span></span><br><span class="line"></span><br><span class="line"><span class="keyword">typedef</span> <span class="keyword">struct</span> <span class="title class_">Time</span> {</span><br><span class="line"> <span class="type">double</span> time;</span><br><span class="line">} Time;</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="type">void</span> <span class="title">GetTime</span><span class="params">(Time* T)</span> </span>{</span><br><span class="line"> <span class="keyword">struct</span> <span class="title class_">timeval</span> tv;</span><br><span class="line"> <span class="built_in">gettimeofday</span>(&tv, <span class="literal">NULL</span>);</span><br><span class="line"> T->time = (tv.tv_sec * <span class="number">1000.0</span>) + (tv.tv_usec / <span class="number">1000.0</span>);</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="meta">#<span class="keyword">define</span> TIME_START(tag) \</span></span><br><span class="line"><span class="meta"> Time tag##_start, tag##_end; \</span></span><br><span class="line"><span class="meta"> do { \</span></span><br><span class="line"><span class="meta"> GetTime(&(tag##_start)); \</span></span><br><span class="line"><span class="meta"> } while(0)</span></span><br><span class="line"></span><br><span class="line"><span class="meta">#<span class="keyword">define</span> TIME_END(tag) \</span></span><br><span class="line"><span class="meta"> do { \</span></span><br><span class="line"><span class="meta"> GetTime(&(tag##_end)); \</span></span><br><span class="line"><span class="meta"> printf(#tag <span class="string">" cost %.2f 
ms\n"</span>, tag##_end.time - tag##_start.time); \</span></span><br><span class="line"><span class="meta"> } while(0)</span></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="type">void</span> <span class="title">func</span><span class="params">()</span> </span>{</span><br><span class="line"> <span class="built_in">usleep</span>(<span class="number">10000</span>);</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="type">int</span> <span class="title">main</span><span class="params">()</span> </span>{</span><br><span class="line"></span><br><span class="line"> <span class="comment">// record the start time</span></span><br><span class="line"> <span class="built_in">TIME_START</span>(loop_func_20);</span><br><span class="line"></span><br><span class="line"> <span class="keyword">for</span> (<span class="type">int</span> i = <span class="number">0</span>; i < <span class="number">20</span>; i++) {</span><br><span class="line"> <span class="built_in">func</span>();</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="comment">// record the end time and print the elapsed time</span></span><br><span class="line"> <span class="built_in">TIME_END</span>(loop_func_20);</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>The output is:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">loop_func_20 cost 202.44 ms</span><br></pre></td></tr></table></figure><h1 id="实现泛型"><a href="#实现泛型" class="headerlink" title="实现泛型"></a>Implementing generics</h1><p>Because I switched majors, I had to make up first-year courses during my second year. While I was learning <code>C#</code>, my roommate summed up generics in a single phrase, "parameterized types", which amazed me at a time when I could barely write <code>C</code>. It works like <code>C++</code> templates: you write one set of code for a function and treat the type as a parameter, yet the function supports many types. If <code>C</code> is to have generics, macros are an essential tool.</p><p>Suppose we face the following requirement: there is a <code>matrix</code>; if its data type is <code>u8</code>, the <code>WriteU8ToFile</code> function is called to write the data to a file; if its data type is 
<code>u16</code>, the <code>WriteU16ToFile</code> function is called to write the data to a file.</p><p>Here is rough pseudocode for these two functions:</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">typedef</span> <span class="class"><span class="keyword">struct</span> {</span></span><br><span class="line"> ...</span><br><span class="line"> <span class="type">void</span> *data;</span><br><span class="line">} Matrix;</span><br><span class="line"></span><br><span class="line">WriteU8ToFile(Matrix *mat, <span class="type">const</span> <span class="type">char</span> *file) {</span><br><span class="line"> Fopen(file);</span><br><span class="line"></span><br><span class="line"> <span class="type">uint8_t</span> *data = (<span class="type">uint8_t</span> *)(mat->data);</span><br><span class="line"> <span class="keyword">for</span> (<span class="type">int</span> i = <span class="number">0</span>; i < mat->size; i++) {</span><br><span class="line"> FWrite(<span class="string">"%d"</span>, (<span class="type">int</span>)data[i]);</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> 
Fclose(file);</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line">WriteU16ToFile(Matrix *mat, <span class="type">const</span> <span class="type">char</span> *file) {</span><br><span class="line"> Fopen(file);</span><br><span class="line"></span><br><span class="line"> <span class="type">uint16_t</span> *data = (<span class="type">uint16_t</span> *)(mat->data);</span><br><span class="line"> <span class="keyword">for</span> (<span class="type">int</span> i = <span class="number">0</span>; i < mat->size; i++) {</span><br><span class="line"> FWrite(<span class="string">"%d"</span>, (<span class="type">int</span>)data[i]);</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> Fclose(file);</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>As you can see, apart from <code>U8</code> and <code>U16</code> in the function names and <code>uint8_t</code> and <code>uint16_t</code> in the function bodies, everything else is identical. Can we, as with <code>C++</code> templates, write one generic piece of code instead of duplicating it this many times?</p><p>We can use a macro that takes <code>U8</code> and <code>uint8_t</code> as parameters:</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="keyword">define</span> WRITE_DATA_TO_TXT(type, typeName) \</span></span><br><span class="line"><span class="meta">void Write##typeName##DataToTxt(Matrix *src, const char *path) { \</span></span><br><span class="line"><span class="meta"> FILE *file_ptr = fopen(path, <span class="string">"w"</span>); 
\</span></span><br><span class="line"><span class="meta"> for (int i = 0; i <span class="string">< src-></span>h; i++) { \</span></span><br><span class="line"><span class="meta"> type *data = (type *)src->data + i * src->pitch / sizeof(type); \</span></span><br><span class="line"><span class="meta"> for (int j = 0; j <span class="string">< src-></span>w; j++) { \</span></span><br><span class="line"><span class="meta"> <span class="keyword">if</span> (sizeof(type) == 1) { \</span></span><br><span class="line"><span class="meta"> fprintf(file_ptr, <span class="string">"U8 %d\n"</span>, (int)data[j]); \</span></span><br><span class="line"><span class="meta"> } <span class="keyword">else</span> { \</span></span><br><span class="line"><span class="meta"> fprintf(file_ptr, <span class="string">"U16 %d\n"</span>, (int)data[j]); \</span></span><br><span class="line"><span class="meta"> } \</span></span><br><span class="line"><span class="meta"> } \</span></span><br><span class="line"><span class="meta"> } \</span></span><br><span class="line"><span class="meta"> fclose(file_ptr); \</span></span><br><span class="line"><span class="meta">}</span></span><br></pre></td></tr></table></figure><p>Now, by invoking this macro with different <code>type</code> and <code>typeName</code> arguments, we can generate a function for each type:</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">WRITE_DATA_TO_TXT(<span class="type">uint8_t</span>, U8) <span class="comment">// generates WriteU8DataToTxt for uint8_t</span></span><br><span class="line"></span><br><span class="line">WRITE_DATA_TO_TXT(<span class="type">uint16_t</span>, U16) <span class="comment">// generates WriteU16DataToTxt for uint16_t</span></span><br></pre></td></tr></table></figure><p>The complete code is as follows:</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span 
class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="keyword">include</span> <span class="string"><stdio.h></span></span></span><br><span class="line"><span class="meta">#<span class="keyword">include</span> <span class="string"><stdint.h></span></span></span><br><span class="line"><span class="meta">#<span class="keyword">include</span> <span class="string"><stdlib.h></span></span></span><br><span class="line"></span><br><span 
class="line"><span class="keyword">typedef</span> <span class="class"><span class="keyword">struct</span> {</span></span><br><span class="line"> <span class="type">int</span> w;</span><br><span class="line"> <span class="type">int</span> h;</span><br><span class="line"> <span class="type">int</span> pitch;</span><br><span class="line"> <span class="type">void</span> *data;</span><br><span class="line">} Matrix;</span><br><span class="line"></span><br><span class="line"><span class="meta">#<span class="keyword">define</span> WRITE_DATA_TO_TXT(type, typeName) \</span></span><br><span class="line"><span class="meta">void Write##typeName##DataToTxt(Matrix *src, const char *path) { \</span></span><br><span class="line"><span class="meta"> FILE *file_ptr = fopen(path, <span class="string">"w"</span>); \</span></span><br><span class="line"><span class="meta"> for (int i = 0; i <span class="string">< src-></span>h; i++) { \</span></span><br><span class="line"><span class="meta"> type *data = (type *)src->data + i * src->pitch / sizeof(type); \</span></span><br><span class="line"><span class="meta"> for (int j = 0; j <span class="string">< src-></span>w; j++) { \</span></span><br><span class="line"><span class="meta"> <span class="keyword">if</span> (sizeof(type) == 1) { \</span></span><br><span class="line"><span class="meta"> fprintf(file_ptr, <span class="string">"U8 %d\n"</span>, (int)data[j]); \</span></span><br><span class="line"><span class="meta"> } <span class="keyword">else</span> { \</span></span><br><span class="line"><span class="meta"> fprintf(file_ptr, <span class="string">"U16 %d\n"</span>, (int)data[j]); \</span></span><br><span class="line"><span class="meta"> } \</span></span><br><span class="line"><span class="meta"> } \</span></span><br><span class="line"><span class="meta"> } \</span></span><br><span class="line"><span class="meta"> fclose(file_ptr); \</span></span><br><span 
class="meta">}</span></span><br><span class="line"></span><br><span class="line">WRITE_DATA_TO_TXT(<span class="type">uint8_t</span>, U8)</span><br><span class="line"></span><br><span class="line">WRITE_DATA_TO_TXT(<span class="type">uint16_t</span>, U16)</span><br><span class="line"></span><br><span class="line"><span class="type">int</span> <span class="title function_">main</span><span class="params">()</span> {</span><br><span class="line"> Matrix u8Mat = {.w = <span class="number">3</span>, .h = <span class="number">2</span>, .pitch = <span class="number">3</span> * <span class="keyword">sizeof</span>(<span class="type">uint8_t</span>), .data = <span class="built_in">malloc</span>(<span class="number">3</span> * <span class="number">2</span> * <span class="keyword">sizeof</span>(<span class="type">uint8_t</span>))};</span><br><span class="line"> <span class="type">uint8_t</span> *u8Data = (<span class="type">uint8_t</span> *)u8Mat.data;</span><br><span class="line"> u8Data[<span class="number">0</span>] = <span class="number">1</span>; u8Data[<span class="number">1</span>] = <span class="number">2</span>; u8Data[<span class="number">2</span>] = <span class="number">3</span>;</span><br><span class="line"> u8Data[<span class="number">3</span>] = <span class="number">4</span>; u8Data[<span class="number">4</span>] = <span class="number">5</span>; u8Data[<span class="number">5</span>] = <span class="number">6</span>;</span><br><span class="line"> WriteU8DataToTxt(&u8Mat, <span class="string">"u8_data.txt"</span>);</span><br><span class="line"></span><br><span class="line"> Matrix u16Mat = {.w = <span class="number">3</span>, .h = <span class="number">2</span>, .pitch = <span class="number">3</span> * <span class="keyword">sizeof</span>(<span class="type">uint16_t</span>), .data = <span class="built_in">malloc</span>(<span class="number">3</span> * <span class="number">2</span> * <span class="keyword">sizeof</span>(<span 
class="type">uint16_t</span>))};</span><br><span class="line"> <span class="type">uint16_t</span> *u16Data = (<span class="type">uint16_t</span> *)u16Mat.data;</span><br><span class="line"> u16Data[<span class="number">0</span>] = <span class="number">10</span>; u16Data[<span class="number">1</span>] = <span class="number">20</span>; u16Data[<span class="number">2</span>] = <span class="number">30</span>;</span><br><span class="line"> u16Data[<span class="number">3</span>] = <span class="number">40</span>; u16Data[<span class="number">4</span>] = <span class="number">50</span>; u16Data[<span class="number">5</span>] = <span class="number">60</span>;</span><br><span class="line"> WriteU16DataToTxt(&u16Mat, <span class="string">"u16_data.txt"</span>);</span><br><span class="line"></span><br><span class="line"> <span class="built_in">free</span>(u8Mat.data);</span><br><span class="line"> <span class="built_in">free</span>(u16Mat.data);</span><br><span class="line"></span><br><span class="line"> <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">}</span><br></pre></td></tr></table></figure><h1 id="特殊宏"><a href="#特殊宏" class="headerlink" title="特殊宏"></a>Special macros</h1><p><code>__VA_ARGS__</code> is a special preprocessor identifier that stands for the variable argument list. It is typically used to define variadic macros, for example wrappers around <code>printf</code>. Inside the macro definition, <code>__VA_ARGS__</code> marks where the variable arguments go, and it is replaced by the actual arguments when the macro is expanded. The <code>##</code> placed before <code>__VA_ARGS__</code> in the code below is a GNU extension that drops the preceding comma when no variable arguments are passed. The formal definition is rather abstract, so let's go straight to the code:</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="keyword">include</span> <span class="string"><stdio.h></span></span></span><br><span class="line"></span><br><span class="line"><span class="meta">#<span 
class="keyword">define</span> LOG(format, ...) printf(format, ##__VA_ARGS__)</span></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="type">int</span> <span class="title">main</span><span class="params">()</span> </span>{</span><br><span class="line"> <span class="built_in">LOG</span>(<span class="string">"===== info =====\n"</span>); <span class="comment">// no variable arguments</span></span><br><span class="line"> <span class="built_in">LOG</span>(<span class="string">"data is %d\n"</span>, <span class="number">2</span>); <span class="comment">// one variable argument</span></span><br><span class="line"> <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">}</span><br></pre></td></tr></table></figure><h2 id="一个简单的打日志函数"><a href="#一个简单的打日志函数" class="headerlink" title="一个简单的打日志函数"></a>A simple logging function</h2><p>Adding some auxiliary information to the code above gives us a logging function:</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="keyword">include</span> <span class="string"><stdio.h></span></span></span><br><span class="line"></span><br><span class="line"><span class="meta">#<span class="keyword">define</span> LOG(tag, format, ...) 
\</span></span><br><span class="line"><span class="meta"> printf(<span class="string">"[%s] [%s %s %d] "</span> format, tag, __FILE__, __FUNCTION__, __LINE__, ##__VA_ARGS__)</span></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="type">int</span> <span class="title">main</span><span class="params">()</span> </span>{</span><br><span class="line"> <span class="built_in">LOG</span>(<span class="string">"BASE"</span>, <span class="string">"Nothing\n"</span>);</span><br><span class="line"> <span class="built_in">LOG</span>(<span class="string">"BASE"</span>, <span class="string">" ? info diff >= %d : %.4f %d\n"</span>, <span class="number">2</span>, <span class="number">0.1</span>, <span class="number">2</span>);</span><br><span class="line"> <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>For the call</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">LOG</span>(<span class="string">"BASE"</span>, <span class="string">"Nothing"</span>);</span><br></pre></td></tr></table></figure><p>the macro expands to:</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">printf</span>(<span class="string">"[%s] [%s %s %d] "</span> <span class="string">"Nothing"</span>, <span class="string">"BASE"</span>, <span class="string">"test.cpp"</span>, <span class="string">"main"</span>, <span class="number">7</span>); </span><br></pre></td></tr></table></figure><p>Note that <code>Nothing</code> is part of <code>format</code>, which is concatenated onto the prefix string, so the first <code>%s</code> corresponds to <code>tag</code>. The final output is therefore:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">[BASE] [test.cpp
main 7] Nothing</span><br></pre></td></tr></table></figure><p>Likewise, the second macro call expands and prints:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">[BASE] [test.cpp main 8] ? info diff >= 2 : 0.1000 2</span><br></pre></td></tr></table></figure><ul><li>Note: the code uses <code>##__VA_ARGS__</code> rather than <code>__VA_ARGS__</code> because <code>##__VA_ARGS__</code> removes the preceding comma when the variadic argument list is empty (this comma-swallowing behavior is a GNU extension supported by GCC and Clang). In C, if the variadic argument list is empty, nothing follows the comma, which causes a compile error.</li></ul><h1 id="X-宏的使用"><a href="#X-宏的使用" class="headerlink" title="X 宏的使用"></a>Using X macros</h1><p>Macro definitions can be used to dispatch to different functions depending on a command. For example, when the input command is <code>CMD_LED_ON</code> the function to run is <code>led_on</code>; when the input command is <code>CMD_LED_OFF</code> the function to run is <code>led_off</code>. First, define the two functions:</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="type">static</span> <span class="type">void</span> <span class="title">led_on</span><span class="params">(<span class="type">void</span>* p)</span></span></span><br><span class="line"><span class="function"></span>{</span><br><span class="line"> <span class="built_in">printf</span>(<span class="string">"%s \r\n"</span>, (<span class="type">char</span> *)p);</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="type">static</span> <span class="type">void</span> <span class="title">led_off</span><span class="params">(<span class="type">void</span>* p)</span></span></span><br><span class="line"><span class="function"></span>{</span><br><span class="line"> <span class="built_in">printf</span>(<span class="string">"%s \r\n"</span>, (<span
class="type">char</span> *)p);</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>Then define the two commands <code>CMD_LED_ON</code> and <code>CMD_LED_OFF</code> in an enumeration type, but generate them through a macro:</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="keyword">define</span> MACROS_TABLE \</span></span><br><span class="line"><span class="meta"> X_MACROS(CMD_LED_ON, led_on) \</span></span><br><span class="line"><span class="meta"> X_MACROS(CMD_LED_OFF, led_off) \</span></span><br><span class="line"><span class="meta"></span></span><br><span class="line"><span class="comment">/* Define the command list */</span></span><br><span class="line"><span class="keyword">typedef</span> <span class="keyword">enum</span></span><br><span class="line">{</span><br><span class="line"> <span class="meta">#<span class="keyword">define</span> X_MACROS(a, b) a,</span></span><br><span class="line"> MACROS_TABLE</span><br><span class="line"> <span class="meta">#<span class="keyword">undef</span> X_MACROS</span></span><br><span class="line"> CMD_MAX</span><br><span class="line">} cmd_e;</span><br></pre></td></tr></table></figure><p><code>#define X_MACROS(a, b) a,</code> takes the first element <code>a</code> of <code>(a, b)</code> and appends a comma, so the code after macro expansion is:</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span
class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">typedef</span> <span class="keyword">enum</span></span><br><span class="line">{</span><br><span class="line"> <span class="meta">#<span class="keyword">define</span> X_MACROS(a, b) a,</span></span><br><span class="line"> <span class="built_in">X_MACROS</span>(CMD_LED_ON, led_on)</span><br><span class="line"> <span class="built_in">X_MACROS</span>(CMD_LED_OFF, led_off)</span><br><span class="line"> <span class="meta">#<span class="keyword">undef</span> X_MACROS</span></span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="comment">/* Expanding X_MACROS further gives: */</span></span><br><span class="line"></span><br><span class="line"><span class="comment">/* Define the command list */</span></span><br><span class="line"><span class="keyword">typedef</span> <span class="keyword">enum</span></span><br><span class="line">{</span><br><span class="line"> CMD_LED_ON,</span><br><span class="line"> CMD_LED_OFF,</span><br><span class="line"> CMD_MAX</span><br><span class="line">} cmd_e;</span><br></pre></td></tr></table></figure><p><code>#define X_MACROS(a, b) b,</code> takes the second element of the macro. In the same way, we then define an array of function pointers:</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span
class="line">15</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">typedef</span> <span class="title">void</span> <span class="params">(*func)</span><span class="params">(<span class="type">void</span>* p)</span></span>;</span><br><span class="line"><span class="type">const</span> func func_table[] =</span><br><span class="line">{</span><br><span class="line"> <span class="meta">#<span class="keyword">define</span> X_MACROS(a, b) b,</span></span><br><span class="line"> MACROS_TABLE</span><br><span class="line"> <span class="meta">#<span class="keyword">undef</span> X_MACROS</span></span><br><span class="line">};</span><br><span class="line"></span><br><span class="line"><span class="comment">/* The macro expands to: */</span></span><br><span class="line"></span><br><span class="line"><span class="type">const</span> func func_table[] =</span><br><span class="line">{</span><br><span class="line"> led_on,</span><br><span class="line"> led_off,</span><br><span class="line">};</span><br></pre></td></tr></table></figure><p>At this point <code>func_table[CMD_LED_ON]</code> points to the <code>led_on</code> function and <code>func_table[CMD_LED_OFF]</code> points to the <code>led_off</code> function, which gives a simple way to run different functions for different input commands. The complete code is as follows:</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span
class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="keyword">include</span> <span class="string"><stdio.h></span></span></span><br><span class="line"></span><br><span class="line"><span class="meta">#<span class="keyword">define</span> MACROS_TABLE \</span></span><br><span class="line"><span class="meta"> X_MACROS(CMD_LED_ON, led_on) \</span></span><br><span class="line"><span class="meta"> X_MACROS(CMD_LED_OFF, led_off) \</span></span><br><span class="line"><span class="meta"></span></span><br><span class="line"><span class="comment">/*定义命令列表*/</span></span><br><span class="line"><span class="keyword">typedef</span> <span class="keyword">enum</span></span><br><span class="line">{</span><br><span class="line"> <span class="meta">#<span class="keyword">define</span> X_MACROS(a, b) a,</span></span><br><span class="line"> MACROS_TABLE</span><br><span class="line"> <span 
class="meta">#<span class="keyword">undef</span> X_MACROS</span></span><br><span class="line"> CMD_MAX</span><br><span class="line">} cmd_e;</span><br><span class="line"></span><br><span class="line"><span class="comment">/*定义字符串列表用作Log打印*/</span></span><br><span class="line"><span class="type">const</span> <span class="type">char</span>* cmd_str[] =</span><br><span class="line">{</span><br><span class="line"> <span class="meta">#<span class="keyword">define</span> X_MACROS(a, b) #a,</span></span><br><span class="line"> MACROS_TABLE</span><br><span class="line"> <span class="meta">#<span class="keyword">undef</span> X_MACROS</span></span><br><span class="line">};</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">typedef</span> <span class="title">void</span> <span class="params">(*func)</span><span class="params">(<span class="type">void</span>* p)</span></span>;</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="type">static</span> <span class="type">void</span> <span class="title">led_on</span><span class="params">(<span class="type">void</span>* p)</span></span></span><br><span class="line"><span class="function"></span>{</span><br><span class="line"> <span class="built_in">printf</span>(<span class="string">"%s \r\n"</span>, (<span class="type">char</span> *)p);</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="type">static</span> <span class="type">void</span> <span class="title">led_off</span><span class="params">(<span class="type">void</span>* p)</span></span></span><br><span class="line"><span class="function"></span>{</span><br><span class="line"> <span class="built_in">printf</span>(<span class="string">"%s \r\n"</span>, (<span class="type">char</span> *)p);</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="type">const</span> func 
func_table[] =</span><br><span class="line">{</span><br><span class="line"> <span class="meta">#<span class="keyword">define</span> X_MACROS(a, b) b,</span></span><br><span class="line"> MACROS_TABLE</span><br><span class="line"> <span class="meta">#<span class="keyword">undef</span> X_MACROS</span></span><br><span class="line">};</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="type">static</span> <span class="type">void</span> <span class="title">cmd_handle</span><span class="params">(cmd_e cmd)</span></span></span><br><span class="line"><span class="function"></span>{</span><br><span class="line"> <span class="keyword">if</span>(cmd < CMD_MAX)</span><br><span class="line"> {</span><br><span class="line"> func_table[cmd]((<span class="type">void</span>*)cmd_str[cmd]);</span><br><span class="line"> }</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="type">int</span> <span class="title">main</span><span class="params">()</span></span></span><br><span class="line"><span class="function"></span>{</span><br><span class="line"> <span class="built_in">cmd_handle</span>(CMD_LED_ON);</span><br><span class="line"> <span class="built_in">cmd_handle</span>(CMD_LED_OFF);</span><br><span class="line"> <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">}</span><br></pre></td></tr></table></figure><h1 id="参考"><a href="#参考" class="headerlink" title="参考"></a>参考</h1><ol><li><a href="https://zhuanlan.zhihu.com/p/521073931">X-宏的用法</a></li></ol>]]></content>
<summary type="html"><p>My understanding of macro definitions in <code>C</code> used to be very simple, going no further than roughly the following shallow level:</p>
<figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="keyword">define</span> PI 3.14</span></span><br><span class="line"><span class="meta">#<span class="keyword">define</span> add(a, b) a + b</span></span><br></pre></td></tr></table></figure>
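<p>The code above is pure university-textbook usage. When I saw how macros are used in real projects, though, I was completely lost, <del>so I want to write that kind of fancy code nobody else can quite read too</del>. Macros are far more powerful than I had imagined, so this article pairs every macro technique with a practical scenario.</p>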
<ul>
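<li>The stringification operator, used to build a simple automated test case</li>
<li>String concatenation, used to build a macro with timing support</li>
<li>X macros, used to run different functions depending on the input</li>
<li>The special macro <code>__VA_ARGS__</code>, used to build a simple logging function</li>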
</ul></summary>
<category term="C" scheme="https://muyuuuu.github.io/tags/C/"/>
</entry>
<entry>
<title>An Unpleasant Software Development Experience</title>
<link href="https://muyuuuu.github.io/2024/02/03/an-unpleasant-experience-dev/"/>
<id>https://muyuuuu.github.io/2024/02/03/an-unpleasant-experience-dev/</id>
<published>2024-02-02T17:19:02.000Z</published>
<updated>2024-02-02T17:30:21.315Z</updated>
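<content type="html"><![CDATA[<link rel="stylesheet" class="aplayer-secondary-style-marker" href="/assets/css/APlayer.min.css"><script src="/assets/js/APlayer.min.js" class="aplayer-secondary-script-marker"></script><p>My overall impression of the development process: it lacked a reasonable, complete software development workflow or set of conventions.</p><span id="more"></span><ul><li><p>By reasonable, I mean: most requirements were handed down through the manager's spur-of-the-moment ideas, Feishu messages, or face-to-face conversations. With a task like this one, where the road ahead is unknown and the requirements keep changing, the lack of background knowledge makes communication even harder. The biggest drawback is that nothing gets recorded, which hurts maintenance and updates: which features need to be added, which need to be changed, and why, cannot be traced afterwards.</p></li><li><p>By complete, I mean: when to hold meetings and kick off the project, when to hold discussions, what counts as done, how the software is released, and how it is maintained. None of this had any convention. The software life cycle, from requirements analysis through maintenance, simply did not exist. Overall it felt no different from a course project back in school.</p></li><li><p>Communication was very inefficient.</p><ul><li>Going through a document word by word and punctuation mark by punctuation mark is entirely unnecessary; the outline is what deserves attention. It is enough to know what the goal is and which scenarios are involved.</li><li>Something that was assumed to take just a page refresh, I was asked to implement quickly. In reality it required complicated data transfer and parsing. Things imagined in someone's head can be very laborious to build.</li><li>Don't discuss code at the start; it wastes time. Early discussions are all based on mental <code>debug</code> sessions, and later you discover that the code discussed earlier quite possibly cannot be implemented, or is not the best implementation. Once the code has actually been written up to that point, a better, more convenient approach emerges naturally, and looking back, the early discussion had no value beyond wasting time and delaying progress.</li></ul></li></ul><ul><li><p>No research at all into use cases and user needs.</p><ul><li>User needs were never investigated, and no feedback was collected from users. We were only satisfying needs the manager imagined. Suppose 50 people use the software, and the manager says it is not good enough and does not meet his requirements, and keeps raising new ones, so the software is never released and the project stays in <code>delay</code>. I get anxious and the boss is disappointed. I wanted to write a small, polished tool and add features gradually; the manager wanted every feature supported in one go, which really does seem hard to achieve. For example, today the AI team raised yet another new requirement that goes beyond our original plan, so implementing everything at once really is difficult.</li><li>In fact, perhaps 20 people are satisfied with your software, 20 find it acceptable, 9 think it still needs improvement, and only the manager thinks this part is wrong and should be displayed differently, that part is wrong and needs a hide button. But perhaps 30 people consider that hide button superfluous and 10 think it must never be added. Satisfying one person's needs is pointless. In other words, the manager should steer the overall direction rather than agonize over whether to add a hide button.</li></ul></li></ul><ul><li><p>Adding features on short notice was too tedious.</p><ul><li>Wanting to check peak memory for a moment, wanting to temporarily add <code>unknown</code> function calls, wanting to remove the <code>unknown</code> function calls, wanting to generate some arbitrary table to see what the interface looks like. These can at least be handled by changing a few lines of code; it is just tiring. But these fiddly ad hoc requests turn out to be no longer needed once they are written, and they only wear down patience bit by bit and waste valuable motivation.</li><li>When new tasks are assigned later, I instinctively question whether the task is reasonable and whether it needs to be implemented at all, and develop some resistance.</li></ul></li></ul><ul><li><p>Unclear requirements</p><ul><li>When the software has grown too large and any one of the input, features, requirements, or use cases changes drastically, changing a few lines of code will definitely not do. There are only two options: keep maintaining the mountain of garbage code, or rewrite it.</li><li>So at the very beginning, it is best to settle what the goal is, what the features are, and which users are supported. Most important of all: at what point to stop, which features will not be implemented, and which users will not be supported. Start writing code only once all of that is clear. At first the manager told me to start coding and demanded many features. Later, after talking with the boss, it turned out some features were unnecessary and others had to change. <strong>Looking at the mountain of garbage code in my hands, I chose to rewrite it.</strong></li><li>If the features change substantially, the early goals must have been wrong. A manager should only set the overall direction and should not fixate on details: where a button goes, how a message is shown to the user, how files are named, and so on. For example, whether a file name is date + version number or version number + date. Discussing small details early wastes a lot of energy, and changing them later wastes even more time and wears down patience, all quite unnecessarily.</li></ul></li></ul><p><del>If I become a manager one day</del>, I would most likely say: do the research first, find out whether a high-performance implementation already exists and whether to write asynchronous or synchronous functions. Then write a technical proposal; after discussing it with me, I will settle the direction and scope, and you decide the details.</p><h1 id="如何维护?"><a href="#如何维护?" class="headerlink" title="如何维护?"></a>How to maintain it?</h1><ul><li>Urgent <code>bug</code>s are fixed and released immediately</li><li>Requests shared by many users are batched: the library is released periodically, with several features implemented per release</li><li>One-off requests from individuals will not be implemented</li></ul><p>For now, questions and releases are planned to go through <code>gitlab</code>, to bring the software under management. This is my first time managing maintenance and releases, so it is still exploratory and I have more to learn. Once a feature is implemented or an urgent <code>bug</code> is fixed, the corresponding <code>issue</code> is closed.</p>]]></content>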
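<summary type="html"><p>My overall impression of the development process: it lacked a reasonable, complete software development workflow or set of conventions.</p></summary>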
<category term="Design" scheme="https://muyuuuu.github.io/tags/Design/"/>
</entry>
<entry>
<title>Full-Stack Development: Building a UI and Serving Data with Zero Experience</title>
<link href="https://muyuuuu.github.io/2024/01/03/full-stack-amis-tornado/"/>
<id>https://muyuuuu.github.io/2024/01/03/full-stack-amis-tornado/</id>
<published>2024-01-03T14:12:54.000Z</published>
<updated>2024-01-03T14:37:19.483Z</updated>
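<content type="html"><![CDATA[<link rel="stylesheet" class="aplayer-secondary-style-marker" href="/assets/css/APlayer.min.css"><script src="/assets/js/APlayer.min.js" class="aplayer-secondary-script-marker"></script><p>A word up front. I hope you never have an urgent need to throw together a UI quickly for other people. This is my blog, but I would rather you never found this post through a search. When you are in completely unfamiliar territory and you build fast, learn fast, look things up whenever you get stuck, and cram, the code that comes out is bound to be poor, exhausting to write, and twice the effort for half the result, and it is bound to contain some number of <code>bug</code>s and logic that cannot be made to work.</p><p>There is some good news, though: if you know nothing about front-end or back-end work and only know <code>Python</code>, after reading this post you can still put together a complete front-end and back-end service, although that is still a long way from being even an entry-level full-stack engineer.</p><span id="more"></span><h1 id="前端开发"><a href="#前端开发" class="headerlink" title="前端开发"></a>Front-end development</h1><p>Early in development I genuinely thought it would end with a few simple charts, so I did not take it seriously. But the project kept growing, my <code>js</code> and <code>html</code> skills really could not keep up, and work did not leave me enough time to learn these things from scratch. Every evening I was on the phone with a former classmate asking how to implement this or that interaction logic. After she wrote the overall architecture for me, I patched and tweaked on top of it and looked up <code>api</code>s, and on the whole it met the requirements.</p><p>Later, however, the project grew again, with demands for this and that and all kinds of <code>UI</code> screens and interactions. With zero front-end background I really could not cope, and bothering my classmate was not a long-term solution, so I started using <code>amis</code> to build the front end.</p><h2 id="选择低代码框架-amis"><a href="#选择低代码框架-amis" class="headerlink" title="选择低代码框架 amis"></a>Choosing the low-code framework amis</h2><p>The following is taken from the official documentation of Baidu's <code>amis</code>:</p><ul><li>After more than a decade of evolution, front-end development has become more and more complex and the barrier to entry keeps rising. To use today's popular <code>UI</code> component libraries you must understand <code>npm</code>, <code>webpack</code>, and <code>react/vue</code>, you must be familiar with <code>ES6</code> syntax, and ideally you also understand state management such as <code>Redux</code>; if you have never touched functional programming, even getting started is a struggle. Once you are in, you find a huge ecosystem with 2347 related libraries, many with overlapping functionality, so choosing among them is costly. And front-end technology will not stand still: by the time you have learned all this, you may find that everyone has moved on to <code>Hooks</code>, or that some bundler has replaced <code>Webpack</code>...</li><li>With <code>amis</code>, a few hundred lines of <code>JSON</code> configuration are enough. You do not need to know <code>React/Vue</code> or <code>Webpack</code>, or even <code>JavaScript</code>; even someone who has never learned <code>amis</code> can guess what most of the configuration does, and simple configuration is all it takes to build every page.</li><li>You can use the <code>amis</code> <a href="https://aisuda.github.io/amis-editor-demo/#/hello-world">visual editor</a> to build pages quickly. Most common pages should be implemented in the simplest way possible, without even learning front-end frameworks and tools.</li><li><code>amis</code> is widely used inside Baidu: over more than 6 years, 50,000 pages have been created with it, and from content moderation to machine management, from data analysis to model training, <code>amis</code> has met all kinds of page requirements.</li></ul><p><img data-src="https://s11.ax1x.com/2024/01/03/pivmv8A.jpg" alt></p><h2 id="下载-amis-并使用"><a href="#下载-amis-并使用" class="headerlink" title="下载 amis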
并使用"></a>Downloading and using amis</h2><p>Download <code>sdk.tar.gz</code> from the <a href="https://github.com/baidu/amis/releases">releases page</a> and extract it into a local folder. The directory layout:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">sdk/</span><br><span class="line">index.html</span><br></pre></td></tr></table></figure><p>The key point in <code>index.html</code> is the <code>sdk</code> paths on lines 12, 13, and 31, which must be specified correctly. The contents of <code>index.html</code>:</p><figure class="highlight html"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span
class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta"><!DOCTYPE <span class="keyword">html</span>></span></span><br><span class="line"><span class="tag"><<span class="name">html</span> <span class="attr">lang</span>=<span class="string">"zh"</span>></span></span><br><span class="line"> <span class="tag"><<span class="name">head</span>></span></span><br><span class="line"> <span class="tag"><<span class="name">meta</span> <span class="attr">charset</span>=<span class="string">"UTF-8"</span> /></span></span><br><span class="line"> <span class="tag"><<span class="name">title</span>></span>amis demo<span class="tag"></<span class="name">title</span>></span></span><br><span class="line"> <span class="tag"><<span class="name">meta</span> <span class="attr">http-equiv</span>=<span class="string">"Content-Type"</span> <span class="attr">content</span>=<span class="string">"text/html; charset=utf-8"</span> /></span></span><br><span class="line"> <span class="tag"><<span class="name">meta</span></span></span><br><span class="line"><span class="tag"> <span class="attr">name</span>=<span class="string">"viewport"</span></span></span><br><span class="line"><span class="tag"> <span class="attr">content</span>=<span class="string">"width=device-width, initial-scale=1, maximum-scale=1"</span></span></span><br><span class="line"><span class="tag"> /></span></span><br><span class="line"> <span class="tag"><<span class="name">meta</span> 
<span class="attr">http-equiv</span>=<span class="string">"X-UA-Compatible"</span> <span class="attr">content</span>=<span class="string">"IE=Edge"</span> /></span></span><br><span class="line"> <span class="tag"><<span class="name">link</span> <span class="attr">rel</span>=<span class="string">"stylesheet"</span> <span class="attr">href</span>=<span class="string">"./sdk/sdk.css"</span> /></span></span><br><span class="line"> <span class="tag"><<span class="name">link</span> <span class="attr">rel</span>=<span class="string">"stylesheet"</span> <span class="attr">href</span>=<span class="string">"./sdk/helper.css"</span> /></span></span><br><span class="line"> <span class="comment"><!-- 从 1.1.0 开始 sdk.css 将不支持 IE 11,如果要支持 IE11 请引用这个 css,并把前面那个删了 --></span></span><br><span class="line"> <span class="comment"><!-- <link rel="stylesheet" href="sdk-ie11.css" /> --></span></span><br><span class="line"> <span class="comment"><!-- 不过 amis 开发团队几乎没测试过 IE 11 下的效果,所以可能有细节功能用不了,如果发现请报 issue --></span></span><br><span class="line"> <span class="tag"><<span class="name">style</span>></span><span class="language-css"></span></span><br><span class="line"><span class="language-css"> <span class="selector-tag">html</span>,</span></span><br><span class="line"><span class="language-css"> <span class="selector-tag">body</span>,</span></span><br><span class="line"><span class="language-css"> <span class="selector-class">.app-wrapper</span> {</span></span><br><span class="line"><span class="language-css"> <span class="attribute">position</span>: relative;</span></span><br><span class="line"><span class="language-css"> <span class="attribute">width</span>: <span class="number">100%</span>;</span></span><br><span class="line"><span class="language-css"> <span class="attribute">height</span>: <span class="number">100%</span>;</span></span><br><span class="line"><span class="language-css"> <span class="attribute">margin</span>: <span class="number">0</span>;</span></span><br><span 
class="line"><span class="language-css"> <span class="attribute">padding</span>: <span class="number">0</span>;</span></span><br><span class="line"><span class="language-css"> }</span></span><br><span class="line"><span class="language-css"> </span><span class="tag"></<span class="name">style</span>></span></span><br><span class="line"> <span class="tag"></<span class="name">head</span>></span></span><br><span class="line"> <span class="tag"><<span class="name">body</span>></span></span><br><span class="line"> <span class="tag"><<span class="name">div</span> <span class="attr">id</span>=<span class="string">"root"</span> <span class="attr">class</span>=<span class="string">"app-wrapper"</span>></span><span class="tag"></<span class="name">div</span>></span></span><br><span class="line"> <span class="tag"><<span class="name">script</span> <span class="attr">src</span>=<span class="string">"./sdk/sdk.js"</span>></span><span class="tag"></<span class="name">script</span>></span></span><br><span class="line"> <span class="tag"><<span class="name">script</span> <span class="attr">type</span>=<span class="string">"text/javascript"</span>></span><span class="language-javascript"></span></span><br><span class="line"><span class="language-javascript"> (<span class="keyword">function</span> (<span class="params"></span>) {</span></span><br><span class="line"><span class="language-javascript"> <span class="keyword">let</span> amis = <span class="title function_">amisRequire</span>(<span class="string">'amis/embed'</span>);</span></span><br><span class="line"><span class="language-javascript"> <span class="comment">// 通过替换下面这个配置来生成不同页面</span></span></span><br><span class="line"><span class="language-javascript"> <span class="keyword">let</span> amisJSON = {</span></span><br><span class="line"><span class="language-javascript"> <span class="attr">type</span>: <span class="string">'page'</span>,</span></span><br><span class="line"><span class="language-javascript"> <span 
class="attr">title</span>: <span class="string">'表单页面'</span>,</span></span><br><span class="line"><span class="language-javascript"> <span class="attr">body</span>: {</span></span><br><span class="line"><span class="language-javascript"> <span class="attr">type</span>: <span class="string">'form'</span>,</span></span><br><span class="line"><span class="language-javascript"> <span class="attr">mode</span>: <span class="string">'horizontal'</span>,</span></span><br><span class="line"><span class="language-javascript"> <span class="attr">api</span>: <span class="string">'/saveForm'</span>,</span></span><br><span class="line"><span class="language-javascript"> <span class="attr">controls</span>: [</span></span><br><span class="line"><span class="language-javascript"> {</span></span><br><span class="line"><span class="language-javascript"> <span class="attr">label</span>: <span class="string">'Name'</span>,</span></span><br><span class="line"><span class="language-javascript"> <span class="attr">type</span>: <span class="string">'text'</span>,</span></span><br><span class="line"><span class="language-javascript"> <span class="attr">name</span>: <span class="string">'name'</span></span></span><br><span class="line"><span class="language-javascript"> },</span></span><br><span class="line"><span class="language-javascript"> {</span></span><br><span class="line"><span class="language-javascript"> <span class="attr">label</span>: <span class="string">'Email'</span>,</span></span><br><span class="line"><span class="language-javascript"> <span class="attr">type</span>: <span class="string">'email'</span>,</span></span><br><span class="line"><span class="language-javascript"> <span class="attr">name</span>: <span class="string">'email'</span></span></span><br><span class="line"><span class="language-javascript"> }</span></span><br><span class="line"><span class="language-javascript"> ]</span></span><br><span class="line"><span class="language-javascript"> 
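}</span></span><br><span class="line"><span class="language-javascript"> };</span></span><br><span class="line"><span class="language-javascript"> <span class="keyword">let</span> amisScoped = amis.<span class="title function_">embed</span>(<span class="string">'#root'</span>, amisJSON);</span></span><br><span class="line"><span class="language-javascript"> })();</span></span><br><span class="line"><span class="language-javascript"> </span><span class="tag"></<span class="name">script</span>></span></span><br><span class="line"> <span class="tag"></<span class="name">body</span>></span></span><br><span class="line"><span class="tag"></<span class="name">html</span>></span></span><br></pre></td></tr></table></figure><p>Open <code>index.html</code> in a browser and you will see a simple page. Alternatively, open the <a href="https://aisuda.github.io/amis-editor-demo/#/hello-world">front-end editor</a> provided by Baidu and build the interface by drag and drop, much like working in <code>qtdesigner</code> or building a <code>.NET FrameWork</code> application in <code>C#</code>.</p><p>A friendly tip: as with any <code>UI</code> development, it is a good idea to give every component a <code>flex</code> layout or a container, which makes styles easier to adjust later and makes the resulting <code>UI</code> look better. When development is finished, click this button to get the <code>json</code> file:</p><p>(Image to be added)</p><p>Then copy it into the <code>let amisJSON =</code> assignment in <code>index.html</code>, and the <code>UI</code> is done. Note that this only completes the <code>UI</code> itself: it is not yet connected to back-end data and does not capture user actions, so interaction and responses require separate code. Events need to be added at the location shown in the figure below:</p><p>(Image to be added)</p><p>If you have ever done <code>Qt</code> or <code>.NET FrameWork</code> development, this will certainly look familiar. Using events well makes the interface respond more smoothly. Next, I introduce how to use events and connect them to the back end.</p><h1 id="后端"><a href="#后端" class="headerlink" title="后端"></a>Back end</h1><h1 id="结语"><a href="#结语" class="headerlink" title="结语"></a>Closing remarks</h1><p>To be honest, ever since my three months of onboarding training ended, I have been assigned front-end and back-end development work, providing web services for other people. But I am actually an algorithm engineer, and every day when I get to my desk I feel like an idiot.</p>]]></content>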
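<summary type="html"><p>A word up front: I hope you never urgently need to throw together a UI for other people. This is my blog, but I would rather you never had a reason to find this post. In a completely unfamiliar field, building fast, learning fast, looking things up as you go and cramming produces code that is bound to be poor, exhausting to write and twice the effort for half the result, with its share of <code>bug</code>s and logic that cannot be implemented.</p>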
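<p>There is some good news, though: even if you know nothing about front ends or back ends and only know <code>Python</code>, this post is enough to stand up a complete front-end and back-end service, although that is still a long way from being even an entry-level full-stack engineer.</p></summary>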
<category term="Python" scheme="https://muyuuuu.github.io/tags/Python/"/>
</entry>
<entry>
<title>Saving memory when processing large files in Python</title>
<link href="https://muyuuuu.github.io/2023/12/26/python-memory-optimization/"/>
<id>https://muyuuuu.github.io/2023/12/26/python-memory-optimization/</id>
<published>2023-12-26T15:09:44.000Z</published>
<updated>2023-12-28T15:55:03.637Z</updated>
<content type="html"><![CDATA[<link rel="stylesheet" class="aplayer-secondary-style-marker" href="/assets/css/APlayer.min.css"><script src="/assets/js/APlayer.min.js" class="aplayer-secondary-script-marker"></script><p>I never thought I would one day worry about saving memory while writing <code>python</code>. Normally I pay no attention to it: variables are created and used as needed, and freeing memory is left to the garbage collector. This time, though, the data was so large that during a <code>debug</code> session I watched the process keep allocating memory until the whole system froze.</p><p>Perhaps it is an occupational habit from working on algorithm optimization: whenever I write code I wonder how long it takes, how much space it uses, whether the data structures can be improved, and whether the logic can be written more elegantly.</p><p>Note: the programs in this post use the <code>psutil</code> library to monitor the memory used by the process; install it with <code>pip install psutil</code>.</p><span id="more"></span><h1 id="背景"><a href="#背景" class="headerlink" title="背景"></a>Background</h1><p>I need to parse a very large log file that contains some useless lines, like this:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line">useful info 1</span><br><span class="line">useless info 1</span><br><span class="line">useful info 2</span><br><span class="line">useful info 3</span><br><span class="line">useless info 2</span><br><span class="line">...</span><br><span class="line">useful info N</span><br></pre></td></tr></table></figure><p>While parsing, the useful information has to be extracted from the file and stored in an object for later processing.<br>For some special tasks and requirements, however, parsing the file once is not enough, so the file has to be parsed a second time.</p><p>To avoid parsing the file all over again, the useful core information is serialized right after the first pass. The second pass then reads the serialized core data directly instead of reading and parsing the source file again.</p><h1 id="序列化导出"><a href="#序列化导出" class="headerlink" title="序列化导出"></a>Serializing the data</h1><p>The first approach kept appending the parsed core data to one <code>list</code> and serialized that large <code>list</code> once parsing finished. The measured memory usage of the process was 700MB.</p><figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span 
class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> random</span><br><span class="line"><span class="keyword">import</span> pickle</span><br><span class="line"><span class="keyword">import</span> time</span><br><span class="line"><span class="keyword">import</span> psutil</span><br><span class="line"><span class="keyword">import</span> os</span><br><span class="line"></span><br><span class="line">data = []</span><br><span class="line"></span><br><span class="line"><span class="keyword">for</span> i <span class="keyword">in</span> <span class="built_in">range</span>(<span class="number">10000000</span>):</span><br><span class="line">    data.append(<span class="built_in">str</span>(random.randint(<span class="number">10000</span>, <span class="number">109070987</span>)))</span><br><span class="line"></span><br><span class="line"><span class="keyword">with</span> <span class="built_in">open</span>(<span class="string">"data.pkl"</span>, <span class="string">"wb"</span>) <span class="keyword">as</span> f:</span><br><span class="line">    pickle.dump(data, f)</span><br><span class="line"></span><br><span class="line"><span class="comment"># Get the memory used by the current Python process</span></span><br><span class="line">memory_info = psutil.Process(os.getpid()).memory_info()</span><br><span class="line"></span><br><span class="line"><span class="comment"># Print the memory usage in MB; rss is in bytes</span></span><br><span class="line"><span class="built_in">print</span>(memory_info.rss / <span class="number">1024</span> / <span class="number">1024</span>, <span class="string">"MB"</span>)</span><br></pre></td></tr></table></figure><p>With append-style serialization, by contrast, only 15MB is used, at the cost of about 
2s of extra run time, since every item now goes through its own <code>pickle.dump</code> call that appends to the end of the file:</p><figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">with</span> <span class="built_in">open</span>(<span class="string">"data.pkl"</span>, <span class="string">"ab"</span>) <span class="keyword">as</span> f:</span><br><span class="line">    <span class="keyword">for</span> i <span class="keyword">in</span> <span class="built_in">range</span>(<span class="number">10000000</span>):</span><br><span class="line">        pickle.dump(<span class="built_in">str</span>(random.randint(<span class="number">10000</span>, <span class="number">109070987</span>)), f)</span><br></pre></td></tr></table></figure><p>A <code>buffer</code> can be used to optimize this: items are collected in the <code>buffer</code> and serialized together once it reaches a certain size.</p><figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">class</span> <span class="title class_">SeriesModel</span>:</span><br><span class="line">    <span class="keyword">def</span> <span class="title function_">__init__</span>(<span class="params">self</span>) -> <span class="literal">None</span>:</span><br><span class="line">        self._buf = []</span><br><span class="line"></span><br><span class="line">    <span class="keyword">def</span> <span class="title function_">series</span>(<span class="params">self, stack, finish=<span class="literal">False</span></span>):</span><br><span class="line">        self._buf.append(stack)</span><br><span class="line">        <span class="keyword">if</span> <span 
class="number">100</span> < <span class="built_in">len</span>(self._buf) <span class="keyword">or</span> finish <span class="keyword">is</span> <span class="literal">True</span>:</span><br><span class="line">            <span class="keyword">with</span> <span class="built_in">open</span>(config.SERIES_PATH, <span class="string">"ab"</span>) <span class="keyword">as</span> f:</span><br><span class="line">                <span class="keyword">for</span> item <span class="keyword">in</span> self._buf:</span><br><span class="line">                    pickle.dump(item, f)</span><br><span class="line">            self._buf = []</span><br></pre></td></tr></table></figure><h1 id="序列化读入"><a href="#序列化读入" class="headerlink" title="序列化读入"></a>Loading the serialized data</h1><p>For the second pass, the serialized data has to be read back with <code>load</code>. Loading the serialized file and processing the data directly again takes 700MB of memory. Creating every element at once like this is unnecessary.</p><figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">with</span> <span class="built_in">open</span>(<span class="string">"data.pkl"</span>, <span class="string">"rb"</span>) <span class="keyword">as</span> f:</span><br><span class="line">    data = pickle.load(f)</span><br><span class="line"></span><br><span class="line">    <span class="keyword">for</span> i <span class="keyword">in</span> data:</span><br><span class="line">        i += <span class="string">" "</span></span><br></pre></td></tr></table></figure><p>Lazy evaluation solves this: a value is created only when it is actually needed, instead of creating everything up front. Given the limitations of generator expressions, we write a generator function with the <code>yield</code> keyword instead.</p><p>A <code>yield</code> statement returns a value much like <code>return</code>, but it remembers where it left off; the next iteration resumes from that point and returns the next element. Memory usage drops to 15MB this way.</p><figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span 
class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">def</span> <span class="title function_">read</span>(<span class="params">file</span>):</span><br><span class="line">    <span class="comment"># Assumes the file was written with one pickle.dump call per item</span></span><br><span class="line">    <span class="keyword">with</span> <span class="built_in">open</span>(file, <span class="string">"rb"</span>) <span class="keyword">as</span> f:</span><br><span class="line">        <span class="keyword">while</span> <span class="literal">True</span>:</span><br><span class="line">            <span class="keyword">try</span>:</span><br><span class="line">                <span class="keyword">yield</span> pickle.load(f)</span><br><span class="line">            <span class="keyword">except</span> EOFError:</span><br><span class="line">                <span class="keyword">return</span></span><br><span class="line"></span><br><span class="line"><span class="comment"># data is a generator</span></span><br><span class="line">data = read(<span class="string">"data.pkl"</span>)</span><br><span class="line"><span class="keyword">for</span> i <span class="keyword">in</span> data:</span><br><span class="line">    i += <span class="string">" "</span></span><br></pre></td></tr></table></figure><h1 id="引申"><a href="#引申" class="headerlink" title="引申"></a>Further notes</h1><p>Every generator defines a method named <code>__next__</code>, which must raise <code>StopIteration</code> once the last element has been returned. The built-in <code>next()</code> essentially calls the object's <code>__next__()</code>, which either returns the next item or raises <code>StopIteration</code> to end the iteration. The example below shows what a generator really is.</p><figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span 
class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">from</span> collections.abc <span class="keyword">import</span> Iterable</span><br><span class="line"></span><br><span class="line"><span class="keyword">class</span> <span class="title class_">FibGenerator</span>():</span><br><span class="line">    <span class="keyword">def</span> <span class="title function_">__init__</span>(<span class="params">self, n</span>):</span><br><span class="line">        self.__n = n</span><br><span class="line"></span><br><span class="line">        self.__s0 = <span class="number">0</span></span><br><span class="line">        self.__s1 = <span class="number">1</span></span><br><span class="line">        self.__count = <span class="number">0</span></span><br><span class="line"></span><br><span class="line">    <span class="keyword">def</span> <span class="title function_">__next__</span>(<span class="params">self</span>): <span class="comment"># used by the built-in next()</span></span><br><span class="line">        <span class="keyword">if</span> self.__count < self.__n:</span><br><span class="line">            ret = self.__s0</span><br><span class="line">            self.__s0, self.__s1 = self.__s1, (self.__s0 + self.__s1)</span><br><span class="line">            self.__count += <span class="number">1</span></span><br><span class="line">            <span class="keyword">return</span> ret</span><br><span class="line">        <span class="keyword">else</span>:</span><br><span class="line">            <span class="keyword">raise</span> StopIteration</span><br><span class="line"></span><br><span class="line">    <span class="keyword">def</span> <span class="title function_">__iter__</span>(<span class="params">self</span>): <span class="comment"># used by the for statement</span></span><br><span class="line">        <span class="keyword">return</span> 
self</span><br><span class="line"></span><br><span class="line">fg = FibGenerator(<span class="number">5</span>)</span><br><span class="line"><span class="built_in">print</span>(<span class="built_in">type</span>(fg))</span><br><span class="line"><span class="built_in">print</span>(<span class="built_in">isinstance</span>(fg, Iterable))</span><br><span class="line"><span class="keyword">for</span> i <span class="keyword">in</span> fg:</span><br><span class="line">    <span class="built_in">print</span>(i, end=<span class="string">' '</span>)</span><br><span class="line"></span><br><span class="line"><span class="comment"># Output:</span></span><br><span class="line"><span class="comment"># &lt;class '__main__.FibGenerator'&gt;</span></span><br><span class="line"><span class="comment"># True</span></span><br><span class="line"><span class="comment"># 0 1 1 2 3</span></span><br></pre></td></tr></table></figure><p>If <code>__iter__()</code> were not defined in the example, the object could only be iterated with <code>next()</code>. Once it is defined, the object can be used in a <code>for</code> ... <code>in</code> statement. An object that defines both methods is called an iterator. Generators produced by generator expressions and generator functions get <code>__iter__</code> and <code>__next__</code> automatically, so a generator is also a kind of iterator.</p><h1 id="参考链接"><a href="#参考链接" class="headerlink" title="参考链接"></a>References</h1><p><a href="https://pythonhowto.readthedocs.io/zh-cn/latest/iterator.html">https://pythonhowto.readthedocs.io/zh-cn/latest/iterator.html</a></p>]]></content>
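<summary type="html"><p>I never thought I would one day worry about saving memory while writing <code>python</code>. Normally I pay no attention to it: variables are created and used as needed, and freeing memory is left to the garbage collector. This time, though, the data was so large that during a <code>debug</code> session I watched the process keep allocating memory until the whole system froze.</p>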
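<p>Perhaps it is an occupational habit from working on algorithm optimization: whenever I write code I wonder how long it takes, how much space it uses, whether the data structures can be improved, and whether the logic can be written more elegantly.</p>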
<p>Note: the programs in this post use the <code>psutil</code> library to monitor the memory used by the process; install it with <code>pip install psutil</code>.</p></summary>
<category term="Python" scheme="https://muyuuuu.github.io/tags/Python/"/>
</entry>
<entry>
<title>How to Write Better Programs, Part 2: Minimize Code Changes</title>
<link href="https://muyuuuu.github.io/2023/11/17/minimize-code-modification/"/>
<id>https://muyuuuu.github.io/2023/11/17/minimize-code-modification/</id>
<published>2023-11-17T15:50:00.000Z</published>
<updated>2023-12-28T15:56:09.489Z</updated>
<content type="html"><![CDATA[<link rel="stylesheet" class="aplayer-secondary-style-marker" href="/assets/css/APlayer.min.css"><script src="/assets/js/APlayer.min.js" class="aplayer-secondary-script-marker"></script><p>职场新人兼新手程序员斗胆开了新坑「如何写出更好的程序」,所见所得都是来自实际写代码时自己的思考,且已脱敏。这一系列不包含任何复杂的技术,也不包含任何难懂的代码。只是将核心问题暴露出来,针对这些场景,如何写出可维护性更高、更简洁优雅的代码。</p><p>目前仅包括 <code>python</code> 装饰器的使用,等某天遇到其他技术也可以减少代码的修改时,会追加到本文。</p><span id="more"></span><h1 id="使用-Python-装饰器"><a href="#使用-Python-装饰器" class="headerlink" title="使用 Python 装饰器"></a>使用 Python 装饰器</h1><h2 id="背景"><a href="#背景" class="headerlink" title="背景"></a>背景</h2><p>一开始写代码的时候,都在想着要尽可能的支持全部功能,要获取各种信息并反馈给用户。于是我写了一大堆代码,创建了各种类、各种数据结构,以及实现了各种方法。</p><figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">class</span> <span class="title class_">A</span>:</span><br><span class="line"> <span class="keyword">def</span> <span class="title function_">func1</span>(): ...</span><br><span class="line"> <span class="keyword">def</span> <span class="title function_">func2</span>(): ...</span><br><span class="line"> <span class="keyword">def</span> <span class="title function_">func3</span>(): ...</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="keyword">class</span> <span class="title class_">B</span>:</span><br><span class="line"> <span class="keyword">def</span> <span class="title function_">func4</span>(): ...</span><br><span class="line"> <span class="keyword">def</span> <span class="title function_">func5</span>(): ...</span><br><span class="line"> <span class="keyword">def</span> <span class="title 
function_">func6</span>(): ...</span><br></pre></td></tr></table></figure><p>为了高效的获取信息,一些数据可以复用,一些逻辑可以跳过,这样写出来的代码也会错综复杂:</p><figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">def</span> <span class="title function_">main</span>():</span><br><span class="line"> a = A()</span><br><span class="line"> a.func1()</span><br><span class="line"> b = B()</span><br><span class="line"></span><br><span class="line"> val = some_func()</span><br><span class="line"> </span><br><span class="line"> <span class="keyword">if</span> val < <span class="number">100</span>:</span><br><span class="line"> a.func3()</span><br><span class="line"> <span class="keyword">else</span>:</span><br><span class="line"> b.func4()</span><br></pre></td></tr></table></figure><p>某天忽然遇到一个新需求:需要增加一个轻量版的代码,只得到 3 个核心信息就好了,其他信息直接忽略掉。这时我回首我的代码发现:为了得到各种信息,之前的代码十分庞大,有很多类,也有很多方法,复杂的逻辑修改起来并不是件很容易的事。</p><ul><li>为了实现轻量版的代码,重写代码肯定是不值得的,毕竟一些代码逻辑和数据结构可以复用。重写代码势必会导致代码文件增加,冗余代码增多。</li><li>如果复用代码,会发现这个类可以不用创建,这个逻辑可以跳过,一些类的成员方法可以不用执行。</li></ul><h2 id="坏代码"><a href="#坏代码" class="headerlink" title="坏代码"></a>坏代码</h2><p>如果在代码中手动添加 <code>lite</code> 这一轻量化参数,遇到不需要执行的代码就根据 <code>lite</code> 写 <code>if else</code> 分支给代码加岔路口,代码结构会十分繁杂。比如有 <code>lite</code> 选项时,我们需要创建 <code>A</code> 这个类,根据临时结果判断是否需要执行 <code>b.func4()</code>,那么上述代码修改为:</p><figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span 
class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">def</span> <span class="title function_">main</span>():</span><br><span class="line"> lite = <span class="literal">True</span></span><br><span class="line"> a = A()</span><br><span class="line"> <span class="keyword">if</span> <span class="keyword">not</span> lite:</span><br><span class="line"> a.func1()</span><br><span class="line"> b = B()</span><br><span class="line"> </span><br><span class="line"> val = some_func()</span><br><span class="line"> </span><br><span class="line"> <span class="keyword">if</span> val < <span class="number">100</span> <span class="keyword">and</span> lite:</span><br><span class="line"> a.func3()</span><br><span class="line"> <span class="keyword">else</span>:</span><br><span class="line"> b.func4()</span><br></pre></td></tr></table></figure><p>对于 1000 多行更加复杂的代码,手动添加 <code>lite</code> 分支并修改逻辑,这是很累的工作,写出来的代码也不好看,通用性也随之变差。</p><h2 id="装饰器优化"><a href="#装饰器优化" class="headerlink" title="装饰器优化"></a>装饰器优化</h2><p>此时我们可以使用装饰器来完成这一工作,如果不知道装饰器是什么东西可以参考我之前的<a href="https://muyuuuu.github.io/2020/01/07/python-wrapper/">文章</a>。在装饰器中首先传入 <code>self</code> 参数,如果检测到类的 <code>lite</code> 属性为 <code>true</code>,直接跳过这一函数不执行。此时我们只需要打开需要改动的类,增加 <code>lite</code> 属性。</p><p>如果确定这个方法可以不执行,给方法增加装饰器即可。而对于 <code>main</code> 函数中的代码,是不需要任何修改的,也不需要增加大量的 <code>if else</code> 分支,减少代码结构的修改和破坏。逻辑处理部分的代码如下所示,相比坏代码部分精简了很多,且 <code>a.func1</code> 和 <code>a.func3</code> 都是不会执行的。</p><figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span 
class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">def</span> <span class="title function_">use_lite</span>(<span class="params">func</span>):</span><br><span class="line">    <span class="keyword">def</span> <span class="title function_">wrapper</span>(<span class="params">self, *args, **kwargs</span>):</span><br><span class="line">        <span class="keyword">if</span> self.lite:</span><br><span class="line">            <span class="keyword">pass</span></span><br><span class="line">        <span class="keyword">else</span>:</span><br><span class="line">            <span class="keyword">return</span> func(self, *args, **kwargs)</span><br><span class="line">    <span class="keyword">return</span> wrapper</span><br><span class="line"></span><br><span class="line"><span class="keyword">class</span> <span class="title class_">A</span>:</span><br><span class="line">    <span class="keyword">def</span> <span class="title function_">__init__</span>(<span class="params">self, lite=<span class="literal">False</span></span>):</span><br><span class="line">        self.lite = lite</span><br><span class="line"><span class="meta">    @use_lite</span></span><br><span class="line">    <span class="keyword">def</span> <span class="title function_">func1</span>(<span class="params">self</span>): ...</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="keyword">def</span> <span class="title 
function_">main</span>():</span><br><span class="line">    a = A(<span class="literal">True</span>)</span><br><span class="line">    a.func1()</span><br><span class="line">    b = B()</span><br><span class="line"> </span><br><span class="line">    val = some_func()</span><br><span class="line"></span><br><span class="line">    <span class="keyword">if</span> val < <span class="number">100</span>:</span><br><span class="line">        a.func3()</span><br><span class="line">    <span class="keyword">else</span>:</span><br><span class="line">        b.func4()</span><br></pre></td></tr></table></figure><p>Addendum: <code>@use_lite(self.lite)</code> raises an error. The decorator is an external function rather than a member of the class, and the decorator expression is evaluated while the class body is being defined, when no instance exists yet, so there is no <code>self</code> for it to refer to.</p>]]></content>
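<summary type="html"><p>As a newcomer to the workplace and a novice programmer, I am boldly starting a new series, "How to Write Better Programs". Everything in it comes from my own thinking while writing real code, with sensitive details removed. The series involves no complicated techniques and no hard-to-read code. It only exposes the core problems and discusses how to write more maintainable, more concise and more elegant code for those situations.</p>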
<p>For now it only covers <code>python</code> decorators; when I come across other techniques that also reduce code changes, I will add them to this post.</p></summary>
<category term="Design" scheme="https://muyuuuu.github.io/tags/Design/"/>
</entry>
<entry>
<title>How to Write Better Programs, Part 1: Use Configuration Files Well and Reduce Hard-Coding</title>
<link href="https://muyuuuu.github.io/2023/10/18/data-config-and-decrease-hard-coding/"/>
<id>https://muyuuuu.github.io/2023/10/18/data-config-and-decrease-hard-coding/</id>
<published>2023-10-18T15:15:26.000Z</published>
<updated>2023-12-29T11:01:32.908Z</updated>
<content type="html"><![CDATA[<link rel="stylesheet" class="aplayer-secondary-style-marker" href="/assets/css/APlayer.min.css"><script src="/assets/js/APlayer.min.js" class="aplayer-secondary-script-marker"></script><p>As a newcomer to the workplace and a novice programmer, I am boldly starting a new series, "How to Write Better Programs". Everything in it comes from my own thinking while writing real code, with sensitive details removed. The series involves no complicated techniques and no hard-to-read code. It only exposes the core problems and discusses how to write more maintainable, more concise and more elegant code for those situations.</p><p>Using <code>python</code> as the example, this post covers how to use configuration files and how to reduce hard-coding, and from there touches on code organization and maintainability.</p><span id="more"></span><h1 id="如何使用好配置文件"><a href="#如何使用好配置文件" class="headerlink" title="如何使用好配置文件"></a>Making good use of configuration files</h1><h2 id="针对一个代码文件使用配置文件的情况"><a href="#针对一个代码文件使用配置文件的情况" class="headerlink" title="针对一个代码文件使用配置文件的情况"></a>When a single code file uses the configuration</h2><p>Suppose only <code>main.py</code> needs to read the configuration file and passes some of its variables to other functions as arguments. This is the simplest scenario. For example, in a development environment <code>env=debug</code>, and in a production environment <code>env=release</code>; the value is of course read from the configuration file. In a slightly more complex case, users who customize the program may need variables that are not in the configuration file.</p><p>For this scenario I suggest putting the configuration in <code>config.py</code> and wrapping it in a class whose members are the variables. When different code has to run for production and development, the check can live inside the class. When users need extra variables, they can subclass it and add their own variables and methods.</p><figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">class</span> <span class="title class_">Data</span>:</span><br><span class="line"> <span class="keyword">def</span> <span class="title function_">__init__</span>(<span class="params">self</span>):</span><br><span class="line"> self.env1 = ...</span><br><span class="line"> self.env2 = ...</span><br><span class="line"> 
self.__setup()</span><br><span class="line"></span><br><span class="line"> <span class="keyword">def</span> <span class="title function_">__setup</span>(<span class="params">self</span>):</span><br><span class="line"> <span class="keyword">if</span> self.env1 == <span class="string">"1"</span>:</span><br><span class="line"> func1()</span><br><span class="line"> <span class="keyword">else</span>:</span><br><span class="line"> func3()</span><br><span class="line"></span><br><span class="line"> <span class="keyword">if</span> self.env2 == <span class="string">"2"</span>:</span><br><span class="line"> func2()</span><br><span class="line"> <span class="keyword">else</span>:</span><br><span class="line"> func4()</span><br></pre></td></tr></table></figure><h2 id="针对多个文件使用配置文件的情况"><a href="#针对多个文件使用配置文件的情况" class="headerlink" title="针对多个文件使用配置文件的情况"></a>针对多个文件使用配置文件的情况</h2><p>如果此时有几十个代码文件都需要读取配置文件,获取其中的变量并执行对应的代码,总不能每个文件都创建一个类对象并初始化吧。你说参数传递?如果函数的传参很困难又该怎么办呢?具体而言,当开发后端的时候,<code>main.py</code> 读取配置文件并得到了 <code>env=debug</code>,此时打开了网页,点击一些按钮完成一些交互,则 <code>web</code> 端会通过 <code>js</code> 发起了一个 <code>post</code> 请求,告诉你需要执行某些代码,这个请求被 <code>handler.py</code> 拦截到。</p><p>此时存在一个问题:<code>handler.py</code> 中的 <code>get</code> 方法拦截到 <code>web</code> 端请求,并不是 <code>main.py</code> 直接将请求发送到 <code>handler.py</code>。所以此时不能直接传递参数,<code>handler.py</code> 也并不知道 <code>env=debug</code>,所以可能不知道执行哪些代码。再去重新实例化一个类?几十个代码文件都去实例化同一个类,未免浪费空间。</p><p>简单的参数可以加到 <code>post</code> 请求的 <code>url</code> 里,但是当参数高达十几个时,传参和接收参数这会很麻烦。何况配置文件就在那里,<code>handler.py</code> 直接获取会方便很多。这个时候建议将配置文件写到 <code>config.py</code> 中,但不是以类的形式,而是直接写入变量并赋值,如 <code>ENV="DEBUG"</code>。当任何文件需要读取这一变量时,直接 <code>import config; config.ENV</code> 便可获取。有点类似 <code>C</code> 语言中的 <code>#define</code>。</p><h2 id="yaml-或者-json?"><a href="#yaml-或者-json?" 
class="headerlink" title="yaml 或者 json?"></a>yaml 或者 json?</h2><p>还有一些通过读取 <code>yaml</code>,<code>json</code> 等配置文件来生成变量的,但是这会不可避免的增加代码中的硬编码,而且只能获取变量。根据变量去判断执行哪些方法需要单独实现,所以没有考虑使用。具体而言:</p><ul><li><p>对于情景一中的代码,用类实现配置文件的话可以直接调用类内的 <code>__setup()</code> 方法。如果是 <code>yaml</code> 文件,从文件加载到 <code>env1, env2</code> 后,需要单独去写情景一例子中的 <code>__setup()</code> 方法,不如封装到类内方便。</p></li><li><p>对于情景二,如果几十个代码文件都去执行 <code>import yaml; yaml.load()</code> 来获取配置文件中的变量,这又会造成大量的文件 <code>IO</code>,没有意义。这也是我不考虑使用 <code>yaml,json</code> 作为配置文件的原因。</p></li></ul><h1 id="减少代码的硬编码"><a href="#减少代码的硬编码" class="headerlink" title="减少代码的硬编码"></a>减少代码的硬编码</h1><p>在有了配置文件后,可以有效减少代码中的硬编码,增强代码的可维护性。比如创建了一个字典:</p><figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">data[<span class="string">"name"</span>] = ...</span><br><span class="line">data[<span class="string">"value"</span>] = ...</span><br><span class="line">data[<span class="string">"children"</span>] = ...</span><br></pre></td></tr></table></figure><p>但是此时后台的接口忽然发生了变化,<code>children</code> 这个名字忽然改成了 <code>subfunc</code>,后台解析只认 <code>data["subfunc"]</code> 这个字段,上面的写法需要去所有代码文件里一个个的搜索 <code>"children"</code> 并替换为 <code>"subfunc"</code>,显然是很累又不得不干的活。这个时候可以使用配置文件:</p><figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">config.py</span><br><span class="line">CHILDREN = <span class="string">"children"</span></span><br><span class="line"></span><br><span class="line">main.py</span><br><span class="line"><span class="keyword">import</span> config</span><br><span class="line">data[config.CHILDREN] = ...</span><br></pre></td></tr></table></figure><p>如果再遇到 
<code>children</code> 名字改成了 <code>subfunc</code>,只需要在 <code>config.py</code> 里修改 <code>CHILDREN</code> 的取值就可以了,只需要修改一次,比上面的实现优雅一些。</p><h2 id="重灾区:函数返回值"><a href="#重灾区:函数返回值" class="headerlink" title="重灾区:函数返回值"></a>重灾区:函数返回值</h2><p>另一个硬编码重灾区是函数的返回值,众所周知 <code>python</code> 函数是可以有多个返回值的,对于暂时不需要的返回值可以用下划线忽略掉。</p><figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">def</span> <span class="title function_">func</span>():</span><br><span class="line"> <span class="keyword">return</span> name, info, value, key, address, flag, context</span><br><span class="line"></span><br><span class="line">name, info, value, key, address, _, context = func()</span><br></pre></td></tr></table></figure><p>其实上面获取函数返回值的形式更像列表的切片:</p><figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">def</span> <span class="title function_">func</span>():</span><br><span class="line"> <span class="keyword">return</span> name, info, value, key, address, flag, context</span><br><span class="line"></span><br><span class="line">return_val = func()</span><br><span class="line">name = return_val[<span class="number">0</span>]</span><br><span class="line">info = return_val[<span class="number">1</span>]</span><br><span class="line">value = return_val[<span class="number">2</span>]</span><br><span class="line">key = return_val[<span class="number">3</span>]</span><br><span class="line">address = return_val[<span 
class="number">4</span>]</span><br><span class="line">context = return_val[<span class="number">6</span>]</span><br></pre></td></tr></table></figure><p>可以看到,如果要调用 <code>func</code> 函数,就必须牢记返回值的顺序,当代码文件很多时并不友好,也不优雅。当需要增加或减少返回值的数量时,切片访问函数返回值的形式也很难处理。比如当不需要返回 <code>name</code> 字段时,或者需要增加一个 <code>param</code> 参数,下标都需要修改。增加返回值时, 别说把这个返回值放到所有函数返回值的最后,这只是为了代码能运行起来做的妥协,没意思。以上情况对于调用 <code>func</code> 的函数而言都需要一个个手动修改,简直是一场灾难。</p><p>这个时候建议使用类对象或者字典,道理是一样的:</p><figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">def</span> <span class="title function_">func</span>():</span><br><span class="line"> <span class="keyword">return</span> {</span><br><span class="line"> <span class="string">"name"</span>: name,</span><br><span class="line"> <span class="string">"info"</span>: info</span><br><span class="line"> }</span><br></pre></td></tr></table></figure><p>这样,就在也不需要记住返回值的顺序,也不必担心函数增加或减少返回值,甚至不用关注返回值的顺序。都可以直接通过字典的 <code>key</code> 访问。你说 <code>"name", "info"</code> 这样的硬编码不好?可以用前面讲述的配置文件避免掉它呀。</p><p><code>C</code> 这种语言并不支持函数返回多个变量,需要返回多个变量时都是使用结构体来完成,这种想法值得借鉴。对于 <code>python</code> 语言,字典也好,类对象也罢(对象的话就是通过成员访问),取决于具体的适用场景,但是都可以避免通过切片这样的硬编码方式去获取函数的返回值。</p><h2 id="使用类规范函数返回值"><a href="#使用类规范函数返回值" class="headerlink" title="使用类规范函数返回值"></a>使用类规范函数返回值</h2><p>对于一个函数,接受原生的数据 <code>raw_data</code> 完成解析,并返回各种信息数据:</p><figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">def</span> <span class="title function_">func</span>(<span class="params">raw_data</span>):</span><br><span class="line"> ...</span><br><span class="line"> <span class="keyword">return</span> info1, info2, info3, info4, 
info5, info6</span><br></pre></td></tr></table></figure><p>When other functions use the return value, they do not need all of info1 through info6. Sometimes only <code>info1</code> and <code>info4</code> are needed, and there are two poor ways to write that:</p><figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment"># 1.</span></span><br><span class="line">info1, _, _, info4, _, _ = func(raw_data)</span><br><span class="line"></span><br><span class="line"><span class="comment"># 2.</span></span><br><span class="line">data = func(raw_data)</span><br><span class="line">info1 = data[<span class="number">0</span>]</span><br><span class="line">info4 = data[<span class="number">3</span>]</span><br></pre></td></tr></table></figure><p>With either form, any change to <code>func</code>, such as adding return values or removing unused ones, is a disaster for maintenance. Never assume that requirements will not change, and never assume that an interface stays fixed just because you program against it; you never know what strange new requirement or change is coming. Even when programming against an interface, having to remember what each function returns, and in what order, is tiring.</p><p>Besides the dictionary or class described above, there is one more approach:</p><figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">class</span> <span class="title class_">Info</span>:</span><br><span 
class="line"> <span class="keyword">def</span> <span class="title function_">__init__</span>(<span class="params">self</span>):</span><br><span class="line"> self.__idx = {</span><br><span class="line"> <span class="string">"info1"</span> : <span class="number">0</span>,</span><br><span class="line"> <span class="string">"info2"</span> : <span class="number">1</span>,</span><br><span class="line"> <span class="string">"info3"</span> : <span class="number">2</span>,</span><br><span class="line"> <span class="string">"info4"</span> : <span class="number">3</span>,</span><br><span class="line"> <span class="string">"info5"</span> : <span class="number">4</span>,</span><br><span class="line"> <span class="string">"info6"</span> : <span class="number">5</span>,</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="keyword">def</span> <span class="title function_">get_item</span>(<span class="params">self, data, args</span>):</span><br><span class="line"> return_val = []</span><br><span class="line"> <span class="keyword">for</span> i <span class="keyword">in</span> args:</span><br><span class="line"> return_val.append(data[self.__idx[i]])</span><br><span class="line"> <span class="keyword">if</span> <span class="built_in">len</span>(return_val) == <span class="number">1</span>:</span><br><span class="line"> <span class="keyword">return</span> return_val[<span class="number">0</span>]</span><br><span class="line"> <span class="keyword">return</span> return_val</span><br><span class="line"></span><br><span class="line">info = Info()</span><br><span class="line">info1, info2 = info.get_item(func(raw_data), [<span class="string">"info1"</span>, <span class="string">"info2"</span>])</span><br></pre></td></tr></table></figure><p>You only need to create one object and pass <code>get_item</code> the names you want, in the order you want them. Even if the order or number of values returned by <code>func</code> changes, only the <code>__idx</code> member has to be updated.</p><p>Does this look less simple than a dictionary? More precisely, it has its own use case: function <code>A</code> needs <code>postA</code> post-processing after calling 
<code>info.get_item</code>, and function <code>B</code> needs <code>postB</code> post-processing after calling <code>info.get_item</code>. In that case <code>postA</code> and <code>postB</code> can both live inside <code>class Info</code>, which gathers scattered code with the same logic into one place. As for hard-coded strings like <code>"info1"</code> and <code>"info2"</code>, they can be avoided with the techniques described earlier.</p><p>Note that this implementation is relatively slow. If the method is called everywhere, it adds to the program's running time. I found this with <code>py-spy + speedscope</code>, two tools I recommend for spotting performance bottlenecks in <code>python</code> code.</p><h1 id="关于代码的组织架构"><a href="#关于代码的组织架构" class="headerlink" title="关于代码的组织架构"></a>On code organization</h1><p>Every file and folder should have one clear responsibility. Do not be afraid of the extra work: write a proper <code>__init__.py</code>, and do not throw many files into a single folder (or no folder at all) and call them at random. After some time, or when someone else has to use the code, it becomes a real mess. For this task I implemented the classic <code>MVC</code> pattern.</p><ul><li><code>model</code> parses the data and stores and maintains the data structures. If a piece of data cannot be obtained directly, an accessor for it can be added to <code>model</code>. I suggest wrapping <code>model</code> in a class: one method reads the file, parses it into data structures, and stores them in class members. Accessors can then return the data easily, and you avoid re-reading files and the copying cost of passing data around. Let one object own the data and operate on it through that object's interface, instead of reading it into global variables that any code or function can touch at will.</li><li><code>view</code> presents the data: in what form and structure it is shown to the user, whether as a UI, an output file, or command-line output;</li><li><code>control</code> handles interaction: it captures the user's request, calls the <code>model</code> interface to get the requested data, then calls the <code>view</code> interface to return it to the user.</li></ul><p>When many kinds of data have to be served, the development effort is concentrated in <code>model</code>, because <code>control</code> only calls the accessors and <code>view</code> only displays the data. When data of type <code>A</code> is needed, <code>control</code> calls <code>model</code>'s <code>getA()</code> method; when data of type <code>B</code> is needed, it calls <code>model</code>'s <code>getB()</code> method.</p><p>The key is how to implement these two methods and how to design efficient data structures that reduce copying and speed up retrieval. You cannot re-read the file in <code>getA()</code> and read it again in <code>getB()</code>, right? That takes real effort in <code>model</code>; this time, for example, I parsed the data quickly with the classic <code>dfs</code> plus post-order tree traversal. <del>So grinding leetcode was not in vain after all</del></p><h1 id="关于代码维护"><a href="#关于代码维护" class="headerlink" title="关于代码维护"></a>On code maintenance</h1><p>Developing <code>model</code> taught me something else as well: encapsulate each module as independently as possible and aim for the legendary high-cohesion, low-coupling code. With many functions the code can look a little messy (why are there functions everywhere?), but there is an important benefit: code and data are easy to reuse. To add a feature you write only a small function; the rest may already exist and can simply be called, which is also less error-prone.</p><p>If you write one big function for feature <code>A</code> and another big function for feature 
<code>B</code>, the two will operate on overlapping variables and duplicate some logic. When that logic gets complicated, mistakes are hard to avoid. I strongly recommend splitting the features apart.</p><p>Low coupling combined with a configuration file also handles scenarios that are not yet settled. Your manager says: for now there are five types, <code>A,B,C,D,E</code>, which need to be handled by category, and this may change later. You eagerly use the types as dictionary <code>key</code>s and finish the categorized handling.</p><p>One day the manager says: group <code>A,B,C</code> as type 1 and <code>D,E</code> as type 2, and create a different folder for each type, though this may change again. Less than half an hour later you are told to move <code>D</code> into type 1, rename <code>A</code> to <code>Afunc</code>, delete type 2, and add <code>F,G,H</code> as type 3. Both the types and the mapping have to change, and editing them inside long functions is tiring and error-prone. Instead, put a small mapping function in the configuration file, and edit and call only that function each time. It is named <code>map_type</code> here so that it does not shadow Python's built-in <code>map</code>.</p><figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">def</span> <span class="title function_">map_type</span>(<span class="params">name</span>):</span><br><span class="line"> <span class="keyword">if</span> name <span class="keyword">in</span> [<span class="string">"Afunc"</span>, <span class="string">"B"</span>, <span class="string">"C"</span>, <span class="string">"D"</span>]:</span><br><span class="line"> <span class="keyword">return</span> <span class="number">1</span></span><br><span class="line"> <span class="keyword">elif</span> name <span class="keyword">in</span> [<span class="string">"F"</span>, <span class="string">"G"</span>, <span class="string">"H"</span>]:</span><br><span class="line"> <span class="keyword">return</span> <span class="number">3</span></span><br><span class="line"> <span class="keyword">else</span>:</span><br><span class="line"> <span class="keyword">return</span> -<span 
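class="number">1</span></span><br></pre></td></tr></table></figure><p>In summary, <strong>do not assume that requirements never change. Code written on that assumption is poor, and it is hard to modify when the requirements do change.</strong></p><ul><li>It is supposed to be this way; no other case exists;</li><li>That case will never happen, so let's ignore it for now;</li></ul><p>Programmers should avoid thinking like this. Otherwise writing the code feels great for a moment and changing it is a nightmare. Scenarios change, and requirements are always changing. Handle exceptional cases properly, reduce hard-coding, lower the coupling between features, program against interfaces, and put the design patterns you have learned to use. Avoid large code changes when requirements shift, and adapt to new requirements by adding new interfaces and new functions wherever possible.</p>]]></content>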
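<summary type="html"><p>As a newcomer to the workplace and a novice programmer, I am boldly starting a new series, "How to write better programs". Everything in it comes from my own reflections while writing real code, with sensitive details removed. The series involves no complex techniques and no hard-to-read code. It simply exposes the core problems and asks how, in those situations, to write code that is more maintainable, more concise, and more elegant.</p>
<p>Using <code>python</code> as the example, this post covers how to use configuration files and how to reduce hard-coding, and extends from there to code organization and maintainability.</p></summary>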
<category term="Design" scheme="https://muyuuuu.github.io/tags/Design/"/>
</entry>
<entry>
<title>git hands-on notes</title>
<link href="https://muyuuuu.github.io/2023/09/13/git-use-1/"/>
<id>https://muyuuuu.github.io/2023/09/13/git-use-1/</id>
<published>2023-09-13T15:16:44.000Z</published>
<updated>2024-11-27T16:40:30.206Z</updated>
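<content type="html"><![CDATA[<link rel="stylesheet" class="aplayer-secondary-style-marker" href="/assets/css/APlayer.min.css"><script src="/assets/js/APlayer.min.js" class="aplayer-secondary-script-marker"></script><p>I joined the company on 2023.7.10, two months and 3 days ago. The training schedule has been so tight that I have had no time to reflect on or organize anything technical. <del>Mostly because after work I just want to lie down and play with my phone, and on weekends I keep going out to have fun.</del> But learning without reflecting and organizing is a bad habit for a programmer. Once the training ends, I will look back on these two months, think about how to work better in the future, and even about how to rest and exercise better to stay energetic.</p><p>Back to the topic: <code>git</code> is an essential tool for programmers building frameworks and sharing code, yet <strong>my hands-on experience is so thin that I really do not know how to use it</strong>. Especially when collaborating with <code>pull, merge</code> or <code>reset</code>, I often leave the code in a mess. So this post records the pitfalls I have hit with <code>git</code>.</p><p>What to do in <code>git</code> depends heavily on the actual situation: which branch the local work is based on, what was changed, whether it is staged, whether it is committed, whether there are conflicts, and so on. When something goes wrong and you search online, the examples you find may not match your local situation, or only half match. You often do not know which commands to run, or whether they will leave the files in an unrecoverable mess.</p><p>In that case I suggest describing the actual situation and asking <code>GPT</code>. In my experience, 99.9% of the answers are usable.</p><span id="more"></span><h1 id="git-开发时,A-分支的代码泄漏到了-B-分支-?"><a href="#git-开发时,A-分支的代码泄漏到了-B-分支-?" class="headerlink" title="git 开发时,A 分支的代码泄漏到了 B 分支 ?"></a>During <code>git</code> development, code from branch A leaked into branch B?</h1><h2 id="问题背景"><a href="#问题背景" class="headerlink" title="问题背景"></a>Background</h2><p>The goal was for the <code>master</code> branch to contain only shared files such as <code>README.md, .gitignore, 3rdparty</code>.</p><ul><li>For task one, create a <code>dev1</code> branch and write the code in the <code>dev1</code> folder</li><li>For task two, create a <code>dev2</code> branch and write the code in the <code>dev2</code> folder</li></ul><p>This way the code of the <code>dev1</code> and <code>dev2</code> branches lives in different folders and does not interfere. When everything is finally merged into <code>master</code>, there are no conflicts either.</p><h2 id="错误操作"><a href="#错误操作" class="headerlink" title="错误操作"></a>The mistake</h2><p>A slip happened along the way: after finishing the code for the <code>dev1</code> task, I ran <code>git checkout -b dev2</code> while still on the <code>dev1</code> branch. As a result the <code>dev2</code> branch contained the <code>dev1</code> code, which is not elegant.</p><p>The training schedule was tight at the time, so I did not pay much attention to this. I simply deleted the <code>dev1</code> folder by hand on the <code>dev2</code> branch. <code>git status</code> then showed a long list of <code>delete</code> entries, which were committed with the <code>dev2</code> branch and pushed to <code>gitlab</code>, so the <code>merge</code> showed many pointless file deletions.</p><h2 id="正确做法"><a href="#正确做法" class="headerlink" 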
title="正确做法"></a>The correct approach</h2><p>As the course went on, the framework grew and the code files became more complex. With so little hands-on <code>git</code> experience, I worried that one wrong <code>git</code> command would leave the branches or files in chaos. So I came back to this problem and found the correct approach after a few simple local experiments.</p><p>After finishing and committing the code on the <code>dev1</code> branch, run <code>git checkout master</code> and create the <code>dev2</code> branch from <code>master</code>. Only then is <code>dev2</code> free of the <code>dev1</code> code, which keeps the commit information clean.</p><h1 id="记一次代码污染"><a href="#记一次代码污染" class="headerlink" title="记一次代码污染"></a>A case of code pollution</h1><h2 id="背景"><a href="#背景" class="headerlink" title="背景"></a>Background</h2><p>The cause: the requirement was to submit my local <code>local</code> branch to the <code>develop</code> branch. I took that to mean pushing my local <code>local</code> branch directly to <code>develop</code> and then opening a <code>PR</code> to <code>master</code>. So I ran:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">git push -u origin local:develop</span><br></pre></td></tr></table></figure><p>This polluted the code. Other people may be developing on top of <code>develop</code>, and my <code>local</code> code went straight into the remote <code>develop</code>.</p><ul><li>When others push their code, they run into conflicts;</li><li>When others fetch <code>develop</code>, they get my <code>local</code> code, which has not been reviewed or tested, and the person integrating the modules has not handled the exceptions my module may raise. So errors at runtime are quite likely.</li></ul><h2 id="正确做法-1"><a href="#正确做法-1" class="headerlink" title="正确做法"></a>The correct approach</h2><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">git push -u origin local</span><br></pre></td></tr></table></figure><p>This creates a <code>local</code> branch in the remote repository; the <code>PR</code> is then opened from <code>local</code> into <code>develop</code>. Why add the <code>-u</code> option?</p><p>Suppose you cloned a remote repository with <code>git clone</code> and created a new branch named <code>A</code> with <code>git checkout -b A</code>. The new branch exists only locally and has no upstream branch yet, so with default settings a bare <code>git push</code> refuses to guess where to send it.</p><p>When pushing, you therefore name both the remote and the branch. Note that <code>git push A</code> does not do this: the first argument to <code>git push</code> is read as the remote, so <code>git</code> looks for a remote called 
<code>A</code> and reports an error when there is none.</p><p>So, to push the local <code>A</code> branch and create a branch named <code>A</code> in the remote repository, use:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">git push -u origin A</span><br></pre></td></tr></table></figure><p>This pushes the local <code>A</code> branch to the remote named <code>origin</code> and creates a branch named <code>A</code> there. <code>-u</code> also records that branch as the upstream, so later <code>git push</code> and <code>git pull</code> on this branch need no arguments.</p><h2 id="使用代码回撤来解决代码污染"><a href="#使用代码回撤来解决代码污染" class="headerlink" title="使用代码回撤来解决代码污染"></a>Rolling back to fix the pollution</h2><p>But the mistake has already been made, so the pollution has to be rolled back. Use <code>git reflog</code> to list the recent history of the local repository and find the commit that <code>develop</code> pointed to before the bad push, then use <code>git reset</code> to move <code>develop</code> back to it.</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line">$ git reflog</span><br><span class="line"></span><br><span class="line">...</span><br><span class="line">HEAD@{1}: commit: <commit message></span><br><span class="line">HEAD@{2}: commit: <commit message></span><br><span class="line">HEAD@{3}: commit: <commit message></span><br><span class="line">HEAD@{4}: commit: <commit message></span><br><span class="line">HEAD@{5}: commit: <commit message></span><br><span class="line">HEAD@{6}: commit: <commit message></span><br><span class="line">HEAD@{7}: commit: <commit message></span><br><span class="line">...</span><br></pre></td></tr></table></figure><p>Find the last good commit of the <code>develop</code> branch and note its hash. With the local <code>develop</code> branch checked out, run <code>git reset</code> to move it back to that commit. For example, if the hash of that commit is 
<code>abc123</code>, run:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">$ git reset --hard abc123</span><br></pre></td></tr></table></figure><p>Then run <code>git push --force</code> to force-push the local <code>develop</code> branch to the remote repository. Note that this overwrites the remote <code>develop</code> branch, so make sure you have found the right commit. This restores the code that <code>develop</code> had before.</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">$ git push --force origin develop</span><br></pre></td></tr></table></figure><h1 id="git-实用命令"><a href="#git-实用命令" class="headerlink" title="git 实用命令"></a>Useful git commands</h1><h2 id="一般的开发流程"><a href="#一般的开发流程" class="headerlink" title="一般的开发流程"></a>A typical development workflow</h2><p>First clone the repository</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">git clone git@xxx.git</span><br></pre></td></tr></table></figure><p>Create a local branch that tracks a remote branch</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">git branch -a # list branches</span><br><span class="line">git checkout -b local_branch remote_branch # create and switch to a local branch based on the remote branch</span><br><span class="line"></span><br></pre></td></tr></table></figure><h2 id="获取新分支"><a href="#获取新分支" class="headerlink" title="获取新分支"></a>Fetching new branches</h2><p>One day after the <code>clone</code>, a new branch was pushed to the remote repository, so it does not exist locally. To look at the code on the new branch, update the remote-tracking branches:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">git remote update origin -p</span><br></pre></td></tr></table></figure><h2 id="暂存修改"><a href="#暂存修改" class="headerlink" title="暂存修改"></a>Stashing changes</h2><p>While developing on a new branch, an urgent task comes up that requires switching to another branch to fix a bug. But the code on the new branch is barely started and not ready to 
<code>commit</code>. A plain <code>git checkout</code> may refuse to switch, because uncommitted changes would be overwritten. In that case, stash the changes:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">git stash push -m "your_label" # stash the current uncommitted changes</span><br><span class="line">git checkout <branch_name> # switch to another branch</span><br></pre></td></tr></table></figure><p><code>-m</code> attaches a label to the <code>stash</code> entry so that it is easy to find when restoring it later. The older <code>git stash save "your_label"</code> form is deprecated in favor of <code>git stash push</code>.</p><p>Note that by default <code>stash</code> does not include newly added (untracked) files, only modifications to tracked files. To include untracked files as well:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">git stash push --include-untracked -m "your_label" # also stashes newly added files</span><br></pre></td></tr></table></figure><p>When you have finished the other work and switched back to the original branch, first run <code>git pull</code> to get the remote updates and resolve any conflicts, then restore the stashed changes with:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">git stash pop</span><br></pre></td></tr></table></figure><p>The following is not recommended, because it throws away the changes on the current branch:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">git checkout -f <branch_name> # switch to another branch and discard uncommitted changes</span><br></pre></td></tr></table></figure><h2 id="git-丢弃本地的修改"><a href="#git-丢弃本地的修改" class="headerlink" title="git 丢弃本地的修改"></a>Discarding local changes in git</h2><p>When the code has been edited into a mess and you no longer want it:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">git reset --hard HEAD</span><br></pre></td></tr></table></figure><p>To undo the changes to a single file (the file itself stays in the working tree): <code>git checkout scripts/run_android.sh</code> </p><h2 id="临时代码推送"><a href="#临时代码推送" class="headerlink" title="临时代码推送"></a>Pushing throwaway code</h2><p>I created a temporary folder to reproduce a problem and needed to push that code to some repository. After <code>git init</code>, 
add the remote repository:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">git remote add origin git@xxx:xxx.git</span><br></pre></td></tr></table></figure><p>Because the repository was just created, the current branch is <code>master</code>. The command below pushes the local <code>master</code> branch to the remote <code>test</code> branch (created automatically if it does not exist on the remote):</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">git push origin master:test # without "master:" this fails, because there is no local test branch</span><br></pre></td></tr></table></figure><h2 id="修改错别字,不值得重新-commit"><a href="#修改错别字,不值得重新-commit" class="headerlink" title="修改错别字,不值得重新 commit"></a>Fixing a typo that does not deserve a new commit</h2><p>First fix the small mistake, then:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">git add .</span><br><span class="line">git commit --amend</span><br></pre></td></tr></table></figure><p>If the original commit had already been pushed, a plain <code>push</code> is rejected at this point: <code>--amend</code> replaces that commit with a new one, so the local branch no longer fast-forwards from the remote. If you are pushing to your own branch and nobody else's work is affected, you can simply run:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">git push origin master:test -f</span><br></pre></td></tr></table></figure><p>The repository then shows a single <code>commit</code>. Without a force push, you run into the following:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line">To git@github.xxxx.git</span><br><span class="line"> ! 
[rejected] master -> main (non-fast-forward)</span><br><span class="line">error: failed to push some refs to 'git@github.xxxx.git'</span><br><span class="line">hint: Updates were rejected because a pushed branch tip is behind its remote</span><br><span class="line">hint: counterpart. Check out this branch and integrate the remote changes</span><br><span class="line">hint: (e.g. 'git pull ...') before pushing again.</span><br><span class="line">hint: See the 'Note about fast-forwards' in 'git push --help' for details.</span><br></pre></td></tr></table></figure><p>The cause: before this <code>push</code> there had been a <code>git commit --amend</code> to fix a typo, and that amended commit was never pushed. So when I changed the code again and committed, the local and remote histories had diverged. The same place in the same file had different content on each side, it could not be merged automatically, and the <code>push</code> was rejected.</p><p>At this point run <code>git pull</code> and resolve the conflict by hand with a <code>merge</code>. In <code>vscode</code>, check what changed; to keep your current version, click <code>accept current change</code>. After another round of <code>git add</code>, <code>commit</code> and <code>push</code>, everything is fine.</p><h2 id="代码写到一半,需要同步同事的代码"><a href="#代码写到一半,需要同步同事的代码" class="headerlink" title="代码写到一半,需要同步同事的代码"></a>Halfway through your work, you need a colleague's code</h2><p>Just <code>pull</code> on the current branch, finish your own code, and push to the branch again. Suppose the remote branch is called <code>B</code>, you created branch <code>A</code> from <code>B</code> with <code>checkout -b</code>, and you are working on <code>A</code>. The remote branch has since been updated with some <code>merge</code>d changes. To bring the updates from <code>B</code> into <code>A</code>:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">git pull origin B</span><br></pre></td></tr></table></figure><h2 id="git-查看本地分支对应的远程分支名"><a href="#git-查看本地分支对应的远程分支名" class="headerlink" title="git 查看本地分支对应的远程分支名"></a>Showing which remote branch a local branch tracks</h2><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">git branch -vv</span><br></pre></td></tr></table></figure><h2 id="git-查看分支差异"><a href="#git-查看分支差异" class="headerlink" title="git 查看分支差异"></a>Showing the difference between branches</h2><p>Say you created branch 
<code>A</code> from branch <code>B</code>, then modified and committed on <code>A</code>. Once committed, the changes are no longer visible in editors such as <code>vscode</code>. The command below shows the difference between <code>B</code> and <code>A</code>, that is, what <code>A</code> changed; lines added on <code>A</code> appear as additions.</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">git diff B A -- file_path</span><br></pre></td></tr></table></figure><h2 id="多次-commit-记录合并"><a href="#多次-commit-记录合并" class="headerlink" title="多次 commit 记录合并"></a>Squashing several <code>commit</code>s into one</h2><p>To keep the commit history tidy, <code>git rebase</code> can combine several <code>commit</code>s into one while keeping the code changes. The steps are:</p><ol><li><p>Use <code>git log</code> to find the <code>commit</code>s you want to combine. For example, to squash the following 3 <code>commit</code>s (newest first) into one:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">$ git log --oneline</span><br><span class="line">3a2b1c3 Add feature A</span><br><span class="line">2b3c4d5 Fix bug B</span><br><span class="line">1c2d3e4 Implement feature C</span><br></pre></td></tr></table></figure></li><li><p>Run <code>git rebase -i HEAD~3</code> to open the interactive <code>rebase</code> editor, where <code>HEAD~3</code> selects the last 3 <code>commit</code>s. The editor lists them oldest first, the reverse of <code>git log</code>. Leave the first line as <code>pick</code> and change the second and third to <code>squash</code>, which folds them into the first <code>commit</code>. For example:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">pick 1c2d3e4 Implement feature C</span><br><span class="line">squash 2b3c4d5 Fix bug B</span><br><span class="line">squash 3a2b1c3 Add feature A</span><br></pre></td></tr></table></figure><p>Save and close the editor.</p></li></ol><p><code>pick</code> applies a commit as it is, while <code>squash</code> merges a commit into the one listed before it, which is how several commits become one.</p><ol start="3"><li><p><code>git</code> 
then opens another editor for the message of the combined <code>commit</code>. You can keep the message of the first <code>commit</code> or write a new one. Save and close the editor.</p></li><li><p>Use <code>git push --force</code> to push the rewritten <code>commit</code> history to the remote repository. Because of <code>--force</code>, this overwrites the history in the remote repository, so make sure it does not affect anyone else's work.</p></li></ol><h2 id="git-reset-用法"><a href="#git-reset-用法" class="headerlink" title="git reset 用法"></a>How to use git reset</h2><p><code>git reset</code> moves the current branch's <code>HEAD</code> to a given commit, and optionally also changes the staging area and the working tree. <code>--hard</code> and <code>--soft</code> are two of its options; they differ in whether the staging area and working tree are touched.</p><ul><li><p><code>--hard</code> resets <code>HEAD</code>, the staging area, and the working tree to the given commit. All uncommitted changes are discarded, and the files in the working tree are overwritten with the files from that commit.</p></li><li><p><code>--soft</code> only moves <code>HEAD</code> to the given commit and leaves the staging area and working tree alone. All uncommitted changes stay in the working tree, and the changes from the commits you moved past remain staged; <code>git status</code> shows their state.</p></li></ul><p>In general, use <code>--hard</code> to throw away all uncommitted changes and return to the given commit. Use <code>--soft</code> to move <code>HEAD</code> to the given commit while keeping your changes.</p><p>I have not used this command much; I will add more when I meet it in a real scenario.</p>]]></content>
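<summary type="html"><p>I joined the company on 2023.7.10, two months and 3 days ago. The training schedule has been so tight that I have had no time to reflect on or organize anything technical. <del>Mostly because after work I just want to lie down and play with my phone, and on weekends I keep going out to have fun.</del> But learning without reflecting and organizing is a bad habit for a programmer. Once the training ends, I will look back on these two months, think about how to work better in the future, and even about how to rest and exercise better to stay energetic.</p>
<p>Back to the topic: <code>git</code> is an essential tool for programmers building frameworks and sharing code, yet <strong>my hands-on experience is so thin that I really do not know how to use it</strong>. Especially when collaborating with <code>pull, merge</code> or <code>reset</code>, I often leave the code in a mess. So this post records the pitfalls I have hit with <code>git</code>.</p>
<p>What to do in <code>git</code> depends heavily on the actual situation: which branch the local work is based on, what was changed, whether it is staged, whether it is committed, whether there are conflicts, and so on. When something goes wrong and you search online, the examples you find may not match your local situation, or only half match. You often do not know which commands to run, or whether they will leave the files in an unrecoverable mess.</p>
<p>In that case I suggest describing the actual situation and asking <code>GPT</code>. In my experience, 99.9% of the answers are usable.</p></summary>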
<category term="git" scheme="https://muyuuuu.github.io/tags/git/"/>
</entry>
<entry>
<title>An elegant fix for hexo deploying an empty index.html</title>
<link href="https://muyuuuu.github.io/2023/07/04/hexo-index-null/"/>
<id>https://muyuuuu.github.io/2023/07/04/hexo-index-null/</id>
<published>2023-07-04T10:52:48.000Z</published>
<updated>2023-07-05T09:05:13.461Z</updated>
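<content type="html"><![CDATA[<link rel="stylesheet" class="aplayer-secondary-style-marker" href="/assets/css/APlayer.min.css"><script src="/assets/js/APlayer.min.js" class="aplayer-secondary-script-marker"></script><p>One idle day I suddenly remembered that I still have a blog. Mostly I had been too busy. <del>In truth I had been slacking and not learning anything new, and more than half a year of rest had not fixed that</del>. I tried to deploy it. Probably one of my rolling Linux updates had upgraded <code>Node.js</code>, and the new <code>Node.js</code> was too recent for my <code>hexo</code> version. As a result, after deploying, every <code>html</code> file in the github repository was empty. Most blog posts online tell you to downgrade <code>Node.js</code>, but that is not a lasting solution, so it is better to upgrade <code>hexo</code> instead.</p><span id="more"></span><p>You may have met this at university: some code or software will not run, and when you ask a senior student or a teacher, they say your version is too new, the new version does not work well, switch to the old one they use and it will be fine. Some people always give in and pick the old software just so they can easily ask their teacher or seniors for help. But from the point of view of software development and maintenance, software keeps moving; old versions are unmaintained or lack features. Things keep developing, and even the ancients knew not to mark the boat to find the sword, so why cling to old software instead of choosing the new? For personal use, if you grit your teeth and work through a few bugs or version conflicts, the problem is solved. I digress. There are two solutions, downgrading <code>Node.js</code> or upgrading <code>hexo</code>, and this post recommends the latter.</p><h1 id="hexo-与-Node-js-的版本对应关系"><a href="#hexo-与-Node-js-的版本对应关系" class="headerlink" title="hexo 与 Node.js 的版本对应关系"></a>Version compatibility between hexo and Node.js</h1><p>The <code>hexo</code> <a href="https://hexo.io/zh-cn/docs/index.html">official documentation</a> lists which <code>Node.js</code> versions each <code>hexo</code> version supports:</p><div class="table-container"><table><thead><tr><th><code>hexo</code> version</th><th>Minimum <code>Node.js</code> version</th><th>Maximum <code>Node.js</code> version</th></tr></thead><tbody><tr><td>6.2+</td><td>12.13.0</td><td>latest</td></tr><tr><td>6.0+</td><td>12.13.0</td><td>18.5.0</td></tr><tr><td>5.0+</td><td>10.13.0</td><td>12.0.0</td></tr><tr><td>4.1 - 4.2</td><td>8.10</td><td>10.0.0</td></tr><tr><td>4.0</td><td>8.6</td><td>8.10.0</td></tr><tr><td>3.3 - 3.9</td><td>6.9</td><td>8.0.0</td></tr><tr><td>3.2 - 3.3</td><td>0.12</td><td>unknown</td></tr><tr><td>3.0 - 3.1</td><td>0.10 or iojs</td><td>unknown</td></tr><tr><td>0.0.1 - 2.8</td><td>0.10</td><td>unknown</td></tr></tbody></table></div><p>My blog was migrated to a new computer in early 2020 and <code>hexo</code> was still the old 3.9.0, while <code>Node.js</code> had been updated to 20.3.1. The versions did not match, so the blog was wiped in one go and none of the <code>html</code> files had any content.</p><h1 id="Plan1:Node-js-降级"><a href="#Plan1:Node-js-降级" 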
class="headerlink" title="Plan1:Node.js 降级"></a>Plan 1: downgrade Node.js</h1><p>Search in a browser and this is what most results suggest. I recommend managing <code>Node.js</code> versions with <code>nvm</code>, then pointing <code>nvm</code> at a mirror and installing whichever <code>Node.js</code> versions you need.</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">sudo pacman -S nvm # install</span><br><span class="line">export NVM_NODEJS_ORG_MIRROR=https://npm.taobao.org/mirrors/node/ # switch mirror</span><br><span class="line">nvm install 12.0.0 # pick a nodejs version that works with 3.9.0</span><br></pre></td></tr></table></figure><p>If no other odd bug gets in the way, these commands install <code>Node.js</code> 12.0. <code>hexo</code> uses the system-installed <code>Node.js</code> by default rather than the one installed by <code>nvm</code>, so every time you update the blog you have to switch the <code>Node.js</code> version with <code>nvm</code> before deploying:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">nvm use 12.0.0 # switch version</span><br><span class="line">hexo g -d # generate and deploy the blog</span><br></pre></td></tr></table></figure><p>And because the system <code>Node.js</code> that <code>hexo</code> picks up is not managed by <code>nvm</code>, you have to run the <code>use</code> command before every deploy, which is tedious and inelegant. The following command has no effect:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">nvm alias default 12.0.0</span><br></pre></td></tr></table></figure><p>The blog can now be deployed, but because the <code>hexo</code> version is so old, deploying still prints an error: <code>ERROR Plugin load failed: hexo-cli</code>, which is simply annoying to look at.</p><p><img data-src="https://user-images.githubusercontent.com/43681138/250651466-27494e36-ce61-41e4-b668-ed1f9cd2d7b5.png" alt></p><p>Also, I use the <code>fish</code> shell, where installing and using <code>nvm</code> takes a little extra effort. Here is a <a 
href="https://eshlox.net/2019/01/27/how-to-use-nvm-with-fish-shell/">tutorial</a>, in case I forget it some day.</p><h1 id="Plan2:hexo-升级"><a href="#Plan2:hexo-升级" class="headerlink" title="Plan2:hexo 升级"></a>Plan 2: upgrade hexo</h1><p>As said above, software upgrades are unavoidable, and switching versions with <code>nvm</code> before every deploy is too tedious. So why not upgrade <code>hexo</code> once and be done with it?</p><p>What I did was uninstall all of <code>npm,Node.js hexo</code> and reinstall them. Note: <code>nvm</code> is a version manager for <code>Node.js</code>, and <code>npm</code> is the package installer for <code>Node.js</code>, similar to python's <code>pip</code>:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">npm uninstall hexo-cli # uninstall hexo</span><br><span class="line">sudo pacman -Rsc -n nodejs # uninstall nodejs</span><br><span class="line">sudo pacman -Sy nodejs # reinstall nodejs</span><br><span class="line">sudo pacman -Sy npm # reinstall npm</span><br></pre></td></tr></table></figure><p>Then point <code>npm</code> at a mirror and install <code>hexo</code>. Note: if the installation hangs or fails for lack of permissions, prefix the commands below with <code>sudo</code>.</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">npm config set registry https://registry.npm.taobao.org # switch mirror</span><br><span class="line">npm install -g hexo-cli # install hexo</span><br></pre></td></tr></table></figure><p>However, I found that the installed <code>hexo</code> was still the old 3.9.0, so I upgraded <code>hexo</code> itself. Again, if a command below will not run, add <code>sudo</code>.</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line">npm cache clean -f # clear the cache</span><br><span 
class="line"># change into the Hexo root directory and run the following commands</span><br><span class="line">npm install -g npm-check # tool that lists which installed plugins can be upgraded</span><br><span class="line">npm install -g npm-upgrade # tool that upgrades the installed plugins</span><br><span class="line">npm-check</span><br><span class="line">npm-upgrade</span><br><span class="line"></span><br><span class="line">npm update # update Hexo and all plugins</span><br></pre></td></tr></table></figure><p>That upgrades <code>hexo</code>. In my case it went up to 6.3.0, which works with the latest <code>Node.js</code>, and deploying the blog now runs without any problem.</p><p>My blog configuration had not been updated for years, and it had one more conflict with the latest <code>hexo</code>: an error about <code>external_link</code>. To fix it, open the blog configuration file <code>_config.yml</code> and find:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">external_link: true # Open external links in new tab</span><br></pre></td></tr></table></figure><p>Change it to:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">external_link:</span><br><span class="line"> enable: true # Open external links in new tab</span><br><span class="line"> field: site # Apply to the whole site</span><br><span class="line"> exclude: ''</span><br></pre></td></tr></table></figure><p>With that, <code>hexo</code> deploys the blog without a single error, nice and clean.</p><h1 id="一些彩蛋"><a href="#一些彩蛋" class="headerlink" title="一些彩蛋"></a>A Few Easter Eggs</h1><p>At the time I had no idea how to solve this problem. I read all sorts of documentation at random, took many detours, made many mistakes, and only got things working after countless rounds of uninstalling and reinstalling. Along the way I accidentally deleted <code>node_modules</code>. I reinstalled the math rendering library afterwards, but environments such as <code>equation</code> and <code>aligned</code> still failed to render and showed up as garbage. Following <a href="https://mercer5.github.io/2021/10/31/hexo%E6%B8%B2%E6%9F%93%E6%95%B0%E5%AD%A6%E5%85%AC%E5%BC%8F/">this article</a> fixes the display-math rendering problem.</p><p>Time to go back to work. Maybe the blog will get updated more often once I learn new things? Oh, and one more thing, a funny bit I came across while reading the docs:</p><p><img data-src="https://s1.ax1x.com/2023/07/04/pCs78OO.png" alt></p><h1 id="参考"><a href="#参考" class="headerlink" title="参考"></a>References</h1><ul><li><a href="https://guoguocai.github.io/2022/06/05/%E5%A6%82%E4%BD%95%E5%8D%87%E7%BA%A7-Hexo-%E7%89%88%E6%9C%AC/">npm 升级 hexo</a></li><li><a href="https://boriskp.github.io/upgrade-hexo-to-v5-1-1/">hexo版本更新报错: “external_link”</a></li></ul>]]></content>
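<summary type="html"><p>One day, with nothing better to do, I suddenly remembered that I still have a blog. Mostly I had been too busy. <del>Actually I had just been slacking off, learning nothing new, and had not recovered even after more than half a year of rest.</del> I tried to deploy it. Probably one of my rolling Linux updates had upgraded <code>Node.js</code>, so the <code>Node.js</code> version was now too new for my <code>hexo</code> version. As a result, after deploying, every <code>html</code> file in the GitHub repository was empty. Most blog posts online tell you to downgrade <code>Node.js</code>, but that is not a lasting fix, so I upgraded <code>hexo</code> instead.</p></summary>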
<category term="Blog" scheme="https://muyuuuu.github.io/tags/Blog/"/>
</entry>
<entry>
<title>A Not-So-Formal Acknowledgment</title>
<link href="https://muyuuuu.github.io/2023/03/03/thanks-thesis/"/>
<id>https://muyuuuu.github.io/2023/03/03/thanks-thesis/</id>
<published>2023-03-03T09:18:27.000Z</published>
<updated>2023-11-13T16:01:55.211Z</updated>
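<content type="html"><![CDATA[<link rel="stylesheet" class="aplayer-secondary-style-marker" href="/assets/css/APlayer.min.css"><script src="/assets/js/APlayer.min.js" class="aplayer-secondary-script-marker"></script><p>Let me write a proper acknowledgment as a small ending to my student years. The acknowledgments in a thesis are too formulaic: the first half must thank your advisor at length for the opportunity and the training, the next quarter covers labmates, and parents come after that. You cannot thank yourself, and the last paragraph thanks the thesis committee. Anything that official is boring. So here are some acknowledgments that cannot go into the thesis.</p><span id="more"></span><p>I could not decide where to begin, so let me first thank <a href="https://github.com/note286">Carol</a>. The <code>xduts</code> template and its interface are so powerful that I could happily write my master's thesis in <code>TeX</code> without spending much effort on complicated formatting. I ran into no typesetting problems at all while using it, and Carol patiently answered all my questions. In <code>Carol</code>'s own words: he knows everything about a thesis except its content, down to small details such as PDF margin cropping. As graduation approached, I did not send formal thanks to anyone on WeChat; Carol was the only exception. I told him I hoped to meet people like him at work after graduating, which is probably the highest praise I can think of. I also said that once I had a job and an income I would donate to <code>xduts</code>. One day I suddenly remembered, and over time I donated 800 yuan in total, as sponsorship for an open-source project that runs on love.</p><p>I remember starting graduate school in 2020, full of fear and anxiety. I was young and did not know how to choose a good group or a good advisor, and after hearing various stories about my advisor I was so anxious I could hardly breathe. Things only got worse after enrollment; I often lay awake at night worrying about the future. Thanks to my dear senior ykc, who talked with us late at night more times than I can count, in my first year, my second year, and my third year, in the lab, on the sports field, in small restaurants. Although he was under great pressure himself, he did his best to calm us down. After every talk with him I felt settled and more determined to finish the degree. Thanks also to wz, the most senior student in the group, who shielded us from the advisor's pressure, always did his best to discuss problems with us, helped us through one crisis after another, and helped us a lot with everyday matters as well.</p><p>In November 2021 I hit a low point in my life, drifting through the days doing nothing. Thanks to my junior wzb, who worked with me on the hard parts of the Huawei operator development and took a lot of pressure off me. In January and February of this year he ran some of the experiments for my thesis, again sharing my pressure and anxiety and giving me the time and energy to write. I am truly grateful. I even wondered whether I should buy him a PS5 after graduation.</p><p>Besides that, because my group was tiny and had no collaboration or exchange of any kind, most of my social life moved online. Thanks to the friends in a chat group, who come from all over but met through writing code. Chatting with you used up 70% of my daily words, from coding to the philosophy of life, with the occasional dirty joke and gossip, and it made me feel less lonely.</p><p>Many thanks to the friends who supported me financially. In the second half of my first year I was very poor and too embarrassed to ask my family for money. Every day I rotated among celery, bean sprouts, vermicelli, cabbage, zucchini, tofu, and tomatoes, because they were cheap. After several months of that, these foods still turn my stomach. Out of necessity I started tutoring programming to earn some money. Thank you for every payment and every bit of support; they let me afford meat, new clothes, and high-speed rail tickets home, and let me live with more dignity. You are all good people, and I hope that after saying goodbye to your brief encounter with programming, you go on to better lives.</p><p>Special thanks to tcr, whom I met during that time. In August and September 2022, when job hunting was extremely stressful and I also fell seriously ill, she kept comforting and encouraging me, sending long messages and voice notes with lots of advice, hoping I would hold on and get through it, and even calling to explain the things I could not understand. Thanks are not enough for a kindness this great; I will treat her to dinner someday, the most expensive kind. Thanks also to qq, hkx, and bmh for being willing to chat with a madman like me. They absorbed most of my loneliness and gloom, and even after my many meltdowns they still tried to help me work through my emotions and stress. When hkx heard what I had gone through in graduate school, hkx immediately bought me a lot of snacks; when qq learned I could not sleep, qq mailed me sour jujube seeds. So I am not a lonely ghost after all.</p><p>While writing my thesis abstract last night, I remembered something. During a competition in 2018, at 4 a.m. on the last day, I was so tired I no longer knew who I was, so I told my teacher: I cannot write anymore, could you write the abstract for me? He said sure. I fell asleep on the spot and woke up at 8. My teacher was in his forties, yet he stayed up all night to finish everything for me, while I remained forever like a child. Ever since, whenever I write an abstract, I think of him. I am very grateful to my undergraduate teacher. He brought me into a new life and a new world, taught me programming and modeling, and set me on a different path. I still remember his words: put what you learn to use. And I remember the time that moved me most. In my third year I was hesitating over whether to enter a competition, and he said that if I wanted to go, he would keep the last slot for me and I could pick any teammates I liked; if I did not, he would not take anyone else for that slot. I was moved for a long time. That year more than 100 teams from the university took part and only 4 won first prize; mine was one of them. My certificate was printed on the school's recruitment poster, and step by step I earned a recommendation to graduate school.</p><p>Perhaps most of life is painful, with only a few happy moments, like the few sparkles on the surface of a river. But those few sparkles make the river more beautiful; they light up the many mediocre or difficult moments and keep us warm as we go on.</p><p>I even want to thank XM for giving me my first job, with a very attractive salary, in the direction I had longed for. As an undergraduate I studied field A and was interested in field B; in graduate school I studied field C and was interested in field E, but I had no background or project experience in E, so I prepared for field D when job hunting. In the end the position XM offered was in E. After all the detours I still ended up in my favorite field, and I am truly content. There is a touch of fate in it too: my first programming language was C++, taught in a first-year course, which sparked my interest in programming and led me to transfer into computer science; my future work will also be in C++, and I will be making a living with it for a long time.</p>]]></content>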
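<summary type="html"><p>Let me write a proper acknowledgment as a small ending to my student years. The acknowledgments in a thesis are too formulaic: the first half must thank your advisor at length for the opportunity and the training, the next quarter covers labmates, and parents come after that. You cannot thank yourself, and the last paragraph thanks the thesis committee. Anything that official is boring. So here are some acknowledgments that cannot go into the thesis.</p></summary>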
<category term="Life" scheme="https://muyuuuu.github.io/tags/Life/"/>
</entry>
<entry>
<title>2022, Some Random Thoughts</title>
<link href="https://muyuuuu.github.io/2022/12/02/2022/"/>
<id>https://muyuuuu.github.io/2022/12/02/2022/</id>
<published>2022-12-02T14:14:21.000Z</published>
<updated>2022-12-11T09:07:56.166Z</updated>
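<content type="html"><![CDATA[<link rel="stylesheet" class="aplayer-secondary-style-marker" href="/assets/css/APlayer.min.css"><script src="/assets/js/APlayer.min.js" class="aplayer-secondary-script-marker"></script><p>On December 1 six years ago<br>I ran a few laps on the sports field after PE class<br>Carrying the worn-out schoolbag I had used since junior high, I had duck steak rice on the first floor of Lanyuan<br>In the evening I went to the study room to work on calculus, integration by parts<br>Every December 1 since then I recall that ordinary day, which announces that the year has one month left<br>This year is no exception </p><span id="more"></span><p>When I was 12, I wondered why grown-ups did not like cartoons when they were so good. I always assumed that after 20, time would be as dense and as happy as in childhood: new experiences would open up faster than I could take them in, every event would stay vivid and unforgettable, life would be full of color, and there would always be new territory to play in.</p><p>Only in these years after 20 did I understand that, because of all kinds of subjective and objective barriers, life narrows after adulthood. You can only stay in one place; more and more things repeat, and time races by in the repetition. It always feels as if a year has passed without my having done anything.</p><p>You cannot have youth and appreciate it at the same time. I also roughly understand now what "it is good to be young" means: young people still have time to change things. Adults who want to change their situation can, but at a very, very high cost.</p><p>The calendar still turns over, but each year feels soaked in water, its edges blurring, no longer as distinct and full of anticipation as in childhood. These years feel like looping emotions, accumulated familiarity, and recurring numbness, as if I were living on inertia. Even when I meet an unexpected joy or a sudden grief, on calm reflection it seems like a second-hand thing I have already experienced before.</p><p>When I went to Xi'an this year it snowed heavily, and I thought of the saying that timely snow promises a good harvest. Today I completed the last step of job hunting, mailing the tripartite employment agreement, and it snowed heavily again. Two heavy snows, perhaps announcing the end of youth.</p><p>When things went badly before, I always told myself to work hard and put up with it, and later go somewhere good and leave this place for good. High school was like that, and so is graduate school.</p><p>Lately I have been busy with my thesis project. The university squeezes us harder than any capitalist would. I hope my blog can still produce something technical before I graduate.</p>]]></content>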
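<summary type="html"><p>On December 1 six years ago<br>I ran a few laps on the sports field after PE class<br>Carrying the worn-out schoolbag I had used since junior high, I had duck steak rice on the first floor of Lanyuan<br>In the evening I went to the study room to work on calculus, integration by parts<br>Every December 1 since then I recall that ordinary day, which announces that the year has one month left<br>This year is no exception </p></summary>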
<category term="Life" scheme="https://muyuuuu.github.io/tags/Life/"/>
</entry>
<entry>
<title>Autumn Recruitment Is Over</title>
<link href="https://muyuuuu.github.io/2022/10/23/fuck-autumn-recruitment/"/>
<id>https://muyuuuu.github.io/2022/10/23/fuck-autumn-recruitment/</id>
<published>2022-10-22T16:06:26.000Z</published>
<updated>2022-10-22T16:30:18.774Z</updated>
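<content type="html"><![CDATA[<link rel="stylesheet" class="aplayer-secondary-style-marker" href="/assets/css/APlayer.min.css"><script src="/assets/js/APlayer.min.js" class="aplayer-secondary-script-marker"></script><p>The bleak, bone-chilling autumn recruitment season is finally, damn it, over. I started applying in mid-April and got an offer in mid-October, six months in total. The job market is extremely tough: rejected at the resume stage, rejected at the written test, rejected at the interview, a harvester of rejection letters. I feel worn out.</p><span id="more"></span><p>Everyone seemed to start memorizing the standard interview questions in March. I thought that was boring and entered a competition instead. I was applying for algorithm positions, and out of laziness and for various other reasons I had not memorized anything by the time of my first interview; I could not even answer a question about vanishing gradients. I put it off until late June and did not study for long, about a month on and off. Before each interview I skimmed my notes and improvised the rest.</p><p>My groupmates were different. They applied for java development and had to memorize everything: java basics, multithreading, the JVM, frameworks, distributed systems, databases, networking, operating systems, and so on. If what I had to know could be memorized in a week, theirs would take at least ten. While job hunting, they slept in the lab.</p><p>The hardest stretch began on July 30. I clearly remember applying to 10 companies that day; apart from Kuaishou, which invited me to interview, every one rejected me, and Kuaishou then rejected me after the first round. One afternoon and evening in August I took four written tests back to back, pushing my limits, head spinning, hands shaking. From August to September I had a month of nonstop interviews and written tests. I never want to relive that dizziness.</p><p>Let me describe an interview at a certain top-tier tech company. It opened with two hard-level leetcode problems, and I solved them. I thought the rest would go smoothly, but for the rest of the interview the interviewer said only three words, "mm", "ah", "okay", and then it was over. Later I learned he had meant to use the coding problems to get rid of me; had I known, I would not have bothered. And an interview at a certain major hardware company: Do you know XX? I said no. Have you used XX? I said sorry, I have only heard of it. The interview ended right there, in under 5 minutes, with no questions at all about what was on my resume or what I work on.</p><p>The problems from JD, NetEase, and Tencent were hard enough to make you give up. In a math exam you can at least write "Solution:" when you are stuck; in a coding test you cannot even write a blank. I clearly remember that NetEase, JD, and Baidu all built problems around the string <code>red</code> this year. Red means warning, so maybe they were telling us how grim this year is. I handed in a blank paper for Ant Group's written test and skipped the later ones, not because I looked down on Ant but because I was truly exhausted. On ByteDance's test I could not solve a single problem, and staring at the screen for two hours was miserable. However many companies I applied to in July, that is how many rejection letters I received in August.</p><p>I transferred into computer science in 2017 because I liked coding. But for a long stretch of the recruitment season I developed a kind of coding PTSD: the sight of a coding problem made me dizzy and sleepy. I loathe grinding problems; I found no joy in writing code and none in learning, just the nauseating high-school flavor of studying for the sake of studying. Later, in interviews, even with very simple problems my first instinct was that I could not do it, and I overcomplicated easy problems. Take the longest palindromic substring: it can be solved by simple brute-force simulation, but as soon as I saw the word "longest" I thought of dynamic programming, and the program I wrote was so long and ugly that I could not stand it myself. Halfway through I just said I could not do it.</p><p>It was like preparing for the college entrance exam for ages and then sitting in the exam hall afraid, blank, and unable to start writing. It happened at Baidu, at Kuaishou, and at Didi; it happened to me, to my groupmates, and to my roommates. Everyone was worn down. The economy is bad and this year's job market is full of things to complain about. The interviewers at Baidu and Kuaishou had the best attitude, so credit to them.</p><p>The biggest surprise this year: I studied field A as an undergraduate and field B in graduate school, prepared for field C, and ended up with a job in field D. My offer has nothing to do with my ability. Computer science is so competitive that I was pushed out toward chips, healthcare, finance, VR, and other fields, none of them related to computer science. Interviews depend on ability? Wrong, it is all luck. Some companies' written tests are easy enough for a second-year undergraduate and the interviews go well; others make you want to shut off the screen and say good riddance. If I could, I would go back to my undergraduate campus and properly make up the fundamentals. In the end, interviews come down to the basics from university courses, and sadly I wasted those golden years.</p><p>Thanks to everyone who helped, especially my senior Tian, who rescued me from dire straits more than once.</p>]]></content>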
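<summary type="html"><p>The bleak, bone-chilling autumn recruitment season is finally, damn it, over. I started applying in mid-April and got an offer in mid-October, six months in total. The job market is extremely tough: rejected at the resume stage, rejected at the written test, rejected at the interview, a harvester of rejection letters. I feel worn out.</p></summary>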
<category term="Life" scheme="https://muyuuuu.github.io/tags/Life/"/>
</entry>
<entry>
<title>TorchScript from Scratch</title>
<link href="https://muyuuuu.github.io/2022/10/03/torch-jit-1/"/>
<id>https://muyuuuu.github.io/2022/10/03/torch-jit-1/</id>
<published>2022-10-02T16:04:09.000Z</published>
<updated>2022-10-02T16:08:35.887Z</updated>
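<content type="html"><![CDATA[<link rel="stylesheet" class="aplayer-secondary-style-marker" href="/assets/css/APlayer.min.css"><script src="/assets/js/APlayer.min.js" class="aplayer-secondary-script-marker"></script><p>The last time I wrote a proper blog post was in February this year. In May I wrote a competition summary, and everything else has been problem-solving and algorithms, which is rather dull. The hard days are over, so I plan to learn something about model deployment and contribute to a real open-source project, aiming to connect the whole chain of data, algorithms, and engineering. As everyone knows, things you rarely use are forgotten right after you learn them: <code>spark, Go</code>, and other things I learned but seldom use are long gone. So this time, after learning how to <code>trace</code> a model, I will try to deploy some software that actually runs.</p><span id="more"></span><h1 id="基本概念"><a href="#基本概念" class="headerlink" title="基本概念"></a>Basic Concepts</h1><p><code>TorchScript</code> is the <code>JIT</code> implementation of <code>PyTorch</code>. <code>JIT</code> stands for Just-In-Time compilation. The idea behind <code>JIT</code> shows up everywhere in deep learning; the most obvious example is the static graph created by model.compile in the <code>Keras</code> framework.</p><ul><li>A static graph is built first and run afterwards. The advantage is that the graph structure can be optimized before it runs, for example by constant folding and operator fusion, which gives a faster forward pass. The drawback is equally clear: variable values are visible only once the graph is running, as with <code>session.run</code> in <code>TensorFlow1.x</code>.</li><li>A dynamic graph is built as it runs. The advantage is that variable values can be inspected while the network is being assembled, which makes debugging easy. The drawback is that the forward pass is hard to optimize, because the framework does not know which operation comes next. Dynamic-graph frameworks trade some advanced features for ease of use.</li></ul><p>So what does <code>JIT</code> offer that makes a dynamic-graph framework like <code>torch</code> go down the <code>JIT</code> road as well? Or, put differently, when is <code>JIT</code> unavoidable? The rest of this section introduces <code>TorchScript</code> to show what <code>JIT</code> brings.</p><p><code>JIT</code> is the bridge between <code>Python</code> and <code>C++</code>. We can train a model in <code>Python</code> and then use <code>JIT</code> to convert it into a language-independent module that <code>C++</code> can call easily. "Train the model in <code>Python</code>, deploy it to production in <code>C++</code>" thus becomes easy for <code>PyTorch</code>. And because <code>C++</code> is used, a <code>PyTorch</code> model can be deployed to almost any platform or device: Raspberry Pi, iOS, Android, and so on. Otherwise every call to the model would have to go through <code>python</code>, which hurts performance considerably.</p><p>Since this feature exists for production deployment, it is also heavily optimized for performance. If inference is performance-critical, consider converting the model (<code>torch.nn.Module</code>) into a <code>TorchScript Module</code> before running inference. There are two ways to do the conversion:</p><ol><li>The simpler way to obtain a <code>TorchScript Module</code> is <code>Tracing</code>, which converts a <code>PyTorch</code> model (<code>torch.nn.Module</code>) directly into a <code>TorchScript Module</code>. As the name <code>trace</code> suggests, you supply an example input and let the model run <code>forward</code> once; the path that input takes through the model gives the structure of the graph. This works very well for models whose <code>forward</code> logic is simple, but if <code>forward</code> contains a lot of control flow it becomes a problem, because a single input cannot exercise every branch. <strong>Branches that are not executed are not captured by <code>trace</code>.</strong></li><li>Alternatively, write the model directly in the <code>TorchScript Language</code> to define a <code>PyTorch JIT Module</code>, then use <code>torch.jit.script</code> to convert it into a <code>TorchScript Module</code> and save it to a file. The <code>TorchScript Language</code> is itself <code>Python</code> code, so it can be written directly in a <code>Python</code> file. In <code>TensorFlow</code> graph mode, as we know, <code>Python</code>'s <code>if</code> cannot be used directly for conditional control; <code>tf.cond</code> is needed instead. In <code>TorchScript</code> we can still use <code>if</code>, <code>for</code>, and other control statements directly, so even with static graphs <code>PyTorch</code> stays easy to use.</li></ol><h1 id="简单例子"><a href="#简单例子" class="headerlink" title="简单例子"></a>Simple Examples</h1><h2 id="trace-方法"><a href="#trace-方法" class="headerlink" title="trace 方法"></a>The trace Method</h2><p>First define a simple model:</p><figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> torch</span><br><span class="line"></span><br><span class="line"><span class="keyword">class</span> <span class="title 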
class_">MyDecisionGate</span>(torch.nn.Module):</span><br><span class="line"> <span class="keyword">def</span> <span class="title function_">forward</span>(<span class="params">self, x</span>):</span><br><span class="line"> <span class="comment"># 分支判断</span></span><br><span class="line"> <span class="keyword">if</span> x.<span class="built_in">sum</span>() > <span class="number">0</span>:</span><br><span class="line"> <span class="keyword">return</span> x</span><br><span class="line"> <span class="keyword">else</span>:</span><br><span class="line"> <span class="keyword">return</span> -x</span><br><span class="line"></span><br><span class="line"><span class="keyword">class</span> <span class="title class_">MyCell</span>(torch.nn.Module):</span><br><span class="line"> <span class="keyword">def</span> <span class="title function_">__init__</span>(<span class="params">self</span>):</span><br><span class="line"> <span class="built_in">super</span>(MyCell, self).__init__()</span><br><span class="line"> self.dg = MyDecisionGate()</span><br><span class="line"> self.linear = torch.nn.Linear(<span class="number">4</span>, <span class="number">4</span>)</span><br><span class="line"></span><br><span class="line"> <span class="keyword">def</span> <span class="title function_">forward</span>(<span class="params">self, x, h</span>):</span><br><span class="line"> y = torch.tanh(self.dg(self.linear(x)) + h)</span><br><span class="line"> <span class="keyword">return</span> y</span><br><span class="line"></span><br><span class="line">my_cell = MyCell()</span><br><span class="line"><span class="built_in">print</span>(my_cell)</span><br><span class="line">x, h = torch.rand(<span class="number">1</span>, <span class="number">4</span>), torch.rand(<span class="number">1</span>, <span class="number">4</span>)</span><br><span class="line"><span class="built_in">print</span>(my_cell(x, h))</span><br></pre></td></tr></table></figure><p>我们可以绑定输入对模型进行 <code>trace</code>:</p><figure 
class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> torch</span><br><span class="line"></span><br><span class="line"><span class="keyword">class</span> <span class="title class_">MyDecisionGate</span>(torch.nn.Module):</span><br><span class="line"> <span class="keyword">def</span> <span class="title function_">forward</span>(<span class="params">self, x</span>):</span><br><span class="line"> <span class="keyword">if</span> x.<span class="built_in">sum</span>() > <span class="number">0</span>:</span><br><span class="line"> <span class="keyword">return</span> x</span><br><span class="line"> <span class="keyword">else</span>:</span><br><span class="line"> <span class="keyword">return</span> -x</span><br><span class="line"></span><br><span class="line"><span class="keyword">class</span> <span class="title class_">MyCell</span>(torch.nn.Module):</span><br><span class="line"> <span class="keyword">def</span> <span 
class="title function_">__init__</span>(<span class="params">self</span>):</span><br><span class="line"> <span class="built_in">super</span>(MyCell, self).__init__()</span><br><span class="line"> self.dg = MyDecisionGate()</span><br><span class="line"> self.linear = torch.nn.Linear(<span class="number">4</span>, <span class="number">4</span>)</span><br><span class="line"></span><br><span class="line"> <span class="keyword">def</span> <span class="title function_">forward</span>(<span class="params">self, x, h</span>):</span><br><span class="line"> y = torch.tanh(self.dg(self.linear(x)) + h)</span><br><span class="line"> <span class="keyword">return</span> y</span><br><span class="line"></span><br><span class="line">my_cell = MyCell()</span><br><span class="line">x, h = torch.rand(<span class="number">1</span>, <span class="number">4</span>), torch.rand(<span class="number">1</span>, <span class="number">4</span>)</span><br><span class="line">trace_model = torch.jit.trace(my_cell, (x, h))</span><br><span class="line"><span class="built_in">print</span>(trace_model(x, h))</span><br><span class="line"><span class="built_in">print</span>(trace_model.code)</span><br><span class="line"><span class="comment"># def forward(self,</span></span><br><span class="line"><span class="comment"># x: Tensor,</span></span><br><span class="line"><span class="comment"># h: Tensor) -> Tensor:</span></span><br><span class="line"><span class="comment"># dg = self.dg</span></span><br><span class="line"><span class="comment"># linear = self.linear</span></span><br><span class="line"><span class="comment"># _0 = torch.add((dg).forward((linear).forward(x, ), ), h)</span></span><br><span class="line"><span class="comment"># return torch.tanh(_0)</span></span><br></pre></td></tr></table></figure><p>Notice that the <code>if-else</code> branch does not appear. What <code>trace</code> does is run the code, record the operations that occur, and build a <code>ScriptModule</code>; the control flow is lost along the way. Losing control flow is not a good thing: since <code>trace</code> processes only one example input, different inputs produce different traced programs, because each input satisfies only one branch and the program produced by <code>trace</code> therefore contains only that branch.</p><figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> torch</span><br><span class="line"></span><br><span class="line"><span class="keyword">class</span> <span class="title class_">MyDecisionGate</span>(torch.nn.Module):</span><br><span class="line"> <span class="keyword">def</span> <span class="title function_">forward</span>(<span class="params">self, x</span>):</span><br><span class="line"> <span class="keyword">if</span> x.<span class="built_in">sum</span>() > <span class="number">0</span>:</span><br><span class="line"> <span class="keyword">return</span> x</span><br><span class="line"> <span class="keyword">else</span>:</span><br><span class="line"> <span class="keyword">return</span> -x</span><br><span class="line"></span><br><span class="line">my_cell = MyDecisionGate()</span><br><span class="line">x = torch.tensor([-<span class="number">0.1</span>, <span class="number">0.05</span>]) <span class="comment"># these two x values produce different traced code</span></span><br><span class="line"><span class="comment"># x = torch.tensor([0.1, -0.05])</span></span><br><span class="line">trace_model = torch.jit.trace(my_cell, (x))</span><br><span class="line"><span class="built_in">print</span>(trace_model(x))</span><br><span class="line"><span class="built_in">print</span>(trace_model.code)</span><br></pre></td></tr></table></figure><p>We therefore say that such a <code>trace</code> does not generalize. This commonly happens with dynamic control flow, that is, when which operators get executed depends on the input data.</p><ul><li><code>if x[0] == 4: x += 1</code> is dynamic control flow</li><li><code>model: nn.Sequential = ... [m(x) for m in model]</code> is not</li><li><figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">class</span> <span class="title class_">A</span>(nn.Module):</span><br><span class="line"> backbone: nn.Module</span><br><span class="line"> head: Optional[nn.Module]</span><br><span class="line"> <span class="keyword">def</span> <span class="title function_">forward</span>(<span class="params">self, x</span>):</span><br><span class="line"> x = self.backbone(x)</span><br><span class="line"> <span class="keyword">if</span> self.head <span class="keyword">is</span> <span class="keyword">not</span> <span class="literal">None</span>:</span><br><span class="line"> x = self.head(x)</span><br><span class="line"> <span class="keyword">return</span> x</span><br></pre></td></tr></table></figure>is not either, because the branch depends on how the module is configured, not on the input data</li></ul><p>A later post will show how to make a <code>trace</code> generalize.</p><h2 id="script-方法"><a href="#script-方法" class="headerlink" title="script 方法"></a>The script Method</h2><p>The <code>script</code> method converts <code>python</code> code by analyzing it directly: the <code>script</code> compiler that PyTorch provides parses the <code>python</code> source and reinterprets it as <code>TorchScript</code>.</p><figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span 
class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> torch</span><br><span class="line"></span><br><span class="line"><span class="keyword">class</span> <span class="title class_">MyDecisionGate</span>(torch.nn.Module):</span><br><span class="line"> <span class="keyword">def</span> <span class="title function_">forward</span>(<span class="params">self, x</span>):</span><br><span class="line"> <span class="keyword">if</span> x.<span class="built_in">sum</span>() > <span class="number">0</span>:</span><br><span class="line"> <span class="keyword">return</span> x</span><br><span class="line"> <span class="keyword">else</span>:</span><br><span class="line"> <span class="keyword">return</span> -x</span><br><span class="line"></span><br><span class="line"><span class="keyword">class</span> <span class="title class_">MyCell</span>(torch.nn.Module):</span><br><span class="line"> <span class="keyword">def</span> <span class="title function_">__init__</span>(<span class="params">self, dg</span>):</span><br><span class="line"> <span class="built_in">super</span>(MyCell, self).__init__()</span><br><span class="line"> self.dg = dg</span><br><span class="line"> self.linear = torch.nn.Linear(<span class="number">4</span>, <span class="number">4</span>)</span><br><span class="line"></span><br><span class="line"> <span class="keyword">def</span> <span class="title function_">forward</span>(<span class="params">self, x, h</span>):</span><br><span class="line"> new_h = torch.tanh(self.dg(self.linear(x)) + h)</span><br><span class="line"> <span 
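class="keyword">return</span> new_h, new_h</span><br><span class="line"></span><br><span class="line">scripted_gate = torch.jit.script(MyDecisionGate())</span><br><span class="line"><span class="built_in">print</span>(scripted_gate.code) <span class="comment"># contains the control flow</span></span><br><span class="line">my_cell = MyCell(scripted_gate)</span><br><span class="line">scripted_cell = torch.jit.script(my_cell)</span><br><span class="line"><span class="built_in">print</span>(scripted_cell.code)</span><br></pre></td></tr></table></figure><p><code>TorchScript</code> brings several benefits:</p><ol><li><code>TorchScript</code> code can be run by its own interpreter, which is essentially a restricted <code>Python</code> interpreter. This interpreter does not acquire the global interpreter lock (GIL), so many requests can be processed at the same time.</li><li>The format lets us save a model to disk and load it in another environment, such as a server, including one written in a language other than <code>python</code>.</li><li><code>TorchScript</code> provides a representation on which compiler optimizations can be applied for more efficient execution.</li><li><code>TorchScript</code> can interface with other backend/device runtimes, which need a view of the whole program rather than of individual operators.</li></ol><h1 id="Trace-和-Script-谁更好?"><a href="#Trace-和-Script-谁更好?" class="headerlink" title="Trace 和 Script 谁更好?"></a>Trace or Script: Which Is Better?</h1><p>From the above we have learned that:</p><ul><li><p><code>trace</code> records only the <code>tensor</code> objects that flow through the model and the operations on them. It records no control-flow information such as <code>if</code> statements and loops. Since the untaken paths are not recorded, they cannot be optimized either. The upside is that <code>trace</code> is deeply embedded in <code>python</code> and reuses all of <code>python</code>'s syntax, recording the data flow as the computation runs.</p></li><li><p><code>script</code> tries to understand all of the <code>code</code>. Like a real compiler it performs lexical and syntactic analysis, builds an <code>AST</code>, and then linearizes the <code>AST</code>. <code>script</code> is effectively a <code>DSL</code> embedded in <code>Python/Pytorch</code> whose syntax is only a subset of what <code>Pytorch</code> code can use, so some <code>op</code>s and syntax are not supported by <code>script</code> and compilation can fail. In addition, the compile-time optimizations in <code>script</code> resemble traditional compiler optimizations on the <code>CPU</code>, focusing on hardware-independent graph optimizations and on optimizing <code>if</code> and <code>loop</code> constructs.</p></li></ul><p>For deploying large models <code>trace</code> works better, because it produces a simplified graph that can be optimized effectively, as shown below:</p><figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span 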
class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">class</span> <span class="title class_">A</span>(nn.Module):</span><br><span class="line"> <span class="keyword">def</span> <span class="title function_">forward</span>(<span class="params">self, x1, x2, x3</span>):</span><br><span class="line"> z = [<span class="number">0</span>, <span class="number">1</span>, <span class="number">2</span>]</span><br><span class="line"> xs = [x1, x2, x3]</span><br><span class="line"> <span class="keyword">for</span> k <span class="keyword">in</span> z: x1 += xs[k]</span><br><span class="line"> <span class="keyword">return</span> x1</span><br><span class="line">model = A()</span><br><span class="line"><span class="built_in">print</span>(torch.jit.script(model).code)</span><br><span class="line"><span class="comment"># def forward(self, x1: Tensor, x2: Tensor, x3: Tensor) -> Tensor:</span></span><br><span class="line"><span class="comment"># z = [0, 1, 2]</span></span><br><span class="line"><span class="comment"># xs = [x1, x2, x3]</span></span><br><span class="line"><span class="comment"># x10 = x1</span></span><br><span class="line"><span class="comment"># for _0 in range(torch.len(z)):</span></span><br><span class="line"><span class="comment"># k = z[_0]</span></span><br><span class="line"><span class="comment"># x10 = torch.add_(x10, xs[k])</span></span><br><span class="line"><span class="comment"># return x10</span></span><br><span class="line"><span 
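class="built_in">print</span>(torch.jit.trace(model, [torch.tensor(<span class="number">1</span>)] * <span class="number">3</span>).code)</span><br><span class="line"><span class="comment"># def forward(self, x1: Tensor, x2: Tensor, x3: Tensor) -> Tensor:</span></span><br><span class="line"><span class="comment"># x10 = torch.add_(x1, x1)</span></span><br><span class="line"><span class="comment"># x11 = torch.add_(x10, x2)</span></span><br><span class="line"><span class="comment"># return torch.add_(x11, x3)</span></span><br></pre></td></tr></table></figure><p>Because <code>script</code> tries to represent the <code>Python</code> code faithfully, it keeps even the parts that are unnecessary. For example, it cannot tell that some loops or data structures in the <code>Python</code> code are static and could be removed, as in the example above. That example does have workarounds, and the loop may be optimized away in a later optimization pass. But the point is that this compiler is not always smart enough. For a complex model, <code>script</code> may produce a computation graph with unnecessary complexity that is hard to optimize.</p><p><code>tracing</code> has many advantages. In fact, for the segmentation and detection models deployed at <code>Facebook/Meta</code>, <code>tracing</code> is the default choice and <code>scripting</code> is used only when necessary, because <code>trace</code> does not hurt code quality and can be combined with <code>script</code> to get around some of its limitations.</p><p><code>python</code> is a large and very dynamic language, and a compiler can at best support a subset of its syntax and builtins; <code>script</code> is no exception. Which subset of <code>Python</code> does this compiler support? A rough answer: the compiler supports the most basic syntax well, but has little or no support for anything more complex (classes, builtins, dynamic typing, and so on). There is no precise answer, though: even the compiler's developers usually have to run the code and see whether it compiles to find out whether something is supported.</p><p>An incomplete <code>Python</code> compiler therefore restricts how users write code. Although there is no explicit list of constraints, experience shows their effect on large projects: the problems of <code>script</code> affect code quality. Many projects stop at the level of "the code can be compiled by <code>script</code>", using only basic syntax, with no custom types, no inheritance, no builtins, no <code>lambda</code>, and none of the other advanced features. Because these features are unsupported or only partly supported, they work in some cases and fail in others. And since there is no clear specification of what is supported, users cannot reason about failures or work around them. In the end users settle for code that merely compiles, without regard for maintainability or performance, and developers stop exploring advanced features for fear of errors.</p><p>Over time, code quality can deteriorate badly: bad code piles up because good code sometimes does not compile. And because of the compiler's syntax restrictions, it is hard to introduce abstractions to clean that code up. The project's maintainability gradually goes downhill. If you think <code>script</code> seems to work for your project, then based on past experience with several projects that support <code>script</code>, I would advise against it for the following reasons:</p><ul><li>Successful compilation may be more fragile than you think (unless you restrict yourself to basic syntax): your code may happen to compile now, but one day you will make some change to the model and find that it no longer compiles;</li><li>Basic syntax is not enough: even if your project does not seem to need more complex abstraction and inheritance right now, if it is expected to grow it will need more language features in the future.</li></ul><p>Take a multi-task detector as an example:</p><ul><li>It may have 10 inputs, so it is better to use some struct/class.</li><li>A detector has many architectural choices, which makes inheritance useful.</li><li>A large, growing project definitely needs evolving abstractions to stay maintainable.</li></ul><p>So the current state of affairs is: <code>script</code> forces you to write bad code, so we use it only when necessary.</p><h1 id="Trace-细节"><a href="#Trace-细节" class="headerlink" title="Trace 细节"></a>Trace Details</h1><p>The requirements that <code>trace</code> places on a model are clear, and they have little impact on code quality.</p><p>If the model is not a computation graph expressed in <code>Pytorch</code> operations, neither <code>script</code> nor <code>trace</code> will work. For example, if the model has a <code>DataParallel</code> submodule, or converts tensors to <code>numpy</code> arrays and calls <code>OpenCV</code> functions, it must be refactored. Beyond this obvious limitation, <code>trace</code> has only two additional requirements:</p><ul><li><p>Input/output format: a model can be passed to <code>trace</code> only when its inputs and outputs are of <code>Tensor</code> types. However, this constraint does not apply to submodules: a submodule may use any input/output format, including classes, <code>kwargs</code>, and anything else <code>Python</code> supports. The requirement applies only to the outermost model, so it is easy to satisfy. If the model uses a richer format, just wrap it in a simple wrapper that converts to and from <code>Tuple[Tensor]</code>.</p></li><li><p><code>shape</code>. <code>tensor.size(0)</code> is an integer in <code>eager</code> mode but a <code>tensor</code> in <code>tracing mode</code>. This difference is necessary during <code>trace</code> so that <code>shape</code> computations can be captured as operators in the graph. Because of the different return type, a model may fail to <code>trace</code> if part of it assumes that a <code>shape</code> is an integer; this is usually easy to fix. A useful function here is <code>torch.jit.is_tracing</code>, which checks whether the code is running in <code>trace</code> mode.</p></li></ul><p>Let's look at an example:</p><figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">>>> </span>a, b = torch.rand(<span class="number">1</span>), torch.rand(<span class="number">2</span>)</span><br><span 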
class="line"><span class="meta">>>> </span><span class="keyword">def</span> <span class="title function_">f1</span>(<span class="params">x</span>): <span class="keyword">return</span> torch.arange(x.shape[<span class="number">0</span>])</span><br><span class="line"><span class="meta">>>> </span><span class="keyword">def</span> <span class="title function_">f2</span>(<span class="params">x</span>): <span class="keyword">return</span> torch.arange(<span class="built_in">len</span>(x))</span><br><span class="line"><span class="meta">>>> </span><span class="comment"># See if the two traces generalize from a to b:</span></span><br><span class="line"><span class="meta">>>> </span>torch.jit.trace(f1, a)(b)</span><br><span class="line">tensor([<span class="number">0</span>, <span class="number">1</span>])</span><br><span class="line"><span class="meta">>>> </span>torch.jit.trace(f2, a)(b)</span><br><span class="line">tensor([<span class="number">0</span>]) <span class="comment"># WRONG!</span></span><br><span class="line"><span class="meta">>>> </span><span class="comment"># Why f2 does not generalize? 
Let's compare their code:</span></span><br><span class="line"><span class="meta">>>> </span><span class="built_in">print</span>(torch.jit.trace(f1, a).code, torch.jit.trace(f2, a).code)</span><br><span class="line"><span class="keyword">def</span> <span class="title function_">f1</span>(<span class="params">x: Tensor</span>) -> Tensor:</span><br><span class="line">  _0 = ops.prim.NumToTensor(torch.size(x, <span class="number">0</span>))</span><br><span class="line">  _1 = torch.arange(annotate(number, _0), dtype=<span class="literal">None</span>, layout=<span class="number">0</span>, device=torch.device(<span class="string">"cpu"</span>), pin_memory=<span class="literal">False</span>)</span><br><span class="line">  <span class="keyword">return</span> _1</span><br><span class="line"><span class="keyword">def</span> <span class="title function_">f2</span>(<span class="params">x: Tensor</span>) -> Tensor:</span><br><span class="line">  _0 = torch.arange(<span class="number">1</span>, dtype=<span class="literal">None</span>, layout=<span class="number">0</span>, device=torch.device(<span class="string">"cpu"</span>), pin_memory=<span class="literal">False</span>)</span><br><span class="line">  <span class="keyword">return</span> _0</span><br></pre></td></tr></table></figure><p>When <code>f2</code> is traced, <code>len(x)</code> is a constant rather than a <code>tensor</code>, so the trace gives a wrong result when it is given data of a different length. Besides <code>len()</code>, this problem can also appear with:</p><ul><li><code>.item()</code>, which converts a tensor to <code>int/float</code>.</li><li>Any other code that converts <code>Torch</code> types to <code>numpy/python</code> primitives.</li></ul><p><code>tensor.size()</code> returns a <code>Tensor</code> during <code>trace</code> so that shape computations are captured in the graph. Users should avoid accidentally turning tensor shapes into constants. Use <code>tensor.size(0)</code> instead of <code>len(tensor)</code>, because the latter is an <code>int</code>. This call is useful for obtaining sizes as tensors, and it works in both <code>trace</code> and <code>eager</code> mode. For custom classes, implement a <code>.size()</code> method or use <code>.__len__()</code> instead of <code>len()</code>, and do not convert sizes with <code>int()</code>, because that captures constants.</p><p>This is all that <code>trace</code> requires. Most importantly, any <code>Python</code> syntax is allowed in the model implementation, because <code>trace</code> does not care about syntax at all.</p><h2 id="Trace-的泛化问题"><a href="#Trace-的泛化问题" class="headerlink" title="Trace 的泛化问题"></a>Generalization Problems of Trace</h2><h3 id="Trace-和-Script-混合"><a href="#Trace-和-Script-混合" class="headerlink" title="Trace 和 Script 混合"></a>Mixing Trace and Script</h3><figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">>>> </span><span class="keyword">def</span> <span class="title function_">f</span>(<span class="params">x</span>):</span><br><span class="line"><span class="meta">... </span>    <span class="keyword">return</span> torch.sqrt(x) <span class="keyword">if</span> x.<span class="built_in">sum</span>() > <span class="number">0</span> <span class="keyword">else</span> torch.square(x)</span><br><span class="line"><span class="meta">>>> </span>m = torch.jit.trace(f, torch.tensor(<span class="number">3</span>))</span><br><span class="line"><span class="meta">>>> </span><span class="built_in">print</span>(m.code)</span><br><span class="line"><span class="keyword">def</span> <span class="title function_">f</span>(<span class="params">x: Tensor</span>) -> Tensor:</span><br><span class="line">  <span class="keyword">return</span> torch.sqrt(x)</span><br></pre></td></tr></table></figure><p>Note that tracing this kind of code does not raise an error; it only prints a <code>warning</code>, so it deserves special attention. Both <code>trace</code> and <code>script</code> have their own problems, and the best approach is to mix them: to avoid hurting code quality, <code>trace</code> the main part of the model and use <code>script</code> only where necessary. If a <code>module</code> contains many branches that we do not want to appear in <code>TorchScript</code>, use <code>tracing</code> rather than <code>scripting</code>; in that case, <code>trace</code> inlines the code of any <code>script</code> module it calls.</p><figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span 
class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> torch</span><br><span class="line"></span><br><span class="line"><span class="keyword">class</span> <span class="title class_">MyDecisionGate</span>(torch.nn.Module):</span><br><span class="line">    <span class="keyword">def</span> <span class="title function_">forward</span>(<span class="params">self, x</span>):</span><br><span class="line">        <span class="keyword">if</span> x.<span class="built_in">sum</span>() > <span class="number">0</span>:</span><br><span class="line">            <span class="keyword">return</span> x</span><br><span class="line">        <span class="keyword">else</span>:</span><br><span class="line">            <span class="keyword">return</span> -x</span><br><span class="line"></span><br><span class="line"><span class="keyword">class</span> <span class="title 
class_">MyCell</span>(torch.nn.Module):</span><br><span class="line">    <span class="keyword">def</span> <span class="title function_">__init__</span>(<span class="params">self, dg</span>):</span><br><span class="line">        <span class="built_in">super</span>(MyCell, self).__init__()</span><br><span class="line">        self.dg = dg</span><br><span class="line">        self.linear = torch.nn.Linear(<span class="number">4</span>, <span class="number">4</span>)</span><br><span class="line"></span><br><span class="line">    <span class="keyword">def</span> <span class="title function_">forward</span>(<span class="params">self, x, h</span>):</span><br><span class="line">        new_h = torch.tanh(self.dg(self.linear(x)) + h)</span><br><span class="line">        <span class="keyword">return</span> new_h, new_h</span><br><span class="line"></span><br><span class="line"><span class="keyword">class</span> <span class="title class_">MyRNNLoop</span>(torch.nn.Module):</span><br><span class="line">    <span class="keyword">def</span> <span class="title function_">__init__</span>(<span class="params">self, scripted_gate, x, h</span>):</span><br><span class="line">        <span class="built_in">super</span>(MyRNNLoop, self).__init__()</span><br><span class="line">        <span class="comment"># trace the cell, which contains the scripted control flow</span></span><br><span class="line">        self.cell = torch.jit.trace(MyCell(scripted_gate), (x, h))</span><br><span class="line"></span><br><span class="line">    <span class="keyword">def</span> <span class="title function_">forward</span>(<span class="params">self, xs</span>):</span><br><span class="line">        h, y = torch.zeros(<span class="number">3</span>, <span class="number">4</span>), torch.zeros(<span class="number">3</span>, <span class="number">4</span>)</span><br><span class="line">        <span class="keyword">for</span> i <span class="keyword">in</span> <span class="built_in">range</span>(xs.size(<span class="number">0</span>)):</span><br><span class="line">            y, h = self.cell(xs[i], h)</span><br><span class="line">        <span 
class="keyword">return</span> y, h</span><br><span class="line"></span><br><span class="line">x, h = torch.rand(<span class="number">2</span>, <span class="number">4</span>), torch.rand(<span class="number">2</span>, <span class="number">4</span>)</span><br><span class="line">scripted_gate = torch.jit.script(MyDecisionGate())</span><br><span class="line">rnn_loop = torch.jit.script(MyRNNLoop(scripted_gate, x, h))</span><br><span class="line"><span class="built_in">print</span>(rnn_loop.code)</span><br><span class="line"><span class="built_in">print</span>(rnn_loop.cell.code)</span><br></pre></td></tr></table></figure><p>In simplified form:</p><figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">model.submodule = torch.jit.script(model.submodule)</span><br><span class="line">torch.jit.trace(model, inputs)</span><br></pre></td></tr></table></figure><p>Submodules that cannot be traced correctly can be handled with <code>script</code>. However, this is not recommended; <code>@script_if_tracing</code> is preferred, because the changes needed for <code>script</code> are then confined to the inside of the submodule and do not affect the module's interface. During <code>torch.jit.trace</code>, a function decorated with <code>@script_if_tracing</code> is compiled by <code>script</code>. Usually this only requires a small refactoring of the forward logic to separate out the part that needs compiling (the part with control flow):</p><figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">def</span> <span class="title function_">forward</span>(<span class="params">self, ...</span>):</span><br><span class="line">    <span class="comment"># ... 
some forward logic</span></span><br><span class="line"><span class="meta">    @torch.jit.script_if_tracing</span></span><br><span class="line">    <span class="keyword">def</span> <span class="title function_">_inner_impl</span>(<span class="params">x, y, z, flag: <span class="built_in">bool</span></span>):</span><br><span class="line">        <span class="comment"># use control flow, etc.</span></span><br><span class="line">        <span class="keyword">return</span> ...</span><br><span class="line">    output = _inner_impl(x, y, z, flag)</span><br><span class="line">    <span class="comment"># ... other forward logic</span></span><br></pre></td></tr></table></figure><p>Because only the necessary part is compiled with <code>script</code>, code quality suffers far less than when everything is compiled with <code>script</code>. A function decorated with <code>@script_if_tracing</code> must be a pure function that does not reference modules. As a result, more refactoring is sometimes needed:</p><figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment"># Before:</span></span><br><span class="line"><span class="keyword">if</span> x.numel() > <span class="number">0</span>: <span class="comment"># This branch cannot be compiled by @script_if_tracing because it refers to `self.layers`</span></span><br><span class="line">    x = preprocess(x)</span><br><span class="line">    output = self.layers(x)</span><br><span class="line"><span class="keyword">else</span>:</span><br><span class="line">    output = torch.zeros(...) 
<span class="comment"># Create empty outputs</span></span><br><span class="line"></span><br><span class="line"><span class="comment"># After:</span></span><br><span class="line"><span class="keyword">if</span> x.numel() > <span class="number">0</span>: <span class="comment"># This branch can now be compiled by @script_if_tracing</span></span><br><span class="line">    x = preprocess(x)</span><br><span class="line"><span class="keyword">else</span>:</span><br><span class="line">    x = torch.zeros(...) <span class="comment"># Create empty inputs</span></span><br><span class="line"><span class="comment"># Make sure self.layers accepts empty inputs.</span></span><br><span class="line"><span class="comment"># If necessary, add such a condition branch into self.layers as well.</span></span><br><span class="line">output = self.layers(x)</span><br></pre></td></tr></table></figure><p>Similarly, <code>trace</code> can be nested inside <code>script</code>:</p><figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">model.submodule = torch.jit.trace(model.submodule, submodule_inputs)</span><br><span class="line">torch.jit.script(model)</span><br></pre></td></tr></table></figure><p>Here the submodule is traced. This is not common in practice, because it constrains the submodule (it applies only when the submodule's inputs and outputs are all <code>tensor</code>), which is a major limitation. Still, there are scenarios where using <code>trace</code> on a submodule fits well:</p><figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">class</span> <span class="title class_">A</span>(nn.Module):</span><br><span class="line">    <span class="keyword">def</span> <span class="title function_">forward</span>(<span class="params">self, x</span>):</span><br><span class="line">        <span class="comment"># Dispatch to different submodules based on 
a dynamic, data-dependent condition:</span></span><br><span class="line">        <span class="keyword">return</span> self.submodule1(x) <span class="keyword">if</span> x.<span class="built_in">sum</span>() > <span class="number">0</span> <span class="keyword">else</span> self.submodule2(x)</span><br></pre></td></tr></table></figure><p><code>@script_if_tracing</code> cannot handle this kind of control flow, because it only supports pure functions. If the submodules are too complex to be compiled by <code>script</code>, using <code>trace</code> on the submodules, here <code>self.submodule1</code> and <code>self.submodule2</code>, is a good choice; class <code>A</code> itself still needs to be compiled with <code>script</code>.</p><h3 id="Script-优势"><a href="#Script-优势" class="headerlink" title="Script 优势"></a>Advantages of Script</h3><p>In fact, for most vision models, dynamic control flow is needed only in a few submodules that are easy to <code>script</code>. Compared with <code>trace</code>, <code>script</code> has two advantages:</p><ul><li>Control flow conditioned on an attribute of the module, which <code>trace</code> cannot handle</li><li><code>trace</code> only supports the <code>forward</code> method, while <code>script</code> supports additional methods</li></ul><p>In practice, both features do the same thing: they allow the exported model to be used in different ways, that is, to execute different sequences of operators depending on what the caller requests. Here is an example scenario where this is useful: if <code>Detector</code> is compiled with <code>script</code>, the caller can change its <code>do_keypoint</code> attribute to control its behavior, or call the <code>predict_keypoint</code> method directly if needed.</p><figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">class</span> <span class="title class_">Detector</span>(nn.Module):</span><br><span class="line">    do_keypoint: <span class="built_in">bool</span></span><br><span class="line"></span><br><span class="line">    <span class="keyword">def</span> <span class="title function_">forward</span>(<span class="params">self, img</span>):</span><br><span class="line">        box = 
self.predict_boxes(img)</span><br><span class="line">        <span class="keyword">if</span> self.do_keypoint:</span><br><span class="line">            kpts = self.predict_keypoint(img, box)</span><br><span class="line"></span><br><span class="line">    <span class="keyword">def</span> <span class="title function_">predict_boxes</span>(<span class="params">self, img</span>): <span class="keyword">pass</span></span><br><span class="line"></span><br><span class="line">    <span class="keyword">def</span> <span class="title function_">predict_keypoint</span>(<span class="params">self, img, box</span>): <span class="keyword">pass</span></span><br></pre></td></tr></table></figure><p>This kind of requirement is uncommon. But if it is needed, how can it be achieved with <code>trace</code>? I have a solution that is not very elegant: <code>Tracing</code> can only capture one sequence of operators, so the natural approach is to trace the model twice:</p><figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">det1 = torch.jit.trace(Detector(do_keypoint=<span class="literal">True</span>), inputs)</span><br><span class="line">det2 = torch.jit.trace(Detector(do_keypoint=<span class="literal">False</span>), inputs)</span><br></pre></td></tr></table></figure><p>We can then alias their weights (so that they are not stored twice) and merge the two traces into one module to be compiled with <code>script</code>:</p><figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line">det2.submodule.weight = det1.submodule.weight</span><br><span class="line"><span class="keyword">class</span> <span class="title class_">Wrapper</span>(nn.ModuleList):</span><br><span class="line">    <span class="keyword">def</span> <span class="title function_">forward</span>(<span class="params">self, img, do_keypoint: 
<span class="built_in">bool</span></span>):</span><br><span class="line">        <span class="keyword">if</span> do_keypoint:</span><br><span class="line">            <span class="keyword">return</span> self[<span class="number">0</span>](img)</span><br><span class="line">        <span class="keyword">else</span>:</span><br><span class="line">            <span class="keyword">return</span> self[<span class="number">1</span>](img)</span><br><span class="line">exported = torch.jit.script(Wrapper([det1, det2]))</span><br></pre></td></tr></table></figure><h3 id="单元测试"><a href="#单元测试" class="headerlink" title="单元测试"></a>Unit Tests</h3><p>A unit test can also be used to check whether <code>trace</code> succeeded, by tracing with one input and comparing against eager mode on another:</p><figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">assert</span> allclose(torch.jit.trace(model, input1)(input2), model(input2))</span><br></pre></td></tr></table></figure><h3 id="程序优化"><a href="#程序优化" class="headerlink" title="程序优化"></a>Program Optimization</h3><p>In addition, the program can be <a href="https://github.com/pytorch/pytorch/issues/56998">optimized</a> to avoid unnecessary special cases:</p><figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">if</span> x.numel() > <span class="number">0</span>:</span><br><span class="line">    output = self.layers(x)</span><br><span class="line"><span class="keyword">else</span>:</span><br><span class="line">    output = torch.zeros((<span class="number">0</span>, C, H, W)) <span class="comment"># Create empty outputs</span></span><br></pre></td></tr></table></figure><h3 id="设备"><a href="#设备" class="headerlink" title="设备"></a>Devices</h3><p>Devices also need attention: the device in use is recorded during <code>trace</code>, and <code>trace</code> does not generalize across devices. Deployments usually target a fixed device, however, so this is rarely a concern.</p><figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span 
class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">>>> </span><span class="keyword">def</span> <span class="title function_">f</span>(<span class="params">x</span>):</span><br><span class="line"><span class="meta">... </span>    <span class="keyword">return</span> torch.arange(x.shape[<span class="number">0</span>], device=x.device)</span><br><span class="line"><span class="meta">>>> </span>m = torch.jit.trace(f, torch.tensor([<span class="number">3</span>]))</span><br><span class="line"><span class="meta">>>> </span><span class="built_in">print</span>(m.code)</span><br><span class="line"><span class="keyword">def</span> <span class="title function_">f</span>(<span class="params">x: Tensor</span>) -> Tensor:</span><br><span class="line">  _0 = ops.prim.NumToTensor(torch.size(x, <span class="number">0</span>))</span><br><span class="line">  _1 = torch.arange(annotate(number, _0), dtype=<span class="literal">None</span>, layout=<span class="number">0</span>, device=torch.device(<span class="string">"cpu"</span>), pin_memory=<span class="literal">False</span>)</span><br><span class="line">  <span class="keyword">return</span> _1</span><br><span class="line"><span class="meta">>>> </span>m(torch.tensor([<span class="number">3</span>]).cuda()).device</span><br><span class="line">device(<span class="built_in">type</span>=<span class="string">'cpu'</span>) <span class="comment"># WRONG!</span></span><br></pre></td></tr></table></figure><h1 id="结论"><a href="#结论" class="headerlink" title="结论"></a>Conclusion</h1><p><code>trace</code> has clear limitations: most of this article discusses the limitations of <code>trace</code> and how to work around them. I actually consider this an advantage of <code>trace</code>: its limitations and their solutions are explicit, so you can reason about whether it will work. By contrast, <code>script</code> is more like a black box: nobody knows whether it will work before trying it.</p><p><code>trace</code> disturbs less code: both <code>trace</code> and <code>script</code> affect how code can be written, but <code>trace</code> touches a much smaller part of the code and causes much less damage:</p><ul><li>It restricts the input/output format, but only for the outermost module.</li><li>It requires mixing <code>script</code> into <code>trace</code>, but only the internal implementation of the affected modules has to change, not their interfaces.</li></ul><p><code>script</code>, on the other hand, affects:</p><ul><li>The interface of every module and submodule involved. Interfaces need advanced syntax features, and when programming against interfaces you should never compromise on interface design.</li><li>Possibly training as well, since interfaces are usually shared between training and inference.</li></ul><p>This is why <code>script</code> can do so much damage to code quality. <code>Detectron2</code> supports <code>script</code>, but other large projects are not advised to aim for being scriptable without losing abstractions, because it is quite difficult, unless they can also get support from the <code>PyTorch</code> team as Alibaba did.</p><p><code>PyTorch</code> is loved by its users, above all for letting them write <code>Python</code> control flow. But the rest of <code>Python</code>'s syntax matters too. If being able to write <code>Python</code> control flow (by using <code>script</code>) means losing the other good parts of the language, I would rather give up the ability to write <code>Python</code> control flow. In fact, if <code>PyTorch</code> were less attached to <code>Python</code> control flow and offered symbolic control flow such as <code>torch.cond</code> (an <code>API</code> similar to <code>tf.cond</code>), like this:</p><figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">def</span> <span class="title function_">f</span>(<span class="params">x</span>):</span><br><span class="line">    <span class="keyword">return</span> torch.cond(x.<span class="built_in">sum</span>() > <span class="number">0</span>, <span class="keyword">lambda</span>: torch.sqrt(x), <span class="keyword">lambda</span>: torch.square(x))</span><br></pre></td></tr></table></figure><p>then <code>f</code> could be traced correctly, with no need to worry about <code>script</code>.</p><h1 id="保存和加载模型"><a href="#保存和加载模型" class="headerlink" title="保存和加载模型"></a>Saving and Loading Models</h1><figure class="highlight py"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td 
class="code"><pre><span class="line">traced.save(<span class="string">'wrapped_rnn.pt'</span>)</span><br><span class="line"></span><br><span class="line">loaded = torch.jit.load(<span class="string">'wrapped_rnn.pt'</span>)</span><br><span class="line"></span><br><span class="line"><span class="built_in">print</span>(loaded)</span><br><span class="line"><span class="built_in">print</span>(loaded.code)</span><br></pre></td></tr></table></figure><h1 id="参考"><a href="#参考" class="headerlink" title="参考"></a>References</h1><ol><li>Basic concepts, <a href="https://zhuanlan.zhihu.com/p/370455320">https://zhuanlan.zhihu.com/p/370455320</a></li><li>Advantages of the two approaches, <a href="https://zhuanlan.zhihu.com/p/410507557">https://zhuanlan.zhihu.com/p/410507557</a></li><li>trace vs script, <a href="https://ppwwyyxx.com/blog/2022/TorchScript-Tracing-vs-Scripting/">https://ppwwyyxx.com/blog/2022/TorchScript-Tracing-vs-Scripting/</a></li></ol>]]></content>
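<summary type="html"><p>The last time I wrote a proper blog post was in February this year, and in May I wrote up a competition summary; every other post turned out to be about practice problems and algorithms, which is rather dull. The hard days are over, and I plan to learn about model deployment and take part in a real open-source project, aiming to connect data, algorithms, and engineering end to end. As everyone knows, things that are rarely used are forgotten as soon as they are learned: <code>spark, Go</code> and other things I learned but seldom use are long gone from my memory. So this time, after learning how to <code>trace</code> models, I will try to deploy some software that actually runs.</p></summary>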
<category term="Pytorch" scheme="https://muyuuuu.github.io/tags/Pytorch/"/>
</entry>
<entry>
<title>Algorithm Series: Two Pointers</title>
<link href="https://muyuuuu.github.io/2022/08/06/double-pointer/"/>
<id>https://muyuuuu.github.io/2022/08/06/double-pointer/</id>
<published>2022-08-06T15:26:44.000Z</published>
<updated>2022-08-06T16:20:49.366Z</updated>
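<content type="html"><![CDATA[<link rel="stylesheet" class="aplayer-secondary-style-marker" href="/assets/css/APlayer.min.css"><script src="/assets/js/APlayer.min.js" class="aplayer-secondary-script-marker"></script><p>Over the past few days I have run into several two-pointer problems in a row. To be honest, I did not see one general pattern in them, so I cannot summarize them well; still, two pointers is a remarkable technique, so I will record each problem as I solve it.</p><span id="more"></span><h1 id="快慢指针"><a href="#快慢指针" class="headerlink" title="快慢指针"></a>Fast and Slow Pointers</h1><p>Fast and slow pointers are a form of two pointers in which both pointers traverse the array from the same side. They are called the fast pointer (fast) and the slow pointer (slow), and they move by different rules until their values are equal (or some other condition holds); for example, fast advances two steps at a time and slow advances one.</p><p>They are often used for linked-list problems. For example, since slow moves at half the speed of fast, slow is at the middle of the list when fast reaches the end, which gives a way to find the midpoint of a linked list.</p><h2 id="26-删除有序数组中的重复项"><a href="#26-删除有序数组中的重复项" class="headerlink" title="26. 删除有序数组中的重复项"></a>26. Remove Duplicates from Sorted Array</h2><p>Given an array <code>nums</code> sorted in ascending order, remove the duplicates in place so that each element appears only once, and return the new length of the array. The relative order of the elements must be kept.</p><p>Since the length of an array cannot be changed in some languages, the result must be placed in the first part of <code>nums</code>. More formally, if there are <code>k</code> elements after removing the duplicates, the first <code>k</code> elements of <code>nums</code> should hold the final result. Return <code>k</code> after placing the final result in the first <code>k</code> positions of <code>nums</code>.</p><p>Do not use extra space: you must modify the input array in place, using $O(1)$ extra space.</p><p>At first glance I really did not know how to solve this, so I went straight to the solution:</p><ul><li>fast and slow both start at 1, because even if every element in the array is a duplicate, there is still 1 distinct element</li><li>If the elements at fast and fast-1 are equal, there is a duplicate, so fast++ and the search continues with the following elements</li><li>If the elements at fast and fast-1 differ, <code>nums[fast]</code> is a new distinct element, so we set <code>nums[slow]=nums[fast]</code> and advance both slow and fast</li></ul><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br></pre></td><td class="code"><pre><span class="line"><span 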
class="keyword">class</span> <span class="title class_">Solution</span> {</span><br><span class="line"><span class="keyword">public</span>:</span><br><span class="line">    <span class="function"><span class="type">int</span> <span class="title">removeDuplicates</span><span class="params">(vector<<span class="type">int</span>>& nums)</span> </span>{</span><br><span class="line">        <span class="type">int</span> slow = <span class="number">1</span>, fast = <span class="number">1</span>;</span><br><span class="line">        <span class="type">int</span> n = nums.<span class="built_in">size</span>();</span><br><span class="line">        <span class="keyword">while</span> (fast < n) {</span><br><span class="line">            <span class="keyword">if</span> (nums[fast] != nums[fast<span class="number">-1</span>]) {</span><br><span class="line">                nums[slow] = nums[fast];</span><br><span class="line">                slow++;</span><br><span class="line">            }</span><br><span class="line">            fast++;</span><br><span class="line">        }</span><br><span class="line">        <span class="keyword">return</span> slow;</span><br><span class="line">    }</span><br><span class="line">};</span><br></pre></td></tr></table></figure><h2 id="剑指-Offer-II-022-链表中环的入口节点"><a href="#剑指-Offer-II-022-链表中环的入口节点" class="headerlink" title="剑指 Offer II 022. 链表中环的入口节点"></a>剑指 Offer II 022. Entry Node of the Cycle in a Linked List</h2><p>Given a linked list, return the node where the cycle begins. The entry node of the cycle is the first node of the cycle reached by following next pointers from the head of the list. If the list has no cycle, return null.</p><p>To represent the cycle in the given list, an integer pos gives the position (0-indexed) in the list that the tail connects to. If pos is -1, the list has no cycle. Note that pos only describes the cycle and is not passed to the function as an argument.</p><p><img data-src="https://s1.ax1x.com/2022/08/06/vubX6J.png" alt></p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">Input: head = [3,2,0,-4], pos = 1</span><br><span class="line">Output: the linked-list node at index 1</span><br><span class="line">Explanation: the list contains a cycle whose tail connects to the second node.</span><br></pre></td></tr></table></figure><p>To be clear, storing node addresses in a hash set would certainly work, but the goal here is to practice two pointers.</p><p><img data-src="https://s1.ax1x.com/2022/08/06/vubz01.png" alt></p><ul><li>Suppose the fast pointer and the slow pointer meet at the purple node</li><li>The fast pointer has travelled a distance of $a+n(b+c) + b$, where $n$ is a non-negative integer</li><li>The slow pointer has travelled a distance of $a+m(b+c) + b$, where $m$ is a non-negative integer</li></ul><p>Since fast travels twice as far as slow:</p><p>\begin{equation}<br>a+n(b+c)+b = 2 [a+m(b+c) + b] \\<br>\Rightarrow a = (n-2m)(b+c) - b<br>\end{equation}</p><p>In other words, $a$ equals an integer multiple of the cycle length $b+c$ minus $b$. Given this equation, we start one pointer from <code>head</code> and let the <code>slow</code> pointer continue from the meeting point, both advancing one node at a time; the node where they meet is the entry node of the cycle.</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br></pre></td><td class="code"><pre><span 
class="line"><span class="keyword">class</span> <span class="title class_">Solution</span> {</span><br><span class="line"><span class="keyword">public</span>:</span><br><span class="line"> <span class="function">ListNode *<span class="title">detectCycle</span><span class="params">(ListNode *head)</span> </span>{</span><br><span class="line"> ListNode *slow = head, *fast = head;</span><br><span class="line"> <span class="keyword">while</span> (fast != <span class="literal">nullptr</span>) {</span><br><span class="line"> slow = slow->next;</span><br><span class="line"> <span class="keyword">if</span> (fast->next == <span class="literal">nullptr</span>) {</span><br><span class="line"> <span class="keyword">return</span> <span class="literal">nullptr</span>;</span><br><span class="line"> }</span><br><span class="line"> fast = fast->next->next;</span><br><span class="line"> <span class="keyword">if</span> (fast == slow) {</span><br><span class="line"> ListNode *ptr = head;</span><br><span class="line"> <span class="keyword">while</span> (ptr != slow) {</span><br><span class="line"> ptr = ptr->next;</span><br><span class="line"> slow = slow->next;</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">return</span> ptr;</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">return</span> <span class="literal">nullptr</span>;</span><br><span class="line"> }</span><br><span class="line">};</span><br></pre></td></tr></table></figure><h1 id="对撞指针"><a href="#对撞指针" class="headerlink" title="对撞指针"></a>对撞指针</h1><p>对撞数组适用于有序数组、利用数组两侧求最值、只用数组内的两个元素等问题,应该第一时间想到用对撞指针解题。</p><h2 id="11-盛最多水的容器"><a href="#11-盛最多水的容器" class="headerlink" title="11. 盛最多水的容器"></a>11. 
Container With Most Water</h2><p>Given an integer array <code>height</code> of length <code>n</code>, there are <code>n</code> vertical lines, where the two endpoints of the <code>i</code>-th line are <code>(i, 0)</code> and <code>(i, height[i])</code>. Find the two lines that, together with the <code>x</code> axis, form the container holding the most water. Return the maximum amount of water the container can store.</p><p><img data-src="https://s1.ax1x.com/2022/08/06/vubRSg.jpg" alt></p><ul><li>Use two pointers, the left one at the left end and the right one at the right end, and compute the water the current pair can hold</li><li>If the left line is shorter, move the left pointer to the right: the shorter line caps the water level, so keeping it while the width shrinks can never give a larger area. Likewise, if the right line is shorter, move the right pointer to the left</li><li>Update the maximum after every move</li></ul><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">class</span> <span class="title class_">Solution</span> {</span><br><span class="line"><span class="keyword">public</span>:</span><br><span class="line"> <span class="function"><span class="type">int</span> <span class="title">maxArea</span><span class="params">(vector<<span class="type">int</span>>& height)</span> </span>{</span><br><span class="line"> <span class="type">int</span> n = height.<span class="built_in">size</span>();</span><br><span class="line"> <span class="type">int</span> l = <span class="number">0</span>, r = n - <span class="number">1</span>;</span><br><span class="line"> <span class="type">int</span> res = <span class="number">0</span>;</span><br><span class="line"> <span class="keyword">while</span> (l < r) {</span><br><span class="line"> <span class="type">int</span> t1 = height[l];</span><br><span 
class="line"> <span class="type">int</span> t2 = height[r];</span><br><span class="line"> res = <span class="built_in">max</span>(res, <span class="built_in">min</span>(t1, t2) * (r - l));</span><br><span class="line"> <span class="keyword">if</span> (t1 < t2) { <span class="comment">// move the pointer at the shorter line</span></span><br><span class="line"> l++;</span><br><span class="line"> } <span class="keyword">else</span> {</span><br><span class="line"> r--;</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">return</span> res;</span><br><span class="line"> }</span><br><span class="line">};</span><br></pre></td></tr></table></figure><h2 id="881-救生艇"><a href="#881-救生艇" class="headerlink" title="881. 救生艇"></a>881. Boats to Save People</h2><p>You are given an array <code>people</code>, where <code>people[i]</code> is the weight of the <code>i</code>-th person. There is an unlimited number of boats, and each boat can carry a maximum weight of <code>limit</code>. Each boat carries at most two people at the same time, provided that their combined weight is at most <code>limit</code>. Return the minimum number of boats needed to carry everyone. Example:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">Input: people = [3,2,2,1], limit = 3</span><br><span class="line">Output: 3</span><br><span class="line">Explanation: 3 boats carrying (1, 2), (2) and (3)</span><br></pre></td></tr></table></figure><p>Consider an extreme case: the sorted array is <code>[1, 2, ..., n-2, n-1]</code> and a boat can hold at most <code>n</code>. The best assignment puts <code>1</code> with <code>n-1</code> and <code>2</code> with <code>n-2</code>, so these four people need only two boats. If <code>1</code> shares a boat with <code>2</code> instead, then <code>n-2</code> and <code>n-1</code> each need a boat of their own, and three boats are required.</p><p>Following the greedy idea, we should pair the lightest person with the heaviest whenever possible to reduce the number of boats. First sort the array, then:</p><ul><li>Set up two pointers, <code>l=0, r=n-1</code></li><li>A boat holds at most two people, so if <code>people[l] + people[r] <= limit</code>, these two share a boat and <code>l++</code></li><li>In either case <code>r--</code>: the heaviest remaining person at the right end always boards, while the person at the left end joins only if the pair fits </li></ul><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span 
class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">class</span> <span class="title class_">Solution</span> {</span><br><span class="line"><span class="keyword">public</span>:</span><br><span class="line"> <span class="function"><span class="type">int</span> <span class="title">numRescueBoats</span><span class="params">(vector<<span class="type">int</span>> &people, <span class="type">int</span> limit)</span> </span>{</span><br><span class="line"> <span class="type">int</span> ans = <span class="number">0</span>;</span><br><span class="line"> <span class="built_in">sort</span>(people.<span class="built_in">begin</span>(), people.<span class="built_in">end</span>());</span><br><span class="line"> <span class="type">int</span> light = <span class="number">0</span>, heavy = people.<span class="built_in">size</span>() - <span class="number">1</span>;</span><br><span class="line"> <span class="keyword">while</span> (light <= heavy) {</span><br><span class="line"> <span class="keyword">if</span> (people[light] + people[heavy] <= limit) { <span class="comment">// the lightest fits alongside the heaviest</span></span><br><span class="line"> ++light;</span><br><span class="line"> } </span><br><span class="line"> --heavy; <span class="comment">// the heaviest remaining person always boards</span></span><br><span class="line"> ++ans;</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">return</span> ans;</span><br><span class="line"> }</span><br><span class="line">};</span><br></pre></td></tr></table></figure><h2 id="红白球"><a href="#红白球" class="headerlink" title="红白球"></a>Red and White Balls</h2><p>Given a string of length $n$, where <code>W</code> denotes a white ball and <code>R</code> 
denotes a red ball, what is the minimum number of moves needed to bring all the red balls together? As the example shows, each move swaps two adjacent balls. Example:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">Input: s = "WRRWRW"</span><br><span class="line">Output: 1</span><br><span class="line">Input: s = "WWRWWWRWR"</span><br><span class="line">Output: 4, "WWRWWWRWR" -> "WWWRWWRWR" -> "WWWWRWRWR" -> "WWWWWRRWR" -> "WWWWWRRRW"</span><br></pre></td></tr></table></figure><p>A classic two-pointer problem. Note: it appeared verbatim in Microsoft's 2022 autumn campus recruitment written test. The full explanation is rather long and will be filled in later.</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">class</span> <span class="title class_">solution</span>{</span><br><span class="line"> <span class="function"><span class="type">int</span> <span class="title">num</span><span class="params">(string& s)</span> </span>{</span><br><span class="line"> <span class="type">int</span> red_count = <span class="number">0</span>;</span><br><span class="line"> <span class="keyword">for</span> (<span class="type">char</span> c : s) {</span><br><span class="line"> <span class="keyword">if</span> (c == <span class="string">'R'</span>) red_count++;</span><br><span class="line"> 
}</span><br><span class="line"> <span class="type">int</span> left = <span class="number">0</span>, right = s.<span class="built_in">size</span>() - <span class="number">1</span>, result = <span class="number">0</span>;</span><br><span class="line"> <span class="keyword">while</span> (left < right) {</span><br><span class="line"> <span class="keyword">if</span> (s[left] == <span class="string">'R'</span> && s[right] == <span class="string">'R'</span>) {</span><br><span class="line"> red_count -= <span class="number">2</span>; <span class="comment">// red balls strictly between this pair</span></span><br><span class="line"> result += right - left - <span class="number">1</span> - red_count; <span class="comment">// white balls between the pair, summed over all pairs</span></span><br><span class="line"> ++left;</span><br><span class="line"> --right;</span><br><span class="line"> } <span class="keyword">else</span> <span class="keyword">if</span> (s[left] != <span class="string">'R'</span>) {</span><br><span class="line"> left++;</span><br><span class="line"> } <span class="keyword">else</span> {</span><br><span class="line"> right--;</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">return</span> result;</span><br><span class="line"> }</span><br><span class="line">};</span><br></pre></td></tr></table></figure>]]></content>
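<summary type="html"><p>Over the past few days I ran into a string of two-pointer problems. To be honest, I have not spotted a common pattern across them, so I cannot write a proper summary yet. Still, two pointers is a remarkably handy technique, so I will simply record each problem as I solve it.</p></summary>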
<category term="DataStructure" scheme="https://muyuuuu.github.io/tags/DataStructure/"/>
</entry>
<entry>
<title>Recollections Sparked by Memory Allocation</title>
<link href="https://muyuuuu.github.io/2022/07/11/memory-allocation/"/>
<id>https://muyuuuu.github.io/2022/07/11/memory-allocation/</id>
<published>2022-07-11T08:52:39.000Z</published>
<updated>2022-08-06T16:30:48.819Z</updated>
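<content type="html"><![CDATA[<link rel="stylesheet" class="aplayer-secondary-style-marker" href="/assets/css/APlayer.min.css"><script src="/assets/js/APlayer.min.js" class="aplayer-secondary-script-marker"></script><p>With some time on my hands, I came across this question in an interview write-up: on a physical machine with only 1G of memory, can you <code>malloc</code> a 4G array? Oddly, I could not find a really good answer online, so on a whim I decided to try answering it myself.</p><span id="more"></span><h1 id="内存分布"><a href="#内存分布" class="headerlink" title="内存分布"></a>Memory Layout</h1><p>The short answer first: memory obtained from <code>malloc</code> lives on the heap. While we are at it, here is a quick look at the memory layout of a C/C++ program:</p><ul><li>Stack: allocated and released automatically by the compiler; stores function arguments, local variables, and so on. It behaves like the stack data structure</li><li>Heap: usually allocated and released by the programmer. It has nothing to do with the heap data structure; allocation works more like a linked list</li><li>Global/static area: global and static variables are stored together, with their storage assigned at compile time</li><li>Literal constant area: stores string literals</li><li>Code area: stores the binary code of function bodies (class member functions and global functions)</li></ul><p>The stack and the heap deserve a closer look:</p><ul><li><p>Stack: the program requests and returns this memory automatically. It is fast and convenient, but the programmer cannot control it. If an allocation exceeds the maximum size of the stack, a stack overflow occurs. const local variables also live on the stack, and on most platforms the stack grows toward lower addresses. The lifetime of a stack variable is fixed at compile time: its storage is claimed and released simply by adjusting the stack pointer when a scope is entered and left, so there is no run-time bookkeeping per variable.</p></li><li><p>Heap: the programmer asks the system for a block of memory. In the classic model, when the request arrives the allocator walks a linked list of free blocks, finds the first block whose size is greater than or equal to the request, removes it from the list, and hands it to the program. If the free block is larger than the request, the leftover part is put back on the free list. This easily fragments memory; allocation is slower, and the addresses are not contiguous. Heap memory is requested by the programmer and must also be managed and released by the programmer, otherwise it leaks. The heap grows toward higher addresses. In theory there is no fixed limit on how much can be requested from the heap, but in practice it is not unlimited: it is bounded by the installed RAM and by what the system and other programs already occupy. The advantage of heap memory is that the programmer, not the enclosing scope, decides how long an allocation lives.</p></li></ul><p>The stack is managed automatically by the compiler and needs no manual control; heap memory must be released by the programmer, which makes leaks easy. On fragmentation: frequent <code>malloc/free</code> calls inevitably leave the heap's free space discontiguous, producing many fragments and slowing the program down. The stack does not have this problem, because it is a last-in, first-out structure: blocks are pushed and popped in strictly matching order, so a block can never be removed from the middle of the stack, and everything pushed after it has already been popped before it is.</p><h1 id="1G-内存-4G-数组"><a href="#1G-内存-4G-数组" class="headerlink" title="1G 内存 4G 数组"></a>1G of Memory, a 4G Array</h1><p>Having seen that the heap space <code>malloc</code> can hand out depends on how much memory is free, we can now ask whether it can allocate more than that free space. The conclusion first: provided virtual memory is large enough, a machine with 1G of memory can allocate a 4G array. Virtual memory is an abstract address space; while the program runs, only the parts of it that are actually accessed are mapped into physical memory. A large virtual address space only means the program can address a lot of memory; it does not mean the program occupies that much physical memory.</p><p><code>malloc</code> can request more than the machine's physical memory. Physical memory is only part of the picture: how much can be requested depends not only on the memory already in use but also on the machine's <code>swap space</code> (the disk-backed part of virtual memory). In fact, if you have ever installed a Linux 
system on a machine, you will have come across it: disk partitioning includes a <code>swap</code> setting. All you need to know is that swap lives on the physical disk, holds memory contents that are not used very often, and acts as a slow extension of physical memory.</p><p>When physical memory runs short, contents that are rarely accessed are moved out to swap to make room for other processes, so <code>swap</code> space can also be obtained through <code>malloc</code>. At this point <code>malloc</code> has requested the memory, but the request is not yet fully backed. This is where <code>Lazy Allocation</code> comes in: when you call <code>malloc</code>, the system does not actually allocate physical memory; it provides the <code>allocation</code> only when the process touches the memory.</p><p>So it is precisely virtual memory that lets a program have more usable memory than the system has physical memory.</p><h2 id="虚拟内存扩展"><a href="#虚拟内存扩展" class="headerlink" title="虚拟内存扩展"></a>More on Virtual Memory</h2><p>This is what the lecturer covered in my very first class of graduate school (I felt at once that this was a true computer scientist): virtual memory gives every process a uniform, private address space, creating the illusion that the process has main memory all to itself. With virtual addressing, the CPU must translate a virtual address into a physical address before it can reach real physical memory. A program can use a run of adjacent virtual addresses to access a large buffer whose pages are not adjacent in physical memory, and <strong>the virtual addresses used by different processes are isolated from one another</strong>. Code in one process cannot modify physical memory that is in use by another process or by the operating system. Without separate address spaces, a process executing a faulty instruction or malicious code could directly modify another process's data, or even data in the kernel's address space, which is exactly what the operating system must prevent.</p>]]></content>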
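<summary type="html"><p>With some time on my hands, I came across this question in an interview write-up: on a physical machine with only 1G of memory, can you <code>malloc</code> a 4G array? Oddly, I could not find a really good answer online, so on a whim I decided to try answering it myself.</p></summary>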
<category term="OS" scheme="https://muyuuuu.github.io/tags/OS/"/>
</entry>
</feed>