Commit fcede6a

Review pt.2
1 parent dc268c5 commit fcede6a

76 files changed, +288 -295 lines changed

models/intel/age-gender-recognition-retail-0013/description/age-gender-recognition-retail-0013.md

+3 -3
@@ -40,12 +40,12 @@ applicable for children since their faces were not in the training set.

 ## Inputs

-Name: `input` , shape: [1x3x62x62] - An input image in [1xCxHxW] format. Expected color order is BGR.
+Name: `input`, shape: [1x3x62x62] - An input image in [1xCxHxW] format. Expected color order is BGR.

 ## Outputs

-1. name: "age_conv3", shape: [1, 1, 1, 1] - Estimated age divided by 100.
-2. name: "prob", shape: [1, 2, 1, 1] - Softmax output across 2 type classes [female, male]
+1. Name: `age_conv3`, shape: [1, 1, 1, 1] - Estimated age divided by 100.
+2. Name: `prob`, shape: [1, 2, 1, 1] - Softmax output across 2 type classes [female, male]

 ## Legal Information
 [*] Other names and brands may be claimed as the property of others.
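
Reviewer note: the two outputs documented in this hunk can be decoded roughly as follows. This is a minimal sketch, assuming the blobs have already been flattened to plain Python lists; the helper name `decode_age_gender` is hypothetical and not part of the repository.

```python
def decode_age_gender(age_conv3, prob):
    """Decode flattened model outputs (hypothetical helper).

    age_conv3: list with 1 value (blob of shape [1, 1, 1, 1])
    prob:      list with 2 values (blob of shape [1, 2, 1, 1])
    """
    # The model emits the estimated age divided by 100.
    age = age_conv3[0] * 100.0
    # Class order documented for `prob`: [female, male].
    labels = ("female", "male")
    best = max(range(len(labels)), key=lambda i: prob[i])
    return age, labels[best], prob[best]


if __name__ == "__main__":
    print(decode_age_gender([0.25], [0.25, 0.75]))
```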

models/intel/asl-recognition-0004/description/asl-recognition-0004.md

+1 -1
@@ -27,7 +27,7 @@ on the input clip.

 ## Inputs

-Name: `input` , shape: [1x3x16x224x224]. An input image sequence in the format [BxCxTxHxW], where:
+Name: `input`, shape: [1x3x16x224x224]. An input image sequence in the format [BxCxTxHxW], where:
 - B - batch size
 - C - number of channels
 - T - duration of input clip

models/intel/emotions-recognition-retail-0003/description/emotions-recognition-retail-0003.md

+1 -1
@@ -38,7 +38,7 @@ only the images containing five aforementioned emotions is chosen. The total amo

 ## Inputs

-Name: `input` , shape: [1x3x64x64] - An input image in [1xCxHxW] format. Expected color order is BGR.
+Name: `input`, shape: [1x3x64x64] - An input image in [1xCxHxW] format. Expected color order is BGR.

 ## Outputs

models/intel/emotions-recognition-retail-0003/emotions-recognition-retail-0003.prototxt

+1 -1
@@ -1,7 +1,7 @@
 name: "0003_EmoNet_ResNet10"
 layer {
 name: "data"
-type: `input`
+type: "Input"
 top: "data"
 input_param {
 shape {

models/intel/face-detection-0100/description/face-detection-0100.md

+1 -1
@@ -28,7 +28,7 @@ curve. All numbers were evaluated by taking into account only faces bigger than

 ## Inputs

-Name: `input` , shape: [1x3x256x256] - An input image in the format [BxCxHxW],
+Name: `input`, shape: [1x3x256x256] - An input image in the format [BxCxHxW],
 where:

 - B - batch size

models/intel/face-detection-0102/description/face-detection-0102.md

+1 -1
@@ -28,7 +28,7 @@ curve. All numbers were evaluated by taking into account only faces bigger than

 ## Inputs

-Name: `input` , shape: [1x3x384x384] - An input image in the format [BxCxHxW],
+Name: `input`, shape: [1x3x384x384] - An input image in the format [BxCxHxW],
 where:

 - B - batch size

models/intel/face-detection-0104/description/face-detection-0104.md

+1 -1
@@ -28,7 +28,7 @@ curve. All numbers were evaluated by taking into account only faces bigger than

 ## Inputs

-Name: `input` , shape: [1x3x448x448] - An input image in the format [BxCxHxW],
+Name: `input`, shape: [1x3x448x448] - An input image in the format [BxCxHxW],
 where:

 - B - batch size

models/intel/face-detection-0105/description/face-detection-0105.md

+1 -1
@@ -27,7 +27,7 @@ curve. All numbers were evaluated by taking into account only faces bigger than

 ## Inputs

-Name: `input` , shape: [1x3x416x416] - An input image in the format [BxCxHxW],
+Name: `input`, shape: [1x3x416x416] - An input image in the format [BxCxHxW],
 where:

 - B - batch size

models/intel/face-detection-0106/description/face-detection-0106.md

+9 -9
@@ -27,26 +27,26 @@ curve. All numbers were evaluated by taking into account only faces bigger than

 ## Inputs

-1. Name: `input` , shape: [1x3x640x640] - An input image in the format [BxCxHxW],
+Name: `input`, shape: [1x3x640x640] - An input image in the format [BxCxHxW],
 where:

-- B - batch size
-- C - number of channels
-- H - image height
-- W - image width
+- B - batch size
+- C - number of channels
+- H - image height
+- W - image width

 Expected color order: BGR.

 ## Outputs

-1. The "boxes" is a blob with shape: [N, 5], where N is the number of detected
-bounding boxes. For each detection, the description has the format:
+1. The `boxes` is a blob with the shape [N, 5], where N is the number of detected
+bounding boxes. For each detection, the description has the format
 [`x_min`, `y_min`, `x_max`, `y_max`, `conf`],
 where:
 - (`x_min`, `y_min`) - coordinates of the top left bounding box corner
-- (`x_max`, `y_max`) - coordinates of the bottom right bounding box corner.
+- (`x_max`, `y_max`) - coordinates of the bottom right bounding box corner
 - `conf` - confidence for the predicted class
-2. The "labels" is a blob with shape: [N], where N is the number of detected
+2. The `labels` is a blob with the shape [N], where N is the number of detected
 bounding boxes. It contains `label` per each detected box.

 ## Legal Information
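
Reviewer note: the `boxes` and `labels` blobs described in this hunk pair up row by row. The sketch below shows one way to combine them and drop low-confidence rows. The function name `filter_detections` and the 0.5 threshold are assumptions for illustration; the model description does not prescribe a threshold.

```python
def filter_detections(boxes, labels, threshold=0.5):
    """Pair each [x_min, y_min, x_max, y_max, conf] row with its label.

    boxes:     list of N rows, each with 5 values
    labels:    list of N label values
    threshold: hypothetical confidence cut-off (not taken from the docs)
    """
    kept = []
    for (x_min, y_min, x_max, y_max, conf), label in zip(boxes, labels):
        if conf >= threshold:
            kept.append({
                "label": label,
                "conf": conf,
                "box": (x_min, y_min, x_max, y_max),
            })
    return kept


if __name__ == "__main__":
    demo_boxes = [[10, 20, 110, 220, 0.9], [5, 5, 50, 50, 0.2]]
    print(filter_detections(demo_boxes, [0, 0]))
```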

models/intel/face-detection-adas-0001/description/face-detection-adas-0001.md

+1 -1
@@ -32,7 +32,7 @@ curve. Numbers are on

 ## Inputs

-Name: `input` , shape: [1x3x384x672] - An input image in the format [BxCxHxW],
+Name: `input`, shape: [1x3x384x672] - An input image in the format [BxCxHxW],
 where:
 - B - batch size
 - C - number of channels

models/intel/face-detection-adas-binary-0001/description/face-detection-adas-binary-0001.md

+1 -1
@@ -34,7 +34,7 @@ curve. Numbers are on

 ## Inputs

-Name: `input` , shape: [1x3x384x672] - An input image in the format [BxCxHxW],
+Name: `input`, shape: [1x3x384x672] - An input image in the format [BxCxHxW],
 where:
 - B - batch size
 - C - number of channels

models/intel/face-detection-retail-0004/description/face-detection-retail-0004.md

+1 -1
@@ -29,7 +29,7 @@ curve. All numbers were evaluated by taking into account only faces bigger than

 ## Inputs

-Name: `input` , shape: [1x3x300x300] - An input image in the format [BxCxHxW],
+Name: `input`, shape: [1x3x300x300] - An input image in the format [BxCxHxW],
 where:

 - B - batch size

models/intel/face-detection-retail-0004/face-detection-retail-0004.prototxt

+1 -1
@@ -1,7 +1,7 @@
 name: "cnn_fd_004_sq_light_ssd"
 layer {
 name: "data"
-type: `input`
+type: "Input"
 top: "data"
 input_param {
 shape {

models/intel/face-detection-retail-0005/description/face-detection-retail-0005.md

+1 -1
@@ -28,7 +28,7 @@ curve. All numbers were evaluated by taking into account only faces bigger than

 ## Inputs

-Name: `input` , shape: [1x3x300x300] - An input image in the format [BxCxHxW],
+Name: `input`, shape: [1x3x300x300] - An input image in the format [BxCxHxW],
 where:

 - B - batch size

models/intel/face-reidentification-retail-0095/description/face-reidentification-retail-0095.md

+8 -8
@@ -36,14 +36,14 @@ To align the face, use a landmarks regression model: using regressed points and

 ## Inputs

-1. Name: "data" , shape: [1x3x128x128] - An input image in the format [BxCxHxW],
-where:
-- B - batch size
-- C - number of channels
-- H - image height
-- W - image width
-
-Expected color order is BGR.
+Name: "data" , shape: [1x3x128x128] - An input image in the format [BxCxHxW],
+where:
+- B - batch size
+- C - number of channels
+- H - image height
+- W - image width
+
+Expected color order is BGR.

 ## Outputs
 The net outputs a blob with the shape [1, 256, 1, 1], containing a row-vector of 256 floating point values. Outputs on different images are comparable in cosine distance.
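
Reviewer note: the Outputs context above says embeddings are comparable in cosine distance. A minimal sketch of that comparison follows, assuming each 256-value output has been flattened to a Python list; `cosine_distance` is an illustrative helper, not an API from this repository.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Smaller values mean the two embeddings point in more similar directions.
    print(cosine_distance([1.0, 0.0], [0.0, 1.0]))
```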

models/intel/faster-rcnn-resnet101-coco-sparse-60-0001/description/faster-rcnn-resnet101-coco-sparse-60-0001.md

+16 -16
@@ -2,11 +2,11 @@

 ## Use Case and High-Level Description

-This is a re-trained version of [Faster R-CNN](https://arxiv.org/abs/1506.01497) object detection network trained with COCO\* training dataset.
+This is a retrained version of the [Faster R-CNN](https://arxiv.org/abs/1506.01497) object detection network trained with the COCO\* training dataset.
 The actual implementation is based on [Detectron](https://github.com/facebookresearch/detectron2),
 with additional [network weight pruning](https://arxiv.org/abs/1710.01878) applied to sparsify convolution layers (60% of network parameters are set to zeros).

-The model input is a blob that consists of a single image of "1x3x800x1280" in BGR order. The pixel values are integers in the [0, 255] range.
+The model input is a blob that consists of a single image of `1x3x800x1280` in the BGR order. The pixel values are integers in the [0, 255] range.

 ## Specification

@@ -17,28 +17,28 @@ The model input is a blob that consists of a single image of "1x3x800x1280" in B
 | MParams | 52.79 |
 | Source framework | TensorFlow\* |

-Average Precision metric described in: ["COCO: Common Objects in Context"](http://cocodataset.org/#detection-eval). The primary challenge metric is used. Tested on COCO validation dataset.
+See Average Precision metric description at [COCO: Common Objects in Context](http://cocodataset.org/#detection-eval). The primary challenge metric is used. Tested on the COCO validation dataset.

 ## Performance

 ## Inputs

-Name: `input` , shape: [1x3x800x1280] - An input image in the format [BxCxHxW],
-where:
-- B - batch size
-- C - number of channels
-- H - image height
-- W - image width.
-Expected color order is BGR.
+Name: `input`, shape: [1x3x800x1280] - An input image in the format [BxCxHxW],
+where:
+- B - batch size
+- C - number of channels
+- H - image height
+- W - image width
+Expected color order is BGR.

 ## Outputs

-1. The net outputs a blob with the shape: [300, 7], where each row is consisted of [`image_id`, `class_id`, `confidence`, `x0`, `y0`, `x1`, `y1`], respectively.
-- `image_id` - image ID in the batch
-- `class_id` - predicted class ID
-- `confidence` - [0, 1] detection score, the higher the value, the more confident the deteciton is on
-- (`x0`, `y0`) - normalized coordinates of the top left bounding box corner, in range of [0, 1]
-- (`x1`, `y1`) - normalized coordinates of the bootm right bounding box corner, in range of [0, 1].
+The net outputs a blob with the shape [300, 7], where each row consists of [`image_id`, `class_id`, `confidence`, `x0`, `y0`, `x1`, `y1`] respectively:
+- `image_id` - image ID in the batch
+- `class_id` - predicted class ID
+- `confidence` - [0, 1] detection score; the higher the value, the more confident the detection is
+- (`x0`, `y0`) - normalized coordinates of the top left bounding box corner, in the [0, 1] range
+- (`x1`, `y1`) - normalized coordinates of the bottom right bounding box corner, in the [0, 1] range

 ## Legal Information
 [\*] Other names and brands may be claimed as the property of others.
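
Reviewer note: since each output row carries coordinates normalized to [0, 1], a consumer has to scale them before drawing. The sketch below assumes scaling to the documented 1280x800 input size; scaling to the original frame size instead is equally plausible, and the helper name `row_to_pixels` is hypothetical.

```python
def row_to_pixels(row, width=1280, height=800):
    """Convert one [image_id, class_id, confidence, x0, y0, x1, y1] row.

    width/height default to the documented input size (an assumption about
    which image the box will be drawn on).
    """
    image_id, class_id, confidence, x0, y0, x1, y1 = row
    return {
        "image_id": int(image_id),
        "class_id": int(class_id),
        "confidence": confidence,
        # Normalized corners scaled to pixel coordinates.
        "box": (round(x0 * width), round(y0 * height),
                round(x1 * width), round(y1 * height)),
    }


if __name__ == "__main__":
    print(row_to_pixels([0, 3, 0.9, 0.25, 0.5, 0.75, 1.0]))
```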

models/intel/handwritten-japanese-recognition-0001/description/handwritten-japanese-recognition-0001.md

+5 -5
@@ -2,8 +2,9 @@

 ## Use Case and High-Level Description

-This is a network for handwritten japanese text recognition scenario. It consists of VGG16-like backbone, reshape layer and a fully connected layer.
-The network is able to recognize japanese text (characters in datasets [Kondate](http://web.tuat.ac.jp/~nakagawa/database/en/kondate_about.html) and [Nakayosi](http://web.tuat.ac.jp/~nakagawa/database/en/about_nakayosi.html)).
+This is a network for handwritten Japanese text recognition scenario. It consists of a VGG16-like backbone,
+reshape layer and a fully connected layer.
+The network is able to recognize Japanese text. For details on characters in datasets, see [Kondate](http://web.tuat.ac.jp/~nakagawa/database/en/kondate_about.html) and [Nakayosi](http://web.tuat.ac.jp/~nakagawa/database/en/about_nakayosi.html).

 ## Example

@@ -32,9 +33,10 @@ where:
 - H - image height
 - W - image width

-Note that the source image should be converted to grayscale, resized to spefic height (such as 96) while keeping aspect ratio, normalized to [-1, 1] and right bottom padded
+Note that the source image should be converted to grayscale, resized to specific height (such as 96) while keeping aspect ratio, normalized to [-1, 1], and right-bottom padded.

 ## Outputs
+
 The net outputs a blob with the shape [186, 1, 1161] in the format [WxBxL],
 where:
 - W - output sequence length
@@ -43,7 +45,5 @@ where:

 The network output can be decoded by CTC Greedy Decoder.

-
-
 ## Legal Information
 [*] Other names and brands may be claimed as the property of others.
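
Reviewer note: the description mentions decoding with a CTC Greedy Decoder but does not show one. A minimal sketch of greedy CTC decoding over a [WxBxL] output follows. The blank symbol index (`blank_index=0`) is an assumption; the model description does not state which index is the blank, so check the model's character list before relying on it.

```python
def ctc_greedy_decode(output, blank_index=0, batch=0):
    """Greedy CTC decoding of a nested list shaped [W][B][L].

    Takes the best-scoring symbol at every step, collapses consecutive
    repeats, and drops the blank symbol. blank_index is hypothetical.
    """
    decoded = []
    previous = None
    for step in output:
        scores = step[batch]
        best = max(range(len(scores)), key=scores.__getitem__)
        # Emit a symbol only when it changes and is not the blank.
        if best != previous and best != blank_index:
            decoded.append(best)
        previous = best
    return decoded


if __name__ == "__main__":
    def one_hot(index, size=3):
        return [1.0 if i == index else 0.0 for i in range(size)]

    demo = [[one_hot(i)] for i in [1, 1, 0, 1, 2]]
    print(ctc_greedy_decode(demo))
```

The returned indices still need to be mapped to characters through the model's symbol table, which is outside this sketch.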

models/intel/human-pose-estimation-0001/description/human-pose-estimation-0001.md

+7 -7
@@ -27,13 +27,13 @@ Tested on a COCO validation subset from the original paper [Realtime Multi-Perso

 ## Inputs

-1. Name: `input` , shape: [1x3x256x456]. An input image in the [BxCxHxW] format ,
-where:
-- B - batch size
-- C - number of channels
-- H - image height
-- W - image width.
-Expected color order is BGR.
+Name: `input`, shape: [1x3x256x456]. An input image in the [BxCxHxW] format ,
+where:
+- B - batch size
+- C - number of channels
+- H - image height
+- W - image width
+Expected color order is BGR.

 ## Outputs

models/intel/human-pose-estimation-0001/human-pose-estimation-0001.prototxt

+1 -1
@@ -1,6 +1,6 @@
 layer {
 name: "data"
-type: `input`
+type: "Input"
 top: "data"
 input_param {shape: {dim: 1 dim: 3 dim: 256 dim: 456}}
 }

models/intel/icnet-camvid-ava-0001/description/icnet-camvid-ava-0001.md

+4 -4
@@ -2,9 +2,9 @@

 ## Use Case and High-Level Description

-A trained model of ICNet for fast semantic segmentation, trained on the CamVid\* dataset from scratch using the TensorFlow\* framework. For more details about the original floating point model, check out the [paper](https://arxiv.org/abs/1704.08545).
+A trained model of ICNet for fast semantic segmentation, trained on the CamVid\* dataset from scratch using the TensorFlow\* framework. For details about the original floating-point model, check out [ICNet for Real-Time Semantic Segmentation on High-Resolution Images](https://arxiv.org/abs/1704.08545).

-The model input is a blob that consists of a single image of "1x3x720x960" in BGR order. The pixel values are integers in the [0, 255] range.
+The model input is a blob that consists of a single image of `1x3x720x960` in the BGR order. The pixel values are integers in the [0, 255] range.

 The model output for `icnet-camvid-ava-0001` is the predicted class index of each input pixel belonging to one of the 12 classes of the CamVid dataset.

@@ -18,7 +18,7 @@ The model output for `icnet-camvid-ava-0001` is the predicted class index of eac

 ## Accuracy

-The quality metrics were calculated on the CamVid\* validation dataset. The 'unlabeled' class had been ignored during metrics calculation.
+The quality metrics were calculated on the CamVid\* validation dataset. The `unlabeled` class had been ignored during metrics calculation.

 | Metric | Value |
 |---------------------------|---------------|
@@ -51,7 +51,7 @@ Semantic segmentation class prediction map, shape - `1,720,960`, output data for
 - `H` - horizontal coordinate of the input pixel
 - `W` - vertical coordinate of the input pixel

-containing the class prediction result of each pixel.
+Output contains the class prediction result of each pixel.

 ## Legal Information
 [*] Other names and brands may be claimed as the property of others.

models/intel/icnet-camvid-ava-sparse-30-0001/description/icnet-camvid-ava-sparse-30-0001.md

+4 -4
@@ -2,9 +2,9 @@

 ## Use Case and High-Level Description

-A trained model of ICNet for fast semantic segmentation, trained on the CamVid\* dataset from scratch using the TensorFlow\* framework. The trained model has 30% sparsity (ratio of 0's within all the convolution kernel weights). For more details about the original floating point model, check out the [paper](https://arxiv.org/abs/1704.08545).
+A trained model of ICNet for fast semantic segmentation, trained on the CamVid\* dataset from scratch using the TensorFlow\* framework. The trained model has 30% sparsity (ratio of zeros within all the convolution kernel weights). For details about the original floating-point model, check out the [ICNet for Real-Time Semantic Segmentation on High-Resolution Images](https://arxiv.org/abs/1704.08545).

-The model input is a blob that consists of a single image of "1x3x720x960" in BGR order. The pixel values are integers in the [0, 255] range.
+The model input is a blob that consists of a single image of `1x3x720x960` in the BGR order. The pixel values are integers in the [0, 255] range.

 The model output for `icnet-camvid-ava-sparse-30-0001` is the predicted class index of each input pixel belonging to one of the 12 classes of the CamVid dataset.

@@ -18,7 +18,7 @@ The model output for `icnet-camvid-ava-sparse-30-0001` is the predicted class in

 ## Accuracy

-The quality metrics were calculated on the CamVid\* validation dataset. The 'unlabeled' class had been ignored during metrics calculation.
+The quality metrics were calculated on the CamVid\* validation dataset. The `unlabeled` class had been ignored during metrics calculation.

 | Metric | Value |
 |---------------------------|---------------|
@@ -51,7 +51,7 @@ Semantic segmentation class prediction map, shape - `1,720,960`, output data for
 - `H` - horizontal coordinate of the input pixel
 - `W` - vertical coordinate of the input pixel

-containing the class prediction result of each pixel.
+Output contains the class prediction result of each pixel.

 ## Legal Information
 [*] Other names and brands may be claimed as the property of others.

models/intel/icnet-camvid-ava-sparse-60-0001/description/icnet-camvid-ava-sparse-60-0001.md

+4 -4
@@ -2,9 +2,9 @@

 ## Use Case and High-Level Description

-A trained model of ICNet for fast semantic segmentation, trained on the CamVid\* dataset from scratch using the TensorFlow\* framework. The trained model has 60% sparsity (ratio of 0's within all the convolution kernel weights). For more details about the original floating point model, check out the [paper](https://arxiv.org/abs/1704.08545).
+A trained model of ICNet for fast semantic segmentation, trained on the CamVid\* dataset from scratch using the TensorFlow\* framework. The trained model has 60% sparsity (ratio of zeros within all the convolution kernel weights). For details about the original floating-point model, check out the [ICNet for Real-Time Semantic Segmentation on High-Resolution Images](https://arxiv.org/abs/1704.08545).

-The model input is a blob that consists of a single image of "1x3x720x960" in BGR order. The pixel values are integers in the [0, 255] range.
+The model input is a blob that consists of a single image of `1x3x720x960` in the BGR order. The pixel values are integers in the [0, 255] range.

 The model output for `icnet-camvid-ava-sparse-60-0001` is the predicted class index of each input pixel belonging to one of the 12 classes of the CamVid dataset.

@@ -18,7 +18,7 @@ The model output for `icnet-camvid-ava-sparse-60-0001` is the predicted class in

 ## Accuracy

-The quality metrics were calculated on the CamVid\* validation dataset. The 'unlabeled' class had been ignored during metrics calculation.
+The quality metrics were calculated on the CamVid\* validation dataset. The `unlabeled` class had been ignored during metrics calculation.

 | Metric | Value |
 |---------------------------|---------------|
@@ -51,7 +51,7 @@ Semantic segmentation class prediction map, shape - `1,720,960`, output data for
 - `H` - horizontal coordinate of the input pixel
 - `W` - vertical coordinate of the input pixel

-containing the class prediction result of each pixel.
+Output contains the class prediction result of each pixel.

 ## Legal Information
 [*] Other names and brands may be claimed as the property of others.

models/intel/image-retrieval-0001/description/image-retrieval-0001.md

+1 -1
@@ -21,7 +21,7 @@ Image retrieval model based on [MobileNetV2](https://arxiv.org/abs/1801.04381) a

 ## Inputs

-Name: `input` , shape: [1x3x224x224] — An input image in the format [BxCxHxW],
+Name: `input`, shape: [1x3x224x224] — An input image in the format [BxCxHxW],
 where:

 - B - batch size
