models/intel/face-reidentification-retail-0095/description/face-reidentification-retail-0095.md (+8, -8)

@@ -36,14 +36,14 @@ To align the face, use a landmarks regression model: using regressed points and
 
 ## Inputs
 
-1.Name: "data" , shape: [1x3x128x128] - An input image in the format [BxCxHxW],
-where:
-- B - batch size
-- C - number of channels
-- H - image height
-- W - image width
-
-Expected color order is BGR.
+Name: "data" , shape: [1x3x128x128] - An input image in the format [BxCxHxW],
+where:
+- B - batch size
+- C - number of channels
+- H - image height
+- W - image width
+
+Expected color order is BGR.
 
 ## Outputs
 The net outputs a blob with the shape [1, 256, 1, 1], containing a row-vector of 256 floating point values. Outputs on different images are comparable in cosine distance.
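The outputs note above is the key usage detail for this model: descriptors from different images are compared by cosine distance. A minimal sketch of that comparison, assuming the two [1, 256, 1, 1] blobs have already been obtained from inference; the 0.6 match threshold is an illustrative assumption, not part of the model description:

```python
import numpy as np

def cosine_distance(emb_a, emb_b):
    """Cosine distance between two face descriptors; smaller means more similar."""
    a = emb_a.reshape(-1).astype(np.float32)  # flatten [1, 256, 1, 1] -> [256]
    b = emb_b.reshape(-1).astype(np.float32)
    similarity = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    return 1.0 - similarity

# Usage with stand-in descriptors (real ones come from the model's output blob):
rng = np.random.default_rng(0)
emb_a = rng.normal(size=(1, 256, 1, 1))
emb_b = rng.normal(size=(1, 256, 1, 1))
is_same_person = cosine_distance(emb_a, emb_b) < 0.6  # illustrative threshold
```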
models/intel/faster-rcnn-resnet101-coco-sparse-60-0001/description/faster-rcnn-resnet101-coco-sparse-60-0001.md (+16, -16)

@@ -2,11 +2,11 @@
 
 ## Use Case and High-Level Description
 
-This is a re-trained version of [Faster R-CNN](https://arxiv.org/abs/1506.01497) object detection network trained with COCO\* training dataset.
+This is a retrained version of the [Faster R-CNN](https://arxiv.org/abs/1506.01497) object detection network trained with the COCO\* training dataset.
 The actual implementation is based on [Detectron](https://github.com/facebookresearch/detectron2),
 with additional [network weight pruning](https://arxiv.org/abs/1710.01878) applied to sparsify convolution layers (60% of network parameters are set to zeros).
 
-The model input is a blob that consists of a single image of "1x3x800x1280" in BGR order. The pixel values are integers in the [0, 255] range.
+The model input is a blob that consists of a single image of `1x3x800x1280` in the BGR order. The pixel values are integers in the [0, 255] range.
 
 ## Specification
 
@@ -17,28 +17,28 @@ The model input is a blob that consists of a single image of "1x3x800x1280" in B
 | MParams | 52.79 |
 | Source framework | TensorFlow\*|
 
-Average Precision metric described in: ["COCO: Common Objects in Context"](http://cocodataset.org/#detection-eval). The primary challenge metric is used. Tested on COCO validation dataset.
+See Average Precision metric description at [COCO: Common Objects in Context](http://cocodataset.org/#detection-eval). The primary challenge metric is used. Tested on the COCO validation dataset.
 
 ## Performance
 
 ## Inputs
 
-Name: `input`, shape: [1x3x800x1280] - An input image in the format [BxCxHxW],
-where:
-- B - batch size
-- C - number of channels
-- H - image height
-- W - image width.
-Expected color order is BGR.
+Name: `input`, shape: [1x3x800x1280] - An input image in the format [BxCxHxW],
+where:
+- B - batch size
+- C - number of channels
+- H - image height
+- W - image width
+Expected color order is BGR.
 
 ## Outputs
 
-1.The net outputs a blob with the shape:[300, 7], where each row is consisted of [`image_id`, `class_id`, `confidence`, `x0`, `y0`, `x1`, `y1`], respectively.
--`image_id` - image ID in the batch
--`class_id` - predicted class ID
--`confidence` - [0, 1] detection score, the higher the value, the more confident the deteciton is on
-- (`x0`, `y0`) - normalized coordinates of the top left bounding box corner, in range of [0, 1]
-- (`x1`, `y1`) - normalized coordinates of the bootm right bounding box corner, in range of [0, 1].
+The net outputs a blob with the shape [300, 7], where each row consists of [`image_id`, `class_id`, `confidence`, `x0`, `y0`, `x1`, `y1`] respectively:
+-`image_id` - image ID in the batch
+-`class_id` - predicted class ID
+-`confidence` - [0, 1] detection score; the higher the value, the more confident the detection is
+- (`x0`, `y0`) - normalized coordinates of the top left bounding box corner, in the [0, 1] range
+- (`x1`, `y1`) - normalized coordinates of the bottom right bounding box corner, in the [0, 1] range
 
 ## Legal Information
 [\*] Other names and brands may be claimed as the property of others.
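As a reading aid for the output layout fixed above, here is a minimal sketch of decoding the [300, 7] blob into pixel-space boxes. The 0.5 confidence threshold and the negative `image_id` padding check are conventions commonly seen in OpenVINO detection outputs, assumed here for illustration rather than stated in this file:

```python
import numpy as np

def decode_detections(raw, frame_w, frame_h, conf_threshold=0.5):
    """Turn the [300, 7] blob into (class_id, confidence, x0, y0, x1, y1) boxes."""
    boxes = []
    for image_id, class_id, confidence, x0, y0, x1, y1 in raw.reshape(-1, 7):
        if image_id < 0:               # padding rows in some detection outputs
            break
        if confidence < conf_threshold:
            continue
        # (x0, y0)/(x1, y1) are normalized to [0, 1]; scale back to pixels.
        boxes.append((int(class_id), float(confidence),
                      int(x0 * frame_w), int(y0 * frame_h),
                      int(x1 * frame_w), int(y1 * frame_h)))
    return boxes
```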
models/intel/handwritten-japanese-recognition-0001/description/handwritten-japanese-recognition-0001.md (+5, -5)

@@ -2,8 +2,9 @@
 
 ## Use Case and High-Level Description
 
-This is a network for handwritten japanese text recognition scenario. It consists of VGG16-like backbone, reshape layer and a fully connected layer.
-The network is able to recognize japanese text (characters in datasets [Kondate](http://web.tuat.ac.jp/~nakagawa/database/en/kondate_about.html) and [Nakayosi](http://web.tuat.ac.jp/~nakagawa/database/en/about_nakayosi.html)).
+This is a network for handwritten Japanese text recognition scenario. It consists of a VGG16-like backbone,
+reshape layer and a fully connected layer.
+The network is able to recognize Japanese text. For details on characters in datasets, see [Kondate](http://web.tuat.ac.jp/~nakagawa/database/en/kondate_about.html) and [Nakayosi](http://web.tuat.ac.jp/~nakagawa/database/en/about_nakayosi.html).
 
 ## Example
 
@@ -32,9 +33,10 @@ where:
 - H - image height
 - W - image width
 
-Note that the source image should be converted to grayscale, resized to spefic height (such as 96) while keeping aspect ratio, normalized to [-1, 1] and rightbottom padded
+Note that the source image should be converted to grayscale, resized to specific height (such as 96) while keeping aspect ratio, normalized to [-1, 1], and right-bottom padded.
 
 ## Outputs
+
 The net outputs a blob with the shape [186, 1, 1161] in the format [WxBxL],
 where:
 - W - output sequence length
@@ -43,7 +45,5 @@ where:
 
 The network output can be decoded by CTC Greedy Decoder.
 
-
-
 ## Legal Information
 [*] Other names and brands may be claimed as the property of others.
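This description points at CTC greedy decoding for the [186, 1, 1161] ([WxBxL]) output. A minimal sketch of that decoder, assuming `charlist` is the model's character list and that the CTC blank occupies the last class index; the blank position is an assumption for illustration:

```python
import numpy as np

def ctc_greedy_decode(logits, charlist, blank_id=None):
    """Greedy CTC decoding: best class per step, collapse repeats, drop blanks."""
    if blank_id is None:
        blank_id = logits.shape[-1] - 1          # assumed blank position
    best_path = logits[:, 0, :].argmax(axis=-1)  # [W] class indices, batch of 1
    decoded, previous = [], blank_id
    for idx in best_path:
        if idx != previous and idx != blank_id:  # collapse repeats, skip blanks
            decoded.append(charlist[idx])
        previous = idx
    return "".join(decoded)
```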
models/intel/icnet-camvid-ava-0001/description/icnet-camvid-ava-0001.md (+4, -4)

@@ -2,9 +2,9 @@
 
 ## Use Case and High-Level Description
 
-A trained model of ICNet for fast semantic segmentation, trained on the CamVid\* dataset from scratch using the TensorFlow\* framework. For more details about the original floatingpoint model, check out the [paper](https://arxiv.org/abs/1704.08545).
+A trained model of ICNet for fast semantic segmentation, trained on the CamVid\* dataset from scratch using the TensorFlow\* framework. For details about the original floating-point model, check out [ICNet for Real-Time Semantic Segmentation on High-Resolution Images](https://arxiv.org/abs/1704.08545).
 
-The model input is a blob that consists of a single image of "1x3x720x960" in BGR order. The pixel values are integers in the [0, 255] range.
+The model input is a blob that consists of a single image of `1x3x720x960` in the BGR order. The pixel values are integers in the [0, 255] range.
 
 The model output for `icnet-camvid-ava-0001` is the predicted class index of each input pixel belonging to one of the 12 classes of the CamVid dataset.
 
@@ -18,7 +18,7 @@ The model output for `icnet-camvid-ava-0001` is the predicted class index of eac
 
 ## Accuracy
 
-The quality metrics were calculated on the CamVid\* validation dataset. The 'unlabeled' class had been ignored during metrics calculation.
+The quality metrics were calculated on the CamVid\* validation dataset. The `unlabeled` class had been ignored during metrics calculation.
 
 | Metric | Value |
 |---------------------------|---------------|
@@ -51,7 +51,7 @@ Semantic segmentation class prediction map, shape - `1,720,960`, output data for
 -`H` - horizontal coordinate of the input pixel
 -`W` - vertical coordinate of the input pixel
 
-containing the class prediction result of each pixel.
+Output contains the class prediction result of each pixel.
 
 ## Legal Information
 [*] Other names and brands may be claimed as the property of others.
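Since the accuracy note above says the `unlabeled` class is ignored during metric calculation, a minimal sketch of a mean IoU computation over the remaining CamVid classes may help. The class count of 12 comes from the description; the `unlabeled` index of 11 is an assumption for illustration:

```python
import numpy as np

def mean_iou(pred, gt, num_classes=12, ignore_id=11):
    """Mean IoU over per-pixel class maps, skipping the ignored class."""
    valid = gt != ignore_id                  # drop `unlabeled` ground-truth pixels
    ious = []
    for cls in range(num_classes):
        if cls == ignore_id:
            continue
        p = pred[valid] == cls
        g = gt[valid] == cls
        union = np.logical_or(p, g).sum()
        if union > 0:                        # skip classes absent from both maps
            ious.append(np.logical_and(p, g).sum() / union)
    return float(np.mean(ious)) if ious else 0.0
```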
models/intel/icnet-camvid-ava-sparse-30-0001/description/icnet-camvid-ava-sparse-30-0001.md (+4, -4)

@@ -2,9 +2,9 @@
 
 ## Use Case and High-Level Description
 
-A trained model of ICNet for fast semantic segmentation, trained on the CamVid\* dataset from scratch using the TensorFlow\* framework. The trained model has 30% sparsity (ratio of 0's within all the convolution kernel weights). For more details about the original floatingpoint model, check out the [paper](https://arxiv.org/abs/1704.08545).
+A trained model of ICNet for fast semantic segmentation, trained on the CamVid\* dataset from scratch using the TensorFlow\* framework. The trained model has 30% sparsity (ratio of zeros within all the convolution kernel weights). For details about the original floating-point model, check out the [ICNet for Real-Time Semantic Segmentation on High-Resolution Images](https://arxiv.org/abs/1704.08545).
 
-The model input is a blob that consists of a single image of "1x3x720x960" in BGR order. The pixel values are integers in the [0, 255] range.
+The model input is a blob that consists of a single image of `1x3x720x960` in the BGR order. The pixel values are integers in the [0, 255] range.
 
 The model output for `icnet-camvid-ava-sparse-30-0001` is the predicted class index of each input pixel belonging to one of the 12 classes of the CamVid dataset.
 
@@ -18,7 +18,7 @@ The model output for `icnet-camvid-ava-sparse-30-0001` is the predicted class in
 
 ## Accuracy
 
-The quality metrics were calculated on the CamVid\* validation dataset. The 'unlabeled' class had been ignored during metrics calculation.
+The quality metrics were calculated on the CamVid\* validation dataset. The `unlabeled` class had been ignored during metrics calculation.
 
 | Metric | Value |
 |---------------------------|---------------|
@@ -51,7 +51,7 @@ Semantic segmentation class prediction map, shape - `1,720,960`, output data for
 -`H` - horizontal coordinate of the input pixel
 -`W` - vertical coordinate of the input pixel
 
-containing the class prediction result of each pixel.
+Output contains the class prediction result of each pixel.
 
 ## Legal Information
 [*] Other names and brands may be claimed as the property of others.
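The sparsity figures in this file and the next (30% and 60% of convolution kernel weights set to zero) can be sanity-checked directly from the weights. A minimal sketch, assuming the convolution kernels have already been extracted into a list of NumPy arrays; the extraction step is omitted:

```python
import numpy as np

def weight_sparsity(conv_kernels):
    """Fraction of exactly-zero values across all convolution kernel weights."""
    total = sum(w.size for w in conv_kernels)
    zeros = sum(int(np.count_nonzero(w == 0)) for w in conv_kernels)
    return zeros / total   # expected ~0.30 here, ~0.60 for the -60 variant
```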
models/intel/icnet-camvid-ava-sparse-60-0001/description/icnet-camvid-ava-sparse-60-0001.md (+4, -4)

@@ -2,9 +2,9 @@
 
 ## Use Case and High-Level Description
 
-A trained model of ICNet for fast semantic segmentation, trained on the CamVid\* dataset from scratch using the TensorFlow\* framework. The trained model has 60% sparsity (ratio of 0's within all the convolution kernel weights). For more details about the original floatingpoint model, check out the [paper](https://arxiv.org/abs/1704.08545).
+A trained model of ICNet for fast semantic segmentation, trained on the CamVid\* dataset from scratch using the TensorFlow\* framework. The trained model has 60% sparsity (ratio of zeros within all the convolution kernel weights). For details about the original floating-point model, check out the [ICNet for Real-Time Semantic Segmentation on High-Resolution Images](https://arxiv.org/abs/1704.08545).
 
-The model input is a blob that consists of a single image of "1x3x720x960" in BGR order. The pixel values are integers in the [0, 255] range.
+The model input is a blob that consists of a single image of `1x3x720x960` in the BGR order. The pixel values are integers in the [0, 255] range.
 
 The model output for `icnet-camvid-ava-sparse-60-0001` is the predicted class index of each input pixel belonging to one of the 12 classes of the CamVid dataset.
 
@@ -18,7 +18,7 @@ The model output for `icnet-camvid-ava-sparse-60-0001` is the predicted class in
 
 ## Accuracy
 
-The quality metrics were calculated on the CamVid\* validation dataset. The 'unlabeled' class had been ignored during metrics calculation.
+The quality metrics were calculated on the CamVid\* validation dataset. The `unlabeled` class had been ignored during metrics calculation.
 
 | Metric | Value |
 |---------------------------|---------------|
@@ -51,7 +51,7 @@ Semantic segmentation class prediction map, shape - `1,720,960`, output data for
 -`H` - horizontal coordinate of the input pixel
 -`W` - vertical coordinate of the input pixel
 
-containing the class prediction result of each pixel.
+Output contains the class prediction result of each pixel.
 
 ## Legal Information
 [*] Other names and brands may be claimed as the property of others.