- Experiment notes
- Observations
- Features / implementations
- Bugs
- Experiments until 2019-07-24 were run with `--num_min_depth=30` and `--num_max_depth=150`.
- Experiments after 2019-07-24-18-00 were run with `--num_min_depth=20` and `--num_max_depth=70`. The reason is that, due to the vanishing gradient problem, VGG will not train for deep nets.
- Training VGG takes at least an order of magnitude less time than training ResNet-based networks.
- This statement was very incorrect: the short training time was caused by a bug in the code. Testing at the end of training failed due to incorrect paths.
The goal is to compare the eigenvalues of the feature maps across the different architectures to explain why some network types generalise better, or rather, forget less.
Graphs to create should have:
- x-axis: layer number
- y-axis: avg value; max value
- one graph per network type (VGG, ResNet, DenseNet)
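A minimal plotting sketch for these graphs, assuming matplotlib; the per-layer values are placeholders and the data loading is not shown:

```python
import matplotlib.pyplot as plt

# placeholder data: replace with the per-layer eigenvalue statistics of one network type
layers = list(range(1, 11))
avg_eig = [0.02 / l for l in layers]
max_eig = [1.0 / l for l in layers]

plt.plot(layers, avg_eig, label="avg eigenvalue")
plt.plot(layers, max_eig, label="max eigenvalue")
plt.xlabel("layer number")
plt.ylabel("eigenvalue")
plt.title("VGG")  # one graph per network type (VGG, ResNet, DenseNet)
plt.legend()
plt.show()
```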
The eigenvalues are very small, except for one value. Therefore, plotting all 64 values per layer on one graph doesn't make sense as the smaller values wouldn't be visible.
Furthermore, the remaining small values look somewhat strange, almost like mirrored values.
- Important: currently only the last model of a training process is being saved.
- 1st or any conv layer (32 channels) (additionally take one closer to the output)
- global avg pooling
- create the covariance matrix from the 32 channels of size 32x32
- perform SVD
- get the eigenvalues
- determine a threshold and see how many values are above it for the different architectures
- plot the eigenvalues
- analyse if the
This can probably be realised with `forward_hooks` instead of altering the PyTorch model itself.
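A minimal sketch of this pipeline, assuming a torchvision ResNet and an arbitrarily picked conv layer; the layer name, batch, and threshold are placeholders, not the actual experiment setup:

```python
import torch
import torchvision

model = torchvision.models.resnet18(num_classes=10)
captured = {}

def hook(module, inputs, output):
    # output: [N, C, H, W] -> flatten batch and spatial dims to [C, N*H*W]
    feats = output.detach().permute(1, 0, 2, 3).reshape(output.shape[1], -1)
    feats = feats - feats.mean(dim=1, keepdim=True)  # centre each channel
    cov = feats @ feats.t() / (feats.shape[1] - 1)   # C x C covariance matrix
    # for a symmetric PSD matrix the singular values equal the eigenvalues
    captured["eigenvalues"] = torch.linalg.svdvals(cov)

handle = model.layer1[0].conv1.register_forward_hook(hook)
model(torch.randn(8, 3, 32, 32))  # one forward pass over a batch
handle.remove()

threshold = 1e-3  # placeholder threshold
print((captured["eigenvalues"] > threshold).sum().item(), "eigenvalues above the threshold")
```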
Similar to ResNet, DenseNet adds shortcuts between layers. Different from ResNet, a layer in DenseNet receives all the outputs of previous layers and concatenates them along the depth dimension. In ResNet, a layer only receives outputs from the previous second or third layer, and the outputs are added together at the same depth, so adding shortcuts does not change the depth. In other words, in ResNet the output of layer k is x[k] = f(w * x[k-1] + x[k-2]), while in DenseNet it is x[k] = f(w * H(x[k-1], x[k-2], … x[1])), where H means stacking over the depth dimension. Besides, ResNet makes learning the identity function easy, while DenseNet directly adds the identity function. Source: here
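A toy sketch of the two update rules above; the shapes are arbitrary and not taken from any of the trained models:

```python
import torch
import torch.nn.functional as F

def resnet_step(x_prev, x_skip, weight):
    # ResNet: x[k] = f(w * x[k-1] + x[k-2]); the addition keeps the depth unchanged
    return F.relu(F.conv2d(x_prev, weight, padding=1) + x_skip)

def densenet_step(features, weight):
    # DenseNet: x[k] = f(w * H(x[k-1], ..., x[1])); H stacks along the depth dimension
    return F.relu(F.conv2d(torch.cat(features, dim=1), weight, padding=1))

x1, x2 = torch.randn(1, 16, 8, 8), torch.randn(1, 16, 8, 8)
res_out = resnet_step(x2, x1, torch.randn(16, 16, 3, 3))        # 16 channels in, 16 out
dense_out = densenet_step([x1, x2], torch.randn(16, 32, 3, 3))  # 32 channels in after concat
```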
Within a DenseBlock, each layer is directly connected to every other layer in front of it (feed-forward only).
The block config looks like this: `(6, 12, 24, 16)` and indicates how many layers each dense block (between the pooling/transition layers) contains.
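As a rough check, assuming the torchvision `DenseNet` class (DenseNet-121 uses exactly this block config), the block config can be passed in directly:

```python
from torchvision.models.densenet import DenseNet

# block_config gives the number of dense layers in each of the four dense blocks
model = DenseNet(growth_rate=32, block_config=(6, 12, 24, 16),
                 num_init_features=64, num_classes=10)

# in torchvision each dense layer ends in a module named "conv2",
# so this should count 6 + 12 + 24 + 16 = 58 layers
print(sum(1 for name, _ in model.named_modules() if name.endswith("conv2")))
```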
```
RuntimeError: Given groups=32, weight of size 32 1 5 5, expected input[128, 16, 16, 16] to have 32 channels, but got 16 channels instead
RuntimeError: running_mean should contain 16 elements not 64
```
The actual error here is that the output of the conv layer in the `_Transition` layer has to match the input of the next `norm1` layer.
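A hypothetical minimal reproduction of the shape constraint; the channel numbers are chosen to mirror the errors above and are not taken from the actual model:

```python
import torch
import torch.nn as nn

# the transition layer's conv output channels must equal num_features of the
# following block's norm1, otherwise running_mean/weight have the wrong size
transition_conv = nn.Conv2d(in_channels=64, out_channels=32, kernel_size=1, bias=False)
next_norm1 = nn.BatchNorm2d(num_features=32)  # must match transition_conv.out_channels

x = torch.randn(128, 64, 16, 16)
out = next_norm1(transition_conv(x))  # works because 32 == 32
```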
- official PyTorch implementation: gives some errors, see above
- very clean implementation: seems to have non-ideal performance according to the author
GoogLeNet, from the 2015 paper, could be interesting to implement and look at.
- The number of `True` entries in `self.is_active` determines the number of layers in the network. It is in `cgp.py`.
- The `self.is_active` elements are set to `True` in the recursive function `__check_course_to_out(self, n)`.
- It traverses the `self.gene` array, in which each element has a structure like this: `[layer_type nr][rnd val 0..index]`. The number of elements is equal to `node_num + out_num`.
- `self.gene` is populated in `init_gene(self)`.
- The recursive stop condition involves `input_num = 1`, and the first `n` the function is called with is `201`, i.e. the max node number incl. output nodes. At this time, all `is_active` elements are still `False` (see the toy sketch after this list).

```python
if not self.is_active[n]:
    ...
    for i in range(in_num):
        if self.gene[n][i+1] >= self.net_info.input_num:
            self.__check_course_to_out(self.gene[n][i+1] - self.net_info.input_num)
```
- The actual layer number is defined in `active_net_list()`.
- The number of layers is actually determined by the randomly successful connections made in the `__check_course_to_out(self, n)` function. There, possible viable paths are checked. Afterwards, it is checked whether the number of selected layers is larger than the specified `num_min_depth`. If not, `mutation(self, mutation_rate)` is called to create a valid mutation with more layers.
- The random generator is not the problem: for the random value `0..index`, the number `1` appears 10x more often than other values, but this is expected. For the layer type, type `7` appears `1.5..2` times as often as any other layer type.
- Increasing `num_min_depth` to a number higher than the usual number of layers was also tried. However, this effectively results in an endless loop, as the computer needs a long time to come up with random numbers that connect enough layers together.
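A toy illustration of the recursive reachability check described above; this is not the actual `cgp.py`, and the genome, `input_num`, and node count are made up:

```python
input_num = 1                                    # number of input nodes
gene = [[0, 0], [3, 1], [7, 0], [2, 2], [5, 3]]  # [layer_type, connection] per node
is_active = [False] * len(gene)

def check_course_to_out(n):
    # mark node n and everything it connects to (back towards the input) as active
    if not is_active[n]:
        is_active[n] = True
        if gene[n][1] >= input_num:              # connection points to another node
            check_course_to_out(gene[n][1] - input_num)

check_course_to_out(len(gene) - 1)               # start from the output node
print(sum(is_active), "of", len(gene), "nodes are active")
```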
The logic of connecting layers has to be rewritten; this impacts `init_gene()` and `__check_course_to_out(self, n)`. The solution lies in the `level_back` variable, which indicates the distance between the selected layers: `level_back=1` means each layer is connected to the layer that follows it.
Attention: THE FIX DESCRIBED ABOVE IS INCORRECT
The correct fix is to make each layer directly connected to its previous layer instead of randomising the connection. Then, the number `num_depth` is randomised at the beginning of execution (see the sketch below).
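A hedged sketch of this fix; only `num_depth`, `num_min_depth`, and `num_max_depth` come from these notes, everything else is illustrative:

```python
import random

num_min_depth, num_max_depth = 20, 70
num_depth = random.randint(num_min_depth, num_max_depth)  # randomised once at start-up

# chain connection: every layer is wired to its direct predecessor (node -1 is the input),
# so the network depth is exactly num_depth instead of depending on random connections
gene = [[random.randint(0, 7), n - 1] for n in range(num_depth)]
print(len(gene), "layers, each connected to its direct predecessor")
```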
The number of layers for the ResNet structure is not random, even though the `num_depth` variable is being randomised.