
Configuration files

Javier Sanz-Cruzado edited this page Jan 24, 2020 · 4 revisions

All the programs, with the exception of EWC3, receive as an argument an XML file containing the sets of parameters to use with each algorithm. This page describes how to configure an algorithm.

Algorithms

An algorithm is defined with the algorithm tag, which has two fields: name, which contains the identifier of the algorithm, and params, which contains the parameters of the algorithm (if any). A partial example is shown next:

<algorithms>
  <algorithm>
    <name>QLJM</name>
    <params>
      ...
    </params>
  </algorithm>
</algorithms>
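To make the nesting concrete, here is a minimal Java sketch (an illustration, not the project's own loader) that reads a configuration file with the JDK's built-in DOM parser and lists the algorithm names in file order. The class AlgorithmNames and its methods are hypothetical. It only looks at the name element that is a direct child of each algorithm, because every param nested inside params carries a name of its own.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper (not the project's loader): lists the <name> of each <algorithm>.
class AlgorithmNames {
    static List<String> read(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Malformed configuration file", e);
        }
        List<String> names = new ArrayList<>();
        NodeList algorithms = doc.getElementsByTagName("algorithm"); // document order
        for (int i = 0; i < algorithms.getLength(); i++) {
            Element name = directChild((Element) algorithms.item(i), "name");
            if (name != null) {
                names.add(name.getTextContent().trim());
            }
        }
        return names;
    }

    // Direct children only: each nested <param> has a <name> of its own.
    static Element directChild(Element parent, String tag) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && n.getNodeName().equals(tag)) {
                return (Element) n;
            }
        }
        return null;
    }
}
```

For instance, a file with a QLJM entry (including a lambda parameter) followed by a BM25 entry yields the list [QLJM, BM25]; the parameter name lambda is not picked up.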

Name

The name tells the program which algorithm to configure. The accepted names are fixed in the code; some algorithms also have variants without term discrimination or without length normalization, each with its own name. The accepted names are listed here:

| Algorithm | Basic | Without term discrimination | Without length normalization |
|---|---|---|---|
| Random | Random | | |
| Popularity | Popularity | | |
| Most Common Neighbors | MCN | | |
| Jaccard | Jaccard | | |
| Adamic-Adar | Adamic | | |
| Cosine similarity | Cosine | | |
| BIR | BIR | | |
| BM25 | BM25 | BM25 No Term Discrimination | BM25 No Length Normalization |
| Extreme BM25 | EBM25 | EBM25 No Term Discrimination | EBM25 No Length Normalization |
| Pivoted normalization VSM | Pivoted VSM | Pivoted VSM No Term Discrimination | Pivoted VSM No Length Normalization |
| Query Likelihood Dirichlet | QLD | QLD No Term Discrimination | QLD No Length Normalization |
| Query Likelihood Jelinek-Mercer | QLJM | QLJM No Term Discrimination | QLJM No Length Normalization |
| Query Likelihood Laplace | QLL | | |
| PL2 | PL2 | | |
| DLH | DLH | | |
| DPH | DPH | | |
| DFRee | DFRee | | |
| DFReeKLIM | DFReeKLIM | | |

Parameters

The params tag lists the parameter combinations that the program has to execute. It must contain one param element for each parameter.

Each parameter is defined by three fields: its name (name), its type (type), and the set of values it can take (values). For example:

<params>
  <param>
    <name>lambda</name>
    <type>Double</type>
    <values>
      ...
    </values>
  </param>
  <param>
    ...
  </param>
</params>
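The params element describes a set of parameter combinations. Assuming that, as the fullgrid file names suggest, every combination of the listed values is executed (the Cartesian product of the value sets), the grid could be enumerated as in the Java sketch below. The class ParameterGrid is hypothetical, and the values are assumed to be already parsed into one list per parameter.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: enumerate every combination of parameter values (a full grid).
class ParameterGrid {
    static List<Map<String, Object>> combinations(Map<String, List<?>> params) {
        List<Map<String, Object>> grid = new ArrayList<>();
        grid.add(new LinkedHashMap<>()); // start from a single empty combination
        for (Map.Entry<String, List<?>> param : params.entrySet()) {
            List<Map<String, Object>> next = new ArrayList<>();
            // Extend each partial combination with every value of this parameter.
            for (Map<String, Object> partial : grid) {
                for (Object value : param.getValue()) {
                    Map<String, Object> extended = new LinkedHashMap<>(partial);
                    extended.put(param.getKey(), value);
                    next.add(extended);
                }
            }
            grid = next;
        }
        return grid;
    }
}
```

With lambda taking 0.1 and 0.5 and an orientation parameter taking IN, OUT and UND, this yields 2 × 3 = 6 combinations, the first being lambda = 0.1 with IN. Under the same assumption, an algorithm without parameters yields a single, empty combination, i.e. one run.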

The parameter names depend on the algorithm. The repository includes two files, conf\fullgrid-dir.xml and conf\fullgrid-undir.xml, which contain a configuration example for every algorithm (for directed and undirected graphs, respectively), so the parameter names can be looked up there.

The type determines how the values of the parameter are interpreted. The following types are supported:

| Type | Description |
|---|---|
| Integer | Values are read as int values. |
| Long | Values are read as long values. |
| Double | Values are read as double values. |
| Boolean | Values are read as boolean values: "true" is read as true, and any other value as false. |
| String | Values are taken as they are (as strings). |
| LinkOrientation | Values select which edges of a user are considered. Three values are accepted: IN for the incoming neighborhood, OUT for the outgoing neighborhood and UND for the union of both. |
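The table can be read as a conversion rule from the text of each value to a typed value. The Java sketch below applies that rule and is only an illustration. The enum LinkOrientation here is a hypothetical stand-in for whatever type the project uses, and trimming surrounding whitespace before parsing is our assumption; the page does not say, for example, whether "TRUE" counts as true (here it does not).

```java
// Hypothetical stand-in for the project's own orientation type.
enum LinkOrientation { IN, OUT, UND }

// Sketch: convert the raw text of a <value> according to the parameter's <type>.
class ParamTypes {
    static Object convert(String type, String raw) {
        String text = raw.trim(); // assumption: surrounding whitespace is ignored
        switch (type) {
            case "Integer": return Integer.parseInt(text);
            case "Long": return Long.parseLong(text);
            case "Double": return Double.parseDouble(text);
            case "Boolean": return text.equals("true"); // any other value is false
            case "String": return raw; // taken as it is
            case "LinkOrientation": return LinkOrientation.valueOf(text); // IN, OUT or UND
            default: throw new IllegalArgumentException("Unknown type: " + type);
        }
    }
}
```

For example, convert("Long", "5") returns the Long 5, and convert("Boolean", "yes") returns false.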

Finally, the values can be described in two ways: by listing the individual values, as in the following example:

<values>
  <value>0.1</value>
  <value>0.2</value>
  ...
  <value>1.0</value>
</values>

or by using ranges, as shown next:

<values>
  <range>
    <start>0.1</start>
    <end>1.0</end>
    <step>0.1</step>
  </range>
</values>

where start is the first value, end is the last one (both are included), and step is the difference between consecutive values in the range.

Ranges and individual values can be freely combined, even within the same parameter. For example:

<values>
  <value>1.0</value>
  <range>
    <start>0.1</start>
    <end>0.999</end>
    <step>0.1</step>
  </range>
</values>
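Putting the two forms together, the following Java sketch expands a values element into the list of raw value strings, in document order, which can then be converted according to the parameter type. It is a hypothetical illustration (the class ValuesExpander is not part of the project). It uses BigDecimal so that repeatedly adding a step such as 0.1 does not accumulate binary rounding error; with double arithmetic, 0.1 + 0.1 + 0.1 gives 0.30000000000000004, and whether end is reached can depend on rounding. Assumptions not stated on this page: steps are positive, and duplicate values are kept rather than removed.

```java
import java.io.IOException;
import java.io.StringReader;
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical sketch: expand a <values> element into raw value strings.
class ValuesExpander {
    static List<String> expand(String valuesXml) {
        Element values;
        try {
            values = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(valuesXml)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Malformed <values> element", e);
        }
        List<String> result = new ArrayList<>();
        // Children are visited in document order, so values and ranges can be mixed.
        for (Node n = values.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace between tags
            }
            Element child = (Element) n;
            if (child.getTagName().equals("value")) {
                result.add(child.getTextContent().trim());
            } else if (child.getTagName().equals("range")) {
                BigDecimal start = decimal(child, "start");
                BigDecimal end = decimal(child, "end");
                BigDecimal step = decimal(child, "step");
                if (step.signum() <= 0) {
                    throw new IllegalArgumentException("step must be positive"); // assumption
                }
                // Exact decimal arithmetic: start, start + step, ... while <= end.
                for (BigDecimal v = start; v.compareTo(end) <= 0; v = v.add(step)) {
                    result.add(v.toPlainString());
                }
            }
        }
        return result;
    }

    // Reads the numeric text of <start>, <end> or <step> inside a <range>.
    private static BigDecimal decimal(Element range, String tag) {
        return new BigDecimal(range.getElementsByTagName(tag).item(0).getTextContent().trim());
    }
}
```

For the combined example above, this gives 1.0 followed by 0.1, 0.2, ..., 0.9: the range stops at 0.9 because the next value, 1.0, exceeds the end value 0.999.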

Configuration parameters

The following table lists the parameter configurations selected for the experiments in our paper; these configurations are included in the conf folder. The notation is the same as in the formulas given previously. In addition, similarly to [1], for directed graphs we select different orientations for the neighborhoods of the target and candidate users (the incoming, outgoing or undirected neighborhood of the users). In BM25 and EBM25, we also select, independently, the neighborhood used for the length.

| Algorithm (and variants) | Parameters |
|---|---|
| BM25 | |
| EBM25 | |
| QLD | |
| QLJM | |
| QLL | |
| PL2 | |
| Pivoted normalization | |