MicroCreator_Chapter_4_Input_Specification
This chapter is part of the MicroCreator Manual.
The previous chapter is Chapter 3: General Usage.
The next chapter is Chapter 5: Error Messages.
This page specifies the input file format and the options available for each element. Because the number of nodes keeps growing, a table index is provided to ease navigation through the page. Each cell represents a node, and the columns represent the level at which a node may appear and the node that contains it.
The description node is the root node of the input containing no special options. The purpose of the description node is to contain the rest of the nodes: kernels, code insertions, etc.
The kernel node is one of the most important nodes since it allows the definition of which instructions are going to be generated, how to handle unrolling, etc. The compiler community defines a basic block as a series of instructions with one instruction flow entry and a single instruction flow exit.
Instruction is the basic node to define an instruction in assembly form. It should contain the name of the operation and its operands. There are also a few nodes, such as choose_operation_before_unroll, which help define how to generate or modify the instruction generation process via various MicroCreator passes.
The choose_operation_before_unroll node is contained in an instruction node. If the instruction node contains multiple operation nodes, the default behavior is to perform the choice before the unrolling.
When the user inserts the choose_operation_after_unroll node in the instruction, and having defined multiple operation nodes, the choice of which operation to use is performed after the unrolling.
An immediate operand defines any numerical value such as one, five, or forty-two. However, in certain cases, the user might want to try various values for the immediate. For example, the user may want to generate programs which each use one of the values one, two, three, four, five, or six. As a result, there are two variations of the operand: a single value or a range of values.
Accordingly, the immediate operand can contain:
- A single value node and no min, max, or progress nodes
- Minimum and maximum nodes, optionally with a progress node
Examples are:
- Single:
<immediate>
<value>5</value>
</immediate>
- Range:
Generate every value between 1 and 1024:
<immediate>
<min>1</min>
<max>1024</max>
</immediate>
Generate every other value between 1 and 1024:
<immediate>
<min>1</min>
<max>1024</max>
<progress>2</progress>
</immediate>
- Note: Errors regarding the use of options are described in Chapter 5.
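The range form can be pictured as a simple enumeration. The following is a minimal sketch, not MicroCreator's implementation: the helper name `immediate_values` is hypothetical, and it assumes the max bound is inclusive and that progress defaults to one when omitted.

```python
def immediate_values(minimum, maximum, progress=1):
    """Enumerate the immediates described by min/max/progress.

    Assumption: the max bound is inclusive and progress defaults to 1.
    """
    return list(range(minimum, maximum + 1, progress))


# "Every other value between 1 and 1024" from the example above.
every_other = immediate_values(1, 1024, progress=2)
print(every_other[:4], len(every_other))
```

Under these assumptions, each value in the returned list corresponds to one generated variation of the instruction.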
The immediate_before_unroll node determines when an immediate value is selected. When the user provides a range of values, MicroCreator generates the variations of values either before or after the unrolling process. The immediate_before_unroll node performs the selection before.
The immediate_after_unroll node determines when an immediate value is selected. When the user provides a range of values, MicroCreator generates the variations of values either before or after the unrolling process. The immediate_after_unroll node performs the selection after.
MicroCreator generates the following assembly code with an indirect memory operand:
movaps 0(%rsi, %rdx, 8), %xmm0 #A load instruction
The first operand of the instruction is the indirect memory operand. The zero is the offset, rsi is the base register, or base address, rdx is the index register and eight is the multiplier.
The address calculated for the load instruction is:
address = 0 + %rsi + %rdx * 8
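As a quick check of the formula, here is a small sketch that evaluates it. The register contents are hypothetical values chosen only for illustration, and `effective_address` is not a MicroCreator function.

```python
def effective_address(offset, base, index, multiplier):
    """Compute offset + base + index * multiplier, as in the formula above."""
    return offset + base + index * multiplier


# Hypothetical register contents, chosen only for illustration.
rsi = 0x1000  # base register
rdx = 3       # index register
print(hex(effective_address(0, rsi, rdx, 8)))  # 0x1018
```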
For MicroCreator, the corresponding node contains two normal register operands:
- An index register
- A base register
The node can also contain an offset node and a multiplier node.
The XML code used to generate the assembly example above is:
<indirect_memory>
<base>
<name>r1</name>
</base>
<index>
<name>r2</name>
</index>
<multiplier>8</multiplier>
<offset>0</offset>
</indirect_memory>
- Remember:
- [r1] and [r2] are logical register names and are later transformed by the detector tool into valid physical register names
- A register node contains either a name or a phyName to define respectively a logical or a physical name
As opposed to the indirect_memory operand, the memory operand is simply an address register with an offset, such as:
0(%rdi)
To create a memory operand, define:
- A register node for the address register
- An offset node with the integer value
The following XML code generates the operand 0(%rdi):
<memory>
<register>
<phyName>%rdi</phyName>
</register>
<offset>0</offset>
</memory>
The operation is generally the name of the intended instruction:
<operation>movss</operation>
It is also possible to define multiple operations and generate programs with different operation choices. Imagine a program generation requiring the comparison between movss and movsd:
<instruction>
<operation>movss</operation>
<operation>movsd</operation>
</instruction>
The previous instruction node generates one program whose instruction uses movss and another program whose instruction uses movsd.
However, because MicroCreator handles the unroll pass specially, the choice can be made either before or after unrolling.
Before unrolling, each program either has movss or movsd for the operation of the instructions. On the other hand, after unrolling, MicroCreator generates any combination of the two operations.
If the unroll factor is 2, choosing the operation before leads to two different programs:
- movss, movss
- movsd, movsd
However, choosing after the unrolling adds two additional programs:
- movss, movsd
- movsd, movss
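The difference between the two modes can be sketched as an enumeration. This is only an illustration of the combinations described above, not MicroCreator's code, and both function names are hypothetical.

```python
from itertools import product


def choose_before_unroll(operations, unroll_factor):
    """One program per operation: every unrolled copy uses the same operation."""
    return [(op,) * unroll_factor for op in operations]


def choose_after_unroll(operations, unroll_factor):
    """One program per combination: each unrolled copy picks independently."""
    return list(product(operations, repeat=unroll_factor))


ops = ["movss", "movsd"]
print(choose_before_unroll(ops, 2))
print(choose_after_unroll(ops, 2))
```

With two operations and an unroll factor of 2, the first call yields the two uniform programs and the second yields those two plus the two mixed ones.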
A register is the base operand for an instruction since it represents a register in the assembly file.
The options define whether it is a logical named register, defined later by the MicroDetector tool, or it is a physical named register.
To define a logical name:
<register>
<name>r0</name>
</register>
To describe a physical register name:
<register>
<phyName>%eax</phyName>
</register>
Additionally, the user might sometimes want different registers after unrolling. MicroCreator supports numbered registers through a minimum, a maximum, or a progress node:
<register>
<phyName>%xmm</phyName>
<min>0</min>
<max>8</max>
</register>
Such a register definition generates registers %xmm0, %xmm1, ..., and %xmm8 after unrolling. Further information is given on the unrolling page.
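The naming scheme amounts to appending each number in the range to the phyName prefix. A minimal sketch follows; `register_names` is a hypothetical helper, and the inclusive max is taken from the %xmm0 to %xmm8 example.

```python
def register_names(prefix, minimum, maximum, progress=1):
    """Build numbered register names; max is inclusive, as in the example above."""
    return [f"{prefix}{n}" for n in range(minimum, maximum + 1, progress)]


names = register_names("%xmm", 0, 8)
print(names)
```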
Sometimes, the user wants to repeat a given instruction. The repetition node, included in an instruction, informs the tool of the minimum and maximum number of repetitions:
<repetition>
<min>1</min>
<max>5</max>
</repetition>
Swapping operands generates two versions of the instruction. There are two possibilities, swapping the operands before or after unrolling.
Using the swap_before_unroll node, MicroCreator swaps the operands before unrolling. If the user defines an instruction op A, B, with an unroll factor of two, the tool generates two programs:
- Program 1:
op A, B
op A, B
- Program 2:
op B, A
op B, A
Using the swap_after_unroll node, MicroCreator swaps the operands after unrolling. If the user defines an instruction op A, B, with an unroll factor of two, the tool generates four programs:
- Program 1:
op A, B
op A, B
- Program 2:
op A, B
op B, A
- Program 3:
op B, A
op A, B
- Program 4:
op B, A
op B, A
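The two swap modes can be compared with a small generator. This is a sketch under the assumption that each unrolled copy is swapped independently in the after-unroll case, which matches the four programs listed for a factor of two. The function `swap_programs` is hypothetical.

```python
from itertools import product


def swap_programs(operands, unroll_factor, after_unroll):
    """List the programs produced by swapping before or after unrolling."""
    original = tuple(operands)
    swapped = tuple(reversed(operands))
    if after_unroll:
        # Each unrolled copy chooses its operand order independently.
        choices = product((original, swapped), repeat=unroll_factor)
    else:
        # The order is fixed first, then copied unroll_factor times.
        choices = ((order,) * unroll_factor for order in (original, swapped))
    return [[f"op {a}, {b}" for a, b in program] for program in choices]


for program in swap_programs(("A", "B"), 2, after_unroll=True):
    print(program)
```

Under this assumption, swapping before unrolling always yields two programs, while swapping after yields 2 to the power of the unroll factor, so the program count grows quickly.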
The insert_code node inserts code directly. There are two options: the user either inserts an instruction verbatim or inserts code from a particular file.
When inserting only one instruction, MicroCreator supports the instruction node:
<insert_code>
<instruction>xor %xmm0, %xmm0</instruction>
</insert_code>
Such an insert_code generates the assembly instruction xor %xmm0, %xmm0.
The second variation copy-pastes code from a file directly into the generated code. For instance:
<a first kernel/>
<kernel>
<insert_code>examples/prologue.s</insert_code>
</kernel>
<another kernel/>
The kernel would then copy-paste the code from examples/prologue.s between the code generated for the two other kernels.
Using such a node randomizes the generation of the instruction scheduling. The system generates every possible schedule of a kernel. If the kernel contains an instruction A and an instruction B, the tool generates the following schedules:
- A - B
- B - A
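Generating every possible schedule amounts to enumerating permutations of the kernel's instructions. A minimal sketch follows. It ignores any dependency constraints, which the text above does not describe, and `schedules` is a hypothetical helper.

```python
from itertools import permutations


def schedules(instructions):
    """Return every ordering of the given instructions as a readable string."""
    return [" - ".join(order) for order in permutations(instructions)]


print(schedules(["A", "B"]))
print(len(schedules(["A", "B", "C"])))
```

The number of schedules is the factorial of the instruction count, so three instructions already give six schedules.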
Unrolling generates various versions of the kernel, one per unroll factor. The internal nodes are:
- min: the minimum value of the unrolling factor
- max: the maximum value of the unrolling factor
- progress: the step between unrolling factors
To inform the generator to create eight versions using an unroll factor of one to eight:
<unrolling>
<min>1</min>
<max>8</max>
<progress>1</progress>
</unrolling>
A progress of two would generate factors one, three, five, and seven.
For further information, please refer to the unrolling page.
An induction variable is a variable which is updated at each iteration of the loop.
The comment option allows the user to specify a comment to add in the assembly output.
<induction>
<comment>Induction_status:Success</comment>
<register>
<name>r1</name>
</register>
<increment>64</increment>
</induction>
creates:
add $64, %rsi # Induction_status:Success
The stride option allows the user to define the minimum and maximum stride for an induction variable. If no stride is defined, the stride used is one.
To define different strides, the user can add two nodes: a minimum and a maximum. To define a stride variation from five to ten:
<stride>
<min>5</min>
<max>10</max>
</stride>
The register node has already been defined above. The induction variable uses the same definition as a generic register in an instruction.
When the unrolling pass chooses a factor, it updates the induction variable's increment or decrement value by the same factor.
Thus, consider:
for (i = 0; i < 10; i++)
If the loop is unrolled twice, it becomes:
for (i = 0; i < 10; i += 2)
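To see why the increment is scaled, the sketch below lists the iterations covered by the unrolled loop. It assumes the loop bound is a multiple of the scaled step, so no remainder handling is needed. The helper `unrolled_indices` is hypothetical.

```python
def unrolled_indices(bound, increment, unroll_factor):
    """Indices handled by an unrolled loop, assuming bound divides evenly."""
    step = increment * unroll_factor  # the induction update scales with the factor
    visited = []
    for i in range(0, bound, step):
        for copy in range(unroll_factor):  # each unrolled copy of the body
            visited.append(i + copy * increment)
    return visited


print(unrolled_indices(10, 1, 2))
```

With a factor of two, the loop runs five times and still covers the same ten indices as the original loop.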
This option makes the induction variable's update independent of the unrolling factor, so the update is never modified.
The last_induction node defines which induction variable is the last induction variable to be updated. It is useful when the result of the arithmetic operation involving the variable is used to decide whether to branch or not.
Consider:
<induction>
<!-- Induction variable A -->
<last_induction/>
</induction>
<induction>
<!-- Induction variable B -->
</induction>
The XML code generates:
Update of induction variable B
Update of induction variable A
The offset of an induction variable determines by how much it should be updated in the case of unrolling.
Consider the following code:
movaps 0(%rsi), %xmm0 # Load 16 bytes to xmm0
add $16, %rsi # Add 16 to rsi to prepare for the next iteration
Now, if MicroCreator unrolls the code without a defined offset, it generates:
movaps 0(%rsi), %xmm0 # Load 16 bytes to xmm0
movaps 0(%rsi), %xmm1 # Load 16 bytes to xmm1
add $32, %rsi # Add 32 to rsi to prepare for the next iteration
This code loads the same value into both XMM registers.
If the user defines an offset of sixteen in the induction variable with:
<offset>16</offset>
MicroCreator generates:
movaps 0(%rsi), %xmm0 # Load 16 bytes to xmm0
movaps 16(%rsi), %xmm1 # Load 16 bytes to xmm1
add $32, %rsi # Add 32 to rsi to prepare for the next iteration
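The effect of the offset can be reproduced with a small generator. This is only a sketch and `unrolled_loads` is a hypothetical helper. It assumes copy k of the load uses displacement k times the offset and register %xmm k, which matches the two-copy example above but is an extrapolation for larger factors.

```python
def unrolled_loads(unroll_factor, increment, offset=0, base="%rsi"):
    """Emit the unrolled loads followed by the scaled induction update."""
    lines = [
        f"movaps {copy * offset}({base}), %xmm{copy}"
        for copy in range(unroll_factor)
    ]
    lines.append(f"add ${increment * unroll_factor}, {base}")
    return lines


print("\n".join(unrolled_loads(2, 16)))             # no offset: same address twice
print("\n".join(unrolled_loads(2, 16, offset=16)))  # offset 16: consecutive loads
```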
The increment node provides the value that is added to or subtracted from the induction variable at each iteration.
The linked node lets an induction variable be strided in the same way as the induction variable it is linked to. In certain cases, it is important to have two induction variables strided the same way.
When the kernel node is used to define a loop, it needs to contain the branch_information node. A branch_information node should contain a label node, which defines the loop label, and a test node, which gives the test instruction to be used:
<branch_information>
<label>L6</label>
<test>jge</test>
</branch_information>
It is the user's job to make sure the label is unused in the rest of the file. Otherwise, the behavior is undefined.
The node is used to define information for the OpenMP loop structure. It contains three possible sub-nodes:
- induction: which represents the name of the induction operand of the loop
- count_iteration: which represents the count information for handling the loop
- omp_option: which provides what OMP options the user requires
The count_iteration node provides the information necessary for handling a loop in an OpenMP code. It contains three possible sub-nodes:
- induction: which represents the name of the induction operand of the loop
- min: the initial value for the loop
- max: the last value for the loop
A sample:
<count_iteration>
<induction>i</induction>
<min>0</min>
<max>100</max>
</count_iteration>
generates:
for (i = 0; i < 100; i++)
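The mapping from the node to the loop header can be sketched with Python's standard XML parser. This illustrates the correspondence shown above and is not how MicroCreator itself is implemented. The function `loop_header` is hypothetical.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<count_iteration>
  <induction>i</induction>
  <min>0</min>
  <max>100</max>
</count_iteration>"""


def loop_header(xml_text):
    """Render a count_iteration node as a C for-loop header."""
    node = ET.fromstring(xml_text)
    var = node.findtext("induction")
    low = node.findtext("min")
    high = node.findtext("max")
    return f"for ({var} = {low}; {var} < {high}; {var}++)"


print(loop_header(SAMPLE))
```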
The omp_option node provides the user options for the OMP pragma. It currently supports three OpenMP possibilities: private, shared, and first_private.
The following option:
<omp_option>
<shared>r0</shared>
<private>x0</private>
</omp_option>
generates:
#pragma omp parallel for shared (r0) private (x0)
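The pragma above can be assembled from the child nodes in document order. This is a minimal sketch with a hypothetical function `omp_pragma`. It handles only the shared and private children shown in the example, because the exact clause emitted for first_private is not stated here.

```python
import xml.etree.ElementTree as ET

OPTION = "<omp_option><shared>r0</shared><private>x0</private></omp_option>"


def omp_pragma(xml_text):
    """Build the pragma line from shared/private children, in document order."""
    node = ET.fromstring(xml_text)
    clauses = [
        f"{child.tag} ({child.text})"
        for child in node
        if child.tag in ("shared", "private")  # first_private is left out on purpose
    ]
    return "#pragma omp parallel for " + " ".join(clauses)


print(omp_pragma(OPTION))
```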
The benchmark_amount node defines the maximum number of benchmarks created by the tool:
<benchmark_amount>100</benchmark_amount>
The hardware detection is the link from the creation tool to the detector tool. The detector tool allows the register allocation system to go from virtual registers to physical registers.
The hardware_detector node takes two internal nodes:
- execute: the string used to execute the detection tool, including the command-line options to use
- information_file: the output generated by the detection tool
An example is:
<hardware_detector>
<execute>../microdetect/microdetect ../microdetect/data/args.c ../microdetect/output</execute>
<information_file>../microdetect/output</information_file>
</hardware_detector>
Adding a verbose node to the description node puts the system in verbose mode. Unlike the normal silent mode, verbose mode creates an output file for each pass, giving extra information on what has happened.
- Note: be wary with large generations, since the number of files generated in verbose mode can become large