MicroCreator_Chapter_3_General_Usage
The following chapter is part of the MicroCreator Manual.
The previous chapter is Chapter 2: Installation and First Use.
The next chapter is Chapter 4: Input Specification.
MicroCreator is a tool used to generate synthetic benchmarks. They are called synthetic because they are hand-tailored rather than extracted from real-world applications. The benchmarks are generated as series of slightly different variants, in order to study the impact of each change on performance, energy consumption, and other metrics. The tool takes an XML-based input file and generates either assembly code or C code, depending on options in the XML file or on the command line.
As presented in Chapter 1: General Information, this chapter assumes the use of the golden version, which represents a stable state of the tool.
MicroCreator creates assembly or C code from XML files, generating variations of a given code structure. The creation process relies on an internal pass system; users can also define new passes or modify existing ones to further extend the system. The tool's guiding philosophy is ease of use for the end user. This chapter presents how the tool functions and, through examples, what its general behavior is.
The following section presents the input used by the tool and briefly describes how the input files interact with MicroCreator. It then explains the expected output of the generated files.
As input, the tool uses an XML format defined in Chapter 4: Input Specification.
An input file contains four kinds of information:
- Prologue information
- Epilogue information
- Code to be generated
- Hardware detection system
The prologue and epilogue of the generated code are defined by the user. Since the system generates assembly or C code, the user must provide the outer elements of that code. For example, for a C program, the prologue includes the required header files, the function's signature, and possibly some initial work to set up data. For assembly code, the prologue includes setting up the registers correctly or setting up the stack.
MicroCreator does not aim to provide the prologue and epilogue, since both are generally tightly linked to the code to be generated. However, examples/prologue.s and examples/epilogue.s were written during testing and internal usage and can serve as examples.
The output is the generated code, either assembly or C. The tool does not verify the correctness of the code: it is the user's responsibility to ensure that the prologue and epilogue files are correct and that the code generated by MicroCreator does what the user intended.
This section presents an example from the examples directory.
The examples directory contains many files that are useful for seeing what the tool generates. The rest of the section walks through one of them in greater detail; more information can be found in Chapter 4: Input Specification.
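The first example, description_MOVAPD_st_L_LL_LLL.xml, creates 510 programs because it generates every sequence matching the regular expression (Store|Load){1,8}, that is, every sequence of one to eight movapd instructions in which each instruction is either a load or a store.
Running MicroCreator on this file produces the following output: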
$:~/Memory/microcreator$ ./microcreator examples/description_MOVAPD_st_L_LL_LLL.xml
Log last lines:
Stopping Register Allocation
Got a new kernel output 0x975c50:
Starting new kernel pass
Now execute the passCode Generation
Starting Code Generation
Opening file: output/example0509.s
Stopping Code Generation
PassEngine is stopping
Drove
Micro-creator shutting down
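The count of 510 follows from the unroll range: for an unroll factor k, each of the k movapd copies is either a load or a store, giving 2^k sequences, and summing over k = 1 to 8 gives 2 + 4 + ... + 256 = 510. The Python sketch below only reproduces this arithmetic; it is not part of MicroCreator. The zero-based, four-digit file numbering is an assumption inferred from the last file name in the log, output/example0509.s.

```python
def count_variants(min_unroll: int, max_unroll: int, progress: int = 1) -> int:
    """Number of load/store sequences for unroll factors min..max (inclusive).

    Each of the k unrolled copies is either a load or a store, so a
    factor k contributes 2**k sequences.
    """
    return sum(2 ** k for k in range(min_unroll, max_unroll + 1, progress))


total = count_variants(1, 8)
print(total)  # 510

# Assumption: files are numbered from 0 with four digits, so the last one
# matches the name printed in the log.
print(f"output/example{total - 1:04d}.s")  # output/example0509.s
```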
After the initial description node, the first part inserts the prologue file:
<kernel>
<insert_code>
<file>examples/prologue.s</file>
</insert_code>
</kernel>
Then comes the main kernel, which contains a single instruction:
<kernel>
<!--instruction part -->
<!--allow instructions to be randomized in benchmarks-->
<instruction>
<operation>movapd</operation>
<memory>
<register>
<name>r1</name>
</register>
<offset>0</offset>
</memory>
<register>
<phyName>%xmm</phyName>
<min>0</min>
<max>8</max>
</register>
<swap_after_unroll/>
</instruction>
The kernel node is not closed yet, since it contains additional information described below.
The instruction in the kernel is a movapd instruction, defined by the operation node. Its operands are a memory operand and a register operand. Finally, the swap_after_unroll specifier allows the tool to swap the operands after performing the unroll pass: the tool unrolls the kernel and, for each instruction in the unrolled version, creates two variants, one with the original operand order and one with the operands swapped. Further information is given in Chapter 4: Input Specification.
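To make the variant generation concrete, here is a small Python sketch that enumerates the operand orders swap_after_unroll produces for a given unroll factor. It is an illustration, not MicroCreator's implementation. In AT&T syntax, used by the generated assembly, the source operand comes first, so keeping the memory operand first yields a load (L) and swapping yields a store (S).

```python
from itertools import product


def operand_orders(unroll: int) -> list[str]:
    """Return every L/S scheme for `unroll` copies of one instruction.

    L: original order (memory operand first, i.e. a load in AT&T syntax).
    S: swapped order (register first, i.e. a store).
    """
    return ["".join(choice) for choice in product("LS", repeat=unroll)]


print(operand_orders(2))       # ['LL', 'LS', 'SL', 'SS']
print(len(operand_orders(6)))  # 64
```

Each additional unrolled copy doubles the number of schemes, which is why the unroll range dominates the number of generated files.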
The memory operand here contains a register named r1. r1 is a logical name, which is replaced by a physical x86 register during the register allocation pass. The correspondence between logical and physical names is defined by MicroDetector.
The offset node gives the operand's base offset.
In this example, the register operand, unlike the memory operand, uses a physical register name directly. A logical name could also be used; the example simply illustrates physical register names.
The base physical register name used here is %xmm. During unrolling, the user might want each copy to use a different register, for example %xmm0 to %xmm7. This is done by putting %xmm in the phyName node and adding the min and max nodes.
The register number is updated at each iteration of the unroll. Strictly speaking, this goes beyond unrolling, since it also modifies register allocation, but it is useful when the user wishes to obtain code such as:
#Unrolling, iteration 1 out of 6
movapd 0(%rsi), %xmm0
#Unrolling, iteration 2 out of 6
movapd %xmm1, 16(%rsi)
#Unrolling, iteration 3 out of 6
movapd 32(%rsi), %xmm2
#Unrolling, iteration 4 out of 6
movapd 48(%rsi), %xmm3
#Unrolling, iteration 5 out of 6
movapd %xmm4, 64(%rsi)
#Unrolling, iteration 6 out of 6
movapd %xmm5, 80(%rsi)
Notes:
- The example follows an LSLLSS scheme, where L represents a load and S a store
- The number of the Xmm register increases with each new instruction
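The listing above can be rebuilt from three pieces of information: the L/S scheme, the 16-byte distance between copies (the size of the two doubles moved by movapd), and the register range. The Python sketch below is a hypothetical model of that rendering step, not MicroCreator's code generator. It assumes r1 was allocated to %rsi, as in the listing, and that register numbers wrap around inside the [min, max) range, a behavior this chapter does not specify.

```python
def render_unrolled(scheme: str, base: str = "%rsi", step: int = 16,
                    reg_min: int = 0, reg_max: int = 8) -> list[str]:
    """Render one movapd per letter of `scheme` (L = load, S = store)."""
    lines = []
    count = len(scheme)
    for i, kind in enumerate(scheme):
        # Assumption: the register number wraps around within [reg_min, reg_max).
        reg = f"%xmm{reg_min + i % (reg_max - reg_min)}"
        mem = f"{i * step}({base})"
        lines.append(f"#Unrolling, iteration {i + 1} out of {count}")
        if kind == "L":
            lines.append(f"movapd {mem}, {reg}")  # load: memory -> register
        else:
            lines.append(f"movapd {reg}, {mem}")  # store: register -> memory
    return lines


print("\n".join(render_unrolled("LSLLSS")))
```

Called with "LSLLSS", the function prints the same twelve lines as the listing.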
The example kernel also contains information on how MicroCreator should handle or modify the kernel.
The example shows how to request an unroll factor from one to eight:
<unrolling>
<min>1</min>
<max>8</max>
<progress>1</progress>
</unrolling>
The progress node is the step between consecutive unrolling factors. A progress of two would generate the factors 1, 3, 5, and 7.
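A minimal Python sketch of how the min, max and progress nodes map to a list of factors, read with the standard xml.etree.ElementTree module. It is an illustration only; treating a missing progress node as a step of 1 is an assumption.

```python
import xml.etree.ElementTree as ET

# The <unrolling> node from the example, with progress changed to 2.
UNROLLING = """
<unrolling>
  <min>1</min>
  <max>8</max>
  <progress>2</progress>
</unrolling>
"""


def unroll_factors(xml_text: str) -> list[int]:
    """List the unroll factors requested by an <unrolling> node."""
    node = ET.fromstring(xml_text.strip())
    low = int(node.findtext("min"))
    high = int(node.findtext("max"))
    # Assumption: a missing <progress> behaves like a step of 1.
    step = int(node.findtext("progress", default="1"))
    return list(range(low, high + 1, step))  # max is inclusive


print(unroll_factors(UNROLLING))  # [1, 3, 5, 7]
```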
In the example, there are three induction variables:
- One for the target and source address used by the load and store instructions
- One for the iteration counter, which is returned by the function
- One for the loop counter, which is decremented by two at every iteration (since movapd loads or stores two doubles, the counter must decrease by two at each iteration)
The first induction variable represents the target or source address:
<induction>
<register>
<name>r1</name>
</register>
<increment>16</increment>
<offset>16</offset>
<not_affected_unroll/>
</induction>
As for operands, the register can be named logically or physically; here it is logical. The offset is what is applied to successive unrolled copies of an instruction that uses the register, whereas the increment is what is added to the register at the end of the loop. Because of the not_affected_unroll node, this induction variable is not affected by the unroll: the increment is always sixteen, regardless of the unroll factor. Chapter 4 provides further information about these options.
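The distinction between offset and increment can be sketched in Python as follows. The Induction class and both helper functions are hypothetical illustrations, not MicroCreator's data structures; in particular, scaling the increment by the unroll factor when not_affected_unroll is absent is an assumption suggested by, but not stated in, the text.

```python
from dataclasses import dataclass


@dataclass
class Induction:
    register: str
    increment: int                    # added once at the end of the loop
    offset: int = 0                   # added per unrolled copy
    affected_by_unroll: bool = False  # False mirrors <not_affected_unroll/>


def copy_offsets(ind: Induction, base_offset: int, unroll: int) -> list[int]:
    """Memory offset used by each unrolled copy of an instruction on ind.register."""
    return [base_offset + i * ind.offset for i in range(unroll)]


def loop_increment(ind: Induction, unroll: int) -> int:
    """Value added to ind.register at the end of one loop iteration."""
    # Assumption: an induction affected by the unroll scales with the factor.
    return ind.increment * unroll if ind.affected_by_unroll else ind.increment


r1 = Induction("r1", increment=16, offset=16)
print(copy_offsets(r1, 0, 6))  # [0, 16, 32, 48, 64, 80]
print(loop_increment(r1, 6))   # 16
```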
The second induction variable is simpler and the only notable difference is the use of a physical register:
<induction>
<register>
<phyName>%eax</phyName>
</register>
<increment>1</increment>
<not_affected_unroll/>
</induction>
The last induction variable adds two new options:
<induction>
<register>
<name>r0</name>
</register>
<increment>-2</increment>
<not_affected_unroll/>
<linked>
<register>
<name>r1</name>
</register>
</linked>
<last_induction/>
</induction>
- The linked node means that any modification of the stride chosen for the linked induction variable (here r1) is also applied to the current one
- The last_induction node forces the code generator to emit this induction variable's instruction last among the kernel's induction-variable instructions. This is useful when the loop's branch instruction, generated after the induction-variable instructions, uses the result of the last increment or decrement to decide whether to exit the loop.
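The effect of last_induction on instruction order can be sketched as a stable sort. The dictionaries below are a hypothetical representation of the three induction variables, deliberately listed with r0 first to show that it still ends up last; this is not MicroCreator's internal format.

```python
def order_inductions(inductions: list[dict]) -> list[dict]:
    """Keep the declared order, but move the one flagged last_induction to the end."""
    # sorted() is stable and False < True, so only the flagged entry moves.
    return sorted(inductions, key=lambda ind: ind.get("last_induction", False))


inductions = [
    {"register": "r0", "increment": -2, "last_induction": True},
    {"register": "r1", "increment": 16},
    {"register": "%eax", "increment": 1},
]
print([ind["register"] for ind in order_inductions(inductions)])
# ['r1', '%eax', 'r0']
```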
Finally, if a kernel is to be generated as a loop, the branch information of the loop must be provided:
<branch_information>
<label>L6</label>
<test>jge</test>
</branch_information>
The label node gives the label used for the loop, and the test node gives the conditional jump instruction (here jge) used by the loop's branch instruction.
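A minimal Python sketch of how the two nodes might be used to close the kernel into a loop. The label and test values come from the example; the loop body and the allocation of r0 to %rdi are hypothetical. Placing the r0 decrement last lets jge test the flags that sub sets, which is the situation last_induction is meant for.

```python
import xml.etree.ElementTree as ET

BRANCH = """<branch_information>
<label>L6</label>
<test>jge</test>
</branch_information>"""


def wrap_loop(body: list[str], branch_xml: str) -> list[str]:
    """Put the loop label before the body and branch back to it at the end."""
    node = ET.fromstring(branch_xml)
    label = node.findtext("label")
    test = node.findtext("test")
    return [f"{label}:", *body, f"{test} {label}"]


# Hypothetical body: one load followed by the three induction updates,
# with r1 allocated to %rsi and r0 assumed to be allocated to %rdi.
body = [
    "movapd 0(%rsi), %xmm0",
    "add $16, %rsi",
    "add $1, %eax",
    "sub $2, %rdi",  # last induction: sets the flags tested by jge
]
print("\n".join(wrap_loop(body, BRANCH)))
```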
The rest of the file inserts the epilogue into the generated files:
<kernel>
<insert_code>
<file>examples/epilogue.s</file>
</insert_code>
</kernel>
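With the epilogue inserted, the file contains the four parts listed at the start of this chapter. The Python sketch below walks a stripped-down skeleton of the example with xml.etree.ElementTree and prints one line per top-level node. It is illustrative only: the skeleton omits most nodes, and the root element name description is an assumption based on the "initial description node" mentioned earlier.

```python
import xml.etree.ElementTree as ET

# Hypothetical skeleton of the example file; most nodes are omitted.
SKELETON = """<description>
<kernel><insert_code><file>examples/prologue.s</file></insert_code></kernel>
<kernel><instruction><operation>movapd</operation></instruction></kernel>
<kernel><insert_code><file>examples/epilogue.s</file></insert_code></kernel>
<hardware_detector><information_file>../microdetect/output</information_file></hardware_detector>
</description>"""


def outline(xml_text: str) -> list[str]:
    """Summarize each top-level node of a description file."""
    parts = []
    for child in ET.fromstring(xml_text):
        if child.tag == "kernel" and child.find("insert_code") is not None:
            parts.append("insert " + child.findtext("insert_code/file"))
        elif child.tag == "kernel":
            ops = [op.text for op in child.iter("operation")]
            parts.append("generate " + ", ".join(ops))
        else:
            parts.append(child.tag)
    return parts


for part in outline(SKELETON):
    print(part)
```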
And, finally, the hardware detector information:
<hardware_detector>
<execute>../microdetect/microdetect ../microdetect/data/args.c ./microdetect/output</execute>
<information_file>../microdetect/output</information_file>
</hardware_detector>
- First, the execute node gives the command used to run the detector
- Second, the information_file node gives the file in which the detector stores its output
The hardware detector section explains the hardware_detector node in greater detail.
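The sketch below reads the two nodes with Python's xml.etree.ElementTree and splits the command into an argument list with shlex. It stops short of running the detector; how MicroCreator itself invokes it is not described in this chapter, so treat this as an illustration only.

```python
import shlex
import xml.etree.ElementTree as ET

HARDWARE = """<hardware_detector>
<execute>../microdetect/microdetect ../microdetect/data/args.c ./microdetect/output</execute>
<information_file>../microdetect/output</information_file>
</hardware_detector>"""


def detector_config(xml_text: str) -> tuple[list[str], str]:
    """Return the detector command as an argv list and the output file path."""
    node = ET.fromstring(xml_text)
    argv = shlex.split(node.findtext("execute"))
    info_file = node.findtext("information_file")
    return argv, info_file


argv, info_file = detector_config(HARDWARE)
print(argv)
print(info_file)
# A driver could now call subprocess.run(argv, check=True) and then parse
# info_file; that step is left out because it needs the detector binary.
```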
In summary, the tool's behavior is defined by an XML file and certain command-line options. It can create as many benchmarks as the user wishes, and it is closely linked to the launcher and detector tools.