Interviews & Experiments #4
lapplislazuli
started this conversation in
Ideas
I had a chat with my supervisor, and what we discussed in the call is usually referred to as ground truth building.
Another option that my supervisor prefers is a controlled experiment. In a controlled experiment, one group works under vanilla conditions and another under changed conditions, and both are monitored (i.e. measured) doing the same task.
For program repair, a draft could be:
We provide 10 buggy programs. Group A starts with nothing, while group B gets HenProg and a short introduction on how to use it. Both groups then get one hour to fix as many bugs as possible. Optionally, a group C gets a "blind start": they are only told that there is a tool and are given the documentation.
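To make the group split concrete, here is a minimal sketch of how participants could be randomly assigned, so that nobody picks their own group. The participant IDs, the function name `assign_groups`, and the fixed seed are my own assumptions for illustration, not something we agreed on in the call.

```python
import random


def assign_groups(participants, groups=("A", "B"), seed=None):
    """Randomly split participants into groups whose sizes differ by at most one."""
    rng = random.Random(seed)  # a fixed seed makes the assignment reproducible
    shuffled = list(participants)
    rng.shuffle(shuffled)
    assignment = {group: [] for group in groups}
    # Deal the shuffled participants out round-robin, like cards.
    for index, participant in enumerate(shuffled):
        assignment[groups[index % len(groups)]].append(participant)
    return assignment


# Hypothetical participant IDs: 20 people, i.e. 10 per group.
participants = [f"P{i:02d}" for i in range(1, 21)]
assignment = assign_groups(participants, seed=42)
for group, members in assignment.items():
    print(group, members)
```

For the optional blind-start group, the same function could be called with `groups=("A", "B", "C")`.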
After such an experiment you can still ask questions to build a ground truth and get general feedback on the tool and the tasks. My supervisor likes the experiment a lot more than just opinions, though: he feels that opinions vary greatly not only from programmer to programmer, but also within a single day, depending on whether you are having a good day or not.
There are big communities for controlled experiments, especially in the Human-Computer Interaction departments, so we can ask some of our colleagues. They do this all the time, also with fancier stuff like robots.
The tasks for the experiment should be simple enough that they are not the actual challenge, so the GCD one is a good one, and general math and sorting problems are always good candidates.
For programmers there are also cool tools by now, such as Web-IDEs, so it should be pretty easy to conduct the actual experiments. The minimum number of participants is 10 per group to draw statistically sound conclusions.
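As a rough idea of what the analysis could look like afterwards, here is a sketch that compares the number of bugs fixed per participant in two groups with a permutation test on the difference in means. This is only one possible test that I picked for illustration (it needs nothing beyond the standard library); the function name and the numbers are invented placeholders, not results.

```python
import random
from statistics import mean


def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Returns an estimated p-value: the share of random relabelings whose
    mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    at_least_as_extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            at_least_as_extreme += 1
    # The +1 correction keeps the estimated p-value strictly above zero.
    return (at_least_as_extreme + 1) / (n_resamples + 1)


# Placeholder data only: bugs fixed per participant, 10 participants per group.
bugs_fixed_a = [2, 3, 1, 4, 2, 3, 2, 1, 3, 2]  # hypothetical group A (no tool)
bugs_fixed_b = [5, 6, 4, 7, 5, 6, 5, 4, 6, 5]  # hypothetical group B (with tool)

p_value = permutation_test(bugs_fixed_a, bugs_fixed_b)
print(f"estimated p-value: {p_value:.4f}")
```

A small p-value would suggest that the difference between the groups is unlikely to be due to chance alone. Which test we actually use is something to check with the HCI colleagues.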
However, if we want to draw conclusions about the tool, it must be SOTA or at least somewhat sophisticated.
Otherwise, if we just build a bad tool, we cannot make good statements about program repair.
My supervisor also said that, if well done, this is stuff for a top-class venue.
In general steps in that direction would be:
Let me know what you think @Tritlo