
Specific command to reproduce results? #27

Open
AustinT opened this issue Nov 27, 2023 · 0 comments

AustinT commented Nov 27, 2023

From your README.md it was not 100% clear which command(s) I should run to produce results for a new baseline algorithm and extract the AUC top-10 metric reported in Table 2 of your paper. You provide several commands to, e.g., change the number of seeds or the starting SMILES file, but it is not clear which settings correspond to the experiments in the paper.

Can you provide some clarification on this? My best guess is something like the following script, which (as far as I can tell) runs all 23 oracles from your paper with 5 seeds, a 10k oracle-call budget, and the same starting SMILES file you used in the paper:

oracle_array=('jnk3' 'gsk3b' 'celecoxib_rediscovery' \
    'troglitazone_rediscovery' \
    'thiothixene_rediscovery' 'albuterol_similarity' 'mestranol_similarity' \
    'isomers_c7h8n2o2' 'isomers_c9h10n2o2pf2cl' 'median1' 'median2' 'osimertinib_mpo' \
    'fexofenadine_mpo' 'ranolazine_mpo' 'perindopril_mpo' 'amlodipine_mpo' \
    'sitagliptin_mpo' 'zaleplon_mpo' 'valsartan_smarts' 'deco_hop' 'scaffold_hop' 'qed' 'drd2')
for orac in "${oracle_array[@]}"; do
    python run.py MODEL_NAME --task production --oracles "$orac"
done

Afterwards, a series of post-processing steps is needed to extract the results. These are also not entirely clear to me, and I just opened a PR (#26) that I think could make this easier. If you are happy to merge #26, I could write a follow-up PR that parses the log file outputs to produce a table like Table 2. I think something like this would lower the barrier to using the benchmark and also help ensure that people are using it appropriately 😃
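For concreteness, here is a rough sketch of how I understand the AUC top-10 metric: the running mean of the top-10 oracle scores after each call, averaged over the full oracle-call budget. This is my reading of the metric, not code from your repo; the function name, padding behavior, and default budget are my assumptions, so please correct me if the paper computes it differently.

```python
import heapq

def auc_top10(scores, max_oracle_calls=10_000, k=10):
    """Area under the running top-k-average curve, normalised by the budget.

    ``scores`` is the sequence of oracle values in call order. Runs that
    stop early are padded with their final top-k mean (my assumption).
    """
    top_k = []   # min-heap holding the best k scores seen so far
    curve = []   # running mean of the top-k after each oracle call
    for s in scores:
        if len(top_k) < k:
            heapq.heappush(top_k, s)
        elif s > top_k[0]:
            heapq.heapreplace(top_k, s)
        curve.append(sum(top_k) / len(top_k))
    if curve:
        # pad the curve out to the full budget with its final value
        curve += [curve[-1]] * (max_oracle_calls - len(curve))
    return sum(curve) / max_oracle_calls
```

If this matches your computation, a follow-up PR could apply it per oracle and per seed while parsing the logs, then aggregate into a Table 2-style summary.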
