# GraphGym
GraphGym is a platform for designing and evaluating Graph Neural Networks (GNN).
### Highlights
**1. Highly modularized pipeline for GNN**
- **Data:** Data loading, data splitting
- **Model:** Modularized GNN implementation
- **Tasks:** Node / edge / graph level GNN tasks
- **Evaluation:** Accuracy, ROC AUC, ...

**2. Reproducible experiment configuration**
- Each experiment is *fully described by a configuration file*

**3. Scalable experimental management**
- Easily launch *thousands of GNN experiments in parallel*
- *Auto-generate* experiment analyses and figures across random seeds and experiments.

**4. Flexible user customization**
- Easily *register your own modules* in [`graphgym/contrib/`](graphgym/contrib), such as data loaders, GNN layers, loss functions, etc.

## Why GraphGym?
**TL;DR:** GraphGym is great for GNN beginners, domain experts, and GNN researchers.

**Scenario 1:** You are a beginner to GNN, who wants to understand how GNN works.

You have probably read many exciting papers on GNN and tried to write your own GNN implementation.
Even with existing GNN packages, you still have to code up the essential pipeline on your own.
GraphGym is a perfect place for you to start learning *standardized GNN implementation and evaluation*.

<div align="center">
  <img align="center" src="docs/design_space.png" width="400px" />
  <figcaption><b><br>Figure 1: Modularized GNN implementation</b></figcaption>
</div>

<br>

**Scenario 2:** You want to apply GNN to your exciting applications.

You probably know that there are hundreds of possible GNN models, and selecting the best model is notoriously hard.
Even worse, we have shown in our [paper](https://papers.nips.cc/paper/2020/file/c5c3d4fe6b2cc463c7d7ecba17cc9de7-Paper.pdf) that the best GNN designs for different tasks differ drastically.
GraphGym provides a *simple interface to try out thousands of GNNs in parallel* and understand the best designs for your specific task.
GraphGym also recommends a "go-to" GNN design space, after investigating 10 million GNN model-task combinations.

<div align="center">
  <img align="center" src="docs/rank.png" width="1000px" />
  <figcaption><b><br>Figure 2: A guideline for desirable GNN design choices. <br>(Sampling from 10 million GNN model-task combinations.) </b></figcaption>
</div>

<br>

**Scenario 3:** You are a GNN researcher who wants to innovate GNN models / propose new GNN tasks.

Say you have proposed a new GNN layer `ExampleConv`.
GraphGym can help you convincingly argue that `ExampleConv` is better than, say, `GCNConv`:
when randomly sampling from 10 million possible model-task combinations, how often will `ExampleConv` outperform `GCNConv`, with everything else fixed (including the computational cost)?
Moreover, GraphGym makes it easy to run hyper-parameter searches and *visualize* which design choices are better.
In sum, GraphGym can greatly facilitate your GNN research.

<div align="center">
  <img align="center" src="docs/evaluation.png" width="1000px" />
  <figcaption><b><br>Figure 3: Evaluation of a given GNN design dimension (BatchNorm here).</b></figcaption>
</div>

<br>

## Installation

**Requirements**

- CPU or NVIDIA GPU, Linux, Python 3
- PyTorch and various Python packages; instructions for installing these dependencies are found below


**1. Python environment**
We recommend using the Conda package manager:

```bash
conda create -n graphgym python=3.7
source activate graphgym
```

**2. PyTorch:**
Install [PyTorch](https://pytorch.org/).
We have verified GraphGym under PyTorch 1.4.0 and torchvision 0.5.0. For example:
```bash
pip install torch==1.4.0 torchvision==0.5.0
```
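
As a quick sanity check that the intended PyTorch build is active (the version in the comment is simply the one verified above):

```python
# Quick sanity check for the PyTorch installation.
import torch

print(torch.__version__)          # e.g. 1.4.0
print(torch.cuda.is_available())  # True if a CUDA-enabled build detects a GPU
```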

**3. PyTorch Geometric:**
Install [PyTorch Geometric](https://pytorch-geometric.readthedocs.io/en/latest/notes/installation.html),
following their instructions. For example:
```bash
# CUDA versions: cpu, cu92, cu101, cu102, cu110
# TORCH versions: 1.4.0, 1.5.0, 1.6.0, 1.7.0
CUDA=cu101
TORCH=1.4.0
pip install torch-scatter==latest+${CUDA} -f https://pytorch-geometric.com/whl/torch-${TORCH}.html
pip install torch-sparse==latest+${CUDA} -f https://pytorch-geometric.com/whl/torch-${TORCH}.html
pip install torch-cluster==latest+${CUDA} -f https://pytorch-geometric.com/whl/torch-${TORCH}.html
pip install torch-spline-conv==latest+${CUDA} -f https://pytorch-geometric.com/whl/torch-${TORCH}.html
pip install torch-geometric
```
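
Similarly, you can verify that PyTorch Geometric and its companion packages were built against a matching PyTorch/CUDA combination:

```python
# Quick sanity check for the PyTorch Geometric installation.
import torch_scatter
import torch_sparse
import torch_geometric

print(torch_geometric.__version__)
```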

**4. Clone GraphGym and install other dependencies:**

```bash
git clone https://github.com/snap-stanford/GraphGym
cd GraphGym
pip install -r requirements.txt
python setup.py develop
```


**5. Test the installation**

Run a test GNN experiment using GraphGym, specified in [`run/configs/example.yaml`](run/configs/example.yaml).
The experiment performs node classification on the Cora dataset (random 80/20 train/val split).
```bash
cd run
bash run_single.sh
```

## GraphGym Usage

### 1 Run a single GNN experiment
A full example is specified in [`run/run_single.sh`](run/run_single.sh).

**1.1 Specify a configuration file.**
In GraphGym, an experiment is fully specified by a `.yaml` file.
Unspecified configurations in the `.yaml` file will be populated by the default values in [`graphgym/config.py`](graphgym/config.py).
For example, in [`run/configs/example.yaml`](run/configs/example.yaml), there are configurations on the dataset, training, model, GNN, etc.
A description of each configuration option is also provided in [`graphgym/config.py`](graphgym/config.py).
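
To see how a `.yaml` file is layered on top of the defaults, you can merge it into the global config object yourself. The snippet below is a minimal sketch (run from inside `run/`), assuming `cfg` in `graphgym/config.py` is a yacs-style config node as in the GraphGym codebase; the printed field names are only illustrative:

```python
# Sketch: experiment config = defaults in graphgym/config.py + overrides from a .yaml file.
from graphgym.config import cfg  # global config pre-populated with default values

cfg.merge_from_file('configs/example.yaml')  # values in the YAML override the defaults
print(cfg.dataset.name)     # set by example.yaml (or the default if the YAML leaves it out)
print(cfg.optim.max_epoch)  # falls back to the default when not specified in the YAML
```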

**1.2 Launch an experiment.**
For example, in [`run/run_single.sh`](run/run_single.sh):
```bash
python main.py --cfg configs/example.yaml --repeat 3
```
You can specify the number of different random seeds to repeat via `--repeat`.

**1.3 Understand the results.**
Experimental results will be automatically saved in the directory `run/results/${CONFIG_NAME}/`;
in the example above, it is `run/results/example/`.
Results for different random seeds are saved in separate subdirectories, such as `run/results/example/2`.
The results aggregated over all random seeds are *automatically* generated into `run/results/example/agg`,
including the mean and standard deviation (suffix `_std`) for each metric.
Train/val/test results are further saved into subdirectories, such as `run/results/example/agg/val`; there,
`stats.json` stores the per-epoch results aggregated across random seeds,
and `best.json` stores the results at *the epoch with the highest validation accuracy*.
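
For instance, you can peek at the aggregated best-epoch validation results directly. This is a small sketch (run from inside `run/`), assuming the JSON file stores one record per line; the metric names depend on the task:

```python
# Sketch: print the aggregated validation results at the best epoch.
import json
from pathlib import Path

for line in Path('results/example/agg/val/best.json').read_text().splitlines():
    print(json.loads(line))  # e.g. {'epoch': ..., 'accuracy': ..., 'accuracy_std': ..., ...}
```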

### 2 Run a batch of GNN experiments
A full example is specified in [`run/run_batch.sh`](run/run_batch.sh).

**2.1 Specify a base file.**
GraphGym supports running a batch of experiments.
To start, a user needs to select a base architecture via `--config`.
The batch of experiments is created by perturbing certain configurations of the base architecture.

**2.2 (Optional) Specify a base file for the computational budget.**
Additionally, GraphGym allows a user to select a base architecture, via `--config_budget`, to *control the computational budget* of the grid search.
The computational budget is currently measured by the number of trainable parameters; the control is achieved by automatically adjusting
the hidden dimension size of the GNN.
If no `--config_budget` is provided, GraphGym will not control the computational budget.

**2.3 Specify a grid file.**
A grid file describes how to perturb the base file in order to generate the batch of experiments.
For example, the base file could specify an experiment with a 3-layer GCN for Cora node classification.
The grid file then specifies how to perturb the experiment along different dimensions, such as the number of layers,
the model architecture, the dataset, the level of the task, etc.
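
As an illustration, a grid file is a plain-text file with one searched dimension per line: the configuration key, a short alias used in result names, and the list of values to try. The sketch below is hedged — the keys and values are only examples; see the grid file used by [`run/run_batch.sh`](run/run_batch.sh) for the authoritative format:

```
# <config key in graphgym/config.py> <short alias> <list of values to search>
gnn.layers_mp l_mp [2,4,6]
gnn.agg agg ['add','mean','max']
gnn.stage_type stage ['skipsum','skipconcat']
```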


**2.4 Generate config files for the batch of experiments,** based on the information specified above.
For example, in [`run/run_batch.sh`](run/run_batch.sh):
```bash
python configs_gen.py --config configs/${DIR}/${CONFIG}.yaml \
  --config_budget configs/${DIR}/${CONFIG}.yaml \
  --grid grids/${DIR}/${GRID}.txt \
  --out_dir configs
```

**2.5 Launch the batch of experiments.**
For example, in [`run/run_batch.sh`](run/run_batch.sh):
```bash
bash parallel.sh configs/${CONFIG}_grid_${GRID} $REPEAT $MAX_JOBS $SLEEP
```
Each experiment will be repeated `$REPEAT` times.
We implemented a queue system that launches all the jobs sequentially, with at most `$MAX_JOBS` jobs running concurrently.
In practice, the system works well when handling thousands of jobs.

**2.6 Understand the results.**
Experimental results will be automatically saved in the directory `run/results/${CONFIG_NAME}_grid_${GRID_NAME}/`;
in the example above, it is `run/results/example_grid_example/`.
After running the experiments, GraphGym additionally aggregates the results across the different models, saved in
`run/results/example_grid_example/agg`.
There, `val.csv` reports the validation accuracy for each model configuration at the *final* epoch,
and `val_best.csv` reports the results at the epoch with the highest validation accuracy.
When a test split is provided, `test.csv` reports the test accuracy for each model configuration at the *final* epoch,
and `test_best.csv` reports the results at the epoch with the highest validation accuracy.
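
These CSV files are easy to post-process. For example (a sketch, run from inside `run/`, assuming pandas is installed and the CSV exposes a column named `accuracy` — the actual column names depend on the task and metric):

```python
# Sketch: rank model configurations by their best validation accuracy.
import pandas as pd

df = pd.read_csv('results/example_grid_example/agg/val_best.csv')
print(df.sort_values('accuracy', ascending=False).head(10))
```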


### 3 Analyze the results
We provide a handy tool that automatically gives an overview of a batch of experiments in
[`analysis/example.ipynb`](analysis/example.ipynb).
```bash
cd analysis
jupyter notebook
example.ipynb # automatically provide an overview of a batch of experiments
```


### 4 User customization
A highlight of GraphGym is that it allows users to easily register their customized modules.
The supported customized modules are provided in the directory [`graphgym/contrib/`](graphgym/contrib/), including:
- Activation [`graphgym/contrib/act/`](graphgym/contrib/act/),
- Configuration [`graphgym/contrib/config/`](graphgym/contrib/config/),
- Feature augmentation [`graphgym/contrib/feature_augment/`](graphgym/contrib/feature_augment/),
- Feature encoder [`graphgym/contrib/feature_encoder/`](graphgym/contrib/feature_encoder/),
- GNN head [`graphgym/contrib/head/`](graphgym/contrib/head/),
- GNN layer [`graphgym/contrib/layer/`](graphgym/contrib/layer/),
- Data loader [`graphgym/contrib/loader/`](graphgym/contrib/loader/),
- Loss function [`graphgym/contrib/loss/`](graphgym/contrib/loss/),
- GNN network [`graphgym/contrib/network/`](graphgym/contrib/network/),
- Optimizer [`graphgym/contrib/optimizer/`](graphgym/contrib/optimizer/),
- GNN global pooling (graph classification only) [`graphgym/contrib/pooling/`](graphgym/contrib/pooling/),
- GNN stage [`graphgym/contrib/stage/`](graphgym/contrib/stage/),
- Data transformations [`graphgym/contrib/transform/`](graphgym/contrib/transform/).

Within each directory, (at least) one example is provided, showing how to register user-customized modules.
Note that new user-customized modules may require new configurations; in that case, the new configuration fields
should be registered in [`graphgym/contrib/config/`](graphgym/contrib/config/).
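
As an illustration, registering a custom activation might look like the following. This is a minimal sketch modeled after [`graphgym/contrib/act/example.py`](graphgym/contrib/act/example.py); the helper `register_act` and its name-plus-module signature are assumptions based on that example file, and `MySwish` is a toy module:

```python
# A minimal sketch of a user-customized activation
# (register_act is assumed to take a name and an instantiated module).
import torch
import torch.nn as nn

from graphgym.register import register_act


class MySwish(nn.Module):
    """Toy activation: x * sigmoid(x)."""
    def forward(self, x):
        return x * torch.sigmoid(x)


register_act('my_swish', MySwish())  # a config can then select it, e.g. gnn.act: my_swish
```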

**Note: Applying to your own datasets.**
A common use case is applying GraphGym to your own datasets.
To do so, you may follow our example in [`graphgym/contrib/loader/example.py`](graphgym/contrib/loader/example.py).
To provide more flexibility, GraphGym currently accepts a list of [NetworkX](https://networkx.org/documentation/stable/index.html) graphs
or [DeepSNAP](https://github.com/snap-stanford/deepsnap) graphs as input;
the following graph attributes are automatically loaded and parsed: `node_feature`, `node_label`, `edge_feature`, `edge_label`,
`graph_feature`, `graph_label`.
Additionally, we have provided examples of how to transform [PyG](https://pytorch-geometric.readthedocs.io/en/latest/) datasets into the accepted format.
Further details on the data representation are described in the [DeepSNAP documentation](https://snap.stanford.edu/deepsnap/notes/introduction.html#graph-in-deepsnap).
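
For instance, a custom loader might build NetworkX graphs carrying these attributes and register itself. The sketch below is hedged: the dataset name, feature sizes, and the `(format, name, dataset_dir)` loader signature follow the conventions of [`graphgym/contrib/loader/example.py`](graphgym/contrib/loader/example.py) and are only illustrative:

```python
# A hedged sketch of a custom data loader: return a list of NetworkX graphs whose
# attributes (node_feature, node_label, ...) GraphGym parses automatically.
import networkx as nx
import torch

from graphgym.register import register_loader


def load_my_dataset(format, name, dataset_dir):
    if name != 'my_toy_dataset':       # hypothetical dataset name
        return None                    # let other registered loaders handle the request
    G = nx.karate_club_graph()
    for v in G.nodes():
        G.nodes[v]['node_feature'] = torch.ones(4)      # toy 4-dim node features
        G.nodes[v]['node_label'] = torch.tensor(v % 2)  # toy binary node labels
    return [G]                         # a list of graphs is the accepted input format


register_loader('my_toy_loader', load_my_dataset)
```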


## Use case: Design Space for Graph Neural Networks (NeurIPS 2020 Spotlight)

Reproducing experiments in *[Design Space for Graph Neural Networks](https://papers.nips.cc/paper/2020/file/c5c3d4fe6b2cc463c7d7ecba17cc9de7-Paper.pdf)*, Jiaxuan You, Rex Ying, Jure Leskovec, **NeurIPS 2020 Spotlight**.

```bash
# NOTE: We include the raw results with GraphGym.
# If you run the following code, those results will be overwritten.
cd run
bash run_design_round1.sh # first round experiments, on a design space of 315K GNN designs
bash run_design_round2.sh # second round experiments, on a design space of 96 GNN designs
cd ../analysis
jupyter notebook
design_space.ipynb # reproducing all the analyses in the paper
```


## Contributors
[Jiaxuan You](https://cs.stanford.edu/~jiaxuan/) initiated the project and is the major contributor to the entire GraphGym platform.
[Rex Ying](https://cs.stanford.edu/people/rexy/) contributed the feature augmentation modules.
Jonathan Gomes Selman added OGB support to GraphGym.

GraphGym is inspired by the framework of [pycls](https://github.com/facebookresearch/pycls).
GraphGym adopts [DeepSNAP](https://github.com/snap-stanford/deepsnap), a Python library that assists efficient deep learning on graphs, as its data representation.
Part of GraphGym relies on [PyTorch Geometric](https://github.com/rusty1s/pytorch_geometric) functionalities.

## Contributing

We warmly welcome the community to contribute to GraphGym.
GraphGym is designed to make contribution and customization simple.
For example, you may contribute your modules to [`graphgym/contrib/`](graphgym/contrib/) by creating pull requests.

## Citing our paper
If you find GraphGym or our paper useful, please cite our paper:
```
@InProceedings{you2020design,
  title = {Design Space for Graph Neural Networks},
  author = {You, Jiaxuan and Ying, Rex and Leskovec, Jure},
  booktitle = {NeurIPS},
  year = {2020}
}
```