Commit 24d6538

Release cleaned version
0 parents

125 files changed (+30443, -0 lines)

.gitattributes

+2
@@ -0,0 +1,2 @@
# Auto detect text files and perform LF normalization
* text=auto

.gitignore

+5
@@ -0,0 +1,5 @@
**/data_dir/
run/datasets/data/
**/__pycache__/
**/.ipynb_checkpoints
.idea/

LICENSE

+21
@@ -0,0 +1,21 @@
Copyright (c) 2020 Jiaxuan You, Rex Ying, Jonathan Gomes Selman
Copyright (c) Facebook, Inc. and its affiliates.
Additional copyrights are specified in relevant subdirectories.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

README.md

+288
@@ -0,0 +1,288 @@
# GraphGym
GraphGym is a platform for designing and evaluating Graph Neural Networks (GNNs).
### Highlights
**1. Highly modularized pipeline for GNNs**
- **Data:** Data loading, data splitting
- **Model:** Modularized GNN implementation
- **Tasks:** Node-, edge- and graph-level GNN tasks
- **Evaluation:** Accuracy, ROC AUC, ...

**2. Reproducible experiment configuration**
- Each experiment is *fully described by a configuration file*

**3. Scalable experiment management**
- Easily launch *thousands of GNN experiments in parallel*
- *Auto-generate* experiment analyses and figures across random seeds and experiments

**4. Flexible user customization**
- Easily *register your own modules* in [`graphgym/contrib/`](graphgym/contrib), such as data loaders, GNN layers, loss functions, etc.
## Why GraphGym?
**TL;DR:** GraphGym is great for GNN beginners, domain experts and GNN researchers.

**Scenario 1:** You are a beginner to GNNs and want to understand how GNNs work.

You have probably read many exciting papers on GNNs and tried to write your own GNN implementation.
Even with existing GNN packages, you still have to code up the essential pipeline on your own.
GraphGym is a perfect place for you to start learning *standardized GNN implementation and evaluation*.

<div align="center">
<img align="center" src="docs/design_space.png" width="400px" />
<figcaption><b><br>Figure 1: Modularized GNN implementation</b></figcaption>
</div>

<br>

**Scenario 2:** You want to apply GNNs to your exciting applications.

You probably know that there are hundreds of possible GNN models, and selecting the best model is notoriously hard.
Even worse, we have shown in our [paper](https://papers.nips.cc/paper/2020/file/c5c3d4fe6b2cc463c7d7ecba17cc9de7-Paper.pdf) that the best GNN designs for different tasks differ drastically.
GraphGym provides a *simple interface to try out thousands of GNNs in parallel* and understand the best designs for your specific task.
GraphGym also recommends a "go-to" GNN design space, based on an investigation of 10 million GNN model-task combinations.

<div align="center">
<img align="center" src="docs/rank.png" width="1000px" />
<figcaption><b><br>Figure 2: A guideline for desirable GNN design choices. <br>(Sampled from 10 million GNN model-task combinations.) </b></figcaption>
</div>

<br>


**Scenario 3:** You are a GNN researcher who wants to design new GNN models or propose new GNN tasks.

Say you have proposed a new GNN layer `ExampleConv`.
GraphGym can help you convincingly argue that `ExampleConv` is better than, say, `GCNConv`:
when randomly sampling from 10 million possible model-task combinations, how often does `ExampleConv` outperform `GCNConv` when everything else is fixed (including the computational cost)?
Moreover, GraphGym makes hyperparameter search easy and can *visualize* which design choices are better.
In sum, GraphGym can greatly facilitate your GNN research.
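The paired comparison in Scenario 3 can be illustrated with a minimal, hypothetical sketch: each sampled setup fixes every design dimension except the layer type, and we count how often `ExampleConv` beats `GCNConv` on the same setup. The function `win_rate` and its record layout are assumptions made for illustration, not part of GraphGym's API, and the numbers below are toy values rather than experimental results.

```python
from typing import Dict, List


def win_rate(paired_results: List[Dict[str, float]], ours: str, baseline: str) -> float:
    """Fraction of sampled setups where `ours` scores strictly higher than `baseline`.

    Each entry maps a layer name to the validation accuracy obtained on one
    sampled model-task setup in which all other design choices (including the
    parameter budget) are held fixed. This record layout is hypothetical.
    """
    if not paired_results:
        raise ValueError("need at least one paired setup")
    wins = sum(1 for r in paired_results if r[ours] > r[baseline])
    return wins / len(paired_results)


# Toy values for illustration only; they are not experimental results.
toy = [
    {"ExampleConv": 0.81, "GCNConv": 0.79},
    {"ExampleConv": 0.62, "GCNConv": 0.65},
    {"ExampleConv": 0.90, "GCNConv": 0.88},
    {"ExampleConv": 0.70, "GCNConv": 0.70},
]
print(win_rate(toy, "ExampleConv", "GCNConv"))  # 2 wins out of 4 -> 0.5
```

Ties count as non-wins in this sketch; a stricter analysis could also report how often the two layers tie.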
<div align="center">
<img align="center" src="docs/evaluation.png" width="1000px" />
<figcaption><b><br>Figure 3: Evaluation of a given GNN design dimension (BatchNorm here).</b></figcaption>
</div>

<br>

## Installation

**Requirements**

- CPU or NVIDIA GPU, Linux, Python 3
- PyTorch and various Python packages; instructions for installing these dependencies are given below


**1. Python environment**
We recommend using the Conda package manager:

```bash
conda create -n graphgym python=3.7
source activate graphgym
```

**2. PyTorch:**
Install [PyTorch](https://pytorch.org/).
We have verified GraphGym under PyTorch 1.4.0 and torchvision 0.5.0. For example:
```bash
pip install torch==1.4.0 torchvision==0.5.0
```

**3. PyTorch Geometric:**
Install [PyTorch Geometric](https://pytorch-geometric.readthedocs.io/en/latest/notes/installation.html)
by following their instructions. For example:
```bash
# CUDA versions: cpu, cu92, cu101, cu102, cu110
# TORCH versions: 1.4.0, 1.5.0, 1.6.0, 1.7.0
CUDA=cu101
TORCH=1.4.0
pip install torch-scatter==latest+${CUDA} -f https://pytorch-geometric.com/whl/torch-${TORCH}.html
pip install torch-sparse==latest+${CUDA} -f https://pytorch-geometric.com/whl/torch-${TORCH}.html
pip install torch-cluster==latest+${CUDA} -f https://pytorch-geometric.com/whl/torch-${TORCH}.html
pip install torch-spline-conv==latest+${CUDA} -f https://pytorch-geometric.com/whl/torch-${TORCH}.html
pip install torch-geometric
```

**4. Clone GraphGym and install other dependencies:**

```bash
git clone https://github.com/snap-stanford/GraphGym
cd GraphGym
pip install -r requirements.txt
python setup.py develop
```


**5. Test the installation**

Run a test GNN experiment using GraphGym, specified in [`run/configs/example.yaml`](run/configs/example.yaml).
The experiment performs node classification on the Cora dataset (random 80/20 train/val split).
```bash
cd run
bash run_single.sh
```
## GraphGym Usage

### 1 Run a single GNN experiment
A full example is specified in [`run/run_single.sh`](run/run_single.sh).

**1.1 Specify a configuration file.**
In GraphGym, an experiment is fully specified by a `.yaml` file.
Configurations not specified in the `.yaml` file are populated with the default values in [`graphgym/config.py`](graphgym/config.py).
For example, [`run/configs/example.yaml`](run/configs/example.yaml) contains configurations for the dataset, training, model, GNN, etc.
Each configuration option is also described in [`graphgym/config.py`](graphgym/config.py).
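The "defaults plus overrides" behaviour described above can be sketched with plain dictionaries. This is a minimal illustration, assuming the `.yaml` file has already been parsed into a nested dict; the `DEFAULTS` values and the `merge_config` helper are hypothetical stand-ins, and the actual mechanism and field names live in `graphgym/config.py`.

```python
import copy
from typing import Any, Dict

# Hypothetical defaults standing in for graphgym/config.py; the real file
# defines many more fields and may use different names or values.
DEFAULTS: Dict[str, Any] = {
    "dataset": {"name": "Cora", "task": "node"},
    "gnn": {"layers_mp": 2, "dim_inner": 16},
    "optim": {"base_lr": 0.01, "max_epoch": 200},
}


def merge_config(defaults: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new config where every key in `overrides` replaces the default.

    Nested sections are merged recursively, so an experiment file only needs
    to list the fields it changes. Unknown keys raise an error to catch typos.
    """
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if key not in merged:
            raise KeyError(f"unknown config key: {key}")
        if isinstance(merged[key], dict) and isinstance(value, dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


# The parsed contents of a small experiment .yaml file, written as a dict.
experiment = {"gnn": {"layers_mp": 3}, "optim": {"base_lr": 0.005}}
cfg = merge_config(DEFAULTS, experiment)
print(cfg["gnn"])  # {'layers_mp': 3, 'dim_inner': 16}
```

Copying the defaults before merging keeps them reusable across many experiments, which matters once a batch of configs is generated from one base.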
**1.2 Launch an experiment.**
For example, in [`run/run_single.sh`](run/run_single.sh):
```bash
python main.py --cfg configs/example.yaml --repeat 3
```
The `--repeat` flag sets how many times the experiment is repeated, each time with a different random seed.

**1.3 Understand the results.**
Experimental results are automatically saved in the directory `run/results/${CONFIG_NAME}/`;
in the example above, this is `run/results/example/`.
Results for different random seeds are saved in different subdirectories, such as `run/results/example/2`.
The results aggregated over all random seeds are *automatically* written to `run/results/example/agg`,
including the mean and the standard deviation (suffixed `_std`) of each metric.
Train/val/test results are further saved into subdirectories, such as `run/results/example/agg/val`; here,
`stats.json` stores the results after each epoch, aggregated across random seeds, and
`best.json` stores the results at *the epoch with the highest validation accuracy*.
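The seed aggregation and best-epoch selection described above can be sketched in a few lines. This is a minimal illustration, assuming per-epoch metrics are already available as Python lists; the function names are hypothetical, and GraphGym's own aggregation code may differ (for example, in which standard-deviation estimator it uses; population standard deviation is assumed here).

```python
from statistics import mean, pstdev
from typing import Dict, List


def aggregate_over_seeds(per_seed: List[List[float]], metric: str = "accuracy") -> List[Dict[str, float]]:
    """Aggregate one metric across random seeds, epoch by epoch.

    `per_seed[s][e]` is the metric of seed `s` at epoch `e`. Each output record
    holds the mean and, under the `<metric>_std` key, the population standard
    deviation across seeds.
    """
    num_epochs = len(per_seed[0])
    if any(len(run) != num_epochs for run in per_seed):
        raise ValueError("all seeds must report the same number of epochs")
    stats = []
    for epoch in range(num_epochs):
        values = [run[epoch] for run in per_seed]
        stats.append({"epoch": epoch, metric: mean(values), f"{metric}_std": pstdev(values)})
    return stats


def best_epoch(val_stats: List[Dict[str, float]], metric: str = "accuracy") -> Dict[str, float]:
    """Pick the record with the highest mean validation metric (first one wins ties)."""
    return max(val_stats, key=lambda record: record[metric])


# Toy validation accuracies for two seeds over three epochs (illustrative only).
val = aggregate_over_seeds([[0.5, 0.75, 0.625], [0.75, 0.875, 0.625]])
print(best_epoch(val))  # {'epoch': 1, 'accuracy': 0.8125, 'accuracy_std': 0.0625}
```

Selecting the epoch on the seed-averaged curve, rather than per seed, reports one epoch for the whole configuration.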
### 2 Run a batch of GNN experiments
A full example is specified in [`run/run_batch.sh`](run/run_batch.sh).

**2.1 Specify a base file.**
GraphGym supports running a batch of experiments.
To start, select a base architecture via `--config`.
The batch of experiments is created by perturbing certain configurations of the base architecture.

**2.2 (Optional) Specify a base file for computational budget.**
Additionally, GraphGym allows you to select a base architecture via `--config_budget` that *controls the computational budget* of the grid search.
The computational budget is currently measured by the number of trainable parameters; it is controlled by automatically adjusting
the hidden dimension of the GNN.
If no `--config_budget` is provided, GraphGym does not control the computational budget.
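Budget matching can be sketched as a search over hidden dimensions. The sketch below assumes a plain stack of dense layers as a stand-in for a GNN and picks the hidden size whose parameter count is closest to that of the budget model; the helper names and the closest-match rule are illustrative assumptions, not GraphGym's implementation.

```python
def count_params(dim_in: int, dim_hidden: int, dim_out: int, num_layers: int) -> int:
    """Trainable parameters of a hypothetical stack of dense layers with biases.

    The layers map dim_in -> dim_hidden -> ... -> dim_hidden -> dim_out; with a
    single layer the hidden dimension is unused. This stands in for a real GNN.
    """
    if num_layers < 1:
        raise ValueError("num_layers must be >= 1")
    sizes = [dim_in] + [dim_hidden] * (num_layers - 1) + [dim_out]
    # Each layer a -> b has a weight matrix (a * b) and a bias vector (b).
    return sum(a * b + b for a, b in zip(sizes[:-1], sizes[1:]))


def match_hidden_dim(budget: int, dim_in: int, dim_out: int, num_layers: int,
                     max_dim: int = 1024) -> int:
    """Hidden dimension whose parameter count is closest to `budget`.

    A linear scan keeps the sketch simple; ties go to the smaller dimension.
    """
    return min(range(1, max_dim + 1),
               key=lambda h: abs(count_params(dim_in, h, dim_out, num_layers) - budget))


# Cora-sized input and output (1433 features, 7 classes).
budget = count_params(1433, 256, 7, num_layers=2)  # base model defining the budget
print(budget, match_hidden_dim(budget, 1433, 7, num_layers=4))  # 368903 200
```

Under this count, a 4-layer model matched to the budget of a 2-layer, 256-wide model gets hidden size 200, so deeper variants become narrower rather than larger.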
**2.3 Specify a grid file.**
A grid file describes how to perturb the base file in order to generate the batch of experiments.
For example, the base file could specify an experiment with a 3-layer GCN for Cora node classification.
Then, the grid file specifies how to perturb the experiment along different dimensions, such as the number of layers,
model architecture, dataset, task level, etc.
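A grid file amounts to a Cartesian product over the listed values. The sketch below assumes a hypothetical one-line-per-dimension format (`dotted.key alias [values]`) and a dict-based config; the real format of the files under `run/grids/`, the config keys, and the naming of generated configs may all differ.

```python
import ast
import copy
import itertools
from typing import Any, Dict, List, Tuple

Grid = List[Tuple[str, str, List[Any]]]


def parse_grid(text: str) -> Grid:
    """Parse lines of the (assumed) form `dotted.key alias [v1,v2,...]`.

    Blank lines and lines starting with '#' are skipped.
    """
    grid = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, alias, values = line.split(" ", 2)
        grid.append((key, alias, list(ast.literal_eval(values))))
    return grid


def set_dotted(cfg: Dict[str, Any], dotted_key: str, value: Any) -> None:
    """Set cfg['a']['b'] = value for dotted_key 'a.b'."""
    *parents, leaf = dotted_key.split(".")
    node = cfg
    for name in parents:
        node = node[name]
    node[leaf] = value


def expand_grid(base: Dict[str, Any], grid: Grid) -> Dict[str, Dict[str, Any]]:
    """Create one config per point in the Cartesian product of the grid."""
    configs = {}
    for combo in itertools.product(*(values for _, _, values in grid)):
        cfg = copy.deepcopy(base)  # never mutate the base config
        name_parts = []
        for (key, alias, _), value in zip(grid, combo):
            set_dotted(cfg, key, value)
            name_parts.append(f"{alias}={value}")
        configs["-".join(name_parts)] = cfg
    return configs


# Illustrative base config and grid; the key names are assumptions.
base = {"dataset": {"name": "Cora"}, "gnn": {"layers_mp": 3, "layer_type": "gcnconv"}}
grid_text = """
# perturb the number of message-passing layers and the layer type
gnn.layers_mp l_mp [2,4]
gnn.layer_type layer ['gcnconv','ginconv']
"""
configs = expand_grid(base, parse_grid(grid_text))
print(len(configs))  # 4
```

Encoding each grid point in the config name keeps the generated files self-describing when results are later aggregated per model.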
**2.4 Generate config files for the batch of experiments,** based on the information specified above.
For example, in [`run/run_batch.sh`](run/run_batch.sh):
```bash
python configs_gen.py --config configs/${DIR}/${CONFIG}.yaml \
  --config_budget configs/${DIR}/${CONFIG}.yaml \
  --grid grids/${DIR}/${GRID}.txt \
  --out_dir configs
```

**2.5 Launch the batch of experiments.**
For example, in [`run/run_batch.sh`](run/run_batch.sh):
```bash
bash parallel.sh configs/${CONFIG}_grid_${GRID} $REPEAT $MAX_JOBS $SLEEP
```
Each experiment is repeated `$REPEAT` times.
We implemented a queue system that launches all the jobs in sequence, with at most `$MAX_JOBS` jobs running concurrently.
In practice, the system handles thousands of jobs well.
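The queueing behaviour can be sketched in Python with a bounded worker pool: jobs are submitted in order and at most `max_jobs` run at once. This illustrates the idea only and is not a port of `parallel.sh`; a real job would launch a training run (the `subprocess` call in the comment is an assumed example), whereas here each job just sleeps so the concurrency limit can be checked.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def run_queue(jobs: List[Callable[[], int]], max_jobs: int) -> List[int]:
    """Run jobs from a FIFO queue with at most `max_jobs` executing at once.

    Returns each job's result in the original submission order.
    """
    with ThreadPoolExecutor(max_workers=max_jobs) as pool:
        futures = [pool.submit(job) for job in jobs]
        return [f.result() for f in futures]


# Instrumentation used only to observe the concurrency limit in this sketch.
lock = threading.Lock()
running = 0
peak = 0


def make_job(i: int) -> Callable[[], int]:
    def job() -> int:
        global running, peak
        # A real job might instead call, e.g.,
        # subprocess.run(["python", "main.py", "--cfg", cfg_path, "--repeat", "3"], check=True)
        with lock:
            running += 1
            peak = max(peak, running)
        time.sleep(0.01)  # stand-in for training
        with lock:
            running -= 1
        return i
    return job


results = run_queue([make_job(i) for i in range(20)], max_jobs=4)
print(results == list(range(20)), peak <= 4)  # True True
```

A worker pool avoids polling: a new job starts as soon as one finishes, instead of waiting for a fixed sleep interval.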
**2.6 Understand the results.**
Experimental results are automatically saved in the directory `run/results/${CONFIG_NAME}_grid_${GRID_NAME}/`;
in the example above, this is `run/results/example_grid_example/`.
After the experiments finish, GraphGym additionally aggregates the results across the different models automatically and saves them in
`run/results/example_grid_example/agg`.
There, `val.csv` holds the validation accuracy of each model configuration at the *final* epoch,
and `val_best.csv` holds the results at the epoch with the highest validation accuracy.
When a test split is provided, `test.csv` holds the test accuracy of each model configuration at the *final* epoch,
and `test_best.csv` holds the results at the epoch with the highest validation accuracy.
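A minimal sketch of how per-model results could be laid out as `val.csv`-style (final epoch) and `val_best.csv`-style (best epoch) tables, assuming each model's per-epoch validation accuracy, already averaged over seeds, is available as a list. The function `summarize` and its column names are hypothetical; the real files may contain more columns.

```python
import csv
import io
from typing import Dict, List


def summarize(models: Dict[str, List[float]]) -> Dict[str, str]:
    """Build final-epoch and best-epoch tables as CSV text, one row per model.

    `models` maps a model-configuration name to its per-epoch validation
    accuracy (averaged over seeds). Column names here are hypothetical.
    """
    final_buf, best_buf = io.StringIO(), io.StringIO()
    final_writer, best_writer = csv.writer(final_buf), csv.writer(best_buf)
    final_writer.writerow(["model", "epoch", "accuracy"])
    best_writer.writerow(["model", "epoch", "accuracy"])
    for name, curve in models.items():
        last = len(curve) - 1
        best = max(range(len(curve)), key=curve.__getitem__)  # first best epoch wins ties
        final_writer.writerow([name, last, curve[last]])
        best_writer.writerow([name, best, curve[best]])
    return {"val.csv": final_buf.getvalue(), "val_best.csv": best_buf.getvalue()}


# Toy curves for two hypothetical model configurations (illustrative only).
tables = summarize({"l_mp=2": [0.5, 0.75, 0.625], "l_mp=4": [0.25, 0.5, 0.875]})
print(tables["val_best.csv"])
```

Keeping both tables makes it visible when a model peaks early and then degrades, since its final-epoch and best-epoch rows diverge.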
### 3 Analyze the results
We provide a handy tool in [`analysis/example.ipynb`](analysis/example.ipynb) that automatically gives an overview of a batch of experiments.
```bash
cd analysis
jupyter notebook
# then open example.ipynb for an automatic overview of a batch of experiments
```


### 4 User customization
A highlight of GraphGym is that it allows users to easily register their customized modules.
The supported customized modules are provided in the directory [`graphgym/contrib/`](graphgym/contrib/), including:
- Activation [`graphgym/contrib/act/`](graphgym/contrib/act/),
- Configuration [`graphgym/contrib/config/`](graphgym/contrib/config/),
- Feature augmentation [`graphgym/contrib/feature_augment/`](graphgym/contrib/feature_augment/),
- Feature encoder [`graphgym/contrib/feature_encoder/`](graphgym/contrib/feature_encoder/),
- GNN head [`graphgym/contrib/head/`](graphgym/contrib/head/),
- GNN layer [`graphgym/contrib/layer/`](graphgym/contrib/layer/),
- Data loader [`graphgym/contrib/loader/`](graphgym/contrib/loader/),
- Loss function [`graphgym/contrib/loss/`](graphgym/contrib/loss/),
- GNN network [`graphgym/contrib/network/`](graphgym/contrib/network/),
- Optimizer [`graphgym/contrib/optimizer/`](graphgym/contrib/optimizer/),
- GNN global pooling (graph classification only) [`graphgym/contrib/pooling/`](graphgym/contrib/pooling/),
- GNN stage [`graphgym/contrib/stage/`](graphgym/contrib/stage/),
- Data transformations [`graphgym/contrib/transform/`](graphgym/contrib/transform/).

Each directory contains at least one example showing how to register user-customized modules.
Note that new user-customized modules may require new configurations; in these cases, the new configuration fields
should be registered in [`graphgym/contrib/config/`](graphgym/contrib/config/).
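The registration idea can be illustrated with a small name-to-class registry: a module is registered under a string key, and a configuration value then selects it by that key. This is a hypothetical, dependency-free sketch of the pattern; GraphGym's own registration helpers and their signatures are shown in the examples under `graphgym/contrib/`, and the `Registry` class and the `gnn.layer_type` key mentioned below are assumptions.

```python
from typing import Callable, Dict, Type


class Registry:
    """A minimal name -> class registry (hypothetical, for illustration)."""

    def __init__(self, kind: str) -> None:
        self.kind = kind
        self._entries: Dict[str, Type] = {}

    def register(self, name: str) -> Callable[[Type], Type]:
        """Decorator that stores a class under `name`, refusing duplicates."""
        def decorator(cls: Type) -> Type:
            if name in self._entries:
                raise KeyError(f"{self.kind} '{name}' is already registered")
            self._entries[name] = cls
            return cls
        return decorator

    def build(self, name: str, *args, **kwargs):
        """Instantiate the class registered under `name` (e.g. taken from a config)."""
        if name not in self._entries:
            raise KeyError(f"unknown {self.kind} '{name}'; known: {sorted(self._entries)}")
        return self._entries[name](*args, **kwargs)


layers = Registry("layer")


@layers.register("exampleconv")
class ExampleConv:
    """Placeholder standing in for a user-defined GNN layer."""

    def __init__(self, dim_in: int, dim_out: int) -> None:
        self.dim_in, self.dim_out = dim_in, dim_out


# A config value such as `gnn.layer_type: exampleconv` would select the class by name.
layer = layers.build("exampleconv", 16, 32)
print(type(layer).__name__, layer.dim_out)  # ExampleConv 32
```

Looking modules up by string keeps configuration files declarative: adding a layer means registering it once, not editing the model-building code.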
**Note: Applying GraphGym to your own datasets.**
A common use case is applying GraphGym to your favorite datasets.
To do so, you may follow our example in [`graphgym/contrib/loader/example.py`](graphgym/contrib/loader/example.py).
To provide more flexibility, GraphGym currently accepts a list of [NetworkX](https://networkx.org/documentation/stable/index.html) graphs
or [DeepSNAP](https://github.com/snap-stanford/deepsnap) graphs as the input;
the following graph attributes are automatically loaded and parsed: `node_feature`, `node_label`, `edge_feature`, `edge_label`,
`graph_feature`, `graph_label`.
Additionally, we provide examples of how to transform [PyG](https://pytorch-geometric.readthedocs.io/en/latest/) datasets into the accepted format.
Further details on the data representation are described in the [DeepSNAP documentation](https://snap.stanford.edu/deepsnap/notes/introduction.html#graph-in-deepsnap).


## Use case: Design Space for Graph Neural Networks (NeurIPS 2020 Spotlight)

Reproducing experiments in *[Design Space for Graph Neural Networks](https://papers.nips.cc/paper/2020/file/c5c3d4fe6b2cc463c7d7ecba17cc9de7-Paper.pdf)*, Jiaxuan You, Rex Ying, Jure Leskovec, **NeurIPS 2020 Spotlight**.

```bash
# NOTE: We include the raw results with GraphGym.
# If you run the following code, the results will be overwritten.
cd run
bash run_design_round1.sh  # first round of experiments, on a design space of 315K GNN designs
bash run_design_round2.sh  # second round of experiments, on a design space of 96 GNN designs
cd ../analysis
jupyter notebook
# then open design_space.ipynb, which reproduces all the analyses in the paper
```


## Contributors
[Jiaxuan You](https://cs.stanford.edu/~jiaxuan/) initiated the project and is the major contributor to the entire GraphGym platform.
[Rex Ying](https://cs.stanford.edu/people/rexy/) contributed to the feature augmentation modules.
Jonathan Gomes Selman added OGB support to GraphGym.

GraphGym is inspired by the framework of [pycls](https://github.com/facebookresearch/pycls).
GraphGym adopts [DeepSNAP](https://github.com/snap-stanford/deepsnap) as its data representation; DeepSNAP is a Python library that assists efficient deep learning on graphs.
Part of GraphGym relies on [PyTorch Geometric](https://github.com/rusty1s/pytorch_geometric) functionality.

## Contributing

We warmly welcome the community to contribute to GraphGym.
GraphGym is designed to make contribution and customization simple.
For example, you may contribute your modules to [`graphgym/contrib/`](graphgym/contrib/) by creating pull requests.

## Citing our paper
If you find GraphGym or our paper useful, please cite our paper:
```
@InProceedings{you2020design,
  title = {Design Space for Graph Neural Networks},
  author = {You, Jiaxuan and Ying, Rex and Leskovec, Jure},
  booktitle = {NeurIPS},
  year = {2020}
}
```

analysis/LICENSE

+19
@@ -0,0 +1,19 @@
Copyright (c) 2020 Jiaxuan You

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.

analysis/design_space.ipynb

+2,687
