
GitHub and Code Repositories

Introduction
Opening a GitHub Account & Setting Up Git
- Making Your Email Public
Repositories
Teams
- Individual Collaborators
General Guidelines

Introduction

All code generated at SPIRAL should be at a quality level that allows for worldwide dissemination. As a member of SPIRAL, you will not be writing code for yourself; you will be working as part of a team. Consequently, it is expected that you follow the coding practices and standards of the group. You must learn quickly from others and teach others if you have something useful and interesting to contribute. All project codes must be maintained in a version controlled repository. Our aim is to collectively grow our project specific code collections and generic tool libraries, for the purpose of producing reproducible scientific results.

If another reasonably knowledgeable and capable person cannot (1) use your code and replicate your results, or (2) modify your code to apply it to other problems and datasets, your code is useless. Always strive to learn how to write better code and how to improve our collective product. All codes will be disseminated to the public over the internet with appropriate credit to all contributors. Public dissemination can (1) improve your impact on science and society, and (2) showcase your work and capabilities to potential future employers.

Opening a GitHub Account and Setting Up Git

Open a GitHub account at https://github.com using your ECE.NEU.EDU email. You can associate more email addresses with your account at any time, but having the ECE email as primary makes it easier for other SPIRAL members to invite you to join repositories.
Get invited to the neu-spiral organization. You need an invitation to join; ask your advisor if you have not already received one. Doing so will give you access to SPIRAL resources, including this wiki.
Read the GitHub documentation.
Install git, and learn how to use it: see our git guide.
Remember to set your full name (First Last) and ECE email address correctly in any Git software you use. When you commit code to a repository, these credentials show who made the commit.
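
Git takes the commit author from its `user.name` and `user.email` settings, which are typically set once per machine with `git config --global user.name "First Last"` and `git config --global user.email <your ECE address>`. The Python sketch below checks those two values. The helper names are hypothetical, and the assumption that ECE addresses end in `@ece.neu.edu` is ours; this is not an official SPIRAL tool.

```python
import subprocess

# Assumption: ECE addresses end with this suffix (inferred from "ECE.NEU.EDU email").
ASSUMED_EMAIL_SUFFIX = "@ece.neu.edu"


def read_git_setting(key):
    """Return the value of a git config key, or "" if it is unset or git is missing."""
    try:
        result = subprocess.run(
            ["git", "config", "--get", key],
            capture_output=True, text=True, check=False,
        )
    except FileNotFoundError:  # git is not installed on this machine
        return ""
    return result.stdout.strip()


def identity_problems(name, email):
    """List what looks wrong with a commit identity; an empty list means it looks fine."""
    problems = []
    if len(name.split()) < 2:
        problems.append("user.name should be your full name (First Last)")
    if not email.lower().endswith(ASSUMED_EMAIL_SUFFIX):
        problems.append("user.email should end with " + ASSUMED_EMAIL_SUFFIX)
    return problems
```

To check the current machine, call `identity_problems(read_git_setting("user.name"), read_git_setting("user.email"))` and fix whatever it reports.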

Using Git

See the Git Mini-Tutorial for basic Git commands and operations.

Making Your Email Public

When you sign up for GitHub, your email is private by default. This means that people cannot search for you by email, nor add you to a repository or an organization by email. For that reason, you need to provide your username, not your email, to your advisor to be invited to the neu-spiral organization. To make your email public, go to your email settings and uncheck the box Keep my email address private.

Repositories

New & Imported Repositories

Follow these instructions to create a new git repository. Make sure that you assign it to the neu-spiral organization and set it to private. If you do not assign it to neu-spiral, it may automatically become public.

Please make sure to follow these SPIRAL repository naming conventions when creating new repositories:

Use capital letters for acronyms and for the first letter of each word.
Never use connectors like - or _ (despite what SPIRAL-Handbook does, which predates the establishment of these conventions).
Do not change the names of existing repos if the link has been disseminated in public products, such as papers.
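
Part of these conventions can be checked automatically. The following Python sketch is illustrative only: the function name `is_valid_repo_name` and the decision to allow digits are assumptions, not SPIRAL rules. It verifies that a name starts with a capital letter and contains no connectors; it cannot tell whether every word or acronym inside the name is capitalized correctly.

```python
import re

# Assumed pattern: a leading capital letter followed only by letters or digits.
# Connectors such as "-" and "_" are rejected, per the conventions above.
_REPO_NAME_PATTERN = re.compile(r"[A-Z][A-Za-z0-9]*")


def is_valid_repo_name(name):
    """Return True if `name` passes the machine-checkable part of the conventions."""
    return _REPO_NAME_PATTERN.fullmatch(name) is not None
```

For example, `is_valid_repo_name("ASSIST")` is accepted, while `is_valid_repo_name("SPIRAL-Handbook")` is rejected because of the hyphen.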

Repositories from Bitbucket, GitLab, or elsewhere can be seamlessly transferred to GitHub. Instructions on how to import a repository from, e.g., Bitbucket, are here. Make sure that you import the project to neu-spiral, as opposed to your personal account, and that you make the repository private. Once a repository has been imported, it is very easy to change any local copies (e.g., on your laptop or on Discovery) to make them point to GitHub rather than, e.g., Bitbucket. Instructions on how to make a local repo copy point to GitHub rather than Bitbucket are here.

Once you have created a repository you may want to invite team members to collaborate with you, or invite external collaborators. Finally, once a project is good to be shared with the outside world, you may want to make it public.

Repository Ownership

Always create repositories under the neu-spiral organization. It is imperative that you do so, to ensure that (a) SPIRAL maintains access to your code in perpetuity, and (b) your repositories remain private.

Access to Repositories

Depending on which teams you belong to, you may have access to multiple repositories. You should be able to see them immediately when you log in to GitHub; you can also use the search function.

Making a Repository Public

Once you have published a paper, you may want to make the code you used for this paper public. Instructions on how to make a repository public can be found here. Before doing so, make sure that it is released under the right software license, and that your code is well-documented; in particular, your repository should contain a README.md file, written in Markdown.

Teams

The neu-spiral organization has a hierarchical structure, and all its members are organized in teams. Everyone belongs to the spiral team. You should also belong to your lab/group team. If you are participating in a project, you may also belong to the team that is responsible for maintaining this project. You may want to create a new team; instructions on how to create a team can be found here.

Once you create a repository, it is by default only viewable and editable by you. You may want to make it available to one or more teams. Instructions on how to give a team access to your repository can be found here.

The recommended behavior for how to share SPIRAL projects with teams is as follows:

If your repository belongs to a project, give write permissions to the entire team of that project.
In addition, give read permissions to any SPIRAL groups that are in any way related to or affiliated with the project.
In addition, give appropriate (read or write) permissions to any external collaborators.

Example: Suppose that you created a new repository named ASSIST, containing the code for the NSF-funded project ASSIST. This project includes SPIRAL members from the CSL, ML, and DNAL labs, and a few external collaborators. The repository should then be shared with write permissions with the entire ASSIST team. It should also be shared with read permissions with the CSL, ML, and DNAL teams. Finally, external collaborators should be invited as individual collaborators.

Individual Collaborators

Individual collaborators (either internal or external) can also be added to a project. If you intend to add a lot of internal collaborators, you might as well create a new team. Instructions on how to add individual collaborators (either internal or external) can be found here.

General Guidelines

Code Sharing

Each project, based on its needs, will use other repositories as sub-modules. Our aim is to generate a growing collection of shared repositories where useful code libraries exist. There are several benefits to spending the energy for this seemingly cumbersome, but extremely useful effort:

Standardization and sharing of general purpose libraries minimize coding time and effort by group members. At steady state, everyone in the lab benefits significantly from not only the time-saving nature of being able to access readily available libraries, but also from the coding styles used by previous group members.
Formation of well-coordinated, carefully prepared, and rigorously tested libraries will allow us to share these products with the broader scientific community, which will positively impact our citations and reputation. Furthermore, when seeking your next job, you will be in a position to provide code samples from these published and publicly available libraries.

For these reasons, if a piece of code you are writing for your project has the potential to be of broader utility, talk to other people and consider writing your code/class/function to be included in the shared repositories consisting of useful libraries. Use others’ codes, and contribute to the shared pool. Study others’ codes and learn from their good practices (and if applicable, notice and fix their mistakes and poor implementations).

Code Documentation

Every logical unit of code in your modular program must be clearly commented and accompanied by detailed documentation. The repositories allow for version control of automatically generated documentation, and Matlab (like other languages) has tools for automatically publishing documentation from comments embedded in the header of each function. Every code and its documentation will be peer-reviewed, in a process handled by the project admins, before being admitted for general use by others in the project/group/lab/world.

Repository Organization

Each project might have subprojects that may or may not share code with each other. Since creating a separate repository for each little variation within a major project makes organization and access harder, we would like to keep repositories named after major concepts, with subprojects organized as subdirectories in the repository, as appropriate. In each subproject, there will be one or several codes (in separate sub-sub-directories) in development. These codes should all be developed in, and pushed to, the main project repository under the neu-spiral organization. For each code, when an operational (complete) version is available, it should be copied (pushed) to a separate directory with a version number and also copied over to the project code directory.

Project codes should use shared libraries and available code in other projects as much as possible; do not reinvent the wheel. Project leaders are ultimately responsible for repository organization. However, coordination between project leaders to provide a uniform look across repositories, especially at the higher levels of the organizational hierarchy, will make it easy to navigate between several repositories for members who are involved in more than one project (which is everyone).

Note: If you run your code on Discovery Cluster, especially in batch mode, you must be using SLURM code files to run your code. Do not upload these files to GitHub or share them with your collaborators or the public. Not everybody should have to install SLURM to use or run your code. Provide the code, and instructions for running it, in the language in which the code is written.

Push Code That Works and Push Frequently

Never push a code/module/function/script with errors to any repository. When you push a code it is assumed that you have a working version of the code. A code with errors is not a working version! In this statement, a code may refer to a logical functional unit within your overall framework, so code verification must be applied in a multiscale fashion. Do not wait to push until the entire project code is “working”. Push as intermediate modules are verified and work as they are supposed to.

You can adjust the scale at which you define “working code”, but a 2-line function that does a simple task is working if it is written properly and behaves as desired given its own test, regardless of whether or not the rest of the project code is ready and operational. At least one push per day, and probably one or more pushes per hour, is appropriate. The purpose is to develop code in a shared environment. You need to share your code with team members by pushing; don’t keep it all to yourself.

Speaking of tests, each function/class/code unit should have a test associated with it. Future revisions of this code unit must pass the original test proposed by the first author. The test should aim to confirm that the replacement (1) is backward compatible, and (2) generates identical results under several operating conditions. For shared libraries, the test results for replacement code must be submitted as part of the code report, which will be reviewed by reviewers chosen by the admins of the repository. Only admins can push to libraries. We will describe these issues later in more detail.
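
As a sketch of what such a test can look like, the Python example below defines one check function and runs it against both an original implementation and a proposed replacement. The functions `moving_average`, `moving_average_fast`, and `check_moving_average` are hypothetical and chosen only for illustration; the review and submission process itself is not modeled.

```python
import math


def moving_average(values, window):
    """Original implementation: mean of every length-`window` slice."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def moving_average_fast(values, window):
    """Proposed replacement: same interface, but updates a running sum."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    running = sum(values[:window])
    result = [running / window]
    for i in range(window, len(values)):
        running += values[i] - values[i - window]  # slide the window by one
        result.append(running / window)
    return result


def check_moving_average(implementation):
    """The original author's test; every later revision must also pass it."""
    cases = [
        ([1, 2, 3, 4], 2, [1.5, 2.5, 3.5]),
        ([5], 1, [5.0]),
        ([0.5, 1.5, -2.0], 3, [0.0]),
    ]
    for values, window, expected in cases:
        actual = implementation(values, window)
        assert len(actual) == len(expected)
        assert all(math.isclose(a, e, abs_tol=1e-12)
                   for a, e in zip(actual, expected))
    # Error behavior is part of the interface, so it is tested too.
    try:
        implementation([1, 2], 3)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for an oversized window")


# The same test is applied to the original and to the replacement.
check_moving_average(moving_average)
check_moving_average(moving_average_fast)
```

Keeping the test as a function that takes the implementation as an argument makes it easy to rerun the original author's checks, unchanged, on every later revision.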

It is important to always keep the default (master/main) branch working, meaning that anyone who clones the repository can run the code smoothly at any time by following the instructions. A good habit is therefore to create a branch from main for every feature that is in progress. Once the feature is done (developed and tested), you can submit a pull request to merge it into the main branch. This way teammates can work on the same codebase without conflicting with each other's work in progress, and once they are done, all features of the project can be merged into the main branch (after resolving possible conflicts). For more information please read this.
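
One way to picture a single feature-branch cycle is as a fixed sequence of Git commands. The Python sketch below only builds and formats that sequence; it does not run anything. The function names, the default remote name `origin`, and the default branch name `main` are assumptions, and the pull request itself is still opened on GitHub after the final push.

```python
import shlex


def feature_branch_commands(branch, message, remote="origin", main="main"):
    """Return the git commands (as argument lists) for one feature-branch cycle."""
    return [
        ["git", "checkout", main],                # start from the working branch
        ["git", "pull", remote, main],            # bring it up to date
        ["git", "checkout", "-b", branch],        # create the feature branch
        # ... develop and test the feature here ...
        ["git", "add", "--all"],                  # stage the changes
        ["git", "commit", "-m", message],         # record them with a message
        ["git", "push", "--set-upstream", remote, branch],  # publish the branch
    ]


def as_shell_lines(commands):
    """Format each command as a shell-safe line, quoting arguments where needed."""
    return [shlex.join(command) for command in commands]
```

Printing `as_shell_lines(feature_branch_commands("csv-loader", "Add CSV loader"))` gives lines that can be reviewed before being typed or passed to `subprocess.run`; the branch name and message here are placeholders.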

Commit/Push with Comments

Write a clear and reasonably detailed explanation of what has changed from the previous version when you commit. One should be able to follow the changes just by looking at the commit messages. You can find good examples of commit messages by searching for them on Google. Also recall that you should write comments embedded in the code for later publication as an HTML document. Make sure that as you commit changes to the code, the corresponding changes to the comments are made and committed at the same time.

Document All Code

Follow the documentation protocols required for extensive code documentation, as emphasized before. Each code should be accompanied by reviewed and published documentation (also stored in GitHub under Git version control, in Markdown or another suitable format). This documentation is not a user manual; rather, it is a detailed record of how this particular code came about, starting from a theoretical construct and following the step-by-step methodological procedures used to obtain the particular lines of code written.

Major working versions of codes pushed to SPIRAL repositories should be copied to a separate directory with an identifying version number, should have reviewed and approved documentation, as specified by the SPIRAL technical reporting rules, and should be copied in their entirety to the relevant Project\Code\Versions directory for archiving, dissemination, and easy access. When such a version is generated, a technical report that wraps around the code documentation should also be prepared.

Write High Quality Code Worthy of Dissemination to the World

Don’t be lazy… write good code! If you think you write good code, ask 2 other people to review your code – especially senior ones in the group. With some small probability you are correct in your self-assessment. Write your code as modularly as possible. Use variable names that are descriptive; there are at least two ways: (i) use long/descriptive variable names, (ii) use variable names that mimic equations and math notation that appear in some technical report and code documentation. This way users and future developers can understand and trace your code faster. YOU ARE NOT WRITING CODE ONLY FOR YOURSELF! Your codes must be valuable contributions to the group and to the world (see dissemination scales mentioned earlier).
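
The two naming strategies can coexist in one module. In the Python sketch below, the first function mirrors the symbols of the Gaussian log-density, log N(x | mu, sigma^2) = -(1/2) log(2 pi sigma^2) - (x - mu)^2 / (2 sigma^2), as they would appear in a technical report, while the second uses long descriptive names where no standard symbol exists. The function names and the choice of formula are illustrative assumptions.

```python
import math


def gaussian_log_density(x, mu, sigma):
    """log N(x | mu, sigma^2); argument names mirror the symbols in the formula."""
    return (-0.5 * math.log(2.0 * math.pi * sigma ** 2)
            - (x - mu) ** 2 / (2.0 * sigma ** 2))


def mean_log_density(observations, mu, sigma):
    """Average log-density over a sample, using descriptive names throughout."""
    total_log_density = sum(
        gaussian_log_density(observation, mu, sigma)
        for observation in observations
    )
    return total_log_density / len(observations)
```

Either style works as long as a reader can map each name back to the documentation without guessing.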

Poorly written code will be rejected and you can be certain that you will be asked to rewrite the entire thing (and by a tight deadline). Therefore, do it right from the beginning. Try to keep project/data specific details separate from generic tools and functions, so that it is easy to use existing libraries or easy for you to contribute to libraries with new tools. Use object oriented programming (even in Matlab) as much as possible (note that you can still write bad code with object oriented strategy – stay modular, keep objects, functions, classes, etc. simple… one logical item for one conceptual task). See existing code repositories to learn from (or to improve and fix if you spot errors and inefficiencies). All shared code must be backward compatible – make your revisions on existing codes carefully! When writing functions, think about future modifications and make it easy for the authors of revised versions to remain backward compatible.
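
A common way to revise a shared function while staying backward compatible is to add new options as keyword arguments whose defaults reproduce the old behavior. The Python sketch below assumes a hypothetical library function `rescale` whose first version took only `values` and always mapped them onto [0, 1]; it is an illustration, not code from a SPIRAL repository.

```python
def rescale(values, lower=0.0, upper=1.0):
    """Linearly map `values` onto the interval [lower, upper].

    The revision adds `lower` and `upper` as keyword arguments. Their defaults
    reproduce the first version's behavior, so existing calls such as
    rescale(data) keep returning the same results.
    """
    smallest, largest = min(values), max(values)
    if smallest == largest:
        raise ValueError("cannot rescale constant data")
    span = largest - smallest
    return [lower + (upper - lower) * (v - smallest) / span for v in values]
```

Old callers writing `rescale([2, 4, 6])` are unaffected, while new callers can write `rescale([2, 4, 6], lower=-1.0, upper=1.0)`.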

