Content from Introduction
Last updated on 2026-02-25
Estimated time: 15 minutes
Background
Writing software is an integral component of the research activities of many astronomers. Increasingly, such software is written in collaborative efforts involving multiple researchers across multiple institutions. Despite this, training for astronomy students and early career researchers in best practices for collaborative software development is often informal and fails to make use of software industry knowledge. The goal of this course is to fill this training gap.
In this course you will be guided through the development of a software package, beginning at proof of concept, and finishing with a project that is findable, accessible, interoperable, reusable (FAIR), and citable by others. This course is presented in lessons which represent development cycles. Each cycle will be another step along the path of making code that you are proud of and can be recognised for.
You can engage with this course as a reading assignment, but we recommend that you follow along at home by applying all the lessons to your own software project. This course is designed with collaborative software development in mind. Throughout the course there will be self-paced, ungraded activities to complete, marked as either SOLO or GROUP activities. If you are joining this course as part of a group wishing to improve your collaborative software development skills then you should aim to complete all the activities. If you are joining on your own then some of the group activities will not be relevant to you, and you can skip them as appropriate.
At the beginning of this course you will be taken through a typical research driven development phase. In this phase we will create a proof of concept code which we then build on throughout the course. This project is extremely simple in scope, as it is intended to be a minimum working example that we can use to demonstrate various development techniques. Many of the examples in this course will refer to this example code. If you have an existing software project, we recommend that you use it when doing the various activities: you will encounter more issues, learn more techniques, and the solutions to the activities will be directly usable and beneficial to you.
Assumed knowledge and software requirements
This course assumes that you have basic proficiency in Python. While Python is the language that we’ll be working with primarily, most of the lessons are applicable to any language.
A large part of working in a group will involve keeping track of changes to a shared code base. For this we will be using the git version control system, and in particular we will be using GitHub as the remote repository.
Software requirements:
- Python 3.8+
- An integrated development environment (IDE) or text editor of choice
- We recommend PyCharm or Visual Studio Code
- An operating system which gives you a proper command line
- For Windows this means using something like Git Bash, WSL, or the Anaconda Prompt
- For macOS or Linux the regular terminal will be fine
- git, either from the command line, using a git desktop app, or as an add-on to PyCharm or VSCode.
- An account on GitHub is required for some of the activities.
Content from Coding Up A Proof Of Concept
Last updated on 2026-02-24
Estimated time: 110 minutes
Overview
Questions
- Where/How do I start a new project?
- How do I use a function?
- How do I document and test my code?
- Can I make a command line interface for my code?
Objectives
- Build a minimal working or “proof of concept” project that we can build on in this workshop
Beginning a new software project
Before we do anything to our project we should talk about:
Organisation
Organisation is key to a good project. Every time you start a new project or explore a new idea it is a good idea to create a new space for that project. This means creating a new directory for you to collect all the relevant data, software, and documentation. You will be involved in many projects through your career and often will have to manage multiple projects simultaneously. It is therefore not just a good idea to organise each project, but to have a consistent organisation structure between projects. In this section we will make some recommendations for organising a software project.
Put each project in its own directory, which is named after the project.
The location of this directory will depend on a higher level organisation scheme. For example you may separate your projects based on funding, based on collaboration, or based on research area.
Within your software project directory we recommend the following structure:
OUTPUT
.
├── README.md        <- Description of this project
├── bin              <- Your compiled code can be stored here (not tracked by git)
├── config           <- Configuration files, e.g., for doxygen or for your model if needed
├── data
│   ├── external     <- Data from third party sources.
│   ├── interim      <- Intermediate data that has been transformed.
│   ├── processed    <- The final data sets for analysis.
│   └── raw          <- The original, immutable data dump.
├── docs             <- Documentation, e.g., doxygen or reference papers (not tracked by git)
├── notebooks        <- Jupyter or R notebooks
├── reports          <- For a manuscript source, e.g., LaTeX, Markdown, etc., or any project reports
│   └── figures      <- Figures for the manuscript or reports
└── src              <- Source code for this project
    ├── external     <- Any external source code, e.g., libraries pulled in from other git projects
    └── tools        <- Any helper scripts go here
Setting up an empty structure such as the above can be done by making a template and copying it every time you start a new project. Alternatively, there are Python packages such as cookiecutter (pypi, rtfd, github) that can automate this process for you, and offer a range of templates to work with.
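If you prefer not to keep a template directory around, the same skeleton can also be created with a few lines of Python. A minimal sketch using pathlib is shown below (the make_project name and the target directory are our own, purely for illustration):

```python
from pathlib import Path

# Sub-directories following the recommended layout above
SUBDIRS = [
    'bin', 'config',
    'data/external', 'data/interim', 'data/processed', 'data/raw',
    'docs', 'notebooks', 'reports/figures',
    'src/external', 'src/tools',
]

def make_project(root):
    """Create the recommended empty project structure under `root`."""
    root = Path(root)
    for sub in SUBDIRS:
        (root / sub).mkdir(parents=True, exist_ok=True)
    # an empty README, ready to be filled in
    (root / 'README.md').touch()

make_project('my_new_project')
```

Running this once per project gives you a consistent starting point; tools like cookiecutter do the same job with richer, shareable templates.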
Name all files to reflect their content or function.
It is also convenient to use a consistent and descriptive naming format for all your files and sub-folders. For example, use names such as galaxy_count_table.csv, manuscript.md, or light_curve_analysis.py. Do not use sequential numbers (e.g., result1.csv, result2.csv) or a location in a final manuscript (e.g., fig_3_a.png), since those numbers will almost certainly change as the project evolves (and are meaningless on their own).
Starting our project
Now we are ready to actually start doing something with our project.
Project evolution
In the typical project cycle for an astronomer or research software engineer (or RSE, a formal name for people who combine professional software expertise with an understanding of research), you will not sit down and have a detailed discussion about what the project is, where it needs to go, what the user stories and milestones are, and who will be involved. Usually research evolves organically through informal discussions with colleagues, or a sudden thought in the shower. Similarly, our software projects evolve in an organic manner, often beginning with a small script or function to do just one thing, which then over time gets used, reused, augmented, and shared, and thus evolves into a software project. This evolution of ideas and code does not fit well with many of the more formal structures that professional software developers adhere to, and so we will not try to fit our projects to such a scheme. Instead we will create a path for our software that is suited to our work style, but which draws on the knowledge and experience of professional software developers. Thus we will begin with a proof of concept code: a short bit of work that proves that something works using the minimal amount of effort.
Example project
The example project that we will work with involves tasks that are familiar to many astronomers. The work that is being done is mostly for demonstration purposes: existing libraries will be able to do this task faster and more easily than we will. The point of this example project is not the content, but the methodology that we use as we pass through various cycles of development.
Example project: an astronomy catalogue simulator
The example project that we will be developing will simulate a catalogue of foreground stars in the direction of the Andromeda galaxy. The initial requirements are as follows:
- Stars should have randomised sky positions around the Andromeda galaxy
- Positions should fall within 1 degree of the central location
- Each star should have a unique ID
- The star ID and position should be saved in a csv file to be analysed by other programs
- This program is intended to be used by the developer and their research group, which includes people who are not proficient Python programmers
It is intended that the software will grow in capability and complexity only as needed to support a current research project.
With this in mind we move to the first stage of our software project – the proof of concept.
SOLO Activity: Project structure
Consider the astronomy catalogue simulator project mentioned in the previous lesson.
Come up with a planned project structure that would be appropriate for this project. How much of the structure would change if you were to use a different language (e.g. Python vs C)? The scope of work is quite modest and could be achieved with a flat file structure or a single file. Which statements in the previous lesson suggest that a good project structure will be beneficial to future work?
Proof of concept code
The first iteration of our code is shown below. It was written in a stream-of-consciousness mode with a focus on getting a minimum working example that proves that the work can be done.
PYTHON
#! /usr/bin/env python
# Demonstrate that we can simulate a catalogue of stars on the sky
# Determine Andromeda location in ra/dec degrees
import numpy as np
import math
# from wikipedia
ra = '00:42:44.3'
dec = '41:16:09'
d, m, s = dec.split(':')
dec = int(d)+int(m)/60+float(s)/3600
h, m, s = ra.split(':')
ra = 15*(int(h)+int(m)/60+float(s)/3600)
ra = ra/math.cos(dec*math.pi/180)
# make 1000 stars within 1 degree of Andromeda
ra_offsets = np.random.uniform(-1, 1, size=1000)
dec_offsets = np.random.uniform(-1, 1, size=1000)
ras = ra + ra_offsets
decs = dec + dec_offsets
# now write these to a csv file for use by my other program
with open('catalog.csv', 'w') as f:
    print("id,ra,dec", file=f)
    for i in range(1000):
        print("{0}, {1:7.4f}, {2:7.4f}".format(i, ras[i], decs[i]), file=f)
The above code was saved as sim.py. When run from the command line it produces a file (catalog.csv) with a header, one thousand rows, and columns showing the id/ra/dec of the simulated points on the sky. Right now the code runs without error and at first glance does what we need.
If we plot the ra/dec locations of the catalog we get the following output:

Sky plot of the catalog that comes from the sim.py program.
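The sexagesimal-to-degrees conversion at the heart of the script is easy to get wrong, so it is worth checking the arithmetic in isolation. The expected values below follow directly from the HMS/DMS definitions (before the cos(dec) projection correction is applied):

```python
# Convert the Andromeda reference position from sexagesimal to degrees
d, m, s = '41:16:09'.split(':')
dec = int(d) + int(m)/60 + float(s)/3600      # DMS -> degrees

h, m, s = '00:42:44.3'.split(':')
ra = 15*(int(h) + int(m)/60 + float(s)/3600)  # HMS -> degrees (15 deg per hour)

# 41 + 16/60 + 9/3600 = 41.2692..., and 15*(42/60 + 44.3/3600) = 10.6846...
assert abs(dec - 41.2692) < 1e-3
assert abs(ra - 10.6846) < 1e-3
```

Quick sanity checks like this are the seed of the more formal tests we will write later in the course.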
Making code reusable via functions
Now that we have proved to ourselves that it’s possible to generate a catalog, we want to tidy up the code a little so that it can be reused and adapted easily.
At the moment our intended audience is ourselves, so we are content with keeping all the relevant information within the code itself. As a first step we create new variables that record the target location (ref_ra/ref_dec), the number of stars to generate (nsources), and how far from the central location the new positions should be generated (radius). This is done using global variables at the top of our file like this:
PYTHON
#! /usr/bin/env python
# Demonstrate that we can simulate a catalogue of stars on the sky
# Determine Andromeda location in ra/dec degrees
import numpy as np
import math
# configuration using global variables
nsources = 1000
# from wikipedia
ref_ra = '00:42:44.3'
ref_dec = '41:16:09'
radius = 1
From here we need to make a few changes to our code so that these variables are now used in place of the previously hard-coded values.
We then take all the code that is part of the position generation stage and bundle it together into a function which we call generate_positions. This allows us to keep that part of the code separate from the file writing stage. In fact, while we are at it we should make a function for doing the file writing; let's call it write_file. While we are writing these functions we can use python docstrings to document the intent of each function.
PYTHON
def generate_positions():
    """
    Create `nsources` random locations within `radius` degrees of the reference `ref_ra`/`ref_dec`.

    Returns
    -------
    ra, dec : numpy.array
        Arrays of ra and dec coordinates in degrees.
    """
    # convert DMS -> degrees
    d, m, s = ref_dec.split(':')
    dec = int(d) + int(m)/60 + float(s)/3600
    # convert HMS -> degrees
    h, m, s = ref_ra.split(':')
    ra = 15*(int(h) + int(m)/60 + float(s)/3600)
    ra = ra/math.cos(dec*math.pi/180)  # don't forget projection effects
    ra_offsets = np.random.uniform(-1*radius, radius, size=nsources)
    dec_offsets = np.random.uniform(-1*radius, radius, size=nsources)
    ras = ra + ra_offsets
    decs = dec + dec_offsets
    return ras, decs


def write_file(ras, decs):
    """
    Write the ra/dec catalog to a file, and include a header and IDs.
    """
    with open('catalog.csv', 'w') as f:
        # create a header row
        print("id,ra,dec", file=f)
        for i in range(nsources):
            # use a csv format
            print("{0}, {1:7.4f}, {2:7.4f}".format(i, ras[i], decs[i]), file=f)
    return
Note that the first function takes no parameters, while the second is designed to take two lists of positions as input. By breaking our code into functional blocks we have a number of advantages:
- We can reuse a block of code by calling the function multiple times, and don't have to bother with duplicated code. Duplicated code means duplicated bugs!
- The code within each function will not interfere with code in other functions, allowing us to reuse names of variables, and for unused variables and memory to be deleted upon exiting the function.
- We can document each function separately using a docstring to describe the intent of the code. This is in addition to the inline comments that we have used.
Our script can now be finished with a few more lines of code.
As a developer the code is now separated into functional parts, so if something goes wrong or needs changing, we know where to look to make those changes. As a user of the code we can open the file and read the first few lines to see what the default parameters are, and modify them if we choose. Additionally, as a user we can read the names and docstrings of the functions to understand what the code is doing rather than having to read the code itself.
Now we have a piece of code that is easier to use. However, it still requires people to read or edit the source code to understand how it works and to adapt it for their use. In the next section we'll see how to further generalise our code by using more function parameters, and adding a command line interface.
Creating a command line interface
In the previous lesson we saw how we could use functions to separate different tasks, and docstrings to describe the behaviour of these functions. In this lesson we'll generalise our code even further by removing the global variables, adding function parameters, and adding a command line interface.
The first thing that we will do is remove the global variables, and have these values passed to the functions directly. This will mean that someone reading the code for a function doesn’t have to hunt through the rest of the code to figure out what the global variables are.
For the generate_positions function we can use parameters with the same names as the previously existing global variables, and we can set their default values to match. We have also taken the opportunity to update the docstring so that we know what types of parameters should be passed and what they are for.
PYTHON
def generate_positions(ref_ra='00:42:44.3',
                       ref_dec='41:16:09',
                       radius=1.,
                       nsources=1000):
    """
    Create nsources random locations within radius of the reference position.

    Parameters
    ----------
    ref_ra, ref_dec : str
        Reference position in "HH:MM:SS.S"/"DD:MM:SS.S" format.
        Default position is the Andromeda galaxy.
    radius : float
        The radius within which to generate positions. Default = 1.
    nsources : int
        The number of positions to generate.

    Returns
    -------
    ra, dec : numpy.array
        Arrays of ra and dec coordinates in degrees.
    """
    ...
    return
For the write_file function, we already had two parameters that needed to be passed, but now we'll add another which is the output file name. This wasn't part of the global variables, but it was a hard-coded file name that we might want to change as we create multiple catalogs. Note that this function used to use the nsources global variable to know how many ra/dec values were passed. In this revised version we just look at the length of the ras list instead.
PYTHON
def write_file(ras, decs,
               outfile='catalog.csv'):
    """
    Write the ra/dec catalog to a file, and include a header and IDs.

    Parameters
    ----------
    ras, decs : list, numpy.array, or any iterable
        Iterable of ra and dec coordinates. The lengths of these need to match.
    outfile : str
        Path/filename for the output file. (Overwrite=True)
    """
    with open(outfile, 'w') as f:
        # create a header row
        print("id,ra,dec", file=f)
        for i in range(len(ras)):
            # use a csv format
            print("{0}, {1:7.4f}, {2:7.4f}".format(i, ras[i], decs[i]), file=f)
    return
By moving the global variables into the parameters of the functions, and making use of the default values we don’t need to make any further changes to our code. The following block is unchanged.
One advantage to our changes is that we could now generate a different catalog by adjusting these last two lines like this:
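For example, a smaller catalogue around a different (entirely hypothetical) reference position could be produced as sketched below. The two functions are restated here using only the standard library, so that the fragment runs on its own; in the actual script only the final two calls would change:

```python
import math
import random

def generate_positions(ref_ra='00:42:44.3', ref_dec='41:16:09',
                       radius=1.0, nsources=1000):
    """Restatement of the lesson's function, standard library only."""
    # convert the reference position from sexagesimal to degrees
    d, m, s = ref_dec.split(':')
    dec = int(d) + int(m)/60 + float(s)/3600
    h, m, s = ref_ra.split(':')
    ra = 15*(int(h) + int(m)/60 + float(s)/3600)
    ra = ra/math.cos(dec*math.pi/180)  # projection effect
    ras = [ra + random.uniform(-radius, radius) for _ in range(nsources)]
    decs = [dec + random.uniform(-radius, radius) for _ in range(nsources)]
    return ras, decs

def write_file(ras, decs, outfile='catalog.csv'):
    """Write the ra/dec catalog to a csv file with a header and IDs."""
    with open(outfile, 'w') as f:
        print("id,ra,dec", file=f)
        for i in range(len(ras)):
            print("{0}, {1:7.4f}, {2:7.4f}".format(i, ras[i], decs[i]), file=f)

# the adjusted "last two lines": a different field, radius, size, and file
ras, decs = generate_positions(ref_ra='10:00:28.6', ref_dec='02:12:21',
                               radius=0.5, nsources=100)
write_file(ras, decs, outfile='catalog_small.csv')
```

Editing two lines per catalogue quickly becomes tedious, which is exactly the motivation for the command line interface below.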
We will leverage this ability in order to make a command line interface. The recommended way to do this is to use the argparse module, and an if __name__ clause as follows.
PYTHON
import argparse

if __name__ == '__main__':
    # Set up the parser with all the options that you want
    parser = argparse.ArgumentParser(prog='sim')
    group1 = parser.add_argument_group()
    group1.add_argument('--ref_ra', dest='ref_ra', type=str, default='00:42:44.3',
                        help='Central/reference RA position HH:MM:SS.S format')
    group1.add_argument('--ref_dec', dest='ref_dec', type=str, default='41:16:09',
                        help='Central/reference Dec position DD:MM:SS.S format')
    group1.add_argument('--radius', dest='radius', type=float, default=1.,
                        help='radius within which the new positions are generated (deg)')
    group1.add_argument('--n', dest='nsources', type=int, default=1_000,
                        help='Number of positions to generate')
    group1.add_argument('--out', dest='outfile', type=str, default='catalog.csv',
                        help='Filename for saving output (csv format)')

    # parse the command line input
    options = parser.parse_args()
Here we have created a single group of arguments called group1, and then added five different arguments to that group. Note that we provide both a call signature (--out) for specifying a parameter, as well as the name/type/default value of that parameter (dest/type/default). The help is optional but highly recommended. If we were to run the code above with the --help option then we would get the following output:
OUTPUT
$> python sim.py --help
usage: sim [-h] [--ref_ra REF_RA] [--ref_dec REF_DEC] [--radius RADIUS] [--n NSOURCES] [--out OUTFILE]
optional arguments:
-h, --help show this help message and exit
--ref_ra REF_RA Central/reference RA position HH:MM:SS.S format
--ref_dec REF_DEC Central/reference Dec position DD:MM:SS.S format
--radius RADIUS radius within which the new positions are generated (deg)
--n NSOURCES Number of positions to generate
--out OUTFILE Filename for saving output (csv format)
Right away we have a way for people to understand how to use the program without having to open the source code. This will include you, two days from now, when you have forgotten some of the details of the code.
Finally, we can connect the user input to the program by using the options object. Each of the parameters that were read in with a dest=thing can be accessed using options.thing. If we specified a type then argparse will make sure that users don't give input that can't be converted to that type. The final part of our code now looks like this:
PYTHON
    # parse the command line input
    options = parser.parse_args()

    ras, decs = generate_positions(ref_ra=options.ref_ra,
                                   ref_dec=options.ref_dec,
                                   radius=options.radius,
                                   nsources=options.nsources)
    write_file(ras, decs, outfile=options.outfile)
Since we specified default values for all of the inputs, we can simply run python sim.py and it will use the defaults.
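If you want to exercise the parser without touching the shell, parse_args also accepts an explicit list of argument strings, which is handy for quick checks. A minimal sketch with just two of the options from above:

```python
import argparse

# Same style of parser as sim.py, with just two of the options
parser = argparse.ArgumentParser(prog='sim')
parser.add_argument('--radius', dest='radius', type=float, default=1.,
                    help='radius within which positions are generated (deg)')
parser.add_argument('--n', dest='nsources', type=int, default=1_000,
                    help='Number of positions to generate')

# Passing a list of strings stands in for the real command line
options = parser.parse_args(['--n', '50'])
print(options.nsources, options.radius)  # prints: 50 1.0
```

Note that --n was converted to an int and --radius fell back to its default, exactly as it would for a real command line invocation.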
Creating a simple testing script
We now have a script with a command line interface that allows people to use the code without having to read the source. At this point we are feeling rather professional and are happy to share the code with others. One of the things that is often scary is the thought of others finding bugs in our code and then having to fix these bugs. A good way to alleviate these fears is to do some testing of the code ourselves to catch all the obvious errors before the script is sent to others. In a later lesson we’ll go through some more formal and rigorous testing, but for now we are going to start simple with some of the most basic tests.
In our file test_sim.sh we will make three tests. The first two tests will just ensure that the code does not immediately crash when we run it:
BASH
#! /usr/bin/env bash
echo "Testing sim.py"
python sim.py || { echo "FAILED to run with default parameters"; exit 1 ;}
python sim.py --help || { echo "FAILED to print help"; exit 1 ;}
The next test will ensure that when we pass a filename to the --out option, we get a new file with that name. We use the bash [ ] test and the -f conditional to check that the file exists.
BASH
python sim.py --out test.csv
if [ ! -f "test.csv" ]; then
    echo "FAILED to generate output test.csv"
    exit 1
fi
echo "all tests PASSED"
exit 0
The last two lines above then make sure that when the tests all pass we get a positive message and return an exit status of 0.
We now have a test script that will catch the most egregious issues with our script. Any time we make changes to our script we should run the test script just to make sure that we haven’t seriously broken anything.
More advanced testing with bash is not explored here as we will move to a python based test environment in a later lesson. The python based testing will make it easier to do things like make sure the right number of entries exist in the catalogue file and that the positions all lie within the specified region.
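As a taste of what is to come, a check like that can be sketched in Python using the standard library csv module. The check_catalog helper below is hypothetical (not part of the project yet), and the block writes a tiny stand-in catalogue so that it runs on its own:

```python
import csv

# Write a tiny stand-in catalogue so that the check below can run on its own
with open('catalog.csv', 'w') as f:
    print("id,ra,dec", file=f)
    for i in range(3):
        print("{0}, {1:7.4f}, {2:7.4f}".format(i, 10.0 + i, 41.0 + i), file=f)

def check_catalog(path='catalog.csv', expected_rows=3):
    """Hypothetical check: verify the header and the number of data rows."""
    with open(path) as f:
        reader = csv.reader(f)
        header = [col.strip() for col in next(reader)]
        rows = list(reader)
    assert header == ['id', 'ra', 'dec'], "unexpected header"
    assert len(rows) == expected_rows, "wrong number of rows"

check_catalog()
```

Checks written this way can inspect the catalogue's contents (row counts, value ranges), which is awkward to do from bash.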
Summary
In this first development cycle we have:
- Demonstrated that we can simulate a catalogue of positions on the sky;
- Written a function to do the simulation, and another to write the results to a file;
- Created a command line interface to make the script easy to use and re-use; and
- Made a simple test script to alert us when our changes break the code.
If you would like to see the status of the project at the end of this cycle you can look at the cycle1 branch of the project on GitHub.
Content from Developing Software In A Team
Last updated on 2026-02-27
Estimated time: 180 minutes
Overview
Questions
- What are some pros and cons of developing in a team?
- What are some common team roles?
- How can GitHub help manage tasks in a group project?
Objectives
- Gain experience developing software in a team environment
- Understand some common project management styles
- Create a new GitHub repo and push our project
- Experience a full development cycle with our test project
Benefits and pit-falls
Working on a software project as part of a team is a very different experience from working on a solo project. There are of course great advantages to having multiple people working on a problem, but there are also some pit-falls that need to be avoided, and overheads that cannot be avoided.
Benefits of working in a team
- Each person has different areas of expertise
- More person-hours available
- Group discussion leads to better decision making
- Group members can learn from each other faster than they can learn from a book/website
- Seeing how others work and solve problems can help you learn new skills that you didn’t know you needed
- Bug-fixing is easier when you have an informed buddy to talk to
- Work can be assigned to people with the most relevant skills, increasing efficiency
- Dividing the development and testing of a work item between two people can make for better code and more complete tests
Pit-falls to avoid
- Siloing work:
- Separating work items can be a good idea but there needs to be frequent check-ins to ensure that the software being developed by two people is not diverging or conflicting
- Inconsistent standards:
- Even though people may have personal preferences or styles, documentation, commenting, and testing should be consistent across a code base.
- Non-constructive criticism:
- When things work they just work and no one talks about it
- When things break everyone talks about it
- Given that issues result from negative experiences, it is important for team morale to always try to keep feedback as constructive as possible
- Platform / dependency conflicts:
- There is no guarantee that all developers will be using the same operating system or development tools.
- Allowing people the freedom to choose is important but the differences need to be managed via an agreement on how/when/where testing takes place, what acceptance looks like, and the use of a style guide.
- Having some files with Windows-style line endings and others with Linux-style line endings can make it tedious to track changes
Overheads that cannot (should not) be avoided
- Project management will take time, and may not be a skill that you have; invest time in training.
- Communication and coordination take time, but are vital to success.
- Commit conflicts will arise even if everything is done right; these need to be discussed and handled properly rather than ignored.
- The morale of the team is important to success and will take time and effort to maintain, but it is worth doing so that your project can see the benefits listed above.
Communication and project management
Communication
Team work requires effective communication between team members. This helps to keep everyone up to date on the status of the project and the current and future direction of the work, and avoids duplication of effort. Email, slack, messenger, GitHub issues, chats over a coffee, or a formal weekly meeting are all valid ways of keeping the team together. A formal meeting once a month, complemented by a weekly status-check email and real-time slack messaging for immediate questions, may be a good solution for a 5 person team working on a year long project. An informal weekly chat over lunch with intermittent emails may be good for a 2 person team working on a project that lasts a few months. The most important thing is that the chosen method is effective and that it includes all team members.
If an informal or ephemeral communication medium is involved when coming to an agreement or making a decision, it is good practice to have a follow up communication to reiterate the decision and reasoning using a medium that can be archived. For example, if the team meets over a coffee, discusses a current issue, and decides on a way forward, it is a good idea for someone to follow this up with an email so that the details of the decision are not lost.
Project management styles
The two main project management styles that are applied to software development are waterfall and agile. Neither of these is a single project management methodology; they are umbrella terms that group sets of methodologies sharing a common mindset. In the waterfall mindset there is a very linear approach to the design, execution, and delivery of the project, with the main focus being on the process. In the agile mindset the main focus is on outcomes and deliverables, with design, execution, and delivery occurring in cycles.
Waterfall
In a waterfall project the whole life cycle of a project is mapped onto distinct, sequential work items, with each item relying on those that occur before, and blocking those that come after. Waterfall project management therefore represents a very rigid and linear approach. The waterfall system is a very traditional method for managing a project with participants being assigned clear roles and expectations.
Image credit: https://startinfinity.com/project-management-methodologies/waterfall
The name waterfall comes from the fact that each work item cascades onto the next. In a waterfall project a lot of attention is paid to defining the entirety of the project up front, followed by close adherence to the project plan and timeline. It is therefore difficult to incorporate changes into the project plan, which makes it hard to respond to setbacks, opportunities, or changes in requirements.
The waterfall project management style was initially designed in the 1970s for use in software development projects. It was taken up by many other industries to great success, but is now seen as an outdated methodology for software projects.
Agile
In 2001 the Manifesto for Agile Software Development was published. The agile manifesto can be summed up as:
We are uncovering better ways of developing software by doing it and helping others do it. Through this work we have come to value:
- Individuals and interactions over processes and tools
- Working software over comprehensive documentation
- Customer collaboration over contract negotiation
- Responding to change over following a plan
That is, while there is value in the items on the right, we value the items on the left more.
The agile manifesto is based on the following principles:
- The highest priority is to satisfy the customer through early and continuous delivery of valuable software.
- Welcome changing requirements, even late in development. Agile processes harness change for the customer’s competitive advantage.
- Deliver working software frequently, from a couple of weeks to a couple of months, with a preference to the shorter timescale.
- Business people and developers must work together daily throughout the project.
- Build projects around motivated individuals. Give them the environment and support they need, and trust them to get the job done.
- The most efficient and effective method of conveying information to and within a development team is face-to-face conversation.
- Working software is the primary measure of progress.
- Agile processes promote sustainable development. The sponsors, developers, and users should be able to maintain a constant pace indefinitely.
- Continuous attention to technical excellence and good design enhances agility.
- Simplicity–the art of maximising the amount of work not done–is essential.
- The best architectures, requirements, and designs emerge from self-organising teams.
- At regular intervals, the team reflects on how to become more effective, then tunes and adjusts its behaviour accordingly.
The agile mentality is designed to be flexible, to respond to change, and to deliver a working product early and improve it incrementally over time.
Advantages of the agile mentality include:
- faster deployment of working solutions
- better quality of work
- increased client satisfaction
- reduced risk thanks to frequent testing and deployment
- increased team morale
- a project that finishes early is still able to deliver a working result
Some potential drawbacks include:
- potential for scope creep from continuous client feedback
- workload management can be difficult when duties and requirements are not known far in advance
- short development cycles can mean that skills gaps are not identified early leading to delays in delivery
You will likely see that the agile development mentality fits much better with research project management, including both traditional research work and software development projects. Additionally, your organisation may employ a mix of different management methodologies for different projects, or at different levels of the organisation. Understanding the benefits and drawbacks of each will help you choose a project management style that works best for your project.
Popular Agile methodologies
Scrum
Scrum is mainly focused on the idea of sprints. Sprints are where the bulk of the “work” gets done, although there is a significant amount of effort put into the preparation and planning of each sprint, and then the post-sprint review and retrospective. Sprints are typically 1-2 weeks in duration and will focus on a particular set of goals. During a sprint there is usually a daily scrum at the start of the day in which people talk about what they did the day before, what their plans are for today, and what problems they might foresee. The sprint review is about reviewing the work that was completed, reporting related to said work, and identification of incomplete work. The retrospective is a more meta-level reflection on how the sprint went, meant to identify how the team worked together, what organisation or communication worked well or not, and what changes could be made to make the next sprint more productive.
Image credit: https://jordanjob.me/blog/scrum-diagram/
The scrum methodology is based on teamwork and has well-defined roles. The benefits of scrum include the ability to react to changes in requirements or resourcing, effectively built-in continuous testing/integration, and the ability to adjust task prioritisation throughout the process. The disadvantages of scrum are that you need very good organisation before each sprint to maximise productivity, and that team members must be able to block out a 1-2 week period in which they focus solely on the sprint and maintain a high level of focus. Clearly, scrum requires a medium to large team (5+) in order to be effective.
Kanban
In the Kanban methodology the main focus is on the visualisation that is used during the development process – the Kanban board. One goal of the board is to identify potential bottlenecks in the process. An example board is shown below with the following features:
Tasks, which are represented as cards. Sticky notes or index cards are often used on a physical wall, or their digital representations online.
Columns that represent the state of each task. A task is expected to migrate from the left most column (Ready/Backlog) through the central columns and into the final (Done/Complete) column, as the related work is being done.
Work in progress limits. In order to maintain focus and productivity, there are often limits on the number of cards that can be placed in some of the columns. In particular, the “in progress” and “testing” columns are limited by the amount of person-time allocated to the project.
Swim lanes (not shown), are rows that separate tasks based on categories of work such as documentation, testing, or reporting. Alternatively, colour coding of cards can be used to identify these categories.
Image credit: https://startinfinity.com/project-management-methodologies/kanban
Even when the Kanban methodology is not being explicitly used, the visualisation process is so powerful that it is often used in many other project management strategies. It’s not uncommon to see a Kanban board on the wall during a two-week sprint.
The advantage of Kanban is that you have a single place to track the progress of all the tasks, and bottlenecks can be quickly identified. The main disadvantages of Kanban come from people failing to update the board as a task is being worked on (poor communication), or updating it too often (high overhead). Many project management software tools incorporate a Kanban board, and tools like JIRA even have a way of migrating and updating tasks based on activity in a linked GitHub repository, so that people can keep the board up to date without having to visit it directly.
Summary
Communication and effective project management are critical to the success of any collaborative project. There is no one-size-fits-all solution, and you will need to consider the project requirements, and the size and expertise of your team when deciding on a communication and management strategy. Thankfully there are many project management tools online that are free that will support whatever choice you make.
Member roles
Depending on the scope of your project, the size of the team, and the management strategy that is being used, you will have a number of roles that need to be filled within the team. Below is a list of some common roles that may be applicable to your project along with a short description. It should be noted that not all roles are required for all projects, and that the roles do not need to map to people on a 1:1 basis. It is normal to have multiple people per role and to have some people acting in multiple roles. The most important thing is that you consider the different roles and responsibilities, and have at least an implicit agreement on who is going to be filling each of the roles. This will help people to understand their responsibilities and who they should be handing work over to or working with during the project development.
Roles
- Software Developer
- Primarily focused on writing the software, fixing bugs, implementing new features.
- Should be responsive to issues, and provide implementation advice to the team.
- Documentation and Test Developer
- Primarily focused on the creation of documentation and developing tests.
- Test cases and docstrings should be consistent.
- Typically also will be responsible for automating the testing process and reporting issues.
- Documentation can be focused on docstrings but can also include online documentation, help files, tutorials, and examples.
- Product Owner
- The person who takes responsibility for setting the software project goals.
- This person is tasked with understanding the client/user needs and translating them into user stories, milestones, and tasks.
- Domain Expert
- Someone who has a good understanding of the context in which the software will be deployed or used.
- When developing software for astronomers, it is not essential for all the development team to have a good understanding of the general or specific field of application.
- A domain expert can be an internal point of contact for the development team when they have questions about the correctness or utility of an aspect of the project.
- Project Manager
- Primarily focused on the organisation of the team, ensuring that team members have tasks matched to their skills, and that the required skills are available within the team.
- The project manager will be responsible for the timing and scheduling of work and deadlines.
- Reviewer
- Any piece of work that is completed should be assigned to a reviewer to ensure that the work is up to standard.
- The reviewer should have a good understanding of the goals of a piece of work and be able to give feedback on areas that need improvement.
- The reviewer should ideally not be involved in the development of the piece of work they are reviewing.
- Approver
- Similar to a reviewer, except that there is no requirement for an approver to understand the implementation of the work being done.
- An approver is focused on ensuring that the proper procedures have been followed.
- User Acceptance Tester
- Someone who is able to perform the necessary tests to ensure that each of the user stories has been met.
- Acceptance testing typically cannot be automated and will require manual interaction to ensure that the end user can perform the tasks outlined in the user stories.
The following workflow describes the interaction of the above roles within a generic software project:

Solid lines indicate the flow of content for action, approval, or feedback. Dashed lines indicate input in the form of advice or oversight.
GROUP Activity: Who does what and when?
Within your software project group review the past three lessons and discuss the following:
- What software development roles are required for your project?
- What project management roles are required for your project?
- Are there any roles that would be beneficial to your project that are not listed?
- How are these roles distributed among the team?
- Are there people with multiple roles or roles shared among multiple people?
- Who determines the timeline and deliverables for this project?
- Is the timeline flexible?
- Can you classify the deliverables as one of required/desired/optional?
- How often and in what format should communication be managed?
If you have been involved in a previous software development project, please share the following with your teammates:
- What practices worked well and could be brought into this and future projects?
- What practices didn’t work well and should be avoided in the future?
- Was the previous project larger or smaller than the current one and how would that affect your choice of management/communication?
Create project standards and expectations
When a single person is working on a project there will typically be a consistency imposed simply by the fact that the developer has ‘their way’ of doing things (though this consistency may not persist over time). When multiple developers are working on a project it is good practice to have an agreed-upon set of standards that will be followed to ensure that the project has a consistent style and that common practices are followed. These practices can include how/when to test, the branch/develop/merge cycle, documentation formats, as well as code style. A common place to note these standards and expectations is the CONTRIBUTING.md file in the root of your software project.
This file should be considered to be supplementary to the
README.md file, and for a slightly different audience.
Whilst the README.md file is aimed at users of the software, the
CONTRIBUTING.md file is aimed at people who might develop
the software, give feedback, or submit bug/feature requests.
Sections to consider for CONTRIBUTING.md
- Welcome and encourage people to contribute to the project
- Table of Contents (especially if the file is long)
- Style guide
- List standards for code style
- Consider using a linter and listing it here (with settings)
- Note the docstring format and guidelines
- Testing
- Where the tests are located
- How the tests are run
- The machine/environment on which the tests are expected to pass
- How to submit changes
- Who can submit changes
- Expectations for what changes will be accepted
- The pull request approval process
- How to report a bug
- What is expected for a good bug report
- What tags/categories should be used when submitting a report
- How to request an “enhancement”
- What enhancements are likely to be supported and which are not
- Templates
- Examples for enhancement / bug report / change requests.
- Code of Conduct
- Expectations for behaviour when contributing to the project
- Consequences for breaching the code of conduct
- Email/web address for reporting breaches
- Recognition model
- Let people know how their contributions will be recognised.
- The following may be appropriate:
- An acknowledgement section on the wiki
- Co-authorship or acknowledgements in a published paper
- Invitation to join as github contributor
- A shout out on social media
- A beverage of choice
The CONTRIBUTING.md file can become quite long if all of the above are included. The key is to have a record of how people should interact with each other and with the project, and to ensure that the project maintainers adhere to these guidelines.
Creating a GitHub repository
At the moment we have a bunch of code, documentation, and associated files. We wish to make these available to others in the easiest way possible. We could just zip them all up and email them to collaborators. However, this means that collaborators no longer benefit from any future development that you do and, even worse, will start asking questions about code that may have diverged from the version you are working on. A solution to this problem is to keep all your code under version control, and to make use of one of the many free, online hosting services to keep a copy of the code. For this lesson we’ll focus on GitHub, but GitLab and Bitbucket are also good alternatives.
Create an account on Github
GitHub accounts are free; you just need an email address to sign up. Since some academic institutions shut off your email address promptly when your contract ends, it may be a good idea to sign up to GitHub (and other such services) with a non-institutional or personal email address that will outlast your current contract.
Create a new repository on Github
Go to GitHub and sign in. You should be taken to a page which has a listing of your repositories and a green button to create a new one. Click the button and you should see a screen similar to the following.

For now, let’s create a truly empty repository, so don’t select any of the last three boxes. Just give the repo a name (your project name is a good choice if available). You can fill in the description now or change it later.
Once you create the repository you’ll see a set of instructions about what to do next.

Now we are going to go with “create a new repository from the command line”.
The first part is to make a new local git repository using init / add / commit, and to rename the default branch to main (instead of master).
BASH
git init
git add README.md
git add requirements.txt environment.yml
git add sim_catalog skysim/*.py
git commit -m "first commit"
git branch -M main
The next step is to link your local repository with the one that you
just created on Github. Replace PaulHancock/sky_sim.git
with your username and the name of the repo that you chose.
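GitHub will show you the exact commands to run. The sketch below rehearses the same remote add / push steps in a self-contained way, using a local bare repository as a stand-in for the GitHub remote so it can be run anywhere; with a real GitHub repo the remote URL would instead be something like git@github.com:PaulHancock/sky_sim.git.

```bash
# A local bare repository stands in for the GitHub remote so this
# example is runnable anywhere; substitute your real GitHub URL.
tmp=$(mktemp -d) && cd "$tmp"
git init -q --bare remote.git              # stand-in for the GitHub repo
mkdir sky_sim && cd sky_sim
git init -q
git config user.email "you@example.com"
git config user.name "Your Name"
echo "# sky_sim" > README.md
git add README.md
git commit -q -m "first commit"
git branch -M main
git remote add origin ../remote.git        # link the local repo to the remote
git push -q -u origin main                 # -u sets the default upstream
git ls-remote --heads origin               # main is now on the remote
```

The `-u` flag means that later on a plain `git push` (with no arguments) will push the current branch to the right place.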
After you have done the git push you’ll have added all
your local changes to the remote repository and you can view the new
state of the project on Github. Note how your README.md
file gets rendered into a nice splash page.
When you push to origin main you’ll have to authenticate with GitHub; see the GitHub documentation for instructions on how to set that up for SSH.
If you set your repository to be public then anyone on the internet
can see and download your code, they can make a fork (copy) of it to
work on themselves, and can even send feature requests or bug reports
via the issue tracker. We will get into those features later, but for
now all you need to do is let your collaborators know that your code is
available on Github, send them the link, and then they can
download/clone it as needed. An added bonus is that as you make changes
to your code, and then add/commit/push these changes, your
collaborators can then pull those changes and get the updates without
having to bug you about it.
Issue tracking on GitHub
Once your code is in the wild hopefully people will find it, use it, and give feedback. Positive feedback in person or by email is always nice, however there is also the issue of bugs, new feature requests, and people wanting clarification. Dealing with these last three points requires some organisation, and so we will learn how to use an issue tracker for this.
GitHub, GitLab, and Bitbucket all offer a form of issue tracking that is attached to each of your software repositories. In this lesson we’ll focus on the GitHub issue tracker, but the lessons learned here are applicable to any issue tracking system.
The issue tracker is a way of engaging with your co-developers and end-users to discuss any problems that people may be having when using the software. The issue tracker is available for all GitHub repositories, and is enabled by default. However, publishing code on GitHub does not mean that you are obligated to provide any support at all. If you don’t intend to provide support for your software, it would be a good idea to mention this in the README.md file that is shown on the landing page so that people have clear expectations. If you do intend to provide support and receive feedback then the issue tracker is for you.
Overview
We will cover four of the most common issues that you are likely to see or use on the issue tracker, and give some guidance and advice about how to approach them. We’ll cover general questions, bug reports, feature requests, and pull requests.
GitHub Issues has become a full-featured work planning and project management system (see link), most of which is beyond the scope of this course. We will be focusing on the basic capabilities of the issue tracker to get you and your group started. Once you are up and running you should explore the other features.
To begin, let’s navigate to the github repository for our project of
choice. On the front page you should see a set of tabs. By default
you’ll be seeing the <> code tab, but we want to
select the ⊙ Issues tab.

Initially this will be blank for your project because there are no issues (yay).
Creating an issue
Navigate to the Issues tab of a repository on Github and
you’ll see a “New issue” button in green. Press this and we’ll explore
some of the options.
An issue has a title (or short description) and a comment (long description). When creating an issue you can add some labels to it so that others can easily understand what kind of issue you are reporting. Github has a range of built in labels, and the repository owner/admins can create more if needed.

Once an issue has been created, the author or repository owner/admins can adjust the issue by assigning people to work on it or updating the labels. This is a helpful piece of organisational work. Others can also comment on the issue so that there is a back and forth between the developers, the person reporting the issue, and anyone else experiencing the same or similar issues. When an issue is resolved it can be marked as closed. Closed issues are not shown by default, but can be seen by removing the is:open filter or adding the is:closed filter.
We will now go through some of the different types of issues that are typically reported and in the next lesson we’ll have a go at creating/resolving these issues.
Questions
This tag is usually used by people asking for clarification. There is not necessarily any issue with the code, it is just that the user doesn’t understand something. The threaded structure of the issue tracker makes this a useful place to have a discussion about the question. Questions usually lead to additional documentation being created – for example an FAQ section on the wiki for the repository. Questions can also be escalated into bugs or feature requests.
If someone sends you a question via email that does not involve personal/private information, then it is often a good idea to ask them to post the question on the issue tracker and then discuss it there. If the person isn’t that interested in using GitHub then I often just ask for their permission to replicate the email discussion on the issue tracker (with/without naming them depending on their preference). The mentality is the same as when people ask questions in a classroom – if one person has a question, often there are others with the same question who are too shy to ask. By answering the question publicly you reduce the number of duplicate email chains that you have.
Bug reports
Bug reports are for when people find an error in your code. The
common errors are code crashing, or code giving unexpected or wrong
output. Best practice for reporting a bug is to give as much information
as is required to reproduce the error. This is the minimum working
example (MWE), or rather the minimum example that produces the error. It
is at this point where code versions and environments can be super
helpful. Asking people to run mycode --version and paste
the output in the bug report can help a lot. Sometimes people work with
old versions of code and simply updating to a new version can fix the
issue. For a program crash, a copy/paste of the output/error is
useful.
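If your tool does not yet report its version, Python’s argparse provides a --version flag almost for free. A minimal sketch (the mycode program name and the version string are illustrative, not part of the example project):

```python
import argparse

__version__ = "1.2.3"  # illustrative; usually imported from your package


def make_parser():
    """Build a parser whose --version flag prints the version and exits."""
    parser = argparse.ArgumentParser(prog="mycode")
    # the built-in "version" action prints the string and exits immediately
    parser.add_argument("--version", action="version",
                        version="%(prog)s {0}".format(__version__))
    return parser
```

Running `mycode --version` then prints `mycode 1.2.3`, which is exactly the string you want users to paste into their bug reports.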
Github allows you to add files and images to the issue tracker so that people have the option of supplying a small amount of real data to reproduce the error. Similarly they can screenshot a problem (particularly useful for graphic interfaces or code that produces plots).
Keep in mind that when someone posts a bug report it is because they are having a problem. That problem is real. It may not be due to your code. It may be due to them misusing or misunderstanding your code. It may be due to issues with code that you are dependent on. It may be your fault. Just remember that people are not looking to place blame, but are looking to find a solution. Have a conversation with them in the issue tracker to figure out what is going on and how you could help solve the problem. If you don’t consider the bug to be a problem (it’s a feature not a bug) then note this in the issue tracker.
Feature requests
Sometimes a user will have an idea about how to improve or expand the capability of the software they are using. A feature request is a way for the user to suggest these improvements. Feature requests are not an indication that something is wrong, but that there is an opportunity to be better. Some example feature requests are:
- support additional input/output formats, or
- support additional operating systems, or
- provide some sanity checking to prevent users from making silly mistakes, or
- improve a user’s quality of life by combining multiple often-used functions into one, or
- provide documentation in an alternative format (html, pdf, online wiki etc).
Feature requests are typically a user’s wish list, which if fulfilled, will save the user time or allow them to expand the scope of their work.
Pull requests
For a collaborative software development project you’ll typically
have multiple people contributing code to a range of branches. When the
development of the branch is complete the developer will submit a pull
request to have their changes/updates incorporated into a reference
branch (usually dev or main). A pull request is essentially a moderated
git merge (or git rebase) that allows you to
see any conflicts, see/discuss/approve changes, and make any final
changes required before the merge actually takes place.
People have the option of forking (copying) your public repository and making their own changes. If you are lucky, people will make useful changes to your code and then offer these changes back to you via a pull request. If these changes are aligned with the goals of your project and meet the various style and testing conditions that you set, then the pull request should be accepted.
A pull request is a request. There is no necessity for all pull requests to be accepted, however it is good practice (and polite) to give feedback on any pull requests that are not going to be accepted.
If you would like to capture the style, testing, and documentation
expectations for your project then a file called
CONTRIBUTING.md in the root of the repository is a common
place to define this. You can ask that people making pull requests obey
these expectations, and it is possible to create automated ways of
ensuring these standards are obeyed.
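For example, a continuous integration service can run your test suite on every pull request and block merging until the tests pass. A minimal sketch of a GitHub Actions workflow is shown below; the file name, Python version, and test path are illustrative assumptions, not requirements of the example project.

```yaml
# .github/workflows/tests.yml  (file name illustrative)
name: tests
on: [pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4          # fetch the repository
      - uses: actions/setup-python@v5      # install Python
        with:
          python-version: "3.11"
      - run: pip install pytest
      - run: pytest tests/                 # the PR fails if any test fails
```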
Summary
Whether your development group is just you, or three, or ten people, the issue tracker is a free and convenient workflow management platform.
GROUP Activity: Create, discuss, and resolve an issue on GitHub
Given the previous lessons, you should consider creating or updating the CONTRIBUTING.md file for your project.
- Have one member of your team create a new issue in the GitHub issue tracker to create/update the file and assign the issue to at least one other member of the team.
- Tag the issue with a tag such as documentation, or create a new tag that is more relevant.
- Within your group, use the issue tracker to discuss what sections are required for the CONTRIBUTING.md file.
- Once there is a consensus on the content, have someone make the required changes and push the file to GitHub (or edit directly online using the GitHub editor).
- Try using the #<IssueNumber> format within the git commit message to link the commit to the issue discussion.
- When an acceptable file has been created/modified, mark the issue as resolved (closed).
Branching and development
In this lesson we’ll focus on one of the most popular git workflows: Feature Branching. Many other workflows exist, but the most important feature of any workflow is that it provides benefit to the project. See the Atlassian tutorials on workflows for more information.
Feature branching
At the core of the feature branching workflow is the idea that all development should be done in a branch separate from the main branch. The rationale for this is to ensure that the main branch of the project is always in a working state. When people find your software repository and want to try it out, they will most likely check out the main branch and start their evaluation or usage journey from there. Having a broken main branch is a good way to turn people away from your software, and to generate a lot of bug reports.
The diagram below shows the basic feature branch workflow.

In the above case a feature has been requested in the issue with ticket number 123. A developer is assigned the task of developing this feature and begins by creating a new branch with git branch, using a branch name appropriate for the task. The development proceeds on the feature branch, occasionally breaking the code, fixing the bugs, and updating tests and documentation, until finally a new version of the code has been created which implements the new feature. At this point the developer responsible for this branch ensures all their changes are pushed to GitHub and then opens a pull request. During the pull request other developers, and maybe the person who submitted the initial feature request, will review and discuss the changes, ensuring that the code meets the repository standards for style and quality. Once everyone is happy with the changes in the feature branch, it is merged into main by accepting the pull request on GitHub. Once the feature branch has been merged into main it is deleted.
In this scheme many feature branches can be created, developed, and then deleted over the life-cycle of the project.
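The whole branch/develop/merge/delete cycle can be rehearsed locally in a throwaway repository (branch names, file names, and messages are illustrative):

```bash
# Rehearse the feature-branch cycle in a throwaway repository.
tmp=$(mktemp -d) && cd "$tmp"
git init -q demo && cd demo
git config user.email "you@example.com"
git config user.name "Your Name"
echo "print('hello')" > app.py
git add app.py
git commit -q -m "initial commit"
git branch -M main

git switch -q -c feature-123          # branch for the work in ticket 123
echo "print('new feature')" >> app.py
git add app.py
git commit -q -m "implement feature requested in #123"

git switch -q main                    # merge the finished feature...
git merge -q --no-ff -m "merge feature-123" feature-123
git branch -d feature-123             # ...and delete its branch
```

On a shared project the merge step would instead happen by accepting a pull request on GitHub, but the underlying git operations are the same.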
A common variation on the feature branch workflow is to include a development branch as an intermediary between the main and feature branches. Feature branches are created off the develop branch and then merged back when complete. The develop branch therefore contains all of the latest features and if new features interact with each other in unexpected ways, this can be discovered on the develop branch rather than the main branch. The main branch is used for tagging and releasing new versions of the software, and these new versions can each include a number of developments.

Setting up branches
Git does not see any branch as being special. We assign “specialness”
to a branch based on the name, but we can rename branches or change our
idea of special as we like. Regardless of how a git repository has been
set up, you can move from one branching scheme to another at any time.
Creating a new develop branch and then making a habit of
branching features from that instead of main can be done at
any time. The key thing is to make sure that the branching/development
workflow that you decide on is serving a purpose for your project. Early
on in the development of some software you may have a single developer
who is hashing out a proof of concept. In this case you may do all your
development right on the main branch. As you start to share your code
with others you may decide to move development into the develop branch,
and merge back to main only when the code-base is in a working state.
Finally, as you bring more developers into the project you may decide
that a feature-develop-main workflow is a better way to keep the various
developments from interfering with each other.
The point is that you should make a choice, write it down some place
(CONTRIBUTING.md), stick to that choice for as long as it
is useful, and revise it when needed.
An example development cycle for fixing a bug
As I’m using my own code for various tasks I notice that
./scripts/sim_catalog --ref_ra=00:42:44.3 --ref_dec=-00:30:19 --radius 1 --n 1000
produces some unexpected output. A plot of the sky locations is shown
below. Note that the points have been generated around a central
declination of 00:30:19 instead of -00:30:19.
It seems that there is an issue with a negative reference
declination.

Reporting the issue
As a user, once I have identified an issue I should go to GitHub and make a bug report on the issue tracker. In the example project I’m the only one on the developer team, so I’ll assign myself to the issue. I’ll also label it as a bug.

Confirming the issue
Now that the issue has been submitted I’ll tackle this problem from the developer point of view.
The first thing to do is read and understand the issue. In this case I’ll just run the script exactly as shown in the issue tracker, and see that I get the same problem.
Create a feature branch for this issue
At this point I should create a new branch in my repository with some relevant name. Since the names of issues are not guaranteed to be unique I will instead use the issue number (#1 in this case) as part of the branch name. For a project with multiple developers it is also a good idea to identify who is the ‘owner’ of each branch. An example branch choice would be:
BASH
git branch PaulHancock/Issue#1
After some trial and error I find that the cause of the bug is in the following code:
PYTHON
def generate_positions(ref_ra='00:42:44.3',
                       ref_dec='41:16:09',
                       radius=1.,
                       nsources=1000):
    ...
    # convert DMS -> degrees
    d, m, s = ref_dec.split(':')
    dec = int(d)+int(m)/60+float(s)/3600
    ...
    return ras, decs
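The root cause is easy to demonstrate in isolation: for declinations between 0 and -1 degrees the degrees field is the string '-00', and converting it with int() silently drops the sign.

```python
# For a declination just south of the equator the degrees field is '-00'.
d, m, s = '-00:30:19'.split(':')
print(int(d))                              # 0 -- the minus sign is lost
dec = int(d) + int(m)/60 + float(s)/3600
print(dec)                                 # ~0.5053: positive instead of negative
# (For larger negative declinations, e.g. '-41:16:09', int(d) keeps the
# sign but the minutes and seconds are then added with the wrong sign.)
```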
The first thing that I do is to make a new test that will expose this bug.
Writing a test
We will now write our first formal test for our code. We’ll use a
format that will make future automated testing easier. All the tests
that we wish to run are python scripts, and we’ll place them in the
tests/ directory. Each script will test a different
sub-module, and for the sim sub-module of our
skysim module, we’ll collect all the tests into
tests/test_sim.py.
The first thing that we need to do is import the module/code that
will be tested. In this case that will be the
generate_positions function within the
skysim.sim module.
PYTHON
#! /usr/bin/env python3
"""
Tests for the skysim.sim module
"""
import numpy as np
from skysim.sim import generate_positions
Each test that we write will be contained within a separate function
whose name begins with test_, and which returns
None when the test passes, and raises an
AssertionError if the test fails. While we could home-brew
our own set of standards for what pass/fail looks like, we will instead
use standards set out by one of the common python testing frameworks
called pytest.
We craft a piece of code that will detect the mistake in our original function. In this case the mistake is that the negative sign at the start of the declination is being ignored, so we get the wrong positions. To test for this we’ll run generate_positions with a negative declination and a small radius, so that the expected output should consist entirely of negative declinations if the function works properly, and probably all positive declinations if it’s broken.
Our test function looks like this:
PYTHON
def test_negative_dec():
    """
    Test for the negative dec bug noted in issue #1
    """
    _, decs = generate_positions(ref_ra='00:00:00',
                                 ref_dec='-00:30:19',
                                 radius=0.1, nsources=10)
    if not np.all(decs < 0):
        raise AssertionError("Declinations should be <0, but are >0")
    return
In order to run the tests we can add the following snippet to the end
of our script. The snippet essentially looks at all the global variables
(including function names), selects those that start with
test_, assumes that they are a function and calls that
function. When the function is called there is a try/except for an
AssertionError which reports failure if it’s caught, or
reports success if no error was raised.
PYTHON
if __name__ == "__main__":
    # introspect and run all the functions starting with 'test'
    for f in dir():
        if f.startswith('test'):
            try:
                globals()[f]()
            except AssertionError as e:
                print("{0} FAILED with error: {1}".format(f, e))
            else:
                print("{0} PASSED".format(f))
When we run our test code we get the following result:
BASH
$ python tests/test_sim.py
test_negative_dec FAILED with error: Declinations should be <0, but are >0
This failure is not a bad thing: it means that we have successfully written a test function that will identify the bug. Now we can begin the process of fixing the bug.
Fixing the bug
Finally, once I have the test code in place, it’s time to fix the bug. I make some modifications to account for the leading minus sign on the declination as follows:
PYTHON
# convert DMS -> degrees
d, m, s = ref_dec.split(':')
sign = 1
if d[0] == '-':
    sign = -1
dec = sign*(abs(int(d)) + int(m)/60 + float(s)/3600)
I then re-run the code to make sure that the bug has been resolved, and then run my tests.
As I develop more and more tests the list of functions run will grow. Once the new bug has been solved I will re-run all my tests to ensure that fixing this bug has not caused a new bug some other place.
Checking in my work
I now check in my new test code, and the updated version of sim.py:
BASH
git add tests/test_sim.py
git commit -m 'expose bug from issue#1'
git commit -m 'resolve #1' skysim/sim.py
Note that I have used #1 to refer to the issue from
within my commit message. When viewed on Github these commit messages
will automatically generate a link to the issue, and when viewing the
issue I should be able to see the reverse link.
I now push the bug fix (and my new branch) to the Github repo.
If we look on the original issue page, we can see the link to the commit.

Creating a pull request
When we navigate to the landing page for our repository we will see a new yellow banner appear as below:

We can click the green “Compare & pull request” button to start a new pull request. Alternatively we can go to the “Pull requests” tab. Either way we enter a title and description for the pull request.
Note that the assign/label/project/milestone options that we see on
the pull request form are mostly the same as on the Issues form. This is
because pull requests are just special types of issues. They share a
numbering scheme. This is the first pull request for this repository but
it will be labelled #2 because there is an existing issue
#1. One difference between a pull request and an issue is
that a pull request can have a reviewer assigned to it. Here I have
selected myself as the assignee (the person looking after the pull
request), and SkyWa7ch3r as the reviewer (the person who
will review my code and sign off when they are happy).

Github does some work in the background to let me know that there
will be no conflicts between this branch and the main branch, so that it
is ‘safe’ to do the merge. Currently there is no indication that the
code works or passes our tests. For now we let the reviewer do this work.
The reviewer would pull the Paulhancock/Issue#1 branch, run
the tests and see that they pass, then come back to GitHub and make a
note of it in the discussion. (In a later lesson we’ll see how we can
make GitHub do most of this work for us using GitHub Actions.)

Once our reviewer(s) are happy with the changes we can merge our branch back into main by pressing the green button. This will create a new commit on the main branch in order to do the merge, so we’ll be asked for a title/description for the commit. It is pre-filled for us. Once the merge is complete Github will let us know that all is good, and suggest that we delete the branch. Since the feature is merged we no longer need this branch and will delete it.

Closing the issue
If we navigate back to the issues tab, we’ll see that the issue
related to the pull request has also been closed for us. This is because
we wrote resolve #1 in our commit message. When that
commit is merged into the main branch GitHub will automatically resolve
(close) the linked issue. If we didn’t use this smart linking capability
we can still go back to the open issue and close it. Either way it would
be good to leave a note about the issue being fixed.
Summary
The development cycle for fixing a bug is as follows:
- identify bug
- report bug on the GitHub issue tracker
- confirm that the bug exists
- create a feature branch
- write a test to expose the bug/error
- fix the bug
- run all tests
- commit changes
- create a pull request
- merge the branch into main and delete the feature branch
In this example we had one person doing the reporting/fixing. Usually you’ll have an end user doing the finding/report part, and then one or more developers doing the remainder.
GROUP Activity: Working on the same problem
In this activity you will practice working in a team as you go through a simple development cycle.
- As a group, identify a minor issue with your current project. (Don’t feel bad, the only projects that don’t have issues are the ones that no one is using).
- Since the focus here is on the life-cycle of a bug, choose a small thing that is easy to identify and fix.
- A function without a docstring would be ideal (and probably easy to find).
- Have one team member create an issue that describes the problem and explains what a solution would look like.
- eg “myFunc(4) should return a filename”, or “myFunc should have a docstring in …”.
- Create a new branch for development work called … or Issue ….
- Assign at least two team members to work on this issue.
- Have one team member create a solution for the issue and then:
- Push the changes to the remote repository,
- Open a pull request to merge this branch into main,
- Set the assignees to be all the members working on the issue,
- Set the reviewer to be a team member who is not working on the code (if you have enough people, otherwise choose someone other than the pull request creator), and
- Add whatever labels look appropriate.
- Have the other team members:
- View the changes via the “files changed” tab of the pull request,
- Comment on what is good and what needs improving,
- Pull the active branch to their local machine,
- Make an improvement and commit the change, and
- Push the change back to the active branch.
- Everyone should have an opportunity to make comments and changes, but you’ll need to coordinate who is doing what and when to minimise the number of merge conflicts that occur.
- Once everyone has had a chance to view/comment/change, have the reviewer sign off on the pull request and merge the changes into the main branch.
Content from Making Code Reusable by You and Your Group
Last updated on 2026-02-25 | Edit this page
Estimated time: 180 minutes
Overview
Questions
- How do I easily share or re-use my code?
- When sharing, what information should I provide?
- Where and how can I easily share code?
Objectives
- Package your code into a module.
- Create a command line interface for your code.
- Identify and include meta-data in your module.
Separating the UI from the functionality
A programming principle called model-view-controller
(MVC) is recommended when designing code. Under MVC, programs are
separated into three components: the model (library) which
provides the core functionality, a viewer which provides a visualisation
of the results, and a controller which links the user to the model. In
our test code the model would be the two functions
generate_positions and write_file, whilst the
controller would be the command line interface. [In this example we
could consider the file output to be the viewer.]
At the end of cycle1 we had a single file, sim.py, which
contained both the model and the controller. Our first task will be to
separate this into two parts. The first part will be a python module
which provides the functionality, and the second will be a script which
receives user input and calls the library functions.
Creating a python module
Python modules, like the numpy module that we have
already used, can be easily created by obeying a simple directory/file
structure. If we want to create a module called skysim then
all we need to do is create a directory with the same name, and add an
empty file called __init__.py. Let’s do that now:
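A minimal sketch of the required commands (assuming a Unix-like shell, run from the project root):

```shell
# create the module directory and the empty __init__.py marker file
mkdir -p skysim
touch skysim/__init__.py
```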
To access the module we simply use import skysim.
BASH
$ python
>>> import skysim
>>> dir(skysim)
['__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__']
>>>
We can see that the skysim module already has some
attributes defined, and they all start with a double underscore
(__, or ‘dunder’). The __file__ attribute is a
string containing the full path to the file __init__.py.
The __name__ attribute will contain the string
skysim because this is the name of the module. If we had
renamed the module on import (using import skysim as other)
then the __name__ attribute would still be the same. Feel
free to explore the other attributes.
In order to add some functions or attributes to our module we can
simply add our sim.py file to the skysim
directory. If we do this and then restart our python interpreter we can
import all the functions/modules/variables provided by
sim.py by doing from skysim import sim. For
example:
BASH
$ python
>>> from skysim import sim
>>> dir(sim)
['__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', 'argparse', 'generate_positions', 'math', 'np', 'write_file']
>>>
Above we can see the two functions that we defined, as well as the
modules that we imported (argparse, math, and
np). We now have a library that we can import. When we
import a module, all the code within that file is executed. In the case
of our sim.py file this means that we import some other
modules and then define some functions.
If a python file is run via python myfile.py then the
__name__ attribute will be set to the string
__main__. However if a python file is imported via import
myfile or import myfile as mf, then the
__name__ attribute is set to be the filename (without the
.py extension). Therefore when we import
sim.py as part of our module the CLI that we created with
argparse does not get executed because we hid it within an
if __name__ == "__main__" clause.
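As a quick sketch of this behaviour, consider a small file (demo.py is a hypothetical file, not part of the skysim project):

```python
# demo.py -- a hypothetical illustration, not part of skysim
def greet():
    return "hello"

# When run as `python demo.py`, __name__ is "__main__" and the block
# below executes; when imported via `import demo`, __name__ is "demo"
# and the block is skipped.
if __name__ == "__main__":
    print(greet())
```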
Creating a stand alone script
To create a script that calls this library we create a new file
called sim_catalog. We then move all the content of the
if __name__ == "__main__" clause out of sim.py
and into our new file. Finally, we import the functions that we need
from our newly created module.
Our new script looks like this:
PYTHON
#! /usr/bin/env python
import argparse
from skysim.sim import generate_positions, write_file
if __name__ == '__main__':
    # Set up the parser with all the options that you want
    parser = argparse.ArgumentParser(prog='sim')
    group1 = parser.add_argument_group()
    group1.add_argument('--ref_ra', dest='ref_ra', type=str, default='00:42:44.3',
                        help='Central/reference RA position HH:MM:SS.S format')
    group1.add_argument('--ref_dec', dest='ref_dec', type=str, default='41:16:09',
                        help='Central/reference Dec position DD:MM:SS.S format')
    group1.add_argument('--radius', dest='radius', type=float, default=1.,
                        help='radius within which the new positions are generated (deg)')
    group1.add_argument('--n', dest='nsources', type=int, default=1_000,
                        help='Number of positions to generate')
    group1.add_argument('--out', dest='outfile', type=str, default='catalog.csv',
                        help='Filename for saving output (csv format)')

    # parse the command line input
    options = parser.parse_args()

    ras, decs = generate_positions(ref_ra=options.ref_ra,
                                   ref_dec=options.ref_dec,
                                   radius=options.radius,
                                   nsources=options.nsources)
    write_file(ras, decs, outfile=options.outfile)
Note that we have a shebang line
(#!) to indicate that we want to use the python interpreter. This means
that we can make the file executable and then execute it like any other
program without the user having to explicitly type python.
It is not shown here, but I have made the file executable so that I can
just type ./sim_catalog to run the above code.
We have now separated our interface (sim_catalog) from
the model (skysim.sim). Currently the user will not notice
any difference because the functionality hasn’t changed. However, we are
now able to import the model into other python scripts. Our code is
becoming easier to re-use by ourselves (and other developers).
Updating our test script
Finally, we just need to update our test script so that it will use
the new sim_catalog script to do the testing.
SOLO Activity: Command line interface
A command line interface (CLI) is a good way to allow non-developers to use your software. In the previous lesson we saw how we can separate the functionality of your software from the user interface by creating a module with the core functionality and a script to provide the user interface.
Using the example project we developed earlier:
- Update the scripts/sim_catalog file so that it now has two argument groups:
  - The current group1 should have a title of “Program settings”
  - The second group (group2) should have a title of “Metadata”
- Add the following parameters to group2:
  - --version, which will print the version
  - --date, to show the last modified date
  - --author, to list the authors
- All the parameters in group2 should have a default value of False, and then use action='store_true' to set the parameter to True when this option is supplied by the user.
- [Optional] Provide shortened versions of the parameters in group2 of the format -c etc.
- Write the necessary code so that when any of the options in group2 are set, all the required information is printed and the program exits without performing the simulation.
For now you’ll have to define some variables within the script to hold these values, but later on we’ll demonstrate a better place to store them.
Setting up a python environment
Your python environment is the ecosystem of modules that you have
installed. If you have used pip to install modules then
pip freeze will give you a list of all the currently
installed modules and their versions. If you used conda
(anaconda) to install modules then conda list will do the
same. If you use a combination of the two then you are likely heading
toward the following problem:
If you have ever sent a python script to a collaborator only for them to say that it doesn’t work on their machine, it’s likely that they have a different set of modules installed, or that they have a different version of the modules installed. It may feel like installing all the modules at once would help out. But different software may depend on different versions of the same modules which would require you to uninstall/reinstall different versions of modules depending on what you are working on.
A better solution is to install multiple different python environments – one for each piece of software that you wish to run. To do this we will work with virtual environments.
Creating a new virtual environment
With the introduction of python 3, there is now a new module called
venv which handles the creation of new virtual
environments. See the python
documentation for a full breakdown. The short version is that we can
create a new virtual environment simply via:
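A sketch of the command, using python 3’s built-in venv module (PROMPT and ENV_DIR are placeholders you replace):

```shell
python3 -m venv --prompt PROMPT ENV_DIR
```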
where PROMPT is the name of the virtual environment, and ENV_DIR is the location in which all the files for this environment will be stored. A nice place to store your ENV_DIR is in your home directory.
Once you have created a virtual environment you can activate it via:
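On Linux/macOS the activation command is (ENV_DIR is the placeholder used above; on Windows the script lives under ENV_DIR\Scripts instead):

```shell
# (re)create the environment if needed, then activate it
[ -d ENV_DIR ] || python3 -m venv ENV_DIR
source ENV_DIR/bin/activate
```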
I find it nice to have a descriptive ENV_DIR such as
.env-jupyter or .env-py3-myproject (where the
‘.’ indicates a hidden directory in linux) but a short PROMPT such as
jpy, or py3-myproj. The reason for this is
that once your project is activated, your terminal will have PROMPT
added to the start of your command line. Since my PROMPT and ENV_DIR are
different I then add an alias in my ~/.bash_aliases file to
link them:
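For example (an illustrative alias; the environment name matches the hypothetical .env-jupyter directory above):

```shell
# in ~/.bash_aliases: short command to activate the jupyter environment
alias jpy='source ~/.env-jupyter/bin/activate'
```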
Creating a new virtual environment is easy, and if you get it wrong, you can just delete the ENV_DIR. The other nice thing is that if you are on a Linux or OSX machine your system version of python (which is used by the OS) can remain untouched, and won’t cause your system to become flaky when you totally mess up your python installs.
If you use Anaconda to install/manage your python modules, then you can simply navigate to the ‘environments’ tab, and create a new environment with the click of a button.
Now that we have created a new python environment we need to install some things into it.
Preserving environments with requirements.txt or environment.yml
Once you and your collaborators are comfortable managing different python virtual environments you are all set to start exporting and importing those environments for sharing.
The simplest way to share a python environment is to use a file which lists all the modules that are installed, and even their versions. Pip and conda have different ways of capturing dependencies.
pip – requirements.txt
For our example project we have used only one non-builtin python
module: numpy. To specify that this module is required
we can create a file called requirements.txt and add the
following:
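The file contents, for our single numpy dependency:

```
numpy>=1.19
```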
where the >=1.19 indicates that version 1.19 or
greater will be fine. See the pip
documentation for examples on how you can specify different version
numbers, ranges, and exclusions. By default each module listed will be
installed from the python package index (pypi), but you can also
indicate other locations including local files/folders or github
repositories.
If you don’t know what versions or dependencies you need for your
particular project then you can get a hint by inspecting your current
environment using pip freeze. This will list all the
installed modules and versions for the current environment. The list
will be much longer than you anticipate! To get a minimal list, start
with an empty environment and then keep installing modules until your
software will run, then run
pip freeze >> requirements.txt.
To install all the modules within a requirements.txt
file you run pip:
You now have a way of preserving and sharing your python environment
with others. It is good practice to keep a requirements.txt
file in your project directory, and to have this as part of your version
control repository.
conda – environment.yml
Anaconda uses a slightly different file format to capture essentially the same information. For the above example our file would look like:
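A sketch of the equivalent environment.yml (the name field is an assumption):

```yaml
name: skysim
dependencies:
  - numpy>=1.19
```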
Anaconda is able to reproduce not just the python modules, but the
entire virtual environment including the versions of python and anaconda
and also other non-python based codes. You can generate a good starting
point using conda list --export.
In order to 1) not dictate your users’ virtual environment management
choices, and 2) not duplicate information, the following method will
link your environment.yml and requirements.txt
files:
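One common pattern (a sketch) is to have conda delegate the python dependencies to pip:

```yaml
name: skysim
dependencies:
  - pip
  - pip:
      - -r requirements.txt
```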
Conda understands pip formatted requirements, but pip doesn’t understand conda environment files, so there is no inverse version of the above linking.
Summary
In combination with a python virtual environment (previous lesson), we now have a greater expectation that our software will run the same on other people’s machines as it does on our own (test) machine. There will be some localisation (win/linux/osx) or language (en, ch, fr, etc) differences that might cause problems down the line, but we have solved one of the most common sources of error.
Useful project metadata
Previously we used an empty __init__.py file in a
directory to cause python to interpret that directory as a module. There
is metadata that we wish to store about our project, and the
__init__.py file is an excellent place to do this.
Versioning
One of the most common items to store is the version of your
software. For most modules module.__version__ is used
to store this information. Versioning is the practice
of using unique numbers or names to refer to the different states of a
software project or code. There are many conventions for software
versioning and no single correct answer. A common choice, used by most
python modules, is major/minor/patch
versions such as 4.2.1. See semver.org for a
description of the how and why of semantic versioning.
A major version change will usually mean that the code has changed in
some fundamental way, and that people should not expect compatibility
between the two versions. For example: there were many changes between
the last version of python 2 (2.7) and python 3 that meant not all code
would work on both versions. If you have been using python for a while,
or have seen python code from ~10 years ago (or ~2 years ago if it’s
astronomers’ code!), then you’ll have seen some of these changes. Most
notable is the change from print "hello" to
print("hello").
A minor version change will usually indicate changes have been made that do not break compatibility within the major version. This would usually include the addition of new functionality that is compatible with (but not available in) previous versions of the software.
A patch version is used to distinguish states of development that do not change the intended functionality of the code base. These include bug fixes, security patches, or documentation updates.
Typically version 1.0 is used to represent the first stable/complete version of the software, and so versions such as 0.9 are used to indicate development versions.
Modification date
In addition to the semantic versioning noted above, some developers
find it useful to record the date of last change for each version of
their software (and indeed each file/module within). For this purpose we
would make use of the module.__date__ attribute. Date
formats are a perpetual problem for people and computers alike so it is
recommended that a single format be chosen and used consistently
throughout a project. The format YYYY-MM-DD is recommended as it has the
advantage of being time sorted when sorted alphanumerically.
Authors / developers
One method of tracking attribution (or blame) in a project is to use the
module.__author__ attribute to store the author name as
either a string or list. For a project with few developers this can be
handled easily. For larger groups or projects git blame
would be a better method for tracking contributions on a line by line
basis.
Citation
For any researcher writing software there is an eternal battle
between writing good code and “doing science”. Acknowledging the use of
software is common but not yet standard or required when publishing
papers. To make it easier for people to cite your work you can use an
attribute such as module.__citation__ to encourage people to
cite your work, and to link to papers or code repositories that should be cited
or referenced. When combined with a --cite command line
option, this is a great way for people to properly credit your work. If
your target audience is researchers then it can be very useful to store
a bibtex entry in the citation string so that people can just copy/paste
it into their LaTeX document.
Meta data for our example project
The __init__ file in our example project can be updated
to include the above recommendations. Note the format of the
__citation__ string being multi line, and including LaTeX
formatting.
PYTHON
#! /usr/bin/env python
__author__ = ['Dev One', 'Contrib Two']
__version__ = '0.9'
__date__ = '2021-12-02'
__citation__ = """
% If this work is used to support a publication please
% cite the following publication:
% Description of This code
@ARTICLE{ExcellentCode_2022,
author = {{One}, D. and {Two}, C. and {People}, O.},
title = "{Awesome sauce code for astronomy projects}",
journal = {Nature},
keywords = {techniques: image processing, catalogues, surveys},
year = 2021,
month = may,
volume = 1337,
pages = {11-15},
doi = {some.doi/link.in.here}
}
% It is also appropriate to link to the following repository:
https://github.com/devone/AwesomeSauce
"""
GROUP Activity: metadata
In the past few lessons we learned about virtual environments, and project metadata. We will now put some of these lessons in to practice.
As a group:
- Decide upon a versioning scheme that will be used for the project, and if none currently exists, agree on what the current version number should be.
- Discuss how people should give recognition for using this project as part of their research (or other) work.
- Create issues on github that request the following changes. Share the creation and assignee responsibilities between group members.
  - The creation of a requirements.txt and environment.yml file that include the dependencies for your project. For each dependency specify the minimum version requirement to be the smallest version that is used by one of the group members.
  - Record the agreed upon current version number of the project in the <project>/__init__.py file.
  - Record the authors in an agreed upon order in the __init__.py file.
  - Record the last modification date in the __init__.py file.
  - Describe the method by which people should cite or give recognition for using this project.
- For each of the issues above, have a group member follow the branch-develop-pull request workflow that was described earlier. When creating pull requests set the reviewer to include the person who created the initial issue.
- [Optional] Once all the pull requests have been resolved, repeat the previous SOLO activity for your project (creating a command line interface that will give users access to the --cite, --date, --version, --author information).
Licensing your work
By default any creative work is under an exclusive copyright which
means that the author(s) of that work have a say in what others can do
with it. In general this means that no one can build upon, use, reuse,
or distribute your work without your permission. To use or build upon
software that has no licence requires the new developer/user to contact
the original author(s) and get permission. This is time consuming,
annoying, and often not done. If you want your work to be used by others
your best bet is to provide an explicit software license as part of your
project so that people know up front what is allowed and not allowed. A
common way of licensing software is to provide a LICENSE
(or LICENCE) file in the root of the project.
(Alternatively you can provide the license as part of the header for
each file, but that’s a lot of repetition, and goes against our good
practice of don’t repeat yourself).
Choosing a software licence for your project
Your home institute may have opinions/guidelines for the appropriate licensing of software. Ask around and follow the advice of your local experts.
If you don’t have any local constraints on licensing your software
you can use one of the many license templates available on Github. To
use a template you need to log into your Github account, navigate to
your repository and then click the “add file”->”create new file”
button. You will be presented with a blank text editor and be asked for
a file name. If you use LICENSE.md (or any similar
spelling/extension) then you’ll see a new button appear on the right of
the screen saying “Choose a licence template”. Click that.

You’ll then see a list of common software licences that you can choose from. If you are brave you can read each of them in full. Alternatively you can simply read the Github provided summary at the top of what the permissions/limitations are. Choose one that feels right to you and then press “Review and submit”. This will create a new licence file.

Once you have a license file GitHub will add a badge to the “About” section of your project like this:

As you can see, for the example project I chose the GNU General public licence.
Basic documentation in README.md
Upon downloading new software, the first port of call for many
people is to look for some help on how to install and use the software.
This is where a file such as INSTRUCTIONS,
INSTALL, or README comes in handy. The name
of the file says what it is and will attract the attention of the user.
While these files can be in any format, and have any name, a common
choice is README.md. If you have navigated to a GitHub
software repository, and seen the nice documentation available on the
front page, then this has been generated from the README.md
file.
The markdown format (guide here) is a simple to use, future proof, platform independent, document format that can be rendered into a range of other formats. As a bonus, the raw files are easily read and written by humans.
Things to consider for your README.md file
- The name of the project
- A description of the purpose of the software
  - Maybe a one liner for each script
- Install instructions
  - List some high level dependencies
- Usage instructions
  - If you have a CLI then the output of mycode --help is appropriate to include verbatim
- A link to documentation
- Author information and contact details (email, or just a link to github issues)
- A note on how people should credit this work
README.md for our example project
BASH
# SkySim
This project was built in order to simulate source (star/galaxy/other) positions over an area of sky.
## Installing
This project relies only on python built-ins and the numpy library.
Use `pip install -r requirements.txt` if you don't yet meet these requirements.
## Usage
The main entry point for this project is `sim_catalog`:
./sim_catalog --help
usage: sim [-h] [--ref_ra REF_RA] [--ref_dec REF_DEC] [--radius RADIUS] [--n NSOURCES] [--out OUTFILE]
optional arguments:
-h, --help show this help message and exit
--ref_ra REF_RA Central/reference RA position HH:MM:SS.S format
--ref_dec REF_DEC Central/reference Dec position DD:MM:SS.S format
--radius RADIUS radius within which the new positions are generated (deg)
--n NSOURCES Number of positions to generate
--out OUTFILE Filename for saving output (csv format)
## Documentation
Documentation is currently just this file, and associated python docstrings.
## Author / Contribution
This project is developed by Dev One.
If you want to contribute to this project please create a fork and issue pull requests for new features or bug fixes.
## Credit
If you find this project to be useful in your academic work please cite the following paper:
> [One, D. et al. Nature, 2021](https://nature.com)
The above README.md file will render on github
as below.

Summary
In this section we have made no changes to the core functionality of our software project. Instead we have made changes that make the software:
- easier to use for both developers and users, by separating the command line interface from the core functionality;
- more portable, by listing dependencies in requirements.txt and environment.yml;
- easier to acknowledge and get help, by adding version, date, authorship, and citation information; and
- more accessible, by providing a README.md file.
The state of our project at the end of the second development cycle can be seen on the cycle2 branch of our github repository.
SOLO Activity: A GitHub landing page
When people visit the GitHub page for your project they would like to see more than just a directory listing of the files. People are typically looking for some description of what the project is, and how they can install and use it. In this activity we will ensure that our project has a landing page that is attractive to potential users.
- Use the GitHub online editor to create a new LICENSE file and choose an appropriate license from the templates.
- Create or update your project’s README.md file so that it contains at least:
  - A short description of the software,
  - Some basic install instructions,
  - Some basic usage instructions,
  - A note about what documentation is available,
  - A section that lists the authors/contributors to the project (and a link to CONTRIBUTING.md if you have one),
  - A section that lets people know how you would like to receive credit if they should use your work.
- Edit the “about” section of the GitHub page to include a short description of the project and at least one topic.
Content from Testing and Documenting Code
Last updated on 2026-02-25 | Edit this page
Estimated time: 180 minutes
Overview
Questions
- How can I easily document my code?
- Why is documentation and testing even needed?
- How do I write and run tests on my code?
Objectives
- Understand why testing and documentation are important.
- Add documentation to your project.
- Document your work.
Documenting via docstrings
To avoid simulating the entire python interpreter in our minds, it is
often easier to document the (intended) behavior of our code in a human
readable format. Python offers the builtin function help()
to display the documentation for a given function. For example, if we
want to know what the numpy.sum function does we can just
ask:
PYTHON
>>> import numpy as np
>>> help(np.sum)
Help on function sum in module numpy:
sum(a, axis=None, dtype=None, out=None, keepdims=<no value>, initial=<no value>, where=<no value>)
Sum of array elements over a given axis.
Parameters
----------
a : array_like
Elements to sum.
axis : None or int or tuple of ints, optional
Axis or axes along which a sum is performed. The default,
axis=None, will sum all of the elements of the input array. If
axis is negative it counts from the last to the first axis.
.. versionadded:: 1.7.0
If axis is a tuple of ints, a sum is performed on all of the axes
specified in the tuple instead of a single axis or all the axes as
before.
dtype : dtype, optional
The type of the returned array and of the accumulator in which the
elements are summed. The dtype of `a` is used by default unless `a`
has an integer dtype of less precision than the default platform
...
Where does help() get all this information? In part, the information provided by help() comes from the docstring of the np.sum function. We can view the docstring directly via the __doc__ attribute of the function as follows:
PYTHON
>>> print(np.sum.__doc__)
Sum of array elements over a given axis.
Parameters
----------
a : array_like
Elements to sum.
axis : None or int or tuple of ints, optional
Axis or axes along which a sum is performed. The default,
axis=None, will sum all of the elements of the input array. If
axis is negative it counts from the last to the first axis.
.. versionadded:: 1.7.0
If axis is a tuple of ints, a sum is performed on all of the axes
specified in the tuple instead of a single axis or all the axes as
before.
...
Documentation vs commenting
There are two ways in which you can and should describe your code – documentation and commenting. These two ways of describing code have two audiences (which may overlap) – documentation is for the people who will use your code, whilst comments are for people who will develop your code. Both of these audiences include you, the original developer, some 6 months in the future when you have forgotten all the details about what you were doing. Quite simply:
Documentation is a love letter that you write to your future self.
- Damian Conway
Comments
Comments should include design decisions, or explanations of difficult to interpret code chunks. Comments can include known/expected bugs or shortcomings in the code. Things that are not yet implemented, or hacks that deal with bugs in other modules, should also be in comments. Python comments come in two flavours: a single or part line comment which begins with a #, or a multiline comment which is any string literal.
PYTHON
'''
A comment that covers more than one line
because it is just so long
'''
def my_func(num):
# assume that num is some numeric type, or at the very least
# an object which supports division against an integer
ans = num / 2 # A partial line comment
return ans
The partial-line comment plus multi-line comments can be used to great effect when defining functions, dictionaries, or lists:
PYTHON
# use a descriptive name rather than shadowing the builtin `dict`
lookup = {'key1': 0,  # note about this item
          'key2': 1,  # another note
          }

def my_func(num,
            ax,            # a matplotlib axes object
            verbose=True,  # TODO update to be logger.isEnabledFor(logging.DEBUG)
            **kwargs):
    ...
When Python is interpreted (or compiled to byte-code), the interpreter ignores the comments, so they exist only in the source code. Commenting your code has no effect on its behavior, but it will (hopefully) increase your ability to understand what you did. Because comments are ignored by the Python interpreter, only people with access to your source code (usually developers) will read them, so comments are a bad place to describe how your code should be used. For notes about code usage we instead use documentation.
Docstrings
Python provides a way for us to document the code inline, using docstrings. Docstrings can be attached to functions, classes, or modules, and are defined using a simple syntax as follows:
PYTHON
def my_func():
"""
This is the doc-string for the function my_func.
I can type anything I like in here.
The only constraint is that I start and end with triple quotes (' or ")
I can use multi-line strings like this, or just a single line string if I prefer.
"""
return
Docstrings can be any valid string literal: they can be encased in either single or double quotes, and by convention (PEP 257) they are triple quoted so that they can span multiple lines. Raw and Unicode strings are also fine.
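A raw docstring (with an r prefix) is handy when the text contains backslashes, such as the LaTeX math shown later in this lesson. A small illustrative example (the function itself is hypothetical, not part of the course code):

```python
import cmath

def fourier_term(x, n, omega):
    r"""Evaluate a single term :math:`x(n)e^{-j\omega n}`.

    The ``r`` prefix makes this a raw string literal, so the LaTeX
    backslash in ``\omega`` is stored verbatim rather than being
    treated as an escape sequence.
    """
    return x * cmath.exp(-1j * omega * n)

# the backslash survives in the stored docstring
print('\\omega' in fourier_term.__doc__)  # True
```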
Docstrings can be included anywhere in your code, however unless they
immediately follow the beginning of a file (for modules) or the
definition of a class or function, they will be ignored by the compiler.
The docstrings which are defined at the start of a module/class/function
will be saved to the __doc__ attribute of that object, and
can be accessed by normal python introspection.
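For instance, using a toy function (not part of the course code), the stored docstring can be inspected directly:

```python
import inspect

def greet(name):
    """Return a greeting for `name`."""
    return f"Hello, {name}!"

# The docstring is stored on the function object itself.
print(greet.__doc__)
# inspect.getdoc() additionally normalises leading indentation,
# which matters for multi-line docstrings.
print(inspect.getdoc(greet))
```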
Docstring formats
While it is possible to include any information in any format within a docstring it is clearly better to have some consistency in the formatting.
There are, unfortunately, many ‘standard’ formats for python documentation, though they are all similarly human readable so the difference between the formats is mostly about consistency and automated documentation.
Scipy, Numpy, and astropy, all use the numpydoc format which is particularly easy to read. We will be working with the numpydoc format in this workshop.
Let’s have a look at an extensive example from the numpydoc website.
PYTHON
"""Docstring for the example.py module.

Module names should have short, all-lowercase names. The module name may have underscores if this improves readability.

Every module should have a docstring at the very top of the file. The module's docstring may extend over multiple lines. If your docstring does extend over multiple lines, the closing three quotation marks must be on a line by itself, preferably preceded by a blank line.

"""
from __future__ import division, absolute_import, print_function
import os # standard library imports first
# Do NOT import using *, e.g. from numpy import *
#
# Import the module using
#
# import numpy
#
# instead or import individual functions as needed, e.g
#
# from numpy import array, zeros
#
# If you prefer the use of abbreviated module names, we suggest the
# convention used by NumPy itself::
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
# These abbreviated names are not to be used in docstrings; users must
# be able to paste and execute docstrings after importing only the
# numpy module itself, unabbreviated.
def foo(var1, var2, *args, long_var_name='hi', **kwargs):
r"""Summarize the function in one line.
Several sentences providing an extended description. Refer to
variables using back-ticks, e.g. `var`.
Parameters
----------
var1 : array_like
Array_like means all those objects -- lists, nested lists, etc. --
that can be converted to an array. We can also refer to
variables like `var1`.
var2 : int
The type above can either refer to an actual Python type
(e.g. ``int``), or describe the type of the variable in more
detail, e.g. ``(N,) ndarray`` or ``array_like``.
*args : iterable
Other arguments.
long_var_name : {'hi', 'ho'}, optional
Choices in brackets, default first when optional.
**kwargs : dict
Keyword arguments.
Returns
-------
type
Explanation of anonymous return value of type ``type``.
describe : type
Explanation of return value named `describe`.
out : type
Explanation of `out`.
type_without_description
Other Parameters
----------------
only_seldom_used_keywords : type
Explanation.
common_parameters_listed_above : type
Explanation.
Raises
------
BadException
Because you shouldn't have done that.
See Also
--------
numpy.array : Relationship (optional).
numpy.ndarray : Relationship (optional), which could be fairly long, in
which case the line wraps here.
numpy.dot, numpy.linalg.norm, numpy.eye
Notes
-----
Notes about the implementation algorithm (if needed).
This can have multiple paragraphs.
You may include some math:
.. math:: X(e^{j\omega } ) = x(n)e^{ - j\omega n}
And even use a Greek symbol like :math:`\omega` inline.
References
----------
Cite the relevant literature, e.g. [1]_. You may also cite these
references in the notes section above.
.. [1] O. McNoleg, "The integration of GIS, remote sensing,
expert systems and adaptive co-kriging for environmental habitat
modelling of the Highland Haggis using object-oriented, fuzzy-logic
and neural-network techniques," Computers & Geosciences, vol. 22,
pp. 585-588, 1996.
Examples
--------
These are written in doctest format, and should illustrate how to
use the function.
>>> a = [1, 2, 3]
>>> print([x + 3 for x in a])
[4, 5, 6]
>>> print("a\nb")
a
b
"""
# After closing class docstring, there should be one blank line to
# separate following codes (according to PEP257).
# But for function, method and module, there should be no blank lines
# after closing the docstring.
pass
The example above is intentionally extensive, but you should be able to see what is going on. There are a few parts to the documentation format, some of which are considered essential, good practice, or optional. See the numpy doc guide for a more gentle yet more complete discussion on the numpydoc standard.
Good practice documentation
The main goal of documentation is to describe the desired behavior or intended use of the code. As such every docstring should contain at least a one line statement that shows the intent of the code.
It is good practice to describe the expected input and output (or behavior) of your functions.
In the numpydoc format we put these into two sections:
- Parameters: for the input
- Returns: for the output
There is no “Modifies” section for the documentation (though you could add one if you like). If the function modifies an input but does not return the modified version as an output then this should be included as part of the long form description.
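For example, a hypothetical function that modifies its input in place could note this both in the extended description and in the parameter entry:

```python
import numpy as np

def clip_negatives(arr):
    """
    Set all negative entries of `arr` to zero.

    This function modifies `arr` in place and returns nothing,
    so the behaviour is documented in the description rather
    than in a Returns section.

    Parameters
    ----------
    arr : numpy.ndarray
        Array to be clipped. Modified in place.
    """
    arr[arr < 0] = 0

vals = np.array([-1.0, 2.0, -3.0])
clip_negatives(vals)
print(vals)  # [0. 2. 0.]
```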
The generate_positions function from the example
skysim module has the following docstring:
PYTHON
def generate_positions(ref_ra='00:42:44.3',
ref_dec='41:16:09',
radius=1.,
nsources=1000):
"""
Create nsources random locations within radius of the reference position.
Parameters
----------
ref_ra, ref_dec : str
Reference position in "HH:MM:SS.S"/"DD:MM:SS.S" format.
Default position is Andromeda galaxy.
radius : float
The radius within which to generate positions. Default = 1.
nsources : int
The number of positions to generate
Returns
-------
ra, dec : numpy.array
Arrays of ra and dec coordinates in degrees.
"""
Optional documentation
The type of errors that are raised, and under what conditions, can be
documented in the Raises section.
Notes, References, and
Examples, are also useful sections but not usually
applicable to all functions or classes that you will be writing. If I
have used code snippets from stack-overflow or similar, then I find
Notes/References section to be a good place to acknowledge and link to
those resources.
The Examples section can be used to show intended use. There is an
automated testing suite called doctest which will scan your docstrings
looking for segments starting with >>> and then run those
segments in an interactive python interpreter. A solid test suite will
typically contain many tests for a single function, thus trying to embed
all the tests into your docstrings just makes for very long docstrings.
It is preferable to keep your testing code in the tests
module/directory of your python module, and to use the
Examples section only for demonstrating functionality to
the end user.
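As a small demonstration of how doctest works (using a toy function, not the course code), the snippet below finds and runs the `>>>` examples in a docstring programmatically:

```python
import doctest

def add_three(values):
    """
    Add 3 to every element of `values`.

    Examples
    --------
    >>> add_three([1, 2, 3])
    [4, 5, 6]
    """
    return [v + 3 for v in values]

# doctest scans the docstring for `>>>` snippets, runs them, and
# compares the result with the text following each snippet.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner()
for test in finder.find(add_three, 'add_three', globs={'add_three': add_three}):
    runner.run(test)
print(runner.failures)  # 0
```

In a real project you would more commonly run `python -m doctest yourmodule.py` or let pytest collect doctests with `--doctest-modules`.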
Making use of documentation
Some IDEs (the good ones) provide syntax highlighting, linting, and inline help as you write code. By providing docstrings for all your functions you can make use of the linting and inline help. Below is an example from VSCode in which the docstring for a function is being shown:

You can use the help from the python console like this:
PYTHON
>>> from skysim import sim
>>> help(sim.generate_positions)
Help on function generate_positions in module skysim.sim:
generate_positions(ref_ra='00:42:44.3', ref_dec='41:16:09', radius=1.0, nsources=1000)
Create nsources random locations within radius of the reference position.
Parameters
----------
ref_ra, ref_dec : str
Reference position in "HH:MM:SS.S"/"DD:MM:SS.S" format.
Default position is Andromeda galaxy.
radius : float
The radius within which to generate positions. Default = 1.
nsources : int
The number of positions to generate
Returns
-------
ra, dec : numpy.array
Arrays of ra and dec coordinates in degrees.
...
Additionally you can compile all the documentation into a website or other document using an automated documentation tool as described in the next section.
Automated Documentation
If your docstrings are formatted in a regular way then you can make use of an automated documentation tool. There are many such tools available with a range of sophistication.
The simplest to use is the pdoc package which can be
obtained from pypi.org. The package can be installed via
pip install pdoc, and then run on our test module using
pdoc skysim.
By default pdoc will start a mini web server with the documentation on it. This should open in your browser automatically, but if it doesn't you can navigate to localhost:8080 or 127.0.0.1:8080. Use <ctrl>+C when you want to stop the web server. For the example project this is the website that is generated:

To make documentation that is less ephemeral you can use the
-d docs option to cause all the documentation to be built
and then placed into the docs folder. pdoc
only supports html output, however other auto-documentation packages
such as sphinx can
write latex (and thus pdf), ePub, man pages, or plain text.
Other forms of documentation
Compiling all your docstrings into an easy to find and navigate website is great, but this typically does not do a good job of documenting your software project as a whole. What is required here is something that deals with the intent of the software, a description of the problem that it is solving, and how users can install and begin to use the software. For this you have a few options:
- a README.md in your repository
- a user guide document (html or PDF)
- a wiki or rtfd.io style website
Within any of the above you would want to include things such as:
- a guide for downloading/compiling/installing your software
- a ‘quick-start’ guide or set of examples for new users
- a Frequently Asked Questions (FAQ) section to address common problems
- tutorials to demonstrate some of the key features of your software (Jupyter notebooks are great here)
GitHub and GitLab both provide a wiki for each project. Additionally both platforms will allow you to set up Continuous Integration (CI) tools that will automatically build and publish your documentation to a third party website.
Documentation as part of your development cycle
A typical development cycle will consist of writing code, testing code, and writing documentation. The order in which this is done depends on the software development strategies that you set out for your project, or simply personal preference. At the end of the day the process is cyclic – with the end goal of having code, tests, and documentation that are all in agreement. Once your code/tests/documentation are consistent then you can package your code into a module and publish it for others to use.
Generating documentation with sphinx
The pdoc module that we used in the last lesson was
nice, but more sophisticated options are available. Many of the
documentation pages that you may have viewed (such as scipy, numpy, or astropy), use the Sphinx python
documentation generator. Not only will Sphinx generate documentation
from your docstrings, but it will also allow you to write documentation
in reStructuredText
which can then be rendered into HTML, PDF, ePub, and other document
formats. For publishing code to a website such as readthedocs, Sphinx is an essential
choice.
Creating documentation from our docstrings
The documentation work that we have done on the test module will not be wasted. Sphinx understands a range of documentation styles and can fairly reliably auto-detect which one you are using (even if you change throughout your project!).
To begin using Sphinx we need to install it via pip. It is
recommended that we use a dedicated directory for our documentation
(docs/). Once Sphinx is installed we initialise our documentation using the sphinx-quickstart command.
We’ll be taken through a sort of install wizard with a few questions as follows. (bold are the answers that have been used for the example project).
OUTPUT
Welcome to the Sphinx 4.3.1 quickstart utility.
Please enter values for the following settings (just press Enter to
accept a default value, if one is given in brackets).
Selected root path: .
You have two options for placing the build directory for Sphinx output.
Either, you use a directory "_build" within the root path, or you separate
"source" and "build" directories within the root path.
> Separate source and build directories (y/n) [n]: y
The project name will occur in several places in the built documentation.
> Project name: SkySim
> Author name(s): Dev One
> Project release []: v3.0
If the documents are to be written in a language other than English,
you can select a language here by its language code. Sphinx will then
translate text that it generates into that language.
For a list of supported codes, see
https://www.sphinx-doc.org/en/master/usage/configuration.html#confval-language.
> Project language [en]:
Creating file /data/alpha/hancock/ADACS/MAP21B-Training-BPindor/docs/source/conf.py.
Creating file /data/alpha/hancock/ADACS/MAP21B-Training-BPindor/docs/source/index.rst.
Creating file /data/alpha/hancock/ADACS/MAP21B-Training-BPindor/docs/Makefile.
Creating file /data/alpha/hancock/ADACS/MAP21B-Training-BPindor/docs/make.bat.
Finished: An initial directory structure has been created.
You should now populate your master file /data/alpha/hancock/ADACS/MAP21B-Training-BPindor/docs/source/index.rst and create other documentation
source files. Use the Makefile to build the docs, like so:
make builder
where "builder" is one of the supported builders, e.g. html, latex or linkcheck.
For most of these questions you can choose whatever answers you like, however separating the build/source directories for your documentation is recommended. Separate build/source directories make it easier to keep the documentation source under version control.
The setup will create the following file structure:
OUTPUT
docs/
├── build
├── make.bat
├── Makefile
└── source
├── conf.py
├── index.rst
├── _static
└── _templates
If we move into the docs/ directory and type
make html Sphinx will generate some documentation for us.
The documentation will be a set of static .html files in
the build/ directory. Open docs/build/html/index.html in your web browser to view them. At
the moment there is no content, just a few links to empty or broken
pages.
The reason that the website is empty is that we have a very sparse docs/source/ directory. What we are seeing is simply the
result of the index.rst page being rendered. In order to
extract the docstrings from our module, and build the html documentation
we have to edit some of the sphinx settings.
In the file docs/source/conf.py there are a lot of
options that are set to default values. The answers that you chose in
the initialisation stage are also in this file in case you need to
change them (for example the release number).
PYTHON
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = ['sphinx.ext.autodoc',
'sphinx.ext.mathjax',
'sphinx.ext.viewcode',
'sphinx.ext.napoleon',
]
...
html_theme = 'sphinx_rtd_theme'
The extensions setting enables different sphinx extensions. See the
sphinx
extensions page for a list of what modules are available and what
they do. The html_theme sets the theme for the website.
Here we have chosen the sphinx_rtd_theme which renders the
page using the default read the docs theme. If you want to explore more
themes then you should visit the sphinx-themes gallery.
Some of the sphinx extensions (including the theme chosen here)
require additional modules to be installed. Normally we would put these
required modules in the requirements.txt file for our
project. However, the documentation is typically only going to be built
by developers (and automation scripts) and not end users, so instead we
create a new file docs/requirements.txt that will list all
the dependencies for building the documentation. Note that the theme
sphinx_rtd_theme is provided by the
sphinx-rtd-theme module (replacing _ with
-).
Now that we have the autodoc extension enabled, we just
need to update the website to include this feature. In the
docs/source/index.rst file we can add the following:
OUTPUT
Modules
=======
.. automodule:: skysim.sim
:members:
With the extra extensions installed and the index updated we can now
run make html again and see the updated documentation web
pages.

We now have a way of turning all our module/class/function docstrings into documentation.
At this point we have generated documentation that would be useful to people who are either developing this module, or who are importing it for use in their own software. Essentially we are just documenting the API for this software.
Cross-linking your docstrings
Sphinx provides a mechanism that will allow you to make links between different parts of your documentation. Two main uses for this are for your “see also” section, where you would link to similar functions/modules, or in your parameter descriptions where you want to say that a parameter is some custom object type.
To make these links you simply use back-ticks around the
module/function/class you want to reference. Note that in our
documentation for skysim.sim.generate_positions we indicate
that the ra/dec return variables are of type numpy.array.
At the moment this is treated as just a string, however we can link it
to the numpy documentation by using the extension called
intersphinx.
To enable intersphinx we update the extensions part of
the conf.py file, and then add a new parameter called
intersphinx_mapping as shown below.
PYTHON
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = ['sphinx.ext.autodoc',
'sphinx.ext.mathjax',
'sphinx.ext.viewcode',
'sphinx.ext.napoleon',
'sphinx.ext.intersphinx',
]
# Add mappings
intersphinx_mapping = {
'numpy': ('https://numpy.org/doc/stable', None),
'python': ('http://docs.python.org/3', None),
}
To make links we use back-ticks around the module/function/class you want to reference:
PYTHON
def generate_positions(ref_ra='00:42:44.3',
ref_dec='41:16:09',
radius=1.,
nsources=1000):
"""
...
Returns
-------
ra, dec : :obj:`numpy.array`
Arrays of ra and dec coordinates in degrees.
"""
We can also link to other functions/classes within our module using
:func:`skysim.sim.generate_positions`.
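As a small illustration (a hypothetical helper, not part of skysim), a single docstring can combine both kinds of cross-reference role:

```python
import numpy as np

def radec_to_array(ra, dec):
    """
    Bundle coordinate lists into a single array.

    Parameters
    ----------
    ra, dec : list of float
        Coordinates in degrees.

    Returns
    -------
    coords : :obj:`numpy.ndarray`
        Stacked coordinates; see :func:`numpy.column_stack`
        for how the columns are combined.
    """
    return np.column_stack([ra, dec])

print(radec_to_array([10.0, 11.0], [41.0, 42.0]).shape)  # (2, 2)
```

With intersphinx enabled, both roles render as links into the numpy documentation.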
Creating additional documentation
By default Sphinx can render any document that is in reStructuredText
format. In the docs/source directory you’ll see a file
index.rst which has been auto-generated by Sphinx. This
file will contain some content and directives for generating
content.
Initially the table of contents is empty and not shown on the main page, and the left pane just shows the section headers for the current page. The content on the main page has been generated from all the docstrings in your module(s). In order to include additional documentation you can add your own content pages and link to them from here.
Create a new file called docs/source/example.rst, and
then add some content. Save the file. Now edit the
docs/source/index.rst to link to this page by making the
following change (last line):
OUTPUT
Welcome to MyProject's documentation!
=====================================
.. toctree::
:maxdepth: 2
:caption: Contents:
example
You can now build an entire website using this strategy. You can link to pages using :doc:`page_name`, where page_name is the name of the file (without its extension). The :ref: role can also be used to link to explicit labels within pages.
Using markdown
If you don’t like rst, and would prefer markdown you can write markdown for your custom pages. This is particularly useful if you already have pages in markdown format (e.g. from an existing wiki).
The extension to use is a third party extension called
myst_parser. We can install it with
pip install myst-parser, and then enable it in the
conf.py as follows:
PYTHON
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = ['sphinx.ext.autodoc',
'sphinx.ext.mathjax',
'sphinx.ext.viewcode',
'sphinx.ext.napoleon',
'sphinx.ext.intersphinx',
'myst_parser',
]
# mappings for parsing files
source_suffix = {'.rst':'restructuredtext',
'.md':'markdown'}
The source_suffix provides sphinx with a way to determine what format to use when parsing the different file types.
Summary
You should now be able to create documentation from the docstrings of
your module, and from additional .rst or .md
files that provide more of a user guide style.
Once you have a set of documentation that you are happy with, you can host them at readthedocs.org. See the tutorial for instructions on how to do this.
Content from Automation and Continuous Integration (on GitHub)
Last updated on 2026-02-26 | Edit this page
Estimated time: 180 minutes
Overview
Questions
- What is CI?
- How can I use GitHub to test my code?
- How do I automatically build documentation?
Objectives
- Become familiar with GitHub actions
- Use a github action to test your code
- Use a github action to build documentation
Using GitHub to run your tests
In this lesson we’ll learn how to have all our tests run every time we push changes to GitHub. This is an example of a continuous integration (CI) workflow. CI workflows are available via third party services such as Travis-CI, which you can link to your GitHub, GitLab, or Bitbucket repositories to build, test, and report on your projects. For this lesson we’ll use GitHub actions to perform the testing.
GitHub actions
GitHub Actions makes it easy to automate all your software workflows, now with world-class CI/CD. Build, test, and deploy your code right from GitHub. Make code reviews, branch management, and issue triaging work the way you want. GitHub.com
GitHub actions can perform a large variety of automation of tasks, but for this lesson we will focus only on the ability to install our module and run the tests. All of the actions that are associated with your project are stored in text files within your project. This means that you can edit the actions locally and then push them like any other file.
The actions are stored in a special directory called .github/workflows/, and the files are stored in the .yml format.
Creating a new action
To create a new action the easiest way is to use the Actions tab of
your GitHub repo, and select the New Workflow button.

The New workflow button will take you to a page to
choose a template from a long list. This list is semi-smart in that it
will look at the files that you have in your repository and suggest templates based on the language(s) that you are using. The first
template that shows up for our example repository is “Python
Package using Anaconda” which sounds like what we want so we’ll
start with that. Note that you can “set up a workflow yourself” using
the small link (this will begin with a blank workflow). Also note that
the templates are hosted in GitHub repositories so that you can browse
them and copy ideas from one to another. For the “Python Package
using Anaconda” the template is in the
actions/starter-workflows repository.

After selecting a template we’ll go to the online editor for
committing a new file on GitHub. The file
(.github/workflows/python-package-conda.yml) is currently
as follows:
YML
name: Python Package using Conda
on: [push]
jobs:
build-linux:
runs-on: ubuntu-latest
strategy:
max-parallel: 5
steps:
- uses: actions/checkout@v2
- name: Set up Python 3.10
uses: actions/setup-python@v2
with:
python-version: 3.10
- name: Add conda to system path
run: |
# $CONDA is an environment variable pointing to the root of the miniconda directory
echo $CONDA/bin >> $GITHUB_PATH
- name: Install dependencies
run: |
conda env update --file environment.yml --name base
- name: Lint with flake8
run: |
conda install flake8
# stop the build if there are Python syntax errors or undefined names
flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
# exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
- name: Test with pytest
run: |
conda install pytest
pytest
The first two lines give the name of the action which is
just for your own reference, and an on directive. The
on directive will trigger this action when
[push] happens. This is what we want.
The main part of the action is the jobs: section, which
can specify multiple jobs. Currently there is just one job with the name
build-linux. Within this job there are multiple
steps that are run.
The template is mostly good however:
- since this is the Anaconda template it uses conda for all the build/test. However Anaconda is slow to create new environments and install software;
- there is a bug in this template: the python-version should be a string, not a float;
- we don’t want to run the lint step.
We will replace conda with pip and can fix the other issues by updating the script to be as follows:
YML
name: Build and Test Python package using pip
on: [push]
jobs:
build-linux:
runs-on: ubuntu-latest
strategy:
max-parallel: 5
steps:
- uses: actions/checkout@v2
- name: Set up Python "3.6"
uses: actions/setup-python@v2
with:
python-version: "3.6"
- name: Install dependencies
run: |
# upgrade pip
python -m pip install --upgrade pip
# install the testing requirements
python -m pip install pytest pytest-cov
# install the dependencies for this
pip install -r requirements.txt
- name: Install module
run: |
python -m pip install -e .
- name: Test with pytest
run: |
pytest --cov=skysim --cov-report=term
Since we are now not using conda we should also change the name of
the script to be .github/workflows/python-build-test-pip.yml before we save
it. Once we have made these changes and are happy we can press the green
“Start commit” button where we’ll be asked for a commit message, and
whether we want to commit to the main branch or to a new
branch. Let's commit directly to the main branch.
Once we make the commit the action will run. This is because
adding/changing a file via the GitHub webpage is considered the same as
a push action, which will trigger our workflow.
Viewing the progress/result of an action
To see what actions have been run and why, we navigate to the “Actions” tab. Here you’ll see all the actions that have ever been triggered for your repository, how they were triggered, what the status is, how long it took to run, and what artifacts (if any) were produced. Below is an example showing the completed action that we just created above.

On the left, we see all the jobs that were part of this action, and on the right we have a summary for each. We want to see green ticks as this means that everything is ok. A red X will mean there was a problem somewhere. Either way we can click on one of the jobs to see the different steps of the job, and clicking the “>” next to a step will show all the details of that step. Below we can see the output of the “test with pytest” step.

If an action fails then the repository owner, and anyone watching the repository, will get a notification of the failure. When the action succeeds for the first time after a failure a notification will also be sent, but a success following a success generates no further notifications. This means that you can push changes to your repo and you won’t be bothered unless there is a problem.
Summary
We have seen how to set up a basic install-test workflow for a python project using a GitHub action. Once you have a working action you can extend it, create additional actions, or change how it triggers. See the documentation for more details and some examples.
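As one example of changing how an action triggers, the on: field also accepts a schedule, so a workflow can run periodically as well as on every push. A sketch (the cron expression here is just an illustration):

```yaml
# trigger on pushes to main, and also once a week
on:
  push:
    branches: [ main ]
  schedule:
    - cron: '0 3 * * 1'  # every Monday at 03:00 UTC
```

A scheduled run is a handy way to catch breakage caused by changes in your dependencies, even when you haven't pushed anything yourself.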
Using GitHub to compile your documentation
In the previous lesson we learned how to use GitHub actions to build and test our python module. In this lesson we’ll learn how to compile our documentation with another GitHub action.
Creating a new workflow
The easiest way to create a new action is to use the
Actions tab of your GitHub repo, and select the
New workflow button.
 The New workflow button will take you to a page to choose
a template from a long list. Last time we started with a pre-made
template for building and testing python code. This time we are going to
start with a generic template so we should click on the “set up a
workflow yourself ->” link.
We should see the following template appear under the file name of
.github/workflows/main.yml:
YML
# This is a basic workflow to help you get started with Actions
name: CI

# Controls when the workflow will run
on:
  # Triggers the workflow on push or pull request events but only for the main branch
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]
  # Allows you to run this workflow manually from the Actions tab
  workflow_dispatch:

# A workflow run is made up of one or more jobs that can run sequentially or in parallel
jobs:
  # This workflow contains a single job called "build"
  build:
    # The type of runner that the job will run on
    runs-on: ubuntu-latest
    # Steps represent a sequence of tasks that will be executed as part of the job
    steps:
      # Checks-out your repository under $GITHUB_WORKSPACE, so your job can access it
      - uses: actions/checkout@v2
      # Runs a single command using the runners shell
      - name: Run a one-line script
        run: echo Hello, world!
      # Runs a set of commands using the runners shell
      - name: Run a multi-line script
        run: |
          echo Add other actions to build,
          echo test, and deploy your project.
First up we should change the filename to something other than
main.yml and the name field to something other than
CI. The section that controls when the workflow is run is
maybe a little over-zealous but we can leave it for now. Finally, the
first step which uses actions/checkout@v2 is always useful
as it will check out our repository into the build environment. The
named steps after this are not currently useful so we should delete
them. We’ll replace these with a new step which does the documentation
build for us.
Using a template from the market place
Since we want to build our documentation using Sphinx, we can find a ready made solution in the GitHub marketplace. On the right panel we select the marketplace and search for Sphinx. The first result “Sphinx Build” looks like it does what we want so we’ll select that.
When we click the “Sphinx Build” result we get the following snippet of code, which we can include in our workflow file.
YML
- name: Sphinx Build
  # You may pin to the exact commit or the version.
  # uses: ammaraskar/sphinx-action@8b4f60114d7fd1faeba1a712269168508d4750d2
  uses: ammaraskar/sphinx-action@0.4
  with:
    # The folder containing your sphinx docs.
    docs-folder: # default is docs/
    # The command used to build your documentation.
    build-command: # optional, default is make html
    # Run before the build command, you can use this to install system level dependencies, for example with "apt-get update -y && apt-get install -y perl"
    pre-build-command: # optional
To use the above template we make a new step called “Sphinx Build”
and copy the template into that step. As we do this we need to fill in
values for docs-folder and build-command. The
pre-build-command is optional and we won’t use it, so we can
either delete it or comment it out. Our step section now looks like
this:
YML
# Steps represent a sequence of tasks that will be executed as part of the job
steps:
  # Checks-out your repository under $GITHUB_WORKSPACE, so your job can access it
  - uses: actions/checkout@v2
  - name: Sphinx Build
    # You may pin to the exact commit or the version.
    uses: ammaraskar/sphinx-action@0.4
    with:
      # The folder containing your sphinx docs.
      docs-folder: docs/
      # The command used to build your documentation.
      build-command: make html
If we commit this file it will trigger all our on: push
workflows – both the build/test workflow that we created
earlier, and this documentation workflow. For my example I used the name
“Documentation with Sphinx” for the workflow, and when it completes I
see the following output.
The good news is that
the documentation built without error. The bad news is that I can’t see
the documentation to ensure that it’s actually useful. Note in the above
image that there are three fields listed for our job: Status, Total
duration, and Artifacts. Artifacts are files that are retained after a
job completes. By default there are no artifacts, but we could modify
our workflow so that the documentation is part of the artifacts.
Saving the documentation
To save the documentation that was built we need to let the GitHub workflow know that the documentation should be considered an artifact of the build. The advanced guide for GitHub actions demonstrates how to do this using another pre-made recipe, as follows:
YML
- name: 'Upload Artifact'
  uses: actions/upload-artifact@v2
  with:
    name: my-artifact
    path: my_file.txt
In our case we want the artifact to be the directory docs/build/ and we’ll give it the name documentation-html.
Our full workflow for this lesson now looks like this:
YML
# This is a basic workflow to help you get started with Actions
name: Build Documentation with Sphinx

# Controls when the workflow will run
on:
  # Triggers the workflow on push or pull request events but only for the main branch
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]
  # Allows you to run this workflow manually from the Actions tab
  workflow_dispatch:

# A workflow run is made up of one or more jobs that can run sequentially or in parallel
jobs:
  # This workflow contains a single job called "build-docs"
  build-docs:
    # The type of runner that the job will run on
    runs-on: ubuntu-latest
    # Steps represent a sequence of tasks that will be executed as part of the job
    steps:
      # Checks-out your repository under $GITHUB_WORKSPACE, so your job can access it
      - uses: actions/checkout@v2
      - name: Sphinx Build
        # You may pin to the exact commit or the version.
        uses: ammaraskar/sphinx-action@0.4
        with:
          # The folder containing your sphinx docs.
          docs-folder: docs/
          # The command used to build your documentation.
          build-command: make html
      - name: Archive documentation
        uses: actions/upload-artifact@v2
        with:
          name: documentation-html
          path: docs/build/
If we save and wait for the triggered workflow to complete we should see that the artifacts field is now 1, and that there is a new section on our page which shows all the artifacts that were produced.
 We can
click on the documentation-html link to get a zip of the
docs/build/ directory which contains all our
documentation.
Summary
Building documentation requires a similar workflow to building and testing code.
The GitHub marketplace offers a variety of pre-made ‘steps’ to include in your workflow. There are many that will let you build documentation with Sphinx.
When building documentation we need to ensure that the documentation that was built is considered part of the workflow ‘artifact’ so that we can download and view it after the workflow completes.
SOLO Activity: GitHub actions
In this activity you will apply the previous lessons to automate the documentation and testing using GitHub actions. This activity can be completed on your own solo project, or as part of a group project. The requirements are that the project has a python module that can be installed, has at least one function with a docstring, and at least one test function.
- For your software project create a GitHub action that will build and
test your python module.
- “Build” in this case means install the required dependencies and then install your module using pip install -e .
- The test phase can be carried out using either a test suite such as pytest, or by running your test script directly. If you are using a personal test script then you should ensure that it will exit with status != 0 when the tests fail, so that the build/test workflow will also report a failure.
- Use the GitHub online editor to create/modify the relevant .yml files, and set the on trigger to push so that the workflow will run each time that you push changes (including changes to the .yml file itself).
- Once you have a working build/test workflow, create a documentation workflow.
- Use the build/test workflow as a template and remove the test phase and replace it with a documentation build.
- Modify the documentation step such that it will produce an artefact which is the contents of the documentation directory.
- Verify that your documentation has completed properly by downloading and viewing the workflow artefact.
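If you opt for a personal test script rather than a test suite, the exit-status requirement mentioned in the activity can be met with a pattern like the following; the test function here is a trivial placeholder and the file name is an assumption:

```python
# test_skysim.py - a minimal hand-rolled test runner (placeholder tests)
import sys


def test_simple_sum():
    assert 1 + 1 == 2


TESTS = [test_simple_sum]


def run_tests(tests=TESTS):
    """Run each test, print any failures, and return the failure count."""
    failures = 0
    for test in tests:
        try:
            test()
        except AssertionError:
            failures += 1
            print(f"FAIL: {test.__name__}")
    return failures


if __name__ == "__main__":
    # the exit status equals the number of failures: 0 means success,
    # anything else will cause the CI step to be marked as failed
    sys.exit(run_tests())
```

Because the script exits with a non-zero status when any test fails, a plain `python test_skysim.py` step in the workflow will correctly report a failure.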
Content from Sharing Code To The Wider Community
Last updated on 2026-02-27 | Edit this page
Estimated time: 180 minutes
Overview
Questions
- Why should I share my code?
- Where and how can I share my code?
- How can I make my python code easy to obtain/install?
Objectives
- Have confidence to share your code
- Make your code findable and accessible
‘Publishing’ code
Once you have code that you are happy to share among collaborators you should consider publishing this code.
What do we mean by publishing? From least to greatest effort, any of the following could be considered as a definition for publishing code:
- Copying a version of your code to a public website for others to find and use,
- Making your version control repository (github, gitlab, bitbucket) public so that others can use and reuse your code,
- Uploading your code to a repository such as pypi.org so that others can easily download/install your software,
- Archiving a version of your code to a doi minting / storage service such as zenodo.org,
- Registering your code on a site such as ascl.net so that others can find your code,
- Writing a paper describing an application of your code and submitting for peer review in a science focused journal such as PASA or MNRAS,
- Writing a companion paper that describes your code and submitting the paper and code for peer review in a software focused journal such as A&C or JOSS.

From the descriptions above you can see that the different options have slightly different intentions and audiences. We can take a lead from the FAIR principles for data, and apply these principles to code.
Findable
Make code findable by creating a persistent identifier (eg doi) and including metadata. On pypi/zenodo/github you can use tags or topics to identify the software language but also the area of research or methodology that is being used. This makes it easier for people to find code that will suit their needs.
Accessible
Make code accessible by providing source code, install instructions, and documentation. Testing code on a range of platforms will also increase the accessibility of the code.
Interoperable
Use standard project templates, coding styles and idioms, and a modular design to allow your code to be used as part of a larger workflow or as a component of another product. This interoperability is useful for others, but will also make it easier for you to build on your own existing solutions.
Reasons not to publish
The following are often given as reasons not to publish code:
| Reason | Counter argument |
|---|---|
| I don’t want to have to “support” my code. | Publishing code does not commit you to providing endless support. If you do not intend to reply to emails, fix bugs, or make updates to the code, simply say so in the README.md file. If this is the case you could invite keen users to fork the repo and provide their own fixes. |
| I don’t want people to steal my good work. | With an appropriate license and attribution request, you can let others use your work while you benefit from their reuse. If there is a paper that describes or uses the code, you can ask for it to be cited by others, and this will increase the impact of that paper (and your h-index!). |
| My code is a bit hacky and I don’t want others to see it. | Being embarrassed about less-than-perfect code is normal. However, a quick scan of GitHub will show you that hacked-together code is very common, even among professional developers. If your code serves its intended purpose then it’s good enough to share. Consider writing a short blurb in the README.md file that clearly states the intended aim of the code, so that you can manage the expectations of your users. |
| I don’t know how to share my code. | It’s not difficult to share code, and it’s easy to learn. This course is one of many (MANY) that take you through the steps of sharing or publishing code. The small investment to learn how to share code will pay off quickly when you start to discover and use code written by others, get feedback and recognition for your code, or when your computer dies and you need to recover your work. |
| My code contains sensitive IP that I’m not allowed to share. | Good argument! Keep it somewhere secure. Many of the steps that you would take to prepare your code for publication are still worth doing to make your code usable within your trusted network. |
Obtaining a doi from Zenodo.org
You have a piece of code on github, but it is changing over time, and you would like to provide a link to a particular version of the code. This is important for reproducibility of your research work, both for yourself and others.
The Zenodo repository provides a safe, trusted, and citable place to host your code. Zenodo is primarily focused on the storage of data, but this includes: documentation, papers, posters, raw or processed data, source code, and compiled binaries. Zenodo will allow you to version your data but does not provide a version control system such as git. However, Zenodo and Github are friends so you can link them together to get the best of both worlds.
Sign up to Zenodo
You can create a new Zenodo account using an email address and password, or you can use your Github or ORCID accounts to login. Whatever you choose, you can still link your github/ORCID later and use them to sign in.
Create a new repository
Once signed in click on the upload button at the top of the page, and then on the next page click “New Upload”

The following page will have a lot of details, some of which are mandatory, but most of which are either recommended or optional. Begin by downloading a .zip of your files from Github, and then uploading it to Zenodo. Press the green “start upload” button and then start filling out the rest of the form.

As the upload is progressing you can fill in the upload type (Software) and basic information.
Leave the DOI blank, but click the “reserve DOI” button so that you can know what the final DOI will be.
Fill in the remainder of the form and then press “save” at the top of the page, this will make a draft of your repository that you can come back to later and update. When you are finally happy with all the details you can press ‘publish’.
Once your upload has been published you should navigate to the published repository in your uploads list, and select it.

The above example is for the Aegean source finding software that I developed. You can see the DOI badge with the full DOI, a link to supplementary material, and the licence. If you click the DOI badge you’ll get a new pane that shows you how to embed this information into a markdown file such as your README.md that you have on your github page!
Below this panel you can also see a box that allows people to cite your code. There is even a box that allows people to get the citation in any format that they need it.

You now have a version of your code which is archived on zenodo and will not change. If you want to update the archive with new versions of the code, Zenodo has the capacity to do this, and will mint a new doi for each version. It is recommended that you don’t make a new doi for every small change you make to your code. A new version for each major or minor version change would be appropriate, or when you have published work that used a particular major/minor/patch version of the code.
Sadly Zenodo is not indexed by ADS so you can’t track citations very well from here.
Registering your code on ascl.net
In the last lesson we saw how to obtain a DOI by uploading a software project to Zenodo. In this lesson we’ll make an entry in the Astrophysics Source Code Library (ASCL.net), which is indexed by ADS, and can help you gain an audience, and track citations.
Submit a code to ASCL
Go to the “submit a code” page here.
The page asks for a title, credit (authors/contributors), abstract, and a site list for the code.
The site list should be a list of links to places where people can obtain the code. I highly recommend that you put a link to both your Github and Zenodo repositories. If the code was described in a paper you can put that in the “Preferred Citation Method” section.

ASCL.net does not store your code. No one will vet the quality of your code. ASCL.net is simply a place to register that some code exists and that you’d like to be acknowledged for creating/contributing. ASCL.net is indexed by ADS so it will get a bibcode in ADS, which can then be used to generate a bibtex entry for people to use when citing your code. Another goal of ASCL.net is to make it easier for people to find your code in the first place. If you haven’t explored the code available here I recommend that you do so now – there are some gems.
You can also get a nice little badge from ASCL to add to your
README.md file!
Installing your module via setup.py
In our code directory we specified a requirements.txt
file that allowed users to easily install the dependencies for our
code. However, if someone wants to run our main script
(sim_catalog) then they have to be in the code directory.
If we want to run the code from some other location on our system then
it won’t work. The reason is that we haven’t installed our code as a
python module.
Installing a python module will do the following:
- Copy the module directory and files (eg skysim/*) to a central location so that python can access them no matter where it’s run from,
  - something like /home/${USER}/.py3/lib/python3.8/site-packages/;
- Copy any scripts (eg sim_catalog) to a similar location so that they can be invoked from anywhere,
  - something like /home/${USER}/.py3/bin/;
- Make a note that the module is installed,
  - so that pip freeze will report the name/version of the software.
In order to install a python module you need a special file called
setup.py.
Template for setup.py
The python
documentation covers all the gory details of the how and why of
using a setup.py file. However, a great place to start is
to use a template so we’ll provide one here to get started.
PYTHON
import setuptools

import skysim

with open("README.md", "r", encoding="utf-8") as fh:
    long_description = fh.read()

with open("requirements.txt", "r") as fh:
    reqs = [line for line in fh.readlines() if not line.startswith("#")]

setuptools.setup(
    name="SkySim",
    version=skysim.__version__,
    author=skysim.__author__,
    author_email="author@example.com",
    description="Simulate sky locations",
    long_description=long_description,
    long_description_content_type="text/markdown",
    url="https://github.com/DevOne/sky_sim",
    packages=["skysim"],  # the module directory to install
    install_requires=reqs,  # dependencies read from requirements.txt
    scripts=["scripts/sim_catalog"],
    python_requires=">=3.6",
)
Note the following:
- we import the skysim module so that we can read the __version__ and __author__ information directly,
- we populate the long_description by reading the README.md file,
- we read the requirements from the requirements.txt file,
- we have set a minimum python version for this program, something that we can’t do with a requirements.txt file,
- we have moved the script sim_catalog from the root directory into the scripts/ directory.
The above notes are in keeping with the good coding practice of not repeating ourselves. All the information is stored in a single location and duplication is minimised.
Installing the module
For someone to install our software they can either clone the
repository and run pip install . in the repository
directory, or download a .zip file from github, unzip it, and
then run pip install . in the same directory.
The final ‘.’ indicates that pip should install the
module defined in this directory. Pip will search for a
setup.py file for the required information.
Developer mode
As a developer it is annoying to have to uninstall/install your module every time you make a change and want to check that things are working. Pip has a nice solution to this: a developer mode install. Unlike a regular install, a developer mode install will not copy files to the python directories, but will instead link back to your working copy. This means that your changes to the files will be immediately used in the installed version of the code. If you move or add files, however, you’ll need to uninstall/install the code again.
A developer mode install can be done using the -e flag
for pip: pip install -e .
Upgrading
If users want to upgrade the module they have to download or pull the
new version, uninstall the old one, and then install the new one. It can
be a little tedious and easy to forget. Luckily there is a python
package index pypi.org which pip can look
to in order to find different versions of your software. It is thanks to
pypi.org that pip knows how to install all the modules that we listed in
our requirements.txt file. We’ll explore the python package
index in the next lesson.
Hosting your code on pypi.org
The python package index at pypi.org
is the first place that pip will look when asked to install a package.
If your package is available via pypi.org then your users can install
your software simply by typing pip install <package>.
Installing from other locations is still possible and sometimes easy. In
this lesson we’ll go through the process of preparing your code and
uploading it to pypi.org. This content is based on the instructions at
Packaging
Python Projects (PyPA), which you should also refer to.
Name check
Unlike GitHub, where the project name is disambiguated based on the project owner, pypi requires that each package have its own unique name. So before you think about uploading your package to pypi.org you should do a search to see if your package name is already in use, and choose a new one if this is the case.
Once you have a name you like you should google it. This is what many people will do when looking for your code and if the results that come back are for a different software project (e.g. same name different language), or are NSFW then people will be less likely to keep looking.
You can choose nearly anything you like as the package name, however PEP423 has some advice about how you might choose and structure your package name.
Project structure
If you use the structure outlined in Beginning a
new software project, and have created a setup.py file
as described in Installing your module via
setup.py, then you are good to go. The project structure recommended
in these lessons differs from that which is given on the PyPA site but
will still work.
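For reference, the project layout assumed throughout these lessons looks roughly like this (names taken from the earlier lessons):

```
sky_sim/
├── README.md
├── requirements.txt
├── setup.py
├── docs/
├── scripts/
│   └── sim_catalog
└── skysim/
    └── __init__.py
```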
Project metadata
In addition to the metadata that we have already included in the
setup.py file, we can include information that has special
meaning on the pypi site. This information is used to populate many of
the links on the landing page for a package. An example for the
numpy package is shown below:

The information in the project description is
rendered from the long_description metadata using the
long_description_content_type to inform how the rendering
should be done. Many projects choose to copy their
README.md file into this field (as we did in a previous
lesson), however you can of course create a custom description for a
pypi audience.
The information in the project links section is
drawn from the project_urls dictionary. You can have any
description/links you like here. Some of the obvious ones get
appropriate icons but these cannot be customised.
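As a sketch, the corresponding dictionary passed to setuptools.setup() might look like the following. Only the GitHub URL comes from the earlier lessons; the other links are hypothetical:

```python
# a hypothetical project_urls dictionary for setuptools.setup();
# only the GitHub URL is taken from the earlier lessons
project_urls = {
    "Documentation": "https://sky-sim.readthedocs.io",  # hypothetical
    "Source": "https://github.com/DevOne/sky_sim",
    "Bug Tracker": "https://github.com/DevOne/sky_sim/issues",
}
```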
Another section on the pypi landing page that you may want to populate is “Classifiers”. For numpy they are shown below.

The classifiers section helps people see additional metadata in a
regular format, and also allows people to search for other projects with
similar classifiers. Classifiers are specified as a list of strings in
the classifiers parameter within setup.py.
These classifiers can be changed for each release of a project. A list
of classifiers can be found at pypi.org/classifiers/. It is a
long list, so maybe the first point of call would be to look at a
package like numpy or astropy that you think would have similar
classifiers and copy the relevant ones into your package.
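As an illustration, a plausible set of classifiers for a package like ours might look like the following. Each string must exactly match an entry at pypi.org/classifiers/, and the licence and development status shown here are assumptions:

```python
# a hypothetical classifiers list for setuptools.setup();
# each string must match an entry at pypi.org/classifiers/
classifiers = [
    "Development Status :: 4 - Beta",        # assumed maturity
    "Intended Audience :: Science/Research",
    "License :: OSI Approved :: MIT License",  # assumed licence
    "Programming Language :: Python :: 3",
    "Topic :: Scientific/Engineering :: Astronomy",
]
```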
Building a distribution package
PyPA provides a package simply called build that will
create a package from your project. Install it using
python3 -m pip install --upgrade build, and then run it
using python3 -m build. It will create a directory called
dist/ which contains a .tar.gz file (your
source archive) and a .whl file which is the build
distribution. Congratulations, you now have a package that can be served
by pypi – you just need a way to upload it.
Uploading to pypi
The first thing that you need is to create an account on pypi.org. Accounts are free and easy to make so go sign up here.
The second thing you’ll need is an API token. As noted on PyPA:
To securely upload your project, you’ll need a PyPI API token. Create one at https://test.pypi.org/manage/account/#api-tokens, setting the “Scope” to “Entire account”. Don’t close the page until you have copied and saved the token — you won’t see that token again.
https://packaging.python.org/en/latest/tutorials/packaging-projects/
Uploading to pypi is handled by the twine module which
can be installed using
python3 -m pip install --upgrade twine.
To upload your package use: python3 -m twine upload dist/*
When prompted for a username you should use __token__
and for your password you should use
pypi-<API_Token_Value>.
If you want to test that your upload will work and see what the page
would look like without placing it on the main package index you can use
the test repository. To do this just add
--repository testpypi after twine in the above
command. When you upload to the test repository you can see the results
at
https://test.pypi.org/project/example-pkg-YOUR-USERNAME-HERE.
This test site is good to use for the first time you are uploading as
you’ll invariably want to make some small changes once you’ve seen how
the landing page renders.
Congratulations, you now have a package published in the python package index. For more details and some alternatives to the method given above you are referred to the PyPA tutorial.
