1. What is Version Control

https://phdcomics.com/comics.php?f=1531

Not this.

1.1 What is Version Control?

Version control, also known as source control, is the practice of tracking and managing changes to software code. Version control systems are software tools that are used to help individuals and software development teams manage changes to code/documents over time.

Consider the following scenario. You and your colleagues have been asked to do an analysis of the Work Environment Survey study to determine if workplace satisfaction has increased or decreased significantly since the start of the pandemic. You and your colleagues want to be able to work on the project at the same time, but have run into problems doing this in the past. If you take turns, each of you will spend a lot of time waiting for the other to finish, but if you work on your own copies and save them to the local network, things will be lost, overwritten, or duplicated. A colleague suggests using version control to manage your work.

We’ve all been in this situation before: it seems unnecessary to have multiple nearly-identical versions of the same document. Some word processors let us deal with this a little better, such as Microsoft Word’s Track Changes and Google Docs’ version history.

Version control systems start with a base version of the document and then record changes you make each step of the way. You can think of it as a recording of your progress: you can rewind to start at the base document and play back each change you made, eventually arriving at your more recent version.

Changes Are Saved Sequentially

Once you think of changes as separate from the document itself, you can then think about “playing back” different sets of changes on the base document, ultimately resulting in different versions of that document. For example, two users can make independent sets of changes on the same document.

Different Versions Can be Saved and Merged

Unless multiple users make changes to the same section of the document - a conflict - you can incorporate two sets of changes into the same base document.

A version control system is a tool that keeps track of these changes for us, effectively creating different versions of our files. It allows us to decide which changes will be made to the next version (each record of these changes is called a commit), and keeps useful metadata about them. The complete history of commits for a particular project and their metadata make up a repository. Repositories can be kept in sync across different computers, facilitating collaboration among different people.

1.2 Why Version Control?

Version control is the lab notebook of the digital world: it’s what professionals use to keep track of what they’ve done and to collaborate with other people.

Version control is better than sharing files back and forth because:

  • Nothing that is committed to version control is ever lost, unless you work really, really hard at it. Since all old versions of files are saved, it’s always possible to go back in time to see exactly who wrote what on a particular day, or what version of a program was used to generate a particular set of results.

  • As we have this record of who made what changes when, we know who to ask if we have questions later on, and, if needed, revert to a previous version, much like the “undo” feature in an editor.

  • When several people collaborate in the same project, it’s possible to accidentally overlook or overwrite someone’s changes. The version control system automatically notifies users whenever there’s a conflict between one person’s work and another’s.

Note

Teams are not the only ones to benefit from version control: lone researchers can benefit immensely. Keeping a record of what was changed, when, and why is extremely useful for all researchers if they ever need to come back to the project later on (e.g., a year later, when memory has faded).

Related Reading

Excuse me, do you have a moment to talk about version control? Jenny Bryan https://peerj.com/preprints/3159/

1.3 What is a Version Control Repository?

In version control systems, a repository is a data structure that stores metadata for a set of files or directory structure. Some of the metadata that a repository contains includes, among other things, a historical record of changes in the repository, a set of commit objects, and a set of references to commit objects, called heads.

Essentially, a version control repository keeps track of a set of files in a project. It is like a regular project folder containing sub folders to organize your work, but with an additional .git folder in which to house metadata. We will go into more detail on how the .git folder and metadata files are created during the workshop.

Using the WES analysis example above, you and your colleagues may choose to structure your version control project like this:

├── wes-analysis
|   ├── data    
|   │   ├── wes-results
|   │   │   ├── wes-results-2019.csv  <-- DON'T WORRY, THESE WON'T BECOME PUBLIC!
|   │   │   ├── wes-results-2020.csv
|   │   │   ├── wes-results-2021.csv
|   │   |   |── wes-results-2022.csv
|   ├── doc
|   ├── ref
|   │   ├── references
|   │   ├── images
|   ├── plots
|   ├── code
|   ├── .git      <------------------------ THIS MAKES IT A GIT REPOSITORY!
|   ├── .gitignore <-------------- FILES listed here are ignored by git (not tracked, e.g., csv files)