23 Workflow Packages

The main focus of a workflow package is the vignette!

23.1 What is a workflow vignette?

Workflow vignettes are documents which describe a bioinformatics workflow that involves multiple Bioconductor packages. These workflows are usually more extensive than the vignettes that accompany individual Bioconductor packages.

Existing Workflows

Workflow vignettes may deal with larger data sets and/or be more computationally intensive than typical Bioconductor package vignettes. For this reason, the automated builder that produces these vignettes does not have a time limit (in contrast to the Bioconductor package building system, which will time out if package building takes too long). The majority of vignette code chunks are expected to be evaluated.

23.2 Who should write a workflow vignette?

Anyone who is a bioinformatics domain expert.

23.3 How do I write and submit a workflow vignette?

  • Write a package with the same name as the workflow. The workflow vignette, written in Markdown using the rmarkdown package, should be included in the vignettes/ directory. You may include more than one vignette, but please give each one a descriptive, identifying name.

  • The package does not need man/, R/, or data/ directories. Ideally, workflows make use of existing data in a Bioconductor repository or on the web; the workflow package itself should not contain large data files.

  • In the DESCRIPTION file, include the line “BiocType: Workflow”. Please also include a detailed Description field. The biocViews field in the DESCRIPTION file should use terms from the Workflow branch. If you think a new term is relevant please reach out to .

  • Submit the package to the GitHub submission tracker for a formal review. Please also indicate in the tracker issue that this package is a workflow.

  • Workflows are under git version control. Once the package is accepted, it will be added to our git repository at and instructions will be sent for gaining access for maintenance.
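
The DESCRIPTION requirements above can be sketched as follows. This is a minimal illustration, not an official template: the package name myWorkflow, the author details, the license, and the packages listed under Depends are hypothetical placeholders, and GeneExpressionWorkflow is only one example of a term from the Workflow branch, so check the current biocViews vocabulary before choosing terms.

```text
Package: myWorkflow
Title: A Hypothetical Example Workflow
Version: 0.99.0
Authors@R: person("Workflow", "Author", email = "author@example.org",
    role = c("aut", "cre"))
Description: Placeholder text. Replace with a detailed description of
    what the workflow covers, which data it uses, and which
    Bioconductor packages it ties together.
License: Artistic-2.0
Depends: GenomicRanges, rtracklayer
Suggests: knitr, rmarkdown, BiocStyle
VignetteBuilder: knitr
BiocType: Workflow
biocViews: Workflow, GeneExpressionWorkflow
```

In this sketch the vignette would live at vignettes/myWorkflow.Rmd, matching the package name.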

23.4 Consistent formatting

  • To standardize the workflow vignette format, we strongly encourage using either BiocStyle or BiocWorkflowTools for formatting. The following header shows how to use BiocStyle in the vignette:

    output:
        BiocStyle::html_document
  • The following should also be included:

    - author affiliations
    - a date indicating when the workflow vignette was last modified
  • The first section should have some versioning information. The R version, Bioconductor version, and package version should be visible. The following is an example of how this could be achieved:

    **R version**: `r R.version.string`
    **Bioconductor version**: `r BiocManager::version()`
    **Package**: `r packageVersion('annotation')`
  • An example start to a workflow vignette:

---
title: Workflow Vignette Title
author:
 - name: Workflow Author
   affiliation: Workflow Author Affiliation
date: "`r format(Sys.time(), '%B %d, %Y')`"
vignette: >
  %\VignetteIndexEntry{Workflow Introduction}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
output:
    BiocStyle::html_document
---

# Version Info

```{r, include=TRUE, results="hide", warning=FALSE, message=FALSE}
library('BiocGenerics')
```

**R version**: `r R.version.string`
**Bioconductor version**: `r BiocManager::version()`
**Package**: `r packageVersion('BiocGenerics')`

23.5 Tidying package loading output

Most workflows load a number of packages, and the output of loading those packages should not clutter your workflow document. Here is how to solve this in R Markdown. In LaTeX, you may have to create two code chunks: one that is evaluated but hidden, and one that is shown but not evaluated.

For R Markdown, set up a code chunk with the following chunk options so that the code is visible but the results are hidden. We also set warning=FALSE and message=FALSE to be sure that no output from this chunk ends up in the document:

```{r, include=TRUE, results="hide", warning=FALSE, message=FALSE}
library(GenomicRanges)
library(GenomicAlignments)
library(Biostrings)
library(Rsamtools)
library(ShortRead)
library(BiocParallel)
library(rtracklayer)
library(VariantAnnotation)
library(AnnotationHub)
library(BSgenome.Hsapiens.UCSC.hg19)
library(RNAseqData.HNRNPC.bam.chr14)
```
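
For LaTeX vignettes, the two-chunk approach mentioned above might look like the sketch below. It assumes classic Sweave chunk syntax in an .Rnw file; the chunk labels are hypothetical, and only two of the packages are shown to keep it short. If the vignette is built with knitr instead, the option would be written results='hide'.

```rnw
<<load-packages, echo=FALSE, results=hide>>=
## Evaluated but hidden: the packages are attached and nothing is printed.
suppressPackageStartupMessages({
    library(GenomicRanges)
    library(Rsamtools)
})
@

<<show-packages, eval=FALSE>>=
## Shown but not evaluated: the reader sees the library() calls.
library(GenomicRanges)
library(Rsamtools)
@
```

The first chunk does the actual loading, so the second chunk exists purely for display and must list the same packages.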

23.6 Citations

To manage citations in your workflow document, specify the bibliography file in the document metadata header.

    bibliography: references.bib

You can then use citation keys in the form of @label to cite an entry with the identifier “label”.

Normally, you will want to end your document with a section header “References” or similar, after which the bibliography will be appended.

For more details see the rmarkdown documentation.
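
As a rough illustration of the pieces described above, a references.bib file could contain an entry like the one below. The entry is a hypothetical placeholder, not a real reference; only the key label matters for the example.

```bibtex
% Hypothetical placeholder entry for illustration only.
@article{label,
  author  = {Author, Example},
  title   = {Placeholder title},
  journal = {Placeholder Journal},
  year    = {2000}
}
```

The vignette would then point to that file in its header and cite the entry by key. The title is reused from the earlier example, and the sentences are placeholder text:

```markdown
---
title: Workflow Vignette Title
bibliography: references.bib
---

This step follows the approach described by @label. A citation in
parentheses is written like this [@label].

# References
```

Leaving the final section empty lets the generated bibliography appear directly under the “References” header.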

23.7 Questions

If you have any questions, please ask on the bioc-devel mailing list.