4 Important Bioconductor Package Development Features
4.1 biocViews
Packages added to the Bioconductor Project require a biocViews:
field in their DESCRIPTION file. The field name “biocViews” is
case-sensitive and must begin with a lower-case ‘b’.
biocViews terms are “keywords” used to describe a given package. They are broadly divided into four categories, representing the types of packages present in the Bioconductor Project:
- Software
- Annotation Data
- Experiment Data
- Workflow
biocViews are available for the release and devel branches of Bioconductor. The devel branch has a checkbox under the tree structure which, when checked, displays biocViews that are defined but not used by any package, in addition to biocViews that are in use. See also the description section.
4.1.1 Motivation
biocViews serve two broad purposes:
1. A researcher might want to identify all packages in the Bioconductor Project that relate to a specific purpose. For example, one may want to look for all packages related to “Copy Number Variants”.
2. During development, a package contributor can “tag” their package with biocViews so that someone looking for packages (as in scenario 1) can easily find it.
4.1.2 biocViews during new package development
Visit the ‘devel’ biocViews when you are in the process of adding biocViews to your new package. Identify as many terms as appropriate from the hierarchy. Prefer ‘leaf’ terms at the end of the hierarchy, over more inclusive terms. Remember to check the box displaying all available terms.
Please note:
- Your package will belong to only one part of the Bioconductor Project (Software, Annotation Data, Experiment Data, Workflow), so choose biocViews only from that category.
- biocViews listed in your package must match exactly (e.g., spelling, capitalization) the terms in the biocViews hierarchy.
When you submit your new package for review, your package is checked and built by the Bioconductor Project. We check the following for biocViews:
- The package contributor has added biocViews.
- The biocViews are valid.
- The package contributor has added biocViews from only one of the categories.
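The three checks above can be approximated locally before submission. The following is a minimal sketch in base R, not the check that Bioconductor itself runs: the package name `examplePkg`, the helper `check_biocviews()`, and the small `vocab` lookup table are all hypothetical stand-ins, and the terms shown should be verified against the devel biocViews hierarchy.

```r
# Hypothetical DESCRIPTION content; the package name and terms are illustrative.
desc_lines <- c(
  "Package: examplePkg",
  "Version: 0.99.0",
  "biocViews: Software, Sequencing, CopyNumberVariation"
)
desc_file <- tempfile("DESCRIPTION")
writeLines(desc_lines, desc_file)

# Stand-in vocabulary (an assumed subset, not the real hierarchy):
# names are terms, values are the top-level category each term belongs to.
vocab <- c(
  Software            = "Software",
  Sequencing          = "Software",
  CopyNumberVariation = "Software",
  AnnotationData      = "AnnotationData",
  ExperimentData      = "ExperimentData",
  Workflow            = "Workflow"
)

# Hypothetical helper: reports whether the field exists, which terms are
# not in the vocabulary, and which top-level categories the terms span.
check_biocviews <- function(path, vocab) {
  desc <- read.dcf(path)
  # The field name is case-sensitive: "biocViews", not "BiocViews".
  if (!"biocViews" %in% colnames(desc)) {
    return(list(present = FALSE, invalid = character(),
                categories = character()))
  }
  terms <- trimws(strsplit(desc[1, "biocViews"], ",")[[1]])
  terms <- terms[nzchar(terms)]
  # Matching is exact, so spelling and capitalization both matter.
  invalid <- setdiff(terms, names(vocab))
  categories <- unique(unname(vocab[intersect(terms, names(vocab))]))
  list(present = TRUE, invalid = invalid, categories = categories)
}

result <- check_biocviews(desc_file, vocab)
```

A package passes this sketch when `result$present` is `TRUE`, `result$invalid` is empty, and `result$categories` has length one.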
If you receive a “RECOMMENDED” direction for any of these biocViews after you have submitted your package, you can try correcting them on your own following the directions given here or ask your package reviewer for more information.
If a developer thinks a biocViews term should be added to the current list of accepted terms, please email bioc-devel@r-project.org stating the new term, the hierarchy under which it should be placed, and the justification for adding it.
4.2 Common Bioconductor Methods and Classes
We strongly recommend reusing existing methods for importing data, and reusing established classes for representing data. Here are some suggestions for importing different file types and commonly used Bioconductor classes. For more classes and functionality also try searching in biocViews for your data type.
4.2.1 Importing data
- GTF, GFF, BED, BigWig, etc. – rtracklayer::import()
- VCF – VariantAnnotation::readVcf()
- SAM / BAM – Rsamtools::scanBam(), GenomicAlignments::readGAlignment*()
- FASTA – Biostrings::readDNAStringSet()
- FASTQ – ShortRead::readFastq()
- MS data (XML-based and mgf formats) – Spectra::Spectra(), Spectra::Spectra(source = MsBackendMgf::MsBackendMgf())
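As an illustration of reusing an existing importer, the sketch below writes a tiny BED file to a temporary location and reads it with `rtracklayer::import()`. The file contents and feature names are made up for the example, and it assumes the rtracklayer package is installed.

```r
library(rtracklayer)
library(GenomicRanges)

# Write a two-record BED file (tab-separated: chrom, start, end, name).
# The records are invented purely for illustration.
bed_file <- tempfile(fileext = ".bed")
writeLines(
  c("chr1\t0\t100\tfeatA",
    "chr1\t200\t300\tfeatB"),
  bed_file
)

# import() infers the format from the ".bed" extension and returns a
# GRanges object. BED intervals are 0-based and half-open, whereas
# GRanges uses 1-based, closed intervals; the importer handles the
# conversion.
gr <- import(bed_file)

starts <- start(gr)
widths <- width(gr)
```

Returning a standard `GRanges` object, rather than a custom data frame, lets downstream Bioconductor tools consume the result directly.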
4.2.2 Common Classes
- Rectangular feature x sample data – SummarizedExperiment::SummarizedExperiment() (RNAseq count matrix, microarray, …)
- Genomic coordinates – GenomicRanges::GRanges() (1-based, closed interval)
- Genomic coordinates from multiple samples – GenomicRanges::GRangesList()
- Ragged genomic coordinates – RaggedExperiment::RaggedExperiment()
- DNA / RNA / AA sequences – Biostrings::*StringSet()
- Gene sets – BiocSet::BiocSet(), GSEABase::GeneSet(), GSEABase::GeneSetCollection()
- Multi-omics data – MultiAssayExperiment::MultiAssayExperiment()
- Single cell data – SingleCellExperiment::SingleCellExperiment()
- Mass spec data – Spectra::Spectra()
- File formats – BiocIO::`BiocFile-class`
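To make the first entry concrete, here is a minimal sketch of wrapping a small count matrix in a `SummarizedExperiment`. The gene names, sample names, counts, and the `condition` column are invented for illustration, and the example assumes the SummarizedExperiment package is installed.

```r
library(SummarizedExperiment)

# Toy feature x sample count matrix (values are made up).
counts <- matrix(
  c(10L, 0L, 5L,
    3L, 8L, 2L),
  nrow = 3,
  dimnames = list(c("gene1", "gene2", "gene3"),
                  c("sampleA", "sampleB"))
)

# Per-sample annotation; row names must match the matrix column names.
col_data <- DataFrame(
  condition = c("control", "treated"),
  row.names = colnames(counts)
)

# Bundle the assay and the sample annotation into one object.
se <- SummarizedExperiment(
  assays = list(counts = counts),
  colData = col_data
)

# Subsetting keeps the assay and the sample annotation in sync.
treated <- se[, se$condition == "treated"]
```

Keeping the matrix and its annotation in a single container is what allows the coordinated subsetting in the last line, and it is the representation many Bioconductor packages already expect as input.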
In general, a package will not be accepted if it does not show interoperability with the current Bioconductor ecosystem.
4.3 Vignette
Every submitted Bioconductor package should have at least one Rmd (preferred) or
Rnw vignette, ideally using BiocStyle::html_document as the output format.
The vignette should include evaluated R package code and a detailed
introduction/abstract section that provides motivation for inclusion in
Bioconductor and, when appropriate, a review of and comparison to existing
Bioconductor packages with similar functionality or scope. See the vignette
documentation section for more details.
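For an Rmd vignette, the output format is set in the YAML header at the top of the file. The header below is a sketch of a common layout, not a required template: the title, author, and package name `examplePkg` are placeholders, and it assumes the knitr, rmarkdown, and BiocStyle packages are available when the vignette is built.

```yaml
---
title: "Introduction to examplePkg"
author: "Author Name"
output: BiocStyle::html_document
vignette: >
  %\VignetteIndexEntry{Introduction to examplePkg}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```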