5 Common Bioconductor Methods and Classes

5.1 Motivation

Bioconductor is a large and diverse project whose packages cover a wide range of biological data types and statistical methods. Many of these packages share a rich set of classes and methods. Reusing those existing classes and methods is therefore important: it keeps a package interoperable with the rest of the Bioconductor software ecosystem. Shared data representations also let users integrate analysis workflows across multiple Bioconductor packages, which provides a more seamless user experience.

Many classes in Bioconductor are implemented using the S4 object-oriented system in R. The S4 system is particularly well-suited for the representation of complex genomic data structures. The initial motivations to use S4 in Bioconductor were centered around its benefits over other systems such as S3. These benefits include, but are not limited to, formal class definitions, multiple inheritance, and validity checking.
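The S4 features named above can be sketched with base R's methods package alone. The class names GenomicInterval and NamedInterval and the generic intervalWidth() are hypothetical, chosen only for illustration; they are not Bioconductor classes.

```r
library(methods)

# Formal class definition with typed slots and a validity method.
# "GenomicInterval" is a hypothetical class used only for illustration.
setClass("GenomicInterval",
    slots = c(chrom = "character", start = "integer", end = "integer"),
    validity = function(object) {
        if (length(object@start) != length(object@end))
            return("'start' and 'end' must have the same length")
        if (any(object@end < object@start))
            return("'end' must be >= 'start'")
        TRUE
    }
)

# Inheritance: the subclass adds a slot and keeps the parent's validity check.
setClass("NamedInterval",
    contains = "GenomicInterval",
    slots = c(name = "character")
)

# A generic with a method for the parent class (hypothetical name).
setGeneric("intervalWidth", function(x) standardGeneric("intervalWidth"))
setMethod("intervalWidth", "GenomicInterval",
    function(x) x@end - x@start + 1L)

iv <- new("NamedInterval", chrom = "chr1", start = 1L, end = 100L,
          name = "geneA")
intervalWidth(iv)  # the parent's method is dispatched for the subclass

# An invalid object is rejected at construction time; keep the message.
bad <- tryCatch(
    new("GenomicInterval", chrom = "chr1", start = 10L, end = 5L),
    error = function(e) conditionMessage(e)
)
```

In practice a package would rarely define an interval class like this, since GenomicRanges already provides one; the sketch only shows the mechanics.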

Although Bioconductor promotes the reuse of existing S4 classes to represent genomic data, new classes are sometimes needed, for example to support cutting-edge technologies. Such classes should ideally be developed in open discussion with the Bioconductor community.

5.1.1 Use Case: Importing data

For developers who import data into their package, it is important to know which packages and methods are available for reuse. The following list provides commonly used packages and their methods to import various data types:

GTF, GFF, BED, BigWig, etc. – rtracklayer::import()
VCF – VariantAnnotation::readVcf()
SAM / BAM – Rsamtools::scanBam(), GenomicAlignments::readGAlignment*()
FASTA – Biostrings::readDNAStringSet()
FASTQ – ShortRead::readFastq()
MS data (XML-based and mgf formats) – Spectra::Spectra(), Spectra::Spectra(source = MsBackendMgf::MsBackendMgf())

This list is not exhaustive, and developers are encouraged to initiate dialogue with other community members to identify additional packages and methods that may be useful for their specific use case. We acknowledge that class and method discoverability can be a challenge and we are working to improve this aspect of the Bioconductor project.
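As a minimal sketch of two of the import functions listed above, the following writes small temporary files and reads them back. It assumes that rtracklayer and Biostrings are installed; the file contents are made up for illustration.

```r
# Assumes rtracklayer and Biostrings are installed,
# e.g. via BiocManager::install(c("rtracklayer", "Biostrings")).
library(rtracklayer)
library(Biostrings)

# Write a tiny BED file (BED uses 0-based, half-open intervals).
bed_path <- tempfile(fileext = ".bed")
writeLines(c("chr1\t0\t100\tfeatA", "chr2\t49\t150\tfeatB"), bed_path)

# import() infers the format from the file extension and returns a
# GRanges object with 1-based, closed coordinates.
regions <- import(bed_path)

# Write a tiny FASTA file and read it as a DNAStringSet.
fa_path <- tempfile(fileext = ".fasta")
writeLines(c(">seq1", "ACGTACGT", ">seq2", "GGCC"), fa_path)
seqs <- readDNAStringSet(fa_path)

start(regions)  # BED start 0 becomes 1 in the GRanges
width(seqs)     # sequence lengths
```

Returning standard classes such as GRanges and DNAStringSet from an import step means downstream code can use methods from other Bioconductor packages without conversion.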

5.1.2 Common Classes

The following table, though certainly not exhaustive, provides select classes and constructor functions to represent genomic data:

Data Type	Package and Function	Description
Rectangular feature by sample	`SummarizedExperiment::SummarizedExperiment()`	RNA-seq count matrix, microarray, etc.
Genomic coordinates	`GenomicRanges::GRanges()`	1-based, closed interval genomic coordinates
Genomic coordinates (multiple)	`GenomicRanges::GRangesList()`	Genomic coordinates from multiple samples
Ragged genomic coordinates	`RaggedExperiment::RaggedExperiment()`	Ragged (variable length) genomic coordinates
DNA/RNA/AA sequences	`Biostrings::*StringSet()`	DNA, RNA, or amino acid sequences
Gene sets	`BiocSet::BiocSet()`, `GSEABase::GeneSet()`, `GSEABase::GeneSetCollection()`	Collections of gene sets
Multi-omics data	`MultiAssayExperiment::MultiAssayExperiment()`	Data integrating multiple omics assays
Single cell data	`SingleCellExperiment::SingleCellExperiment()`	Single-cell expression and related data
Mass spec data	`Spectra::Spectra()`	Mass spectrometry data
File formats	`BiocIO::BiocFile-class`	Classes for interacting with various biological data file formats

Search biocViews for other classes and methods that may be useful for your package.
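To illustrate two of the constructors from the table, the sketch below builds a GRanges object and uses it as the row ranges of a SummarizedExperiment. It assumes GenomicRanges and SummarizedExperiment are installed; the gene identifiers, counts, and sample names are invented for the example.

```r
# Assumes GenomicRanges and SummarizedExperiment are installed.
library(GenomicRanges)
library(SummarizedExperiment)

# Genomic coordinates for three made-up features (1-based, closed).
gr <- GRanges(
    seqnames = c("chr1", "chr1", "chr2"),
    ranges = IRanges(start = c(1, 101, 51), end = c(100, 200, 150)),
    strand = c("+", "-", "*"),
    gene_id = c("geneA", "geneB", "geneC")
)
names(gr) <- gr$gene_id

# A features-by-samples count matrix whose row names match the ranges.
counts <- matrix(
    c(10L, 0L, 5L, 3L, 8L, 2L), nrow = 3,
    dimnames = list(names(gr), c("sample1", "sample2"))
)

# Sample-level annotation, one row per column of the matrix.
col_data <- DataFrame(
    condition = c("control", "treated"),
    row.names = c("sample1", "sample2")
)

se <- SummarizedExperiment(
    assays = list(counts = counts),
    rowRanges = gr,
    colData = col_data
)

# Assay data, row ranges, and sample annotation stay in sync on subsetting.
treated <- se[, se$condition == "treated"]
on_chr1 <- subsetByOverlaps(se, GRanges("chr1:1-150"))
```

Because the ranges and the sample annotation travel with the count matrix, a single subsetting call keeps all three consistent.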

5.2 Package Submission Considerations

Bioconductor strives for interoperability across packages, and package submissions are generally not accepted unless they demonstrate such interoperability, typically by reusing existing Bioconductor classes and methods where appropriate. Submissions that introduce new classes or data structures must provide strong justification and clearly describe how they interoperate with existing Bioconductor infrastructure.

When the data do not fit an existing data class, we recommend discussing the design of a new class with the Bioconductor community. This open discussion can take place on the main Bioconductor communication channels, such as the bioc-devel mailing list or the Bioconductor community Slack.

5.3 Package Implementations

The following packages are examples that reuse Bioconductor classes and methods:

Package	Reuses classes and methods from
DESeq2	SummarizedExperiment, GenomicRanges
GenomicAlignments	GenomicRanges, Rsamtools
VariantAnnotation	GenomicRanges, SummarizedExperiment, Rsamtools
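One common way for a package to reuse an existing class is to extend it, so that methods written for the parent class keep working. The following is a minimal sketch under that assumption, not the implementation used by any of the packages above; the class AnnotatedExperiment, its protocol slot, and the constructor are hypothetical.

```r
# Assumes SummarizedExperiment is installed.
library(SummarizedExperiment)

# Hypothetical subclass that adds one slot to SummarizedExperiment.
setClass("AnnotatedExperiment",
    contains = "SummarizedExperiment",
    slots = c(protocol = "character")
)

# Hypothetical constructor: build the parent object first, then
# promote it to the subclass and fill the extra slot.
AnnotatedExperiment <- function(..., protocol = NA_character_) {
    se <- SummarizedExperiment(...)
    new("AnnotatedExperiment", se, protocol = protocol)
}

counts <- matrix(1:6, nrow = 3,
    dimnames = list(paste0("gene", 1:3), c("s1", "s2")))
ae <- AnnotatedExperiment(assays = list(counts = counts),
    protocol = "bulk RNA-seq")

# Methods defined for SummarizedExperiment apply to the subclass.
is(ae, "SummarizedExperiment")
assayNames(ae)
sub <- ae[1:2, ]
```

Code that accepts a SummarizedExperiment can then accept the new class unchanged, which is the kind of interoperability the submission guidelines ask for.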
