10 R code

Everyone has their own coding style and formats. There are, however, some best-practice guidelines that Bioconductor reviewers look for (see coding style).

There are also some other key points, detailed in the following sections.

10.2 R CMD check and BiocCheck

Many common coding and syntax issues are flagged by R CMD check and BiocCheck() (see the R CMD check cheatsheet and the BiocCheck vignette).

Some of the more prominent offenders:

• Use vapply() instead of sapply(), and use the various apply functions instead of for loops.
• Use seq_len() or seq_along() instead of 1:....
• Use TRUE and FALSE instead of T and F.
• Use robust named indices rather than numeric indices.
• Use is() instead of class() == and class() !=.
• Use system2() instead of system().
• Do not use set.seed() in any internal code.
• Do not use browser() in any internal code.
• Avoid the use of <<-.
• Avoid direct slot access with @ or slot(). Accessor methods should be created and used instead.
• Use <- instead of = for assigning variables.
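Several of the items above are easiest to see side by side. The base R sketch below contrasts a flagged idiom with the preferred one; the objects and the two S4 classes (Base, Derived) are illustrative only.

```r
library(methods)

x <- c(a = 1.5, b = 2.5, c = 4)

## vapply() declares the type and length of each result, so an unexpected
## return value raises an error instead of silently changing the output.
squares <- vapply(x, function(v) v^2, numeric(1))

## 1:length(y) gives c(1, 0) when y is empty; seq_along(y) gives
## integer(0), so a loop over it is correctly skipped.
y <- numeric(0)
badIdx <- 1:length(y)
goodIdx <- seq_along(y)

## A named index still selects the right element if the order changes.
valueB <- x[["b"]]    # rather than x[[2]]

## is() respects inheritance; class(obj) == "Base" does not.
setClass("Base", slots = c(id = "character"))
setClass("Derived", contains = "Base")
obj <- new("Derived", id = "s1")
is(obj, "Base")        # TRUE
class(obj) == "Base"   # FALSE: class(obj) is "Derived"
```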

10.2.1 Formatting and syntax

• Function names should use camelCase or underscores (_) and should not contain a dot (.), which indicates S3 dispatch.
• Use dev.new() to start a graphics device if necessary. Avoid x11() and X11(), because they can only be called on machines that have access to an X server.
• Use the functions message(), warning() and stop() instead of cat() (except in customized show() methods). paste0() should generally not be used inside these functions, because they already concatenate their arguments; the exception is collapsing multiple values from a variable.
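A minimal sketch of these conventions, assuming a hypothetical helper countFiles(): a camelCase name, stop() and warning() for conditions, message() for progress output, and paste() used only to collapse a vector into one string.

```r
## Hypothetical helper: count files with a given extension in a directory.
countFiles <- function(path, ext = "txt") {
    if (!dir.exists(path))
        stop("directory does not exist: ", path)
    files <- list.files(path, pattern = paste0("\\.", ext, "$"))
    if (length(files) == 0L)
        warning("no '.", ext, "' files found in ", path)
    ## message() concatenates its arguments itself; paste() is only used
    ## to collapse the vector of file names into one string.
    message("found ", length(files), " file(s): ",
            paste(files, collapse = ", "))
    invisible(length(files))
}
```

Because the output goes through message(), users can silence it with suppressMessages(), which is not possible with cat().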

10.2.2 Re-use of functionality, classes, and generics

Avoid re-implementing functionality or classes (see also The DESCRIPTION file). Make use of appropriate existing packages (e.g., biomaRt, AnnotationDbi, Biostrings, GenomicRanges) and classes (e.g., SummarizedExperiment, AnnotatedDataFrame, GRanges, DNAStringSet) to avoid duplication of functionality available in other Bioconductor packages. See also Common Bioconductor Methods and Classes.

This encourages interoperability and simplifies your own package development. If a new representation is needed, see the Essential S4 interface section of Robust and Efficient Code. In general, Bioconductor will insist on interoperability with Common Classes for acceptance.

Developers should make an effort to re-use generics that fit the generic contract for the proposed class-method pair, i.e., the behavior of the method aligns with the originally proposed behavior of the generic. Specifically, the return value can be of the same class across methods, or the method can perform the same conceptual transformation or procedure across classes, as described by the generic. BiocGenerics lists commonly used generics in Bioconductor. One example of a generic and method implementation is the rowSums generic and the corresponding method in the DelayedArray package. Its generic contract, returning a numeric vector with one element per row, is adhered to across classes, including the DelayedMatrix class. Re-using generics reduces the number of new generics by consolidating existing operations, and avoids the mistake of introducing a "new" generic with the same name as an existing one. Generic name collisions may mask, or be masked by, previous definitions in ways that are hard to diagnose.
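The following sketch illustrates the idea with a hypothetical CountSet class and a hypothetical countMatrix() accessor. Run as a script, setMethod() derives a generic from base::rowSums in the global environment; inside a package you would instead import the existing generic from the package that defines it. The method keeps the contract of one numeric value per row.

```r
library(methods)

## Hypothetical class holding a numeric count matrix.
setClass("CountSet", slots = c(counts = "matrix"))

## Accessor, so users never need direct slot access with @.
setGeneric("countMatrix", function(object) standardGeneric("countMatrix"))
setMethod("countMatrix", "CountSet", function(object) object@counts)

## Re-use rowSums() instead of inventing a new generic (e.g. rowTotals()).
## The formals match base::rowSums, and the method honours the contract:
## one numeric value per row.
setMethod("rowSums", "CountSet", function(x, na.rm = FALSE, dims = 1L) {
    rowSums(countMatrix(x), na.rm = na.rm, dims = dims)
})

cs <- new("CountSet",
          counts = matrix(1:6, nrow = 2, dimnames = list(c("g1", "g2"), NULL)))
rowSums(cs)   # g1 = 9, g2 = 12
```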

10.2.3 Methods development

We encourage maintainers to create new methods only for classes exported by their own package, and we discourage defining methods for external classes, i.e., classes outside of the package NAMESPACE. Such methods can cause method collisions (i.e., two packages defining a method for the same generic and class) and pollute the methods environment of those external classes. New methods for established classes can also confuse users, because the method and the class definition live in separate packages.

10.2.4 Functional programming

Avoid large chunks of repeated code. Repeated code is generally a good indication that a helper function should be written.
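As an illustration, assume a data frame whose label columns all need the same clean-up. Rather than copying the steps for each column, the sketch below moves them into one hypothetical helper, cleanLabels(), and applies it with lapply().

```r
## Repeated version, for comparison:
##   df$a <- trimws(tolower(df$a)); df$a[df$a == ""] <- NA
##   df$b <- trimws(tolower(df$b)); df$b[df$b == ""] <- NA

## Hypothetical helper holding the shared clean-up logic.
cleanLabels <- function(x) {
    x <- trimws(tolower(x))
    x[x == ""] <- NA_character_
    x
}

df <- data.frame(a = c(" Tumor", "NORMAL ", ""),
                 b = c("", "Liver", " lung"),
                 stringsAsFactors = FALSE)
df[c("a", "b")] <- lapply(df[c("a", "b")], cleanLabels)
```

A fix or change to the clean-up now happens in one place, and the helper can be tested on its own.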

10.2.5 Function length

Excessively long functions should also be avoided. Write small functions.

Ideally, each function has only one job and does that job in as few lines of code as is practical. If a function extends beyond a single screen, consider splitting it into smaller helper functions.

Smaller functions are easier to read, debug and to reuse.
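A sketch of the same principle at the level of a workflow: a hypothetical normalizeCounts() only describes the sequence of steps, each of which is a small function with one job. The function names and the simple per-million scaling are assumptions for illustration, not a recommended normalization method.

```r
## Step 1: drop rows (features) whose total count is below a threshold.
filterLowCounts <- function(counts, minCount = 10) {
    counts[rowSums(counts) >= minCount, , drop = FALSE]
}

## Step 2: scale each column (sample) to a total of one million.
scaleToLibrarySize <- function(counts) {
    sweep(counts, 2, colSums(counts), "/") * 1e6
}

## Step 3: log-transform with a pseudocount.
logTransform <- function(x, pseudocount = 1) {
    log2(x + pseudocount)
}

## The top-level function only reads as the sequence of steps.
normalizeCounts <- function(counts, minCount = 10) {
    kept <- filterLowCounts(counts, minCount)
    logTransform(scaleToLibrarySize(kept))
}
```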

10.2.6 Function arguments

Argument names should be descriptive and well documented. Arguments should generally have default values, and each argument should be validated before it is used.
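A minimal sketch, assuming a hypothetical topFeatures() function: the arguments have descriptive names and defaults, match.arg() restricts method to its documented choices, and the remaining arguments are checked before any work is done.

```r
## Hypothetical function: names of the n highest-ranked features.
topFeatures <- function(scores, n = 10L, decreasing = TRUE,
                        method = c("abs", "raw")) {
    ## Restrict 'method' to the documented values; the default is "abs".
    method <- match.arg(method)
    ## Validate every other argument up front, with informative errors.
    if (!is.numeric(scores) || is.null(names(scores)))
        stop("'scores' must be a named numeric vector")
    if (!is.numeric(n) || length(n) != 1L || is.na(n) || n < 1)
        stop("'n' must be a single positive number")
    if (!is.logical(decreasing) || length(decreasing) != 1L ||
        is.na(decreasing))
        stop("'decreasing' must be TRUE or FALSE")
    ranking <- if (method == "abs") abs(scores) else scores
    ord <- order(ranking, decreasing = decreasing)
    names(scores)[head(ord, n)]
}
```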

10.2.7 Vectorization

Vectorize!

Many R operations are performed on a whole object, not just on its individual elements (e.g., sum(x) instead of x[1] + x[2] + x[3] + ...). In particular, relatively few situations require an explicit for loop. See the Vectorize section of Robust and Efficient Code for additional detail.
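The contrast can be sketched in a few lines of base R: the loop and sum() give the same total, and element-wise functions such as pmin() and comparison operators work on the whole vector at once.

```r
x <- c(2, 7, 1, 9, 4)

## Loop version: explicit indexing and accumulation.
total <- 0
for (i in seq_along(x))
    total <- total + x[i]

## Vectorized version: one call operates on the whole vector.
total2 <- sum(x)

## Element-wise operations are vectorized too: no loop is needed to
## cap values or to flag elements above a threshold.
capped <- pmin(x, 5)
high <- x > 5
```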

10.2.8 Web resources

Follow guiding principles on Querying Web Resources, if applicable.

10.2.9 Parallelisation

For parallel implementation please use BiocParallel. See also the Parallel Recommendations section of Robust and Efficient Code.

A minimal number of cores (1 or 2) should be set as a default.
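A sketch of a function that follows these recommendations, assuming BiocParallel is installed: the hypothetical summarizeSamples() exposes a BPPARAM argument whose default runs serially on one core, and callers opt in to more workers explicitly.

```r
library(BiocParallel)

## Hypothetical function that summarizes each sample independently.
## BPPARAM defaults to serial evaluation (one core); callers can pass,
## e.g., BPPARAM = SnowParam(workers = 2) or MulticoreParam(workers = 2).
summarizeSamples <- function(samples, BPPARAM = SerialParam()) {
    bplapply(samples, function(s) c(mean = mean(s), sd = sd(s)),
             BPPARAM = BPPARAM)
}

samples <- list(s1 = c(1, 2, 3), s2 = c(4, 6, 8))
res <- summarizeSamples(samples)
```

Exposing BPPARAM rather than registering a back-end inside the function leaves the choice of cores and back-end to the user.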

10.2.10 File caching

Downloaded files should be cached. Please use BiocFileCache. If a maintainer creates their own caching directory, it should use the standard cache directory returned by tools::R_user_dir(package, which = "cache"). Packages must not download or write any files to a user's home directory or working directory. Files should be cached with BiocFileCache (preferred) or in tools::R_user_dir(), or written to tempdir()/tempfile() if they should not persist.
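Where BiocFileCache is not used, a hand-rolled cache can be kept in the standard per-user location. The sketch below assumes a hypothetical package name, myPackage, and a hypothetical helper getCachedFile() that downloads a file only when it is not already in the cache directory (tools::R_user_dir() requires R >= 4.0.0).

```r
## Hypothetical helper: return a local copy of `url`, downloading it only
## if it is not already cached. "myPackage" is a placeholder package name.
getCachedFile <- function(url,
                          cacheDir = tools::R_user_dir("myPackage",
                                                       which = "cache")) {
    if (!dir.exists(cacheDir))
        dir.create(cacheDir, recursive = TRUE)
    dest <- file.path(cacheDir, basename(url))
    ## Download only on a cache miss; later calls reuse the local copy.
    if (!file.exists(dest))
        utils::download.file(url, dest, quiet = TRUE, mode = "wb")
    dest
}
```

For files that should not persist between sessions, pass a tempfile() path as cacheDir instead of the default.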