<- function(qmd_file = NULL,
build_my_report pseq_path = NULL,
primary_var = NULL) {
{# extracting the name of the source .qmd file
<- qmd_file
base_name <- gsub(".qmd", "", base_name)
base_name # render the file
::quarto_render(input = qmd_file,
quarto# We add the suffix '_report' to output html file
output_file = paste0(base_name, "_report.html"),
quiet = FALSE,
# supply the params arguments
execute_params = list("pseq_path" = pseq_path,
"primary_var" = primary_var),
execute_debug = FALSE,
cache = NULL,
cache_refresh = FALSE,
debug = FALSE)
} }
Automate Microbiome Analysis Using Quarto
Before you start, please familiarize yourself with RStudio and its RStudio projects.
As someone analyzing microbiome data, you may find yourself frequently working with different data sets and performing the same basic analyses over and over again. These standard tasks, such as quality checking, calculating alpha and beta diversity, and visualizing the composition between groups, can be time-consuming if you’re constantly copy-pasting code or searching for solutions on Stack Overflow or GitHub.
Wouldn’t it be great if you could automate these analyses? That’s where Quarto comes in. Developed by Posit (formerly RStudio), Quarto is the next generation version of Rmarkdown (.rmd
) tool that makes it easy to perform these common tasks and more. In fact, if you’re already familiar with Rmarkdown, you’ll find the Quarto document (.qmd
) very similar to .rmd
. Here you can find the installation details.
Here’s a simple example of what you can do with Quarto: read in a phyloseq object stored as an *.rds file, calculate reads per sample, Shannon diversity, and Observed ASVs, and investigate whether there is a correlation between reads per sample and Shannon diversity, Observed ASVs. I use the data from Fuentes et al. that consists of gut microbiota profiles from older adults with (two time points, L1 and L2) and without (C) influenza-like illness. This information is stored in the column called ILI in the sample_data
of the phyloseq object FuentesIliGutData.rds
. You can access the data from the GitHub repo of this website.
---
title: "Microbiota Analysis"
author: Sudarshan A. Shetty
date: "2023-01-08"
format:
html:
code-tools: true
code-fold: false
code-copy: true
self-contained: true
toc-location: left
knitr:
opts_chunk:
collapse: TRUE
dpi: 150
message: FALSE
warning: FALSE
error: TRUE
editor: visual
---
In this document, we investigate whether the Shannon diversity and Observed species of samples is correlated to their sequencing depth.
## Load libraries
```{r}
library(phyloseq)
library(microbiome)
library(ggpubr) # stats with viz
library(patchwork) # combine plots
```
## Read Data
```{r}
<- readRDS("data/FuentesIliGutData.rds")
pseq ```
## Check Reads
```{r}
# add reads/sample to sample_data()
sample_data(pseq)$reads.per.sample <- sample_sums(pseq)
# Compare between groups
|>
pseq ::meta() |>
microbiomeggplot(aes(ILI, reads.per.sample)) +
geom_violin(aes(color=ILI))+
geom_boxplot(aes(color=ILI), width=0.2)+
labs(x="", y="Reads/sample")
```
## Alpha diversity
```{r}
# we can add shannon diversity to sample_data(pseq)
sample_data(pseq)$shannon <- microbiome::diversity(pseq, index="shannon")[,1]
sample_data(pseq)$observed <- microbiome::richness(pseq, index="observed")[,1]
<- meta(pseq) |>
p.shannon ggplot(aes(ILI, shannon)) +
geom_violin(aes(color=ILI, shannon))+
geom_boxplot(aes(color=ILI, shannon), width=0.2) +
labs(x="", y="Shannon Diversity")+
theme_bw()
<- meta(pseq) |>
p.observed ggplot(aes(ILI, observed)) +
geom_violin(aes(color=ILI, shannon))+
geom_boxplot(aes(color=ILI, shannon), width=0.2) +
labs(x="", y="Observed ASVs")+
theme_bw()
| p.observed ) +
(p.shannon plot_layout(guides = "collect") +
plot_annotation(tag_levels = "a")
```
## Correlation
Check if reads per sample is correlated to Shannon diversity and Observed ASVs.
```{r}
<- meta(pseq) |>
p.shanon.cor ggplot(aes(reads.per.sample, shannon, group=ILI)) +
geom_point(aes(color=ILI), alpha=0.5) +
geom_smooth(method = "lm", aes(color=ILI), show.legend = FALSE) +
::stat_cor(method = "spearman", aes_string(color=primary_var), show.legend = FALSE,
ggpubrlabel.x.npc = 0.5, label.y.npc = 0.3, hjust = 0)+
labs(x="Reads/sample", y="Shannon Diversity") +
theme_bw()
<- meta(pseq) |>
p.observed.cor ggplot(aes(reads.per.sample, observed, group=ILI)) +
geom_point(aes(color=ILI), alpha=0.5) +
geom_smooth(method = "lm", aes(color=ILI), show.legend = FALSE) +
::stat_cor(method = "spearman", aes_string(color=primary_var), show.legend = FALSE,
ggpubrlabel.x.npc = 0.5, label.y.npc = 0.3, hjust = 0) +
labs(x="Reads/sample", y="Observed ASVs") +
theme_bw()
| p.observed.cor) +
(p.shanon.cor plot_layout(guides = "collect") +
plot_annotation(tag_levels = "a")
```
```{r}
sessionInfo()
```
As you may notice this is one of the common tasks in a microbiota profiling study. This can be boring if you’re constantly writing or copying the same code.
But what if you could automate these tasks and save yourself time and effort? In a recent post, Mike Mahoney demonstrated How to use Quarto for Parameterized Reporting. This inspired me to try Quarto for automating my analysis.
One of the handy features of Quarto is the params option, which can be used in the .qmd
file to customize your analysis for each project. This option is also available in the Rmarkdown format. Here’s an example of how to use params in .qmd
:
---
title: "Microbiota Analysis"
author: Sudarshan A. Shetty
date: "2023-01-08"
format:
html:
code-tools: true
code-fold: false
code-copy: true
self-contained: true
toc-location: left
knitr:
opts_chunk:
collapse: TRUE
dpi: 150
message: FALSE
warning: FALSE
error: TRUE
editor: visual
params:
pseq_path: "data/FuentesIliGutData.rds"
primary_var: "ILI"
---
```{r}
library(phyloseq)
library(microbiome)
library(ggpubr) # stats with viz
library(patchwork) # combine plots
```
## Usage of parameters
The parameters can be passed on simply as follows at the start of the code chunk```{r}
= params$pseq_path
pseq_path = params$primary_var
primary_var ```
## Read Data
```{r}
<- readRDS(pseq_path)
pseq print(pseq)
```
And rest of the analysis...
Did you notice the pseq_path: “data/FuentesIliGutData.rds” and primary_var: ILI variables in the params section of .qmd
file? These are floating variables that can be customized for each project. For every new data set, you can change the pseq_path
and primary_var
and render the .qmd
file.
To make it even easier to reuse this analysis, you can save the file for example basic_analysis.qmd in your RProject directory and write a simple function called build_my_report()
that processes the .qmd
file and renders it as an html
report.
Here’s a draft of what the full basic_analysis.qmd might look like.
---
title: "Microbiota Analysis"
author: Sudarshan A. Shetty
date: "2023-01-08"
format:
html:
code-tools: true
code-fold: false
code-copy: true
self-contained: true
toc-location: left
knitr:
opts_chunk:
collapse: TRUE
dpi: 150
message: FALSE
warning: FALSE
error: TRUE
editor: visual
params:
pseq_path: pseq_path
primary_var: primary_var
---
In this document, we investigate whether the Shannon diversity and Observed ASVs of samples is correlated to their sequencing depth. ## Load libraries
```{r}
library(phyloseq)
library(microbiome)
library(ggpubr) # stats with viz
library(patchwork) # combine plots
```
## Define parameters
```{r}
= params$pseq_path
pseq_path = params$primary_var
primary_var ```
## Read Data
```{r}
<- readRDS(pseq_path)
pseq ```
## Check Reads
```{r}
# add reads/sample to sample_data()
sample_data(pseq)$reads.per.sample <- sample_sums(pseq)
# Compare between groups
# notice here the use of ase_string instead of aes for ggplot. Since our primary variable 'primary_var' will be in quotes, we can use this handy option in ggplot2
|>
pseq ::meta() |>
microbiomeggplot(aes_string(primary_var, "reads.per.sample")) +
geom_violin(aes_string(color=primary_var))+
geom_boxplot(aes_string(color=primary_var), width=0.2)+
labs(x="", y="Reads/sample")+
theme_bw()
```
## Alpha diversity
```{r}
# we can add shannon diversity to sample_data(pseq)
sample_data(pseq)$shannon <- microbiome::diversity(pseq, index="shannon")[,1]
sample_data(pseq)$observed <- microbiome::richness(pseq, index="observed")[,1]
<- meta(pseq) |>
p.shannon ggplot(aes_string(primary_var, "shannon")) +
geom_violin(aes_string(color=primary_var))+
geom_boxplot(aes_string(color=primary_var), width=0.2) +
labs(x="", y="Shannon Diversity")+
theme_bw()
<- meta(pseq) |>
p.observed ggplot(aes_string(primary_var, "observed")) +
geom_violin(aes_string(color=primary_var))+
geom_boxplot(aes_string(color=primary_var), width=0.2) +
labs(x="", y="Observed ASVs")+
theme_bw()
| p.observed ) +
(p.shannon plot_layout(guides = "collect") +
plot_annotation(tag_levels = "a")
```
## Correlation
Check if reads per sample is correlated to Shannon diversity```{r}
<- meta(pseq) |>
p.shanon.cor ggplot(aes_string("reads.per.sample", "shannon", group=primary_var)) +
geom_point(aes_string(color=primary_var), alpha=0.5) +
geom_smooth(method = "lm", aes_string(color=primary_var), show.legend = FALSE) +
::stat_cor(method = "spearman", aes_string(color=primary_var), show.legend = FALSE,
ggpubrlabel.x.npc = 0.5, label.y.npc = 0.3, hjust = 0)+
labs(x="Reads/sample", y="Shannon Diversity") +
theme_bw()
<- meta(pseq) |>
p.observed.cor ggplot(aes_string("reads.per.sample", "observed", group=primary_var)) +
geom_point(aes_string(color=primary_var), alpha=0.5) +
geom_smooth(method = "lm", aes_string(color=primary_var), show.legend = FALSE) +
::stat_cor(method = "spearman", aes_string(color=primary_var), show.legend = FALSE,
ggpubrlabel.x.npc = 0.5, label.y.npc = 0.3, hjust = 0) +
labs(x="Reads/sample", y="Observed ASVs") +
theme_bw()
| p.observed.cor) +
(p.shanon.cor plot_layout(guides = "collect") +
plot_annotation(tag_levels = "a")
```
```{r}
sessionInfo()
```
Notice that the ILI variable has been replaced with primary_var
in the params
and in the plotting aes_string
. This allows you to customize the analysis for each project simply by changing the values of these floating variables in your build_my_report()
function.
Below is the build_my_report()
function.
The build_my_report(qmd_file, pseq_path, primary_var)
function takes three arguments:
qmd_file is the name of the
.qmd
template file (e.g. basic_analysis.qmd).pseq_path is the path to the phyloseq object that you want to analyze.
primary_var is the name of the primary variable you want to investigate. In this case, ILI is the column name in the
sample_data()
of theFuentesIliGutData
phyloseq object.
You can create an RProject
and organize your files. Here is one simple example.
`RProject`:
Structure of your
my_project/
|----- my_project.Rproj # short description of what it contains.
|----- basic_analysis.qmd # Template .qmd file.
|----- build_my_report.R # R file with function to render report from .qmd file.
|----- data/ # A folder where data is stored. |----- FuentesIliGutData.rds
After this you can simply source
the function from the file build_my_report.R
in your session.
source("build_my_report.R") # source function to run
And run build_my_report()
by supplying the necessary files/info to the arguments.
# run function
build_my_report(qmd_file = "basic_analysis.qmd",
pseq_path = "data/FuentesIliGutData.rds",
primary_var = "ILI")
Check the rendered html file in the tab Example Report above!
I think that Quarto is a useful tool for microbiome researchers looking to streamline their workflows and save time on routine analyses. By using the params option in the .qmd
file, you can easily customize your analysis for each project, while the ability to store your .qmd
file in your RProject
directory means you can easily reuse your analysis across multiple projects. The reports can be standardized and shared easily with collaborators. There are several aesthetic and practical settings like hiding codes, defining visualization themes, etc as well as using the knitr::opts_chunk
for setting global rules for your documents. Moreover, beyond html
, reports can be generated as pdf
or word doc
files. You can familiarize yourself with Quarto quide and explore the numerous options for automating your data analysis workflows. Give it a try and see how it can change the way you approach your routine microbiome data analytics.
© Copyright 2023 Sudarshan A. Shetty