NOTE While we continue to maintain this R package, the development has been discontinued as we have shifted to supporting methods development based on the new TreeSummarizedExperiment data container, which provides added capabilities for multi-omics data analysis. Check the miaverse project for details.

The microbiomeutilities R package is part of the microbiome-verse tools that provides additional data handling and visualization support for the microbiome R/BioC package

Philosophy: “Seemingly simple tasks for experienced R users can always be further simplified for novice users

Install

install.packages("devtools")
devtools::install_github("microsud/microbiomeutilities")

Load libraries

Basics

Example data

Package data from Zackular et al., 2014: The Gut Microbiome Modulates Colon Tumorigenesis.

Import test data.

data("zackular2014")
ps0 <- zackular2014
ps0
#> phyloseq-class experiment-level object
#> otu_table()   OTU Table:         [ 2078 taxa and 88 samples ]
#> sample_data() Sample Data:       [ 88 samples by 7 sample variables ]
#> tax_table()   Taxonomy Table:    [ 2078 taxa by 7 taxonomic ranks ]

The print_ps from microbiomeutilities can give additional information. See also microbiome::summarize_phyloseq.

print_ps(ps0)
#> 01] ntaxa = 2078
#> 02] nsamples = 88
#> 03] nsamplesvariables = 7
#> 04] nranks = 7
#> 05] Min. number of reads = 15704
#> 06] Max. number of reads = 216888
#> 07] Total number of reads = 4594626
#> 08] Average number of reads = 52211.66
#> 09] Median number of reads = 49415.5
#> 10] Sparsity = 0.728426590252865
#> 11] Number of singletons = 0
#> 12] % of taxa that are singletons 
#>   (i.e. exactly one read detected across all samples) = 0

Formatting the Phyloseq Object

Most commonly it is observed that the taxonomy file has classification until a given taxonomic level. We can change the names in both otu table and taxonomy table with the best taxonomic classification available. This can be useful if the analysis has to be done at OTU/ASVs level. Only ID are less useful.

Check the taxonomy in phyloseq object.

Domain Phylum Class Order Family Genus Species
d__denovo1773 k__Bacteria p__Bacteroidetes c__Bacteroidia o__Bacteroidales f__Bacteroidaceae g__Bacteroides s__
d__denovo1771 k__Bacteria p__Bacteroidetes c__Bacteroidia o__Bacteroidales f__Bacteroidaceae g__Bacteroides s__
d__denovo1776 k__Bacteria p__Firmicutes c__Clostridia o__Clostridiales f__Ruminococcaceae g__ s__
d__denovo1777 k__Bacteria p__Bacteroidetes c__Bacteroidia o__Bacteroidales f__Bacteroidaceae g__Bacteroides s__
d__denovo1775 k__Bacteria p__Firmicutes c__Clostridia o__Clostridiales f__Lachnospiraceae g__Blautia s__
d__denovo2639 k__Bacteria p__Bacteroidetes c__Bacteroidia o__Bacteroidales f__Bacteroidaceae g__Bacteroides s__

Some have only g__ of s__ information.

data("zackular2014")
p0 <- zackular2014
# reduce size for example
ps0 <- core(p0, detection = 10, prevalence = 20 / 100)
# Add a prefix to taxa labels
ps0.f2 <- format_to_besthit(ps0, prefix = "MyBug-")

kable(head(tax_table(ps0.f2))[3:6])
Domain Phylum Class Order Family Genus Species best_hit
MyBug-d__denovo165:f__Ruminococcaceae Bacteria Firmicutes Clostridia Clostridiales Ruminococcaceae f__Ruminococcaceae f__Ruminococcaceae MyBug-d__denovo165:f__Ruminococcaceae
MyBug-d__denovo167:g__Coprococcus Bacteria Firmicutes Clostridia Clostridiales Lachnospiraceae Coprococcus g__Coprococcus MyBug-d__denovo167:g__Coprococcus
MyBug-d__denovo166:o__Clostridiales Bacteria Firmicutes Clostridia Clostridiales o__Clostridiales o__Clostridiales o__Clostridiales MyBug-d__denovo166:o__Clostridiales
MyBug-d__denovo161:g__Roseburia Bacteria Firmicutes Clostridia Clostridiales Lachnospiraceae Roseburia g__Roseburia MyBug-d__denovo161:g__Roseburia

Now the available taxonomy is added.

As can be seen, the rownames have the OTUIDs and available taxonomic name(s).

Distribution of reads

Useful for QC purposes. Check for distribution of sequencing depth.

data("zackular2014")
p0 <- zackular2014
p <- plot_read_distribution(p0, groups = "DiseaseState", 
                            plot.type = "density")
print(p + theme_biome_utils())

This is a diagnostic step. Key to check if there is variation between groups that will be compared for downstream analysis.

Convert phyloseq object to long data format

Useful if the user wants to plot specific features.
Converting to long data format opens several opportunities to work with Tidyverse.

data("zackular2014")
p0 <- zackular2014
# reduce size for example
ps0 <- core(p0, detection = 10, prevalence = 20 / 100)

pseq_df <- phy_to_ldf(ps0, transform.counts = NULL)
#> An additonal column Sam_rep with sample names is created for reference purpose

kable(head(pseq_df))
OTUID Domain Phylum Class Order Family Genus Species Sam_rep Abundance investigation_type project_name DiseaseState age body_product FOBT.result material
d__denovo66 k__Bacteria p__Firmicutes c__Clostridia o__Clostridiales f__Lachnospiraceae g__Ruminococcus2 s__ Adenoma10-2757 27 metagenomic The Gut Microbiome Improves Predictive Models for Diagnosis of Colorectal Cancer nonCRC 37 feces negative feces
d__denovo175 k__Bacteria p__Bacteroidetes c__Bacteroidia o__Bacteroidales f__Rikenellaceae g__Alistipes s__ Adenoma10-2757 0 metagenomic The Gut Microbiome Improves Predictive Models for Diagnosis of Colorectal Cancer nonCRC 37 feces negative feces
d__denovo165 k__Bacteria p__Firmicutes c__Clostridia o__Clostridiales f__Ruminococcaceae g__ s__ Adenoma10-2757 49 metagenomic The Gut Microbiome Improves Predictive Models for Diagnosis of Colorectal Cancer nonCRC 37 feces negative feces
d__denovo167 k__Bacteria p__Firmicutes c__Clostridia o__Clostridiales f__Lachnospiraceae g__Coprococcus s__ Adenoma10-2757 155 metagenomic The Gut Microbiome Improves Predictive Models for Diagnosis of Colorectal Cancer nonCRC 37 feces negative feces
d__denovo166 k__Bacteria p__Firmicutes c__Clostridia o__Clostridiales f__ g__ s__ Adenoma10-2757 0 metagenomic The Gut Microbiome Improves Predictive Models for Diagnosis of Colorectal Cancer nonCRC 37 feces negative feces
d__denovo161 k__Bacteria p__Firmicutes c__Clostridia o__Clostridiales f__Lachnospiraceae g__Roseburia s__ Adenoma10-2757 11 metagenomic The Gut Microbiome Improves Predictive Models for Diagnosis of Colorectal Cancer nonCRC 37 feces negative feces

Taxa overview

One of the first questions arise regarding which taxa are present , how they are distributed in the data. This can be done with following functionality.

Check distribution

A quick check to see how different taxa are distributed in your data.

library(microbiomeutilities)
library(RColorBrewer)
library(patchwork)
library(ggpubr)
data("zackular2014")
pseq <- zackular2014

# check healthy
health_ps <- subset_samples(pseq, DiseaseState=="H")
p_hc <- taxa_distribution(health_ps) + 
  theme_biome_utils() + 
  labs(title = "Healthy")

# check CRC
crc_ps <- subset_samples(pseq, DiseaseState=="CRC")
p_crc <- taxa_distribution(crc_ps) + 
  theme_biome_utils() + 
  labs(title = "CRC")

# harnessing the power of patchwork
p_hc / p_crc + plot_layout(guides = "collect") + 
  plot_annotation(tag_levels = "A")

There are Cyanobacteria/Chloroplast related sequences which can be removed if not expected in the samples.

Dominant taxa

Sometimes, we are interested in identifying the most dominant taxa in each sample. We may also wish to check what percent of samples within a given group are these taxa dominating.

library(microbiomeutilities)
library(dplyr)
data("zackular2014")
p0 <- zackular2014
p0.gen <- aggregate_taxa(p0,"Genus")
x.d <- dominant_taxa(p0,level = "Genus", group="DiseaseState")
head(x.d$dominant_overview)
#> # A tibble: 6 × 5
#> # Groups:   DiseaseState [3]
#>   DiseaseState dominant_taxa      n rel.freq rel.freq.pct
#>   <fct>        <chr>          <int>    <dbl> <chr>       
#> 1 H            g__Bacteroides    15     50   50%         
#> 2 nonCRC       g__Blautia        13     46.4 46%         
#> 3 CRC          g__Bacteroides     9     30   30%         
#> 4 CRC          g__Blautia         9     30   30%         
#> 5 H            g__Blautia         7     23.3 23%         
#> 6 nonCRC       g__Bacteroides     6     21.4 21%

As seen in the table above, 50% of the samples in H group are dominated by *g__Bacteroides* and so on…

Get taxa summary

This can be used for entire dataset.

library(microbiomeutilities)
data("zackular2014")
p0 <- zackular2014
tx.sum1 <- taxa_summary(p0, "Phylum")
#> Data provided is not compositional 
#>  will first transform
tx.sum1
#>                            Taxa          Max.Rel.Ab          Mean.Rel.Ab
#> 1              p__Euryarchaeota   0.218508035656191   0.0200267274400673
#> 2                           p__  0.0359956743395643  0.00213630393381328
#> 3             p__Actinobacteria   0.189937172231165   0.0294096453288836
#> 4              p__Bacteroidetes   0.703769740193581    0.244709411770076
#> 5  p__Cyanobacteria/Chloroplast 0.00391786546652736 0.000248991452140149
#> 6        p__Deinococcus-Thermus  0.0306807689079157  0.00136619388828292
#> 7                 p__Firmicutes   0.935779308307329    0.609116019352896
#> 8               p__Fusobacteria   0.390135934239562   0.0079458294240542
#> 9             p__Proteobacteria   0.403932082216265     0.04116346455827
#> 10             p__Synergistetes  0.0106576610162058 0.000207008030950549
#> 11           p__Verrucomicrobia   0.320077232101149   0.0436704048205656
#>           Median.Rel.Ab             Std.dev
#> 1  4.08341966447163e-05  0.0445500008069431
#> 2  0.000335981950200747 0.00577447214134393
#> 3    0.0152970058974793  0.0361457627428506
#> 4     0.193736718210234   0.183986562646175
#> 5  2.58996131628288e-05 0.00070384339925814
#> 6                     0 0.00449398785197828
#> 7     0.598709982286244   0.182320763133074
#> 8                     0  0.0514603201611758
#> 9    0.0130121759294526  0.0774502267516943
#> 10                    0 0.00119199461324549
#> 11  0.00665688163392832  0.0679829229763011

Get taxa summary by group(s)

For group specific abundances of taxa get_group_abundances.

library(microbiomeutilities)
data("zackular2014")
p0 <- zackular2014
grp_abund <- get_group_abundances(p0, 
                                  level = "Phylum", 
                                  group="DiseaseState",
                                  transform = "compositional")

How to visualize these data?

mycols <- c("brown3", "steelblue","grey50")

# clean names 
grp_abund$OTUID <- gsub("p__", "",grp_abund$OTUID)
grp_abund$OTUID <- ifelse(grp_abund$OTUID == "", 
                          "Unclassified", grp_abund$OTUID)


mean.plot <- grp_abund %>% # input data
  ggplot(aes(x= reorder(OTUID, mean_abundance), # reroder based on mean abundance
             y= mean_abundance,
             fill=DiseaseState)) + # x and y axis 
  geom_bar(     stat = "identity", 
                position=position_dodge()) + 
  scale_fill_manual("DiseaseState", values=mycols) + # manually specify colors
  theme_bw() + # add a widely used ggplot2 theme
  ylab("Mean Relative Abundance") + # label y axis
  xlab("Phylum") + # label x axis
  coord_flip() # rotate plot 

mean.plot

This plot is diagnostic to have an idea about the taxonomy in each group. Statements based on comparisons between groups may not make sense here because only mean abundance values are plotted, and not standard deviation within groups. For visualizing error bars and more check Statistical tools for high-throughput data analysis.

Find samples dominated by specific taxa

Finding samples dominated by user provided taxa in a phyloseq object. This is useful especially if user suspects a taxa to be contaminant and wishes to identify which samples are dominated by the contaminant taxa.


library(microbiomeutilities)
data("zackular2014")
p0 <- zackular2014
p0.f <- aggregate_taxa(p0, "Genus")
bac_dom <- find_samples_taxa(p0.f, taxa = "g__Bacteroides")
bac_dom
#>  [1] "Adenoma14-2817"    "Adenoma20-2995"    "Adenoma21-3129"   
#>  [4] "Adenoma22-3133"    "Adenoma24-3217"    "Adenoma9-2717"    
#>  [7] "Cancer10-2567"     "Cancer11-2573"     "Cancer1-2355"     
#> [10] "Cancer17-2625"     "Cancer20-2667"     "Cancer22-2703"    
#> [13] "Cancer30-2863"     "Cancer5-2523"      "Cancer9-2547"     
#> [16] "Healthy10-3087"    "Healthy11-3123"    "Healthy1-2027"    
#> [19] "Healthy13-3241"    "Healthy14-3245"    "Healthy16-3251"   
#> [22] "Healthy17-3257"    "Healthy18-3293"    "Healthy19-3339"   
#> [25] "Healthy20-3341"    "Healthy26-3407"    "Healthy30-3291-66"
#> [28] "Healthy6-3061"     "Healthy8-3073"     "Healthy9-3075"

#get samples dominated by g__Bacteroides
ps.sub <- prune_samples(sample_names(p0.f) %in% bac_dom, p0.f)

Alpha diversity

Rarefaction curves for alpha diversity indices

A common approach to check for sequencing depth and diversity measures.
Here, we can access the numerous alpha diversity measures supported by microbiome R package.

NOTE: This can take sometime to complete.

library(microbiomeutilities)
data("zackular2014")
p0 <- zackular2014
# set seed
set.seed(1)
subsamples <- seq(0, 5000, by=100)[-1]
#subsamples = c(10, 5000, 10000, 20000, 30000)

p <- plot_alpha_rcurve(p0, index="observed", 
                       subsamples = subsamples,
                       lower.conf = 0.025, 
                       upper.conf = 0.975,
                       group="DiseaseState",
                       label.color = "brown3",
                       label.size = 3,
                       label.min = TRUE) 
# change color of line 
mycols <- c("brown3", "steelblue","grey50")

p <- p + scale_color_manual(values = mycols) + 
  scale_fill_manual(values = mycols)
print(p)

Plot alpha diversities

Utility plot function for diversity measures calculated by microbiome package.


library(microbiome)
data("zackular2014")
ps0 <- zackular2014
mycols <- c("brown3", "steelblue","grey50")
p <- plot_alpha_diversities(ps0,
                            type = "dominance",
                            index.val = "all",
                            plot.type = "stripchart",
                            variableA = "DiseaseState",
                            palette = mycols)

p <- p + theme_biome_utils() + 
  ggplot2::theme(legend.position = "top",
                 text = element_text(size=14))
print(p)

comps <- make_pairs(sample_data(ps0)$DiseaseState)

p <- p + stat_compare_means(
      comparisons = comps,
      label = "p.format",
      tip.length = 0.05,
      method = "wilcox.test")
p
#> Warning in wilcox.test.default(c(7855, 4967, 16964, 6997, 11425, 17013, : cannot
#> compute exact p-value with ties

Alternatively, one can plot one index at a time with pair-wise stats.


library(gghalves)
library(microbiomeutilities)
data("zackular2014")
p0 <- zackular2014
mycols <- c("brown3", "steelblue","grey50")

p.m <- plot_diversity_stats(p0, group = "DiseaseState", 
                            index = "diversity_shannon", 
                            group.order = c("H", "CRC", "nonCRC"), 
                            group.colors = mycols,
                            label.format="p.format",
                            stats = TRUE)
#> Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
#> "none")` instead.
p.m + ylab("Shannon Diversity") + xlab("")

Composition plots

Ternary plot

library(microbiome)
library(microbiomeutilities)
library(dplyr)
data("zackular2014")
p0 <- zackular2014
tern_df <- prep_ternary(p0, group="DiseaseState", 
               abund.thres=0.000001, level= "Genus", prev.thres=0.01)
head(tern_df)
#> # A tibble: 6 × 11
#> # Groups:   unique [6]
#>   unique         CRC      H  nonCRC Domain Phylum Class Order Family Genus OTUID
#>   <chr>        <dbl>  <dbl>   <dbl> <chr>  <chr>  <chr> <chr> <chr>  <chr> <chr>
#> 1 Acetanaer… 0.00995 0.0135 0.00858 Bacte… Firmi… Clos… Clos… Rumin… Acet… Acet…
#> 2 Acidamino… 0.115   0.164  0.0846  Bacte… Firmi… Nega… Sele… Acida… Acid… Acid…
#> 3 Acinetoba… 0.0108  0.0312 0.00762 Bacte… Prote… Gamm… Pseu… Morax… Acin… Acin…
#> 4 Actinomyc… 0.0169  0.0338 0.0218  Bacte… Actin… Acti… Acti… Actin… Acti… Acti…
#> 5 Akkermans… 1.26    1.05   1.53    Bacte… Verru… Verr… Verr… Verru… Akke… Akke…
#> 6 Alistipes  0.929   0.894  0.654   Bacte… Bacte… Bact… Bact… Riken… Alis… Alis…

# install.packages("ggtern")
require(ggtern)

# Replace empty with Other
tern_df$Phylum[tern_df$Phylum==""] <- "Other"

ggtern(data=tern_df, aes(x=CRC, y=H, z=nonCRC)) + 
  geom_point(aes(color= Phylum), 
             alpha=0.25, 
             show.legend=T, 
             size=3) +
  #scale_size(range=c(0, 6)) + 
  geom_mask() + 
  scale_colour_brewer(palette = "Paired") +
  theme_biome_utils()

detach("package:ggtern", unload=TRUE)

Plot taxa boxplot

Plot relative abundance of top taxa specified by user.

library(microbiomeutilities)
library(RColorBrewer)
data("zackular2014")
ps0 <- zackular2014
mycols <- c("brown3", "steelblue", "grey50")
pn <- plot_taxa_boxplot(ps0,
                        taxonomic.level = "Family",
                        top.otu = 3, 
                        group = "DiseaseState",
                        add.violin= FALSE,
                        title = "Top three family", 
                        keep.other = FALSE,
                        group.order = c("H","CRC","nonCRC"),
                        group.colors = mycols,
                        dot.size = 1)

print(pn + theme_biome_utils())

Plotting selected taxa

Using a list of taxa specified by the user for comparisons.

library(microbiome)
library(microbiomeutilities)
library(ggpubr)
data("zackular2014")
p0 <- zackular2014
p0.f <- format_to_besthit(p0)
#top_taxa(p0.f, 5)
select.taxa <- c("d__denovo1:g__Blautia", "d__denovo3:g__Bacteroides")

mycols <- c("brown3", "steelblue", "grey50")

p <- plot_listed_taxa(p0.f, select.taxa, 
                      group= "DiseaseState",
                      group.order = c("H","CRC","nonCRC"),
                      group.colors = mycols,
                      add.violin = TRUE,
                      violin.opacity = 0.3,
                      dot.opacity = 0.25,
                      box.opacity = 0.25,
                      panel.arrange= "grid")
#> An additonal column Sam_rep with sample names is created for reference purpose
#> Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
#> "none")` instead.
print(p + ylab("Relative abundance") + scale_y_continuous(labels = scales::percent))

Adding statistical test with ggpubr::stat_compare_means()


# If more than two variables
comps <- make_pairs(sample_data(p0.f)$DiseaseState)
print(comps)
#> [[1]]
#> [1] "CRC" "H"  
#> 
#> [[2]]
#> [1] "CRC"    "nonCRC"
#> 
#> [[3]]
#> [1] "H"      "nonCRC"
p <- p + stat_compare_means(
      comparisons = comps,
      label = "p.format",
      tip.length = 0.05,
      method = "wilcox.test")
p + scale_y_continuous(labels = scales::percent)
#> Warning in wilcox.test.default(c(0.0737753988097334, 0.0369437665507036, :
#> cannot compute exact p-value with ties
#> Warning in wilcox.test.default(c(0.0218880793675399, 0.000317486610347303, :
#> cannot compute exact p-value with ties

Plot top four Genera

library(microbiome)
library(microbiomeutilities)
data("zackular2014")
p0 <- zackular2014
p0.f <- aggregate_taxa(p0, "Genus")
top_four <- top_taxa(p0.f, 4)
top_four
#> [1] "g__Bacteroides"      "g__Blautia"          "g__Faecalibacterium"
#> [4] "g__Akkermansia"

mycols <- c("brown3", "steelblue", "grey50")

p <- plot_listed_taxa(p0.f, top_four, 
                      group= "DiseaseState",
                      group.order = c("H","CRC","nonCRC"),
                      group.colors = mycols,
                      add.violin = TRUE,
                      violin.opacity = 0.3,
                      dot.opacity = 0.25,
                      box.opacity = 0.25,
                      panel.arrange= "wrap")
#> Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
#> "none")` instead.

comps <- make_pairs(sample_data(p0.f)$DiseaseState)
p <- p + stat_compare_means(
      comparisons = comps,
      label = "p.format",
      tip.length = 0.05,
      method = "wilcox.test")

print(p + ylab("Relative abundance") + scale_y_continuous(labels = scales::percent))
#> Warning in wilcox.test.default(c(0.0242631263865901, 5.19237758969832e-05, :
#> cannot compute exact p-value with ties

Abundance-Prevalence relationship

Plots mean Abundance-Prevalence for taxa. Mean abundance, mean prevalence, and upper and lower confidence interval for each taxa is calculated by random sub-sampling.
This can be useful in identifying highly prevalent taxa and their mean relative abundance in a group of samples. Taxa that are highly prevalent with low variation in lower and upper CI can be identified at varying values of mean relative abundance. These are likely core taxa in the groups of samples.

See core microbiota analysis in microbiome R package

library(microbiomeutilities)
library(dplyr)
library(ggrepel)
asv_ps <- zackular2014
asv_ps <- microbiome::transform(asv_ps, "compositional")
# Select healthy samples
asv_ps <- subset_samples(asv_ps, DiseaseState=="H")
asv_ps <- core(asv_ps,detection = 0.0001, prevalence = 0.50) # reduce size for example
asv_ps <- format_to_besthit(asv_ps)

set.seed(14285)
p_v <- plot_abund_prev(asv_ps, 
                       label.core = TRUE,
                       color = "Phylum", # NA or "blue"
                       mean.abund.thres = 0.01,
                       mean.prev.thres = 0.99,
                       dot.opacity = 0.7,
                       label.size = 3,
                       label.opacity = 1.0,
                       nudge.label=-0.15,
                       bs.iter=9, # increase for actual analysis e.g. 999
                       size = 20, # increase to match your nsamples(asv_ps)
                       replace = TRUE,
                       label.color="#5f0f40")  
p_v <- p_v + 
  geom_vline(xintercept = 0.95, lty="dashed", alpha=0.7) + 
  geom_hline(yintercept = 0.01,lty="dashed", alpha=0.7) +
  scale_color_brewer(palette = "Dark2")
p_v

Heatmaps

Heatmap using phyloseq and pheatmap

Useful for visualizing differences in top OTUs between sample groups.

library(microbiomeutilities)
library(pheatmap)
library(RColorBrewer)
data("zackular2014")
ps0 <- zackular2014

#optional step if required to rename taxa_names
taxa_names(ps0) <- gsub("d__denovo", "OTU:", taxa_names(ps0))

# create a gradient color palette for abundance
grad_ab <- colorRampPalette(c("#faf3dd","#f7d486" ,"#5e6472"))
grad_ab_pal <- grad_ab(10)

# create a color palette for varaibles of interest
meta_colors <- list(c("positive" = "#FFC857", 
                      "negative" = "#05B083"), 
                    c("CRC" = "steelblue", 
                      "nonCRC" = "grey50", 
                      "H"="brown3"))
# add labels for pheatmap to detect
names(meta_colors) <- c("FOBT.result", "DiseaseState")

p <- plot_taxa_heatmap(ps0,
                       subset.top = 25,
                       VariableA = c("DiseaseState","FOBT.result"),
                       heatcolors = grad_ab_pal, #rev(brewer.pal(6, "RdPu")),
                       transformation = "log10",
                       cluster_rows = T,
                       cluster_cols = F,
                       show_colnames = F,
                       annotation_colors=meta_colors)
#> Top 25 OTUs selected
#> log10, if zeros in data then log10(1+x) will be used
#> First top taxa were selected and 
#> then abundances tranformed to log10(1+X)
#> Warning in transform(phyobj1, "log10"): OTU table contains zeroes. Using log10(1
#> + x) transform.

#the plot is stored here
p$plot

# table used for plot is here
p$tax_tab[1:3,1:3]

Elegant option with ggplot2

library(microbiomeutilities)
data("zackular2014")
p0 <- zackular2014
p0.rel <- transform(p0, "compositional")
#heat.cols <- c("#a8dadc","#457b9d", "#1d3557")
# create a gradient color palette for abundance
grad_ab <- colorRampPalette(c("#96d4ca","#d3f3f1", "#7c65a9"))
heat.cols <- grad_ab(10)
simple_heatmap(p0.rel,
               group.facet = "DiseaseState",
               group.order = NULL,
               abund.thres = 0.01,
               prev.thres = 0.1,
               level = "Genus",
               scale.color = "log10",
               na.fill = "white",
               color.fill = heat.cols,
               taxa.arrange=TRUE,
               remove.other=TRUE,
               panel.arrange="grid",
               ncol=NULL,
               nrow=NULL)
#> Warning: Transformation introduced infinite values in discrete y-axis

For heatmap options see microbiome::heat() and ampvis2

For other longitudinal related functionality check the [Longitudinal data analysis and visualization] section in the Articles section.

MicrobiomeHD datasets as phyloseq objects

We provide access to a subset of studies included in the MicrobiomeHD database from Duvallet et al 2017: Meta-analysis of gut microbiome studies identifies disease-specific and shared responses. Nature communications.

The phyloseq objects are stored and accessed from microbiomedatarepo.


study <- list_microbiome_data(printtab = FALSE)

knitr::kable(study)

Below is the per study reference.

NOTE: When using these studies, please cite Duvallet et al. 2017 and the respective studies.


file <- system.file("extdata", "microbiomeHD_ref.txt", package = "microbiomeutilities")
reference <- read.table(file, header = T, sep = "\t")

knitr::kable(reference)

Experimental

Plot ordination and core


library(microbiomeutilities)
library(RColorBrewer)
data("zackular2014")
p0 <- zackular2014

ps1 <- format_to_besthit(p0)

#ps1 <- subset_samples(ps1, DiseaseState == "H")
ps1 <- prune_taxa(taxa_sums(ps1) > 0, ps1)
prev.thres <- seq(.05, 1, .05)
det.thres <- 10^seq(log10(1e-4), log10(.2), length = 10)
pseq.rel <- microbiome::transform(ps1, "compositional")
# reduce size for example
pseq.rel <- core(pseq.rel, detection = 0.001, prevalence = 20 / 100)

ord.bray <- ordinate(pseq.rel, "NMDS", "bray")

p <- plot_ordiplot_core(pseq.rel, ord.bray,
                        prev.thres, det.thres,
                        min.prevalence = 0.9,
                        color.opt = "DiseaseState", 
                        shape = NULL, Sample = TRUE)

print(p)

Useful resources:

The microbiomeutilties depends on the phyloseq data structure and core Phyloseq functions.

Tools for microbiome analysis in R. Microbiome package URL: microbiome package.

For more tutorials and examples of data analysis in R please check:

Contributions are welcome:

Issue Tracker
Pull requests
Star us on the Github page

sessionInfo()
#> R version 4.2.1 (2022-06-23 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19044)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=English_United States.utf8 
#> [2] LC_CTYPE=English_United States.utf8   
#> [3] LC_MONETARY=English_United States.utf8
#> [4] LC_NUMERIC=C                          
#> [5] LC_TIME=English_United States.utf8    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#>  [1] pheatmap_1.0.12             ggrepel_0.9.1              
#>  [3] gghalves_0.1.3              ggpubr_0.4.0               
#>  [5] patchwork_1.1.2             RColorBrewer_1.1-3         
#>  [7] tibble_3.1.7                knitr_1.40                 
#>  [9] microbiomeutilities_1.00.16 dplyr_1.0.9                
#> [11] microbiome_1.18.0           ggplot2_3.3.6              
#> [13] phyloseq_1.40.0            
#> 
#> loaded via a namespace (and not attached):
#>  [1] nlme_3.1-157           bitops_1.0-7           fs_1.5.2              
#>  [4] rprojroot_2.0.3        GenomeInfoDb_1.32.2    backports_1.4.1       
#>  [7] tools_4.2.1            bslib_0.4.0            utf8_1.2.2            
#> [10] R6_2.5.1               vegan_2.6-2            DBI_1.1.3             
#> [13] BiocGenerics_0.42.0    mgcv_1.8-40            colorspace_2.0-3      
#> [16] permute_0.9-7          rhdf5filters_1.8.0     ade4_1.7-19           
#> [19] withr_2.5.0            tidyselect_1.1.2       compiler_4.2.1        
#> [22] textshaping_0.3.6      cli_3.3.0              Biobase_2.56.0        
#> [25] desc_1.4.2             labeling_0.4.2         sass_0.4.2            
#> [28] scales_1.2.1           pkgdown_2.0.6          systemfonts_1.0.4     
#> [31] stringr_1.4.1          digest_0.6.29          rmarkdown_2.16        
#> [34] XVector_0.36.0         pkgconfig_2.0.3        htmltools_0.5.3       
#> [37] highr_0.9              fastmap_1.1.0          rlang_1.0.5           
#> [40] rstudioapi_0.14        farver_2.1.1           jquerylib_0.1.4       
#> [43] generics_0.1.3         jsonlite_1.8.0         car_3.1-0             
#> [46] RCurl_1.98-1.6         magrittr_2.0.3         GenomeInfoDbData_1.2.8
#> [49] biomformat_1.24.0      Matrix_1.5-1           Rcpp_1.0.8.3          
#> [52] munsell_0.5.0          S4Vectors_0.34.0       Rhdf5lib_1.18.2       
#> [55] fansi_1.0.3            abind_1.4-5            ape_5.6-2             
#> [58] lifecycle_1.0.2        stringi_1.7.6          yaml_2.3.5            
#> [61] carData_3.0-5          MASS_7.3-57            zlibbioc_1.42.0       
#> [64] rhdf5_2.40.0           Rtsne_0.16             plyr_1.8.7            
#> [67] grid_4.2.1             parallel_4.2.1         crayon_1.5.1          
#> [70] lattice_0.20-45        Biostrings_2.64.0      splines_4.2.1         
#> [73] multtest_2.52.0        pillar_1.8.1           igraph_1.3.1          
#> [76] ggsignif_0.6.3         reshape2_1.4.4         codetools_0.2-18      
#> [79] stats4_4.2.1           glue_1.6.2             evaluate_0.16         
#> [82] data.table_1.14.2      vctrs_0.4.1            foreach_1.5.2         
#> [85] gtable_0.3.1           purrr_0.3.4            tidyr_1.2.0           
#> [88] assertthat_0.2.1       cachem_1.0.6           xfun_0.31             
#> [91] broom_1.0.1            rstatix_0.7.0          ragg_1.2.2            
#> [94] survival_3.3-1         iterators_1.0.14       memoise_2.0.1         
#> [97] IRanges_2.30.0         cluster_2.1.3          ellipsis_0.3.2