The decreasing cost of Next Generation Sequencing is allowing researchers to gain valuable information on microbial communities in unique and often complex environments. The microbial composition of aquaponic (AP) systems remains a relative ‘black box,’ although an increasing number of studies are providing insight into the role of microorganisms in nutrient cycling, plant productivity, and food safety in these systems. One challenge is determining how to filter the data post-sequencing. In Illumina short-read sequencing, a per-nucleotide Phred quality score (Q) encodes the estimated probability that the sequencer miscalled the nucleotide at a specific position, with higher Q indicating a more reliable base call. Q filtering is essential in sequencing data analysis because it removes imprecise reads that would otherwise inflate microbial diversity estimates. Here we demonstrate the impact of Q threshold selection on diversity metrics and taxonomic composition of plant-associated microbiomes in AP systems.
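For readers less familiar with Phred scores, the standard definition is Q = -10 log10(P), where P is the estimated probability that a base call is wrong. The short sketch below converts between Q and P for the three thresholds examined here; the function names are illustrative and are not part of any pipeline used in this work.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Estimated probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Phred score corresponding to an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


# The three thresholds compared in this abstract (L, M, H).
for q in (5, 20, 30):
    p = phred_to_error_prob(q)
    print(f"Q{q}: error probability {p:.4f}, call accuracy {1 - p:.2%}")
```

Under this definition, a Q of 30 corresponds to roughly one expected miscall per 1,000 bases, a Q of 20 to one per 100, and a Q of 5 to about one in three.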
Raw 16S rRNA gene sequencing data (Illumina MiSeq) were obtained from a study examining the effect of developmental stage on the composition of the rhizosphere microbiome of bell pepper (Capsicum annuum) plants grown in AP systems. Sequencing data were analyzed with QIIME2, using the Greengenes database for taxonomic identification. Nucleotide quality filtering was performed at High (H; >30), Medium (M; >20) and Low (L; >5) Q thresholds.
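As a rough illustration of what threshold-based nucleotide filtering can look like, the sketch below truncates a read at the first base whose Q does not exceed the threshold and discards the read if too little of it remains. This is an assumed rule for illustration only: the specific QIIME2 plugin and parameters are not stated here, and `truncate_read`, `min_fraction`, and the Phred+33 encoding are hypothetical choices rather than the settings used in this study.

```python
from typing import List, Optional

# Q thresholds named in the abstract: bases must score strictly above these.
THRESHOLDS = {"H": 30, "M": 20, "L": 5}


def decode_phred33(quality: str) -> List[int]:
    """Decode a FASTQ quality string, assuming Phred+33 ASCII encoding."""
    return [ord(ch) - 33 for ch in quality]


def truncate_read(seq: str, quality: str, min_q: int,
                  min_fraction: float = 0.75) -> Optional[str]:
    """Truncate a read at the first base with Q <= min_q.

    Returns None when the retained prefix is shorter than
    min_fraction of the original read (hypothetical discard rule).
    """
    scores = decode_phred33(quality)
    keep = len(seq)
    for i, q in enumerate(scores):
        if q <= min_q:
            keep = i
            break
    if keep < min_fraction * len(seq):
        return None
    return seq[:keep]


# Toy read: six bases at Q40, then one at Q20 and one at Q2.
seq, qual = "ACGTACGT", "IIIIII5#"
for label, min_q in THRESHOLDS.items():
    print(label, truncate_read(seq, qual, min_q))
```

With this toy read, the H and M thresholds both cut at the Q20 base, while the L threshold keeps it and cuts only at the final Q2 base.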
Quality score selection resulted in significant differences in alpha diversity, beta diversity, and microbial abundance estimates. Filtering at H retained 50% of the input data, while 25% and 10% were retained at M and L, respectively. The Shannon diversity index was significantly higher in H-filtered samples (6.66 ± 0.72) than in M-filtered (3.79 ± 0.98) and L-filtered (5.94 ± 0.77) samples, which also differed significantly from each other (Figure 1; p ≤ 0.05). In addition, the number of taxa identified varied considerably among Q thresholds, with H, M and L yielding 296, 103 and 192 unique taxa, respectively. The H and L thresholds produced a similar pattern of dissimilarity on a principal component analysis plot, which differed considerably from that of M.
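For reference, the Shannon index reported above is computed from the relative abundance p_i of each taxon as H' = -Σ p_i log(p_i). The sketch below is a minimal implementation under stated assumptions: the log base (2 here) is an assumption because it is not specified in this abstract, and the counts are made-up values for illustration, not data from this study.

```python
import math
from typing import Iterable


def shannon_index(counts: Iterable[int], base: float = 2.0) -> float:
    """Shannon diversity H' = -sum(p_i * log(p_i)) over taxa with nonzero counts."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    if total == 0:
        raise ValueError("at least one taxon must have a nonzero count")
    h = 0.0
    for c in counts:
        p = c / total
        h -= p * math.log(p, base)
    return h


# Hypothetical per-taxon read counts for two samples of equal depth.
even_sample = [25, 25, 25, 25]    # four equally abundant taxa
skewed_sample = [97, 1, 1, 1]     # one dominant taxon, three rare ones

print(f"even:   {shannon_index(even_sample):.3f}")
print(f"skewed: {shannon_index(skewed_sample):.3f}")
```

Because the index depends on both richness and evenness, a filtering step that changes how many rare taxa survive, or how reads are distributed among taxa, shifts the index even when sequencing depth is unchanged.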
Results from this exercise highlight the importance of the choice of read-filtering threshold in AP microbiome data. Retention of low-quality nucleotides can lead to sequencing errors being misinterpreted as unique taxa, thereby inflating diversity metrics, while H filtering may eliminate rare but potentially important taxa altogether. Medium-quality filtering may inadvertently remove both genuine low-abundance taxa and higher-quality reads that would contribute to the richness and evenness required for a higher Shannon index. Analysis of the microbial composition of AP systems may benefit from a multi-filtering approach, in which H and L quality datasets are analyzed together. Rare taxa that appear consistently across both H and L filtering could be treated as part of the ‘extended core microbiome,’ which may be functionally important despite its members' low abundance.