Aquaculture America 2023

February 23 - 26, 2023

New Orleans, Louisiana USA

LOW-COST MICROHAPLOTYPE DISCOVERY AND ALLELE FREQUENCY ESTIMATION USING POOLED SEQUENCING DATA

Thomas A. Delomas* and Stuart Willis

USDA ARS National Cold Water Marine Aquaculture Center, Kingston, RI 02881, thomas.delomas@usda.gov

As the cost of obtaining genetic information has decreased, the number of applications for that information has grown. Breeding programs in a variety of aquaculture species now use genotypes for parentage inference and genomic selection. For parentage inference, low-density panels of highly variable loci are used to maximize statistical power when determining pedigree relationships. For genomic selection, a typical strategy is to genotype most individuals with a low-density panel of highly variable loci and a subset of individuals with a high-density panel; the missing genotypes are then imputed to yield high-density genotypes for all individuals. In both applications, using a low-density panel often lowers genotyping costs enough to make the application economically sustainable.

For commercial-scale breeding programs, low-density amplicon sequencing panels targeting biallelic single nucleotide polymorphisms (SNPs) are typically used because amplicon sequencing is cost-efficient and accurate. However, these panels have limited statistical power because each locus can have only two alleles. Microhaplotypes are loci containing multiple SNPs close enough together to be genotyped in the same sequencing read. These loci can therefore have more than two alleles, which gives them more statistical power than SNPs, yet genotyping a microhaplotype locus via amplicon sequencing uses the same resources as genotyping a single-SNP locus. Microhaplotype panels therefore have the potential to be more cost-effective than SNP panels for parentage inference and imputation.
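To make the "more than two alleles" argument concrete, the sketch below uses expected heterozygosity, 1 - Σ p_i², as an illustrative proxy for how informative a locus is. This metric is an assumption chosen for illustration, not necessarily the measure of statistical power the authors used, and the allele frequencies are hypothetical.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity (gene diversity): 1 - sum of squared allele frequencies."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in freqs)


# A biallelic SNP is most informative when both alleles are at 0.5.
snp = expected_heterozygosity([0.5, 0.5])
# Hypothetical microhaplotype: two SNPs in one read can yield up to four
# haplotypes; here all four are equally frequent.
microhap = expected_heterozygosity([0.25, 0.25, 0.25, 0.25])
print(snp, microhap)  # 0.5 0.75
```

With k equally frequent alleles the value is 1 - 1/k, so it keeps rising as the number of alleles grows, whereas a biallelic SNP is capped at 0.5 no matter how its frequencies are balanced.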

Development of microhaplotype panels is hindered by a lack of cost-effective methods for estimating allele frequencies of candidate loci. Currently, candidate microhaplotypes for a low-density panel can be discovered either by a reduced-representation technique (e.g., RAD-seq), which surveys only a small fraction of the genome, or by whole-genome sequencing of many individuals, which is typically cost-prohibitive. To address this barrier, we developed new computational methods for estimating allele frequencies of candidate microhaplotypes from pooled sequencing and low-coverage whole-genome sequencing (skim-seq) data. We validated these methods using datasets from three species: Pacific oyster (Crassostrea gigas), Atlantic salmon (Salmo salar), and Pacific lamprey (Entosphenus tridentatus). Across all three datasets, allele frequency estimates were unbiased, and mean squared error plateaued at a depth of 20–30 reads per locus. These results demonstrate that the new methods allow cost-effective pooled sequencing or skim-seq data to be used for discovering and evaluating candidate microhaplotypes. In turn, this will facilitate the creation of low-density microhaplotype panels for parentage inference and imputation, thereby lowering genotyping costs for aquaculture breeding programs.
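The abstract does not describe how the new estimators work. As a point of reference only, the sketch below shows the simplest possible approach, a naive read-counting estimate of microhaplotype frequencies from a pooled sample; it is not the authors' method. The read representation (a mapping from reference position to observed base), the function name `pooled_haplotype_frequencies`, and the SNP positions are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Mapping


def pooled_haplotype_frequencies(
    reads: List[Mapping[int, str]],
    snp_positions: List[int],
) -> Dict[str, float]:
    """Naive microhaplotype frequency estimate from reads of a pooled sample.

    Each read is a hypothetical mapping from reference position to the base
    observed at that position. Only reads that cover every SNP in the locus
    yield a complete haplotype; all other reads are skipped.
    """
    counts = Counter()
    for read in reads:
        if all(pos in read for pos in snp_positions):
            # Concatenate the bases at each SNP, in locus order, into one allele label.
            haplotype = "".join(read[pos] for pos in snp_positions)
            counts[haplotype] += 1
    depth = sum(counts.values())  # number of informative reads at this locus
    if depth == 0:
        return {}
    # Proportion of informative reads carrying each haplotype.
    return {hap: n / depth for hap, n in sorted(counts.items())}


# Hypothetical two-SNP locus at reference positions 105 and 131.
reads = [
    {105: "A", 131: "G"},
    {105: "A", 131: "G"},
    {105: "C", 131: "G"},
    {105: "C", 131: "T"},
    {105: "A"},  # does not reach position 131, so it is ignored
]
print(pooled_haplotype_frequencies(reads, [105, 131]))
# {'AG': 0.5, 'CG': 0.25, 'CT': 0.25}
```

This naive estimator treats every spanning read as an independent draw of one haplotype, so it ignores sequencing error and any unequal DNA contribution from the individuals in a pool; it also discards reads that cover only part of the locus.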