Aquaculture 2022

February 28 - March 4, 2022

San Diego, California

EXPRESSION PROFILE ANALYSIS VIA MACHINE LEARNING TO IDENTIFY CRITICAL GENES DETERMINANT OF HYBRID STRIPED BASS Morone chrysops × M. saxatilis GROWTH

Linnea K. Andersen*, Willy A. Valdivia-Granda, Benjamin J. Reading

North Carolina State University

Department of Applied Ecology

100 Brooks Ave Box 7617

Raleigh, NC 27607

lkander5@ncsu.edu

Hybrid striped bass constitute the fourth-largest finfish aquaculture industry in the U.S. and are also reared in other countries worldwide. Decades of research have been conducted on the parent species of the hybrid striped bass, striped bass (Morone saxatilis) and white bass (M. chrysops). A large volume of genomic data, including recently updated reference genomes and transcriptomes, has been generated for these species and their hybrid offspring. This volume of information has increased the focus on developing data analysis methods to improve genetic breeding efforts. However, analytical approaches that reduce high-dimensional data into information that can be applied to breed consistently high-yielding cultivars are largely unavailable. Here we report a novel machine learning (ML) data analysis pipeline that reduces high-throughput "omics" data to advance the breeding of hybrid striped bass and their parental fish. Transcriptomes of hybrid striped bass white muscle tissue (n = 40 individuals) collected at final harvest were scanned against a genomic library of 34,000 unique protein motif fingerprints (MFs). Each MF is a twelve-amino-acid-long fragment that forms quantitative patterns. These data were initially reduced by excluding MFs whose expression (mean read count in each of the six reading frames) did not differ significantly between any samples or technical replicates, leaving 15,000 MFs of interest. An ML pipeline and cross-validation strategy were then applied to further reduce these data by determining MF inclusion or exclusion points, focusing on the MFs most critical to growth performance. Trained ML models were used to predict fish growth performance as superior, average, or inferior at each of two critical time points of production: grading (2-3 months of age) and final harvest (15 months of age).
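The idea of using cross-validation to choose an MF inclusion point can be illustrated with a minimal Python sketch. The abstract does not name the models, the MF ranking rule, or the fold scheme, so everything below is an assumption: MFs are ranked by the spread between class means, a simple nearest-centroid classifier stands in for the trained ML models, the data are synthetic rather than from the study, and all function names (`rank_features`, `choose_cutoff`, `make_synthetic_data`, etc.) are hypothetical.

```python
import random
from statistics import mean

# Hypothetical data layout (not from the study): X[i][j] is the expression of
# motif fingerprint (MF) j in fish i, and y[i] is that fish's growth class.


def rank_features(X, y):
    """Rank MF indices by the spread between class means (largest first).

    This scoring rule is an illustrative assumption, not the authors' criterion.
    """
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        class_means = [mean(row[j] for row, label in zip(X, y) if label == c)
                       for c in classes]
        scores.append(max(class_means) - min(class_means))
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def fit_centroids(X, y, features):
    """Nearest-centroid classifier: mean expression of the kept MFs per class."""
    return {c: [mean(row[j] for row, label in zip(X, y) if label == c) for j in features]
            for c in sorted(set(y))}


def predict(centroids, row, features):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def sq_dist(centroid):
        return sum((row[j] - m) ** 2 for j, m in zip(features, centroid))
    return min(centroids, key=lambda c: sq_dist(centroids[c]))


def kfold(n, k, seed=0):
    """Shuffle sample indices and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def cv_accuracy(X, y, n_keep, k=5, seed=0):
    """k-fold accuracy when only the top n_keep MFs (ranked on the training fold) are kept."""
    correct = 0
    for test_idx in kfold(len(X), k, seed):
        held_out = set(test_idx)
        X_tr = [X[i] for i in range(len(X)) if i not in held_out]
        y_tr = [y[i] for i in range(len(X)) if i not in held_out]
        # Rank inside the fold so held-out fish never influence MF selection.
        features = rank_features(X_tr, y_tr)[:n_keep]
        centroids = fit_centroids(X_tr, y_tr, features)
        correct += sum(predict(centroids, X[i], features) == y[i] for i in test_idx)
    return correct / len(X)


def choose_cutoff(X, y, candidates, k=5):
    """Pick the MF inclusion cutoff with the best CV accuracy (fewest MFs on ties)."""
    scores = {n: cv_accuracy(X, y, n, k) for n in candidates}
    best = max(candidates, key=lambda n: (scores[n], -n))
    return best, scores


def make_synthetic_data(n_fish=30, n_mfs=50, n_informative=3, seed=42):
    """Toy data: the first n_informative MFs track growth class; the rest are noise."""
    rng = random.Random(seed)
    labels = ["inferior", "average", "superior"]
    X, y = [], []
    for i in range(n_fish):
        level = i % 3
        row = [level * 5.0 + rng.gauss(0, 1) for _ in range(n_informative)]
        row += [rng.gauss(0, 1) for _ in range(n_mfs - n_informative)]
        X.append(row)
        y.append(labels[level])
    return X, y


X, y = make_synthetic_data()
best, scores = choose_cutoff(X, y, candidates=[1, 3, 10, 50])
print("CV accuracy by cutoff:", scores)
print("chosen number of MFs:", best)
```

Ranking MFs inside each training fold, rather than once on all samples, keeps the held-out fish from influencing which MFs are kept; ranking on the full data first would make the cross-validated accuracy optimistically biased.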
The data analysis pipeline identified fewer than 1,000 unique MFs as highly determinant of grade and/or growth performance of hybrid striped bass at final harvest. When concatenated, these MFs annotated to thirty-four (34) unigenes, all of which can be regarded as potential targets for breeding or other genetic targeting efforts. Moreover, examining individual MFs mapped to translated regions of the parental reference genome assemblies has, for the first time, revealed instances in which expression of the allele specific to the striped bass or white bass parent is more influential for growth performance in the hybrid offspring. To date, this is the only study to explore gene expression at the allele level in hybrid striped bass to understand heterosis. Both the findings of this study and the analysis pipeline used to produce them can be applied by other groups working in aquaculture or in other areas of agricultural animal production.
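The allele-level comparison can be sketched in the same hedged way: an MF found in only one parent's translated reference is treated as specific to that parent's allele. Exact substring matching, the function names, the toy protein sequences, and the importance scores below are all hypothetical and stand in for whatever mapping procedure and model-derived scores the study actually used.

```python
from collections import defaultdict

MOTIF_LENGTH = 12  # MFs are twelve amino acids long


def assign_parental_origin(motifs, striped_bass_proteins, white_bass_proteins):
    """Label each MF by which parent's translated reference contains it exactly."""
    origins = {}
    for motif in motifs:
        if len(motif) != MOTIF_LENGTH:
            raise ValueError(f"expected a {MOTIF_LENGTH}-residue motif, got {motif!r}")
        in_sb = any(motif in seq for seq in striped_bass_proteins)
        in_wb = any(motif in seq for seq in white_bass_proteins)
        if in_sb and in_wb:
            origins[motif] = "shared"
        elif in_sb:
            origins[motif] = "striped_bass"
        elif in_wb:
            origins[motif] = "white_bass"
        else:
            origins[motif] = "unmapped"
    return origins


def importance_by_origin(origins, importance):
    """Sum hypothetical model importance scores of MFs by parental origin."""
    totals = defaultdict(float)
    for motif, score in importance.items():
        totals[origins.get(motif, "unmapped")] += score
    return dict(totals)


# Toy translated references that differ by one residue (I vs L) at 0-based index 11.
prefix, suffix = "MKTAYIAKQRQ", "SFVKSHFSRQLEERLGLIEVQ"
striped_bass = [prefix + "I" + suffix]
white_bass = [prefix + "L" + suffix]

motifs = ["MKTAYIAKQRQI", "MKTAYIAKQRQL", "SFVKSHFSRQLE", "W" * 12]
origins = assign_parental_origin(motifs, striped_bass, white_bass)
importance = {"MKTAYIAKQRQI": 0.5, "MKTAYIAKQRQL": 0.2, "SFVKSHFSRQLE": 0.3}
print(origins)
print(importance_by_origin(origins, importance))
```

Only MFs that span a residue differing between the parental sequences can carry an allele-specific signal; MFs from conserved regions fall into the "shared" group and cannot distinguish the two alleles.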