Rainbow trout (Oncorhynchus mykiss) is a widely farmed aquaculture species and a model organism for fish research. Four chromosome-level rainbow trout genome assemblies are currently available. The first (Omyk_1.0) was assembled from the Swanson line, a line of rainbow trout from Alaska that has been in hatcheries for two generations and is thus classified as “semi-domesticated”. The second, USDA_OmykA_1.1, came from a different clonal line, Arlee, which is fully domesticated and originated in northern California. The other two assemblies are from wild fish captured in Whale Rock Reservoir in southern California (USDA_OmykWR_1.0) and Keithly Creek in Idaho (USDA_OmykKC_1.0). Annotation of the protein-coding genes in all four genomes is needed to enrich the reference transcriptome and to enable pan-gene comparative analyses. However, a high-quality RefSeq annotation from NCBI is currently available only for the Arlee reference genome assembly.
Here, we developed a bioinformatic annotation pipeline to generate a reference transcriptome for each of the four genome assemblies. The pipeline uses the Comparative Annotation Toolkit, with the Arlee RefSeq gene set as the reference, together with the BRAKER3 pipeline to incorporate novel gene predictions. Evidence for the gene models came from public rainbow trout RNA-seq data and the OrthoDB database. New long-read transcriptome (Iso-Seq) data that we generated in a disease resistance study were used to discover novel genes and transcript isoforms in all four genomes. For functional annotation of the predicted gene models, we used the Arlee RefSeq gene set together with the InterPro database to predict protein domains, Gene Ontology terms, and pathways.
Additionally, to better understand the impact of rainbow trout genome structural variation on gene structure and content, we used the program MCScanX to identify syntenic blocks based on gene order collinearity among the four genomes. The synteny information will enable us to identify differences in gene content and structure that may be associated with the distinct life histories and evolution of the four genetic lines.
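To illustrate what gene order collinearity means in practice, the following is a minimal sketch of finding collinear runs of orthologous genes between two gene orders. It is a simplified greedy scan, not the scoring procedure that MCScanX implements, and the function name `collinear_blocks`, the `max_gap` and `min_size` parameters, and the gene identifiers are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

Anchor = Tuple[int, int]  # (gene index in genome A, gene index in genome B)


def collinear_blocks(
    order_a: List[str],
    order_b: List[str],
    orthologs: Dict[str, str],
    max_gap: int = 2,
    min_size: int = 3,
) -> List[List[Anchor]]:
    """Greedy scan for runs of ortholog pairs that advance together in both orders.

    Illustrative simplification only; this is NOT the MCScanX algorithm.
    """
    pos_b = {gene: i for i, gene in enumerate(order_b)}
    # One anchor per gene in A whose ortholog is present in B, sorted by position in A.
    anchors = [
        (i, pos_b[orthologs[gene]])
        for i, gene in enumerate(order_a)
        if gene in orthologs and orthologs[gene] in pos_b
    ]

    blocks: List[List[Anchor]] = []
    current: List[Anchor] = []
    direction = 0  # +1 forward, -1 inverted, 0 not yet determined

    for a, b in anchors:
        if current:
            prev_a, prev_b = current[-1]
            step = b - prev_b
            same_direction = direction == 0 or (step > 0) == (direction > 0)
            close_in_a = a - prev_a <= max_gap + 1
            close_in_b = 0 < abs(step) <= max_gap + 1
            if close_in_a and close_in_b and same_direction:
                direction = 1 if step > 0 else -1
                current.append((a, b))
                continue
            # The run is broken: keep it if it is long enough, then start a new one.
            if len(current) >= min_size:
                blocks.append(current)
        current = [(a, b)]
        direction = 0

    if len(current) >= min_size:
        blocks.append(current)
    return blocks


# Hypothetical toy data: genes a5..a8 are inverted in genome B.
order_a = [f"a{i}" for i in range(1, 9)]
order_b = ["b1", "b2", "b3", "b4", "b8", "b7", "b6", "b5"]
orthologs = {f"a{i}": f"b{i}" for i in range(1, 9)}

for block in collinear_blocks(order_a, order_b, orthologs):
    print(block)
```

In this toy case the scan reports one forward block and one inverted block, which is the kind of signal that points to a structural rearrangement between two assemblies.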
Overall, the annotated rainbow trout genomes and synteny dataset provide vital resources for the aquaculture research community and for basic research on the physiology, genetics and evolution of rainbow trout.