World Aquaculture Singapore 2022

November 29 - December 2, 2022

Singapore

PARENTAGE ASSIGNMENT OF STRIPED CATFISH Pangasianodon hypophthalmus WITH SHALLOW WHOLE GENOME SEQUENCING DATA

Dao Minh Hai *, Duong Thuy Yen, Nguyen Thanh Phuong, Pham Thanh Liem, Bui Minh Tam, Do Thi Thanh Huong, Bui Thi Bich Hang, Vo Nam Son, Dang Quang Hieu, Wouter Coppieters, Mutien-Marie Garigliany, Patrick Kestemont, Nicolas Antoine-Moussiaux, Frédéric Farnir.

* FARAH/Sustainable Animal Production, Faculty of Veterinary Medicine, University of Liege (B43), 4000-Liege, Belgium

* College of Aquaculture and Fisheries, Can Tho University, Viet Nam

Email: dmhai@ctu.edu.vn

 



Pedigree information is important for estimating genetic parameters in selective breeding programs and for hatchery management in aquaculture. Previous studies on parentage assignment have mostly relied on genetic data from microsatellite markers and, more recently, from SNP arrays. In this study, we evaluate the performance of shallow whole genome sequencing (SWGS) data for parentage assignment in striped catfish (Pangasianodon hypophthalmus) as an alternative to traditional array data. To prepare the genetic data, we sequenced the whole genome of one catfish at high coverage (~144X) and used these data to assemble a de novo draft reference genome. We then sequenced 59 parents (30 males and 29 females, giving 30 × 29 = 870 possible full-sib families) and 500 offspring using SWGS, at a coverage of ~1 to 2X per parent and ~0.5X per offspring. We mapped the SWGS data to the draft reference genome to identify genomic variants, including the SNPs used for parentage assignment. The use of SWGS data raises two challenges. First, at low coverage (e.g., < 2X), confirmed genotypes are in most cases not available for either offspring or parents. Second, read errors are common in next-generation sequencing. To address these issues, we developed a new parentage assignment algorithm based on a likelihood approach that identifies the most likely pair of parents for each offspring. To test this approach, we simulated data and used the algorithm to reconstruct the families. The results show that near-perfect assignment (accuracy: 0.993) can be obtained under the conditions of our experiment if at least 5,000 SNPs are used (assuming an error rate of 0.01). For the empirical data, we extracted ~26,000 high-quality SNPs from 5,900,000 genomic variants (obtained by mapping the genomic data of the 59 parents to the draft reference genome) and used these SNPs to infer the parents of the 500 offspring.
Our results demonstrate that SWGS data, analyzed with an appropriate algorithm, can yield highly accurate pedigree information. The algorithm used in this study is also available as a standalone program called Shallowped.
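
The abstract does not give the details of the likelihood model, so the following Python sketch is only one plausible way to score candidate parent pairs from read counts when genotypes are uncertain. It is not the Shallowped implementation. All function names, the Hardy-Weinberg prior on parental genotypes, the per-read interpretation of the 0.01 error rate, and the toy read counts are assumptions made for illustration. For each SNP, the sketch sums over the unobserved genotypes of sire, dam, and offspring, weighting them by the reads observed in each individual, and then picks the pair with the highest total log-likelihood.

```python
import math
from itertools import product

ERR = 0.01  # assumed per-read error rate (value used in the abstract's simulation)

def read_lik(n_ref, n_alt, g, err=ERR):
    """P(reads | genotype g), where g is the number of alternate alleles (0, 1, 2)."""
    p_alt = (g / 2) * (1 - err) + (1 - g / 2) * err
    return (p_alt ** n_alt) * ((1 - p_alt) ** n_ref)

def geno_posterior(n_ref, n_alt, freq, err=ERR):
    """Posterior over g given reads, with an assumed Hardy-Weinberg prior."""
    prior = [(1 - freq) ** 2, 2 * freq * (1 - freq), freq ** 2]
    weights = [prior[g] * read_lik(n_ref, n_alt, g, err) for g in range(3)]
    total = sum(weights)
    return [w / total for w in weights]

def transmission(gs, gd):
    """P(offspring genotype | sire genotype gs, dam genotype gd), Mendelian."""
    ps, pd = gs / 2, gd / 2  # probability that each parent transmits the alt allele
    return [(1 - ps) * (1 - pd), ps * (1 - pd) + (1 - ps) * pd, ps * pd]

def pair_loglik(off, sire, dam, freqs, err=ERR):
    """Log P(offspring reads | parental reads, this pair are the parents)."""
    total = 0.0
    for (o_ref, o_alt), (s_ref, s_alt), (d_ref, d_alt), f in zip(off, sire, dam, freqs):
        post_s = geno_posterior(s_ref, s_alt, f, err)
        post_d = geno_posterior(d_ref, d_alt, f, err)
        site = 0.0
        for gs, gd in product(range(3), repeat=2):
            t = transmission(gs, gd)
            p_reads = sum(t[go] * read_lik(o_ref, o_alt, go, err) for go in range(3))
            site += post_s[gs] * post_d[gd] * p_reads
        total += math.log(site)  # site > 0 as long as err > 0
    return total

def assign_parents(off, sires, dams, freqs, err=ERR):
    """Return the (sire, dam) pair with the highest log-likelihood, plus all scores."""
    scores = {(s, d): pair_loglik(off, sires[s], dams[d], freqs, err)
              for s in sires for d in dams}
    return max(scores, key=scores.get), scores

# Toy data (hypothetical): (ref reads, alt reads) at 4 SNPs per individual.
freqs = [0.5, 0.5, 0.5, 0.5]
sires = {"S1": [(0, 2), (0, 2), (2, 0), (2, 0)],
         "S2": [(2, 0), (2, 0), (0, 2), (0, 2)]}
dams = {"D1": [(0, 2), (2, 0), (0, 2), (2, 0)],
        "D2": [(2, 0), (0, 2), (2, 0), (0, 2)]}
offspring = [(0, 2), (1, 1), (1, 1), (2, 0)]

best, scores = assign_parents(offspring, sires, dams, freqs)
print(best)
```

The toy read counts are constructed so that the offspring's reads are most consistent with the pair S1 × D1. With real SWGS data, many sites carry zero or one read per individual, which is why thousands of SNPs are needed before the summed log-likelihoods separate the candidate pairs clearly.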