Comparison of BWA-MEM and NovoAlign using human whole-genome next-generation sequencing reads

Abstract: With advances in sequencing technology, the volume of next-generation sequencing data keeps growing, and accurately aligning these reads to a reference genome is the basis for all downstream analysis. BWA-MEM and NovoAlign are two of the most widely used DNA-seq alignment tools, yet their performance across different structural regions of a genome has not been evaluated. In the present work, we evaluate the two aligners in the low-complexity regions, the segmental duplication regions, and the remaining regions of the human genome, using both real and simulated data. BWA-MEM aligned as many reads as possible to the reference genome, to the point of over-aligning reads to low-complexity and segmental duplication regions, and its mapping quality was particularly low in segmental duplication regions. NovoAlign aligned fewer reads than BWA-MEM, but its alignments had higher mapping quality, especially in segmental duplication regions. We therefore suggest that NovoAlign is the better choice for genomes rich in segmental duplications.
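The evaluation summarized above stratifies alignments by genomic region and compares how many reads each aligner places there and with what mapping quality. The sketch below is a minimal illustration of that kind of tally, not the authors' pipeline. It assumes alignments have already been parsed into simple records (chromosome, 0-based half-open coordinates, MAPQ) and that region annotations are BED-style `(chrom, start, end)` tuples. The record type and function names (`Alignment`, `summarize`, `classify`, `build_index`, `mapq_to_error_prob`), the cutoff of MAPQ < 10 for "low" quality, and the rule that a read touching both a segmental duplication and a low-complexity interval counts as a segmental duplication read are all hypothetical choices made for illustration. Mapping quality follows the SAM convention MAPQ = -10 log10 Pr{mapping position is wrong}, with 255 meaning "unavailable".

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open like BED


@dataclass
class Alignment:
    """Hypothetical minimal alignment record (not a pysam or SAM object)."""
    read_id: str
    chrom: Optional[str]  # None marks an unmapped read
    start: int            # 0-based, half-open [start, end)
    end: int
    mapq: int


def mapq_to_error_prob(mapq: int) -> Optional[float]:
    """SAM convention: MAPQ = -10*log10(P(wrong position)); 255 means unavailable."""
    if mapq == 255:
        return None
    return 10 ** (-mapq / 10)


def build_index(intervals: List[Interval]) -> Dict[str, List[Tuple[int, int]]]:
    """Group region intervals by chromosome."""
    index: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in intervals:
        index[chrom].append((start, end))
    return index


def overlaps(index, chrom: str, start: int, end: int) -> bool:
    """Linear scan for any overlap with [start, end); fine for a sketch, too slow for a genome."""
    return any(s < end and start < e for s, e in index.get(chrom, []))


def classify(aln: Alignment, sd_idx, lc_idx) -> str:
    """Assumed precedence: segmental duplication > low complexity > other."""
    if overlaps(sd_idx, aln.chrom, aln.start, aln.end):
        return "segmental_duplication"
    if overlaps(lc_idx, aln.chrom, aln.start, aln.end):
        return "low_complexity"
    return "other"


def summarize(alignments: List[Alignment], seg_dup: List[Interval],
              low_complexity: List[Interval], low_mapq: int = 10) -> dict:
    """Overall mapped fraction plus per-region count, mean MAPQ and low-MAPQ fraction."""
    sd_idx, lc_idx = build_index(seg_dup), build_index(low_complexity)
    mapqs: Dict[str, List[int]] = defaultdict(list)
    unmapped = 0
    for aln in alignments:
        if aln.chrom is None:
            unmapped += 1
            continue
        # note: MAPQ 255 ("unavailable") is not excluded in this sketch
        mapqs[classify(aln, sd_idx, lc_idx)].append(aln.mapq)
    total = len(alignments)
    report = {"mapped_fraction": (total - unmapped) / total if total else 0.0}
    for region, qs in mapqs.items():
        report[region] = {
            "n": len(qs),
            "mean_mapq": mean(qs),
            "low_mapq_fraction": sum(q < low_mapq for q in qs) / len(qs),
        }
    return report
```

Running `summarize` once on the BWA-MEM output and once on the NovoAlign output for the same reads would give directly comparable per-region numbers. A real pipeline would read BAM files (for example with pysam) and use an interval tree instead of the linear overlap scan.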
