Publications

Note: Co-first-authored papers, co-authored papers, and editorships are tagged. Corresponding, co-corresponding, and first-authored papers are not. Please follow the link of each paper for a full list of authors.

2026

Clair3-RNA: A deep learning-based small variant caller for long-read RNA sequencing data
Nature Communications
(Guest editor) International Conference on Genome Informatics ISCB-Asia 2025 Abstract Book
Briefings in Bioinformatics

2025

Clair-Mosaic: A deep-learning method for long-read mosaic small variant calling
bioRxiv
Guess till correct: Gungnir codec enabling high error-tolerance and low-redundancy DNA storage through substantial computing power
bioRxiv
Optimizing long-term prevention of cardiovascular disease with reinforcement learning
npj Digital Medicine
Reconstruction of diploid higher-order human 3D genome interactions from noisy Pore-C data using Dip3D
Nature Structural & Molecular Biology
Primary Prevention Cardiovascular Disease Risk Prediction Model for Contemporary Chinese (1°P-CARDIAC): Model Derivation and Validation Using a Hybrid Statistical and Machine-Learning Approach
PLoS ONE
Assessing large-scale genomic language models in predicting personal gene expression: promises and limitations
bioRxiv
ClairS-TO: A deep-learning method for long-read tumor-only somatic small variant calling
Nature Communications
Real-time raw signal genomic analysis using fully integrated memristor hardware
Nature Computational Science
Toward owner governance in genomic data privacy with Governome
Cell Reports Methods
AutoPM3: Enhancing Variant Interpretation via LLM-driven PM3 Evidence Extraction from Scientific Literature
Bioinformatics
(co-authored) DECODE: Deep learning-based common deconvolution framework for various omics data
ResearchSquare
(co-authored) cuteFC: regenotyping structural variants through an accurate and efficient force-calling method
Genome Biology
(co-authored) Observational, causal relationship and shared genetic basis between cholelithiasis and gastroesophageal reflux disease: evidence from a cohort study and comprehensive genetic analysis
GigaScience

2024

ShiftCAM: A Time-Domain Content Addressable Memory Utilizing Shifted Hamming Distance for Robust Genome Analysis
ICCAD 2024
Unraveling the Genetic Susceptibility of Irritable Bowel Syndrome: integrative genome-wide analyses in 845,492 individuals: a diagnostic study
International Journal of Surgery
Repun: an accurate small variant representation unification method for multiple sequencing platforms
Briefings in Bioinformatics
EPInformer: A Scalable Deep Learning Framework for Gene Expression Prediction by Integrating Promoter-enhancer Sequences with Multimodal Epigenomic Data
bioRxiv
CellContrast: Reconstructing Spatial Relationships in Single-Cell RNA Sequencing Data via Deep Contrastive Learning
Patterns
Development and validation of a tool to stratify the treatment effect of low-dose aspirin in patients with cardiovascular disease: VISTA (Vascular Intervention Stratification Tool for Aspirin)
medRxiv
Protein Domain-Specific Genotype-Phenotype Correlation Study of Neurofibromatosis Type 1
SSRN Preprints
Development and validation of risk prediction model for recurrent cardiovascular events among Chinese: the Personalized CARdiovascular DIsease risk Assessment for Chinese model
European Heart Journal - Digital Health
ClusterV-Web: A User-Friendly Tool for Profiling HIV Quasispecies and Generating Drug Resistance Reports from Nanopore Long-Read Data
Bioinformatics Advances
Investigating shared genetic architecture between inflammatory bowel diseases and primary biliary cholangitis
JHEP Reports
MirtronStructDB: A Comprehensive Database of Mirtrons with Predicted Secondary Structures
F1000Research
(co-authored) ARGNet: using deep neural networks for robust identification and classification of antibiotic resistance genes from sequences
Microbiome
(co-authored) Adverse clinical outcomes and immunosuppressive microenvironment of RHO-GTPase activation pattern in hepatocellular carcinoma
Journal of Translational Medicine
(co-authored) Assessing the reproducibility, stability, and biological interpretability of multimodal computed tomography image features for prognosis in advanced non-small cell lung cancer
iRadiology
(co-authored) PET/CT deep learning prognosis for treatment decision support in esophageal squamous cell carcinoma
Insights into Imaging
(co-authored) Sleep patterns, genetic susceptibility, and digestive diseases: A large-scale longitudinal cohort study
International Journal of Surgery
(co-authored) SARS-CoV-2 variants divergently infect and damage cardiomyocytes in vitro and in vivo
Cell & Bioscience
(co-authored) Unveiling promising drug targets for autism spectrum disorder: insights from genetics, transcriptomics, and proteomics
Briefings in Bioinformatics

2023

ClairS: a deep-learning method for long-read somatic small variant calling
bioRxiv
Large-scale Dataset and Effective Model for Variant-Disease Associations Extraction
ACM-BCB 2023
Exploring Pair-Aware Triangular Attention for Biomedical Relation Extraction
ACM-BCB 2023
Long-Read Sequencing with Hierarchical Clustering for Antiretroviral Resistance Profiling of Mixed Human Immunodeficiency Virus Quasispecies
Clinical Chemistry
Boosting variant-calling performance with multi-platform sequencing data using Clair3-MP
BMC Bioinformatics
Ultra-low coverage genome-wide association study - insights into gestational age using 17,844 embryo samples with preimplantation genetic testing
Genome Medicine
Evaluation of Mycobacterium tuberculosis enrichment in metagenomic samples using ONT adaptive sequencing and amplicon sequencing for identification and variant calling
Scientific Reports
Integrated modeling framework reveals co-regulation of transcription factors, miRNAs and lncRNAs on cardiac developmental dynamics
Stem Cell Research & Therapy
(co-authored) The EN-TEx resource of multi-tissue personal epigenomes & variant-impact models
Cell

2022

HKG: An open genetic variant database of 205 Hong Kong Cantonese exomes
NAR Genomics and Bioinformatics
ECNano: A Cost-Effective Workflow for Target Enrichment Sequencing and Accurate Variant Calling on 4,800 Clinically Significant Genes Using a Single MinION Flowcell
BMC Medical Genomics
SENSV: Detecting Structural Variations with Precise Breakpoints using Low-Depth WGS Data from a Single Oxford Nanopore MinION Flowcell
Scientific Reports
Symphonizing pileup and full-alignment for deep learning-based long-read variant calling
Nature Computational Science
Clair3-Trio: high-performance Nanopore long-read variant calling in family trios with Trio-to-Trio deep neural networks
Briefings in Bioinformatics
Assembly-free discovery of human novel sequences using long reads
DNA Research
Duet: SNP-Assisted Structural Variant Calling and Phasing Using Oxford Nanopore Sequencing
BMC Bioinformatics
(co-first authored) Streptococcus oriscaviae sp. nov. Infection Associated with Guinea Pigs
Microbiology Spectrum
(co-authored) Generalized radiograph representation learning via cross-supervision between images and free-text radiology reports
Nature Machine Intelligence
(co-authored) Same-Cell Co-Occurrence of RAS Hotspot and BRAF V600E Mutations in Treatment-Naive Colorectal Cancer
JCO Precision Oncology
(co-authored) Temporal Control of the WNT Signaling Pathway During Cardiac Differentiation Impacts Upon the Maturation State of Human Pluripotent Stem Cell Derived Cardiomyocytes
Frontiers in Molecular Biosciences
(co-authored) A self-blinking DNA probe for live-cell superresolution 3D imaging of hierarchical chromatin structures
bioRxiv

2021

Building a Chinese pan-genome of 486 individuals
Communications Biology
The applications and potentials of nanopore sequencing in the (epi)genome and (epi)transcriptome era
The Innovation
SARS-CoV-2 biology and variants: anticipation of viral evolution and what needs to be done
Environmental Microbiology
BioNumQA-BERT: Answering Biomedical Questions Using Numerical Facts with a Deep Language Representation Model
ACM-BCB 2021
RENET2: High-Performance Full-text Gene-Disease Relation Extraction with Iterative Training Data Expansion
NAR Genomics and Bioinformatics
DNA methylation affects pre-mRNA transcriptional initiation and processing in Arabidopsis
bioRxiv
(co-first authored) High Prevalence and Mechanism Associated With Extended Spectrum Beta-Lactamase-Positive Phenotype in Laribacter hongkongensis
Frontiers in Microbiology
(co-authored) Drug Repurposing for the Treatment of COVID-19: A Knowledge Graph Approach
Advanced Therapeutics
(co-authored) Distinct disease severity between children and older adults with COVID-19: Impacts of ACE2 expression, distribution, and lung progenitor cells
Clinical Infectious Diseases
(co-authored) Clinical analysis and pluripotent stem cells-based model reveal possible impacts of ACE2 and lung progenitor cells on infants vulnerable to COVID-19
Theranostics

2020

Exploring the limit of using a deep neural network on pileup data for germline variant calling
Nature Machine Intelligence
CONNET: Accurate Diploid Genome Consensus in de novo Assembly of Nanopore Sequencing Data via Deep Learning
iScience
Skyhawk: An Artificial Neural Network-based discriminator for reviewing clinically significant genomic variants
International Journal of Computational Biology and Drug Design
MegaPath-Nano: Accurate Compositional Analysis and Drug-level Antimicrobial Resistance Detection Software for Oxford Nanopore Long-read Metagenomics
IEEE BIBM 2020
ChromSeg: Two-Stage Framework for Overlapping Chromosome Segmentation and Reconstruction
IEEE BIBM 2020
Tracking cytosine depletion in SARS-CoV-2
bioRxiv
(co-authored) MegaPath: sensitive and rapid pathogen detection using metagenomic NGS data
BMC Genomics
(co-authored) High-quality bacterial genomes of a partial-nitritation/anammox system by an iterative hybrid assembly method
Microbiome
(co-authored) Identification of Cooperative Gene Regulation Among Transcription Factors, LncRNAs, and MicroRNAs in Diabetic Nephropathy Progression
Frontiers in Genetics
(co-authored) Translocator: local realignment and global remapping enabling accurate translocation detection using single-molecule sequencing long reads
ACM-BCB 2020
(co-authored) MC-Explorer: Analyzing and Visualizing Motif-Cliques on Large Networks
ICDE 2020

2019

Clairvoyante: a multi-task convolutional deep neural network for variant calling in Single Molecule Sequencing
Nature Communications
(co-first authored) RENET: A Deep Learning Approach for Extracting Gene-Disease Associations from Literature
RECOMB 2019

2018

Restricted Boltzmann Machine and its Potential to Better Predict Cancer Survival
Biomed J Sci & Tech Res
(co-authored) Transcriptome Analysis of Acute Phase Liver Graft Injury in Liver Transplantation
Biomedicines
(co-authored) AC-DIAMOND v1: Accelerating large-scale DNA-protein alignment
Bioinformatics
(co-authored) MegaPath: Low-Similarity Pathogen Detection from Metagenomic NGS Data (Extended Abstract)
ICCABS 2018

2017

First Draft Genome Sequence of the Pathogenic Fungus Lomentospora prolificans (formerly Scedosporium prolificans)
G3: Genes, Genomes, Genetics
LRSim: a Linked Reads Simulator generating insights for better genome partitioning
Computational and Structural Biotechnology Journal
16GT: a fast and sensitive variant caller using a 16-genotype probabilistic model
GigaScience
(co-first authored) Serine peptidase inhibitor Kazal type 1 (SPINK1) as novel downstream effector of the cadherin-17/β-catenin axis in hepatocellular carcinoma
Cellular Oncology
(co-authored) MegaGTA: a sensitive and accurate metagenomic gene-targeted assembler using iterative de Bruijn graphs
BMC Bioinformatics

2016

BASE: a practical de novo assembler for large genomes using long NGS reads
BMC Genomics
(co-first authored) MEGAHIT v1.0: A fast and scalable metagenome assembler driven by advanced methodologies and community practices
Methods
(co-authored) AC-DIAMOND: Accelerating Protein Alignment via Better SIMD Parallelization and Space-Efficient Indexing
IWBBIO

2015

database.bio: a web application for interpreting human variations
Bioinformatics
MICA: A fast short-read aligner that takes full advantage of Intel Many Integrated Core Architecture (MIC)
BMC Bioinformatics
(co-first authored) De novo assembly of a haplotype-resolved human genome
Nature Biotechnology
(co-first authored) MEGAHIT: An ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph
Bioinformatics
(co-authored) Genome-Wide Mapping of Structural Variations Reveals a Copy Number Variant That Determines Reproductive Morphology in Cucumber
Plant Cell

2014

BALSA: integrated secondary analysis for whole-genome and whole-exome sequencing, accelerated by GPU
PeerJ
(co-first authored) Exome sequencing of tumor cell lines: Optimizing for cancer variants
Cancer Research
(co-first authored) SOAPdenovo-Trans: De novo transcriptome assembly with short RNA-Seq reads
Bioinformatics
(co-authored) GPU-Accelerated BWT Construction for Large Collection of Short Reads
arXiv

2013

SOAP3-dp: Fast, Accurate and Sensitive GPU-based Short Read Aligner
PLoS ONE

2012

SOAPdenovo2: An empirically improved memory-efficient short-read de novo assembler
GigaScience
COPE: an accurate k-mer-based pair-end reads connection tool to facilitate genome assembly
Bioinformatics
(co-first authored) The oyster genome reveals stress adaptation and complexity of shell formation
Nature
(co-authored) Single-base resolution maps of cultivated and wild rice methylomes and regulatory roles of DNA methylation in plant gene expression
BMC Genomics
(co-authored) Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species
GigaScience
(co-authored) An integrated map of genetic variation from 1,092 human genomes
Nature

2011

(co-first authored) Structural variation in two human genomes mapped at single-nucleotide resolution by whole genome de novo assembly
Nature Biotechnology
(co-authored) Mapping copy number variation by population-scale genome sequencing
Nature
(co-authored) Assemblathon 1: A competitive assessment of de novo short read assembly methods
Genome Research

2010

(co-first authored) Building the sequence map of the human pan-genome
Nature Biotechnology
(co-authored) Sequencing of 50 Human Exomes Reveals Adaptation to High Altitude
Science
(co-authored) The DNA Methylome of Human Peripheral Blood Mononuclear Cells
PLoS Biology
(co-authored) International network of cancer genome projects
Nature
(co-authored) A map of human genome variation from population-scale sequencing
Nature