Publications


Co-authored papers are tagged "(co-authored)". Corresponding, co-corresponding, first, and co-first author papers are not tagged. Click a paper's link for its full list of authors.
2024:
  • Optimizing long-term prevention of cardiovascular disease with reinforcement learning
    medRxiv.  [medRxiv]
  • Unraveling the Genetic Susceptibility of Irritable Bowel Syndrome: integrative genome-wide analyses in 845,492 individuals: a diagnostic study
    International Journal of Surgery.  [Int. J. Surg.]
  • Repun: an accurate small variant representation unification method for multiple sequencing platforms
    Briefings in Bioinformatics.  [Brief. Bioinform.]
  • Clair3-RNA: A deep learning-based small variant caller for long-read RNA sequencing data
    bioRxiv.  [bioRxiv]
  • AutoPM3: Enhancing Variant Interpretation via LLM-driven PM3 Evidence Extraction from Scientific Literature
    bioRxiv.  [bioRxiv]
  • EPInformer: A Scalable Deep Learning Framework for Gene Expression Prediction by Integrating Promoter-enhancer Sequences with Multimodal Epigenomic Data
    bioRxiv.  [bioRxiv]
  • Towards a new standard in genomic data privacy: a realization of owner-governance
    bioRxiv.  [bioRxiv]
  • CellContrast: Reconstructing Spatial Relationships in Single-Cell RNA Sequencing Data via Deep Contrastive Learning
    Patterns.  [Patterns]
  • Development and validation of a tool to stratify the treatment effect of low-dose aspirin in patients with cardiovascular disease: VISTA (Vascular Intervention Stratification Tool for Aspirin)
    medRxiv.  [medRxiv]
  • Protein Domain-Specific Genotype-Phenotype Correlation Study of Neurofibromatosis Type 1
    SSRN Preprints.  [SSRN Preprints]
  • Development and validation of risk prediction model for recurrent cardiovascular events among Chinese: the Personalized CARdiovascular DIsease risk Assessment for Chinese model
    European Heart Journal - Digital Health.  [Eur. Heart J. Digit. Health] [medRxiv]
  • ClusterV-Web: A User-Friendly Tool for Profiling HIV Quasispecies and Generating Drug Resistance Reports from Nanopore Long-Read Data
    Bioinformatics Advances.  [Bioinform. Adv.]
  • Investigating shared genetic architecture between inflammatory bowel diseases and primary biliary cholangitis
    JHEP Reports.  [JHEP Rep.]
  • MirtronStructDB: A Comprehensive Database of Mirtrons with Predicted Secondary Structures
    F1000Research.  [F1000 Res.]
  • (co-authored) ARGNet: using deep neural networks for robust identification and classification of antibiotic resistance genes from sequences
    Microbiome.  [Microbiome]
  • (co-authored) Adverse clinical outcomes and immunosuppressive microenvironment of RHO-GTPase activation pattern in hepatocellular carcinoma
    Journal of Translational Medicine.  [J. Transl. Med.]
  • (co-authored) Assessing the reproducibility, stability, and biological interpretability of multimodal computed tomography image features for prognosis in advanced non-small cell lung cancer
    iRadiology.  [iRadiology]
  • (co-authored) PET/CT deep learning prognosis for treatment decision support in esophageal squamous cell carcinoma
    Insights into Imaging.  [Insights Imaging]
  • (co-authored) Sleep patterns, genetic susceptibility, and digestive diseases: A large-scale longitudinal cohort study
    International Journal of Surgery.  [Int. J. Surg.]
  • (co-authored) SARS-CoV-2 variants divergently infect and damage cardiomyocytes in vitro and in vivo
    Cell & Bioscience.  [Cell Biosci]
  • (co-authored) Unveiling promising drug targets for autism spectrum disorder: insights from genetics, transcriptomics, and proteomics
    Briefings in Bioinformatics.  [Brief. Bioinform.]
2023:
  • ClairS: a deep-learning method for long-read somatic small variant calling
    bioRxiv.  [bioRxiv]
  • Primary Prevention Cardiovascular Disease Risk Prediction Model for Contemporary Chinese (1°P-CARDIAC): Model Derivation and Validation Using a Hybrid Statistical and Machine-Learning Approach
    SSRN Preprints.  [SSRN Preprints]
  • High-resolution diploid 3D genome reconstruction using Pore-C data
    bioRxiv.  [bioRxiv]
  • Large-scale Dataset and Effective Model for Variant-Disease Associations Extraction
    ACM-BCB 2023.  [ACM-BCB 2023]
  • Exploring Pair-Aware Triangular Attention for Biomedical Relation Extraction
    ACM-BCB 2023.  [ACM-BCB 2023]
  • Long-Read Sequencing with Hierarchical Clustering for Antiretroviral Resistance Profiling of Mixed Human Immunodeficiency Virus Quasispecies
    Clinical Chemistry.  [Clin. Chem] [GitHub]
  • Boosting variant-calling performance with multi-platform sequencing data using Clair3-MP
    BMC Bioinformatics.  [BMC Bioinform.] [GitHub]
  • Ultra-low coverage genome-wide association study - insights into gestational age using 17,844 embryo samples with preimplantation genetic testing
    Genome Medicine.  [Genome Med.] [medRxiv]
  • Evaluation of Mycobacterium tuberculosis enrichment in metagenomic samples using ONT adaptive sequencing and amplicon sequencing for identification and variant calling
    Scientific Reports.  [Sci. Rep.] [GitHub]
  • Integrated modeling framework reveals co-regulation of transcription factors, miRNAs and lncRNAs on cardiac developmental dynamics
    Stem Cell Research & Therapy.  [Stem Cell Res. Ther]
  • (co-authored) The EN-TEx resource of multi-tissue personal epigenomes & variant-impact models
    Cell.  [Cell]
2022:
  • HKG: An open genetic variant database of 205 Hong Kong Cantonese exomes
    NAR Genomics and Bioinformatics.  [NARGB] [HKG database]
  • ECNano: A Cost-Effective Workflow for Target Enrichment Sequencing and Accurate Variant Calling on 4,800 Clinically Significant Genes Using a Single MinION Flowcell
    BMC Medical Genomics.  [BMC Medical Genom] [GitHub]
  • SENSV: Detecting Structural Variations with Precise Breakpoints using Low-Depth WGS Data from a Single Oxford Nanopore MinION Flowcell
    Scientific Reports.  [Sci. Rep] [GitHub]
  • Streptococcus oriscaviae sp. nov. Infection Associated with Guinea Pigs
    Microbiology Spectrum.  [Microbiol. Spectr.]
  • Symphonizing pileup and full-alignment for deep learning-based long-read variant calling
    Nature Computational Science.  [NatComputSci] [GitHub]
  • Clair3-Trio: high-performance Nanopore long-read variant calling in family trios with Trio-to-Trio deep neural networks
    Briefings in Bioinformatics.  [Brief. Bioinform.] [GitHub]
  • Assembly-free discovery of human novel sequences using long reads
    DNA Research.  [DNA Res.]
  • Duet: SNP-Assisted Structural Variant Calling and Phasing Using Oxford Nanopore Sequencing
    BMC Bioinformatics.  [BMC Bioinform.] [GitHub]
  • (co-authored) Generalized radiograph representation learning via cross-supervision between images and free-text radiology reports
    Nature Machine Intelligence.  [Nat. Mach. Intell.] [PDF] [GitHub]
  • (co-authored) Same-Cell Co-Occurrence of RAS Hotspot and BRAF V600E Mutations in Treatment-Naive Colorectal Cancer
    JCO Precision Oncology.  [JCO Precis. Oncol.]
  • (co-authored) Temporal Control of the WNT Signaling Pathway During Cardiac Differentiation Impacts Upon the Maturation State of Human Pluripotent Stem Cell Derived Cardiomyocytes
    Frontiers in Molecular Biosciences.  [Front Mol Biosci.]
  • (co-authored) A self-blinking DNA probe for live-cell superresolution 3D imaging of hierarchical chromatin structures
    bioRxiv.  [bioRxiv]
2021:
  • Building a Chinese pan-genome of 486 individuals
    Communications Biology.  [Commun. Biol.]
  • The applications and potentials of nanopore sequencing in the (epi)genome and (epi)transcriptome era
    The Innovation.  [The Innovation]
  • SARS‐CoV‐2 biology and variants: anticipation of viral evolution and what needs to be done
    Environmental Microbiology.  [Environ. Microbiol.] [Traditional Chinese Translation] [Simplified Chinese Translation] [SARS-CoV-2 Cytosine Attenuation Tracking]
  • High Prevalence and Mechanism Associated With Extended Spectrum Beta-Lactamase-Positive Phenotype in Laribacter hongkongensis
    Frontiers in Microbiology.  [Front. Microbiol.]
  • BioNumQA-BERT: Answering Biomedical Questions Using Numerical Facts with a Deep Language Representation Model
    ACM-BCB 2021.  [PDF] [Conference]
  • RENET2: High-Performance Full-text Gene-Disease Relation Extraction with Iterative Training Data Expansion
    NAR Genomics and Bioinformatics.  [NARGB] [GitHub]
  • DNA methylation affects pre-mRNA transcriptional initiation and processing in Arabidopsis
    bioRxiv.  [bioRxiv]
  • (co-authored) Drug Repurposing for the Treatment of COVID-19: A Knowledge Graph Approach
    Advanced Therapeutics.  [Adv. Ther.]
  • (co-authored) Distinct disease severity between children and older adults with COVID-19: Impacts of ACE2 expression, distribution, and lung progenitor cells
    Clinical Infectious Diseases.  [CID]
  • (co-authored) Clinical analysis and pluripotent stem cells-based model reveal possible impacts of ACE2 and lung progenitor cells on infants vulnerable to COVID-19
    Theranostics.  [Theranostics]
2020:
  • Exploring the limit of using a deep neural network on pileup data for germline variant calling
    Nature Machine Intelligence.  [Nat. Mach. Intell.] [PDF] [GitHub]
  • CONNET: Accurate Diploid Genome Consensus in de novo Assembly of Nanopore Sequencing Data via Deep Learning
    iScience.  [iScience] [GitHub]
  • Skyhawk: An Artificial Neural Network-based discriminator for reviewing clinically significant genomic variants
    International Journal of Computational Biology and Drug Design.  [IJCBDD] [PDF] [GitHub]
  • MegaPath: sensitive and rapid pathogen detection using metagenomic NGS data
    BMC Genomics.  [BMC Genomics] [SourceForge]
  • MegaPath-Nano: Accurate Compositional Analysis and Drug-level Antimicrobial Resistance Detection Software for Oxford Nanopore Long-read Metagenomics
    IEEE BIBM 2020.  [PDF] [Conference]
  • ChromSeg: Two-Stage Framework for Overlapping Chromosome Segmentation and Reconstruction
    IEEE BIBM 2020.  [PDF] [Conference]
  • Tracking cytosine depletion in SARS-CoV-2
    bioRxiv.  [bioRxiv] [Website]
  • (co-authored) High-quality bacterial genomes of a partial-nitritation/anammox system by an iterative hybrid assembly method
    Microbiome.  [Microbiome]
  • (co-authored) Identification of Cooperative Gene Regulation Among Transcription Factors, LncRNAs, and MicroRNAs in Diabetic Nephropathy Progression
    Frontiers in Genetics.  [Front. Genet.]
  • (co-authored) Translocator: local realignment and global remapping enabling accurate translocation detection using single-molecule sequencing long reads
    ACM-BCB 2020.  [PDF] [Conference]
  • (co-authored) MC-Explorer: Analyzing and Visualizing Motif-Cliques on Large Networks
    ICDE 2020.  [PDF] [Demo]
2019:
  • RENET: A Deep Learning Approach for Extracting Gene-Disease Associations from Literature
    RECOMB 2019.  [Springer]
  • Clairvoyante: a multi-task convolutional deep neural network for variant calling in Single Molecule Sequencing
    Nature Communications.  [Nat. Comm.] [GitHub]
2018:
  • Restricted Boltzmann Machine and its Potential to Better Predict Cancer Survival
    Biomed J Sci & Tech Res.  [PDF]
  • (co-authored) Transcriptome Analysis of Acute Phase Liver Graft Injury in Liver Transplantation
    Biomedicines.  [PubMed]
  • (co-authored) AC-DIAMOND v1: Accelerating large-scale DNA-protein alignment
    Bioinformatics.  [PubMed] [GitHub]
  • (co-authored) MegaPath: Low-Similarity Pathogen Detection from Metagenomic NGS Data (Extended Abstract)
    ICCABS 2018.  [IEEE]
2017:
  • First Draft Genome Sequence of the Pathogenic Fungus Lomentospora prolificans (formerly Scedosporium prolificans)
    G3: Genes, Genomes, Genetics.  [PubMed]
  • Serine peptidase inhibitor Kazal type 1 (SPINK1) as novel downstream effector of the cadherin-17/β-catenin axis in hepatocellular carcinoma
    Cellular Oncology.  [PubMed]
  • LRSim: a Linked Reads Simulator generating insights for better genome partitioning
    Computational and Structural Biotechnology Journal.  [PubMed] [GitHub]
  • 16GT: a fast and sensitive variant caller using a 16-genotype probabilistic model
    GigaScience.  [PubMed] [GitHub]
  • (co-authored) MegaGTA: a sensitive and accurate metagenomic gene-targeted assembler using iterative de Bruijn graphs
    BMC Bioinformatics.  [PubMed]
2016:
  • MEGAHIT v1.0: A fast and scalable metagenome assembler driven by advanced methodologies and community practices
    Methods.  [PubMed]
  • BASE: a practical de novo assembler for large genomes using long NGS reads
    BMC Genomics.  [PubMed]
  • (co-authored) AC-DIAMOND: Accelerating Protein Alignment via Better SIMD Parallelization and Space-Efficient Indexing
    IWBBIO.  [Springer]
2015:
  • database.bio: a web application for interpreting human variations
    Bioinformatics.  [PubMed]
  • De novo assembly of a haplotype-resolved human genome
    Nature Biotechnology.  [PubMed]
  • MEGAHIT: An ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph
    Bioinformatics.  [PubMed] [GitHub]
  • MICA: A fast short-read aligner that takes full advantage of Intel Many Integrated Core Architecture (MIC)
    BMC Bioinformatics.  [PubMed] [SourceForge] [GitHub]
  • (co-authored) Genome-Wide Mapping of Structural Variations Reveals a Copy Number Variant That Determines Reproductive Morphology in Cucumber
    Plant Cell.  [PubMed]
2014:
  • SOAPdenovo-Trans: De novo transcriptome assembly with short RNA-Seq reads
    Bioinformatics.  [PubMed] [SourceForge] [GitHub]
  • BALSA: integrated secondary analysis for whole-genome and whole-exome sequencing, accelerated by GPU
    PeerJ.  [PubMed] [SourceForge]
  • Exome sequencing of tumor cell lines: Optimizing for cancer variants
    Cancer Research.  [AACR]
  • GPU-Accelerated BWT Construction for Large Collection of Short Reads
    ArXiv.  [PDF]
2013:
  • SOAP3-dp: Fast, Accurate and Sensitive GPU-based Short Read Aligner
    PLoS ONE.  [PubMed] [GitHub]
2012:
  • SOAPdenovo2: An empirically improved memory-efficient short-read de novo assembler
    GigaScience.  [PubMed] [SourceForge] [GitHub]
  • COPE: an accurate k-mer-based pair-end reads connection tool to facilitate genome assembly
    Bioinformatics.  [PubMed] [SourceForge]
  • The oyster genome reveals stress adaptation and complexity of shell formation
    Nature.  [PubMed]
  • (co-authored) Single-base resolution maps of cultivated and wild rice methylomes and regulatory roles of DNA methylation in plant gene expression
    BMC Genomics.  [PubMed]
  • (co-authored) Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species
    GigaScience.  [PubMed]
  • (co-authored) An integrated map of genetic variation from 1,092 human genomes
    Nature.  [PubMed]
2011:
  • Structural variation in two human genomes mapped at single-nucleotide resolution by whole genome de novo assembly
    Nature Biotechnology.  [PubMed]
  • (co-authored) Mapping copy number variation by population-scale genome sequencing
    Nature.  [PubMed]
  • (co-authored) Assemblathon 1: A competitive assessment of de novo short read assembly methods
    Genome Research.  [PubMed]
2010:
  • Building the sequence map of the human pan-genome
    Nature Biotechnology.  [PubMed]
  • (co-authored) Sequencing of 50 Human Exomes Reveals Adaptation to High Altitude
    Science.  [PubMed]
  • (co-authored) The DNA Methylome of Human Peripheral Blood Mononuclear Cells
    PLoS Biology.  [PubMed]
  • (co-authored) International network of cancer genome projects
    Nature.  [PubMed]
  • (co-authored) A map of human genome variation from population-scale sequencing
    Nature.  [PubMed]