Co-authored papers are tagged "(co-authored)". Corresponding-, co-corresponding-, first-, and co-first-author papers are untagged. Click the link of each paper for the full list of authors.
2024:
- Optimizing long-term prevention of cardiovascular disease with reinforcement learning
medRxiv. [medRxiv]
- Unraveling the Genetic Susceptibility of Irritable Bowel Syndrome: integrative genome-wide analyses in 845,492 individuals: a diagnostic study
International Journal of Surgery. [Int. J. Surg.]
- Repun: an accurate small variant representation unification method for multiple sequencing platforms
Briefings in Bioinformatics. [Brief. Bioinform.]
- Clair3-RNA: A deep learning-based small variant caller for long-read RNA sequencing data
bioRxiv. [bioRxiv]
- AutoPM3: Enhancing Variant Interpretation via LLM-driven PM3 Evidence Extraction from Scientific Literature
bioRxiv. [bioRxiv]
- EPInformer: A Scalable Deep Learning Framework for Gene Expression Prediction by Integrating Promoter-enhancer Sequences with Multimodal Epigenomic Data
bioRxiv. [bioRxiv]
- Towards a new standard in genomic data privacy: a realization of owner-governance
bioRxiv. [bioRxiv]
- CellContrast: Reconstructing Spatial Relationships in Single-Cell RNA Sequencing Data via Deep Contrastive Learning
Patterns. [Patterns]
- Development and validation of a tool to stratify the treatment effect of low-dose aspirin in patients with cardiovascular disease: VISTA (Vascular Intervention Stratification Tool for Aspirin)
medRxiv. [medRxiv]
- Protein Domain-Specific Genotype-Phenotype Correlation Study of Neurofibromatosis Type 1
SSRN Preprints. [SSRN Preprints]
- Development and validation of risk prediction model for recurrent cardiovascular events among Chinese: the Personalized CARdiovascular DIsease risk Assessment for Chinese model
European Heart Journal - Digital Health. [Eur. Heart J. Digit. Health] [medRxiv]
- ClusterV-Web: A User-Friendly Tool for Profiling HIV Quasispecies and Generating Drug Resistance Reports from Nanopore Long-Read Data
Bioinformatics Advances. [Bioinform. Adv.]
- Investigating shared genetic architecture between inflammatory bowel diseases and primary biliary cholangitis
JHEP Reports. [JHEP Rep.]
- MirtronStructDB: A Comprehensive Database of Mirtrons with Predicted Secondary Structures
F1000 Research. [F1000 Res.]
- (co-authored) ARGNet: using deep neural networks for robust identification and classification of antibiotic resistance genes from sequences
Microbiome. [Microbiome]
- (co-authored) Adverse clinical outcomes and immunosuppressive microenvironment of RHO-GTPase activation pattern in hepatocellular carcinoma
Journal of Translational Medicine. [J. Transl. Med.]
- (co-authored) Assessing the reproducibility, stability, and biological interpretability of multimodal computed tomography image features for prognosis in advanced non-small cell lung cancer
iRadiology. [iRadiology]
- (co-authored) PET/CT deep learning prognosis for treatment decision support in esophageal squamous cell carcinoma
Insights into Imaging. [Insights Imaging]
- (co-authored) Sleep patterns, genetic susceptibility, and digestive diseases: A large-scale longitudinal cohort study
International Journal of Surgery. [Int. J. Surg.]
- (co-authored) SARS-CoV-2 variants divergently infect and damage cardiomyocytes in vitro and in vivo
Cell & Bioscience. [Cell Biosci]
- (co-authored) Unveiling promising drug targets for autism spectrum disorder: insights from genetics, transcriptomics, and proteomics
Briefings in Bioinformatics. [Brief. Bioinform.]
2023:
- ClairS: a deep-learning method for long-read somatic small variant calling
bioRxiv. [bioRxiv]
- Primary Prevention Cardiovascular Disease Risk Prediction Model for Contemporary Chinese (1°P-CARDIAC): Model Derivation and Validation Using a Hybrid Statistical and Machine-Learning Approach
SSRN Preprints. [SSRN Preprints]
- High-resolution diploid 3D genome reconstruction using Pore-C data
bioRxiv. [bioRxiv]
- Large-scale Dataset and Effective Model for Variant-Disease Associations Extraction
ACM-BCB 2023. [ACM-BCB 2023]
- Exploring Pair-Aware Triangular Attention for Biomedical Relation Extraction
ACM-BCB 2023. [ACM-BCB 2023]
- Long-Read Sequencing with Hierarchical Clustering for Antiretroviral Resistance Profiling of Mixed Human Immunodeficiency Virus Quasispecies
Clinical Chemistry. [Clin. Chem] [GitHub]
- Boosting variant-calling performance with multi-platform sequencing data using Clair3-MP
BMC Bioinformatics. [BMC Bioinform.] [GitHub]
- Ultra-low coverage genome-wide association study - insights into gestational age using 17,844 embryo samples with preimplantation genetic testing
Genome Medicine. [Genome Med.] [medRxiv]
- Evaluation of Mycobacterium tuberculosis enrichment in metagenomic samples using ONT adaptive sequencing and amplicon sequencing for identification and variant calling
Scientific Reports. [Sci. Rep.] [GitHub]
- Integrated modeling framework reveals co-regulation of transcription factors, miRNAs and lncRNAs on cardiac developmental dynamics
Stem Cell Research & Therapy. [Stem Cell Res. Ther]
- (co-authored) The EN-TEx resource of multi-tissue personal epigenomes & variant-impact models
Cell. [Cell]
2022:
- HKG: An open genetic variant database of 205 Hong Kong Cantonese exomes
NAR Genomics and Bioinformatics. [NARGB] [HKG database]
- ECNano: A Cost-Effective Workflow for Target Enrichment Sequencing and Accurate Variant Calling on 4,800 Clinically Significant Genes Using a Single MinION Flowcell
BMC Medical Genomics. [BMC Medical Genom] [GitHub]
- SENSV: Detecting Structural Variations with Precise Breakpoints using Low-Depth WGS Data from a Single Oxford Nanopore MinION Flowcell
Scientific Reports. [Sci. Rep.] [GitHub]
- Streptococcus oriscaviae sp. nov. Infection Associated with Guinea Pigs
Microbiology Spectrum. [Microbiol. Spectr.]
- Symphonizing pileup and full-alignment for deep learning-based long-read variant calling
Nature Computational Science. [NatComputSci] [GitHub]
- Clair3-Trio: high-performance Nanopore long-read variant calling in family trios with Trio-to-Trio deep neural networks
Briefings in Bioinformatics. [Brief. Bioinform.] [GitHub]
- Assembly-free discovery of human novel sequences using long reads
DNA Research. [DNA Res.]
- Duet: SNP-Assisted Structural Variant Calling and Phasing Using Oxford Nanopore Sequencing
BMC Bioinformatics. [BMC Bioinform.] [GitHub]
- (co-authored) Generalized radiograph representation learning via cross-supervision between images and free-text radiology reports
Nature Machine Intelligence. [Nat. Mach. Intell.] [PDF] [GitHub]
- (co-authored) Same-Cell Co-Occurrence of RAS Hotspot and BRAF V600E Mutations in Treatment-Naive Colorectal Cancer
JCO Precision Oncology. [JCO Precis. Oncol.]
- (co-authored) Temporal Control of the WNT Signaling Pathway During Cardiac Differentiation Impacts Upon the Maturation State of Human Pluripotent Stem Cell Derived Cardiomyocytes
Frontiers in Molecular Biosciences. [Front Mol Biosci.]
- (co-authored) A self-blinking DNA probe for live-cell superresolution 3D imaging of hierarchical chromatin structures
bioRxiv. [bioRxiv]
2021:
- Building a Chinese pan-genome of 486 individuals
Communications Biology. [Commun. Biol.]
- The applications and potentials of nanopore sequencing in the (epi)genome and (epi)transcriptome era
The Innovation. [The Innovation]
- SARS‐CoV‐2 biology and variants: anticipation of viral evolution and what needs to be done
Environmental Microbiology. [Environ. Microbiol.] [Traditional Chinese Translation] [Simplified Chinese Translation] [SARS-CoV-2 Cytosine Attenuation Tracking]
- High Prevalence and Mechanism Associated With Extended Spectrum Beta-Lactamase-Positive Phenotype in Laribacter hongkongensis
Frontiers in Microbiology. [Front. Microbiol.]
- BioNumQA-BERT: Answering Biomedical Questions Using Numerical Facts with a Deep Language Representation Model
ACM-BCB 2021. [PDF] [Conference]
- RENET2: High-Performance Full-text Gene-Disease Relation Extraction with Iterative Training Data Expansion
NAR Genomics and Bioinformatics. [NARGB] [GitHub]
- DNA methylation affects pre-mRNA transcriptional initiation and processing in Arabidopsis
bioRxiv. [bioRxiv]
- (co-authored) Drug Repurposing for the Treatment of COVID-19: A Knowledge Graph Approach
Advanced Therapeutics. [Adv. Ther.]
- (co-authored) Distinct disease severity between children and older adults with COVID-19: Impacts of ACE2 expression, distribution, and lung progenitor cells
Clinical Infectious Diseases. [CID]
- (co-authored) Clinical analysis and pluripotent stem cells-based model reveal possible impacts of ACE2 and lung progenitor cells on infants vulnerable to COVID-19
Theranostics. [Theranostics]
2020:
- Exploring the limit of using a deep neural network on pileup data for germline variant calling
Nature Machine Intelligence. [Nat. Mach. Intell.] [PDF] [GitHub]
- CONNET: Accurate Diploid Genome Consensus in de novo Assembly of Nanopore Sequencing Data via Deep Learning
iScience. [iScience] [GitHub]
- Skyhawk: An Artificial Neural Network-based discriminator for reviewing clinically significant genomic variants
International Journal of Computational Biology and Drug Design. [IJCBDD] [PDF] [GitHub]
- MegaPath: sensitive and rapid pathogen detection using metagenomic NGS data
BMC Genomics. [BMC Genomics] [SourceForge]
- MegaPath-Nano: Accurate Compositional Analysis and Drug-level Antimicrobial Resistance Detection Software for Oxford Nanopore Long-read Metagenomics
IEEE BIBM 2020. [PDF] [Conference]
- ChromSeg: Two-Stage Framework for Overlapping Chromosome Segmentation and Reconstruction
IEEE BIBM 2020. [PDF] [Conference]
- Tracking cytosine depletion in SARS-CoV-2
bioRxiv. [bioRxiv] [Website]
- (co-authored) High-quality bacterial genomes of a partial-nitritation/anammox system by an iterative hybrid assembly method
Microbiome. [Microbiome]
- (co-authored) Identification of Cooperative Gene Regulation Among Transcription Factors, LncRNAs, and MicroRNAs in Diabetic Nephropathy Progression
Frontiers in Genetics. [Front. Genet.]
- (co-authored) Translocator: local realignment and global remapping enabling accurate translocation detection using single-molecule sequencing long reads
ACM-BCB 2020. [PDF] [Conference]
- (co-authored) MC-Explorer: Analyzing and Visualizing Motif-Cliques on Large Networks
ICDE 2020. [PDF] [Demo]
2019:
- RENET: A Deep Learning Approach for Extracting Gene-Disease Associations from Literature
RECOMB 2019. [Springer]
- Clairvoyante: a multi-task convolutional deep neural network for variant calling in Single Molecule Sequencing
Nature Communications. [Nat. Comm.] [GitHub]
2018:
- Restricted Boltzmann Machine and its Potential to Better Predict Cancer Survival
Biomed J Sci & Tech Res. [PDF]
- (co-authored) Transcriptome Analysis of Acute Phase Liver Graft Injury in Liver Transplantation
Biomedicines. [PubMed]
- (co-authored) AC-DIAMOND v1: Accelerating large-scale DNA-protein alignment
Bioinformatics. [PubMed] [GitHub]
- (co-authored) MegaPath: Low-Similarity Pathogen Detection from Metagenomic NGS Data (Extended Abstract)
ICCABS 2018. [IEEE]
2017:
- First Draft Genome Sequence of the Pathogenic Fungus Lomentospora prolificans (formerly Scedosporium prolificans)
G3: Genes, Genomes, Genetics. [PubMed]
- Serine peptidase inhibitor Kazal type 1 (SPINK1) as novel downstream effector of the cadherin-17/β-catenin axis in hepatocellular carcinoma
Cellular Oncology. [PubMed]
- LRSim: a Linked Reads Simulator generating insights for better genome partitioning
Computational and Structural Biotechnology Journal. [PubMed] [GitHub]
- 16GT: a fast and sensitive variant caller using a 16-genotype probabilistic model
GigaScience. [PubMed] [GitHub]
- (co-authored) MegaGTA: a sensitive and accurate metagenomic gene-targeted assembler using iterative de Bruijn graphs
BMC Bioinformatics. [PubMed]
2016:
- MEGAHIT v1.0: A fast and scalable metagenome assembler driven by advanced methodologies and community practices
Methods. [PubMed]
- BASE: a practical de novo assembler for large genomes using long NGS reads
BMC Genomics. [PubMed]
- (co-authored) AC-DIAMOND: Accelerating Protein Alignment via Better SIMD Parallelization and Space-Efficient Indexing
IWBBIO. [Springer]
2015:
- database.bio: a web application for interpreting human variations
Bioinformatics. [PubMed]
- De novo assembly of a haplotype-resolved human genome
Nature Biotechnology. [PubMed]
- MEGAHIT: An ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph
Bioinformatics. [PubMed] [GitHub]
- MICA: A fast short-read aligner that takes full advantage of Intel Many Integrated Core Architecture (MIC)
BMC Bioinformatics. [PubMed] [SourceForge] [GitHub]
- (co-authored) Genome-Wide Mapping of Structural Variations Reveals a Copy Number Variant That Determines Reproductive Morphology in Cucumber
Plant Cell. [PubMed]
2014:
- SOAPdenovo-Trans: De novo transcriptome assembly with short RNA-Seq reads
Bioinformatics. [PubMed] [SourceForge] [GitHub]
- BALSA: integrated secondary analysis for whole-genome and whole-exome sequencing, accelerated by GPU
PeerJ. [PubMed] [SourceForge]
- Exome sequencing of tumor cell lines: Optimizing for cancer variants
Cancer Research. [AACR]
- GPU-Accelerated BWT Construction for Large Collection of Short Reads
arXiv. [PDF]
2013:
- SOAP3-dp: Fast, Accurate and Sensitive GPU-based Short Read Aligner
PLoS ONE. [PubMed] [GitHub]
2012:
- SOAPdenovo2: An empirically improved memory-efficient short-read de novo assembler
GigaScience. [PubMed] [SourceForge] [GitHub]
- COPE: an accurate k-mer-based pair-end reads connection tool to facilitate genome assembly
Bioinformatics. [PubMed] [SourceForge]
- The oyster genome reveals stress adaptation and complexity of shell formation
Nature. [PubMed]
- (co-authored) Single-base resolution maps of cultivated and wild rice methylomes and regulatory roles of DNA methylation in plant gene expression
BMC Genomics. [PubMed]
- (co-authored) Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species
GigaScience. [PubMed]
- (co-authored) An integrated map of genetic variation from 1,092 human genomes
Nature. [PubMed]
2011:
- Structural variation in two human genomes mapped at single-nucleotide resolution by whole genome de novo assembly
Nature Biotechnology. [PubMed]
- (co-authored) Mapping copy number variation by population-scale genome sequencing
Nature. [PubMed]
- (co-authored) Assemblathon 1: A competitive assessment of de novo short read assembly methods
Genome Research. [PubMed]
2010:
- Building the sequence map of the human pan-genome
Nature Biotechnology. [PubMed]
- (co-authored) Sequencing of 50 Human Exomes Reveals Adaptation to High Altitude
Science. [PubMed]
- (co-authored) The DNA Methylome of Human Peripheral Blood Mononuclear Cells
PLoS Biology. [PubMed]
- (co-authored) International network of cancer genome projects
Nature. [PubMed]
- (co-authored) A map of human genome variation from population-scale sequencing
Nature. [PubMed]