Single-linkage molecular clustering of viral pathogens

Published in bioRxiv, 2023

Recommended citation: Soto Miranda M, Narváez Romo R, Moshiri N (2023). "Single-linkage molecular clustering of viral pathogens." bioRxiv. doi:10.1101/2023.08.03.551813


Public health faces the ongoing mission of safeguarding the population’s health against infectious diseases caused by a wide range of pathogens, and epidemiology is an essential discipline in this effort. As technologies advance, new tools are emerging that strengthen our ability to intervene in and control epidemics. Among these, molecular clustering stands out as a promising option. However, appropriate genetic distance thresholds for defining clusters have been poorly explored outside the context of Human Immunodeficiency Virus-1 (HIV-1).
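Under single-linkage clustering, the approach named in the title, a distance threshold defines clusters as follows: any two sequences whose pairwise distance is at or below the threshold are linked, and clusters are the connected components of the resulting graph. Sequences can therefore share a cluster through a chain of links even when they are not directly within the threshold of each other. The Python sketch below illustrates this with a union-find structure. It is not the authors' code; the function name `single_linkage_clusters`, the example distances, and the inclusive comparison (`<=`) are assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Tuple


def single_linkage_clusters(
    ids: Iterable[str],
    pairs: Iterable[Tuple[str, str, float]],
    threshold: float,
) -> List[List[str]]:
    """Cluster sequence IDs by single linkage at a distance threshold.

    Two IDs are linked when their pairwise distance is <= threshold
    (the inclusive comparison is an assumption of this sketch);
    clusters are the connected components of the linked pairs.
    """
    ids = list(ids)
    parent: Dict[str, str] = {i: i for i in ids}

    def find(x: str) -> str:
        # Follow parent pointers to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, dist in pairs:
        if dist <= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b  # merge the two components

    clusters: Dict[str, List[str]] = {}
    for i in ids:
        clusters.setdefault(find(i), []).append(i)
    # Largest clusters first; singletons are kept (analyses often drop them).
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: (-len(c), c))


# Hypothetical distances: A-C exceeds the threshold but is chained via B.
example_pairs = [("A", "B", 0.010), ("B", "C", 0.012), ("A", "C", 0.022), ("D", "E", 0.030)]
print(single_linkage_clusters(["A", "B", "C", "D", "E"], example_pairs, threshold=0.015))
# [['A', 'B', 'C'], ['D'], ['E']]
```

In this example A and C end up in the same cluster although their distance (0.022) exceeds the threshold, because both are linked to B; this chaining is the defining property of single linkage.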


In this work, using the widely adopted pairwise Tamura-Nei 93 (TN93) distance threshold of 0.015 for HIV-1 as a reference point for the molecular cluster properties of interest, we perform molecular clustering on whole-genome sequence datasets from HIV-1, Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), Zaire ebolavirus, and Mpox virus to explore potential pairwise distance thresholds for these other viruses.
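For reference, the TN93 distance between two aligned sequences is computed from the pooled base frequencies (pi_A, pi_C, pi_G, pi_T) and the observed proportions of purine transitions (P1), pyrimidine transitions (P2) and transversions (Q). The Python sketch below implements the standard Tamura-Nei (1993) formula under simplifying assumptions: it skips any site with a gap or non-ACGT character in either sequence and requires all four bases to occur. Dedicated distance tools handle ambiguity codes and saturation differently, so this is illustrative only, and `tn93_distance` is a hypothetical name.

```python
import math
from collections import Counter

VALID = set("ACGT")
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def tn93_distance(seq1: str, seq2: str) -> float:
    """Tamura-Nei (1993) distance between two aligned nucleotide sequences.

    Simplifications (assumptions of this sketch): sites where either sequence
    has a gap or non-ACGT character are skipped, base frequencies are pooled
    over both sequences, and all four bases must occur.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = [(a, b) for a, b in zip(seq1.upper(), seq2.upper()) if a in VALID and b in VALID]
    n = len(sites)
    if n == 0:
        raise ValueError("no comparable sites")

    # Pooled base frequencies over the compared sites.
    counts = Counter(base for site in sites for base in site)
    pi = {base: counts[base] / (2 * n) for base in "ACGT"}
    pi_r = pi["A"] + pi["G"]  # purine frequency
    pi_y = pi["C"] + pi["T"]  # pyrimidine frequency
    ag = pi["A"] * pi["G"]
    ct = pi["C"] * pi["T"]
    if ag == 0 or ct == 0:
        raise ValueError("all four bases must be present")

    # P1: A<->G transitions, P2: C<->T transitions, Q: transversions.
    p1 = sum({a, b} == PURINES for a, b in sites) / n
    p2 = sum({a, b} == PYRIMIDINES for a, b in sites) / n
    q = sum((a in PURINES) != (b in PURINES) for a, b in sites) / n

    # Standard TN93 formula; math.log raises ValueError if the distance saturates.
    return (
        -2 * ag / pi_r * math.log(1 - pi_r * p1 / (2 * ag) - q / (2 * pi_r))
        - 2 * ct / pi_y * math.log(1 - pi_y * p2 / (2 * ct) - q / (2 * pi_y))
        - 2 * (pi_r * pi_y - ag * pi_y / pi_r - ct * pi_r / pi_y)
        * math.log(1 - q / (2 * pi_r * pi_y))
    )


s1 = "ACGT" * 5
s2 = "GTAC" + "ACGT" * 4  # two A<->G and two C<->T changes out of 20 sites
print(round(tn93_distance(s1, s2), 4))  # 0.2554
```

With equal base frequencies TN93 reduces to the Kimura two-parameter distance, which for this example (P = 0.2, Q = 0) is -0.5 ln(0.6), approximately 0.2554.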


We found the following pairwise TN93 distance thresholds to be potential candidates for molecular clustering: 0.00016 (3 mutations) for Ebola, 0.00014 (4 mutations) for SARS-CoV-2, and 0.0000051 (1 mutation) for Mpox.
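The mutation counts in parentheses are consistent with multiplying each threshold by the approximate genome length, since at distances this small TN93 is close to the raw proportion of differing sites. The abstract does not state exactly how the counts were derived, so the sketch below assumes rounded reference-genome lengths (about 19,000 nt for Zaire ebolavirus, 29,900 nt for SARS-CoV-2 and 197,000 nt for Mpox virus); the alignment lengths used in the study may differ, and `approx_mutations` is a hypothetical helper.

```python
# Rounded reference-genome lengths in nucleotides (assumed values; the
# study's alignment lengths may differ).
GENOME_LENGTH_NT = {
    "Zaire ebolavirus": 19_000,
    "SARS-CoV-2": 29_900,
    "Mpox virus": 197_000,
}

# Candidate TN93 thresholds reported in the abstract.
THRESHOLDS = {
    "Zaire ebolavirus": 0.00016,
    "SARS-CoV-2": 0.00014,
    "Mpox virus": 0.0000051,
}


def approx_mutations(distance: float, genome_length: int) -> float:
    """Approximate number of differing sites implied by a small distance.

    Assumes distance ~ proportion of differing sites, which holds closely
    for TN93 only when distances are very small.
    """
    return distance * genome_length


for virus, d in THRESHOLDS.items():
    n = approx_mutations(d, GENOME_LENGTH_NT[virus])
    print(f"{virus}: {d:g} x {GENOME_LENGTH_NT[virus]:,} nt = {n:.2f}, about {round(n)} mutation(s)")
```

Under these assumed lengths the products round to 3, 4 and 1 differences respectively, matching the counts given above.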


This study provides valuable information for epidemic control strategies and for public health efforts to manage infectious diseases caused by these viruses. The identified pairwise distance thresholds for molecular clustering can serve as a foundation for future research and interventions to combat epidemics effectively.

Availability and implementation

All relevant data and results can be found in the following repository: