A machine-learning approach to define the component parts of bacteriophage virions
Bacteriophages (phages) are currently under consideration as a means to treat a wide range of bacterial infections, including those caused by drug-resistant “superbugs”. Successful phage therapy protocols require diverse phages in a phage cocktail, which creates a need to recognize the features of diverse phages from under-sampled environments. The effective use of these viruses for therapy depends on a number of factors, not least of which are the sequence-based choices that must be made to identify new phages for development into phage therapies.
Phage virions, i.e. the physical form of the phage that would be delivered to the site of infection, conform to a blueprint that consists of a protein capsid housing the viral genome, and a multicomponent tail. We view these virions as molecular machines, and the tail machinery is complex. First and foremost, elements within the tail function to engage a species-specific component on the surface of the host bacterium, thereby initiating the infection cascade. The tail machinery is also responsible for penetrating the bacterial cell wall, so that the tip of the tail can enter the bacterial cytoplasm. Then, and only then, is a signal transmitted to the portal at the proximal end of the tail, enabling release of the phage DNA into the tail lumen to permit DNA translocation into the bacterial cytoplasm, resulting in bacterial death.
We have developed an ensemble predictor called STEP3 that uses machine-learning algorithms to characterize the components of the machinery in phage virions. STEP3 can be used to understand the universal features of the machinery in phage tails, by accurately classifying proteins with conserved features into groupings that do not depend on the ill-considered annotations that currently confuse phage genome data. In the development of STEP3, various types of evolutionary features were sampled. These features were extracted from Position-Specific Scoring Matrices (PSSMs) in order to draw on the relationships underpinning the evolutionary history of the various proteins making up the phage virions. Given the high evolutionary rates of phage proteins, these features are particularly suitable for detecting virion proteins that are only distantly homologous. STEP3 integrates these features into an ensemble framework to achieve stable and robust prediction performance. The final ensemble model showed a significant improvement in prediction accuracy over current state-of-the-art phage virion protein predictors in extensive 5-fold cross-validation and independent tests.
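To make the idea of PSSM-derived features feeding an ensemble more concrete, the following is a minimal sketch and not the STEP3 implementation. It assumes a PSSM is available as a list of rows (one per residue, 20 scores each), collapses it into a fixed-length vector by averaging logistic-scaled columns, and combines hypothetical base models by averaging their predicted probabilities (soft voting). The function names `pssm_composition` and `soft_vote`, the scaling choice, and the 0.5 threshold are all illustrative assumptions; the actual feature encodings and ensemble strategy used by STEP3 are not specified here.

```python
import math
from typing import Callable, List, Sequence, Tuple

# A PSSM is assumed to be L rows (one per residue) x 20 columns (one per amino acid).
Matrix = Sequence[Sequence[float]]


def pssm_composition(pssm: Matrix) -> List[float]:
    """Collapse an L x 20 PSSM into one value per column (hypothetical encoding).

    Each score is squashed into (0, 1) with a logistic function and the
    columns are then averaged over the sequence length, so proteins of
    different lengths map to vectors of the same size.
    """
    if not pssm:
        raise ValueError("PSSM must contain at least one row")
    n_cols = len(pssm[0])
    if any(len(row) != n_cols for row in pssm):
        raise ValueError("all PSSM rows must have the same number of columns")
    totals = [0.0] * n_cols
    for row in pssm:
        for j, score in enumerate(row):
            totals[j] += 1.0 / (1.0 + math.exp(-score))
    return [total / len(pssm) for total in totals]


def soft_vote(
    models: Sequence[Callable[[List[float]], float]],
    features: List[float],
    threshold: float = 0.5,
) -> Tuple[float, bool]:
    """Average the probabilities returned by each base model (soft voting).

    Each model is assumed to be a callable that returns the probability
    that the protein is a virion protein. Returns the mean probability
    and whether it reaches the (assumed) decision threshold.
    """
    if not models:
        raise ValueError("at least one base model is required")
    probabilities = [model(features) for model in models]
    mean_probability = sum(probabilities) / len(probabilities)
    return mean_probability, mean_probability >= threshold
```

In practice the base models would be classifiers trained on different PSSM-derived feature sets and evaluated with 5-fold cross-validation; here they are left as plain callables so the sketch stays self-contained.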
The following browsers are supported by this website:
- Windows: Chrome, Firefox, Internet Explorer 9+, Opera
- Mac: Chrome, Firefox, Opera, Safari
- Linux: Chrome, Firefox
- Thung T.Y, White M, Dai W et al. A machine-learning approach to define the component parts of bacteriophage virions. 2020, to be submitted.
Lithgow Group
Infection and Immunity Program
Biomedicine Discovery Institute
Faculty of Medicine, Nursing and Health Sciences
Monash University