With the pandemic changing our lifestyles significantly, it was inevitable that many student researchers would have to take their research in silico. The majority of educational institutions around the world shifted to distance learning temporarily, while institutions like Harvard announced that they would stay online for the complete academic year.
Since most students are trained to work in the lab and have limited experience with biotechnology software, the in silico phase of research can be a tough nut to crack. The good news is that it is never too late to explore the apps and software that can help you get the hang of working in silico.
A list of apps and software for tasks ranging from aligning nucleotide sequences, designing primers, and docking proteins to building evolutionary trees can be a real blessing for biotechnologists. That is what this article is dedicated to.
Let’s get to the list then:
For protein visualization
1. Chimera
Packed with features, UCSF Chimera is widely used for the interactive display and analysis of molecular structures and related data, including density maps, supramolecular assemblies, sequence alignments, docking results, trajectories, and conformational ensembles. It is developed at the University of California, San Francisco by the Resource for Biocomputing, Visualization, and Informatics (RBVI). Researchers use it to produce high-resolution images for publications and to analyze multiple sequence alignments and multi-model structures.
Distinct features:
- Sequence alignments can be built and manipulated
- Detection of hydrogen bonds, contacts, and clashes
- Automatic identification of atom types
- High-quality images, including nonmolecular objects.
Fields in which it can help:
- Biochemistry
- Drug discovery
- Molecular Biology
2. PyMOL
PyMOL is a molecular visualization tool used for both scientific and educational purposes. The "Py" in the name refers to the program being written in the Python programming language. Its most prominent feature is the generation of high-quality 3D images of small molecules and macromolecules such as proteins and nucleic acids, as well as electron densities. By one estimate, approximately one-third of all illustrations of 3D protein structures published in the scientific literature by 2009 were created using PyMOL. The software can also edit molecules, render ray-traced images, and make movies.
Distinct features:
- It reads multiple file formats, e.g., PDB, SDF, mol, and mol2, and supports map formats including xplor, ccp4, and phi.
- Scientific communities around the world rely on PyMOL to prepare publication-quality figures and animations.
- It serves as a platform for computational drug design.
Fields in which it can help:
- Protein chemistry and protein modeling
- Drug discovery
- Structural biology
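To give a feel for what a structure file such as PDB contains before a viewer loads it, here is a minimal Python sketch that reads atom records by their fixed column positions and computes a centroid. It is not PyMOL's own API. The `Atom` type, the function names, and the two sample lines are made up for illustration, and a real file has many more record types than this handles.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atoms(pdb_text):
    """Parse ATOM/HETATM records using the fixed PDB column layout."""
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue  # skip headers, remarks, and other record types
        atoms.append(
            Atom(
                serial=int(line[6:11]),
                name=line[12:16].strip(),
                res_name=line[17:20].strip(),
                chain=line[21],
                res_seq=int(line[22:26]),
                x=float(line[30:38]),
                y=float(line[38:46]),
                z=float(line[46:54]),
            )
        )
    return atoms


def centroid(atoms):
    """Return the mean x, y, z position of a list of atoms."""
    n = len(atoms)
    return (
        sum(a.x for a in atoms) / n,
        sum(a.y for a in atoms) / n,
        sum(a.z for a in atoms) / n,
    )


# Illustrative records only; not taken from a specific deposited structure.
SAMPLE = (
    "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67\n"
    "ATOM      2  CA  MET A   1      26.266  25.413   2.842  1.00 10.38\n"
)

atoms = parse_atoms(SAMPLE)
for atom in atoms:
    print(atom)
print(centroid(atoms))
```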
3. I-TASSER
Proteins are essential to life, so understanding the structure and function of a protein provides valuable guidance for interpreting biological processes and designing novel therapies. I-TASSER is a bioinformatics tool that predicts the secondary and tertiary (3D) structure of a protein from its amino acid sequence. Through fold recognition, also called threading, it first identifies structural templates in the Protein Data Bank. The only input required is the amino acid sequence of the target protein. When the job is complete, the predicted structure and function are presented on a results web page, and the link is sent to the user by email.
Distinct features:
- It shows the predicted structure and active sites of a protein.
- Users can automatically produce protein structure and function predictions using the I-TASSER server.
Fields in which it can help:
- Enzyme Classification
- Protein Chemistry
- Gene Ontology
4. Clustal X
Clustal X is a multiple sequence alignment tool. It provides an interactive graphical interface for performing multiple sequence and profile alignments and evaluating the results. It is the graphical version of Clustal W, with the most recent release being Clustal X2. The program is written in C++ and is one of the most widely used tools for multiple sequence alignment. Input files can be in FASTA, EMBL, or NBRF format. When the alignment is complete, the results are displayed in color on the screen. The alignment itself cannot be edited residue by residue, but the sequence order can be changed by cutting and pasting sequences. Clustal X also reports duplicate FASTA sequences if the input file contains the same accession number more than once.
Distinct features:
- Multiple sequences can be aligned.
- Results of the analysis are shown in different colors.
- It can also realign selected sequences or regions to improve the alignment.
- It is very easy to use and handles small datasets well.
Fields in which it can help:
- Evolution
- Genomics
- Virology
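The duplicate check mentioned above is easy to run yourself before submitting a file for alignment. The following Python sketch lists identifiers that appear more than once in a FASTA file. The function names and the example accessions are hypothetical, and it only mimics the idea, not Clustal X's actual implementation.

```python
from collections import Counter


def read_fasta_ids(text):
    """Return the identifier (first word after '>') of every FASTA record."""
    ids = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            header = line[1:].split()
            ids.append(header[0] if header else "")
    return ids


def duplicate_ids(text):
    """Return identifiers that occur more than once, in first-seen order."""
    counts = Counter(read_fasta_ids(text))
    return [seq_id for seq_id, n in counts.items() if n > 1]


# Hypothetical input: the identifier "AB000001" appears twice.
example = """>AB000001 isolate 1
ATGCGTACGT
>AB000002 isolate 2
ATGCGTTCGT
>AB000001 isolate 1 (copy)
ATGCGTACGT
"""

print(duplicate_ids(example))
```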
5. AutoDock Vina
AutoDock Vina is an open-source program for molecular docking. It was designed and implemented by Dr. Oleg Trott in the Molecular Graphics Lab at The Scripps Research Institute. It is a command-line program written in C++. Compared with AutoDock 4, it improves speed and accuracy through a new scoring function, efficient optimization, and multithreading. It is used to dock small molecules to macromolecular targets such as proteins and nucleic acids. AutoDock Vina calculates its own grid maps quickly and automatically and does not store them on disk. It also clusters and ranks the results without exposing the user to intermediate details (Trott and Olson, 2010).
Distinct features:
- AutoDock Vina uses the same PDBQT molecular structure file format for input and output.
- PDBQT files can be generated and viewed using MGLTools.
- It is optimized to perform docking experiments using well-tested default settings.
- It is a very fast and effective tool.
Fields in which it can help:
- Structure-based drug design
- Bioinformatics
- Structural Biology
- Molecular Docking
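As a rough illustration of how a Vina run is typically set up, the sketch below builds the text of a configuration file that names the receptor and ligand PDBQT files and the search box. The option names (`receptor`, `ligand`, `center_x`, `size_x`, `exhaustiveness`, `out`) are Vina's own command-line options. The helper function, file names, and box coordinates are placeholders, not values from any real experiment.

```python
def vina_config(receptor, ligand, center, size, out, exhaustiveness=8):
    """Build the text of an AutoDock Vina configuration file."""
    cx, cy, cz = center
    sx, sy, sz = size
    lines = [
        f"receptor = {receptor}",
        f"ligand = {ligand}",
        # Center of the search box, in Angstroms.
        f"center_x = {cx}",
        f"center_y = {cy}",
        f"center_z = {cz}",
        # Edge lengths of the search box, in Angstroms.
        f"size_x = {sx}",
        f"size_y = {sy}",
        f"size_z = {sz}",
        f"exhaustiveness = {exhaustiveness}",
        f"out = {out}",
    ]
    return "\n".join(lines) + "\n"


# Placeholder file names and box values, for illustration only.
config_text = vina_config(
    receptor="receptor.pdbqt",
    ligand="ligand.pdbqt",
    center=(10.0, 12.5, -3.0),
    size=(20.0, 20.0, 20.0),
    out="docked.pdbqt",
)
print(config_text)
```

Saved as `config.txt`, such a file could then be passed to the program with `vina --config config.txt`.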
6. SWISS-MODEL
SWISS-MODEL is a server for automated comparative modeling of three-dimensional (3D) protein structures. It was the first fully automated protein homology modeling server and has been continuously improved over the last 25 years. It guides users in building homology models at different levels of complexity. Homology modeling involves identifying a structural template, aligning the target sequence with the template, building the model, and evaluating the built model. As input, the amino acid sequence of the target protein can be submitted either as plain text or in FASTA format. If the protein sequence is deposited in the UniProtKB database, its identifier can be entered instead; the identifier is validated and replaced with the corresponding sequence. Recently, its modeling functionality has been extended to include homo- and heteromeric complexes, given the amino acid sequences of the interacting partners as a starting point (Waterhouse et al., 2018).
Distinct features:
- The SWISS-MODEL server is designed to work with a minimum of user input; in the simplest case, only the amino acid sequence of a target protein is needed.
- Template selection, alignment, and model building are automated.
- Complex modeling tasks can be handled in "project mode" using DeepView (Swiss-PdbViewer), an integrated sequence-to-structure workbench.
Fields in which it can help:
- Protein structure homology
- Bioinformatics
- Protein Chemistry and Protein modeling
7. AutoDock
AutoDock is a computational docking program based on an empirical free energy force field and a rapid Lamarckian genetic algorithm search method. It uses a simplified representation of the molecules, stored in a modified Protein Data Bank (PDB) file format termed PDBQT. AutoDock is designed to be a generic computational docking tool that accepts coordinate files for the receptor and ligand and predicts optimal docked conformations (Seeliger and de Groot, 2010).
Distinct features:
- It includes polar hydrogen atoms and uses their coordinates during docking.
- It uses Gasteiger-Marsili atomic charges to calculate electrostatic interactions and desolvation energies.
- It requires the user to specify the torsional degrees of freedom in ligand molecules and in any receptor side chains that are to be flexible.
- It allows limited flexibility of selected receptor side chains.
Fields in which it can help:
- Drug discovery
- Protein Modeling
- Bioinformatics
For the generation of a phylogenetic tree
1. MEGA X
One of the most widely used programs for building evolutionary trees, MEGA X has had a number of updates since MEGA was first introduced. This veteran software can also be used to mine databases. It handles large datasets and has an intuitive interface. You can use MEGA X for sequence analysis, statistical interpretation of the data, or visualizing data in the form of trees.
Distinct features:
- Covers major statistical methods, including maximum likelihood, maximum parsimony, and distance methods.
- It includes gene duplication and timetree wizards.
- MUSCLE, one of the most widely used tools for sequence alignment, comes integrated into its sequence alignment feature.
Fields in which it can help:
- Plant systematics
- Virology
- Genomics
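To make the idea of a distance method concrete, here is a small Python sketch that computes the proportion of differing sites (the p-distance) between two aligned sequences and applies the Jukes-Cantor correction. It only illustrates the kind of calculation involved. The function names and toy sequences are invented, and this is not a description of how MEGA X is implemented.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of differing sites, skipping sites with a gap in either sequence."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def jukes_cantor(p):
    """Jukes-Cantor corrected distance: d = -(3/4) * ln(1 - 4p/3)."""
    if p >= 0.75:
        raise ValueError("distance is undefined for p >= 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


# Toy aligned sequences that differ at one of ten sites.
seq_1 = "ACGTACGTAC"
seq_2 = "ACGTACGTTC"

p = p_distance(seq_1, seq_2)
print(p, jukes_cantor(p))
```

A tree-building method such as neighbor joining would take a matrix of these pairwise distances as its input.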
For sequence analysis and alignment
1. Geneious Prime
Geneious Prime is one of the few software packages that offers a range of handy features on its own. It is widely used for sequence analysis and annotation. The software can also be used to visualize the different components of a nucleotide sequence, and users can even tweak the color key for different regions of a sequence.
You can use Geneious Prime for sequence analysis, annotation, BLAST, and tree building.
Distinct features:
- It offers BLAST integration that lets you search the database without having to open the browser separately for it.
- It offers expression analysis.
- Data can be organized into folders to avoid data mix-up.
Fields in which it can help:
- Molecular Biology
- Data management
- Genomics
2. MultAlin
MultAlin, as the name suggests, is a multiple alignment tool that many biotechnologists use at some stage of their research. The tool allows users to align a number of sequences simultaneously. It works with several file formats, including FASTA, EMBL-SwissProt, and GenBank, and is one of the easiest tools to use. You should use it if you need multiple sequence alignment with adjustable parameters.
Distinct features:
- The alignment parameters for nucleotide or protein sequences can be edited as required.
- The alignment is built using hierarchical clustering.
Fields in which it can help:
- Bioinformatics
- Genomics
- Proteomics
3. SDT
SDT stands for Sequence Demarcation Tool. It is a free tool used to classify viruses into taxonomic units such as genera, species, or strains, based on pairwise identity scores. The tool takes an input file of aligned or unaligned DNA or protein sequences in FASTA format and aligns every unique pair of sequences. It then calculates pairwise similarity scores and displays them as a color-coded matrix. When the analysis is complete, it produces two types of files, one containing the matrix of identity scores and the other containing the analysis results. The basic purpose of the method is to enable general virologists to classify the whole genome sequences of new viruses according to the ICTV's genus, species, and strain demarcation guidelines.
Distinct features:
- SDT relies on pairwise alignment rather than multiple sequence alignment.
- Each pairwise score is displayed as a color-coded cell.
- The percent identity score between two viruses is represented by different colors.
Fields in which it can help:
- Virology
- Genetics
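The pairwise scoring described above can be sketched in a few lines of Python. This example assumes the sequences are already aligned and counts identity only over columns where neither sequence has a gap, which is an assumption about the scoring rule rather than SDT's documented code. The sequence names, color names, and thresholds are purely illustrative.

```python
from itertools import combinations


def percent_identity(seq_a, seq_b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


def identity_matrix(aligned):
    """Return {(name_i, name_j): percent identity} for every unordered pair."""
    return {
        (n1, n2): percent_identity(aligned[n1], aligned[n2])
        for n1, n2 in combinations(aligned, 2)
    }


def color_bin(score, thresholds=((90.0, "red"), (75.0, "yellow"))):
    """Map a score to a display color; the cutoffs here are illustrative only."""
    for cutoff, color in thresholds:
        if score >= cutoff:
            return color
    return "blue"


# Hypothetical pre-aligned sequences.
aligned = {
    "virus_A": "ATGCATGCAT",
    "virus_B": "ATGCATGCAA",
    "virus_C": "ATGG-TGTAA",
}

for (n1, n2), score in identity_matrix(aligned).items():
    print(n1, n2, round(score, 2), color_bin(score))
```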
For primer designing
1. Primer-BLAST (Primer3)
Derived from the earlier program Primer 0.5, Primer3 is an updated version of its ancestor packed with more features, and Primer-BLAST combines it with a BLAST search. It is one of the most widely used tools for primer design. Primer design is a prerequisite for every polymerase chain reaction (PCR) performed in biotech labs. This tool makes primer design a piece of cake for those who dread it. The easy-to-use web service is available for free in almost every region around the globe.
Distinct features:
- The software can predict the melting temperature of the primers and the formation of primer dimers.
- It can be used to analyze the specificity of a primer to a template.
- It provides a visualization of the binding sites of the primers.
Fields in which it can help:
- Genomics
- Bioinformatics
- Plant Phylogenetics
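For a back-of-the-envelope version of two of the checks listed above, the Python sketch below estimates a melting temperature with the simple Wallace rule and tests whether the 3' ends of two primers are complementary, which is one way primer dimers form. Primer3 and Primer-BLAST rely on more detailed thermodynamic models, so treat this only as a teaching aid. The function names and primer sequences are made up.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def wallace_tm(primer):
    """Wallace rule: Tm = 2*(A+T) + 4*(G+C) degrees Celsius, for short oligos."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def three_prime_dimer(primer_a, primer_b, length=4):
    """True if the last `length` bases of the two primers can base-pair."""
    tail_a = primer_a.upper()[-length:]
    tail_b = primer_b.upper()[-length:]
    return tail_a == reverse_complement(tail_b)


# Hypothetical primers, written 5' to 3'.
forward = "ATGCGTACGTTAGCCGAT"
reverse_bad = "TTGACCGGTAACGTATCG"   # 3' end pairs with the forward primer
reverse_ok = "GCTAGGCTAACCGTTAGC"

print(wallace_tm(forward))
print(three_prime_dimer(forward, reverse_bad))
print(three_prime_dimer(forward, reverse_ok))
```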
2. IDT OligoAnalyzer
IDT OligoAnalyzer is a great primer design tool for those of you who are new to the concept. It takes a very straightforward approach, and users do not need prior knowledge or computer skills to nail it. You can use IDT OligoAnalyzer to determine the physical properties of nucleotide sequences when designing primers. The tool also visualizes hairpin structures within the sequence.
Distinct features:
- It enables you to target the GC range for primer designing.
- It offers the visualization of hairpin structures.
- The tool also reports physical properties like optical density and extinction coefficient.
Fields in which it can help:
- Genomics
- Bioinformatics
- Plant Phylogenetics
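Two of the properties mentioned above, GC content and hairpin formation, can be approximated with a short Python sketch. It computes the GC percentage of an oligo and looks for short segments that are reverse-complementary to a downstream segment, which is the basic condition for a hairpin stem. The function names, the default stem and loop lengths, and the 40 to 60 percent GC window are illustrative assumptions, not OligoAnalyzer's actual settings or algorithm.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_percent(seq):
    """Percentage of G and C bases in the sequence."""
    if not seq:
        raise ValueError("sequence is empty")
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def gc_in_range(seq, low=40.0, high=60.0):
    """True if GC content falls inside the (illustrative) target window."""
    return low <= gc_percent(seq) <= high


def find_hairpin_stems(seq, stem=4, min_loop=3):
    """Return (i, j) index pairs where seq[i:i+stem] can pair with seq[j:j+stem].

    The two segments must be separated by at least `min_loop` bases.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * stem - min_loop + 1):
        target = reverse_complement(seq[i:i + stem])
        for j in range(i + stem + min_loop, len(seq) - stem + 1):
            if seq[j:j + stem] == target:
                hits.append((i, j))
    return hits


# Hypothetical oligo with an obvious stem (GGGC ... GCCC) around an AAAA loop.
oligo = "GGGCAAAAGCCC"
print(gc_percent(oligo), gc_in_range(oligo))
print(find_hairpin_stems(oligo))
```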
Conclusion
Bioinformatics and computational biology are transforming the conventional way of doing research. The digital revolution has done wonders for scientists, saving them time and resources. Had it not been for these apps and software, the scientific community would have been stuck with a dilemma: use resources conservatively and cherry-pick the projects to work on, or further science in every field while taking a few risks.
While the benefits of biotech software and apps are innumerable, there is also room for error. The algorithms and concepts this software works on are statistical and therefore subject to error, so the predictions may not always turn out to be correct. That is why critically important results are confirmed with further tests. Nonetheless, it is thanks to bioinformatics that we as researchers have come a long way.
Note: This article was written in collaboration with Shah Rukh Khan and Sidra Fatima, Masters students from ASAB, NUST.
References
- Trott O, Olson AJ. AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. Journal of Computational Chemistry. 2010;31(2):455-461.
- Waterhouse A, Bertoni M, Bienert S, Studer G, Tauriello G, Gumienny R, Heer FT, de Beer TAP, Rempfer C, Bordoli L, Lepore R, Schwede T. SWISS-MODEL: homology modeling of protein structures and complexes. Nucleic Acids Research. 2018;46(W1):W296-W303.
- Seeliger D, de Groot BL. Ligand docking and binding site analysis with PyMOL and Autodock/Vina. Journal of Computer-Aided Molecular Design. 2010;24:417-422.