CSE – 535

Bioinformatics Computation

Mid Term Exam Content

Date: 07/02/2020

Bioinformatics: Bioinformatics is the science of storing, retrieving and analyzing biological information.

  • Bioinformatics is a highly interdisciplinary field involving many different types of specialists including biologists, molecular life scientists, computer scientists and mathematicians.
  • The term ‘Bioinformatics’ was coined by Paulien Hogeweg and Ben Hesper to describe the ‘study of informatic processes in biotic systems’. Margaret O. Dayhoff was also a pioneer in the field of Bioinformatics.
  • Bioinformatics includes biological studies that use computer programming as part of their methodology as well as specific analysis “pipelines” that are repeatedly used particularly in the field of genomics.
  • Bioinformatics also tries to understand the organizational principles within nucleic acid and protein sequences (the protein-focused part of this work is often called proteomics).

Major research areas of Bioinformatics:

  1. Sequence Analysis.
  2. Genome Analysis.
  3. Computational evolutionary biology.
  4. Literature Analysis.
  5. Analysis of Gene expression.
  6. Analysis of Regulation.
  7. Analysis of protein expression.
  8. Analysis of mutation in cancer.
  9. Comparative Genomics
  10. High-throughput Image Analysis.

Date: 14/02/2020

Applications of Bioinformatics:

  1. Molecular Medicine
  2. Personalized Medicine
  3. Preventive Medicine
  4. Gene therapy
  5. Drug development
  6. Microbial genome application
  7. Waste clean up
  8. Climate change studies
  9. Alternative energy sources
  10. Biotechnology
  11. Antibiotic resistance
  12. Forensic analysis of microbes
  13. Bio-weapon creation
  14. Evolutionary studies
  15. Crop improvement
  16. Insect resistance
  17. Improvement of nutritional quality
  18. Development of drought-resistant varieties
  19. Veterinary science

List of Biological Databases:

# Protein Databases
  • PDB (www.rcsb.org/pdb): A database of solved protein structures.
  • UniProt (https://www.uniprot.org/): A protein information database.
  • CATH (www.cathdb.info/): It is a protein structure classification database.
# Disease Database
  • OMIM (www.ncbi.nlm.nih.gov/omim): A database for genetic diseases.
  • IEDB (www.iedb.org): An immune epitope database and prediction resource.
# Metabolic Database
  • HMDB (http://www.hmdb.ca/): A database of small-molecule metabolites found in the human body.
  • ECMDB (www.ecmdb.ca/): A database of metabolites found in E. coli.
# Literature Database
  • PubMed (https://www.ncbi.nlm.nih.gov/pubmed/)
# Sequence Database
  • GenBank (www.ncbi.nlm.nih.gov/genbank): A sequence database.
  • EMBL (www.embl.org/): A nucleotide sequence database.
# Pathway Database
  • KEGG (Kyoto Encyclopedia of Genes and Genomes): An interaction network database.
  • MINT (https://mint.bio.uniroma2.it/): The Molecular INTeraction database.
  • BioGRID (www.thebiogrid.org/): A database for protein-protein interaction, genetic interaction, chemical interactions and post-translational modifications.

Date: 22/02/2020

Drug Discovery Process:

Historical Milestones in the field of Bioinformatics:

1965 – Margaret Dayhoff – Atlas of Protein Sequence and Structure
1970 – Needleman–Wunsch algorithm
1977 – DNA sequencing and software to analyze it
1981 – The concept of sequence motif
1981 – Smith–Waterman algorithm developed
1982 – GenBank release 3 made public
1982 – Phage lambda genome sequenced
1983 – Sequence database searching algorithm
1985 – FASTP/FASTN: Fast sequence similarity searching
1988 – National Center for Biotechnology Information (NCBI) established at NIH/NLM
1988 – EMBnet network for database distribution
1990 – BLAST: Fast sequence similarity searching
1991 – EST: expressed sequence tag sequencing
1993 – Sanger Centre, Hinxton, UK
1994 – EMBL-EBI: European Bioinformatics Institute, Hinxton, UK
1995 – First bacterial genomes completely sequenced
1996 – Yeast genome completely sequenced
1997 – PSI-BLAST
1998 – Worm (multicellular) genome completely sequenced
1999 – Fly genome completely sequenced
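The 1970 Needleman–Wunsch entry refers to global pairwise alignment by dynamic programming. The following is a minimal sketch of that idea, assuming a simple linear gap penalty and illustrative scores (match +1, mismatch −1, gap −1) that these notes do not specify. The function name `needleman_wunsch` is hypothetical. The sketch fills a score table F, where F[i][j] is the best score for aligning the first i letters of one sequence with the first j letters of the other, and then traces back through the table to recover one optimal alignment.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of a and b with a linear gap penalty.

    Hypothetical helper; the scoring defaults are illustrative assumptions.
    Returns (score, aligned_a, aligned_b), using '-' for gaps.
    """
    n, m = len(a), len(b)
    # F[i][j] = best score for aligning a[:i] with b[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        F[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                          F[i - 1][j] + gap,    # gap in b
                          F[i][j - 1] + gap)    # gap in a
    # Trace back from the bottom-right cell to rebuild one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if a[i - 1] == b[j - 1] else mismatch
            if F[i][j] == F[i - 1][j - 1] + s:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, `needleman_wunsch("ACGT", "AGT")` returns `(2, "ACGT", "A-GT")`: three matches and one gap. Because the alignment is global, every letter of both sequences appears in the result.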

Classical Tools in Bioinformatics:

# Database interface
– GenBank/EMBL/DDBJ, Medline, Swiss-Prot, PDB
# Sequence Alignment
– BLAST, FASTA, Clustal, MultAlin, DiAlign
# Structure Prediction
– SWISS-MODEL
# Gene Finding
– GENSCAN, GenomeScan, GeneMark, GRAIL
# Protein Domain Analysis
– Pfam, BLOCKS, ProDom
# Pattern Identification/Characterization
– Gibbs sampler, AlignACE, MEME
# Protein folding prediction
– PredictProtein, SWISS-MODEL
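BLAST and FASTA, listed above under Sequence Alignment, are heuristics for fast local similarity search. The exact dynamic-programming method they approximate is Smith–Waterman local alignment (1981). The following is a minimal sketch of that method, assuming a linear gap penalty and illustrative scores (match +2, mismatch −1, gap −1) that are not taken from these notes. The function name `smith_waterman` is hypothetical. It differs from global alignment in two ways: each cell is floored at 0, so an alignment can start anywhere, and the traceback begins at the highest-scoring cell rather than the corner.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment of a and b with a linear gap penalty.

    Hypothetical helper; the scoring defaults are illustrative assumptions.
    Returns (score, segment_a, segment_b); segments are empty if no cell scores above 0.
    """
    n, m = len(a), len(b)
    # H[i][j] = best score of a local alignment ending at a[i-1] and b[j-1] (never below 0)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_i, best_j = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,                    # start a fresh local alignment here
                          H[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,    # gap in b
                          H[i][j - 1] + gap)    # gap in a
            if H[i][j] > best:
                best, best_i, best_j = H[i][j], i, j
    # Trace back from the highest-scoring cell until the score drops to 0
    out_a, out_b = [], []
    i, j = best_i, best_j
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, `smith_waterman("TTACGTAA", "GGACGTCC")` returns `(8, "ACGT", "ACGT")`. Only the shared core is reported, and the dissimilar flanks are left out.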

Final Exam Content

Date: 29/02/2020

Classification of Databases:

Chemical structure of Nucleic Acid: