Over the past few decades, major advances in the field of molecular biology, coupled with advances in genomic technologies, have led to an explosive growth in biological information generated by the scientific community. Biological databases are libraries of life sciences information, collected from scientific experiments, published literature, high-throughput experiment technology, and computational analyses. Information contained in biological databases includes gene function, structure, localization, and clinical effects of mutations, as well as similarities of biological sequences and structures.
What is a database?

In simple terms, a database is defined as an organized collection of data or information that is electronically stored and accessible from a computer system. The organized nature of the database makes it easy to access, manage, periodically update, and rapidly search the required data/information from a suitable computer system [1].

Biological databases and their importance

Among the various types of databases, those containing datasets relevant to biological sciences such as molecular biology and bioinformatics are called biological databases. At present, the importance of biological databases can be understood from the following points [2]:

1. Due to rapidly advancing molecular biology, proteomics, and low-cost high-throughput genome sequencing technologies, huge amounts of biological information such as raw sequencing datasets, proteomes, etc. are being generated at a very rapid rate. Thus, the storage and handling of this staggering volume of information are major challenges of the current genomics era.

2. In addition to data generation, data analysis and the drawing of meaningful conclusions are also important parts of any scientific research. This often requires data sharing within the diverse scientific community. In this context, biological databases enable scientists to access and retrieve biologically relevant data, including raw data, genome sequences, analyzed datasets, and annotations, in easily manageable, organized formats.

3. Biological databases also allow data indexing and help remove data redundancy.

4. At present, biological databases have become the central component of bioinformatics. Through various data mining tools, biological information can be easily accessed, thus saving time, resources, and effort.

Components of a biological database

Similar to other databases, a biological database has certain basic components (Fig. 1). These are:
a. Entity - An entity is the thing we want to store in a database, e.g., DNA sequences, genes, bibliographic references, etc.
b. Fields - The properties of an entity are called fields, e.g., gene name, gene sequence, mutation (if any), etc.
c. Records - A record typically refers to the combination of all the fields for a given entity, e.g., the record for the gene BRCA1 in GenBank.
d. Identifier - The unique name that identifies a record.

In the case of a simple database, a single file contains multiple records. Each of these records has the same set of information (fields) along with a unique identifier. The components of a database can be understood from the example database of "Selected movies of Indian Cinema" shown in Fig. 1, in which:

• The entities stored are movies.
• The records are the rows of the table, each including the movie name.
• The fields are the columns of the table, i.e., Title, Year, Director.
• The unique identifiers are movie1, movie2, etc.

Entity - Movies

Identifier  Title                         Year  Director
movie1      Section 375                   2019  Ajay Bahl
movie2      Har Kisse Ke Hisse: Kaamyaab  2020  Hardik Mehta
movie3      A Wednesday                   2008  Neeraj Pandey
movie4      Pink                          2016  Aniruddha Roy Chowdhury
movie5      Parched                       2016  Leena Yadav

Fig. 1: Example of a typical tabular database with each row containing a separate record, along with distinct fields/attributes in the columns.

DBT Sponsored e-Training at GADVASU
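The four components listed above map naturally onto simple data structures. The following is a minimal Python sketch, not a real database system: it stores the movie records of Fig. 1 in a dictionary keyed by identifier. The names `Movie`, `records`, `field_names`, and `titles_2016` are illustrative assumptions and do not come from the original example.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Movie:
    """The entity type; each attribute below is one field."""
    title: str
    year: int
    director: str


# Each key is a unique identifier; each value is one record.
records = {
    "movie1": Movie("Section 375", 2019, "Ajay Bahl"),
    "movie2": Movie("Har Kisse Ke Hisse: Kaamyaab", 2020, "Hardik Mehta"),
    "movie3": Movie("A Wednesday", 2008, "Neeraj Pandey"),
    "movie4": Movie("Pink", 2016, "Aniruddha Roy Chowdhury"),
    "movie5": Movie("Parched", 2016, "Leena Yadav"),
}

# The same field names are shared by every record of the entity.
field_names = [f.name for f in fields(Movie)]
print(field_names)

# Retrieve a single record directly by its identifier.
print(records["movie4"].director)

# Search across all records on one field.
titles_2016 = [m.title for m in records.values() if m.year == 2016]
print(titles_2016)
```

Keying the records by identifier is what makes direct retrieval possible, whereas a search on any other field has to scan every record.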
Types of biological databases

Based on their content, biological databases can be classified into the following types [3]:

a. Primary databases

Primary databases, also known as archival databases, contain experimentally derived datasets such as nucleotide and protein sequences as well as structural information of macromolecules. This basic information can be accompanied by functional annotation, bibliographies, and links to other databases. Data is submitted to a primary database directly by researchers. Once submitted, the data is assigned an accession number, which is permanent and becomes part of the scientific record [2]. The following are examples of primary databases:

i. Primary nucleotide sequence databases - The European Nucleotide Archive (ENA), the National Center for Biotechnology Information GenBank (NCBI GenBank), the DNA Data Bank of Japan (DDBJ), etc.
ii. Microarray/functional genomics databases - Gene Expression Omnibus (GEO), ArrayExpress Archive, etc.
iii. Protein sequence and structure databases - Swiss-Prot and Protein Information Resource (PIR) for protein sequences; Protein Data Bank (PDB) for protein structure.

b. Secondary databases

Secondary databases store information derived from the analysis of primary datasets. They contain highly curated information derived from complex computational as well as manual analysis of primary resources and scientific literature. These databases often store information about conserved domain structures/sequences, signal sequences, and active site residues [2].

i. Protein families, domains, and structure databases - InterPro, PROSITE, SCOP, CATH, and the NCBI Conserved Domain Database (CDD).
ii. Protein sequence and functional information databases - UniProt Knowledgebase (UniProtKB).
iii. Nucleotide (genes/genomes) sequence and annotation databases - NCBI UniGene, the European Bioinformatics Institute (EBI) Genomes, Ensembl, etc.

c. Specialized databases

These databases cater to the needs of specific research interests, e.g., the Ribosomal Database Project (RDP), the HIV sequence database, the Saccharomyces Genome Database (SGD), the Mouse Genome Database (MGD), the Antibiotic Resistance Genes Database (ARDB), etc.
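The content-based classification described in this section can be summarised as a small lookup table. The Python sketch below is illustrative only: the dictionary `DATABASE_TYPES` and the helper `category_of` are hypothetical names, and the entries repeat a subset of the examples listed in the text rather than an authoritative catalogue.

```python
# Hypothetical lookup table; entries repeat examples given in the text.
DATABASE_TYPES = {
    "primary": ["ENA", "NCBI GenBank", "DDBJ", "GEO", "ArrayExpress",
                "Swiss-Prot", "PIR", "PDB"],
    "secondary": ["InterPro", "PROSITE", "SCOP", "CATH", "CDD",
                  "UniProtKB", "UniGene", "Ensembl"],
    "specialized": ["RDP", "SGD", "MGD"],
}


def category_of(name: str) -> str:
    """Return the category of a listed database, or 'unknown' otherwise."""
    wanted = name.casefold()  # compare names case-insensitively
    for category, names in DATABASE_TYPES.items():
        if any(wanted == listed.casefold() for listed in names):
            return category
    return "unknown"


for query in ["ddbj", "InterPro", "SGD", "NotADatabase"]:
    print(query, "->", category_of(query))
```

The lookup matches whole names only, so a query must use the same short name as the table (for example "NCBI GenBank" rather than "GenBank").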