Biomedical Informatics Services

Data analysis and database access services

These help HICCC investigators create preliminary data for research grant proposals and final data for publications and presentations. For instance, a frequently requested service is normalizing and analyzing microarray data (expression profiles, ChIP-on-chip, genotyping, etc.) and then uploading it to standard repositories, such as GEO (Gene Expression Omnibus). Use of appropriate statistical tests and data mining tools, reflecting accepted best practices, is a key element of success in proposal and publication submissions. Other services include gene identification and molecular modeling.
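As an illustration of the normalization step mentioned above, here is a minimal sketch of quantile normalization, one widely used approach for expression arrays. The source does not say which method the BISR applies, so the choice of technique and the function name `quantile_normalize` are assumptions made for this example.

```python
def quantile_normalize(samples):
    """Quantile-normalize equal-length expression vectors (one list per array).

    Illustrative sketch only; ties are broken by probe order.
    """
    n_probes = len(samples[0])
    if any(len(s) != n_probes for s in samples):
        raise ValueError("all samples must have the same number of probes")

    # Mean of the i-th smallest value across all samples.
    sorted_samples = [sorted(s) for s in samples]
    rank_means = [
        sum(s[i] for s in sorted_samples) / len(samples) for i in range(n_probes)
    ]

    normalized = []
    for s in samples:
        # order[rank] is the index of the probe holding that rank in this sample.
        order = sorted(range(n_probes), key=lambda i: s[i])
        out = [0.0] * n_probes
        for rank, probe in enumerate(order):
            out[probe] = rank_means[rank]
        normalized.append(out)
    return normalized


# Two hypothetical arrays measured on the same three probes.
arrays = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
normalized_arrays = quantile_normalize(arrays)
```

After normalization every array shares the same distribution of values, while each probe keeps its rank within its own array.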

Access to a large set of locally maintained biological databases

Databases are generally updated with each full new release. This service is critical because access to these databases via the web is extremely time-consuming and not practical in the context of a high-throughput analysis workflow. The BISR provides services to help HICCC investigators deal with the great variety of data formats and data representation standards across these databases. The BISR staff also process requests for local storage of additional public databases.

Sequence Databases | Data Type | Purpose
Genbank | Nucleic Acid Sequences | Homolog Identification and Sequence Retrieval
Genpept | Protein Sequences | Homolog Identification and Sequence Retrieval
Uniprot (includes Swissprot) | Protein Sequences | Homolog Identification and Sequence Retrieval
Prosite | Sequence Patterns of Structural and Functional Motifs | Identification of Regions of Protein Sequence with Structures and Functions
Pfam | Statistical Models of Protein Domain | Identification of Structural and Functional Domains from Protein Sequence
Rebase | Patterns of Restriction Enzyme Binding and Cutting | Identification of Restriction Enzyme Cut Sites for Mapping and DNA Application

Genomic Sequence Databases | Data Type | Purpose
Human | Whole Genomic Nucleic Acid Sequence | Accessing Gene and Intergenic Sequences in their Genomic Context
Chimp | Whole Genomic Nucleic Acid Sequence | Accessing Gene and Intergenic Sequences in their Genomic Context
Mouse | Whole Genomic Nucleic Acid Sequence | Accessing Gene and Intergenic Sequences in their Genomic Context
Rat | Whole Genomic Nucleic Acid Sequence | Accessing Gene and Intergenic Sequences in their Genomic Context
Zebrafish | Whole Genomic Nucleic Acid Sequence | Accessing Gene and Intergenic Sequences in their Genomic Context
Fly | Whole Genomic Nucleic Acid Sequence | Accessing Gene and Intergenic Sequences in their Genomic Context
Worm | Whole Genomic Nucleic Acid Sequence | Accessing Gene and Intergenic Sequences in their Genomic Context
Yeast | Whole Genomic Nucleic Acid Sequence | Accessing Gene and Intergenic Sequences in their Genomic Context
Many Microorganism Genomes | Whole Genomic Nucleic Acid Sequence | Accessing Gene and Intergenic Sequences in their Genomic Context

Structural Databases | Data Type | Purpose
PDB - Protein Data Bank | 3D Protein Structural Coordinates | Protein Structure Data, Structural Analysis, and Structure-based Searching
SCOP | Protein Structure Analysis | Provides Protein Structure Classification
DSSPcontDB | Secondary Structure of Proteins Assigned According to the Continuum Method | Protein Structure Analysis
TargetDB | Proteins whose Structures are being Determined by Structural Genomics | To See Whether the Structure of a Protein of Interest is Being Determined Experimentally
PEP | Prediction of Protein Structural Characteristics of Entire Proteomes | Protein Structure Analysis
Epitome | Sequences and Structures of Antigens and their Antibodies | Antibody-Antigen Complex Structural Analysis

Functional Databases | Data Type | Purpose
GeneWays | Pathway Information Automatically Extracted from the Literature | Finding a Gene's Interaction Partners and Pathway Context
CNKB | B Cell Molecular Interaction Database | B Cell Network Knowledge Base
Functional Annotation Database | Annotation of Protein Functions | Finding a Gene's Functions
MINT | Molecular Interaction Pairs | Finding a Gene's Interaction Partners
GO (Gene Ontology) | Hierarchical Protein Functional Categories | Finding a Gene's Functions and How These Functions Fit into a Functional Hierarchy
KEGG Pathway | Canonical Pathways | Finding Pathways that Determine Differences in Phenotypes Based on Gene Expression Profiles
IPA: Ingenuity Pathway Analysis | Curated Pathways from the Literature | Interpretation of Experimental Results in Terms of Prior Knowledge

Access to and support/consulting for a variety of public and commercial bioinformatics tools maintained by C2B2/MAGNet and HICCC staff

The most frequently used tools and their access licenses are shown below. Which tools (or tool sub-functions) to use for a specific problem is a complex question, and the answers are constantly evolving. From the vast array of available tools, BISR personnel help investigators select the ones that best fit their requirements and reflect accepted best practices or specific publication requirements.

Custom bioinformatics workflow development

While the typical experimental lab may be able to use bioinformatics software, given appropriate access and training, the ability to program customized data analysis workflows that integrate multiple tools is extremely rare. A typical example is the intersection of ChIP-on-chip and gene expression profile data, followed by sequence analysis to identify functionally active DNA binding sites for a given transcription factor. The BISR addresses this need in two ways: first, by providing access and training for the geWorkbench platform, where custom workflows can often be assembled with little or no programming; second, for more complex projects involving, for instance, custom access to external databases and integration of multiple tools not available within geWorkbench, by providing custom programming expertise. This service, for instance, was recently used by the Dalla Favera and Califano labs for the identification of novel miRNAs in normal and malignant B cell populations.
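The ChIP-on-chip and expression example above can be sketched as a short script. This is a simplified illustration rather than the BISR's or geWorkbench's actual workflow: the function names, the log2 fold-change threshold, the gene identifiers, the promoter sequences, and the motif are all hypothetical.

```python
import re


def candidate_targets(bound_genes, log2_fold_changes, min_abs_log2fc=1.0):
    """Return genes that are both bound (ChIP-on-chip) and differentially expressed."""
    return sorted(
        gene
        for gene in bound_genes
        if abs(log2_fold_changes.get(gene, 0.0)) >= min_abs_log2fc
    )


def motif_hits(promoters, genes, motif_regex):
    """Map each gene to the start positions of non-overlapping motif matches."""
    pattern = re.compile(motif_regex)
    hits = {}
    for gene in genes:
        positions = [m.start() for m in pattern.finditer(promoters.get(gene, ""))]
        if positions:
            hits[gene] = positions
    return hits


# Hypothetical inputs: bound genes, expression changes, and promoter sequences.
bound = {"GENE_A", "GENE_B", "GENE_C"}
expression = {"GENE_A": 2.1, "GENE_B": 0.2, "GENE_D": -3.0}
promoters = {"GENE_A": "TTCACGTGAA", "GENE_B": "CACGTGCC", "GENE_C": "GGGGGG"}

candidates = candidate_targets(bound, expression)
sites = motif_hits(promoters, candidates, "CACGTG")
```

Only genes that pass both filters reach the sequence analysis step, which keeps the motif search focused on sites more likely to be functionally active.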

Custom biological database development

An increasing number of HICCC investigators are producing high-throughput biological data, which often require stratification according to phenotypic information in clinical databases. The BISR helps investigators to develop and to host their custom databases and to integrate them with those containing clinical data.
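A minimal sketch of stratifying high-throughput measurements by clinical phenotype, using an in-memory SQLite database, is shown here. The table names, column names, and sample values are hypothetical and do not describe any actual BISR or HICCC schema.

```python
import sqlite3


def mean_expression_by_phenotype(conn, gene):
    """Average expression of one gene within each clinical phenotype group."""
    query = """
        SELECT c.phenotype, AVG(e.value)
        FROM expression AS e
        JOIN clinical AS c ON c.sample_id = e.sample_id
        WHERE e.gene = ?
        GROUP BY c.phenotype
        ORDER BY c.phenotype
    """
    return conn.execute(query, (gene,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE clinical (sample_id TEXT PRIMARY KEY, phenotype TEXT);
    CREATE TABLE expression (sample_id TEXT, gene TEXT, value REAL);
    """
)
# Hypothetical clinical annotations and expression measurements.
conn.executemany(
    "INSERT INTO clinical VALUES (?, ?)",
    [("S1", "normal"), ("S2", "tumor"), ("S3", "tumor")],
)
conn.executemany(
    "INSERT INTO expression VALUES (?, ?, ?)",
    [("S1", "GENE_A", 2.0), ("S2", "GENE_A", 6.0), ("S3", "GENE_A", 8.0)],
)

by_phenotype = mean_expression_by_phenotype(conn, "GENE_A")
```

Keeping the clinical annotations in their own table and joining on a shared sample identifier lets the experimental data be regrouped as the clinical data are updated.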

Custom web site development for data sharing and dissemination

Based on their data sharing plans and on increasingly strict requirements from journals, HICCC investigators must share and disseminate their results to the research community. This is usually done through either an access-controlled or an unrestricted web portal. BISR personnel help design, develop, deploy, and host these portals more efficiently and cost-effectively than if each portal had to be developed and maintained by the individual HICCC investigator.

Access to high performance computing equipment for computationally intensive analyses

New-generation bioinformatics tools, especially in the area of systems biology or when large genomic data scans are required, may require substantial computational power and storage. Traditionally, research labs would either purchase and maintain their own hardware or seek out bioinformatics collaborators who can run these computationally intensive programs. The BISR has access to one of the largest academic computational clusters dedicated to research in molecular and systems biology. Access to this resource is provided for a fee based on the total number of CPU hours required for the analysis.
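For budgeting purposes, a CPU-hour based fee can be estimated as in the sketch below. It assumes the common convention that CPU hours equal the number of cores multiplied by wall-clock hours, summed over jobs. The rate used is a placeholder for illustration, not an actual BISR price.

```python
def total_cpu_hours(jobs):
    """Sum cores * wall-clock hours over (cores, hours) job tuples."""
    return sum(cores * hours for cores, hours in jobs)


def estimate_fee(jobs, rate_per_cpu_hour):
    """Estimated charge for a set of jobs at a given per-CPU-hour rate."""
    return total_cpu_hours(jobs) * rate_per_cpu_hour


# Hypothetical analysis: one 8-core job for 2.5 hours and one 16-core job for 1 hour.
planned_jobs = [(8, 2.5), (16, 1.0)]
hours_needed = total_cpu_hours(planned_jobs)
fee = estimate_fee(planned_jobs, rate_per_cpu_hour=0.5)  # placeholder rate
```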