Data analysis and database access services
These services help HICCC investigators create preliminary data for research grant proposals and final data for publications and presentations. For instance, a frequently requested service is normalizing and analyzing microarray data (expression profiles, ChIP-on-chip, genotyping, etc.) and then uploading it to standard repositories, such as GEO (Gene Expression Omnibus). Use of appropriate statistical tests and data mining tools, reflecting accepted best practices, is a key element of success in proposal and publication submissions. Other services include gene identification and molecular modeling.
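As a rough illustration of what a normalization step can involve, the sketch below applies quantile normalization, one widely used approach for making expression arrays comparable. It is an assumption chosen for illustration, not a description of the method the BISR uses; the function name `quantile_normalize` and the toy values are hypothetical.

```python
from statistics import mean


def quantile_normalize(samples):
    """Quantile-normalize equally sized samples (lists of probe intensities).

    Illustrative sketch only: ties are broken by original position,
    which is a simplification of how production tools handle them.
    """
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all samples must have the same number of probes")

    # Sort each sample, then average across samples at each rank.
    sorted_samples = [sorted(s) for s in samples]
    rank_means = [mean(column) for column in zip(*sorted_samples)]

    normalized = []
    for s in samples:
        # order[r] is the index of the probe holding rank r in this sample.
        order = sorted(range(n), key=lambda i: s[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


# Toy data: two arrays with three probes each (hypothetical values).
arrays = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
normalized = quantile_normalize(arrays)
```

After normalization every array shares the same set of values, so differences between arrays reflect probe ranks rather than overall intensity shifts.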
Access to a large set of locally maintained biological databases
Databases are generally updated with each full new release. This service is critical because access to these databases via the web is extremely time consuming and not practical in the context of a high throughput analysis workflow. The BISR provides services to help HICCC investigators deal with the great variety of data formats and data representation standards across these databases. The BISR staff also process requests for local storage of additional public databases.
|Sequence Databases||Data Type||Purpose|
|Genbank||Nucleic Acid Sequences||Homolog Identification and Sequence Retrieval|
|Genpept||Protein Sequences||Homolog Identification and Sequence Retrieval|
|Uniprot (includes Swissprot)||Protein Sequences||Homolog Identification and Sequence Retrieval|
|Prosite||Sequence Patterns of Structural and Functional Motifs||Identification of Regions of Protein Sequence with Structures and Functions|
|Pfam||Statistical Models of Protein Domain||Identification of Structural and Functional Domains from Protein Sequence|
|Rebase||Patterns of Restriction Enzyme Binding and Cutting||Identification of Restriction Enzyme Cut Sites for Mapping and DNA Application|
|Genomic Sequence Databases||Data Type||Purpose|
|Human||Whole Genomic Nucleic Acid Sequence||Accessing Gene and Intergenic Sequences in their Genomic Context|
|Chimp||Whole Genomic Nucleic Acid Sequence||Accessing Gene and Intergenic Sequences in their Genomic Context|
|Mouse||Whole Genomic Nucleic Acid Sequence||Accessing Gene and Intergenic Sequences in their Genomic Context|
|Rat||Whole Genomic Nucleic Acid Sequence||Accessing Gene and Intergenic Sequences in their Genomic Context|
|Zebrafish||Whole Genomic Nucleic Acid Sequence||Accessing Gene and Intergenic Sequences in their Genomic Context|
|Fly||Whole Genomic Nucleic Acid Sequence||Accessing Gene and Intergenic Sequences in their Genomic Context|
|Worm||Whole Genomic Nucleic Acid Sequence||Accessing Gene and Intergenic Sequences in their Genomic Context|
|Yeast||Whole Genomic Nucleic Acid Sequence||Accessing Gene and Intergenic Sequences in their Genomic Context|
|Many Microorganism Genomes||Whole Genomic Nucleic Acid Sequence||Accessing Gene and Intergenic Sequences in their Genomic Context|
|Structural Databases||Data Type||Purpose|
|PDB - Protein Data Bank||3D Protein Structural Coordinates||Protein Structure Data, Structural Analysis, and Structure-based Searching|
|SCOP||Protein Structure Analysis||Provides Protein Structure Classification|
|DSSPcontDB||Secondary Structure of Proteins Assigned According to the Continuum Method||Protein Structure Analysis|
|TargetDB||Proteins whose Structures are being Determined by Structural Genomics Projects||To See Whether the Structure of a Protein of Interest is Being Determined Experimentally|
|PEP||Prediction of Protein Structural Characteristics of Entire Proteomes||Protein Structure Analysis|
|Epitome||Sequences and Structures of Antigens and their Antibodies||Antibody-Antigen Complex Structural Analysis|
|Functional Databases||Data Type||Purpose|
|GeneWays||Pathway Information Automatically Extracted from the Literature||Finding a Gene's Interaction Partners and Pathway Context|
|CNKB||B Cell Molecular Interaction Database||B Cell Network Knowledge Base|
|Functional Annotation Database||Annotation of Protein Functions||Finding a Gene's Functions|
|MINT||Molecular Interaction Pairs||Finding a Gene's Interaction Partners|
|GO (Gene Ontology)||Hierarchical Protein Functional Categories||Finding a Gene's Functions and How These Functions Fit into a Functional Hierarchy|
|KEGG Pathway||Canonical Pathways||Finding pathways which determine differences in phenotypes based upon gene expression profiles|
|IPA: Ingenuity Pathway Analysis||Curated Pathways from the Literature||Interpretation of experimental results in terms of prior knowledge|
Access to and support/consulting for a variety of public and commercial bioinformatics tools maintained by C2B2/MAGNet and HICCC staff
The most frequently used tools, with contacts for access and licensing, are listed below. Which tools (or tool sub-functionality) to use for a specific problem is a complex question, and the answers evolve constantly. The BISR personnel help investigators select, from a vast array of available tools, the ones that best fit their requirements and reflect accepted best practices or specific publication requirements.
- geWorkbench Integrative Genomics Platform, which contains many programs for microarray and sequence analysis.
- Links to Columbia Center for Computational Biology and Bioinformatics websites and resources.
- GCG (Wisconsin Package) and other Unix-based bioinformatics tools. For an account, contact Janie Weiss.
- GeneSpring is a package for microarray analysis. For account information, contact Renu Pandita.
- Partek is a package for microarray analysis. For license information contact Renu Pandita.
- Ingenuity Pathway Analysis is a package for pathway data mining. For license information, please contact Sadie Maloof.
Custom bioinformatics workflow development
While the typical experimental lab may be able to use bioinformatics software, given appropriate access and training, the ability to program customized data analysis workflows that integrate multiple tools is extremely rare. A typical example is the intersection of ChIP-on-chip and gene expression profile data, followed by sequence analysis to identify functionally active DNA binding sites for a given transcription factor. The BISR addresses this need in two ways. First, it provides access and training for the geWorkbench platform, where custom workflows can often be assembled with little or no programming. Second, for more complex projects involving, for instance, custom access to external databases and integration of multiple tools not available within geWorkbench, the BISR provides custom programming expertise. This service, for instance, was recently used by the Dalla Favera and Califano labs for the identification of novel miRNAs in normal and malignant B cell populations.
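To make the ChIP-on-chip example concrete, the following is a minimal sketch of such a workflow: keep genes that are both bound by the factor and differentially expressed, then scan their promoter sequences for a binding motif. Everything here is hypothetical and for illustration only, including the function name `find_active_sites`, the gene names, the toy sequences, the fold-change threshold, and the use of a simple regular expression as the motif model; a real workflow would typically use position weight matrices and scan both strands.

```python
import re


def find_active_sites(bound_genes, log2_fold_changes, promoters,
                      motif_regex, min_abs_log2fc=1.0):
    """Return {gene: [motif start positions]} for bound, changed genes.

    Illustrative sketch: scans the forward strand only and reports
    non-overlapping regex matches (0-based start positions).
    """
    # Step 1: genes whose expression changes by at least the threshold.
    changed = {
        gene for gene, fc in log2_fold_changes.items()
        if abs(fc) >= min_abs_log2fc
    }
    # Step 2: intersect with genes bound in the ChIP-on-chip experiment.
    candidates = set(bound_genes) & changed

    # Step 3: sequence analysis on the promoters of the candidates.
    pattern = re.compile(motif_regex)
    hits = {}
    for gene in sorted(candidates):
        sequence = promoters.get(gene, "").upper()
        positions = [m.start() for m in pattern.finditer(sequence)]
        if positions:
            hits[gene] = positions
    return hits


# Toy inputs (all hypothetical).
bound = {"GENE_A", "GENE_B", "GENE_C"}
fold = {"GENE_A": 2.3, "GENE_B": 0.2, "GENE_C": -1.8, "GENE_D": 3.0}
promoters = {
    "GENE_A": "ttgaCACGTGaa",
    "GENE_C": "ggggcccc",
    "GENE_D": "CACGTG",
}
sites = find_active_sites(bound, fold, promoters, "CACGTG")
```

In this toy run only `GENE_A` survives all three steps: `GENE_B` is bound but unchanged, `GENE_C` is bound and changed but has no motif, and `GENE_D` has the motif but is not bound.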
Custom biological database development
An increasing number of HICCC investigators are producing high-throughput biological data, which often require stratification according to phenotypic information in clinical databases. The BISR helps investigators to develop and to host their custom databases and to integrate them with those containing clinical data.
Custom web site development for data sharing and dissemination
Under their data sharing plans and increasingly strict journal requirements, HICCC investigators must share and disseminate their results to the research community. This is usually done through a web portal with either controlled or unrestricted access. BISR personnel help design, develop, deploy, and host these portals more efficiently and cost-effectively than if each portal had to be developed and maintained by the individual HICCC investigator.
Access to high performance computing equipment for computationally intensive analyses
New-generation bioinformatics tools, especially in the area of systems biology or when large genomic data scans are required, may demand substantial computational power and storage. Traditionally, research labs would either purchase and maintain their own hardware or seek out bioinformatics collaborators who can run these computationally intensive programs. The BISR has access to one of the largest academic computational clusters dedicated to research in molecular and systems biology. Access to this resource is provided for a fee based on the total number of CPU hours required for the analysis.