Biomedical Informatics Infrastructure

The infrastructure of the combined C2B2 and HICCC (Herbert Irving Comprehensive Cancer Center) facilities is available to meet needs ranging from those of the bench scientist to those of the most sophisticated computational scientist. It comprises one of the largest academic computing centers devoted to molecular and systems biology. Capabilities range from microarray data analysis and storage, through sequence and pathway analysis, to the most recent algorithms for regulatory network reverse engineering. The equipment is housed in several modern data centers located in the Herbert Irving Cancer Center and in the Russ Berrie Pavilion. The infrastructure described here can be used directly by interested scientists or through C2B2/HICCC bioinformatics consultants.

Through a twelve-million-dollar grant from the Empire State Development Corporation (ESDC), C2B2 constructed a state-of-the-art data center in the Irving Cancer Research Center (ICRC) to support biology-related research computing at Columbia University. The data center is a 2,000 ft² server room capable of housing up to 86 server racks, including 22 high-density racks for the dense, power-consuming hardware used by high-performance computing clusters. The data center is configured with a 1 MW battery backup unit (UPS) and a highly available 10 Gbps core network infrastructure.

As part of the ESDC grant, a high-performance compute cluster (code-named “titan”) and high-performance, reliable storage have been installed in the data center. The compute cluster, consisting of 466 systems (3,728 processor cores), is presently ranked #124 in the international Top500 Supercomputers list, placing it among the most powerful computers in the world. An Isilon® clustered Network Attached Storage (NAS) solution provides 168 TB of scalable, reliable, high-speed storage to the compute cluster, facilitating the manipulation of the very large data sets required by biology-related research computing. The research computing environment is augmented with a high-capacity, scalable backup infrastructure and a robust computing infrastructure offering database, web, mail/collaboration, and authentication services.

The new ICRC Data Center provides C2B2 and Columbia University with one of the most advanced technology centers dedicated to biological computing in the world.

Hardware: State-of-the-art clusters are available for large-scale computational jobs and are supported by a full set of servers for databases, web applications, large-memory SMP jobs, grid computing, and automated backup. The clusters are used both to run existing software and for custom development, for which a full set of compilers is available.

Software: A variety of licensed and open-source packages is supported, spanning the full range of bioinformatics tasks.

Databases: C2B2 members themselves create and maintain a number of widely used databases. In addition, all important sequence and structure databases are maintained centrally. This allows direct large-scale searches, using custom algorithms and cluster computing if need be.

geWorkbench: A growing number of local services and databases can be accessed through geWorkbench, an integrated genomics platform written in Java.

Communications: C2B2 provides a dedicated room for data analysis, which also includes modern wide-screen video conferencing capabilities.

Support: A professional systems support team designs and maintains the data centers in which the servers are housed, and supports C2B2/HICCC members in both desktop and cluster computing.

Hosting services: C2B2 offers a variety of colocation and hosting services to qualifying research groups, including cluster time, storage, backup, mail and collaborative tools, web hosting, desktop support, and server equipment hosting.

Access and Contacts

Many of the databases created and maintained by C2B2 are publicly available through the web. However, some databases and programs are only available by directly logging on to local machines or through Facility staff, or require special accounts.

  1. All logins to C2B2 Linux and Unix computers are made using SSH to the gateway node adgate.c2b2.columbia.edu. From there you can use SSH to log into a specific computational server.
  2. For GeneSpring and Partek, please contact Renu Pandita at rp2185@columbia.edu.
  3. For Systems support information, please see the Systems group page at http://wiki.c2b2.columbia.edu/systems/index.php/Main_Page.
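
The two-step login described in item 1 (SSH to the gateway node, then SSH from there to a computational server) can be sketched as a single command line. The following is a minimal illustration, not an official C2B2 procedure: the user name `jdoe` and the server name `compute-server.example` are hypothetical placeholders, and the sketch assumes the same account name is valid on both the gateway and the target server. It relies only on the standard OpenSSH `-t` option, which forces terminal allocation so that the second, nested `ssh` session is interactive.

```python
import shlex

# Gateway node named in the access instructions above.
GATEWAY = "adgate.c2b2.columbia.edu"


def two_hop_ssh_command(user, target_host, gateway=GATEWAY):
    """Build the argument list for a login that goes through the gateway.

    The resulting command logs in to the gateway and, from there,
    runs a second ssh to the target server.
    Assumption: the same user name works on both hosts.
    """
    if not user or not target_host:
        raise ValueError("user and target_host must be non-empty")
    return [
        "ssh",
        "-t",                       # allocate a terminal for the nested session
        f"{user}@{gateway}",        # first hop: the gateway node
        "ssh",                      # command executed on the gateway
        f"{user}@{target_host}",    # second hop: the computational server
    ]


if __name__ == "__main__":
    # "jdoe" and "compute-server.example" are hypothetical placeholders.
    argv = two_hop_ssh_command("jdoe", "compute-server.example")
    print(shlex.join(argv))
```

The printed command can be pasted into a terminal, or the list can be passed to `subprocess.run` to start the session directly. Logging in to the gateway first and typing the second `ssh` command by hand achieves the same result.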