Integrative Health Sciences Facility Core
General Description and Rationale
The overall goal of the IHS Facility Core is to facilitate both patient-oriented and population-based research within the CEHNM. It promotes the strategic goals of the CEHNM by supporting existing research projects and fostering new interdisciplinary activities focusing on environmental health problems. The Core provides the infrastructure to carry out these studies and has several components, including biospecimen processing, DNA extraction, genotyping and laboratory assays. Epidemiologic and biostatistical capabilities include collaboration on study design, power calculations, measurement of exposure, covariates, confounders and outcomes and data management and analysis. The biorepository arm of the IHS Core provides a centralized, efficient and cost-effective resource for collecting or receiving, processing, storing, distributing and analyzing biological samples. The Core also seeks to address the needs of CEHNM investigators as their research focus changes. The addition of high throughput genotyping methods was the result of member requests, as was the development of oxidative stress assays and enhancement of data management capabilities. It also develops and validates new assays to meet the changing needs of CEHNM investigators.
In addition, the Core offers pro bono or low-cost services to new CEHNM pilot projects. Training of students and postdoctoral fellows is an important component of Core activities: it allows methodologies to be exported to other laboratories and provides access to the equipment and expertise necessary to conduct studies requiring DNA adduct detection, measurement of oxidative stress markers, genotyping and other immunoassays of interest. The Core also provides efficient, cost-effective web-based and other database management systems, which are expanding to include a major formal educational component. Finally, the IHS Facility Core works with the Community Outreach and Education Core (COEC) to disseminate research results to the surrounding community; its activities were highlighted in the most recent CEHNM newsletter.
Biostatistical Support
The Core provides comprehensive biostatistical support for study design and analysis, including power calculations, advanced modeling and outcome measurement techniques.
Processing and Storage of Biological Samples
The Core provides initial consultation on all aspects of sample collection and processing for specific studies including the types of tubes used for sample collection, shipment methods, separation of specific fractions, number of aliquots, etc. Biological samples received are coded to maintain confidentiality using preprinted bar code labels provided to investigators. The whole blood samples can be spotted onto filter cards, if required, and are then separated into various components: peripheral leukocytes, leukocyte fractions (e.g., mononuclear cells and granulocytes), red blood cells, serum and plasma; each fraction is coded before storage.
The Core also processes and stores oral cells, sputum and urine samples that have been collected with or without a preservative. After collection, oral cells are processed for isolation of DNA, for preparation of microscope slides for immunohistochemical analysis, or for both. The Core also stores slides of smeared lymphocytes and cut paraffin sections. This, again, was at the specific request of CEHNM investigators. The Core enhances the efficient use of precious human samples and also stimulates collaborative multiparameter studies on the same set of samples. Whenever possible, aliquots of the various materials are stored so that they remain available for new types of assays that may later become of interest.
Isolation of DNA
Depending upon the needs of the investigator, the Core isolates DNA using various protocols, primarily Qiagen kits, salting out or phenol/chloroform extraction. Quality control routinely involves determining the ratio of the absorbances at 260 and 280 nm, and in some studies the pattern of two highly polymorphic microsatellite repeats, UT1699 and UT1091, is also determined. Any samples with A260/280 ratios outside the range 1.7-1.9 are treated with RNase and proteinase K and extracted with phenol. Absorbances are read on a 96 well UV reader (Molecular Devices) and the data are sent directly to an Excel file for determination of concentration and purity. DNA can also be isolated from the filter cards and cross-checked against the DNA from aliquoted cells for microsatellite repeats as an identity check. To date, this check has been performed on nearly 50 subjects with no discrepancies. Stock DNAs are kept in vials but, depending upon the requirements of the specific study, aliquots can be prepared in tubes or, for high throughput genotyping, in 96 deep well storage plates. The database contains information on total yield of DNA, A260/280 ratios, microsatellite size, number of aliquots on hand, number shipped and recipient, and remaining DNA stock. The Core has experience with whole genome amplification (WGA) technology, routinely using Amersham kits.
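The purity and concentration check described above can be sketched in a few lines of Python. This is an illustration only, not the Core's actual spreadsheet logic: the conversion factor of 50 µg/mL per A260 unit is the usual convention for double-stranded DNA and is an assumption here, as are the class, field and function names.

```python
from dataclasses import dataclass

# Assumption: 1.0 A260 unit corresponds to roughly 50 ug/mL of double-stranded
# DNA (common spectrophotometric convention; not stated in the text).
DSDNA_UG_PER_ML_PER_A260 = 50.0

# Acceptance window for the A260/280 ratio quoted in the text.
RATIO_MIN, RATIO_MAX = 1.7, 1.9


@dataclass
class DnaReading:
    """One well of the UV plate reader (hypothetical record layout)."""
    sample_id: str
    a260: float
    a280: float
    dilution_factor: float = 1.0


def concentration_ug_per_ml(reading: DnaReading) -> float:
    """Estimate the DNA concentration of the undiluted stock."""
    return reading.a260 * DSDNA_UG_PER_ML_PER_A260 * reading.dilution_factor


def purity_ratio(reading: DnaReading) -> float:
    """Return the A260/280 ratio used as the purity indicator."""
    return reading.a260 / reading.a280


def needs_cleanup(reading: DnaReading) -> bool:
    """Flag samples whose ratio falls outside the 1.7-1.9 window."""
    return not (RATIO_MIN <= purity_ratio(reading) <= RATIO_MAX)
```

A flagged sample would then go through the RNase, proteinase K and phenol steps before being read again.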
Sample Analyses
The Core also provides analytical support, in addition to sample handling, including measurement of biologically effective dose (DNA adducts), oxidative stress markers and genotyping for SNPs. One of the major concerns of several CEHNM investigators is exposure to polycyclic aromatic hydrocarbons (PAHs). PAH-DNA adducts are assayed either by enzyme-linked immunosorbent assay (ELISA) of isolated DNA or quantitative immunohistochemistry of intact cells or tissue biopsies. Details on both methods have been published (1-3). The Core Facility provides a mechanism by which other CEHNM investigators can have samples analyzed and pilot data generated. However, it also teaches investigators how to analyze the samples in their own laboratory, making available to them monoclonal antibodies and the instrumentation for quantitation.
Since oxidative stress is a theme in a number of CEHNM investigators' research, markers of oxidative damage are important areas of work for the Core. Dr. Santella's laboratory has developed monoclonal antibodies and a quantitative immunohistochemical method for detection of oxidative DNA damage (8-oxodeoxyguanosine) in cells (4). Other CEHNM investigators have used this assay, including Dr. Tom Hei, whose samples of asbestos-treated cells were initially assayed by the Core; his laboratory staff were later trained to conduct the assay themselves.
Instrumentation for the ELISA includes a 96 well plate washer and color and fluorescence readers. For the quantitative immunohistochemical methods, a Becton Dickinson Cell Analysis System (CAS) microscope is used for peroxidase stained samples, while a Nikon microscope and a Hamamatsu ORCA-100 CCD camera are used for quantitation of fluorescence staining.
High Throughput Genotyping
Several different methods are used for high throughput genotyping depending upon the needs of CEHNM investigators. Master DNA 96 well plates are used to make replica plates. In addition to assay specific quality control samples, 5% of samples are blind-duplicated on the DNA master plates. Subject IDs are scanned into the computer to avoid potential data entry errors. Genotyping methods available for single SNP analysis include template-directed primer extension, with detection of incorporated nucleotides by fluorescence polarization (5), and Applied Biosystems TaqMan. More recently, multiplex methods have been used, including Applied Biosystems SNPlex (48 SNPs) and Sequenom iPLEX mass spectrometry (20-25 SNPs). With the fluorescence polarization and TaqMan assays, over 75 SNP assays have been run on DNA repair, hormone, carcinogen, alcohol and folate metabolism, oxidative stress, growth factor and cell cycle control genes. We have successfully used SNPlex, with 46 of 48 SNPs providing usable data. For iPLEX, to date, two multiplexes of 21 and 24 SNPs have been assayed. Instrumentation is available in the Department of Pathology. A limitation of these multiplex assays is the need to start with large numbers of SNPs in order to find 20-48 that can work together. Recently, Columbia's Human Genome Center leased a BioTrove system that conducts 64 individual TaqMan assays on 48 samples (or 32 on 96 samples) on a nanowell platform; its major advantage is that multiplexing is not necessary. Thus, the genotyping method is tailored to each study based on the number of samples and SNPs.
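The blind-duplicate quality control described above lends itself to a short Python sketch. It is a hypothetical illustration rather than the Core's software: the well labels, the genotype encoding and the function names are all assumptions, and the concordance rule (count only pairs in which both wells produced a call) is one reasonable choice among several.

```python
def duplicates_needed(n_samples: int, percent: int = 5) -> int:
    """Number of blind duplicates for n_samples, rounded up (integer arithmetic)."""
    return -(-n_samples * percent // 100)


def duplicate_concordance(calls, duplicate_pairs):
    """Compare genotype calls for blind-duplicate wells.

    calls: dict mapping well label -> genotype string, or None for a failed call.
    duplicate_pairs: list of (well_a, well_b) holding the same subject's DNA.
    Returns (concordant_pairs, compared_pairs); pairs with a failed call are skipped.
    """
    concordant = compared = 0
    for well_a, well_b in duplicate_pairs:
        call_a, call_b = calls.get(well_a), calls.get(well_b)
        if call_a is None or call_b is None:
            continue  # cannot judge concordance without two calls
        compared += 1
        if call_a == call_b:
            concordant += 1
    return concordant, compared
```

Any discordant pair would prompt a review of the plate before its genotypes are released to the investigator.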
Biorepository Data Management
Mr. Richard Buchsbaum and Mr. Jeffrey Keen of the Statistical Analysis Center coordinate maintenance of the sample inventory. A web-based database, which they developed, is used for the inventory of stored samples in conjunction with the bar code reading system. Each submitted sample is identified by a unique sample ID. All samples are labeled with a bar code representation of the sample ID. This bar code is scanned into the inventory system, avoiding the manual entry of sample IDs that inevitably leads to errors. The database allows linkage of the sample inventory with the information available from questionnaires, surveys, etc. through the unique sample Facility Core number. After a sample is processed, aliquoted and stored, a paper record with an attached sample bar code records the dates collected and processed, who processed the sample, the sample amount and how many of each type of aliquot were made. The data management staff developed an on-line data entry system for bulk recording of new sample storage. This eliminated time-consuming and error-prone recording of sample location on individual paper forms, as well as data entry for the paper forms. If insufficient samples are received to fill all pre-set, study-specified aliquots, manual correction of the data is possible. Information in the database includes the study identifiers, sample ID, aliquot type, volume, location (freezer, shelf, rack, box, and position in box), date received, date processed and the technician who performed the processing. In addition, the database stores information on the results of DNA extraction, transformation, re-extraction, and other further processing. The database also records all use and shipment of specimens, whether for internal use or by outside institutions and researchers.
The database system employs a number of internal checks to ensure the integrity of the inventory data. In addition to the scanning of all sample IDs, the system uses an interface created in Microsoft Access, which provides a feature-rich interface for the user (dynamic querying and searches, automatic entry of default values, automation of routine tasks). Data are stored permanently in an MS SQL Server 2000 database, which enforces quality constraints and referential integrity (e.g., ensuring that all samples can be properly linked to a study and individual submission). Access to the data is via an encrypted HTTP (internet) connection, and access is provided for authorized users with valid passwords only.
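A minimal sketch of how referential integrity protects such an inventory is given below. It uses Python's built-in SQLite driver purely as a stand-in for the SQL Server database described above; the table and column names are assumptions loosely based on the fields listed in the text, not the actual schema.

```python
import sqlite3

# Hypothetical schema: every sample must belong to a known study, and every
# aliquot must belong to a known sample.
SCHEMA = """
CREATE TABLE study (
    study_id TEXT PRIMARY KEY
);
CREATE TABLE sample (
    sample_id     TEXT PRIMARY KEY,   -- bar-coded unique sample ID
    study_id      TEXT NOT NULL REFERENCES study(study_id),
    date_received TEXT NOT NULL
);
CREATE TABLE aliquot (
    aliquot_id     INTEGER PRIMARY KEY,
    sample_id      TEXT NOT NULL REFERENCES sample(sample_id),
    aliquot_type   TEXT NOT NULL,
    volume_ml      REAL CHECK (volume_ml > 0),
    freezer TEXT, shelf TEXT, rack TEXT, box TEXT, position TEXT,
    date_processed TEXT,
    technician     TEXT
);
"""


def open_inventory():
    """Create an in-memory inventory with foreign-key enforcement switched on."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
    conn.executescript(SCHEMA)
    return conn


def try_insert_sample(conn, sample_id, study_id, date_received):
    """Insert a sample; return False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO sample VALUES (?, ?, ?)",
                (sample_id, study_id, date_received),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

With this layout, a sample naming an unknown study, or a second sample reusing an existing sample ID, is rejected by the database itself rather than by application code.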
Methods Development
In addition to working with investigators on development of methods for analysis of new SNPs, the Core is also able to develop new assays. As a result of the interest of CEHNM members in oxidative stress, several new assays were developed in the current grant period and made available to investigators. They include plasma and urinary isoprostanes using commercial kits, 8-oxodeoxyguanosine using monoclonal antibody 1F7 developed by Dr. Santella's laboratory (4), oxidized plasma proteins assayed with a commercial antibody to the dinitrophenyl group after derivatization of carbonyl groups with dinitrophenylhydrazine, and nitrated plasma proteins using a commercial antibody to nitrotyrosine.
Study Design, Data Management and Analysis
Study Design
Any research center dedicated to advancing a scientific program must be armed with the best tools to create, collect, weigh and interpret evidence. In this regard, epidemiological, clinical and basic science studies in environmental health science may benefit from guidance in study design and statistical analysis.
Many of the characteristic questions of interest to NIEHS investigators have large statistical components. As a major example, formal epidemiological studies of purported exposure-disease associations often arise as a result of public or scientific concern regarding clusters of disease that are perceived as excesses. Many of the exposures are truly environmental, for example, low dose electromagnetic fields derived from power lines or water contaminants introduced from disinfection plants. Published literature on these and other issues is often questioned, since causal interpretations may be hindered by problems of bias, measurement error, inadequate control for confounding variables and inappropriate methods of data analysis. Other exposures, such as dietary constituents and active and passive smoking, are not strictly environmental but pose similar problems for study.
The IHS Core provides an interrelated network of biostatistical specialists as a resource to the community of CEHNM researchers. These Core members provide consultative, programming and analytical expertise on a wide range of issues to Center investigators and pilot study investigators who are planning or conducting research through the CEHNM. These include advice on design and statistical tests for proposed studies and sophisticated multivariate regression models such as Poisson, proportional hazards and polychotomous logistic regression. Power calculations and consultation on various methods of correction for measurement error are provided. Consultation is available for advanced statistical analysis, including the development of new statistical models, as required.
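As an illustration of the kind of power calculation mentioned above, the following Python sketch computes the sample size per group needed to compare two proportions with a two-sided test. It uses the standard normal-approximation formula with a pooled variance under the null hypothesis; this is a textbook method chosen for the example, not necessarily the one the Core's statisticians would apply to a given study, and the function name is an assumption.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Sample size per group to detect p1 versus p2 (two-sided test, equal groups)."""
    if not (0 < p1 < 1 and 0 < p2 < 1) or p1 == p2:
        raise ValueError("p1 and p2 must be distinct proportions strictly between 0 and 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    p_bar = (p1 + p2) / 2                          # pooled proportion under the null
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p1 - p2) ** 2)
```

Calling `n_per_group(0.20, 0.10)` gives the number of subjects needed in each group to detect a drop in prevalence from 20% to 10% with 80% power; smaller differences require markedly larger samples.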
Data Management
Modern epidemiological research is collaborative and data intensive. Sophisticated data collection, storage and quality control techniques are available and capable of greatly enhancing the efficiency and scientific precision of such projects. However, advanced expertise in exploiting these resources does not typically fall within the skill set of the individual investigator. As a result, the engagement of proficient personnel and centralization of the data management resources, making them available to a set of investigators with similar needs, can achieve great benefits and efficiencies. In addition, if the programmers and data managers involved work in a specialized area for a prolonged period of time, they gain a unique familiarity with the methods and needs of the investigators.
The unit's involvement in NIEHS projects had considerable positive impact. Standard operating procedures for data entry, storage and quality control were developed. As a result of these procedures, and because data were validated as they were entered, the data cleaning effort that was required at the end of the data entry cycle for projects was substantially reduced. In addition, since all data were stored in one location, the process of creating required datasets for analyses was greatly expedited.
Data management services have been applied to the following types of projects:
- Participant data collected from questionnaires
- Storage and retrieval of equipment-generated data, including:
  - Genotype results
  - Urine analyses
  - Blood analyses
  - Water analyses
  - Air-monitoring data
- Project management applications, including:
  - Participant recruitment tracking
  - Grant tracking
  - CEHNM consultation tracking
  - Query tracking
  - Project issues tracking
Consultation on Data Collection Plans
Investigators in the scientific community are becoming increasingly interested in alternatives to the process of collecting data on questionnaires and subsequently entering them into a database system. By consulting with the data management team during the early stages of a project, investigators can plan highly efficient systems, such as web interfaces, that allow data entry directly into a database during the interview or data collection process, from any computer that has an Internet connection and standard web browser. (Use of a web-based rather than a paper questionnaire allows skip patterns to be enforced during initial data entry.) Other data collection options include the automatic transfer of data that are electronically generated by other types of apparatus, such as air-monitoring devices and laboratory equipment. The data management team will continue to work with investigators to develop increasingly resourceful methods for collecting data.
Consultation on the Design of Collection Instruments
The data management team will continue to consult with investigators and project managers on a regular basis to ensure that proposed questionnaires are consistent and free of ambiguities. Early consultation in the form design process allows data managers to steer investigators toward designs that lend themselves to programmable database systems and data entry screens. Poor instrument design forces compromises in the design of the relational database structure, resulting in difficult data extraction during the creation of datasets.
Design and Programming of the Relational Database
The data management team designs and develops relational database systems to accommodate the requirements of each research project. Databases are created using either Microsoft SQL Server or Microsoft Access, as appropriate. The choice depends on the specific needs of each project. A relational database structure is a powerful tool that collects all data for a project in one electronic location, regardless of their origin. Data collected from questionnaires, laboratories, and electronic equipment are stored in separate tables that are related to each other by a key identifier, usually the Subject ID. As tables are programmed, definitions of the particular type of data to be saved in each variable are created. This key process is one of several that allow data to be kept accessible and as clean as possible during all project phases.
Design and Programming of the Data Entry System
A graphical user interface system is created in order to allow data entry for projects that include questionnaires. Depending upon decisions made during the steps listed above, either Microsoft Access or Microsoft ASP.net will be used to create the data entry system. Although systems can be programmed more rapidly using Microsoft Access, the data entry staff must then use computers on which the Access system has been installed. Alternatively, if ASP.net is used to program web-based data entry systems, any computer with an Internet connection and web browser can access the system. In either case, the designed screens communicate with the relational database created to store the data.
Data entry systems are programmed to duplicate the paper questionnaires and to include intricate validation routines (including logic checks) and skip patterns. This high-level up-front validation keeps data entry errors low, significantly shortening data-cleaning requirements at the end of the data-entry process, thus enabling investigators to work with the data on a rapid schedule. Consequently, the actual programming is the most labor-intensive and time-consuming phase of data management. However, the NIEHS data management team has amassed a significant library of templates, programming code and data extraction code resulting from the projects that have been successfully undertaken by the team over previous years. These modules are routinely plugged into multiple projects, requiring little or no modification, thus speeding up the programming process. In this way the team is able to provide highly customized solutions to investigators in the most cost-efficient and expedient manner.
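The validation routines described above can be illustrated with a short Python sketch that applies one skip pattern, one range check and one logic check to a single record. The questionnaire fields (`ever_smoked`, `cigarettes_per_day`, `birth_date`, `interview_date`) and the plausible range are hypothetical; the team's actual systems are built in Microsoft Access or ASP.net, not Python.

```python
from datetime import date


def validate_record(record: dict) -> list:
    """Return a list of validation messages; an empty list means the record passes."""
    errors = []
    smoked = record.get("ever_smoked")
    cigs = record.get("cigarettes_per_day")

    if smoked not in ("yes", "no"):
        errors.append("ever_smoked must be 'yes' or 'no'")
    elif smoked == "no" and cigs is not None:
        # Skip pattern: the follow-up question does not apply to non-smokers.
        errors.append("skip pattern: cigarettes_per_day must be blank when ever_smoked is 'no'")
    elif smoked == "yes":
        if cigs is None:
            errors.append("cigarettes_per_day is required when ever_smoked is 'yes'")
        elif not 1 <= cigs <= 200:
            # Range check with an assumed plausible range.
            errors.append("cigarettes_per_day is outside the plausible range 1-200")

    born = record.get("birth_date")
    seen = record.get("interview_date")
    if born is None or seen is None:
        errors.append("birth_date and interview_date are required")
    elif seen < born:
        # Logic check across two fields.
        errors.append("logic check: interview_date precedes birth_date")
    return errors
```

Running such checks at entry time means an inconsistent answer is caught while the paper questionnaire is still in front of the operator, rather than during cleaning months later.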
Management of Data Entry Operation
Ms. Levy supervises two data entry employees, one of whom is directly supported by the CEHNM grant. The staff is responsible for:
- efficient entry of data and queries (see description below)
- attending weekly meetings with the study's project manager
- proper filing of all questionnaires that have been entered
A query system is created within each database used by the data entry staff. Each new question that arises during the data entry process is recorded, and the system automatically associates it with the Subject ID of the questionnaire being entered. The questions are subsequently reviewed by the project manager.
Questionnaires are also physically marked with color-coded adhesives to signify that they require review. Automatically generated reports list currently open questions. These are reviewed in regular meetings of Ms. Levy, the data entry staff, and the various project managers. Once an issue is resolved by the project manager, the data entry staff follow up and apply the resolution to the database. All queries are marked as "closed," "under investigation," or "open"; those in the latter two categories are monitored until closure.
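The query workflow above amounts to a small status model. A hypothetical Python sketch follows; the class and function names are assumptions, and only the three status labels come from the text.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    OPEN = "open"
    UNDER_INVESTIGATION = "under investigation"
    CLOSED = "closed"


@dataclass
class Query:
    """A question raised during data entry, tied to the questionnaire's Subject ID."""
    subject_id: str
    question: str
    status: Status = Status.OPEN


def outstanding(queries):
    """Return the queries that still need monitoring (anything not yet closed)."""
    return [q for q in queries if q.status is not Status.CLOSED]


def close(query: Query) -> None:
    """Mark a query closed once the project manager's resolution has been applied."""
    query.status = Status.CLOSED
```

A report of `outstanding(...)` corresponds to the automatically generated list of open questions reviewed at the regular meetings.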
Programming of Data Import Systems
Projects often collect data from electronic apparatus, such as air monitoring equipment. The data management team has collaborated with investigators to design and program automatic data import routines, and will follow up on opportunities to provide data import systems for projects of this type.
Creation of Datasets for Review and Statistical Analysis
Output datasets are created for investigators in whatever format they request. Some request data for review in Microsoft Excel format, while others prefer Microsoft Access format. Using the SQL query language inherent in relational database systems, CEHNM data managers can readily generate datasets containing only selected variables for analysis, which statisticians frequently request. For statistical analysis purposes, SAS output datasets are often created using SAS's built-in ODBC connection to Access.
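The extraction step can be sketched as a single SQL join on the Subject ID followed by an export. The example below uses Python's built-in SQLite and CSV modules as stand-ins for the SQL Server, Access and SAS tools named above; the tables, column names and the two demonstration rows are invented for illustration only.

```python
import csv
import io
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database with made-up demonstration rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE questionnaire (
            subject_id TEXT PRIMARY KEY, age INTEGER, ever_smoked TEXT);
        CREATE TABLE lab_result (
            subject_id TEXT REFERENCES questionnaire(subject_id),
            assay TEXT, value REAL);
        INSERT INTO questionnaire VALUES ('P001', 54, 'yes'), ('P002', 47, 'no');
        INSERT INTO lab_result VALUES ('P001', 'assay_x', 1.5), ('P002', 'assay_x', 0.7);
    """)
    return conn


def export_dataset(conn, assay: str) -> str:
    """Join questionnaire and laboratory data on subject_id and return CSV text."""
    rows = conn.execute(
        "SELECT q.subject_id, q.age, q.ever_smoked, l.value "
        "FROM questionnaire AS q "
        "JOIN lab_result AS l ON l.subject_id = q.subject_id "
        "WHERE l.assay = ? ORDER BY q.subject_id",
        (assay,),
    )
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["subject_id", "age", "ever_smoked", "value"])
    writer.writerows(rows)
    return buffer.getvalue()
```

Because questionnaire and laboratory tables share the same key, selecting a different set of variables for a statistician only requires changing the column list in the query.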
Backup and Security
Storage in a central location, on a secure server, ensures that the data are backed up daily and that access is available only to authorized users with individual passwords. The systems use Windows integrated security, which ensures that user IDs and passwords are encrypted and secure when they are sent over the network.
Physical Resources
The Biomarkers Laboratory is located in the P&S building, adjacent to Dr. Santella's research laboratory. Bloods are processed under sterile conditions in two laminar flow hoods in a tissue culture room (300 ft²). Two centrifuges are dedicated to blood processing. Additional equipment in the Biomarkers Laboratory includes a cytospin and a 96-well UV reader for DNA concentration determination. A Tecan 75 robot is available for aliquoting DNA or other reagents into 96 or 384 well plates. Lymphocyte transformation is carried out in a separate laminar flow hood to protect cell lines from contamination. The cell lines are also cultured in a separate incubator dedicated to this work. To facilitate cell line maintenance, a Coulter Z-series cell counter is available.
Two additional equipment rooms (600 ft² each) contain the freezers. Biological samples are stored in -80°C or -140°C mechanical freezers or liquid nitrogen freezers equipped with a Rees Scientific 1AI5 Series II telephone alarm system. The -140°C freezers have liquid nitrogen backup systems. Fixed microscope slides are stored at -20°C. Currently, there are five -20°C, fourteen -80°C, two -140°C and two liquid nitrogen freezers. Whenever possible, multiple aliquots from the same subject are stored in separate freezers to further safeguard samples.
In Dr. Santella's adjacent research laboratory (1200 ft²), there is additional equipment including a 96-well plate washer, color and fluorescence readers connected to computers for rapid data analysis, a Perkin-Elmer Victor spectrofluorometer, a Nikon fluorescence microscope with a video camera and an Applied Biosystems 7500 real time thermal cycler. On the floor in shared space is a microtome for paraffin sections.
The biostatistical and data management activities take place within the Department of Biostatistics, which occupies 14,000 square feet and is located on the 6th floor in the newly renovated MSPH building. The data management center's computing facility is located in the Statistical Analysis Center (SAC), within the Department of Biostatistics. It consists of seven rack-mounted servers that are housed in a secure computer room with climate control, emergency power and an uninterruptible power supply (UPS). Daily backups and integrated security are implemented through the SAC infrastructure, and the SAC web server provides secure access to the data from remote locations. All data access is password-protected, and all network communications, including the web site, use 128-bit encryption.
All servers and PCs that are part of the SAC infrastructure are protected by both host-based firewalls and software to prevent the inadvertent installation of "spyware". In addition, SAC is working with an outside consulting firm that regularly maintains all SAC servers and PCs. This assures the fastest possible turn-around time for the application of security updates and review of firewall logs.
The Core provides access to specialized software libraries for statistical analysis and data management that have been written by the CEHNM statisticians and data managers. The statistical software includes an unusually wide variety of statistical applications written in APL by Dr. Levin. These include software for exact discrete time survival analysis with arbitrarily structured time-dependent covariates and unlimited numbers of tied observations; multinomial and related distribution problems for exact analysis; logistic regression and conditional likelihood analysis for matched and finely stratified samples; and adaptable statistics and graphing tools. The APL language provides a user-efficient and powerful programming environment for developing computational solutions to analytic problems as they arise. The data management resources include the database systems from multiple projects, a set of custom designed project management software routines that were initially developed for the large Superfund project, and various modules programmed to enter data, perform queries, and display data. In all CEHNM database projects, Ms. Levy designs software resources to permit easy application to subsequent projects, creating major efficiencies.