Data Center General Purpose
The Fast Track Data Center, which serves the entire multi-site Fast Track Project, is located at the Center for Child and Family Policy at Duke University. The Data Center's responsibilities are described in the sections below.
The Data Center receives data collected by research teams, led by the Research Coordinators, at each of the four Fast Track sites. It maintains an inventory file that records every stage of processing, from receipt of the data through final storage. Data are collected on scan forms, on plain paper forms, and on laptop computers. Scan forms are read with an optical scanner to create ASCII data files; at this point, bubble errors such as missing bubbles are noted and corrected. Once each scan sheet has been checked and corrected, it is added to the rest of its group, and the ASCII data file is sent on to the next processing stage. Data collected on paper forms are entered (either at the sites or by the Data Center) into computerized data entry versions of the forms to create ASCII files. Manually entered data go through an additional verification step using a dual-entry system. The original scan and paper forms are shipped to the Data Center, and photocopies of these forms are kept at the research sites. Data collected on laptops, already in ASCII format, are sent to the Data Center by FTP (File Transfer Protocol). Laptop data are first given a visual check to identify incomplete files or other structural errors so that these problems can be corrected before processing. Copies of the laptop data are stored by the Research Coordinators. All data shipped to the Data Center are backed up: copies are placed on the hard drives of at least two Data Center computers, and another copy is burned to CD.
The next stage of processing is to create SAS datasets from the ASCII files (also known as 'raw data') using SAS programs written specifically for each measure. Many measures have been created, modified, or converted from scan or paper forms to computerized measures, especially since the eighth year of the study.
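The dual-entry verification step can be illustrated with a short sketch. It assumes the usual double-entry approach, in which each paper form is keyed in twice and the two passes are compared field by field. The Data Center's actual system and file layout are not described on this page, so the record structure (a mapping from a form ID to its field values), the form IDs, and the function name `compare_dual_entry` are all hypothetical. Python is used only for illustration; the Data Center's own processing is done in SAS.

```python
from typing import Dict, List, Tuple

# Hypothetical layout (not the Fast Track format):
# form ID -> {field name: value as keyed in}.
Entry = Dict[str, Dict[str, str]]


def compare_dual_entry(first: Entry, second: Entry) -> List[Tuple[str, str, str, str]]:
    """List every disagreement between two independent entry passes.

    Each item is (form_id, field, first_value, second_value). A form keyed
    in only one pass is reported once with field '<missing form>'.
    """
    discrepancies: List[Tuple[str, str, str, str]] = []
    for form_id in sorted(set(first) | set(second)):
        if form_id not in first or form_id not in second:
            discrepancies.append((
                form_id,
                "<missing form>",
                "present" if form_id in first else "absent",
                "present" if form_id in second else "absent",
            ))
            continue
        a, b = first[form_id], second[form_id]
        # A field missing from one pass is compared as an empty string.
        for field in sorted(set(a) | set(b)):
            if a.get(field, "") != b.get(field, ""):
                discrepancies.append((form_id, field, a.get(field, ""), b.get(field, "")))
    return discrepancies


if __name__ == "__main__":
    pass1 = {"F001": {"q1": "3", "q2": "5"}, "F002": {"q1": "1"}}
    pass2 = {"F001": {"q1": "3", "q2": "4"}, "F002": {"q1": "1"}}
    print(compare_dual_entry(pass1, pass2))  # [('F001', 'q2', '5', '4')]
```

In a typical double-entry workflow, each reported disagreement is resolved by checking the original paper form. Forms keyed in only one pass are flagged separately so they are not silently dropped.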
The Data Center keeps documentation of all changes made to any of the measures used in the project. During this stage of processing, the raw data are read into SAS, variables are formatted and labeled, records are checked to eliminate duplicates so that no subject has more than one record, outliers are resolved, and several further levels of error checking take place. The result of this processing is called the 'unscored dataset', and it is this dataset that goes on to the next level of processing. These datasets contain all of the variables read from the scan sheets, as well as the results of any other manipulations (for example, corrections made within the SAS program) needed for that particular instrument. Recently, the programs were updated to account for the conversion from paper-and-pencil forms and scan sheets to computerized measures.
Verifying Data
SAS datasets are checked again to verify that the data have completed all stages of processing and are ready to be posted to the server. This verification is done with an administrative checking program that compares the dataset in question against the master database containing the identification information for each intervention, control, and normative child. The administrative checking program identifies children with missing records or incorrect TCIDs (Target Child Identification Numbers). Missing records identified by the program are cross-checked against the exception report sent with each shipment of data. Exception reports list the records included in the data files, as well as the records missing from the data files and the reason each record is missing. Any discrepancies between the results of the administrative check and the exception report's verified list of missing records are resolved through communication with the sites.
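The duplicate-record check and the administrative check can both be sketched as set operations on child IDs. This is a minimal Python illustration, not the Data Center's SAS program. It assumes that the dataset has been reduced to a list of TCIDs (one per record), that `master_ids` holds the TCIDs of the children expected for the measure, and that the exception report has been reduced to the set of TCIDs it lists as missing. The function name `administrative_check` and the keys it returns are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def administrative_check(dataset_ids: Iterable[str],
                         master_ids: Set[str],
                         exception_missing: Set[str]) -> Dict[str, Set[str]]:
    """Compare a dataset's TCIDs against the master roster and exception report.

    Hypothetical helper: inputs are plain ID strings, not actual
    Fast Track file formats.
    """
    counts = Counter(dataset_ids)          # records per child in the dataset
    present = set(counts)                  # distinct TCIDs found in the data
    missing = master_ids - present         # expected children with no record
    return {
        # more than one record for the same child
        "duplicates": {tcid for tcid, n in counts.items() if n > 1},
        # IDs not in the master database (possible incorrect TCIDs)
        "unknown_ids": present - master_ids,
        # every expected child with no record
        "missing": missing,
        # missing records that the exception report does not account for
        "unexplained_missing": missing - exception_missing,
        # children the exception report lists as missing but that are present
        "listed_but_present": exception_missing & present,
    }
```

Entries in `unexplained_missing` and `listed_but_present` are the two directions in which the administrative check and the exception report can disagree; under the process described above, these are the cases resolved with the sites. `unknown_ids` flags candidate incorrect TCIDs.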
At this point, the datasets are considered 'clean', and these 'unscored datasets' are placed on the FTP server for the data analysts to access. It is at this stage of the process that the 'scored datasets' are created.
Data Analysis
SAS datasets are created for each measure for each year, site, and cohort in which the measure was administered. In 2000, the Data Center began creating aggregate datasets (combining across sites and cohorts) to make it easier for data analysts to download datasets. Errors found in the aggregate or scored datasets are corrected at the unscored level, and all datasets are then corrected and replaced. Analysts at the Data Center and at each site continue to prepare technical reports for each measure and each year in which the measure is administered, and to develop scoring procedures and scoring programs. The technical reports, scoring procedures, scoring programs, and scored datasets are archived and distributed through the Data Center.
Documentation and Data Archive
Documentation of the Fast Track Project is extensive, encompassing a variety of domains:
Fast Track Project Website
The Data Center created and maintains the Fast Track Project website. The website provides an overview of the project, a list of the Principal Investigators and their Curriculum Vitae, contact information, and a Frequently Asked Questions (FAQ) section. It also has a measures section and a link to the Data Center portion of the website. The measures section lists all measures used in the project, with detailed descriptions. Measures developed by Fast Track are reproduced on the website; author contact information is provided for other measures. The Data Center portion of the website provides information about the general purpose of the Data Center; background information on the data collection, including an overview of its organization, content, and sampling procedures; a listing of measures and the years and cohorts in which they were administered; links to technical reports and scoring programs; and a search engine that lets users search the entire website for information or data on a specific topic. The Data Center seeks to make the website as useful as possible to project researchers and to the broader scientific community. Through the website, the Data Center can respond more promptly to requests for information about the Fast Track Project from researchers throughout the United States and around the world. The Data Center website also provides the policy for releasing data to researchers who are not affiliated with the Fast Track Project, procedures for requesting data, and downloadable data use agreement and application forms.