Online Course
NRSG 795: BIOSTATISTICS FOR EVIDENCE-BASED PRACTICE
Module 1: Variables, Values, and Spreadsheets as Databases
Variables, Values, and Codebooks
Variables in healthcare data often reflect patient demographic characteristics (age, gender, marital status, education level, number of children, income), clinical indicators (diagnoses, allergies, lab values), and scores on specific measures (Beck Depression Inventory, satisfaction scores). These pieces of information are assigned numbers using different rules. For example, gender may be represented with numeric codes (male = 1 and female = 2), while income can be measured as dollars earned per year. Information about the variables in a dataset is described in a codebook or a data dictionary.
A codebook provides information on the structure, contents, and layout of a data file. Users are strongly encouraged to look at the codebook of a study before downloading and using the data files (examples of codebooks of national databases). While codebooks vary widely in quality and in the amount of information given, a typical codebook includes:
- Column locations and widths for each variable
- Definitions of different record types
- Response codes for each variable
- Codes used to indicate nonresponse and missing data
- Exact questions and skip patterns used in a survey
- Other indications of the content and characteristics of each variable
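To illustrate how column locations, widths, and missing-data codes from a codebook are used in practice, here is a minimal Python sketch. The layout, the variable names, and the missing-data codes (999 and 9) are hypothetical and are not taken from any real codebook.

```python
# Hypothetical codebook layout: variable name -> (start column, width).
# Start columns are 1-based, as codebooks usually list them.
LAYOUT = {
    "id": (1, 4),
    "age": (5, 3),
    "gender": (8, 1),
}

# Hypothetical codes that this imaginary codebook reserves for missing data.
MISSING_CODES = {"age": "999", "gender": "9"}


def parse_record(line):
    """Slice one fixed-width record into named fields using LAYOUT."""
    record = {}
    for name, (start, width) in LAYOUT.items():
        raw = line[start - 1 : start - 1 + width].strip()
        # Replace the codebook's missing-data code with None.
        record[name] = None if raw == MISSING_CODES.get(name) else raw
    return record


lines = ["0001 452", "00029991"]
records = [parse_record(line) for line in lines]
print(records)
```

Without the codebook, a record such as 00029991 is just a string of digits. With it, the same record reads as subject 0002, age missing, gender code 1.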
A data dictionary differs from a codebook in that it focuses more on providing a detailed description of each element or variable in the dataset. Data dictionaries are used to document important and useful information such as a descriptive name, the data type, allowed values, units, and a text description. A data dictionary provides a concise guide to understanding and using the data. It generally includes the following elements for each variable in the data file:
- Variable Name: Indicates the variable number or name assigned to each variable in the data collection.
- Variable Label: Indicates an abbreviated variable description to identify the variable for the user. In some cases, an expanded version of the variable name can be found in a variable description list.
- Code Value: Indicates the code values occurring in the data for this variable.
- Value Label: Indicates the textual definitions of the codes. Abbreviations commonly used in the code definitions are "DK" (Do Not Know), "NA" (Not Ascertained), and "INAP" (Inapplicable).
Ex: Variable name = education, variable label = highest educational level achieved
- 0 (code value) = less than HS (value label)
- 1 = HS graduate/GED
- 2 = some college
- 3 = Associate's degree
- 4 = Bachelor's degree
- 5 = Graduate degree
- 6 = missing data
- Missing Data Code: Indicates the values and labels of missing data.
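The education example can be written as a small data dictionary entry in Python. This is only a sketch: the dictionary keys and the decode function are hypothetical names, and code 0 for "less than HS" is an assumption based on the sequence of the other codes.

```python
# Data dictionary entry for the education example.
# Assumption: "less than HS" is coded 0, since the remaining codes run 1 to 6.
EDUCATION = {
    "variable_name": "education",
    "variable_label": "highest educational level achieved",
    "value_labels": {
        0: "less than HS",
        1: "HS graduate/GED",
        2: "some college",
        3: "Associate's degree",
        4: "Bachelor's degree",
        5: "Graduate degree",
    },
    "missing_codes": {6},  # 6 = missing data
}


def decode(entry, code):
    """Return the value label for a code, or None for a missing-data code."""
    if code in entry["missing_codes"]:
        return None
    # A code that is not in the dictionary raises KeyError, which flags
    # a data-entry error rather than hiding it.
    return entry["value_labels"][code]


raw_codes = [2, 4, 6, 1]
labels = [decode(EDUCATION, code) for code in raw_codes]
print(labels)
```

Keeping the missing-data code separate from the value labels prevents a 6 from being treated as a seventh education level when the variable is summarized.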
Another term you might come across when a dataset is described is whether the data are arranged as a ‘wide’ or a ‘long’ file. This refers to the layout of the data. In a ‘wide’ dataset the variables are in columns and each row holds the data for one subject (each row should contain a different subject, often identified by a unique id number). This is the most common form of displaying data, and the files we will use in class are in this format. In a ‘long’ dataset each subject can take up several rows, one for each measurement, so the same id number repeats down the rows and a separate column records which variable or time point the row refers to. Data often need to be in this format for some advanced statistics, particularly in longitudinal studies or studies that involve repeated collection of a measure over time.
Note: when creating figures and performing some types of analysis in Excel, you may need to change the format of the data. This involves putting things that were in rows into columns or vice versa.
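As a rough illustration of reshaping repeated measurements, the Python sketch below turns a table with one row per subject into one with one row per subject per measurement. The blood pressure columns and their values are made up for illustration, and wide_to_long is a hypothetical helper, not a function from any statistics package.

```python
# Hypothetical wide data: one row per subject, one column per visit.
wide = [
    {"id": 1, "sbp_visit1": 130, "sbp_visit2": 124},
    {"id": 2, "sbp_visit1": 142, "sbp_visit2": 138},
]


def wide_to_long(rows, id_key, value_keys):
    """Return one row per subject per measurement column."""
    long_rows = []
    for row in rows:
        for key in value_keys:
            long_rows.append(
                {id_key: row[id_key], "measure": key, "value": row[key]}
            )
    return long_rows


long_data = wide_to_long(wide, "id", ["sbp_visit1", "sbp_visit2"])
for row in long_data:
    print(row)
```

Each subject now appears on two rows, and the "measure" column records which visit each value came from.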
Learning Activity
This website is maintained by the University of Maryland School of Nursing (UMSON) Office of Learning Technologies. The UMSON logo and all other contents of this website are the sole property of UMSON and may not be used for any purpose without prior written consent. Links to other websites do not constitute or imply an endorsement of those sites, their content, or their products and services. Please send comments, corrections, and link improvements to nrsonline@umaryland.edu.