Introducing nhanesA

Christopher J. Endres

2021-01-30

Background

nhanesA was developed to enable fully customizable retrieval of data from the National Health and Nutrition Examination Survey (NHANES). The survey is conducted by the National Center for Health Statistics (NCHS), and data are publicly available at: https://www.cdc.gov/nchs/nhanes.htm. NHANES data are reported in well over one thousand peer-reviewed journal publications every year.

NHANES Data

Since 1999, the NHANES survey has been conducted continuously, and the surveys during that period are referred to as “continuous NHANES” to distinguish them from several prior surveys. Continuous NHANES surveys are grouped in two-year intervals, with the first interval being 1999-2000.

Most NHANES data are in the form of tables in SAS ‘XPT’ format. The survey is grouped into five data categories that are publicly available, as well as an additional category (Limited access data) that requires written justification and prior approval before access. Package nhanesA is intended mostly for use with the publicly available data, but some information pertaining to the limited access data can also be retrieved.

The five publicly available data categories are:

- Demographics (DEMO)
- Dietary (DIET)
- Examination (EXAM)
- Laboratory (LAB)
- Questionnaire (Q)

The abbreviated forms in parentheses may be substituted for the long form in nhanesA commands.

For limited access data, the available tables and variable names can be listed, but the data cannot be downloaded directly. To indicate limited access data in nhanesA functions, use:

- Limited (LTD)

List NHANES Tables

To quickly get familiar with NHANES data, it is helpful to display a listing of tables. Use nhanesTables to get information on tables that are available for a given category for a given year.

library(nhanesA)
nhanesTables('EXAM', 2005)
##    Data.File.Name                             Data.File.Description
## 1           BPX_D                                    Blood Pressure
## 2           BMX_D                                     Body Measures
## 3           AUX_D                                        Audiometry
## 4        AUXTYM_D                         Audiometry - Tympanometry
## 5        DXXFEM_D          Dual Energy X-ray Absorptiometry - Femur
## 6        OPXFDT_D     Ophthalmology - Frequency Doubling Technology
## 7           OHX_D                                       Oral Health
## 8        PAXRAW_D                         Physical Activity Monitor
## 9           VIX_D                                            Vision
## 10        DXXAG_D Dual Energy X-ray Absorptiometry - Android/Gynoid
## 11        AUXAR_D                      Audiometry - Acoustic Reflex
## 12       OPXRET_D                   Ophthalmology - Retinal Imaging
## 13       DXXSPN_D          Dual Energy X-ray Absorptiometry - Spine

Note that the two-year survey intervals begin with the odd year. For convenience, only a single 4-digit year is entered such that nhanesTables('EXAM', 2005) and nhanesTables('EXAM', 2006) yield identical output.
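
The mapping from a single year to its two-year interval can be sketched in base R. The helper below is hypothetical and is not part of nhanesA. It also assumes the letter-suffix naming seen in the table names in this document (no suffix for 1999-2000, then _B for 2001-2002, up to _J for 2017-2018), so it deliberately rejects years outside 1999-2018.

```r
# Hypothetical helper, not an nhanesA function: map a year to its survey interval.
cycle_for_year <- function(year) {
  stopifnot(year >= 1999, year <= 2018)
  start  <- if (year %% 2 == 1) year else year - 1   # intervals begin on the odd year
  index  <- (start - 1999) / 2 + 1                   # 1 = 1999-2000, 2 = 2001-2002, ...
  suffix <- if (index == 1) "" else paste0("_", LETTERS[index])
  list(cycle = paste(start, start + 1, sep = "-"), suffix = suffix)
}

# Both years of an interval resolve to the same cycle and suffix.
same_cycle <- identical(cycle_for_year(2005), cycle_for_year(2006))
```

Under these assumptions, either year of an interval gives the same result, which mirrors how nhanesTables treats 2005 and 2006.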

List Variables in an NHANES Table

After viewing the output, we decide we are interested in table ‘BMX_D’ that contains body measures data. To better determine if that table is of interest, we can display detailed information on the table contents using nhanesTableVars.

nhanesTableVars('EXAM', 'BMX_D')
##    Variable.Name                Variable.Description
## 1       BMDSTATS Body Measures Component status Code
## 2        BMIARMC           Arm Circumference Comment
## 3        BMIARML            Upper Arm Length Comment
## 4        BMICALF                Maximal Calf Comment
## 5        BMIHEAD          Head Circumference Comment
## 6          BMIHT             Standing Height Comment
## 7         BMILEG            Upper Leg Length Comment
## 8       BMIRECUM            Recumbent Length Comment
## 9         BMISUB        Subscapular Skinfold Comment
## 10      BMITHICR         Thigh Circumference Comment
## 11        BMITRI            Triceps Skinfold Comment
## 12      BMIWAIST         Waist Circumference Comment
## 13         BMIWT                      Weight Comment
## 14       BMXARMC              Arm Circumference (cm)
## 15       BMXARML               Upper Arm Length (cm)
## 16        BMXBMI           Body Mass Index (kg/m**2)
## 17       BMXCALF     Maximal Calf Circumference (cm)
## 18       BMXHEAD             Head Circumference (cm)
## 19         BMXHT                Standing Height (cm)
## 20        BMXLEG               Upper Leg Length (cm)
## 21      BMXRECUM               Recumbent Length (cm)
## 22        BMXSUB           Subscapular Skinfold (mm)
## 23      BMXTHICR            Thigh Circumference (cm)
## 24        BMXTRI               Triceps Skinfold (mm)
## 25      BMXWAIST            Waist Circumference (cm)
## 26         BMXWT                         Weight (kg)
## 27          SEQN         Respondent sequence number.

We see that there are 27 columns in table BMX_D. The column SEQN is the respondent sequence number and is included in every NHANES table. Effectively, SEQN is a subject identifier that is used to join information across tables.

Import NHANES Tables

We now import BMX_D along with the demographics table DEMO_D.

bmx_d  <- nhanes('BMX_D')
## Processing SAS dataset BMX_D      ..
demo_d <- nhanes('DEMO_D')
## Processing SAS dataset DEMO_D     ..

We merge the tables and display several variables:

bmx_demo <- merge(demo_d, bmx_d)
options(digits=4)
select_cols <- c('RIAGENDR', 'BMXHT', 'BMXWT', 'BMXLEG', 'BMXCALF', 'BMXTHICR')
print(bmx_demo[5:8,select_cols], row.names=FALSE)
##  RIAGENDR BMXHT BMXWT BMXLEG BMXCALF BMXTHICR
##         2 156.0  75.2   38.0    36.6     53.7
##         1 167.6  69.5   40.4    35.6     48.0
##         2 163.7  45.0   39.2    31.7     41.3
##         1 182.4 101.9   41.5    42.6     50.5
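
By default, merge joins on the columns that the two data frames have in common, so it can help to name the key explicitly. The sketch below uses small made-up data frames (the SEQN values and measurements are illustrative, not NHANES data) to show that the default is an inner join, and that all.x = TRUE keeps respondents who are absent from the second table.

```r
# Made-up stand-ins for a demographics table and a body measures table.
demo_toy <- data.frame(SEQN = c(1, 2, 3), RIAGENDR = c(1, 2, 2))
bmx_toy  <- data.frame(SEQN = c(2, 3, 4), BMXWT = c(75.2, 69.5, 45.0))

# Inner join on the respondent sequence number: only SEQN 2 and 3 appear in both.
inner <- merge(demo_toy, bmx_toy, by = "SEQN")

# Left join: every row of demo_toy is kept; BMXWT is NA where no match exists.
left <- merge(demo_toy, bmx_toy, by = "SEQN", all.x = TRUE)
```

Comparing nrow(inner) with nrow(left) is a quick way to see how many respondents lack a matching record.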

Translation of Coded Values

NHANES uses coded values for many fields. In the preceding example, gender is coded as 1 or 2. To determine what the values mean, we can list the code translations for the gender field RIAGENDR in table DEMO_D:

nhanesTranslate('DEMO_D', 'RIAGENDR')
## $RIAGENDR
##   Code.or.Value Value.Description
## 1             1              Male
## 2             2            Female
## 3             .           Missing

If desired, we can use nhanesTranslate to apply the code translation to demo_d directly by assigning data=demo_d.

demo_d <- nhanesTranslate('DEMO_D', 'RIAGENDR', data=demo_d)
## Translated columns: RIAGENDR
bmx_demo <- merge(demo_d, bmx_d)

The RIAGENDR field is now recoded as Male or Female instead of 1 or 2.

print(bmx_demo[5:8,select_cols], row.names=FALSE)
##  RIAGENDR BMXHT BMXWT BMXLEG BMXCALF BMXTHICR
##    Female 156.0  75.2   38.0    36.6     53.7
##      Male 167.6  69.5   40.4    35.6     48.0
##    Female 163.7  45.0   39.2    31.7     41.3
##      Male 182.4 101.9   41.5    42.6     50.5
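
Conceptually, a code translation is a lookup from code to description. The base R sketch below illustrates that idea only; it is not the package's implementation. The lookup table reuses the codes listed for RIAGENDR above, while the vector of codes is made up.

```r
# Lookup table with the RIAGENDR codes listed above.
lookup <- data.frame(Code.or.Value     = c(1, 2),
                     Value.Description = c("Male", "Female"),
                     stringsAsFactors  = FALSE)

# Made-up coded column, including a missing value.
codes <- c(2, 1, 2, 1, NA)

# match() finds the row of each code in the lookup; unmatched or missing codes stay NA.
translated <- lookup$Value.Description[match(codes, lookup$Code.or.Value)]
```

Missing codes remain NA after the lookup rather than being mapped to a label.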

Apply All Possible Code Translations to a Table

An NHANES table may have dozens of columns with coded values. Translating all possible columns is a three-step process:

1. Download the table
2. Save a list of table variables
3. Pass the table and variable list to nhanesTranslate

bpx_d <- nhanes('BPX_D')
## Processing SAS dataset BPX_D      ..
head(bpx_d[,6:11])
##   BPQ150A BPQ150B BPQ150C BPQ150D BPAARM BPACSZ
## 1      NA      NA      NA      NA     NA     NA
## 2       2       2       2       2      1      3
## 3       1       2       2       2      1      4
## 4       2       2       2       2      1      3
## 5       2       2       2       2      1      4
## 6       2       2       2       2      1      4
bpx_d_vars  <- nhanesTableVars('EXAM', 'BPX_D', namesonly=TRUE)
#Alternatively may use bpx_d_vars = names(bpx_d)
bpx_d <- suppressWarnings(nhanesTranslate('BPX_D', bpx_d_vars, data=bpx_d))
## Translated columns: BPAARM BPACSZ BPAEN2 BPAEN3 BPAEN4 BPQ150A BPQ150B BPQ150C BPQ150D BPXPTY BPXPULS PEASCCT1 PEASCST1
head(bpx_d[,6:11])
##   BPQ150A BPQ150B BPQ150C BPQ150D BPAARM        BPACSZ
## 1    <NA>    <NA>    <NA>    <NA>   <NA>          <NA>
## 2      No      No      No      No  Right Adult (12X22)
## 3     Yes      No      No      No  Right Large (15X32)
## 4      No      No      No      No  Right Adult (12X22)
## 5      No      No      No      No  Right Large (15X32)
## 6      No      No      No      No  Right Large (15X32)

Some discretion should be applied when translating coded columns, as code translations can be quite long. To improve readability, the translation string is restricted to a default length of 32 but can be set as high as 128. By default, only columns that have at least two categories (e.g. Male, Female) are translated, but mincategories can be set to 1 to perform the translation even if only a single category is present.
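
The effect of a length limit on translation strings can be illustrated with base R. This is a sketch of the idea only, using made-up descriptions and a hypothetical max_len variable; it does not call nhanesTranslate.

```r
# Made-up value descriptions; the second is deliberately long.
descriptions <- c("Adult (12X22)",
                  "A hypothetical value description that runs well past the default limit")

# Truncate each string to a maximum width, analogous to the default length of 32.
max_len   <- 32
shortened <- substr(descriptions, 1, max_len)
lengths   <- nchar(shortened)
```

Short strings pass through unchanged, and long strings are cut at the limit.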

Downloading a Complete Survey

The primary goal of nhanesA is to enable fully customizable processing of select NHANES tables. However, it is quite easy to download entire surveys using nhanesA functions. Say we want to download every questionnaire in the 2007-2008 survey. We first get a list of the table names by using nhanesTables with namesonly = TRUE. The tables can then be downloaded using nhanes with lapply.

q2007names  <- nhanesTables('Q', 2007, namesonly=TRUE)
q2007tables <- lapply(q2007names, nhanes)
names(q2007tables) <- q2007names
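
When many tables are downloaded in one call, it can be useful to guard against individual failures. One possible pattern, sketched below, wraps each download in tryCatch and drops the tables that failed. To keep the sketch runnable offline, fake_fetch is a hypothetical stand-in for nhanes, and BAD_E is a made-up table name that simulates a failed download.

```r
# Hypothetical stand-in for nhanes(): fails for one made-up table name.
fake_fetch <- function(name) {
  if (name == "BAD_E") stop("download failed")
  data.frame(SEQN = 1:2, source_table = name)
}

table_names <- c("ACQ_E", "BAD_E", "ALQ_E")

# Return NULL instead of stopping when a single download raises an error.
tables <- lapply(table_names, function(nm) {
  tryCatch(fake_fetch(nm), error = function(e) NULL)
})
names(tables) <- table_names

# Keep only the tables that were retrieved successfully.
tables <- Filter(Negate(is.null), tables)
```

In real use, nhanes would take the place of fake_fetch, and individual tables can then be accessed by name, for example tables[["ACQ_E"]].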

Import Dual X-Ray Absorptiometry Data

Dual X-Ray Absorptiometry (DXA) data were acquired from 1999-2006. More information may be found at https://wwwn.cdc.gov/nchs/nhanes/dxa/dxa.aspx. By default, the DXA data are imported into the R environment. However, because the tables are quite large, it may be desirable to save the data to a local file and then import them into R as needed. When nhanesTranslate is applied to DXA data, only the 2005-2006 translation tables are used, as those are the only DXA codes that are currently available in HTML format.

#Import into R
dxx_b <- nhanesDXA(2001)
#Save to file
nhanesDXA(2001, destfile="dxx_b.xpt")
#Import supplemental data
dxx_c_s <- nhanesDXA(2003, suppl=TRUE)
#Apply code translations
dxalist <- c('DXAEXSTS', 'DXITOT', 'DXIHE')
dxx_b <- nhanesTranslate(colnames=dxalist, data=dxx_b, dxa=TRUE)

If you are interested in working with accelerometer data from 2003-2006, please see the packages nhanesaccel (https://r-forge.r-project.org/R/?group_id=1733) and accelerometry (https://cran.r-project.org/package=accelerometry).

Searching across the comprehensive list of NHANES variables

The NHANES repository is extensive, so it is helpful to perform a targeted search to identify relevant tables and variables. Comprehensive lists of NHANES variables are maintained for each data group. For example, the demographics variables are available at https://wwwn.cdc.gov/nchs/nhanes/search/variablelist.aspx?Component=Demographics. The nhanesSearch function allows the investigator to input search terms, match them against the comprehensive variable descriptions, and retrieve the list of matching variables. Matching search terms (variable descriptions must contain at least one of the terms) and exclusive search terms (variable descriptions must NOT contain any exclusive terms) may be provided. The search can be restricted to a specific survey range as well as to specific data groups.

# nhanesSearch use examples
#
# Search on the word bladder, restrict to the 2001-2008 surveys, 
# print out 50 characters of the variable description
nhanesSearch("bladder", ystart=2001, ystop=2008, nchar=50)
#
# Search on "urin" (will match urine, urinary, etc), from 1999-2010, return table names only
nhanesSearch("urin", ignore.case=TRUE, ystop=2010, namesonly=TRUE)
#
# Search on "urin", exclude "During", search surveys from 1999-2010, return table names only
nhanesSearch("urin", exclude_terms="during", ignore.case=TRUE, ystop=2010, namesonly=TRUE)
#
# Restrict search to 'EXAM' and 'LAB' data groups. Explicitly list matching and exclude terms, leave ignore.case set to default value of FALSE. Search surveys from 2009 to present.
nhanesSearch(c("urin", "Urin"), exclude_terms=c("During", "eaten during", "do during"), data_group=c('EXAM', 'LAB'), ystart=2009)
#
# Search on "tooth" or "teeth", all years
nhanesSearch(c("tooth", "teeth"), ignore.case=TRUE)
#
# Search for variables where the variable description begins with "Tooth"
nhanesSearch("^Tooth")
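
The matching rule described above (keep a description if it contains at least one search term and none of the exclusive terms) can be sketched with grepl. This illustrates the rule only and is not the package's implementation; the descriptions are made up, and matches_any is a hypothetical helper.

```r
# Made-up variable descriptions.
descriptions <- c("Urine collection status",
                  "Hours worked during past week",
                  "Bladder control problems",
                  "urinary creatinine")

# Hypothetical helper: TRUE where a description matches at least one term.
matches_any <- function(terms, x, ignore.case = FALSE) {
  Reduce(`|`, lapply(terms, grepl, x = x, ignore.case = ignore.case))
}

include <- matches_any("urin", descriptions, ignore.case = TRUE)
exclude <- matches_any("during", descriptions, ignore.case = TRUE)

# "during" contains "urin", which is why it is listed as an exclusive term.
hits <- descriptions[include & !exclude]
```

Here the description containing "during" matches the search term but is removed by the exclusive term, leaving only the two urine-related descriptions.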

Searching for tables that contain a specific variable

nhanesSearch is a versatile search function because it imports the comprehensive variable lists into a data frame, which allows for detailed conditional extraction of the variables. However, each call to nhanesSearch can take a minute or more to process. Faster processing can be achieved when we know the name of a specific variable of interest and look only for exact matches to the variable name. The function nhanesSearchVarName matches a given variable name in the HTML directly, and only the matching elements are then converted to a data frame. Consequently, a call to nhanesSearchVarName executes much faster than nhanesSearch, typically in under 30 seconds. nhanesSearchVarName is useful for finding all data tables that contain a given variable.

#nhanesSearchVarName use examples

nhanesSearchVarName('BPXPULS')
##  [1] "BPX_D" "BPX_E" "BPX"   "BPX_C" "BPX_B" "BPX_F" "BPX_G" "BPX_H" "BPX_I"
## [10] "BPX_J"
nhanesSearchVarName('CSQ260i', includerdc=TRUE, nchar=38, namesonly=FALSE)
##   Variable.Name                   Variable.Description Data.File.Name
## 1       CSQ260i Do you now have any of the following p        CSX_G_R
## 2       CSQ260i Do you now have any of the following p          CSX_H
##   Data.File.Description Begin.Year EndYear   Component Use.Constraints
## 1         Taste & Smell       2012    2012 Examination        RDC Only
## 2         Taste & Smell       2013    2014 Examination            None

Searching for tables by name pattern

In order to group data across surveys, it is useful to list all available tables that follow a given naming pattern. Function nhanesSearchTableNames is used for such pattern matching. For example, if we want to work with all available body measures data we can retrieve the full list of available tables with nhanesSearchTableNames('BMX'). The search is conducted over the comprehensive table list, which is much smaller than the comprehensive variable list, such that a call to nhanesSearchTableNames takes only a few seconds.

# nhanesSearchTableNames use examples
nhanesSearchTableNames('BMX')
##  [1] "BMX_D" "BMX"   "BMX_E" "BMX_C" "BMX_B" "BMX_F" "BMX_H" "BMX_G" "BMX_I"
## [10] "BMX_J"
nhanesSearchTableNames('HPVS', includerdc=TRUE, nchar=42, details=TRUE)
##        Years                             Data.File.Name     Doc.File
## 1  2009-2010 Human Papillomavirus (HPV) - 6, 11, 16 & 1 HPVSER_F Doc
## 2  2005-2006 Human Papillomavirus (HPV) - 6, 11, 16 & 1 HPVS_D_R Doc
## 3  2007-2008 Human Papillomavirus (HPV) - 6, 11, 16 & 1 HPVSER_E Doc
## 4  2005-2006 Human Papillomavirus (HPV) - 6, 11, 16 & 1 HPVSER_D Doc
## 5  2005-2006 Human Papillomavirus (HPV) - Multiplexed 6 HPVSRM_D Doc
## 6  2005-2006 Human Papillomavirus (HPV) DNA - Vaginal S HPVSWR_D Doc
## 7  2007-2008 Human Papillomavirus (HPV) DNA - Vaginal S HPVSWR_E Doc
## 8  2009-2010 Human Papillomavirus (HPV) DNA - Vaginal S HPVSWR_F Doc
## 9  2011-2012 Human Papillomavirus (HPV) DNA - Vaginal S HPVSWR_G Doc
## 10 2005-2006 Human Papillomavirus (HPV) DNA - Vaginal S HPVSWR_D Doc
## 11 2009-2010 Human Papillomavirus (HPV) DNA - Vaginal S HPVS_F_R Doc
## 12 2011-2012 Human Papillomavirus (HPV) DNA - Vaginal S HPVS_G_R Doc
## 13 2013-2014 Human Papillomavirus (HPV) DNA - Vaginal S HPVSWR_H Doc
## 14 2013-2014 Human Papillomavirus (HPV) DNA - Vaginal S HPVS_H_R Doc
## 15 2015-2016 Human Papillomavirus (HPV) DNA - Vaginal S HPVSWC_I Doc
## 16 2015-2016 Human Papillomavirus (HPV) DNA - Vaginal S HPVSWR_I Doc
## 17 2015-2016 Human Papillomavirus (HPV) DNA - Vaginal S HPVS_I_R Doc
## 18 2017-2018 Human Papillomavirus (HPV) DNA - Vaginal S HPVS_J_R Doc
##                         Data.File        Date.Published
## 1  HPVSER_F Data [XPT - 171.6 KB]         November 2013
## 2                        RDC Only             July 2013
## 3  HPVSER_E Data [XPT - 155.7 KB]         November 2013
## 4  HPVSER_D Data [XPT - 151.6 KB]             July 2013
## 5  HPVSRM_D Data [XPT - 302.6 KB]          January 2015
## 6  HPVSWR_D Data [XPT - 694.4 KB]         November 2010
## 7  HPVSWR_E Data [XPT - 677.9 KB]           August 2012
## 8  HPVSWR_F Data [XPT - 725.2 KB]           August 2012
## 9  HPVSWR_G Data [XPT - 661.1 KB]            March 2015
## 10 HPVSWR_D Data [XPT - 694.4 KB] Updated November 2018
## 11                       RDC Only           August 2012
## 12                       RDC Only            March 2015
## 13 HPVSWR_H Data [XPT - 716.6 KB]         December 2016
## 14                       RDC Only         December 2016
## 15  HPVSWC_I Data [XPT - 33.3 KB]         November 2018
## 16 HPVSWR_I Data [XPT - 667.5 KB]         November 2018
## 17                       RDC Only         November 2018
## 18                       RDC Only         December 2020
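
Once matching tables from several surveys have been imported, one way to group them is to stack the columns they share. The sketch below shows this with two made-up data frames standing in for imported tables; the values are illustrative, and keeping only the common columns is an assumption of this sketch rather than something nhanesA does for you.

```r
# Made-up stand-ins for body measures tables from two surveys.
bmx_c_toy <- data.frame(SEQN = 1:2, BMXWT = c(70.1, 82.3), BMXHT = c(170.2, 180.4))
bmx_d_toy <- data.frame(SEQN = 3:4, BMXWT = c(65.0, 90.5), BMXBMI = c(24.1, 29.8))
toy_tables <- list(BMX_C = bmx_c_toy, BMX_D = bmx_d_toy)

# Columns present in every table.
common <- Reduce(intersect, lapply(toy_tables, names))

# Stack the common columns and record which table each row came from.
stacked <- do.call(rbind, lapply(names(toy_tables), function(nm) {
  cbind(source_table = nm, toy_tables[[nm]][, common, drop = FALSE])
}))
```

Variables can be renamed or redefined between surveys, so it is worth checking the documentation for each table before combining them.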

Please send any feedback or requests to the package maintainer. Hope you enjoy your experience with nhanesA!

Sincerely,
Christopher J. Endres