Simple cepumd Workflow
Source:vignettes/articles/simple-cepumd-workflow.Rmd
simple-cepumd-workflow.Rmd
Introduction
The following is an example of using cepumd
to calculate
a 2021 annual, weighted estimate of mean expenditures on pets for all of
the U.S. using CE integrated data.
I’ll need the following data and metadata sources:
- Interview survey data zip files for 2020 and 2021
- Diary survey data zip files for 2021
- Hierarchical grouping zip files
Data gathering
Data files can be downloaded from the CE PUMD Data Files page (it’s easiest to use CSV) and hierarchical grouping files can be downloaded from the CE PUMD Documentation page
To get the files I’ll first create a temporary directory and download
all of the files that I need from the BLS website into this directory.
You might choose to store your files differently, but this convention
will keep the files organized, it will keep the code simple, and
everything will be in a folder that will be easy to clean up after.
Since the BLS blocks third party applications I’ll add a user-agent to
identify myself in the download function that is stored in a variable
called cepumd_ua
(not shown).
ce_data_dir <- tempdir()
download.file(
"https://www.bls.gov/cex/pumd/data/comma/intrvw21.zip",
fs::path(ce_data_dir, "intrvw21.zip"),
mode = "wb",
headers = list(
"User-Agent" = cepumd_ua
)
)
download.file(
"https://www.bls.gov/cex/pumd/data/comma/intrvw20.zip",
fs::path(ce_data_dir, "intrvw20.zip"),
mode = "wb",
headers = list(
"User-Agent" = cepumd_ua
)
)
download.file(
"https://www.bls.gov/cex/pumd/data/comma/diary21.zip",
fs::path(ce_data_dir, "diary21.zip"),
mode = "wb",
headers = list(
"User-Agent" = cepumd_ua
)
)
download.file(
"https://www.bls.gov/cex/pumd/stubs.zip",
fs::path(ce_data_dir, "stubs.zip"),
mode = "wb",
headers = list(
"User-Agent" = cepumd_ua
)
)
Calculate CE weighted mean estimate
The first step in the workflow is to load the 2021 hierarchical
grouping file for integrated expenditures using ce_hg()
.
This file is used to get the correct UCCs associated with pet
expenditures.
Next, one can use ce_prep_data()
to correctly merge all
of the Interview and Diary survey files for calculating annual
estimates.
pet_data <- ce_prepdata(
2021,
integrated,
integ21_hg,
uccs = ce_uccs(integ21_hg, expenditure = "Pets", ucc_group = "PETS"),
dia_zp = file.path(ce_data_dir, "diary21.zip"),
int_zp = c(
file.path(ce_data_dir, "intrvw20.zip"),
file.path(ce_data_dir, "intrvw21.zip")
)
)
gt(head(pet_data))
newid | finlwt21 | wtrep01 | wtrep02 | wtrep03 | wtrep04 | wtrep05 | wtrep06 | wtrep07 | wtrep08 | wtrep09 | wtrep10 | wtrep11 | wtrep12 | wtrep13 | wtrep14 | wtrep15 | wtrep16 | wtrep17 | wtrep18 | wtrep19 | wtrep20 | wtrep21 | wtrep22 | wtrep23 | wtrep24 | wtrep25 | wtrep26 | wtrep27 | wtrep28 | wtrep29 | wtrep30 | wtrep31 | wtrep32 | wtrep33 | wtrep34 | wtrep35 | wtrep36 | wtrep37 | wtrep38 | wtrep39 | wtrep40 | wtrep41 | wtrep42 | wtrep43 | wtrep44 | mo_scope | popwt | ucc | ref_yr | ref_mo | cost | survey |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
04378034 | 15009.70 | 26582.38 | 30220.87 | 28559.33 | 0.00 | 28263.97 | 0.00 | 26801.74 | 30392.73 | 0.00 | 0.00 | 35877.18 | 0.00 | 0.00 | 31583.15 | 0.00 | 28192.04 | 0.00 | 0.00 | 27742.65 | 32402.03 | 32193.33 | 0.00 | 33288.04 | 33269.07 | 28259.38 | 31891.03 | 32534.43 | 0.00 | 0.00 | 0.00 | 26770.84 | 0.00 | 24617.21 | 31914.74 | 31437.74 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 33248.90 | 0.00 | 0.00 | 0.00 | 0 | 0 | NA | NA | NA | 0 | I |
04378044 | 14659.23 | 0.00 | 24636.01 | 25415.50 | 0.00 | 29842.23 | 26240.00 | 31559.69 | 0.00 | 0.00 | 32690.11 | 0.00 | 35219.62 | 0.00 | 0.00 | 32209.17 | 28073.98 | 0.00 | 30630.13 | 28545.87 | 0.00 | 22419.07 | 0.00 | 29752.76 | 35818.35 | 0.00 | 0.00 | 0.00 | 37168.17 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 32609.68 | 28785.20 | 32743.62 | 0.00 | 30564.90 | 0.00 | 0.00 | 0.00 | 33512.77 | 26508.03 | 28093.74 | 0 | 0 | NA | NA | NA | 0 | I |
04378144 | 13807.65 | 0.00 | 28979.32 | 27083.40 | 26693.86 | 27992.45 | 29585.03 | 0.00 | 29732.86 | 29316.57 | 27431.31 | 0.00 | 0.00 | 28173.63 | 0.00 | 22879.56 | 0.00 | 0.00 | 25564.56 | 26377.60 | 0.00 | 26336.40 | 27843.16 | 0.00 | 26624.50 | 0.00 | 26128.58 | 27506.85 | 0.00 | 0.00 | 0.00 | 27602.16 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 28507.23 | 31304.32 | 25729.67 | 0.00 | 32459.95 | 0.00 | 0.00 | 0.00 | 0 | 0 | NA | NA | NA | 0 | I |
04378174 | 34366.94 | 0.00 | 56197.54 | 0.00 | 68378.57 | 0.00 | 0.00 | 64913.65 | 51360.36 | 0.00 | 68007.76 | 68969.61 | 0.00 | 61009.71 | 0.00 | 73813.78 | 67483.74 | 0.00 | 0.00 | 0.00 | 61308.65 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 64421.14 | 80364.05 | 63777.95 | 0.00 | 68784.96 | 0.00 | 0.00 | 0.00 | 66196.82 | 69101.02 | 68260.70 | 69157.21 | 59350.62 | 0.00 | 56558.47 | 61551.35 | 63870.17 | 0.00 | 0.00 | 0 | 0 | NA | NA | NA | 0 | I |
04378194 | 26543.42 | 48347.47 | 0.00 | 52400.57 | 0.00 | 0.00 | 0.00 | 54452.42 | 56357.94 | 0.00 | 49279.82 | 0.00 | 49634.22 | 50323.44 | 0.00 | 0.00 | 51177.03 | 0.00 | 0.00 | 54998.37 | 0.00 | 50254.05 | 0.00 | 0.00 | 49948.46 | 50022.11 | 50772.96 | 0.00 | 48660.00 | 49470.47 | 48256.39 | 56070.84 | 51642.24 | 0.00 | 0.00 | 0.00 | 51286.78 | 0.00 | 49888.79 | 47868.12 | 55679.25 | 0.00 | 0.00 | 0.00 | 0.00 | 0 | 0 | NA | NA | NA | 0 | I |
04378264 | 15160.89 | 0.00 | 26527.27 | 0.00 | 35462.68 | 28120.91 | 0.00 | 0.00 | 0.00 | 31427.12 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 38128.59 | 34121.49 | 33596.86 | 0.00 | 29419.94 | 0.00 | 0.00 | 0.00 | 37406.35 | 34331.83 | 35157.93 | 32163.44 | 27623.50 | 0.00 | 26467.51 | 30457.72 | 32260.50 | 0.00 | 0.00 | 29017.90 | 0.00 | 34166.82 | 0.00 | 0.00 | 33442.30 | 34997.28 | 0.00 | 26329.75 | 25890.87 | 0.00 | 0 | 0 | NA | NA | NA | 0 | I |
The last step is to run ce_mean()
to get annual,
weighted pet expenditure estimates.
agg_exp | mean_exp | se | cv |
---|---|---|---|
101537912995 | 761.1465 | 50.24688 | 6.601473 |
Clean-up
Finally, now that the analysis is done, I’ll delete the temporary directory that contains all the CE data.
unlink(ce_data_dir, recursive = TRUE, force = TRUE)