Simple cepumd Workflow • cepumd

Introduction

The following is an example of using cepumd to calculate a 2021 annual, weighted estimate of mean expenditures on pets for all of the U.S. using CE integrated data.

I’ll need the following data and metadata sources:

Interview survey data zip files for 2020 and 2021
Diary survey data zip files for 2021
Hierarchical grouping zip files

Data gathering

Data files can be downloaded from the CE PUMD Data Files page (it’s easiest to use CSV) and hierarchical grouping files can be downloaded from the CE PUMD Documentation page

To get the files I’ll first create a temporary directory and download all of the files that I need from the BLS website into this directory. You might choose to store your files differently, but this convention will keep the files organized, it will keep the code simple, and everything will be in a folder that will be easy to clean up after. Since the BLS blocks third party applications I’ll add a user-agent to identify myself in the download function that is stored in a variable called cepumd_ua (not shown).

ce_data_dir <- tempdir()

download.file(
  "https://www.bls.gov/cex/pumd/data/comma/intrvw21.zip",
  fs::path(ce_data_dir, "intrvw21.zip"),
  mode = "wb",
  headers = list(
    "User-Agent" = cepumd_ua
  )
)

download.file(
  "https://www.bls.gov/cex/pumd/data/comma/intrvw20.zip",
  fs::path(ce_data_dir, "intrvw20.zip"),
  mode = "wb",
  headers = list(
    "User-Agent" = cepumd_ua
  )
)

download.file(
  "https://www.bls.gov/cex/pumd/data/comma/diary21.zip",
  fs::path(ce_data_dir, "diary21.zip"),
  mode = "wb",
  headers = list(
    "User-Agent" = cepumd_ua
  )
)

download.file(
  "https://www.bls.gov/cex/pumd/stubs.zip",
  fs::path(ce_data_dir, "stubs.zip"),
  mode = "wb",
  headers = list(
    "User-Agent" = cepumd_ua
  )
)

Calculate CE weighted mean estimate

The first step in the workflow is to load the 2021 hierarchical grouping file for integrated expenditures using ce_hg(). This file is used to get the correct UCCs associated with pet expenditures.

integ21_hg <- ce_hg(
  2021,
  integrated,
  hg_zip_path = file.path(ce_data_dir, "stubs.zip")
)

Next, one can use ce_prep_data() to correctly merge all of the Interview and Diary survey files for calculating annual estimates.

pet_data <- ce_prepdata(
  2021,
  integrated,
  integ21_hg,
  uccs = ce_uccs(integ21_hg, expenditure = "Pets", ucc_group = "PETS"),
  dia_zp = file.path(ce_data_dir, "diary21.zip"),
  int_zp = c(
    file.path(ce_data_dir, "intrvw20.zip"),
    file.path(ce_data_dir, "intrvw21.zip")
  )
)

gt(head(pet_data))

newid	finlwt21	wtrep01	wtrep02	wtrep03	wtrep04	wtrep05	wtrep06	wtrep07	wtrep08	wtrep09	wtrep10	wtrep11	wtrep12	wtrep13	wtrep14	wtrep15	wtrep16	wtrep17	wtrep18	wtrep19	wtrep20	wtrep21	wtrep22	wtrep23	wtrep24	wtrep25	wtrep26	wtrep27	wtrep28	wtrep29	wtrep30	wtrep31	wtrep32	wtrep33	wtrep34	wtrep35	wtrep36	wtrep37	wtrep38	wtrep39	wtrep40	wtrep41	wtrep42	wtrep43	wtrep44	ucc	ref_yr	ref_mo	survey
04378034	15009.70	26582.38	30220.87	28559.33	0.00	28263.97	0.00	26801.74	30392.73	0.00	0.00	35877.18	0.00	0.00	31583.15	0.00	28192.04	0.00	0.00	27742.65	32402.03	32193.33	0.00	33288.04	33269.07	28259.38	31891.03	32534.43	0.00	0.00	0.00	26770.84	0.00	24617.21	31914.74	31437.74	0.00	0.00	0.00	0.00	0.00	33248.90	0.00	0.00	0.00	NA	NA	NA	I
04378044	14659.23	0.00	24636.01	25415.50	0.00	29842.23	26240.00	31559.69	0.00	0.00	32690.11	0.00	35219.62	0.00	0.00	32209.17	28073.98	0.00	30630.13	28545.87	0.00	22419.07	0.00	29752.76	35818.35	0.00	0.00	0.00	37168.17	0.00	0.00	0.00	0.00	0.00	32609.68	28785.20	32743.62	0.00	30564.90	0.00	0.00	0.00	33512.77	26508.03	28093.74	NA	NA	NA	I
04378144	13807.65	0.00	28979.32	27083.40	26693.86	27992.45	29585.03	0.00	29732.86	29316.57	27431.31	0.00	0.00	28173.63	0.00	22879.56	0.00	0.00	25564.56	26377.60	0.00	26336.40	27843.16	0.00	26624.50	0.00	26128.58	27506.85	0.00	0.00	0.00	27602.16	0.00	0.00	0.00	0.00	0.00	28507.23	31304.32	25729.67	0.00	32459.95	0.00	0.00	0.00	NA	NA	NA	I
04378174	34366.94	0.00	56197.54	0.00	68378.57	0.00	0.00	64913.65	51360.36	0.00	68007.76	68969.61	0.00	61009.71	0.00	73813.78	67483.74	0.00	0.00	0.00	61308.65	0.00	0.00	0.00	0.00	0.00	64421.14	80364.05	63777.95	0.00	68784.96	0.00	0.00	0.00	66196.82	69101.02	68260.70	69157.21	59350.62	0.00	56558.47	61551.35	63870.17	0.00	0.00	NA	NA	NA	I
04378194	26543.42	48347.47	0.00	52400.57	0.00	0.00	0.00	54452.42	56357.94	0.00	49279.82	0.00	49634.22	50323.44	0.00	0.00	51177.03	0.00	0.00	54998.37	0.00	50254.05	0.00	0.00	49948.46	50022.11	50772.96	0.00	48660.00	49470.47	48256.39	56070.84	51642.24	0.00	0.00	0.00	51286.78	0.00	49888.79	47868.12	55679.25	0.00	0.00	0.00	0.00	NA	NA	NA	I
04378264	15160.89	0.00	26527.27	0.00	35462.68	28120.91	0.00	0.00	0.00	31427.12	0.00	0.00	0.00	0.00	0.00	38128.59	34121.49	33596.86	0.00	29419.94	0.00	0.00	0.00	37406.35	34331.83	35157.93	32163.44	27623.50	0.00	26467.51	30457.72	32260.50	0.00	0.00	29017.90	0.00	34166.82	0.00	0.00	33442.30	34997.28	0.00	26329.75	25890.87	0.00	NA	NA	NA	I

The last step is to run ce_mean() to get annual, weighted pet expenditure estimates.

ce_mean(pet_data) |>
  gt()

agg_exp	mean_exp	se	cv
101537912995	761.1465	50.24688	6.601473

Clean-up

Finally, now that the analysis is done, I’ll delete the temporary directory that contains all the CE data.

unlink(ce_data_dir, recursive = TRUE, force = TRUE)