Skip to contents

Introduction

The following is an example of using cepumd to calculate a 2021 annual, weighted estimate of mean expenditures on pets for all of the U.S. using CE integrated data.

I’ll need the following data and metadata sources:

  • Interview survey data zip files for 2020 and 2021
  • Diary survey data zip files for 2021
  • Hierarchical grouping zip files

Data gathering

Data files can be downloaded from the CE PUMD Data Files page (it’s easiest to use CSV) and hierarchical grouping files can be downloaded from the CE PUMD Documentation page

To get the files I’ll first create a temporary directory and download all of the files that I need from the BLS website into this directory. You might choose to store your files differently, but this convention will keep the files organized, it will keep the code simple, and everything will be in a folder that will be easy to clean up after. Since the BLS blocks third party applications I’ll add a user-agent to identify myself in the download function that is stored in a variable called cepumd_ua (not shown).

ce_data_dir <- tempdir()

download.file(
  "https://www.bls.gov/cex/pumd/data/comma/intrvw21.zip",
  fs::path(ce_data_dir, "intrvw21.zip"),
  mode = "wb",
  headers = list(
    "User-Agent" = cepumd_ua
  )
)

download.file(
  "https://www.bls.gov/cex/pumd/data/comma/intrvw20.zip",
  fs::path(ce_data_dir, "intrvw20.zip"),
  mode = "wb",
  headers = list(
    "User-Agent" = cepumd_ua
  )
)

download.file(
  "https://www.bls.gov/cex/pumd/data/comma/diary21.zip",
  fs::path(ce_data_dir, "diary21.zip"),
  mode = "wb",
  headers = list(
    "User-Agent" = cepumd_ua
  )
)

download.file(
  "https://www.bls.gov/cex/pumd/stubs.zip",
  fs::path(ce_data_dir, "stubs.zip"),
  mode = "wb",
  headers = list(
    "User-Agent" = cepumd_ua
  )
)

Calculate CE weighted mean estimate

The first step in the workflow is to load the 2021 hierarchical grouping file for integrated expenditures using ce_hg(). This file is used to get the correct UCCs associated with pet expenditures.

integ21_hg <- ce_hg(
  2021,
  integrated,
  hg_zip_path = file.path(ce_data_dir, "stubs.zip")
)

Next, one can use ce_prep_data() to correctly merge all of the Interview and Diary survey files for calculating annual estimates.

pet_data <- ce_prepdata(
  2021,
  integrated,
  integ21_hg,
  uccs = ce_uccs(integ21_hg, expenditure = "Pets", ucc_group = "PETS"),
  dia_zp = file.path(ce_data_dir, "diary21.zip"),
  int_zp = c(
    file.path(ce_data_dir, "intrvw20.zip"),
    file.path(ce_data_dir, "intrvw21.zip")
  )
)

gt(head(pet_data))
newid finlwt21 wtrep01 wtrep02 wtrep03 wtrep04 wtrep05 wtrep06 wtrep07 wtrep08 wtrep09 wtrep10 wtrep11 wtrep12 wtrep13 wtrep14 wtrep15 wtrep16 wtrep17 wtrep18 wtrep19 wtrep20 wtrep21 wtrep22 wtrep23 wtrep24 wtrep25 wtrep26 wtrep27 wtrep28 wtrep29 wtrep30 wtrep31 wtrep32 wtrep33 wtrep34 wtrep35 wtrep36 wtrep37 wtrep38 wtrep39 wtrep40 wtrep41 wtrep42 wtrep43 wtrep44 mo_scope popwt ucc ref_yr ref_mo cost survey
04378034 15009.70 26582.38 30220.87 28559.33 0.00 28263.97 0.00 26801.74 30392.73 0.00 0.00 35877.18 0.00 0.00 31583.15 0.00 28192.04 0.00 0.00 27742.65 32402.03 32193.33 0.00 33288.04 33269.07 28259.38 31891.03 32534.43 0.00 0.00 0.00 26770.84 0.00 24617.21 31914.74 31437.74 0.00 0.00 0.00 0.00 0.00 33248.90 0.00 0.00 0.00 0 0 NA NA NA 0 I
04378044 14659.23 0.00 24636.01 25415.50 0.00 29842.23 26240.00 31559.69 0.00 0.00 32690.11 0.00 35219.62 0.00 0.00 32209.17 28073.98 0.00 30630.13 28545.87 0.00 22419.07 0.00 29752.76 35818.35 0.00 0.00 0.00 37168.17 0.00 0.00 0.00 0.00 0.00 32609.68 28785.20 32743.62 0.00 30564.90 0.00 0.00 0.00 33512.77 26508.03 28093.74 0 0 NA NA NA 0 I
04378144 13807.65 0.00 28979.32 27083.40 26693.86 27992.45 29585.03 0.00 29732.86 29316.57 27431.31 0.00 0.00 28173.63 0.00 22879.56 0.00 0.00 25564.56 26377.60 0.00 26336.40 27843.16 0.00 26624.50 0.00 26128.58 27506.85 0.00 0.00 0.00 27602.16 0.00 0.00 0.00 0.00 0.00 28507.23 31304.32 25729.67 0.00 32459.95 0.00 0.00 0.00 0 0 NA NA NA 0 I
04378174 34366.94 0.00 56197.54 0.00 68378.57 0.00 0.00 64913.65 51360.36 0.00 68007.76 68969.61 0.00 61009.71 0.00 73813.78 67483.74 0.00 0.00 0.00 61308.65 0.00 0.00 0.00 0.00 0.00 64421.14 80364.05 63777.95 0.00 68784.96 0.00 0.00 0.00 66196.82 69101.02 68260.70 69157.21 59350.62 0.00 56558.47 61551.35 63870.17 0.00 0.00 0 0 NA NA NA 0 I
04378194 26543.42 48347.47 0.00 52400.57 0.00 0.00 0.00 54452.42 56357.94 0.00 49279.82 0.00 49634.22 50323.44 0.00 0.00 51177.03 0.00 0.00 54998.37 0.00 50254.05 0.00 0.00 49948.46 50022.11 50772.96 0.00 48660.00 49470.47 48256.39 56070.84 51642.24 0.00 0.00 0.00 51286.78 0.00 49888.79 47868.12 55679.25 0.00 0.00 0.00 0.00 0 0 NA NA NA 0 I
04378264 15160.89 0.00 26527.27 0.00 35462.68 28120.91 0.00 0.00 0.00 31427.12 0.00 0.00 0.00 0.00 0.00 38128.59 34121.49 33596.86 0.00 29419.94 0.00 0.00 0.00 37406.35 34331.83 35157.93 32163.44 27623.50 0.00 26467.51 30457.72 32260.50 0.00 0.00 29017.90 0.00 34166.82 0.00 0.00 33442.30 34997.28 0.00 26329.75 25890.87 0.00 0 0 NA NA NA 0 I

The last step is to run ce_mean() to get annual, weighted pet expenditure estimates.

ce_mean(pet_data) |>
  gt()
agg_exp mean_exp se cv
101537912995 761.1465 50.24688 6.601473

Clean-up

Finally, now that the analysis is done, I’ll delete the temporary directory that contains all the CE data.

unlink(ce_data_dir, recursive = TRUE, force = TRUE)