Population-based cancer incidence rates were abstracted from the National Cancer Institute's
State Cancer Profiles for all counties in the United States for which data were
available. State Cancer Profiles is a national county-level database of cancer data collected by state
public health surveillance systems. All-site cancer is defined as any type of cancer
captured in the state registry data, excluding non-melanoma skin cancer. All-site
age-adjusted cancer incidence rates were abstracted separately for males and females.
County-level annual age-adjusted all-site cancer incidence rates for years 2006–2010 were
available for 2687 of 3142 (85.5%) counties in the U.S. Rates for counties with fewer
than 16 reported cases in a specific area-sex-race category are suppressed to ensure
confidentiality and stability of rate estimates; this accounted for 14 counties in our study.
Two states, Kansas and Virginia, do not provide data because state legislation and
regulations prohibit the release of county-level data to outside entities. Data from
Michigan do not include cases diagnosed in other states because data exchange
agreements prohibit the release of data to third parties. Finally, state data are not available for
three states: Minnesota, Ohio, and Washington. The age-adjusted average annual
incidence rate for all counties was 453.7 per 100,000 persons.
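
The suppression rule and the county coverage figure quoted above can be checked with a few lines of Python. This is a minimal sketch, not the registry's or the authors' code: the function names `is_suppressed` and `coverage_percent` are hypothetical, and only the threshold (16 cases) and the county counts (2687 of 3142) come from the text.

```python
# Threshold taken from the text: categories with fewer than 16 cases are suppressed.
SUPPRESSION_THRESHOLD = 16


def is_suppressed(case_count: int) -> bool:
    """Return True when a county's area-sex-race category would be suppressed."""
    return case_count < SUPPRESSION_THRESHOLD


def coverage_percent(available: int, total: int) -> float:
    """Share of counties with available rates, as a percentage to one decimal."""
    return round(100.0 * available / total, 1)


# County counts as reported in the text.
print(coverage_percent(2687, 3142))  # 85.5
```

A category with exactly 16 cases is not suppressed under this reading of "fewer than 16".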
We selected 2006–2010 because it follows the period covered by the EQI exposure data, which were
constructed to represent the years 2000–2005. We also gathered data for the three leading
causes of cancer for males (lung, prostate, and colorectal) and females (lung, breast, and
The EQI was used as the exposure metric, serving as an indicator of cumulative environmental
exposures at the county level for the period 2000 to 2005. A complete description
of the datasets used in the EQI is provided in Lobdell et al., and the methods used for index
construction are described by Messer et al. The EQI was developed for the period 2000–2005
because it was the time period for which the most recent data were available when
index construction was initiated. The EQI includes variables representing each of the five
environmental domains. The air domain includes 87 variables representing criteria and
hazardous air pollutants. The water domain includes 80 variables representing overall water
quality, general water contamination, recreational water quality, drinking water quality,
atmospheric deposition, drought, and chemical contamination. The land domain includes 26
variables representing agriculture, pesticides, contaminants, facilities, and radon. The built
domain includes 14 variables representing roads, highway/road safety, public transit
behavior, business environment, and subsidized housing environment. The
sociodemographic environment includes 12 variables representing socioeconomics and
crime.
This dataset is not publicly accessible because EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual-level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. The human health data are not publicly available. EQI data are available at: https://edg.epa.gov/data/Public/ORD/NHEERL/EQI. Format: Data are stored as csv files.
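
Because the EQI data are distributed as csv files keyed by county, a reader might load them along the following lines. This is a sketch under stated assumptions: the column names `fips` and `eqi` and the function `load_eqi` are hypothetical and are not taken from the published files, and the sample rows are placeholders rather than real EQI values.

```python
import csv
import io


def load_eqi(handle, fips_col="fips", value_col="eqi"):
    """Return {county FIPS string: EQI value} from an open CSV handle.

    Column names are hypothetical; adjust them to match the actual file.
    """
    table = {}
    for row in csv.DictReader(handle):
        # Zero-pad to five characters so "1001" and "01001" name the same county.
        fips = row[fips_col].strip().zfill(5)
        table[fips] = float(row[value_col])
    return table


# Placeholder rows for illustration only; these are not real EQI values.
sample = io.StringIO("fips,eqi\n1001,0.25\n01003,-0.5\n")
eqi_by_county = load_eqi(sample)
print(sorted(eqi_by_county))
```

Keeping the county identifier as a zero-padded string avoids losing leading zeros, which matters when linking the index to county-level incidence rates.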
This dataset is associated with the following publication:
Jagai, J., L. Messer, K. Rappazzo, C. Gray, S. Grabich, and D. Lobdell. County-level environmental quality and associations with cancer incidence. Cancer. John Wiley & Sons Incorporated, New York, NY, USA, 123(15): 2901-2908, (2017).