Natural Language Processing Text Data from Final Contractor/Grantee Reports and Evaluation Reports (2011-2021)

Metadata Updated: July 12, 2024

This data asset contains text extracted from PDF reports on the Development Experience Clearinghouse (DEC) for the years 2011 to 2021 (as of July 2021). It covers three "Document types" identified by the DEC: Final Contractor/Grantee Report, Final Evaluation Report, and Special Evaluation. Each PDF document labeled as one of these three document types and with a publication year from 2011 to 2021 was downloaded from the DEC in July 2021. The dataset includes text data files from 2,579 Final Contractor/Grantee Reports, 1,299 Final Evaluation reports, and 1,323 Special Evaluation reports.

Raw text from each PDF was extracted and saved as an individual CSV file whose name corresponds to the Document ID of the PDF document on the DEC. Within each CSV file, the raw text is split into paragraphs and their corresponding sentences. To enable Natural Language Processing of the data, the sentences are also cleaned by removing unnecessary special characters, punctuation, and numbers, and each word is stemmed to its root to remove inflections (e.g. pluralization and conjugation).

These data could be used to analyze trends in USAID's programming approaches and terminology. The dataset was compiled for USAID/PPL/LER with the Program Cycle Mechanism.
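
The cleaning steps described above (removing special characters, punctuation, and numbers, then stemming each word) can be illustrated with a minimal Python sketch. This is not the pipeline used to build the dataset: the function names `clean_sentence` and `naive_stem`, the lowercasing step, and the small suffix list are all assumptions made for illustration, and the toy stemmer stands in for whichever stemming algorithm was actually applied, which the description does not name.

```python
import re

# Hypothetical suffix list for a toy stemmer. The dataset description does not
# say which stemmer was used; an established one (e.g. Porter) is more likely.
_SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in _SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def clean_sentence(sentence: str) -> list[str]:
    """Drop digits, punctuation, and special characters, then stem each word."""
    # Lowercasing is an assumption; the description does not mention it.
    lowered = sentence.lower()
    # Replace anything that is not an ASCII letter or whitespace with a space.
    # Note that this also drops accented letters.
    letters_only = re.sub(r"[^a-z\s]", " ", lowered)
    return [naive_stem(token) for token in letters_only.split()]


if __name__ == "__main__":
    print(clean_sentence("The 3 programs trained 1,200 farmers!"))
```

Applied to a sentence such as "The 3 programs trained 1,200 farmers!", the sketch discards the numbers and punctuation and reduces "programs", "trained", and "farmers" to shorter root forms. A real stemmer handles many more inflection patterns than this suffix list does.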

Access & Use Information

Public: This dataset is intended for public access and use. License: See this page for license information.

Metadata Created Date August 16, 2022
Metadata Updated Date July 12, 2024

Metadata Source

Harvested from USAID JSON

Additional Metadata

Resource Type Dataset
Data First Published 2021-11-24
Data Last Modified 2024-07-11
Category Civil Society
Public Access Level public
Bureau Code 184:15
Harvest Object Id c47a00eb-41d8-4cde-80f3-b87b728e9088
Harvest Source Id 0aeddf52-9cb0-4ab6-bc1f-b53172fe5348
Harvest Source Title USAID JSON
Program Code 184:010
Source Datajson Identifier True
Source Hash 5ee02e36ff6b6d9ec7f104222ead150509a231db9fe58eb20313bd685d9f2ed4
Source Schema Version 1.1