Functional annotation for 15 diverse arthropod genomes

Metadata Updated: December 2, 2025

We present functional annotation results for 15 arthropod proteomes. The annotations were generated with an open-source, open-access, containerized pipeline for genome-scale functional annotation of insect proteomes, applied here to a diverse range of arthropod species. More information about the pipeline is available at our readthedocs site. The files for each genome include GOanna, InterProScan, and KOBAS predictions.

Arthropod genomes selected for this study and their assembly and annotation statistics.

  1. Apis mellifera (honey bee)
  2. Drosophila melanogaster (fruit fly)
  3. Tribolium castaneum (red flour beetle)
  4. Latrodectus hesperus (Western black widow spider)
  5. Limnephilus lunatus (caddisfly)
  6. Oncopeltus fasciatus (Large milkweed bug)
  7. Homalodisca vitripennis (Glassy-winged sharpshooter)
  8. Eurytemora affinis (calanoid copepod)
  9. Agrilus planipennis (emerald ash borer)
  10. Copidosoma floridanum (parasitoid wasp)
  11. Athalia rosae (turnip sawfly)
  12. Ceratitis capitata (Mediterranean fruit fly)
  13. Cimex lectularius (Cimicidae bed bug)
  14. Varroa destructor (parasitic mite)
  15. Diaphorina citri (Asian citrus psyllid)


  Resources in this dataset:
  • Resource Title: Cimex lectularius (Cimicidae bed bug) annotation.

    File Name: CLEC.tar.gz

    Resource Description: Functional annotation for Clec-OGSv1.2 protein set


  • Resource Title: Tribolium castaneum (red flour beetle) annotation.

    File Name: TCAS.tar.gz

    Resource Description: Functional annotation for TCAS_OGS_v3 protein set


  • Resource Title: Drosophila melanogaster (fruit fly) annotation.

    File Name: DMEL.tar.gz

    Resource Description: Functional annotation for DMEL_r6.38 protein set

  • Resource Title: Varroa destructor (parasitic mite) annotation.

    File Name: VDES.tar.gz

    Resource Description: Functional annotation for NCBI Varroa destructor Annotation Release 100 protein set based on Vdes_3.0 genome (GCA_002443255.1)


  • Resource Title: Oncopeltus fasciatus (Large milkweed bug) annotation.

    File Name: ONCFAS.tar.gz

    Resource Description: Functional annotation for oncfas_OGSv1.2 protein set


  • Resource Title: Apis mellifera (honey bee) annotation.

    File Name: AMEL.tar.gz

    Resource Description: Functional annotation for OGSv3.3 protein set from Amel_4.5 genome (GCA_000002195.1)


  • Resource Title: Homalodisca vitripennis (Glassy-winged sharpshooter) annotation.

    File Name: HVIT.tar.gz

    Resource Description: Functional annotation for HVIT-BCM_version_0.5.3 protein set based on Hvit_1.0 genome (GCA_000696855.1)


  • Resource Title: Limnephilus lunatus (caddisfly) annotation.

    File Name: LLUN.tar.gz

    Resource Description: Functional annotation for LLUN-BCM_version_0.5.3 protein set from Llun_1.0 genome (GCA_000648945.1)


  • Resource Title: Latrodectus hesperus (Western black widow spider) annotation.

    File Name: LHES.tar.gz

    Resource Description: Functional annotation for LHES-BCM_version_0.5.3 protein set from Lhes_1.0 genome (GCA_000697925.1)


  • Resource Title: Eurytemora affinis (calanoid copepod) annotation.

    File Name: EAFF.tar.gz

    Resource Description: Functional annotation for EAFF-BCM_version_0.5.3 protein set from Eaff_1.0 genome (GCA_000591075.1)


  • Resource Title: Copidosoma floridanum (parasitoid wasp) annotation.

    File Name: CFLO.tar.gz

    Resource Description: Functional annotation for CFLO-BCM_version_0.5.3 protein set based on Cflo_1.0 genome (GCA_000648655.1)


  • Resource Title: Ceratitis capitata (Mediterranean fruit fly) annotation.

    File Name: CCAP.tar.gz

    Resource Description: Functional annotation for Ccap-OGSv1 protein set based on Ccap_1.1 assembly (GCA_000347755.2)


  • Resource Title: Athalia rosae (turnip sawfly) annotation.

    File Name: AROS.tar.gz

    Resource Description: Functional annotation for AROS-BCM_version_0.5.3 protein set based on Aros_1.0 genome (GCA_000344095.1)


  • Resource Title: Agrilus planipennis (emerald ash borer) annotation.

    File Name: APLA.tar.gz

    Resource Description: Functional annotation for APLA-BCM_version_0.5.3 protein set based on Apla_1.0 genome (GCA_000699045.1)
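
Each resource above is distributed as a gzip-compressed tar archive. As a minimal sketch, assuming an archive such as CLEC.tar.gz has already been downloaded to the working directory, its contents could be inspected with Python's standard tarfile module before extracting anything. The function name `list_archive_files` is hypothetical, and the internal layout of the archives is not described on this page, so the sketch only lists member names.

```python
import tarfile


def list_archive_files(archive_path):
    """Return the sorted names of regular files in a .tar.gz archive.

    Hypothetical helper for illustration; it reads the archive index
    only and does not extract anything to disk.
    """
    with tarfile.open(archive_path, mode="r:gz") as archive:
        # Skip directories, links, and other non-file members.
        return sorted(m.name for m in archive.getmembers() if m.isfile())
```

For example, `list_archive_files("CLEC.tar.gz")` would return the member paths of a locally downloaded copy. Which of those paths hold the GOanna, InterProScan, and KOBAS predictions depends on the archive layout, which is not specified here.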

  Access & Use Information

    Public: This dataset is intended for public access and use.
    License: Creative Commons CCZero

    Dates

    Metadata Created Date March 30, 2024
    Metadata Updated Date December 2, 2025

    Metadata Source

    Harvested from USDA JSON

    Additional Metadata

    Resource Type Dataset
    Publisher Agricultural Research Service
    Maintainer
    Identifier 10.15482/USDA.ADC/1522860
    Data Last Modified 2025-11-22
    Public Access Level public
    Bureau Code 005:18
    Metadata Context https://project-open-data.cio.gov/v1.1/schema/catalog.jsonld
    Schema Version https://project-open-data.cio.gov/v1.1/schema
    Catalog Describedby https://project-open-data.cio.gov/v1.1/schema/catalog.json
    Harvest Object Id 31647e39-ab1e-41f1-b2c6-de3b1a3c73e8
    Harvest Source Id d3fafa34-0cb9-48f1-ab1d-5b5fdc783806
    Harvest Source Title USDA JSON
    License https://creativecommons.org/publicdomain/zero/1.0/
    Old Spatial {"type": "Polygon", "coordinates": -125.33203125, 30.654452824401, -125.33203125, 48.848450835898, -74.35546875, 48.848450835898, -74.35546875, 30.654452824401, -125.33203125, 30.654452824401}
    Program Code 005:040
    Source Datajson Identifier True
    Source Hash 50c35b875c9ef568febb343d570520d66046adf1b7097a6ed1a747f6fabdd8e7
    Source Schema Version 1.1
    Spatial {"type": "Polygon", "coordinates": -125.33203125, 30.654452824401, -125.33203125, 48.848450835898, -74.35546875, 48.848450835898, -74.35546875, 30.654452824401, -125.33203125, 30.654452824401}
    Temporal 2021-07-06/2021-07-06