1. Prepare data and metadata
2. Create a new record in IPDS
3. Create a new landing page
4. Finalize metadata
5. Decide how to organize and display data and metadata
6. Upload files and edit the landing page
7. Format citation
8. Final steps
Frequently Asked Questions
- Where can I find information about how to create and/or review a metadata record?
- How can I grant read/write permissions to USGS and non-USGS users while a data release is still in progress?
- What if I need to revise my data after they have been released?
- Will ScienceBase send the XML metadata record(s) from my data release to the USGS Science Data Catalog?
- Why is CSV format recommended instead of Excel?
- What is the file size limit for uploading and downloading files?
- Can I release legacy data in ScienceBase?
- A). My data release is associated with a publication. How will the two reference each other?
B). I don’t have the publication’s citation yet, but I would like to release the data now. Can I add the citation at some point in the future?
- Which repository should I use to release code?
- What repository services does ScienceBase provide for USGS data release products?
- How can I see other data releases from my Science Center or from a particular period of time?
Links to additional information
- The ScienceBase team will check data releases against this checklist before making them public.
- The USGS Fundamental Science Practices (FSP) website contains an FAQ page about data release and a guide to the publishing path options.
- The USGS data management website contains a guide to the steps of data release, with links to tools and resources.
- Before beginning the ScienceBase data release workflow, scientists should view the ScienceBase User Agreement.
- Options to browse and query the public data release products in ScienceBase.
- View and Filter USGS Data Releases using the ScienceBase Data Release Summary Dashboard.
1. Prepare data and metadata
A data release should contain only 1) data and 2) metadata.
- A best practice is to release data in an open, machine-readable format. For example, tabular data in .csv or .txt format is preferable to Excel format.
- Data obtained from published sources do not need to be included - simply document the source and methods in your metadata.
- Proprietary or sensitive data should not be included.
- Metadata should be in XML format.
- Metadata should also conform to an FGDC-endorsed metadata standard, FGDC CSDGM* or ISO**.
*Federal Geographic Data Committee Content Standard for Digital Geospatial Metadata
**International Organization for Standardization
- See step 4 below for information on how to finalize metadata for ScienceBase.
- Learn about metadata creation tools.
- Tutorial Video: Structuring and Documenting a USGS Public Data Release can help you decide how many data and metadata files to include in your data release.
USGS Fundamental Science Practices (FSP) guidance states that "a data release is an information product that is non-interpretive and does not include extended descriptions beyond what is required in the full metadata record." Extended text descriptions, figures, maps, and files in PDF format are more appropriate for USGS series publications handled by the USGS Science Publishing Network (SPN).
2. Create a new record in IPDS
Data and metadata for a data release should be reviewed and approved according to the USGS Fundamental Science Practices (FSP) process. The USGS uses its Information Product Data System (IPDS) to track the data and metadata review process.
- When you create a new record in IPDS, select "Data Release" in the Product Type dropdown menu.
- New records in IPDS are assigned an IP number.
- Each new data release should correspond to one IP number.
- For more information on data and metadata review, see the review checklists on the USGS data management website.
Data releases often have associated manuscripts that also go through review. In these cases, the review processes are separate. There should be an IPDS record for the data release and another for the manuscript.
3. Create a new landing page
You can create a new data release landing page via the ScienceBase Data Release Tool.
- Sign in to the ScienceBase Data Release Tool.
- Follow the form instructions to provide basic information about the data you are releasing.
- When the form is submitted successfully, you will receive an automated email with a link to your new landing page and a reserved Digital Object Identifier (DOI).
4. Finalize metadata
Note: if the following steps are not completed by the author, metadata will be finalized automatically by the ScienceBase team at the end of the process.
These instructions are for metadata records in the Federal Geographic Data Committee (FGDC) Content Standard for Digital Geospatial Metadata (CSDGM) format. The USGS metadata creation tools, the Metadata Wizard and the Online Metadata Editor, create metadata in this format.
1) Check the title in your metadata record
- The title element from the metadata record will be prominently displayed in search results of the USGS Science Data Catalog, the Department of the Interior's data catalog, and data.gov.
- The title listed in the metadata should be the full title of the dataset it is describing, not the filename.
- Choose a descriptive title for your dataset that incorporates who, what, where, why, and scale. For example, the title could be "(measurement) of (phenomenon) in (geographic feature) at (geographic location) in (time period)".
- All metadata records should have a unique title, even if there are multiple metadata records in the data release.
- Remove non-ASCII characters and other special characters such as <, >, and & from the metadata title. See item 4 below.
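The requirement that every metadata record has a unique title can be checked before upload. The following is a minimal sketch, assuming the records follow the FGDC CSDGM layout, where the title sits at idinfo/citation/citeinfo/title. The function names and the sample records are hypothetical and are used only for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def metadata_title(xml_text):
    """Return the citation title of a CSDGM record, or None if it is missing."""
    root = ET.fromstring(xml_text)
    title = root.findtext("idinfo/citation/citeinfo/title")
    return title.strip() if title else None


def duplicate_titles(xml_texts):
    """Return titles that appear in more than one metadata record."""
    counts = Counter(metadata_title(text) for text in xml_texts)
    return sorted(t for t, n in counts.items() if t is not None and n > 1)


# Hypothetical, heavily trimmed records for illustration only.
TEMPLATE = (
    "<metadata><idinfo><citation><citeinfo>"
    "<title>{}</title>"
    "</citeinfo></citation></idinfo></metadata>"
)
records = [
    TEMPLATE.format("Soil moisture in cedar glades, Tennessee, 2012-2013"),
    TEMPLATE.format("Water levels in cedar glades, Tennessee, 2012-2013"),
    TEMPLATE.format("Soil moisture in cedar glades, Tennessee, 2012-2013"),
]
repeated = duplicate_titles(records)
print(repeated)
```

In practice the XML text would be read from the metadata files in the data release folder rather than built from a template.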
2) Add your Digital Object Identifier (DOI)
- USGS policy requires the use of a Digital Object Identifier (DOI) for data releases. Note: if you use the ScienceBase Data Release Tool to start a new data release, you will have the option to reserve a DOI for your landing page.
- Add the full DOI URL (e.g., https://doi.org/10.5066/P9R7L1NS) to the online linkage element (<onlink>) in the citation information section of your metadata (instructions).
- Optional: add the DOI URL to the network resource element (<networkr>) in the distribution section (instructions). Note: some advanced metadata authors use this field for data download links.
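For authors who prefer to script this step, the sketch below sets the online linkage element with Python's standard library. It assumes a CSDGM record in which the citation lives at idinfo/citation/citeinfo. The function name set_onlink and the trimmed sample record are hypothetical.

```python
import xml.etree.ElementTree as ET


def set_onlink(xml_text, doi_url):
    """Return the record with the first citation <onlink> set to the DOI URL."""
    root = ET.fromstring(xml_text)
    citeinfo = root.find("idinfo/citation/citeinfo")
    if citeinfo is None:
        raise ValueError("no idinfo/citation/citeinfo element found")
    onlink = citeinfo.find("onlink")
    if onlink is None:
        # Appended as the last child; check element order afterwards.
        onlink = ET.SubElement(citeinfo, "onlink")
    onlink.text = doi_url
    return ET.tostring(root, encoding="unicode")


# Hypothetical trimmed record; the DOI is the example URL used in this guide.
sample = (
    "<metadata><idinfo><citation><citeinfo>"
    "<title>Example title</title>"
    "</citeinfo></citation></idinfo></metadata>"
)
updated = set_onlink(sample, "https://doi.org/10.5066/P9R7L1NS")
print(updated)
```

Because the CSDGM standard prescribes element order, and this sketch simply appends a missing element at the end of the citation block, it is worth validating the edited record (for example with the USGS Metadata Parser) before uploading it.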
3) Add the following distribution information to your metadata
- Distribution liability statement: please select the USGS disclaimer statement(s) that are relevant to your data release (instructions). Disclaimer statements are available on the FSP website.
- Distribution contact information: please add ScienceBase as the distribution contact (instructions).
Contact Organization and/or Contact Person: "U.S. Geological Survey - ScienceBase"
Contact Address: "Denver Federal Center, Building 810, Mail Stop 302" "Denver" "CO" "80225"
Contact Phone: "1-888-275-8747"
Contact Email: "firstname.lastname@example.org"
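The contact details above map onto the CSDGM distributor contact block. The sketch below builds that block with Python's standard library, assuming the usual CSDGM short names (distrib, cntinfo, cntorgp, cntorg, cntaddr, cntvoice, cntemail). The address type value is an assumption, and the email argument is the placeholder shown above, not a verified address.

```python
import xml.etree.ElementTree as ET


def build_distributor(email):
    """Build a CSDGM <distrib> element with ScienceBase as the contact."""
    distrib = ET.Element("distrib")
    cntinfo = ET.SubElement(distrib, "cntinfo")

    cntorgp = ET.SubElement(cntinfo, "cntorgp")
    ET.SubElement(cntorgp, "cntorg").text = "U.S. Geological Survey - ScienceBase"

    cntaddr = ET.SubElement(cntinfo, "cntaddr")
    ET.SubElement(cntaddr, "addrtype").text = "mailing"  # assumption, not from the guide
    ET.SubElement(cntaddr, "address").text = (
        "Denver Federal Center, Building 810, Mail Stop 302"
    )
    ET.SubElement(cntaddr, "city").text = "Denver"
    ET.SubElement(cntaddr, "state").text = "CO"
    ET.SubElement(cntaddr, "postal").text = "80225"

    ET.SubElement(cntinfo, "cntvoice").text = "1-888-275-8747"
    ET.SubElement(cntinfo, "cntemail").text = email
    return distrib


# Placeholder email exactly as it appears above; substitute the real address.
distributor = build_distributor("firstname.lastname@example.org")
print(ET.tostring(distributor, encoding="unicode"))
```

The resulting element belongs inside the distribution information section (distinfo) of the record, alongside the distribution liability statement.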
4) Check for non-ASCII characters in your metadata
- Non-ASCII characters (curly quotes, em dash, en dash) and other special characters (greater than and less than signs, ampersands) can be problematic in downstream applications, such as USGS webpages.
- Please avoid using non-ASCII characters and other special characters in your metadata title and abstract. Otherwise, the ScienceBase data release team will have to modify these characters in order to save your title and abstract to the record for your digital object identifier (DOI).
- Check your content for non-ASCII characters using an ASCII validation tool such as Online ASCII Tools.
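As an alternative to an online tool, the same check can be sketched in a few lines of Python. The function below is a hypothetical helper that reports the position of every non-ASCII character, plus the special characters <, >, and &. It is meant for plain title and abstract text, not for a whole XML file, where angle brackets are part of the markup.

```python
def find_problem_characters(text):
    """Return (line, column, character) for non-ASCII characters and < > &."""
    special = set("<>&")
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col_no, ch in enumerate(line, start=1):
            if ord(ch) > 127 or ch in special:
                hits.append((line_no, col_no, ch))
    return hits


# Hypothetical title containing an en dash and an ampersand.
title = "Water levels \u2013 Oahu & Maui"
problems = find_problem_characters(title)
for line_no, col_no, ch in problems:
    print(f"line {line_no}, column {col_no}: {ch!r} (U+{ord(ch):04X})")
```

Line and column numbers are 1-based so they can be matched against a text editor.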
5. Decide how to organize and display data and metadata
ScienceBase data releases can be organized in several ways. Data authors can choose the approach that works best for their product. The optimal organization often depends on the number of data and metadata files.
Note: please upload only one metadata record per page in ScienceBase (it is possible to upload additional records if they are in zipped files). This is because the USGS Science Data Catalog, which harvests metadata records from data releases in ScienceBase, can only pull one metadata record from a page.
► If you have one metadata record to describe your data, upload your files (both data and metadata) directly to the landing page (example).
► If you have multiple metadata records and data sets, you have two options:
- Upload data and metadata directly to the landing page in zipped bundles (example). There should be one metadata record uploaded separately - a summary metadata record that describes the entire data release. The summary metadata record will be the only one harvested by the Science Data Catalog.
- Create subpages that are nested under the landing page (example). Use this option if you would like your data sets to be independently discoverable. Nested pages in ScienceBase are called "child items". To create a new child item, click the "Add" dropdown menu, then select "Add Child Item". On each child item, upload one metadata record and its associated data file(s).
- Note: a best practice is to also upload a summary metadata record to the landing page to describe the entire product.
- Note: all unzipped metadata records will be harvested by the Science Data Catalog, including those attached to child items (please make sure all child item metadata records have unique and descriptive titles).
- Adding an image to a data release: if you would like to display an image on a ScienceBase page, upload the image in .JPG or .PNG format. The image will be automatically displayed on the page.
- ScienceBase can generate web services for certain geospatial file types: shapefiles, GeoTIFFs and ESRI Service Definition (.SD) files. The web services can be used to serve the data to outside applications and to display the data in the preview map on a ScienceBase page. For more information, see the ScienceBase Geospatial Services page.
Resources: This tutorial video can help you determine the best way to structure and document your data releases.
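For the zipped-bundle option described above, each data set and its metadata record can be packaged with Python's standard zipfile module. This is a minimal sketch. The function name and the file names are hypothetical, and the demonstration writes small stand-in files to a temporary folder.

```python
import tempfile
import zipfile
from pathlib import Path


def bundle_files(zip_path, files):
    """Write the given files into one zip archive, flattening folder paths."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for file_path in files:
            file_path = Path(file_path)
            archive.write(file_path, arcname=file_path.name)
    return Path(zip_path)


# Demonstration with hypothetical file names in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    data_file = folder / "site_a_water_levels.csv"
    meta_file = folder / "site_a_water_levels.xml"
    data_file.write_text("site,level_m\nA,1.25\n")
    meta_file.write_text("<metadata></metadata>")

    zip_file = bundle_files(folder / "site_a.zip", [data_file, meta_file])
    with zipfile.ZipFile(zip_file) as archive:
        bundled_names = archive.namelist()

print(bundled_names)
```

The summary metadata record for the whole data release stays outside the bundles and is uploaded separately so that it can be harvested.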
6. Upload files and edit the landing page
Note: the current file size limit for uploads in ScienceBase is about 30 GB. If any single file exceeds 30 GB, please contact email@example.com. Also note that there is a 100-file limit on the number of files that can be attached to a single item.
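These limits can be checked before starting an upload. The sketch below is a hypothetical pre-flight check that uses the limits stated in this guide: about 30 GB per file, 100 files per item, and the 1 GB threshold mentioned in the FAQ for switching to the Cloud Uploader. Whether ScienceBase counts a gigabyte as 1024^3 or 10^9 bytes is not stated here, so the constant is an assumption.

```python
from pathlib import Path

GB = 1024 ** 3  # assumption: binary gigabytes
CLOUD_UPLOADER_THRESHOLD = 1 * GB
MAX_FILE_SIZE = 30 * GB
MAX_FILES_PER_ITEM = 100


def folder_sizes(folder):
    """Map each file name in a folder to its size in bytes."""
    return {p.name: p.stat().st_size for p in Path(folder).iterdir() if p.is_file()}


def review_upload(sizes):
    """Return human-readable notes for files that need special handling."""
    notes = []
    if len(sizes) > MAX_FILES_PER_ITEM:
        notes.append(
            f"{len(sizes)} files exceeds the {MAX_FILES_PER_ITEM}-file limit per item"
        )
    for name, size in sorted(sizes.items()):
        if size > MAX_FILE_SIZE:
            notes.append(f"{name}: larger than 30 GB, contact the ScienceBase team")
        elif size > CLOUD_UPLOADER_THRESHOLD:
            notes.append(f"{name}: larger than 1 GB, use the Cloud Uploader")
    return notes


# Hypothetical file sizes; use folder_sizes("path/to/release") for real files.
example_sizes = {"rasters.zip": 45 * GB, "lidar_tiles.zip": 4 * GB, "sites.csv": 20_000}
upload_notes = review_upload(example_sizes)
for note in upload_notes:
    print(note)
```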
► The most efficient way to populate an empty ScienceBase page is to start by uploading an XML metadata record in an FGDC-endorsed format. Click the "Add" dropdown menu on the upper right side of the page, then select "Attach Files".
When you upload a metadata record, ScienceBase will recognize the format and bring up a popup window to ask if you would like to pull content from the metadata.
Select "Yes" to automatically populate the key fields in the edit form. You may still need to manually edit some of the information. Click "Save" to save your changes.
► To edit your page, click the "Manage Item" dropdown menu on the upper right side of the page, then select "Edit Item".
► To add a child item (subpage nested under the landing page), click the "Add" dropdown menu, then select "Add Child Item".
► If you need to give additional people access to your ScienceBase item while it is private, click the "Manage Item" dropdown menu, then select "Manage Item Permissions". You can then search for ScienceBase user accounts and grant read/write permissions.
To share a private data release with people outside the USGS (e.g., for a journal review), click "Manage Anonymous Access Links" in the "Item Actions" section at the bottom of the page.
You can generate a temporary URL to share with your reviewers, who can view the data release without having to sign up for a ScienceBase account. (Note: the data release will be locked for editing while the link is active.)
7. Format citation
The data release citation should include the following information:
- Each author (last name, first and middle initials)
- Publication type (U.S. Geological Survey data release)
- Digital Object Identifier URL
► ScienceBase can automatically generate citations from the content of uploaded metadata records, but the citation format usually needs to be modified. Please verify that automatically generated citations have the correct format and author order. The citation field can be edited in the first tab of the edit form.
Note: if a data release has child items, the ScienceBase team will propagate the landing page citation to all child items, so only the landing page citation should be edited.
- Cartwright, J.M., 2015, Hydrologic and soil data collected in limestone cedar glades at Stones River National Battlefield, Tennessee: U.S. Geological Survey data release, https://doi.org/10.5066/F7NV9G9C.
- Coates, P.S., Casazza, M.L., Ricca, M.A., Brussee, B.E., Blomberg, E.J., Gustufson, K.B., Overton, C.T., Davis, D.M., Niell, L.E., Espinosa, S.C., Gardner, S.C., and Delehanty, D.J., 2015, Integrating spatially explicit indices of abundance and habitat quality: an applied example for greater sage-grouse management: U.S. Geological Survey data release, https://doi.org/10.5066/F75D8PW8.
- Additional guidance on data citation (USGS data management website)
- Answers to data citation FAQs (USGS FSP website)
If the citation format is not correct, the ScienceBase team will reformat the citation for the authors before making the data release public.
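The citation pattern used in the examples above can be sketched as a small formatting function. This is an illustration only, not the logic ScienceBase uses to generate citations. The function names are hypothetical, and the author-joining rules (plain "and" for two authors, ", and" before the last of three or more) are inferred from the example citations in this guide.

```python
def format_authors(authors):
    """Join author names given as 'Last, F.M.' strings."""
    if not authors:
        raise ValueError("at least one author is required")
    if len(authors) == 1:
        return authors[0]
    if len(authors) == 2:
        return f"{authors[0]} and {authors[1]}"
    return ", ".join(authors[:-1]) + ", and " + authors[-1]


def format_citation(authors, year, title, doi_url):
    """Assemble a data release citation in the style of the examples above."""
    return (
        f"{format_authors(authors)}, {year}, {title}: "
        f"U.S. Geological Survey data release, {doi_url}."
    )


citation = format_citation(
    ["Cartwright, J.M."],
    2015,
    "Hydrologic and soil data collected in limestone cedar glades "
    "at Stones River National Battlefield, Tennessee",
    "https://doi.org/10.5066/F7NV9G9C",
)
print(citation)
```

Comparing the output with the automatically generated citation is a quick way to spot a wrong author order or a missing DOI URL.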
8. Final steps
► When you are ready to make the data release public, please email firstname.lastname@example.org.
- Note: if the data release is associated with a primary publication and you haven't yet provided the publication's DOI or IPDS IP number, please include this information in your email.
- A member of the ScienceBase team will check the data release against the checklist and share any recommendations they have. Please allow up to 2 business days for completion of this step.
- When the data release has been finalized, the ScienceBase team will:
1. Make the data release public. (Public data releases are no longer open for modifications.)
2. Register the DOI so it's an active link.
- Once you have been notified that your data are public, you can use the recommended citation on the landing page to cite your data.
- If you cite the data in a publication, please send the publication's citation to email@example.com so that it can be added to the landing page.
Frequently Asked Questions
Where can I find information about how to create and/or review a metadata record?
See the USGS data management website: https://www.usgs.gov/products/data-and-tools/data-management/describe-metadatadocumentation.
The USGS has two tools for metadata creation: the Metadata Wizard and Online Metadata Editor (OME). In both tools, users fill out a form by answering questions about their data. They can then generate and output XML metadata records in the correct format. The OME is an online application and the Metadata Wizard is a desktop application. The Wizard is recommended for geospatial and tabular data because it has the ability to parse information from certain geospatial and tabular file types, as well as automate the process of describing column (and value) definitions.
The USGS Metadata Parser tool (https://mrdata.usgs.gov/validation/) allows users to validate an XML metadata file against the FGDC CSDGM standard and view it in an easy-to-read format.
How can I grant read/write permissions to USGS and non-USGS users while a data release is still in progress?
- To give permissions to USGS employees and other users with ScienceBase accounts, select the "Manage Item" dropdown menu, then "Manage Item Permissions".
Select "Custom Permissions". Enter a user’s name or email address into the "User" text box. Wait for the autocomplete to find the user's ScienceBase account, then select it and click "Add".
ScienceBase accounts are automatically created for users the first time they log in with their Active Directory credentials. If someone hasn't logged in to ScienceBase before, they won’t yet have an account. Users without Active Directory credentials can request a ScienceBase account if they are collaborating with USGS partners.
If you would like to create a user group in ScienceBase for managing permissions, please contact firstname.lastname@example.org.
- To share a private data release with someone outside the USGS (e.g., for a journal review), click "Manage Anonymous Access Links" in the "Item Actions" section at the bottom of the page.
Select "Create New Anonymous Entry Link". This will create a temporary URL you can share with reviewers, allowing them to view the data release without having to sign up for a ScienceBase account. The data release will be locked for editing while the link is active. To unlock, select "Manage Anonymous Access Links" again and remove the link.
What if I need to revise my data after they have been released?
The USGS Fundamental Science Practices (FSP) website describes procedures for documenting revisions to data releases. Please follow this guidance if you need to correct or add to published data. Contact the ScienceBase team at email@example.com when you are ready to update your data release.
Here are examples of revised data releases in ScienceBase:
- Pinzari, C.A. and Bonaccorso, F.J., 2018, Hawaiian Islands Hawaiian Hoary Bat Genetic Sexing 2009-2018 (ver. 3.0, November 2019): U.S. Geological Survey data release, https://doi.org/10.5066/P9R7L1NS.
- Engott, J.A., 2018, Mean annual water-budget components for the Island of Oahu, Hawaii, for current conditions, 2001-10 rainfall and 2001-10 land cover (ver. 2.0, February 2018): U.S. Geological Survey data release, https://doi.org/10.5066/F72F7KH4.
Will ScienceBase send the XML metadata record(s) from my data release to the USGS Science Data Catalog?
Yes, by default ScienceBase will automatically perform this function for authors. Metadata records on the landing page and all child items will be sent to the USGS Science Data Catalog (SDC) after the data release is made public.
Some science centers and programs have alternate methods of submitting metadata records to the SDC and may not wish for their records to be sent from ScienceBase. This option is also supported; ScienceBase keeps a list of these centers, and XML records associated with their data release products will not be sent from ScienceBase. If you would like to add your center to this list, please contact firstname.lastname@example.org.
Why is CSV format recommended instead of Excel?
Comma-separated values format (.csv) is preferable to Microsoft Excel format (.xlsx) because .csv is often more machine-readable and can be more easily incorporated into other workflows. While both .csv and .xlsx are considered open formats (that is, you don't need proprietary software to view them), .xlsx supports features that can make it less machine-readable. For example, if there are multiple worksheets in an Excel workbook or if some of the information is conveyed through formatting, it would be more difficult to use or work with the data in other applications (e.g., Python, R).
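As a small illustration of that machine-readability, a flat CSV table can be read with nothing more than Python's standard library. The column names and values below are hypothetical.

```python
import csv
import io

# Hypothetical table: one header row, one record per line, no formatting.
csv_text = (
    "site_id,date,water_level_m\n"
    "A1,2015-06-01,1.25\n"
    "A2,2015-06-01,0.98\n"
)

# Each row becomes a dictionary keyed by the header names.
rows = list(csv.DictReader(io.StringIO(csv_text)))
levels = [float(row["water_level_m"]) for row in rows]
mean_level = sum(levels) / len(levels)

print(rows[0]["site_id"], len(rows), round(mean_level, 3))
```

The same table spread across several worksheets, or with meaning encoded in cell colors, would need a spreadsheet-specific library and manual interpretation before it could be used this way.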
What is the file size limit for uploading and downloading files?
Files larger than 1 GB should be uploaded using the ScienceBase Cloud Uploader tool available in the "Item Actions" section at the bottom of a ScienceBase page. While performance may still be dependent on users' local internet connections, files up to ~30 GB in size can be uploaded.
Can I release legacy data in ScienceBase?
Yes, but ScienceBase has a formal process for publicly releasing data, which enables the ScienceBase team to catalog, track, and update these resources in a uniform way. If you would like to release your legacy data in ScienceBase, you will need to go through FSP review and work with the ScienceBase team.
A). My data release is associated with a publication. How will the two reference each other?
B). I don’t have the publication’s citation yet, but I would like to release the data now. Can I add the citation at some point in the future?
A). The citation will be added to the landing page in the "Related External Resources" section (see example). In associated publications, data release citations should be included in the reference section. USGS publications have links to their associated data releases at the top of their landing pages in the USGS Publications Warehouse.
B). Yes, a publication’s citation can be added to a data release at any time, even after it has been made public and the edit permissions have been restricted. If you would like to add a citation to a public data release, please send the citation to email@example.com (or to someone on the ScienceBase team) and we’ll add it to the landing page. If you’ve updated the metadata to include the publication’s citation, please also send the most recent version of the metadata and we’ll replace the metadata in the data release.
Which repository should I use to release code?
The recommended repository for software is USGS GitLab (https://code.usgs.gov), a Git-based platform for software development (additional information). Users can mint a DOI using the USGS DOI Tool to point to the software in GitLab.
If a data release has associated code (e.g., a Python script used to process the data), it can be included as part of the data release in ScienceBase. All code uploaded to ScienceBase must be well-documented.
What repository services does ScienceBase provide for USGS data release products?
ScienceBase supports the following services:
- Providing reliable access to public data release items
- Curating landing page content
- Creating multiple backups of data and metadata
- Calculating checksums to ensure file integrity
- Directing inquiries about the data to the point of contact listed for the data release
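Data authors and users can apply the same idea on their own machines, for example to confirm that a downloaded file matches the copy that was uploaded. The sketch below computes a SHA-256 checksum with Python's standard library. This guide does not state which checksum algorithm ScienceBase uses, so SHA-256 is an assumption, and the function name and sample file are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demonstration with a small stand-in file in a temporary folder.
content = b"site,level_m\nA,1.25\n"
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sites.csv"
    sample.write_bytes(content)
    computed = sha256_of_file(sample)

expected = hashlib.sha256(content).hexdigest()
print(computed == expected)
```

Reading in chunks keeps memory use low, which matters for files approaching the multi-gigabyte sizes discussed in this guide.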
Science centers / data authors are responsible for the following:
- Answering questions about the data
- Correcting any errors discovered in the data
- Records management and data archival responsibilities for internal Bureau purposes (e.g., Scientific Case Files) according to the USGS Records Program. These responsibilities extend beyond public data access requirements for open data. Contact your local Records Management Contact or the USGS Records Management Program at firstname.lastname@example.org for additional information.
- Performing file format migrations or data transcriptions, if necessary
How can I see other data releases from my Science Center or from a particular period of time?
Check out the ScienceBase Data Release Summary Dashboard to see a breakdown of data releases by Mission Area, Region, and Science Center and to filter by time ranges. This dashboard uses ScienceBase's advanced querying capabilities to generate this information. Learn how to create these queries yourself here.