StreamNet compiles two specific levels of data: 1) high-level indicators of fish population health; and 2) fish abundance estimates and abundance indexes at local scales. As a first step, before data are shared with StreamNet, they undergo rigorous review at the source agency (StreamNet Data-Contributing Partners' QA/QC Procedures). All compiled data then pass a series of rigorous QA/QC data validation steps, include supporting metadata and, where applicable, link to reference documents at the StreamNet Library.
Data Exchange Standard
Fish abundance and abundance index data, including reference documents, in the StreamNet database must conform to StreamNet’s Data Exchange Standard (DES). That document precisely defines the data elements, their organization in tables, and required formats. The DES serves as the common denominator for the specific data types contained in the database.
Adherence to the DES assures that data can be loaded into the database, can be queried accurately, and are equivalent for further analysis by users. Additions or changes to the DES are made following a formal documented procedure.
Conversion of source data to the DES is the responsibility of the project’s data stewards at the StreamNet partners (agencies and tribes) that supply data to the database. QA procedures are applied by the agency data stewards, and automated validation routines are run when the data are submitted to the central database at PSMFC. Any validation failure results in rejection of the data record, and the reasons for rejection are reported. Errors are conveyed back to the original data collectors, creating a feedback mechanism that promotes data quality improvements all the way to the projects that collect the field data.
An essentially identical process is used for the high-level indicator data (the "Coordinated Assessments" population-scale data).
Automated Data Validation
This is the current list of Data Validation Rules.
We use an automated data validation and loading system that provides real-time feedback on whether data validation succeeded or failed. Data are submitted one record at a time, and multiple validation checks are run on each record at three levels. First, each field has its own set of rules. Examples include ensuring numeric fields do not contain text, ensuring codes fall within the group of allowable values, and ensuring text strings are within acceptable length ranges. The second level of validation ensures that values in the various fields within a data record are compatible. For example, a record submitted for spring-run coho salmon is rejected because there is no spring run of that species. The third level of validation looks for data problems between rows of data within a table, which serves to prevent duplicate data. A useful feature of the automated validation routines is that data may be run against the validation rules, and an error report obtained, without actually submitting any data for inclusion in the database. This feature allows data providers to check entire sets of data, fix all errors, and then submit a complete data set once it is known that all records will pass validation. The interface used for data submission allows for adding new records, changing existing records, and deleting existing records.
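The three validation levels and the check-without-submitting feature can be illustrated with a minimal Python sketch. This is not StreamNet's actual implementation: the field names (`species`, `run`, `site`, `year`, `count`), the allowable-value table, the length limit, and the duplicate key are all hypothetical stand-ins for what the DES and the Data Validation Rules define.

```python
# Illustrative placeholder values only; the real code lists live in the DES.
ALLOWED_RUNS = {"coho": {"fall"}, "chinook": {"spring", "fall"}}
MAX_SITE_LEN = 50  # hypothetical text-length limit


def field_errors(rec):
    """Level 1: each field is checked against its own rules."""
    errors = []
    count = rec.get("count")
    if isinstance(count, bool) or not isinstance(count, (int, float)):
        errors.append("count must be numeric")
    if rec.get("species") not in ALLOWED_RUNS:
        errors.append("species code not allowed")
    site = rec.get("site", "")
    if not isinstance(site, str) or not (1 <= len(site) <= MAX_SITE_LEN):
        errors.append("site length out of range")
    return errors


def record_errors(rec):
    """Level 2: fields within one record must be compatible."""
    runs = ALLOWED_RUNS.get(rec.get("species"))
    if runs is not None and rec.get("run") not in runs:
        return [f"no {rec.get('run')} run for {rec.get('species')}"]
    return []


def key_of(rec):
    """Hypothetical natural key used to detect duplicate rows."""
    return (rec.get("species"), rec.get("run"), rec.get("site"), rec.get("year"))


def validate(records, table):
    """Dry run: return {record index: [reasons]} without changing `table`."""
    seen = {key_of(r) for r in table}
    report = {}
    for i, rec in enumerate(records):
        errs = field_errors(rec) + record_errors(rec)
        if key_of(rec) in seen:  # Level 3: problems between rows
            errs.append("duplicate record")
        if errs:
            report[i] = errs
        else:
            seen.add(key_of(rec))
    return report


def submit(records, table):
    """Load only records that pass; rejected records are reported with reasons."""
    report = validate(records, table)
    table.extend(r for i, r in enumerate(records) if i not in report)
    return report
```

A data provider would call `validate` repeatedly until the report is empty, then call `submit` once with the cleaned data set.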
Enhancements at the Central Database
We track the date and time at which records are created and updated. Stored procedures automate the creation of data in various internal tables that enhance the filtering and rapid retrieval of data. When a request is made to delete a record, the record is moved to a separate archive table rather than being deleted from the central StreamNet database. Thus the central database functions as a backup for all data that have been submitted and can be used to recover data in the unlikely event that data are irretrievably lost at the submitting partner. Georeferencing is developed and maintained, allowing the integrated query system to find local-scale data by HUC, NPCC subbasin, or state/county. We maintain a custom set of web services that allow the Northwest Power and Conservation Council to automatically retrieve specific sets of detailed time series data and StreamNet Library reference documents for use in their dashboards and other web pages.
The procedures described above ensure that the form of the data is correct. It is still possible for values to be entered incorrectly, such as through keystroke errors. We depend on our users to inform us when a suspect value is encountered. In these cases we investigate, and any confirmed error is corrected all the way back to the originating biologist. Corrected data are resubmitted to the central database by the StreamNet partner organization.
A summary of StreamNet use statistics, including responses for assistance, is available here: ISRP StreamNet User Information 4/11/2019