
RF

Corsello Research Foundation

Information Workflow
Information Lifecycle Example

Overview

Stages of Information
All data must pass through several stages in its lifecycle
  Creation or Collection
  Processing or Review (QA/QC)
  Use and Re-use
  Disposal

The main stage is use and re-use, which may result in data creation
  Analysis results are new data creations
  Intermediate data may be directly disposed
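
As a rough illustration, the stages above could be modeled as an enumeration with a table of allowed moves. The transition table is an assumption read from the bullets (for example, that intermediate data may go straight from creation to disposal); the names `Stage`, `ALLOWED` and `advance` are hypothetical.

```python
from enum import Enum


class Stage(Enum):
    CREATION = "creation or collection"
    PROCESSING = "processing or review (QA/QC)"
    USE = "use and re-use"
    DISPOSAL = "disposal"


# Allowed moves between stages. This table is an assumption, not part of the slides.
ALLOWED = {
    Stage.CREATION: {Stage.PROCESSING, Stage.DISPOSAL},
    Stage.PROCESSING: {Stage.USE, Stage.DISPOSAL},
    Stage.USE: {Stage.USE, Stage.DISPOSAL},
    Stage.DISPOSAL: set(),
}


def advance(current: Stage, target: Stage) -> Stage:
    """Return the target stage, or raise if the move is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.name} to {target.name}")
    return target
```

Data produced during use (such as analysis results) would start again at `Stage.CREATION` as a new dataset rather than moving the original dataset backwards.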

Information Stores
Data is always stored in some place and format
A data store indicates the place in which data is stored
  A relational database (e.g. Oracle, SQL Server) is a type of store
  A network share is another type of store

A data format indicates the internal structure or encoding of data within the store
  A pdf file is a type of format
  A table in a database defines its own format

Information Formats
Once data is within a specific format, that format will govern how it may be used
  Images (e.g. jpg) can be displayed, but data within them (e.g. text) is lost
  Documents (e.g. pdf) can be read and indexed for searching, but numeric data within them (e.g. tables) is lost for use
  Databases allow for data to be transformed into other formats as needed
    Unless the database contains pre-formatted content (e.g. pdf file in Oracle)

Data format is critical for data exchange and understanding within computer programs


Information Flows
Each of the stages of information will tie to a different human workflow

Actual human workflow should be based upon human need


Technical needs to support next-stage data use are secondary
There are no easy buttons; every workflow will take some support


Work Flows
Each work area or topic (e.g. water quality, fish counts) will require its own work flow governing data management
Several work areas may end up using the same work flow, but that should be circumstantial rather than planned

Each information phase will have a sub-workflow for a given topic
  Water quality will have a master workflow with a separate sub-flow for:
    Collection
    QA/QC (includes loading to databases)
    Use and analysis (getting data out of databases)
    Disposal (mostly rules on how long to keep data in the database)

Work Flows


Workflows
Introduction

Planning
For any new project, planning must occur to determine what is to be collected
  For each dataset to be collected, there must be a data standard produced for handling that type of data
  Data standards should be common across all projects for a given data type
  Data stores may need to be created to support each data type


Planning Phase


Standardizing
If data standardization is needed, the process involves several aspects:
  Identify existing standards
    US Federal / US DoD
    Industry / International
  Identify existing formats
    COTS tool formats (e.g. Microsoft Word)
    Non-COTS tool formats (e.g. DSS)
  Model data
  Evaluate existing related data

Resulting standards and model become the norm for the organization
  Should be considered a mostly one-time cost

Standardization Phase


Creation / Collection
Data gets created in several ways:
  Field collection
  Real-time telemetry (e.g. SCADA)
  Analysis results
  Report generation

Each form of data creation may need a workflow
Field collection is of primary concern due to two main factors:
  Human involvement and potential for mistake / blunder
  Time component (data re-collected is time shifted)


Creation Phase


Processing / QA/QC
Once created, most data must be evaluated for quality and correctness
  If data is not acceptable, there must be a rejection capability

Accepted data is processed, transformed and loaded into the final information store(s)
  This may be a manual or automated process
  COTS tools may be ideal for this (e.g. Aquarius for water quality)

Each domain of data will be treated differently
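
A minimal sketch of the accept / reject / load steps described above, assuming a simple range check stands in for QA/QC. The `Reading` type, the `LIMITS` table and its values are hypothetical; real limits would come from the data standard for each domain, and a COTS tool may replace this step entirely.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    parameter: str   # e.g. "pH"
    value: float


# Hypothetical acceptance ranges, used only for illustration.
LIMITS = {"pH": (0.0, 14.0), "temperature_c": (-5.0, 50.0)}


def review(readings):
    """Split readings into (accepted, rejected) using the range table."""
    accepted, rejected = [], []
    for r in readings:
        # Parameters without a configured range are accepted as-is (an assumption).
        low, high = LIMITS.get(r.parameter, (float("-inf"), float("inf")))
        (accepted if low <= r.value <= high else rejected).append(r)
    return accepted, rejected


def load(accepted, store):
    """Append accepted readings to the final store (a plain list here)."""
    store.extend(accepted)
    return store
```

Rejected readings are returned rather than discarded, so a person can decide whether to correct, re-collect or dispose of them.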


Processing Phase


Use and Analysis


Final data is used in various ways for simple display and for generating additional value-added data
Each form of use that results in the creation of a data product is a data use
  Analysis (model runs)
  Reports (synthesized from human review of data)

Results are then treated as newly created data back in the creation phase


Use Phase


Use and Reuse Cycle


Output of analysis is input to the creation phase
Forms a closed-loop cycle

Relations exist
  Source - Output
  Source - Source
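
One possible way to keep track of the source-to-output relation is a small lineage map, sketched below. Only the Source - Output relation is modeled; Source - Source relations are left out. The function names and dataset names are hypothetical.

```python
from collections import defaultdict

# derived_from[output] = set of source datasets used to produce it
derived_from = defaultdict(set)


def record_use(sources, output):
    """Record that `output` was created from the given source datasets."""
    derived_from[output].update(sources)


def lineage(dataset):
    """Return every upstream dataset, following source-output links transitively."""
    seen = set()
    stack = [dataset]
    while stack:
        for src in derived_from.get(stack.pop(), ()):
            if src not in seen:
                seen.add(src)
                stack.append(src)
    return seen
```

Because each output re-enters the creation phase as new data, it can itself appear as a source in a later `record_use` call, which is what closes the loop.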


Implementation
Implementing a data strategy is an ongoing process
These cycles will be developed in concert with the data producers and users
Tools will be bought / built as needed to facilitate effective information management
There will be several implementation efforts that will span projects

Sites
Concepts

Overview
All field data is collected at a geographic location
  If a given location is well-known and used repeatedly, the management of that location provides value
A site is a name that represents a location where sampling may take place
  All data collected at a specific site can be related back to the site at which it was collected
  Querying the site will yield the data collected


Location
While sites are intuitively a spatial location, locations do not necessarily need to be stored for the site to be useful
  If, however, the site location is stored (e.g. GIS point)
    Querying by location will yield all sites in that location
    Querying by basin (basin stored spatially) will return all sites within that basin

In addition to the spatial nature of the site itself, a site boundary can be stored indicating the uncertainty of collections
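
The basin query can be illustrated with a plain point-in-polygon test. This is a sketch only: a real system would likely rely on a GIS or spatial database, and the site names, coordinates and function names below are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def sites_in_basin(sites, basin):
    """Return names of sites whose stored point falls inside the basin polygon."""
    return [name for name, pt in sites.items()
            if pt is not None and point_in_polygon(pt[0], pt[1], basin)]


basin = [(0, 0), (10, 0), (10, 10), (0, 10)]
sites = {"Upper Dam": (2, 3), "Outlet": (12, 5), "Unmapped": None}
print(sites_in_basin(sites, basin))
```

Sites without a stored location (such as "Unmapped" here) remain valid sites; they simply cannot be found by spatial queries.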


Site
A site will be defined as a named place where some form of collection or sampling may be performed
A site may have a spatial location (GIS shape) associated with it
  Support for points, lines (transect) and areas (netting area)
  A second spatial location is allowed (area only) for sampling approximation

Sampling events are associated with sites
  One site will support any number of events
  Multiple types of events (e.g. water quality) may occur at a single site
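
The site definition above might translate into a data structure along these lines. It is a sketch under assumptions: the class and field names (`Shape`, `approx_area`, and so on) are hypothetical and not taken from an actual data model.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Coord = Tuple[float, float]


@dataclass
class Shape:
    kind: str              # "point", "line" (transect) or "area"
    coords: List[Coord]


@dataclass
class SamplingEvent:
    event_type: str        # e.g. "water quality"


@dataclass
class Site:
    name: str
    location: Optional[Shape] = None       # optional GIS shape
    approx_area: Optional[Shape] = None    # second shape, area only
    events: List[SamplingEvent] = field(default_factory=list)

    def __post_init__(self):
        if self.location and self.location.kind not in ("point", "line", "area"):
            raise ValueError("location must be a point, line or area")
        if self.approx_area and self.approx_area.kind != "area":
            raise ValueError("the second spatial location must be an area")
```

Keeping `location` optional reflects that a site is useful as a name even when no geometry is stored.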

Sampling Events
Any activity of collecting data is a sampling event
A sampling event that occurs at a defined site may be entered and associated with that site

The organization that performs the sampling is associated with the event (e.g. contractor company)
The project that the sampling is being conducted for (paying) is associated with the event


Projects
Any organized work effort may be a project
All formal work projects are projects
Projects can be nested (sub-projects)

There are two classes of project
  Project, an official work project
  Work Effort, finer-grained effort within a project (task, SOW, etc)

Work efforts can be nested as can projects
  Work efforts can be under a project or stand-alone
  Projects cannot be under work efforts
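
The nesting rules above can be expressed as a small validation sketch. The class names mirror the two classes of project; the `parent` field, the `path` helper and the example names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Project:
    name: str
    parent: Optional["Project"] = None   # projects nest only under projects

    def __post_init__(self):
        # Rule from the slide: projects cannot be under work efforts.
        if isinstance(self.parent, WorkEffort):
            raise ValueError("a project cannot sit under a work effort")


@dataclass
class WorkEffort:
    name: str
    # A work effort may sit under a project, under another work effort,
    # or stand alone (parent is None).
    parent: Union[Project, "WorkEffort", None] = None


def path(node):
    """Names from the top-level ancestor down to this node."""
    names = []
    while node is not None:
        names.append(node.name)
        node = node.parent
    return names[::-1]
```

For example, a work effort "Task 1" under sub-project "Phase 1" of project "Basin Study" yields the path `["Basin Study", "Phase 1", "Task 1"]`.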


Organizations
An organization is a group of people working toward a common goal
Any named group is an organization

Just a formalization for tracking and grouping

Organizations will be managed to track project teams (external agencies) and personnel alignments


Contactable Party
Organizations and people can be contacted, and therefore have contact information (email, phone, address)
A contactable party will be defined as any of the below:
  A person
  An organization
  A point of contact
    A job role within an organization which may be filled by a person
  Some other external thing that has contact information


Point of Contact
A point of contact is a simple abstraction of a job or position
  Allows for a front-desk type of entity that is intermittently filled by various people
Each project has a default point of contact
  This allows the actual person filling the role to change more easily
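
A sketch of how contactable parties and points of contact might fit together. All class and field names are hypothetical, and the fallback to the organization's contact details when a role is vacant is an assumption, not something the slides state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ContactInfo:
    email: str = ""
    phone: str = ""
    address: str = ""


@dataclass
class Person:
    name: str
    contact: ContactInfo


@dataclass
class Organization:
    name: str
    contact: ContactInfo


@dataclass
class PointOfContact:
    role: str                           # e.g. "front desk"
    organization: Organization
    filled_by: Optional[Person] = None  # the person currently in the role

    @property
    def contact(self) -> ContactInfo:
        # Assumed behavior: use the organization's details when the role is vacant.
        return self.filled_by.contact if self.filled_by else self.organization.contact
```

A project that references the `PointOfContact` rather than a `Person` does not need to change when the role is handed to someone else; only `filled_by` is updated.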


Data Catalog
There is a current effort to build a card catalog for data within the district
The previous slides provide data elements that will be used in the data catalog and as a mechanism for mining all data across the organization
The data catalog will become the inventory of data with links to the actual data cataloged


Current Model


Development
The data catalog concept is still notional at this time

A data model for each of the items in the previous slide is being developed
  Once modeled, these data elements may be collected, even without a tool in place for the data
Implementation of the tools will be based upon a prioritization
  Need for capability
  Cost to develop
  Time to develop
  Dependency on other capability


Water Quality
Workflows

Overview
Water quality data is commonly collected across many projects
Collections are commonly performed by contractors
Collections use several types of collection methods
  Fixed telemetry
  Fixed time series (e.g. continual hydrolab)
  Grab series (e.g. one-time hydrolab)
  Grab instantaneous (e.g. handheld probe)

Data elements collected vary by collection
  Temperature, TDG, DO, pH, Color, Depth, etc.


Collections
All collections are performed at some form of site
Instantaneous grab samples may not have well-known sites, but are still sampling events

Sampling events may be continuous such as telemetry and fixed time series
Multi-level samplings occur at a single site (sites have no Z axis)

Sampling events may be scheduled


Create the event, then later add the data
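
The "create the event, then later add the data" pattern could look like the sketch below. The field names are hypothetical; storing depth on each reading is one assumed way to handle multi-level sampling given that sites have no Z axis.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class SamplingEvent:
    method: str                        # e.g. "grab instantaneous"
    scheduled_for: datetime
    site_name: Optional[str] = None    # grab samples may lack a well-known site
    # Each reading is (parameter, value, depth in metres).
    readings: List[Tuple[str, float, float]] = field(default_factory=list)

    def add_reading(self, parameter: str, value: float, depth_m: float = 0.0):
        # Depth is kept on the reading because sites have no Z axis.
        self.readings.append((parameter, value, depth_m))

    @property
    def has_data(self) -> bool:
        return bool(self.readings)


# Schedule the event first, then add data once the collection has happened.
event = SamplingEvent("grab series", datetime(2024, 6, 1, 9, 0), site_name="Site A")
event.add_reading("temperature_c", 14.2, depth_m=1.0)
event.add_reading("temperature_c", 11.8, depth_m=5.0)
```

An event with `has_data` still false after its scheduled time is a natural candidate for follow-up with the collecting organization.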

Workflow
The water quality workflow will incorporate several aspects
  Many forms of field collection activities
  Many forms of data submission (telemetry)
  QA/QC processes for evaluation
  Aquarius tool integrated into data process
  Multiple database insertions
    Aquarius database
    CWMS database
    Others?

A partial flow for field collection activities (non-telemetry) has been developed


Flowchart
Currently Notional


Questions

What other data areas should be considered?

What other projects should be addressed?


How should external collection activities be managed?
