Flow Data Flow
Overview
Flow Data Flow will simplify the process of cleaning and sharing time series data, considerably reducing the time between data capture and data analysis. The project began with a river and wetlands focus, but its scope has been broadened in response to interest from researchers in other disciplines. The technology delivered will be immediately applicable to most disciplines that collect time series data.

Background
Environmental and agricultural research often involves collecting and analysing time series sensor data. Typical measured variables include river flow, soil moisture, soil and air temperature, radiation, 3D wind speed and direction, CO2 and water vapour levels, weather and tree water use data, greenhouse gas emissions from soils, and rain gauge readings.
In a given study there are usually many sensors, distributed across the area being studied. Over the course of such a study, sensors are regularly checked and calibrated - resulting in data artefacts. That is, the act of checking or calibrating the sensor affects the data being collected. Other artefacts are caused by, for example, tidal flows leaving sensors temporarily above the water-line. As such, the raw data from a study comprises multiple streams of noisy time-series data. Such noise is typical of time-series sensor data, and a similar story can be told for sensors from the domains of other collaborators in this project.
Currently, data cleaning is carried out using tools such as Excel. Increasing volumes of data present difficulties, as this approach does not scale well. There is a strong desire to make the cleaning and synchronising of the data more streamlined and interactive.
A recent project took 4 years and over $1M to capture approximately 6GB of time series data. In the last 6 months, only approximately 5% of the data has been analysed.
What will this project deliver?
Once complete, the Flow Data Flow project will significantly reduce the time taken to clean data.
Project Deliverables
The project has the following deliverables:
- definition of a time-series data format, suitable for storage in a repository;
- software to graph time series data and allow sections to be selected for removal (interactive cleaning), and to display data that is either out of sequence or has badly formed date stamps, allowing that data to be edited or purged;
- software to simultaneously display graphs of multiple time-series, allowing them to be interactively aligned;
- a repository for collecting and sharing time-series data, annotated so that time-series can be re-used across multiple studies.
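As an illustration of the date-stamp checks described in the deliverables above, the following is a minimal Python sketch and not the project's software. It assumes, purely for the example, that readings arrive as (timestamp text, value) pairs and that well-formed stamps look like "YYYY-MM-DD HH:MM:SS". The function name check_timestamps and the sample readings are hypothetical.

```python
from datetime import datetime

# Assumed timestamp layout for this sketch; the project's real format is not specified here.
STAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def check_timestamps(rows):
    """Split (timestamp_text, value) rows into clean rows and flagged problems.

    A row is flagged "malformed" when its stamp does not parse in the assumed
    format, and "out_of_sequence" when it is not later than the last accepted
    reading. Flagged rows are reported rather than silently dropped, so a
    person can decide whether to edit or purge them.
    """
    clean, problems = [], []
    last = None
    for index, (stamp, value) in enumerate(rows):
        try:
            when = datetime.strptime(stamp, STAMP_FORMAT)
        except ValueError:
            problems.append((index, stamp, "malformed"))
            continue
        if last is not None and when <= last:
            problems.append((index, stamp, "out_of_sequence"))
            continue
        clean.append((when, value))
        last = when
    return clean, problems


# Hypothetical river-level readings, invented for the example.
rows = [
    ("2010-03-01 00:00:00", 1.20),
    ("2010-03-01 00:15:00", 1.22),
    ("2010-03-01 00:10:00", 1.21),  # earlier than the previous reading
    ("01/03/2010 0:45", 1.25),      # not in the assumed format
    ("2010-03-01 01:00:00", 1.27),
]
clean, problems = check_timestamps(rows)
for index, stamp, reason in problems:
    print(index, stamp, reason)
```

Reporting the row index alongside the reason is one possible design choice: it would let an interactive tool highlight the offending rows on a graph instead of deleting them automatically.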
Possible future work
This project could be extended so that the tools suggest which data is likely to need cleaning, helping users clean data even faster.
There are many other avenues for extension, relating to:
- automatic feature detection and correction of noise and artefacts;
- direct interface with hardware to automate metadata annotation;
- feature detection in the data and rudimentary scientific analysis.
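One simple way such automatic detection might work is to compare each reading with the median of its neighbours and flag large departures for review. The Python sketch below is an assumption for illustration only; the project does not specify a detection method. The function name flag_suspect, the window size, the threshold and the sample values are all hypothetical.

```python
from statistics import median


def flag_suspect(values, window=5, threshold=0.5):
    """Return indices of readings that differ from the local median by more than threshold.

    The local median is taken over up to `window` readings centred on each
    point (fewer at the ends of the series). Flagged points are suggestions
    for a person to review, not automatic deletions.
    """
    half = window // 2
    flagged = []
    for i, value in enumerate(values):
        start = max(0, i - half)
        stop = min(len(values), i + half + 1)
        if abs(value - median(values[start:stop])) > threshold:
            flagged.append(i)
    return flagged


# Hypothetical water-level series in which two readings drop to zero,
# as might happen when a sensor is temporarily above the water-line.
levels = [1.20, 1.21, 1.22, 0.00, 0.00, 1.24, 1.25, 1.26, 1.27]
suspect = flag_suspect(levels)
print(suspect)
```

A median is used here rather than a mean because a short run of bad readings shifts it less. The window must be wider than twice the longest artefact expected, otherwise the bad readings themselves become the local median and go unflagged.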
Benefits
River and wetland management has great social and political currency. Research has the potential to: enhance biodiversity; reduce erosion; promote river health; measure the effects of irrigation; and secure potable water supplies.
The research community will benefit from the Flow Data Flow toolkit in the following ways:
- the quality of analysis will increase;
- the time between data capture and analysis will decrease;
- the cost of data cleaning will decrease; and
- the capacity to share data will be enhanced, by providing a shared repository.
Researchers from diverse faculties at many institutions have indicated that they will support the project. The project already involves six Intersect members: the University of Sydney, UNSW, Southern Cross University, the University of Newcastle, the University of New England and Macquarie University.


