Innovation and Partnerships Office

Efficient Data Reduction Method with Locally Exchangeable Measures IB-2013-133

APPLICATIONS OF TECHNOLOGY:

  • Measurement collection mechanisms for network communications and routers
  • System logs
  • Statistical analysis, e.g., financial markets, energy use, social network media
  • Modeling, e.g., environmental studies, nuclear fusion simulations
  • Science and engineering experiments

ADVANTAGES:

  • Efficient data size reduction – 47-80% in tests, with much higher potential
  • Retention of data accuracy
  • Effective on streaming or stored (offline) data

ABSTRACT

Berkeley Lab researcher Alexander Sim and colleagues have developed a dynamic sampling algorithm that reduces the size of large streaming data while still providing accurate information about the data for analysis. The Berkeley Lab technology could prove beneficial to network routers, for use in network monitoring mechanisms; facilities that generate large amounts of data, as a means to reduce data volume; and social networks, among other applications.

Large streaming data are an essential part of computational modeling and network communications. Yet such data are generally impractical to store, process, search, and retrieve in full. This dynamic data reduction algorithm detects redundant patterns and reduces data size by exploiting the exchangeability of measurements: it takes advantage of redundancies both within a time series and in the data distribution. The Berkeley Lab technology can be used for high-frequency streaming data as well as stored data.
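As a rough illustration of what exploiting exchangeability can mean, the sketch below treats the measurements inside a fixed-size window as order-independent, summarizes each window by its histogram, and stores only a repeat count when a window's distribution is close to the previously stored one. This is a toy example under stated assumptions, not the Berkeley Lab algorithm: the function names, the window size, the tolerance, and the use of total variation distance are all hypothetical choices made for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def total_variation(a: Counter, b: Counter) -> float:
    """Total variation distance between two empirical distributions."""
    na, nb = sum(a.values()), sum(b.values())
    keys = set(a) | set(b)
    return 0.5 * sum(abs(a[k] / na - b[k] / nb) for k in keys)


def reduce_stream(
    values: Iterable[int], window: int = 4, tol: float = 0.1
) -> List[Tuple[Counter, int]]:
    """Summarize a stream as (histogram, repeat_count) pairs.

    Assumption (illustrative only): values within a window are treated as
    exchangeable, so their order is discarded and only counts are kept.
    """
    stored: List[Tuple[Counter, int]] = []
    buf: List[int] = []
    for v in values:
        buf.append(v)
        if len(buf) == window:
            hist = Counter(buf)
            buf = []
            # If this window looks like the last stored one, bump its count
            # instead of storing a new histogram.
            if stored and total_variation(stored[-1][0], hist) <= tol:
                stored[-1] = (stored[-1][0], stored[-1][1] + 1)
            else:
                stored.append((hist, 1))
    if buf:  # keep any trailing partial window as its own entry
        stored.append((Counter(buf), 1))
    return stored


# Three windows with the same distribution, then a clearly different one.
stream = [1, 2, 2, 3, 3, 2, 1, 2, 2, 2, 1, 3, 9, 9, 9, 9]
summary = reduce_stream(stream)
```

In this toy run the sixteen input values collapse to two stored histograms, because the first three windows share one distribution. The reduction achieved on real data would depend entirely on how redundant the stream is and on the chosen window and tolerance.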

A common technique in network monitoring and other practices for reducing the size of collected monitoring measurements is to store a random sample, such as one out of 1,000 network packets. The drawbacks of this approach are a lack of scalability for high-frequency streaming data and no guarantee that the sample reflects the underlying data distribution. Another method is to use an exact or approximate data compression technique, such as spectral analysis. However, current data compression methods require either the whole data set or data chunks of a designated size, which makes them impractical for large, high-frequency streaming data. Berkeley Lab’s algorithm addresses the drawbacks of both approaches.
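The baseline random-sampling approach described above can be sketched in a few lines. This is a minimal illustration that assumes each packet is kept independently with probability 1/n; the function name, the seed parameter, and the synthetic packet labels are hypothetical and are not taken from any particular monitoring tool.

```python
import random
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def sample_one_in_n(
    items: Iterable[T], n: int = 1000, seed: Optional[int] = None
) -> Iterator[T]:
    """Yield each item independently with probability 1/n."""
    rng = random.Random(seed)
    for item in items:
        if rng.randrange(n) == 0:
            yield item


# Synthetic stream: mostly "normal" packets plus a handful of "rare" ones.
packets = ["normal"] * 9990 + ["rare"] * 10
sampled = list(sample_one_in_n(packets, n=1000, seed=0))
rare_kept = sampled.count("rare")
```

With only about ten packets expected in the sample, the ten "rare" packets will usually not appear at all, which illustrates why a fixed-rate random sample gives no guarantee of reflecting the underlying distribution.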

DEVELOPMENT STAGE: Proven principle. Data reduction between 47% and 80% demonstrated in experiments. Potential for exponential-scale data reductions for streaming data. Development toward a prototype is ongoing.

STATUS: Patent pending. Available for licensing or collaborative research.

FOR MORE INFORMATION:
Choi, J., Hu, K., and Sim, A. Relational Dynamic Bayesian Networks with Locally Exchangeable Measures. Computational Research Division, Lawrence Berkeley National Laboratory.

REFERENCE NUMBER: IB-2013-133