Welcome Note

Welcome, readers. I would like to share my knowledge of the fascinating field of system integration: the integration of ERP systems with Planning & Scheduling, Manufacturing Execution Systems (MES) and shop-floor control systems.

Wednesday, June 30, 2010

What is the purpose of data compression algorithms in a process historian?

Author - Roger Palmen
IT Consultant - MES at Logica

The advantages are clear: a data historian can process immense quantities of data. Any good historian can handle thousands of data points at sampling rates of up to once per millisecond. When you do the maths, sampling once every millisecond amounts to 31.5 billion samples per year. At a compact 2 bytes per sample, that is roughly 63 GB (58.7 GiB) per year for a single data point. A small server will have 1,000 points, but I have seen systems running 60,000 to 70,000 points. You would spend your days adding disk arrays. And storing is one thing, but how do you access these huge amounts of data? For reference: the most-used historian (OSIsoft PI Server) requires a 'simple' quad-core, 8 GB memory, 2.2 TB drive-space server to power a 1-million-point system capturing one year of data. And that server will cost you much less than $10,000 in hardware and OS.

So that brings us to the disadvantages. As far as I'm concerned, there aren't any... Why? Because you can throw data away without losing any of the INFORMATION contained in that data.

Let's go to some practical examples. Compression algorithms generally compress on the amplitude and the frequency of the points that need to be stored. Look at amplitude first. Say you have a temperature gauge in your process that effectively measures at one-degree accuracy. That gauge could be connected to a system that indicates the temperature using three decimal digits. If we then throw away all differences smaller than 0.5 degrees, we lose a lot of data but no information, right? (A small sketch of this idea follows below.)

The same applies to frequency. Look at the fuel gauge in your car. If you hit the brakes, the fuel sloshes through the tank and makes the reading swing up and down wildly. But is that really relevant? Looking at the general trend, it should go down only a little every minute (except when refuelling, of course). So there is no need to capture all the details, because you are only interested in the general trend. (The second sketch below shows this.) The theory behind this is the Nyquist–Shannon sampling theorem; take a look at Wikipedia for the details. In the real world there are a few rules of thumb you can use to define the compression settings for each point. Using those, you can easily reduce the data volume by 90% or more without losing any information.

To summarize:
1) Any substantial system cannot work well or cost-effectively without compression algorithms.
2) When set up right, there are no theoretical drawbacks to using compression algorithms.
3) One exception: if you don't know what is relevant in your data, don't use compression. But then you are looking at research applications, where you do not know in advance what is relevant and what is not.
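To make the amplitude example concrete, here is a minimal sketch of a deadband filter in Python. The signal shape, the noise level and the 0.5-degree deadband are assumptions taken from the temperature-gauge example above, not from any particular historian product:

```python
import random

def deadband_compress(samples, deadband):
    """Keep a sample only when it differs from the last *stored*
    value by more than the deadband; drop everything else."""
    stored = []
    for t, value in samples:
        if not stored or abs(value - stored[-1][1]) > deadband:
            stored.append((t, value))
    return stored

# Simulate the temperature gauge: a slow drift plus instrument
# noise well below the gauge's one-degree effective accuracy.
random.seed(42)
raw = [(t, 20.0 + t / 600.0 + random.uniform(-0.2, 0.2))
       for t in range(3600)]  # one reading per second for an hour

kept = deadband_compress(raw, deadband=0.5)
print(f"raw samples:    {len(raw)}")
print(f"stored samples: {len(kept)}")
print(f"reduction:      {100 * (1 - len(kept) / len(raw)):.1f}%")
```

On a slowly drifting signal like this, the filter stores only a small fraction of the raw samples, which is exactly the 90%-plus reduction mentioned above: every discarded sample was within the gauge's accuracy anyway, so no information is lost.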
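For the frequency/trend side, several historians (OSIsoft PI among them) use a 'swinging door' style of compression: a point is archived only when a straight line from the last archived point can no longer represent every sample in between within a given deviation. This is a simplified sketch of that idea, assuming timestamped samples and a fixed deviation, not the exact algorithm of any product:

```python
import math

def swinging_door(samples, dev):
    """Archive a point only when no straight line from the last
    archived point can cover all samples since then to within
    +/- dev (a simplified swinging-door sketch)."""
    archived = [samples[0]]
    slope_hi, slope_lo = float("inf"), float("-inf")
    prev = samples[0]
    for t, v in samples[1:]:
        t0, v0 = archived[-1]
        # The "doors" (admissible slope range) narrow as points arrive.
        slope_hi = min(slope_hi, (v + dev - v0) / (t - t0))
        slope_lo = max(slope_lo, (v - dev - v0) / (t - t0))
        if slope_lo > slope_hi:
            # Doors have closed: archive the previous point and
            # re-open the doors from there to the current point.
            archived.append(prev)
            t0, v0 = prev
            slope_hi = (v + dev - v0) / (t - t0)
            slope_lo = (v - dev - v0) / (t - t0)
        prev = (t, v)
    archived.append(prev)  # always keep the last sample
    return archived

# Fuel-gauge demo: a slow downward trend with sloshing on top.
# A deviation larger than the slosh amplitude lets the filter
# ignore the slosh and track only the underlying trend.
raw = [(t, 50.0 - t / 120.0 + 2.0 * math.sin(t * 1.3))
       for t in range(600)]
kept = swinging_door(raw, dev=4.5)
print(f"{len(raw)} raw samples -> {len(kept)} archived")
```

Reading the data back is then just linear interpolation between archived points, which by construction stays within the chosen deviation of every original sample.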
