Monday, March 27, 2017

Data, Data & More Data

Lately, I've been reading about the increasing volume of data generated in the world.

Twenty-five years ago in the early nineties, we were talking about Bytes (8 single binary digits) and Kilobytes (1,024 Bytes) of data and dial-up modems with speeds measured in a few KB per second. Over the intervening years we've moved through Megabytes (1,024 KB) and now, Gigabytes (1,024 MB), with Terabytes (1,024 GB) sometimes used in home computing. The next measurements are Petabytes (1,024 TB) and Exabytes (1,024 PB).

Not used every day, but already named are the four next steps: Zettabyte, Yottabyte and, informally, Brontobyte and Geopbyte. It's hard for me to imagine volumes of data that large. One Yottabyte is 1 followed by twenty-four zeros and a Geopbyte is 1 followed by thirty zeros worth of bytes.
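For anyone who wants to count the zeros, here is a rough sketch in Python. The list of names and the `unit_sizes` helper are my own illustration; it shows each unit both as a power of 1,024 (the convention used above) and as a plain power of ten.

```python
# Unit names in order. In the binary convention each step is 1,024 times
# the previous one; in the decimal convention each step adds three zeros.
UNITS = ["Kilobyte", "Megabyte", "Gigabyte", "Terabyte", "Petabyte",
         "Exabyte", "Zettabyte", "Yottabyte", "Brontobyte", "Geopbyte"]


def unit_sizes():
    """Return {name: (binary_bytes, decimal_bytes)} for each unit."""
    sizes = {}
    for step, name in enumerate(UNITS, start=1):
        sizes[name] = (1024 ** step, 10 ** (3 * step))
    return sizes


for name, (binary, decimal) in unit_sizes().items():
    zeros = len(str(decimal)) - 1  # digits after the leading 1
    print(f"{name:<10} 1 followed by {zeros:>2} zeros  (binary: {binary:,} bytes)")
```

The two conventions drift apart as the units grow: by the Yottabyte step the 1,024-based figure is about 21 percent larger than the power-of-ten figure.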

For comparison, the entire digital collection of the Library of Congress (LOC), not counting the continuously updated backup copies stored at other locations, probably contains on the order of 11 or 12 Petabytes and right now is growing at over 15 TB a day. Even at that speed the LOC would add only a little over 5 PB of data a year. At that rate it would take roughly 190 years to reach one Exabyte but, of course, the pace is always increasing, so the LOC will probably get there much sooner.
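Here is a back-of-the-envelope check of those figures in Python. It is only a sketch: it assumes a starting size of 12 PB, a constant 15 TB a day with no acceleration, and the 1,024-based units used above. The function name is mine.

```python
TB_PER_PB = 1024
PB_PER_EB = 1024


def years_to_reach(target_pb, current_pb, growth_tb_per_day):
    """Years needed to grow from current_pb to target_pb at a constant daily rate."""
    growth_pb_per_year = growth_tb_per_day * 365 / TB_PER_PB
    return (target_pb - current_pb) / growth_pb_per_year


yearly_pb = 15 * 365 / TB_PER_PB
print(f"Growth per year: {yearly_pb:.2f} PB")
print(f"Years to one Exabyte: {years_to_reach(PB_PER_EB, 12, 15):.0f}")
```

With these assumptions the yearly growth comes to a bit over 5 PB and the wait for an Exabyte to a little under two centuries.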

Incidentally, I didn't mention why Bytes are the basic value when it comes to computing. As mentioned above, a byte contains 8 bits. A bit can only be off or on and we see it most often shown as 0 or 1. Since there are eight of these little guys in a Byte, the total number of the various combinations of values (01111111, 01111110, 01111101, etc.) is 256, which are enough to represent standard letters, numbers and symbols.
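To see where the 256 comes from, a few lines of Python can list every pattern. This is just an illustration; the `bit_patterns` helper is a made-up name.

```python
from itertools import product


def bit_patterns(width):
    """List every on/off pattern of the given width as a string of 0s and 1s."""
    return ["".join(bits) for bits in product("01", repeat=width)]


patterns = bit_patterns(8)
print(len(patterns))       # 256, which is 2 multiplied by itself 8 times
print(patterns[125:128])   # ['01111101', '01111110', '01111111']
```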

I first became interested in data when studying teletype (TTY) transmission in the Army Security Agency in 1959. In those primitive days (even though much advanced over smoke signals and such) teletype messages were punched onto narrow paper tape using what might be called a 5-bit system. The tape was wide enough to allow up to five holes to be punched in a line across the tape.

There are only 32 combinations using 5 holes/no holes (mark or space, digital 1 or 0) so our keyboards had a shift device. In the lower case were letters (all capitals) and some functions such as space, carriage return and new line. Upper case used those same keys to represent numbers and symbols and other functions such as bell. Of course, the SHIFT function, denoting lower (letters) or upper (numbers), had to be part of the code too so the receiving TTY would know when to shift appropriately. Interestingly, Japanese TTY keyboards had three levels of shift because of the greater number of simple characters required to spell out the complicated kanji (pictograph) characters.
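The shift idea can be sketched in a few lines of Python. Treat this as a toy, not a description of our actual machines: the two letter patterns are the ones I remember (R and Y), while the shift patterns and the figures-side values follow my understanding of the standard 5-level code and are assumptions here. The table is deliberately tiny.

```python
# Each pattern is five characters, one per hole position 1-5
# ("1" = hole punched, "0" = no hole).
LTRS = "11111"   # assumed pattern for "shift to letters"
FIGS = "11011"   # assumed pattern for "shift to figures"

# Illustrative table only: pattern -> (letters-case value, figures-case value).
TABLE = {
    "01010": ("R", "4"),   # holes 2 and 4
    "10101": ("Y", "6"),   # holes 1, 3 and 5
}


def decode(patterns):
    """Decode a sequence of 5-hole patterns, tracking the current shift state."""
    shift = 0              # 0 = letters, 1 = figures; start in letters
    out = []
    for pattern in patterns:
        if pattern == LTRS:
            shift = 0
        elif pattern == FIGS:
            shift = 1
        else:
            out.append(TABLE[pattern][shift])
    return "".join(out)


print(decode(["01010", "10101", FIGS, "01010", "10101", LTRS, "01010"]))  # RY46R
```

The same two hole patterns print as letters or as figures depending only on which shift code came last, which is why a garbled shift character could turn a whole line of text into numbers.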

After working with the system for a few months some people got good at reading the character codes and could read the message straight from the tape instead of waiting for it to go through the TTY machine and print out on paper. 

About all I remember after all these years is the code 2,4 for "R" and 1,3,5 for "Y", which were incorporated into test tape loops that would be transmitted continuously as a place holder on a radio frequency. A sample tape might read: "CQ CQ CQ DE KRGN KRGN KRGN QSA IMI RYRYRYRY". CQ means "Calling unspecified stations," DE means "This is," KRGN is the transmitting station's call sign, QSA IMI asks "What is my signal strength?" and RYRYRYRY is a test, since between them the two letters exercise all five mark/space positions (2,4 for R and 1,3,5 for Y) on the TTY machines, transmitter and receiver.
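Why R and Y make a good test pair is easy to show with a little Python. This is a sketch using the hole positions given above; the `holes_to_bits` helper is my own naming.

```python
def holes_to_bits(holes):
    """Turn a list of punched hole positions (1-5) into an integer bit mask."""
    mask = 0
    for hole in holes:
        mask |= 1 << (hole - 1)
    return mask


R = holes_to_bits([2, 4])
Y = holes_to_bits([1, 3, 5])
ALL_FIVE = (1 << 5) - 1    # all five positions punched

print(R & Y == 0)          # True: the two letters share no hole
print(R | Y == ALL_FIVE)   # True: together they cover every position
```

Every position flips between hole and no hole on each character, so alternating R and Y exercises every position in both states.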

During the middle of the last century the quantity of data began to grow quickly as handwritten letters and books were supplemented by phonograph recordings, movies, and wire and tape recordings.

Even where I worked, the paper tapes, magnetic tapes and paper copies of radio transmissions were saved and shipped back to the United States in ever increasing volumes. 

Sometimes I read about the ever-increasing volumes of data being generated and saved, but I also wonder if any thought has been given to how the number and sophistication of the measuring devices, and the frequency of measurements, affect data accumulation.

For instance, only one hundred years ago, a ship's log might contain notes for speed and heading, rudimentary weather observations, disciplinary actions, the daily plot of location, berthing and sailing times and other important information, all written by hand in a paper logbook.

Ships today keep logs that are a little more complex, and usually more than one.

The official Deck Log might contain all information required by the owner, or by the US Navy if it is a government ship. Navy ships might have all or some of the following information in their Deck Logs (from the Naval History and Heritage Command website):
·        Absentees
·        Accidents [material]
·        Accidents/Injuries [personnel]
·        Actions [combat]
·        Appearances of Sea/Atmosphere/Unusual Objects
·        Arrests/Suspensions
·        Arrival/Departure of Commanding Officer
·        Bearings [navigational]
·        Cable/Anchor Chain Strain
·        Collisions/Groundings
·        Courts-Martial/Captain's Masts
·        Deaths
·        Honors/Ceremonies/Visits
·        Incidents at Sea
·        Inspections
·        Meteorological Phenomena
·        Movement Orders
·        Movements [getting underway; course, speed changes; mooring, anchoring]
·        Passengers
·        Prisoners [crew members captured by hostile forces]
·        Propulsion Plant Status changes
·        Receipts and Transfers [of Crew Members]
·        Ship's Behavior [under different weather/sea conditions]
·        Sightings [other ships; landfall; dangers to navigation]
·        Soundings [depth of water]
·        Speed Changes
·        Tactical Formation
·        Time of Evolutions/Exercises/Other Services Performed

Depending on the ship's purpose there might be cargo logs showing loading, unloading and stowage data on cargo with notes about the periodic mate's inspections, especially of the refrigerated cargo. There might be an engineer's log concerning the operation, maintenance and fuel consumption of the engines, and there might be a medical officer's log showing data concerning that area of operations.

Much more data is recorded, and more frequently, than ever before. Sometimes I wonder whether all that information would still be required if we didn't have the capability of recording it automatically.

Considering the area of weather, Grandpa used to look up once an hour or so from his work in the fields, feel the wind blowing and say, "There's a brisk SW wind today and it looks like it might rain," generating only a few data points per hour and he seldom recorded his readings.

Now we have forecasters parsing that SW wind into data flows that would be almost unrecognizable to Grandpa. Velocities and directions broken down by increasing numbers of locations into hourly (or more frequent) data spreads; wind shadows, convergence zones, mini-versions of same for many localities; precipitation at surface, upper level and in between zones, and quantities and qualities of each; and eddies and swirls both natural and manmade, measured as frequently as desired, AND recorded and preserved for how long?

I can envision a future of weather data generation (around a major airport, say) where directions, velocities and dew points might be sampled every few seconds for the many variables that might forecast wind shear, icing potential or other desired (or gov't required) information. There might be sensors for everything from microbursts to hourly trends, reported from sensors spaced every few feet around the perimeter of the runways and even sampling a few thousand feet up the glide paths. All this division of general observations into discrete data bits will result in accretion at ever increasing volumes.

And, of course, the same principles apply to everything from data streams emitted by astronomy, oceanography and microbiology to those of national economic sampling and forecasting, inter (and intra) national espionage and all sorts of realms in between.

Can you imagine how many Megabytes of data are generated by one CT scan? There might be twelve to thirteen MB of data generated for each scan and there are about 80 million scans done per year just in the US, all resulting in more collections of data held for years or even decades to keep insurers and lawyers happy.
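Multiplying those two figures out gives a feel for the total. The sketch below takes 12.5 MB per scan as the midpoint of the range above and uses 1,024-based units; both choices, and the function name, are my assumptions.

```python
MB_PER_PB = 1024 ** 3   # MB -> GB -> TB -> PB, three steps of 1,024


def yearly_volume_pb(mb_per_scan, scans_per_year):
    """Total yearly data volume in Petabytes for a given scan size and count."""
    return mb_per_scan * scans_per_year / MB_PER_PB


total_pb = yearly_volume_pb(12.5, 80_000_000)
print(f"About {total_pb:.2f} PB of CT data per year")
```

That works out to a little under one Petabyte a year, before counting backups or the years of retention.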

And how about the streams of data collected or generated by space satellites as they increasingly use high-definition photography and all-frequency radio intercepts?

All this expanded data generation raises the question, "What good is it?" Will we eventually be able to (and should we) accumulate, and meaningfully process, enough data to forecast the weather without fail in the Magnolia area of Seattle? Will anyone ever be able to state (to paraphrase the old example) that a butterfly flapping its wings a little off kilter in downtown Yokohama today will forecast the path of a twister through tornado alley on April 19th? And will the decreased weather-related deaths allowed by better forecasting be more than offset by the increasing deaths from distracted drivers checking and texting the latest weather on their phones?

Maybe we should relegate large blocks of this newly expanded data mass to some sort of "Snapchat" for data? Two minutes after its purpose in life has passed it will go "poof" and disappear forever, thus saving a Yottabyte or Geopbyte of storage for more important things.

Looking back over the past seventy-some years, it amazes me how the volume of data has grown. Of course, the speed of communication has increased at a pace unforeseen even twenty or thirty years ago. I read recently that at the end of 2016 global internet traffic exceeded 1.1 Zettabytes and that it will double by 2019. And that’s just the internet.

Where are we headed? I don’t know. With quantum computing (and other quantum-related advances), artificial intelligence and other computing developments, maybe we can postpone blowing ourselves up until the sciences governing behavior and negotiation catch up and let us solve some of our inter (and intra) national problems.

I’ll leave the solution of those problems to you younger folks. Meanwhile Kuro and I will enjoy our walks and smell the roses and other flowers along the way.
