Lately, I've been reading about the increasing volume of
data generated in the world.
Twenty-five years ago, in the early nineties, we were talking about Bytes (eight binary digits, or bits) and Kilobytes (1,024 Bytes) of data, and dial-up modems with speeds measured in a few KB per second. Over the intervening years we've
moved through Megabytes (1,024 KB) and now, Gigabytes (1,024 MB), with
Terabytes (1,024 GB) sometimes used in home computing. The next measurements
are Petabytes (1,024 TB) and Exabytes (1,024 PB).
Not used every day, but already named, are the next four steps: Zettabyte, Yottabyte, Brontobyte and Geopbyte. It's hard for me to imagine volumes of data that large. One Yottabyte is roughly a 1 followed by twenty-four zeros worth of bytes, and a Geopbyte is roughly a 1 followed by thirty zeros.
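For anyone who likes to see the arithmetic, here is a little Python sketch of my own (nothing official, just an illustration) that walks up that ladder of units, each step 1,024 times the last:

    # Each named unit is 1,024 times the one before it. The names past
    # Exabyte are informal but commonly quoted.
    units = ["Byte", "Kilobyte", "Megabyte", "Gigabyte", "Terabyte", "Petabyte",
             "Exabyte", "Zettabyte", "Yottabyte", "Brontobyte", "Geopbyte"]

    for power, name in enumerate(units):
        size_in_bytes = 1024 ** power
        print(f"{name:>10}: 1,024^{power:<2} = {size_in_bytes:.3g} bytes")

    # The Yottabyte line comes out at roughly 1.2e24 bytes (a 1 followed by
    # about twenty-four zeros) and the Geopbyte line at roughly 1.3e30
    # (a 1 followed by about thirty zeros).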
For comparison, the entire digital collection of the Library of Congress (LOC), not counting the continuously updated backup copies stored at other locations, probably contains on the order of 11 or 12 Petabytes and is currently growing at over 15 TB a day. Even at that speed the LOC would add only a little over 5 PB of data a year. At that rate it would take close to 200 years to reach one Exabyte but, of course, the pace is always increasing, so the LOC will get there much sooner than that.
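For those who want to check my arithmetic, here is a quick Python sketch using the same rough figures; the 12 PB and 15 TB-a-day numbers are my ballpark estimates from above, not official LOC statistics:

    # Rough check of the Library of Congress figures above.
    TB_PER_PB = 1024
    PB_PER_EB = 1024

    growth_pb_per_year = 15 * 365 / TB_PER_PB               # about 5.3 PB a year
    years_to_one_exabyte = (PB_PER_EB - 12) / growth_pb_per_year

    print(round(growth_pb_per_year, 1))    # ~5.3 PB added per year
    print(round(years_to_one_exabyte))     # ~189 years at a constant rate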
Incidentally, I didn't mention why Bytes are the basic value when it comes to
computing. As mentioned above, a byte contains 8 bits. A bit can only be off or
on, and we see it most often written as 0 or 1. Since there are eight of these little guys in a Byte, the total number of possible combinations of values (01111111, 01111110, 01111101, etc.) is 256, which is enough to represent the standard letters, numbers and symbols.
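Here is the same idea in a couple of lines of Python, just to show where the 256 comes from:

    # Each bit is either 0 or 1, so eight bits give 2**8 distinct patterns.
    print(2 ** 8)   # 256

    # A few of those patterns, written out the way they appear above:
    for value in (0b01111111, 0b01111110, 0b01111101):
        print(format(value, "08b"), "=", value)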
I first became interested in data when studying teletype (TTY) transmission in
the Army Security Agency in 1959. In those primitive days (even though much
advanced over smoke signals and such) teletype messages were punched onto
narrow paper tape using what might be called a 5-bit system. The tape was wide
enough to allow up to five holes to be punched in a line across the tape.
There are only 32 combinations using 5 holes/no holes (mark or space, digital 1
or 0) so our keyboards had a shift device. In the lower case were letters (all
capitals) and some functions such as space, carriage return and new line. Upper
case used those same keys to represent numbers and symbols and other functions
such as bell. Of course, the SHIFT function, denoting lower (letters) or upper (numbers), had to be part of the code too, so the receiving TTY would know when to shift appropriately. Interestingly, Japanese TTY keyboards had three levels
of shift because of the greater number of simple characters required to spell
out the complicated kanji (pictograph) characters.
After working with the system for a few months some people got good at reading
the character codes and could read the message straight from the tape instead
of waiting for it to go through the TTY machine and print out on paper.
About all I remember after all these years is the code 2,4 for "R" and 1,3,5 for "Y", which were incorporated into test tape loops that would be transmitted continuously as a placeholder on a radio frequency. A sample tape might read: "CQ CQ CQ DE KRGN KRGN KRGN QSA IMI RYRYRYRY". CQ means "Calling unspecified stations," DE means "This is," KRGN is the transmitting station's call sign, QSA IMI asks "What is my signal strength?" and RYRYRY is a test, since it exercises all five mark/space conditions (2,4 for R and 1,3,5 for Y) on the TTY machines, transmitter and receiver.
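For the curious, here is a small Python sketch of why RY made such a good test pattern. The only codes in it are the two I still remember (holes 2 and 4 for R, holes 1, 3 and 5 for Y); the rest of the old 5-level table isn't needed to make the point:

    # With five hole positions there are only 2**5 possible combinations,
    # which is why the keyboards needed a letters/figures shift.
    print(2 ** 5)   # 32

    # Treat each character as the set of hole positions punched across the tape.
    R = {2, 4}        # "R": holes at positions 2 and 4
    Y = {1, 3, 5}     # "Y": holes at positions 1, 3 and 5

    # Between them, R and Y put a mark (hole) in every one of the five positions,
    # and each position is also a space (no hole) in the other letter, so an
    # alternating RYRYRY loop exercises every mark/space condition.
    print(sorted(R | Y))   # [1, 2, 3, 4, 5]
    print(R & Y)           # set()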
During the middle of the last century the quantity of data began to grow
quickly as handwritten letters and books were supplemented by phonograph
recordings, movies, and wire and tape recordings.
Even where I worked, the paper tapes, magnetic tapes and paper copies of radio
transmissions were saved and shipped back to the United States in ever
increasing volumes.
Sometimes I read about the ever-increasing volumes of data being generated and saved, but I also wonder whether any thought has been given to how the number and sophistication of the measuring devices, and the frequency of measurements, affect data accumulation.
For instance, only one hundred years ago, a ship's log might contain notes for
speed and heading, rudimentary weather observations, disciplinary actions, the
daily plot of location, berthing and sailing times and other important
information, all written by hand in a paper logbook.
Ships today keep logs that are a little more complex, and usually more than one. The official Deck Log might contain all the information required by the owner or, for a government ship, by the US Navy. Navy ships might have all or some of the following information in their Deck Logs (from the Naval History and Heritage Command website):
· Absentees
· Accidents [material]
· Accidents/Injuries [personnel]
· Actions [combat]
· Appearances of Sea/Atmosphere/Unusual Objects
· Arrests/Suspensions
· Arrival/Departure of Commanding Officer
· Bearings [navigational]
· Cable/Anchor Chain Strain
· Collisions/Groundings
· Courts-Martial/Captain's Masts
· Deaths
· Honors/Ceremonies/Visits
· Incidents at Sea
· Inspections
· Meteorological Phenomena
· Movement Orders
· Movements [getting underway; course, speed changes; mooring, anchoring]
· Passengers
· Prisoners [crew members captured by hostile forces]
· Propulsion Plant Status changes
· Receipts and Transfers [of Crew Members]
· Ship's Behavior [under different weather/sea conditions]
· Sightings [other ships; landfall; dangers to navigation]
· Soundings [depth of water]
· Speed Changes
· Tactical Formation
· Time of Evolutions/Exercises/Other Services Performed
Depending on the ship's purpose there might be cargo logs showing loading, unloading and stowage data, with notes about the mate's periodic inspections, especially of the refrigerated cargo. There might be an engineer's log concerning the operation, maintenance and fuel consumption of the engines, and there might be a medical officer's log showing data concerning that area of operations.
Much more data is recorded, and more frequently, than ever before. Sometimes I wonder whether, if we didn't have the capability of automatically recording all that information, it would still be required.
Consider the area of weather. Grandpa used to look up once an hour or so from his work in the fields, feel the wind blowing and say, "There's a brisk SW wind today and it looks like it might rain," generating only a few data points per hour, and he seldom recorded his readings.
Now we have forecasters parsing that SW wind into data flows that would be
almost unrecognizable to Grandpa. Velocities and directions broken down by
increasing numbers of locations into hourly (or more frequent) data
spreads; wind shadows, convergence zones, mini-versions of same for many
localities; precipitation at surface, upper level and in between zones,
and quantities and qualities of each; and eddies and swirls both natural
and manmade, measured as frequently as desired, AND recorded and preserved for
how long?
I can envision a future of weather data generation (around a major airport,
say) where directions, velocities and dew points might be sampled every few
seconds for the many variables that might forecast wind shear, icing potential
or other desired (or gov't required) information. There might be readings for everything from microbursts to hourly trends, reported from sensors spaced every few feet around the perimeter of the runways and even sampling a few thousand feet up the glide paths. All this division of general observations into discrete data bits will result in accretion at ever-increasing volumes.
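Just to show how quickly that kind of sampling adds up, here is a back-of-the-envelope Python sketch. Every number in it (a thousand sensors, a reading every five seconds, a hundred bytes per reading) is an assumption of mine for illustration, not a real airport specification:

    # Back-of-the-envelope estimate with made-up numbers, purely to show scale.
    sensors = 1_000                  # hypothetical sensor count around the runways
    seconds_between_readings = 5     # hypothetical sampling interval
    bytes_per_reading = 100          # hypothetical record size per reading

    readings_per_day = sensors * (24 * 60 * 60 // seconds_between_readings)
    bytes_per_year = readings_per_day * bytes_per_reading * 365

    print(f"{bytes_per_year / 1024**3:.1f} GB per year from one airport")   # ~587 GB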
And, of course, the same principles apply to everything from data streams
emitted by astronomy, oceanography and microbiology to those of national
economic sampling and forecasting, inter (and intra) national espionage and all
sorts of realms in between.
Can you imagine how many Megabytes of data are generated by one CT scan? There
might be twelve to thirteen MB of data generated for each scan and there are
about 80 million scans done per year just in the US, all resulting in more
collections of data held for years or even decades to keep insurers and lawyers
happy.
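Taking my rough figures above (call it 12.5 MB per scan and 80 million scans a year), the arithmetic looks like this in Python:

    # Rough arithmetic for the CT-scan figures above; the per-scan size and the
    # 80 million US scans a year are my ballpark numbers, not a citation.
    mb_per_scan = 12.5
    scans_per_year = 80_000_000

    total_mb = mb_per_scan * scans_per_year
    total_pb = total_mb / 1024 ** 3     # MB -> GB -> TB -> PB

    print(round(total_pb, 2))   # roughly 0.93 PB of new scan data every year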
And how about the streams of data collected or generated by space satellites as they increasingly use high-definition photography and all-frequency radio intercepts?
All this expanded data generation raises the question, "What good is it?" Will we eventually be able to (and should we) accumulate, and
meaningfully process, enough data to forecast the weather without fail in the
Magnolia area of Seattle? Will anyone ever be able to state (to paraphrase the
old example) that a butterfly flapping its wings a little off kilter in
downtown Yokohama today will forecast the path of a twister through tornado
alley on April 19th? And will the decreased weather-related deaths allowed by
better forecasting be more than offset by the increasing deaths from distracted
drivers checking and texting the latest weather on their phones?
Maybe we should relegate large blocks of this newly expanded data mass to some sort of "Snapchat" for data? Two minutes after its purpose in life has passed it would go "poof" and disappear forever, thus saving a Yottabyte or Geopbyte of storage for more important things.
Looking back over the past seventy-some years, it amazes me how the volume of
data has grown. Of course, the speed of communication has increased at a pace
unforeseen even twenty or thirty years ago. I read recently that by the end of 2016 global internet traffic exceeded 1.1 Zettabytes and that it will roughly double by 2019. And that’s just the internet.
Where are we headed? I don’t know. With quantum computing (and other
quantum-related advances), artificial intelligence and other computing developments,
maybe we can postpone blowing ourselves up until the sciences governing
behavior and negotiation catch up and let us solve some of our inter (and
intra) national problems.
I’ll leave the solution of those problems to you younger folks. Meanwhile Kuro
and I will enjoy our walks and smell the roses and other flowers along the way.