Unless you’ve been living under a rock for the last few days, you know that our friends over at WikiLeaks have released an enormous amount of data regarding the US/Afghan War.
My initial reaction was to be both amazed at the sheer volume and impressed by the work done to make this information visible in a variety of popular applications. Specifically, the data available is:
- Complete dump of the Afghan War website, HTML format
- All entries, CSV format
- All entries, SQL format
- All entries, KML format
- All NATO entries, KML format
- Entries by month, KML format
- Entries with scale filter, KML format
Unfortunately, while the presentation of this data set is impressive, as an IT Security specialist, I would be extremely remiss if I didn’t point out that there is absolutely no vetting or fact-checking of this information. WikiLeaks isn’t providing original documentation, but rather an organized, electronic data version.
In short, we have no idea if any of this data is valid.
Consequently, to prove the point, I’m trying an experiment:
After examining the files, I’ve discovered that the word “cache” appears frequently. I took the Google Earth (KML) version of the data and, with a simple search-and-replace, replaced every instance of “cache” with “nuclear weapon”.
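For the curious, the search-and-replace amounts to very little. What follows is a minimal sketch of the idea in Python, not the tool I actually used; the file names in the usage comment are hypothetical, and it assumes the decompressed KML is UTF-8 text. It streams the file line by line so the roughly 170MB document never has to sit in memory all at once.

```python
def replace_term(text: str, old: str, new: str) -> tuple[str, int]:
    """Return `text` with every occurrence of `old` replaced, plus the count.

    The match is case-sensitive, so "Cache" is left alone.
    """
    count = text.count(old)
    return text.replace(old, new), count


def rewrite_file(src_path: str, dst_path: str, old: str, new: str) -> int:
    """Copy src_path to dst_path, replacing `old` with `new` on every line.

    Returns the total number of replacements. Working line by line is safe
    here because the search term contains no newline.
    """
    total = 0
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            changed, n = replace_term(line, old, new)
            dst.write(changed)
            total += n
    return total


# Hypothetical usage; these file names are placeholders, not the real ones:
# n = rewrite_file("all_entries.kml", "all_entries_altered.kml",
#                  "cache", "nuclear weapon")
# print(n, "replacements made")
```

That is the entire effort required to change what the data “says”.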
I’m now making this altered data set freely available to the public. Please download and distribute it as widely as possible.
The experiment is this: I’m curious how long it will take before the WikiLeaks data and my altered data are intermingled to the point that some idiot starts ranting about all the nuclear weapons found in Afghanistan.
It should be made clear that what WikiLeaks has stumbled across is the correct way to report news in the 21st Century: present it as a series of dates and times, corresponding to a location, and load it in Google Earth. Allow the user, not the news organization, to determine the relevance of the incident to their daily life.
Beyond that, it’s necessary to vet and fact-check the news incident and make that data also available so that the user can draw their own conclusions.
In short, what passes for “news” in the 21st Century is little more than hearsay from sources that cannot be trusted. The only solution is to simply make the data available to the public and let the individual decide if it’s relevant or accurate.
Unfortunately, what WikiLeaks has done is brilliant from the presentation perspective, but utterly useless as information. The sources appear to have been neither vetted nor fact-checked; therefore, absolutely nothing they have presented is inherently valid.
Hence the explicit attempt to demonstrate this by making obviously-altered data available.
Note that no attempt has been made to disguise this file as the WikiLeaks version. File lengths are different, checksum hashes will not match, and the applications used to alter the data set may have left footprints of their use.
This is intentional. A corollary to the point being made by this experiment is that even though the data set is easily distinguished from the WikiLeaks version, it won’t matter.
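Telling the two files apart takes almost no effort for anyone who bothers to check. Here is a minimal sketch of a checksum comparison in Python. SHA-256 is my choice for the example, not something WikiLeaks specifies, and the file names in the usage comment are hypothetical.

```python
import hashlib
from typing import Iterable, Iterator


def read_chunks(path: str, chunk_size: int = 1 << 20) -> Iterator[bytes]:
    """Yield the file at `path` in binary chunks (1 MiB by default)."""
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            yield chunk


def digest_chunks(chunks: Iterable[bytes]) -> str:
    """Return the SHA-256 hex digest of the concatenated chunks."""
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


def same_file(path_a: str, path_b: str) -> bool:
    """True if both files have the same SHA-256 digest."""
    return digest_chunks(read_chunks(path_a)) == digest_chunks(read_chunks(path_b))


# Hypothetical usage; these file names are placeholders:
# print(same_file("wikileaks_all_entries.kml", "altered_all_entries.kml"))
```

A single changed byte produces a different digest, so the altered file cannot pass for the original under this check.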
Here is the link to the altered data:
- All entries, KML format, the word “cache” replaced with “nuclear weapon”
16.4MB download, 169.7MB decompressed
Please download and disseminate as widely as possible. The point is to confuse the two data sets in order to make clear just how easily the original was altered. By extension, this should make clear that the WikiLeaks data set is no more reliable than simple hearsay.
Please note: this file, in order to conform to the WikiLeaks release, is compressed in .7z format. You will need a program capable of decompressing .7z files. Such programs are freely available for most Linux distributions through the native package manager.
I’ve not used 7-Zip on OS X, but the code is UNIX-y, so I assume there is an OS X version.
For Windows, I recommend 7-Zip. It’s free, small, fast, and can work with just about any compression format you care to mention. Note that in Windows, the command-line version is significantly faster than the GUI version, particularly with large files.