Once a year, every Googler makes a pilgrimage to Mountain View to pay homage to the algorithm. The algorithm has brought joy and happiness to many who have found love or laughed at videos of people falling over or sighed at the cuteness of cats these days. Instagramographers seem to believe that posting a photograph of their lunch is the best way to solve world hunger. Facebookers give a big thumbs-up to sharing every half-formed thought because of the self-evident benefit of making the world more connected, and certainly not for any other, more sinister reason. People who bought this also bought that. So the databases grew.
Then people felt slightly uneasy about where all this information was going. Services like Snapchat capitalised on this with their the-message-will-self-destruct-in-ten-seconds technology. We had Edward Snowden. We have Internet Security day, a Data Protection Commissioner in every other country, and friendlier terms of use of the above-mentioned social networks. There are boycotts of online retailers in response to the predatory tactics employed by some and on the basis that spending a little more in local businesses, rather than feeding the database for the sake of ten percent off, might actually be worthwhile. We grew sceptical.
Big anecdotes
Ironically, most of the people making a case for big data use anecdotal evidence, or case studies at best, rather than, well, big data. There was once a project that claimed to predict whether you were likely to get the flu or not, allowing you to get yourself vaccinated before it was too late. The ups and downs of Google Flu Trends are well documented and there are other good examples of mission-creep in big data. Road safety and the welfare of long-distance drivers have been greatly improved by the introduction of tachographs, then electronic on-board recorders, and now GPS fleet management. What we had originally was a really beneficial use of data science but what has followed is somewhat less positive, as companies track the locations of staff who do things other than drive for a living and effectively spy on their own employees. That’s not what big data promised.
For commercial organisations, big data is actually about small data. It’s about segmentation based on adding more and more variables to identify smaller and smaller groups until they find the Cell Of One. This is the point at which data science diverges from data analytics: science is concerned with what is true of most people, with finding essential truths about groups, while companies that use data analytics are a bit more gung-ho, a bit less concerned with how things are done, and in a big hurry to sell to you. The boundary between public and private, between group interests and individual interests, is adjudicated in science by ethics committees. People are asked for their information, told how it will be used and stored, and can agree to be involved or politely decline. In companies, however, the priority seems to be to do whatever they can get away with as quickly and secretively as possible, and to apologise later. Science states assumptions, tests hypotheses, and improves by increments, even if it takes time to make progress. Science is also founded on scepticism and criticism, not things that monopolies are especially good at.
Data analytics vs. data science
For states, spying is only one use of personal data. A much more noble and more fruitful use of data science is to better understand how society works. Aggregated data from hospital databases, for example, could eliminate the need for expensive data collection in the field of epidemiology. The purpose is to improve healthcare provision, not health insurance discrimination, not causing fights by telling everyone about their boyfriend’s … history. Google Flu Trends was a battle between fast-moving, all-promising big data and the more cautious scientists in the Centers for Disease Control or some such. Google’s neural networks and big data analytics were quick to over-estimate while data science got a better answer later.
It’s conceivable that people are happy to have information stored and used for research. They’re happy to take part as one of a state-sized group, but not happy to be identifiable. On the other hand, they seem to be beginning to resist being treated as datapoints by salespeople. What’s more, salespeople have a habit of over-selling, and the more devout and evangelical they get about why it’s good to give them your data, the more worried you should be. The promise of big data and data analytics looks like a bit of a sales pitch at the moment. This tells us that, at least, the algorithms need further refinement and, perhaps, that they need to apply a bit more science.