Once a year, every Googler makes a pilgrimage to Mountain View to pay homage to the algorithm. The algorithm has brought joy and happiness to many who have found love or laughed at videos of people falling over or sighed at the cuteness of cats these days. Instagramographers seem to believe that posting a photograph of their lunch is the best way to solve world hunger. Facebookers give a big thumbs-up to sharing every half-formed thought because of the self-evident benefit of making the world more connected, and certainly not for any other, more sinister reason. People who bought this also bought that. So the databases grew.
Ironically, most of the people making a case for big data use anecdotal evidence, or case studies at best, rather than, well, big data. There was once a project that claimed to predict whether you were likely to get the flu or not, allowing you to get yourself vaccinated before it was too late. The ups and downs of Google Flu Trends are well documented, and there are other good examples of mission creep in big data. Road safety and the welfare of long-distance drivers have been greatly improved by the introduction of tachographs, then electronic on-board recorders, and now GPS fleet management. What we had originally was a really beneficial use of data science, but what has followed is somewhat less positive, as companies track the locations of staff who do things other than drive for a living and effectively spy on their own employees. That’s not what big data promised.
For commercial organisations, big data is actually about small data. It’s about segmentation based on adding more and more variables to identify smaller and smaller groups until they find the Cell Of One. This is the point at which data science diverges from data analytics: science is concerned with what is true of most people, with finding essential truths about groups, while companies that use data analytics are a bit more gung-ho, a bit less concerned with how things are done, and in a big hurry to sell to you. The boundary between public and private, between group interests and individual interests, is adjudicated in science by ethics committees. People are asked for their information, told how it will be used and stored, and can agree to be involved or politely decline. In companies, however, the priority seems to be to do whatever they can get away with as quickly and secretively as possible, and to apologise later. Science states assumptions, tests hypotheses, and improves by increments, even if it takes time to make progress. Science is also founded on scepticism and criticism, not things that monopolies are especially good at.
Data analytics vs. data science
For states, spying is only one use of personal data. A much more noble and more fruitful use of data science is to better understand how society works. Aggregated data from hospital databases, for example, could eliminate the need for expensive data collection in the field of epidemiology. The purpose is to improve healthcare provision, not health insurance discrimination, not causing fights by telling everyone about their boyfriend’s … history. Google Flu Trends was a battle between fast-moving, all-promising big data and the more cautious scientists in the Centre for Disease Control or some such. Google’s neural networks and big data analytics were quick to over-estimate, while data science got a better answer later.
It’s conceivable that people are happy to have information stored and used for research. They’re happy to take part as one of a state-sized group, but not happy to be identifiable. On the other hand, they seem to be beginning to resist being treated as datapoints by salespeople. What’s more, salespeople have a habit of over-selling, and the more devout and evangelical they get about why it’s good to give them your data, the more worried you should be. The promise of big data and data analytics looks like a bit of a sales pitch at the moment. This tells us that, at least, the algorithms need further refinement and, perhaps, that they need to apply a bit more science.