A Genealogy of Big Data

"Big Data" is a catch-all term used to denote large, rapidly acquired data sets, the methods used to analyze them and make them productive, the social embeddedness of data-gathering technology, the institutions which desire to exploit the data, and the allegedly novel era in which these items, activities and institutions co-exist. This article asserts that the origin of the so-called era of big data can be traced to the period when statistics was perceived to have been transformed from a descriptive practice into an analytical science. This perspectival shift allowed statistics to morph further, from a means of measurement to a means of production - a productive capacity utilized in attempts to remake society during the early 20th century's eugenics movement, within which the originators of modern statistics were also foundational figures. Eugenics can be seen as a data-driven information science rather than a biological pseudo-science, and echoes of the motivations and rationalisations of the eugenics movement can still be perceived in the discourse around "big data."

The transformation in the capacity and role of statistics is credited to several 19th- and 20th-century eugenicists -- most notably Francis Galton, Karl Pearson and Ronald A. Fisher -- whose motivation for developing modern statistics was its application to the study of human heredity and to the new science of eugenics. It was thought that statistics could give eugenics the rigorous scientific support that biological science could not. 1 In the early 20th century, modern statistical practices and the rhetoric of eugenics found entry points into biological science, education, intelligence testing, medicine, the insurance industry, demography, philanthropy, the military, popular culture and daily life -- including the American West, in the persons of several foundational figures of Stanford University and Silicon Valley. In some sectors, this statistical advance was accelerated by the advent of electro-mechanical punched-card tabulating machines, which predated computers.

The common interpretation of eugenics as a biological pseudo-science is rejected, and eugenics is conceived -- following the thesis of Sanem Guvenc-Salgirli 1.1 -- as an interdisciplinary science whose object was social life, in both the analytic and productive senses. Building on that conception, it is noted that eugenics was manifested institutionally as research organizations whose products were taxonomic archives of data. It is claimed that eugenics was specifically an "information science" whose aim was to provide a knowledge infrastructure enabling the reformation and systematic administration of society -- a statistically driven administration to be executed without the political burden of negotiating the claims or oppositions of special interests, and without an a priori de-personalization and universalization of the ground for those claims: an effect which may be read as countering the impersonal "veil of ignorance" proposed in social contract theory with impersonal "spectacles of omniscience."

It is then asserted that after eugenics' racist and ableist elements were allegedly repudiated, eugenics lived on in an ideological cluster in which its primary tropes were reformulated in the language and goals of Silicon Valley, immigration reform, national productivity, education reform, big data and other discursive realms. Finally, the cluster's reconnection with molecular genetics is anticipated, now that the sequencing of the human genome has transformed biological science into an information science dependent on statistics and data.

Hot or Not

"I may here speak of some attempts by myself, made hitherto in too desultory a way, to obtain materials for a Beauty Map of the British Isles. Whenever I have occasion to classify the persons I meet into three classes, 'good, medium, bad,' I use a needle mounted as a pricker, wherewith to prick holes, unseen, in a piece of paper, torn rudely into a cross with a long leg. I use its upper end for 'good,' the cross arm for 'medium,' the lower end for 'bad.' The prick-holes keep distinct and are easily read off at leisure. The object, place, and date are written on the paper. I used this plan for my beauty data, classifying the girls I passed in streets or elsewhere as attractive, indifferent, or repellent. Of course this was a purely individual estimate, but it was consistent, judging from the conformity of different attempts in the same population. I found London to rank highest for beauty; Aberdeen lowest."

Francis Galton, Memories of My Life (1911), from the chapter on "Race Improvement," recalling his flaneurial expeditions of the 1880's.

Nation states and their precursors collected data - about treasure, trade, taxes, agriculture, productivity, war, population, disease - for centuries before statistics became scientific. Statistics was descriptive, not analytical, limited to aggregation, averaging, summation, and categorization. Although the math that eventually transformed statistical practices had been developing for centuries, the breakthrough didn't come until the 1880's, with the work of Francis Galton, and later, his protege and biographer, Karl Pearson, 1.2 and Pearson's associate, Ronald Fisher. 2 Their work was motivated not by an interest in trade and statecraft, but by their interest in heredity, and ultimately, eugenics, a term which Galton coined. Galton's definition of eugenics -- which we do not have to accept as comprehensive or complete -- was "the study of all agencies under human control which can improve or impair the racial quality of future generations." He believed eugenics should become a "religion," and it became the overriding concern of the last thirty years of his life, and of a good deal of Pearson's and Fisher's careers.

Galton's Statistical Innovation

Galton was a half-cousin of Charles Darwin. Darwin's work on animal evolution inspired Galton to research human heredity, resulting in Galton's first major work, Hereditary Genius (1869), in which he attempted to account for the alleged genetic transmission of genius through generations of families. Scientists at the time had no direct knowledge of chromosomes or DNA -- the existence of genes was inferred from visual observations of plant and animal breeding experiments. Dissatisfied with how little biology could explain about the mechanisms of inheritance, Galton turned to statistical analysis. 3 In Hereditary Genius he used Gaussian distributions to model the inheritability of physical traits, intellectual achievement and social eminence. He then combined genealogical research with an analysis of how frequently various expressions of "genius" appeared within families, comparing deviations from the average across generations, and concluded that genius was indeed inherited.

Galton's first experiment with statistical analysis had been the creation of weather maps, plotting regions of high and low temperature and analyzing their distributions. 4 Later, turning to the frequency and distribution of human traits and behaviors, as he had earlier turned to temperature and pressure, he pioneered the use of applied probability theory and invented the statistical techniques of linear regression and correlation. 5

Fingerprints, photoshops and face books

In the 1870's, Galton began assembling human population and biometric data, and data from genetic experiments on plant and animal life. He established an "anthropometric laboratory" for the physical measurement of bodies and testing of reflexes. He created elaborate genealogical surveys which tracked longevity, health, social status, intelligence and physical strength. Eventually he had physical and genealogical data on thousands of families and schoolchildren. 6 He also became interested in the biometrics of criminals, which he pursued through photography and, later, fingerprinting. He experimented with "composite portraiture" and stereoscopic photography, "photoshopping" full-face photographs of individuals together to produce composites of ideal criminal types, or of class, race and disease types. Galton claimed that he "could not make good composites of lunatics; their features are apt to be so irregular in different ways that it was impossible to blend them." In Galton's estimation, when attractiveness rather than genius was under consideration, the middle of a bell curve was preferable to a belle:

"This face and the qualities it connotes probably gives a clue to the direction in which the stock of the English race might most easily be improved. It is the essential notion of a race that there should be some ideal typical form from which the individuals may deviate in all directions, but about which they chiefly cluster, and towards which their descendants will continue to cluster. The easiest direction in which a race can be improved is towards that central type, because nothing new has to be sought out. It is only necessary to encourage as far as practicable the breed of those who conform most nearly to the central type, and to restrain as far as may be the breed of those who deviate widely from it." 7

In 1883, Galton published his findings in Inquiries into Human Faculty and Its Development. The book set out the desirable inheritable characteristics his research had uncovered, how those traits should be analyzed and defined, and, ultimately, how their development should be furthered via artificial selection, or eugenics.

Galton's portraiture publications inspired a number of derivative face books. Henry Pickering Bowditch, professor of physiology at Harvard University, and dean of the Harvard Medical School, reported:

"In 1886 and 1887 the subject attracted much attention in this country and in many of the colleges composite photographs of the graduating classes were produced." 8

At Smith College, where eugenics was in the curriculum, composite portraiture was combined with rigorous data collection about student health, diet and genealogy, a project initiated by Smith professor John Tappan Stoddard, whom Bowditch assisted. Bowditch's own composite photographs were displayed posthumously at the Second International Congress of Eugenics.

Pearson the Protege

Probability was the foundation of eugenics, according to Galton, but despite his capacity to innovate in the field of statistical analysis, Galton was not a skilled mathematician. He had to farm out some of the more advanced mathematical problems in his work, 9 and later relied on his protege, Karl Pearson, to build upon his innovations.

"He took up my work on Correlation, vastly extending its theory and adding largely to the data. I had gone no further than to obtain simple results based on the Gaussian law of distribution; he worked out those results with great mathematical skill and elaboration. He also generalised them so as to deal with other laws of distribution than the Gaussian." 10

It is Pearson who is generally credited with being a major founder of mathematical statistics, contributing the Pearson correlation coefficient and many other methods - innovations largely reached through applying statistical analysis to the study of heredity, natural selection and biology.
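For reference, the coefficient that bears Pearson's name is, in its standard sample form, computed from n paired observations (x_i, y_i) with sample means x̄ and ȳ:

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The value of r lies between -1 and 1. The endpoints correspond to the "perfect correlation" that Pearson treated as the theoretical limit of causality.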

Institutionalizing Eugenics

By the early 1900's, a "raft of statistical studies" purported to show genetic degeneration in England, and the rhetoric around eugenics took a darker turn, portraying the nation as facing an existential risk. 11 Eugenicists also complained about insufficient genealogical data. To fix this problem - and to capitalize on the hysteria - in 1904 Galton funded the creation of the Eugenics Record Office at University College. In 1907, under Karl Pearson's directorship, it became the Francis Galton Laboratory for the Study of National Eugenics. 12 In 1910, their American collaborator, Charles Davenport, set up the Eugenics Record Office at Cold Spring Harbor on Long Island, with funding from Rockefeller, Carnegie and Harriman.

The mission of these institutions was to gather social, genealogical, demographic, medical and biometric data. They were research institutions and archives; propagating the eugenic faith was not their role - that fell to the numerous amateur "eugenical societies" whose function, as Leonard Darwin put it, was "to build up a social superstructure on the scientific foundations laid by central organizations engaged in biological and eugenical research." 13 Among their vast collections and output, the Galton Laboratory notably produced an immense multi-volume Treasury of Human Inheritance, which cross-referenced family histories with allegedly inherited conditions, while the ERO published a Trait Book listing a taxonomy of thousands of inheritable traits, and an Index to Germ Plasm for couples considering a eugenical marriage.

The Americans controversially tended to rely more on a Mendelian genetic analysis than a biometric analysis of phenotypes. Pearson disparaged Mendelism, but his "chi-square" statistical tests "remain the most widely used technique for analysing Mendelian data." 14

The Consolation of Correlation

The philosophy of correlation was central to Pearson's conception of science. He believed that correlation, not causation, was the ultimate basis of knowledge, even in physics. 15 Causation was merely the limiting case of perfect correlation. Variance was present even at the atomic level, since no two atoms were alike, and the materiality of biological genes was irrelevant to the statistical science of biometrics. In his biography of Galton, he describes this idealist view, attributing its origin to Galton:

The physicist's method of describing phenomena was seen to be only fitting when a high degree of correlation existed. In other words he was assuming for his physical needs a purely theoretical limit - that of perfect correlation. Henceforward the philosophical view of the universe was to be that of a correlated system of variates, approaching but by no means reaching perfect correlation, i.e. absolute causality, even in the group of phenomena termed physical. Biological phenomena in their numerous phases, economic and social, were seen to be only differentiated from the physical by the intensity of their correlations. 16

Pearson's idealism has contemporary correlates in the rhetoric of Silicon Valley. In 2008, the author and entrepreneur Chris Anderson published a short, widely read article in Wired, The End of Theory. It asserted that semantic and causal analysis, social science, and all the interpretive models humans use to understand the world had been made obsolete by the advent of big data:

Scientists are trained to recognize that correlation is not causation, that no conclusions should be drawn simply on the basis of correlation between X and Y (it could just be a coincidence). Instead, you must understand the underlying mechanisms that connect the two. Once you have a model, you can connect the data sets with confidence. Data without a model is just noise.

But faced with massive data, this approach to science -- hypothesize, model, test -- is becoming obsolete...

There is now a better way. Petabytes allow us to say: "Correlation is enough." We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot...

Correlation supersedes causation, and science can advance even without coherent models, unified theories, or really any mechanistic explanation at all. 17

Pearson's biographer, Theodore Porter, relates that after Pearson embraced the study of statistics in the 1890s: "Missionary fervor became characteristic of the man... Pearson's statistical movement had aspects of a schismatic sect." 18

Pearson was gravely concerned with birth-rate statistics. He believed that the adoption of child-labor laws and compulsory education had created an economic disincentive for the top segment of the working class to reproduce. Smaller families meant that a larger proportion of the working class would suffer from tuberculosis and insanity - he believed statistics showed that these conditions afflicted first- and second-born children more frequently. His data also showed that the people he considered least fit were reproducing the most, incentivized by charity. Pearson believed that counter-legislation was needed, creating economic incentives to reproduce for the fit and disincentives for the unfit. Failure to enact it would result in economic and military disaster. 19

A Fisher of Data

Pearson's successor at the Galton Laboratory, R.A. Fisher, was a founder of population genetics and, like Pearson, is also credited as a founder of modern statistics. Fisher synthesized the two competing strains in the study of heredity. He recognized that the "wave theory" of Darwinian gradual evolution, studied via statistically correlated phenotypes using Pearson's biometrics, was compatible with the "particle theory" of Mendelian genetics (in which traits, located in genes, are either turned on or off). The two were compatible because many inheritable traits are influenced by many genes at diffuse locations. 20 Fisher also developed statistical procedures for measuring significance and variance. A long-time member of the Eugenics Society, Fisher shared Pearson's concern over birth rates in the middle and upper classes. He proposed the political solution of welfare for the well-to-do, by which the state would fund child-bearing for the well-off: the greater your net worth, the more you would get per child. Fisher is also notable for not having renounced eugenics after the Second World War. 21

Strange Bedfellows

Consequent to the existential hysteria over race degeneration, and the institutional positioning of eugenic research in multiple countries, eugenics successfully, albeit not without resistance, infiltrated mainstream science, higher education, philanthropy, business and popular culture -- worldwide, with degrees of support from every segment of the political spectrum. Wealthy benefactors bankrolled eugenic research. Eager students went out and did field work. Worried or enthusiastic families filled out surveys. Professors wrote books and taught classes. Anarchists like Emma Goldman found common cause with reactionaries like Alexander Graham Bell. Universities added curricula. Better-baby and fitter-family contests toured county fairs. States passed sterilization laws. If the rhetoric around eugenics had seemed fraught before the First World War, it was worse afterward. By the 1930's, the stage was set for a drawn-out, catastrophic act, even as the "eugenics movement" began to fracture and diffuse.

Fixing a Hole

"The science of statistics is the chief instrumentality through which the progress of civilization is now measured, and by which its development hereafter will be largely controlled...

"...all the sciences are dependent in ever increasing degree upon the science of statistics. All of them now recognize it as the key, the "open sesame" to further progress...

"We in the United States need all the aid we can get from statistics, need it more than any other people, for more than any other we are living in the period of change. Transformation in many of our methods is in progress...

"The people are engaged in a fierce struggle to regain from the politician the control and management of the government, national, state and local...

"Within this period statistics have become the foundation of modern government, both in the administrative and the legislative branches. There is no phase of the relations of government to people which legislation does not now enter; no problem with which it does not seek to deal; no innovation from which it shrinks; no condition for which it has not a panacea. It is only by the statistical search-light that we can determine the effects and defects of all this mass of new legislation covering so many strange and untried fields.

"Thus the life of the [American Statistical] Association covers the development of statistics into an exact science, its application to all fields of human activity, its utilization as the standard for the measurement of human progress, and its acceptance as the test of the trend and the tendencies of that progress.

"Within a short period of time, this new science of statistics has been so effectively organized as to afford a surer horoscope of the future than any agency that has heretofore existed...

"In 1890 came another innovation in census work which, in its immediate results and its ultimate possibilities, may be described as epoch-making in statistics -- the introduction of automatic tabulation ...

"I cannot detain the reader with a statement of the correlation of the data of individual elements of the population, in combination with other data beyond the reach of hand tabulation, which this invention opened up. The sociological value of the minuter statistical presentation of demographic data thus brought within reach, is not yet fully understood and only partially realized. Without it we could never hope to lay bare all the truth we must have, if we are to cope successfully with the problems growing out of the heterogeneous commingling of races which our defective immigration laws are forcing upon us...

"Closely connected with vital statistics, is the statistical study of the defective, delinquent and criminal classes... It is a study as essential as that of vital statistics to the well being of the human race. We have recently come to realize its importance, in the awakening of scientists to the fact that there is a science called eugenics, and the relationship which this science bears to human progress and sociological advance. The need for restraining the genetically deficient classes and families from the function of reproduction, is recognized as imperative... The work of the Eugenics Record Office at Cold Spring Harbor, on Long Island, organized by Dr. C. B. Davenport for research in human heredity and its application to human affairs, is making gratifying progress, and finds the statistical method its most effective instrumentality....

"I beg you to notice that in what I have enumerated in tedious detail of the applications of statistics in modern life, I have only grazed the surface. Statistics create an endless procession of moving photographs of the work and civilization of today.

"These statistics are compiled because men use them and cannot intelligently conduct their business without them... They are the basis of the new rule of Publicity now acknowledged to be the best safeguard of both private and public interests. They are the basis of the new science of Efficiency which is working a revolution in industrial methods. They are the only check that exists for the restraint of speculation and the emancipation of the many from the iron domination of the few. They are not always sufficient to accomplish that, but they do place in the possession of all information which formerly did not exist or was confined to the few...

"We have seen that the governing laws of the social body can only be discovered by the accumulation of statistical facts ... It is the dream of the true statistician that the day will some time arrive when the facts of demography will be available on identical bases for the entire globe. When that dream is realized, when comparable international statistics actually and everywhere exist, then we shall know the laws which determine human progress and can effectively apply them...

"Some of us have faith to believe that the day of universal justice is coming to the world, that it draws yearly nearer, and that in the end it will make international wars impossible. We recognize no agency more effective to this end than the statistical method, through which alone we can gain complete knowledge of ourselves and of other peoples, and measure the relative progress of each and of all.

"Thus the science of Statistics in the large sense is the greatest of all the sciences, for beyond all others it becomes the international bond of union. Behold therefore within the life-time of the Association, through this young science of ours the whole world is akin!"

from Seventy-Five Years of Progress in Statistics: The Outlook for the Future, by Simon Newton Dexter North, Assistant Secretary and Statistician of the Carnegie Endowment for International Peace, on the occasion of the 75th anniversary of the American Statistical Association, February 13, 1914. 22

Adventures in Tabulation

Simon Newton Dexter North, the first director (1902) of the US Census, and one-time president of the American Statistical Association, held a utopian view of the power of statistical data analysis that rivals that of any 21st-century advocate of open data and open government. Statistics were the bedrock of governance and the pathway to world peace - once certain "problems" were solved. The early census had special counts for the blind, deaf, poor, insane, criminal and feeble-minded populations, a specialty of obvious import to the eugenics movement; legislative attempts to create additional categories to track other groupings were made in the 1920s.

North also presided over the full extension and improvement of electro-mechanical data processing in the US census system. Tabulating machines using the Hollerith punched card had been introduced in 1890 - the punched card that later became a storage medium for early computers.

Data-crunchers at the Galton Lab and the ERO recognized the relevance of electric punched-card systems to their project - their "computers" tended to be human (and women) - but they appear largely to have failed to adopt them, possibly because of the cost. The Galton Lab had some less elaborate calculating machines, and the ERO used Hollerith machines for at least one study, but the full potential of punched-card-enabled eugenics would be reached outside the Anglo-American community, later, in the 1930s and 1940s.

In his biography of Galton, Pearson relates that Galton himself invented a mechanical notched-card sorting machine in the 1880's:

The first difficulty, however, of the border-line cases, which involve such a large proportion of the population and therefore the multiplication of cards, in several groups, Galton got over by what he termed a "mechanical selector." I have not found any 'selector' described before 1888, but many since, all involving Galton's principle, some patented, without any recognition of Galton's priority. The idea is indeed a very simple one; each individual has a card 8 to 9 inches long... The cards are placed vertically and loosely in a box divided into batches by partitions so that there is not sufficient friction to interfere with their independent motion. The bottom of the box... is replaced by a "keyboard" as Galton termed it; this keyboard is of the breadth of the variate portion of the cards, and can be elevated by a lever... When the keyboard is elevated its wires pass into the notches of those cards which are within possible errors of the individual set on the keyboard -- all the other cards but these are raised and thus discriminated from those which require examination... Galton considered that this mechanical selector of which he gives ample drawings could deal with 500 cards at a time. 23
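As a rough illustration only, the selection principle Pearson describes can be sketched in Python. The encoding is an assumption, not Galton's design: each card's notched edge is modelled as a tuple of numeric variates, and a wire "passes into a notch" when the card's value lies within a fixed tolerance (the "possible errors") of the keyboard setting. The names `Card` and `mechanical_select` are hypothetical; the 500-card limit is taken from Pearson's account.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Card:
    """One individual's card; `variates` stands in for the notched edge (assumed encoding)."""
    label: str
    variates: tuple[float, ...]


def mechanical_select(cards, keyboard, tolerance, capacity=500):
    """Return the cards that stay down when the (hypothetical) keyboard is raised.

    A card stays down only if every wire can pass into its notch, modelled here
    as each variate lying within `tolerance` of the keyboard setting. All other
    cards are lifted and thereby set aside.
    """
    if len(cards) > capacity:
        raise ValueError(f"the box holds at most {capacity} cards")
    selected = []
    for card in cards:
        if len(card.variates) != len(keyboard):
            raise ValueError(f"card {card.label} does not match the keyboard width")
        if all(abs(v - k) <= tolerance for v, k in zip(card.variates, keyboard)):
            selected.append(card)
    return selected


# Usage with made-up values: only cards A and C fall within 0.3 of the setting.
cards = [Card("A", (5.0, 3.0)), Card("B", (6.5, 3.1)), Card("C", (5.2, 2.8))]
selected = mechanical_select(cards, keyboard=(5.0, 3.0), tolerance=0.3)
print([card.label for card in selected])  # ['A', 'C']
```

In Galton's device every card was tested at once by a single lift of the lever; the loop above is only a sequential stand-in for that parallel mechanical test.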

It's worth noting, too, that the final version of Galton's pocket registrator, which he used to create his Beauty Map, appears to have been a fairly sophisticated mechanical "disk drive," built by a manufacturer of surgical instruments, with five rotating paper disks (or "dials," as Pearson called them) and needles that acted as styli.

Of Policies and Hole Punchers

The earliest intersection of statistics, eugenics and industrial-strength electric punched-card tabulators occurred in the insurance industry. Pearson's research was "regularly drawn on" by insurance companies, and research by insurance companies was valued by eugenic researchers. 24 Galton had sought to obtain data from insurance offices as early as 1905. New York Life provided the American Genetic Association's eugenics committee with two million mortality records in 1914. 25 Life insurance company representatives attended and exhibited at the International Congress of Eugenics.

Insurance companies were also private industry's earliest adopters of punched-card tabulation. In 1914, Metropolitan Life Insurance reported a large volume of punched-card use:

Total handlings of cards and bonus vouchers: 362,742,111

A new system of accounting by means of Pierce perforated cards is being installed. These cards have spaces for punching fifty different sets of holes, and various combinations of these holes correspond to numbers which represent the data required. Such data as premiums and ages appear in numerical form, but code numbers have been given to each state, kind of policy and other data which do not naturally appear in numerical form...

By means of this machine, classifications are made of the total business issued, canceled and revived, according to years of issue, kinds of policies and ages, and also by states and districts, by occupation, by nationality, by family history, by personal history and by numerous other subdivisions. Much more work may be completed in a shorter time than hitherto, as the sorting machine handles 150 to 200 cards a minute and the tabulating machine about 80 a minute. It is believed that these are the most complete machines for complex statistical purposes that have as yet been installed in any office. 26

Quantified Health

A "Life Extension Institute" was formed in 1913, with the collaboration of the Metropolitan Life Insurance Company, doctors, academics, and leading eugenicists - such as Charles Davenport and Alexander Graham Bell from the ERO, David Starr Jordan from Stanford, J.H. Kellogg from Battle Creek Sanitarium, and Yale economist Irving Fisher, among many others. 27

LEI's official publication, How to Live - a lifestyle guide for the purpose of preventative healthcare - bridged the hygiene movement with the eugenics movement. The book delineated the everyday practices which promoted health and protected germ plasm.

"We should never forget that this germ plasm, which we receive and transmit, really belongs, not to us, but to the race; and that we have no right, through alcoholic or other unhygienic practises, to damage it; but that, on the contrary, we are under the most solemn obligation to keep it up to the highest level within our power. We are the trustees of the racial germ plasm that we carry."

"...For us of the present generation, hygiene is of immediate concern; but if we are to build for future generations, hygiene must give way to, or grow into, eugenics. The accomplishment of a true eugenic program will be the crowning work of the health movement and the grandest service of science to the human race."

An apt subtitle would have been "The Eugenics of Everyday Life." A separate term in use at the time, "euthenics," denoted eugenical care of the body and domestic environment. It's tempting, but perhaps overreaching, to detect in "Life Extension" and euthenics a forebear of the "quantified self" movement, albeit one whose technology extended only to periodic urinalysis, calorie counting, nutrition, cookery, hygiene, habit modification and manipulation of the domestic living environment. Researching one's own genealogy also takes on aspects of self-tracking when done in the service of eugenics and of assessing one's genetic fitness.

The Book of Likes and the Life History Book

One of Galton's projects in the 1880's was a Life-History Book. In 1912, the physician Havelock Ellis, in The Task of Social Hygiene, related a plan for implementing it devised by Henry Hamill, who compared the Life-History Book to the 19th-century fad of "Books of Likes and Dislikes":

As young people circulate their "Books of Likes and Dislikes," etc., and thus in an entertaining way provide each other with insight into mutual character, so the Life History need not be an arcanum - at least where people have nothing to be ashamed of. It would be a very trying ordeal no doubt to admit even intimate friends to this confidence. But as eugenics spread, concealment of taint will become almost impracticable and the facts may as well be confessed...

As the facts of individual evolution would be noted, so likewise would those of dissolution. The first signs of decay -- the teeth, the elasticity of body and mind -- would provide a valuable sphere for all who are disposed to the diary habit. The journals of individuals with a gift for introspection would furnish valuable material for psychologists in the future... The book might have several volumes, and that for the periods of infancy and childhood might need to be less private than the one for puberty... lovers might communicate their life histories to each other as a preliminary... Not everyone may agree with this conception of the Life History Album and its uses. Some will prefer a severely dry and bald record of measurements... The important point is to realize that, in some form or another, a record of this kind from birth or earlier is practicable, and constitutes a record which is highly desirable alike on personal, social, and scientific grounds.

The case that Eugenics Societies also comprised a social movement of self-trackers is strongest where practicing physicians and hygienists, like John Harvey Kellogg and LEI's Eugene Fisk, participated; overall, however, the standard narrative of eugenics emphasizes the "Quantified Other."

Welcome to Your Disease

Metropolitan Life agreed to allow LEI to conduct periodic medical examinations of all its policy-holders. LEI reported in 1914 that 100,000 people had been given medical examinations, a number that eventually reached 1.5 million. 28 The data from the exams was made available to the insurance companies for analysis, with the assurance that it would not be used to deny individuals coverage; the examinations were, however, used by employers to deny employment:

The Guaranty Trust Company of New York, for instance, requires all applicants to submit to a physical test prepared by the Life Extension Institute. It establishes a complete health history of the person and enables the company to know whether or not the applicant is physically fit for his work and, what is even more important, to know whether or not the new employe may be a menace to the health of others. All bodily conditions are measured according to set standards and graded "A," "B," "C," etc. None below the "C" stage is admitted to employment. 29

Incredibly, Eugene Fisk, LEI's Director of Hygiene, reported that the examinations revealed that the majority of workers suffered from impairment requiring surgery:

"In the examination of a large group of supposedly healthy persons busy at their work, the Life Extension Institute found practically all showing some form of impairment and more than 50 per cent, in need of surgical attention." 30

This absurd prevalence of alleged medical impairment has a contemporary equivalent. The genetic screening tests described in a 2010 paper, A universal carrier test for the long tail of Mendelian disease, show that 35% of all people carry a lethal recessive gene. That figure is considered low, since the estimate is that every person carries four or five lethal alleles, and "more than 90% of the Mendelian disease burden remains to be accounted for" 31 - which is almost certainly true, but is also insignificant most of the time. The paper argues for universal pre-conception testing for rarely seen "long tail" Mendelian diseases on the basis of a cost-benefit matrix, and also on the basis of the particular economic demand such testing will create: "Just as the personal computer increased the demand for computer scientists and promoted computer literacy, so too will the personal genome increase the demand for medical geneticists and promote genetic literacy." 32

Not Quite Google Health

It appears that if LEI had succeeded in maintaining the participation of the insurers and other stakeholders, it could effectively have created the first "universal" health record database - an early attempt at a permanent and putatively predictive health care system for individuals, with participation from industry, insurance, the state and medicine (with 9,500 doctors in its network 33). Competition among various inventors of tabulating devices, largely backed by different insurance companies, may have been one factor that kept a universal database from coming into existence. LEI was also enjoined from practicing medicine by New York State in 1935. 34 In the 1970s, it was acquired by the technology company Control Data Corporation, after which it seems to have developed a specialty in treating celebrity athletes for addiction problems.

The motives and aims of insurance industry actuaries and eugenics researchers overlapped, but there was an important distinction. The actuaries sought two things: to identify risk factors in order to deny policies and so reduce costs, and to understand risk factors in order to mitigate them, prolonging life and so reducing costs. The eugenicists had little interest in prolonging the lives of the "unfit."

"An Orgy of Tabulation"

Programme of the Eugenics Society of the U.S.A.

"Important steps in the field of eugenic education should be taken immediately. A wide-spread and profound interest must be stimulated in the recognition and analysis of the biological factors in civilization...

"The first step is to 'teach the teachers,' to digest existing knowledge of eugenics and get it into the hands of teachers, preachers, lecturers, etc., in a suitable form for retailing to the public...

"Care should be taken to square eugenics with rational democratic ideals, by exposing false claims of class superiority and espousing equal opportunity to demonstrate intrinsic merit. The burden and menace to the average man and woman, which come from neighbours who are constitutionally feeble-minded, criminal, insane, vicious, or incapable, should be emphasized...

"In promoting such systematic education there is need to plan and publish an outline of education, together with a list of suitable textbooks and teachers' manuals with notes on supplementary reading and practical laboratory work on one's own pedigree and other eugenic problems.

"Because mental tests are being developed and standardized to a very accurate and practical degree, it is highly desirable that both general and specialized tests for mental ability be widely used by schools, institutions and societies, and that records of tests be preserved so that the mental development of the particular child may be traced from year to year." 35

Education as Social Pruning

Concurrent with its colonization of the health and insurance systems, eugenics infiltrated the education system. Because many eugenics advocates did not believe that the children of the unfit could be improved through education, the task of the education-minded eugenics advocate was manifold.

The U.S. school reform movement of the 1910's and 1920's was marked by hysteria over immigration, race-mixture, the movement of poor workers and their families from farms to cities, and fear of an industrial labor revolt - which proper education would preclude. Intelligence testing was an innovative tool to counteract the anxieties produced by these trends.

Inventing Intelligence

An intelligence test developed by the French psychologist Alfred Binet was revised by eugenicists, primarily Henry Herbert Goddard, Charles Spearman and Lewis Terman. Their work formed the foundation of the science of psychometrics, whose origins can also be traced back to Galton and Pearson: Galton pioneered studies of reaction time and visual acuity; Pearson refined the statistical correlation techniques indispensable to the development of the tests. 36

In 1904, Charles Spearman, an innovative statistician, psychologist, and eventual colleague of Pearson at University College, applied correlation techniques to scores from a battery of tests to derive the concept of "general ability," or "g" - a hypothetical variable representing intelligence. 37

"every intellectual performance may be regarded as proceeding from two distinct factors: on the one hand the specific ability or disposition for that particular performance, and on the other general ability due to the common fund of intellective energy." 38

Spearman had a social vision for testing based on this theory:

"This determination is becoming so easy that it might well be carried out regularly. It seems even possible to anticipate the day when there will be yearly official registration of the 'intellective index,' as we will call it, of every child throughout the kingdom.

"This registration controlled and digested by an expert bureau could scarcely fail to shed a flood of light on many vital problems. We should learn how the intellective index depends on age, sex, nutrition, physical exercise, fatigue, climate, formal training, etc. Even the influence of heredity would become much more accessible to study...

"In course of time there seems no reason why the intellective index or system of indices should not become so well understood as to enable every child's education to be properly graded according to his or her capacity. Thus the present difficulties of picking out the abler children for more advanced education, and the 'mentally defective' children for less advanced would vanish in the solution of the more general problem of adapting education to all.

"Still wider, though doubtless dimmer, are the vistas opened up as to the possible consequences in adult life. It seems not altogether chimaeric to look forward to the time when citizens instead of choosing their career at almost blind hazard will undertake just the professions really suited to their capacities. One can even conceive the establishment of a minimum index to qualify for parliamentary vote and above all for the right to have offspring." 39

Putting theory into practice, in 1908 Henry Herbert Goddard translated the Binet intelligence test into English, and the revised Binet test became the basis for the application of Spearman's principles. Goddard was the director of research at "The Vineland Training School for Feeble-Minded Girls and Boys" in Vineland, New Jersey. He is known for coining "moron" as a testable category and for his influential work of eugenic analysis, The Kallikak Family: A Study in the Heredity of Feeble-Mindedness (1912), which traced the genealogical origin of hundreds of members of a "defective" New Jersey family back to a Revolutionary War-hero forefather.

Building a Better Gene Pool at Stanford

Building on Goddard's work, Lewis Terman, of Stanford University, developed the Stanford-Binet intelligence test, whose latest revision is still in use today.

Terman, who popularized the term "I.Q.," saw the test as a eugenic tool:

"Thus far intelligence tests have found their chief application in the identification and grading of the feeble-minded. Their value for this purpose is twofold. In the first place, it is necessary to ascertain the degree of defect before it is possible to decide intelligently upon either the content or the method of instruction suited to the training of the backward child. In the second place, intelligence tests are rapidly extending our conception of "feeble-mindedness" to include milder degrees of defect than have generally been associated with this term. The earlier methods of diagnosis caused a majority of the higher grade defectives to be overlooked. Previous to the development of psychological methods the lowgrade moron was about as high a type of defective as most physicians or even psychologists were able to identify as feeble-minded...

"It is safe to predict that in the near future intelligence tests will bring tens of thousands of these high-grade defectives under the surveillance and protection of society. This will ultimately result in curtailing the reproduction of feeble-mindedness and in the elimination of an enormous amount of crime, pauperism, and industrial inefficiency." 40

And for good measure:

"...not all criminals are feeble-minded, but all feeble-minded are at least potential criminals. That every feeble-minded woman is a potential prostitute would hardly be disputed by any one."

Terman believed that, due to defective defective-detection, gifted children were at risk of being lost in the crowd, consigned to "lifelong habits of sub-maximum efficiency." The factors affecting intelligence could be "sifted, weighed and measured." Given enough time and enough testing, the testing regime would be honed to greater perfection, removing ambiguity from variant achievement levels, and settling the issue of nature vs. nurture in the realm of intelligence. Terman's manipulation of test results using statistical correlation methods reinforced his belief that the superior performance of children from higher social classes was due to the "original endowment" of the children, not to environmental or class factors.

OK Darwin

With Stanford President and fellow American Eugenics Society member David Starr Jordan, and Stanford-educated eugenicist Paul Popenoe, Terman helped found the "Human Betterment Foundation," an organization whose mission was to investigate the effects of California's eugenic policies and laws, and to coordinate and disseminate research from leading biological laboratories to schools, social workers and physicians. 41 The foundation's report, Sterilization for Human Betterment, authored by Popenoe, inspired Nazi eugenic policies after 1933. 42 Terman and his students also had consulting roles for Popenoe's American Institute for Family Relations, a eugenical marriage counseling organization, which used Terman's psychological tests for matchmaking purposes. 43 A modern entrepreneur would perhaps have called it "OK Darwin."

Terman's son, Frederick, became Provost of Stanford, and is widely credited, along with Stanford professor and eugenics advocate William Shockley, as a father of "Silicon Valley" - that locus of promiscuous, fecund and borderless interplay between academia, government, the Defense Department and private industry. In California, even the companies have family trees:

To this day, a poster of the Fairchild family tree, showing the corporate genealogy of the scores of Fairchild spin offs, hangs on the walls of many silicon valley firms. This picture has come to symbolize the complex mix of social solidarity and individualistic competition that emerged in the Valley. The tree traces the common ancestry of the region's semiconductor industry and reminds engineers of the personal ties that enabled people, technology and money to recombine rapidly into new ventures. The importance of these overlapping, quasi-familial ties is reflected in continuing references, more than three decades later, to the 'fathers' (or 'grandfathers') of Silicon Valley and their offspring, the "Fairchildren." 44

In the quasi-families of Silicon Valley, government contracts and academic nurture precede nature.

The Quantified Curriculum

The educators who are known as the "fathers of the curriculum" of the U.S. public school system were eugenicists: "genetic psychologist" G. Stanley Hall, proto-behaviorist and educational psychologist Edward Thorndike, and the author of the foundational text, The Curriculum (1918), John Franklin Bobbitt. 45

In Practical Eugenics (1909), Bobbitt stated:

At present our doctrines of heredity are not as they were. We are coming to see that heredity is dominant in the characters of men. Human plasticity is not so great as has been assumed. A child cannot be moulded to our will. The design laid in heredity is the only one that can be worked out in actuality. The actual is only a realized copy of the potential. It is true the potential is drawn in rather broad lines thus permitting the necessary degree of adaptation; to this extent the individual is plastic. But recent statistics of heredity show that the possible deviation is not great, except downward in the direction of breaking and marring. The actual may fall far below the possible but cannot transcend it. If the parentage of the coming generation in our country is on an average poorer than our present average, then the average endowment of the coming infant harvest will be below that of the present. To educators and philanthropists this means a poorer raw material on which to work and an increase of the educational difficulties which are at present sufficiently bewildering.

Although he endorsed eugenic marriage laws, forced sterilization and the segregation of the unfit, Bobbitt did not think it practical or wise to deny them charity or education, which he regarded as a step backwards. In The Curriculum, he developed an educational practice that focused heavily on measuring and assessing individual abilities, with the aim of steering students into their proper slot in the social and employment hierarchies:

An age of science is demanding exactness and particularity.

The technique of scientific method is at present being developed for every important aspect of education. Experimental laboratories and schools are discovering accurate methods of measuring and evaluating different types of educational processes. Bureaus of educational measurement are discovering scientific methods of analyzing results, of diagnosing specific situations, and of prescribing remedies. Scientific method is being applied to the fields of budget-making, child-accounting, systems of grading and promotion, etc.

The central theory is simple. Human life however varied consists in the performance of specific activities. Education that prepares for life is one that prepares definitely and adequately for these specific activities. However numerous and diverse they may be for any social class, they can be discovered. This requires only that one go out into the world of affairs and discover the particulars of which these affairs consist. These will show the abilities, attitudes, habits, appreciations and forms of knowledge that men need. These will be the objectives of the curriculum. They will be numerous definite and particularized. The curriculum will then be that series of experiences which children and youth must have by way of attaining those objectives.

Bobbitt's vision maps the intersection of eugenics, scientific management, the social efficiency movement and the "big data" of his time.

A Little to Disquantity Your Training

In 1917, educator Harold Rugg published Statistical Methods Applied to Education: A Textbook for Students of Education in the Quantitative Study of School Problems. It was a Pearson-influenced overview not merely of intelligence testing, but of quantitative methods applied to every aspect of education practice and administration. Rugg, who had also worked with Terman and Goddard on the development of intelligence testing for the U.S. Army, but who was no eugenicist, had a change of heart in 1918: he turned away from statistics and went on to pioneer the "social studies" curriculum. 46 Looking back from the perspective of the 1940s, he wrote:

"We lived in one long orgy of tabulation. Mountains of facts were piled up, condensed, summarized and interpreted by the new quantitative technique. The air was full of normal curves, standard deviations, coefficients of correlation, regression equations. I was only one of a very large band of intellectuals, outside the universities as well as inside, who with Lippmann and company were proclaiming salvation through fact finding. The decade just passed had witnessed the establishment of one great foundation after another and the setting aside of hundreds of millions of dollars, the income from which was to be devoted to finding the facts of life... A hundred corporations were setting up research divisions. Half as many universities had long since set trained minds at finding the facts of agriculture, industry, health, education, what not, and five times that number of school systems had already introduced departments of research. The first third of the twentieth century was indeed a great fact finding era in American intellectual life." 47

In this context, it becomes apparent that contemporary education reform initiatives in the United States, with their emphasis on testing, standardization, and data-gathering, represent a return to past practices, rather than a break from them. The new element is the scale, detail, speed and duration: of the quantification, of the classification, of the analysis, and of the surveillance. A close look at the software, taxonomic schemata and data infrastructure being developed by the Ed-Fi Alliance and its associated vendors would reveal the birth of an immense monitoring system that appears destined to be part of a lifelong engagement with future students even after they leave the school system.

Previsioning the Social

"Scientific management demands prevision - accurate prevision. It demands understanding that sees all factors in true and balanced relation without any distortion due to claims or oppositions of special interests. This means that scientific survey and analysis of human needs must be the method of discovering the objectives of the training that is demanded, not by individuals, but by the conditions of society."
- John Franklin Bobbitt, The Curriculum (1918).

Where's Waldo

After the Second World War, eugenics was generally perceived as discredited, and statistics and genetics were seen as emancipated from their association with it. 48 To some degree, this disassociation had begun in the 1930s, both as a self-preservation strategy of the eugenicists and as a result of opposing views within genetic research. 49

Scholarship shows that eugenics persisted. In one narrative, it was re-imagined as a "national productivist" eugenics, transitioning, as Alberto Spektorowski and Liza Ireni-Saban assert in The Politics of Eugenics, from a "racial hygiene" eugenics to a "national hygiene" eugenics, reflecting nativist anxiety over economically unproductive, rather than biologically defective, people. 50 They trace this continuity in the policies of the Scandinavian welfare states, which have had legal frameworks promoting voluntary sterilization and genetic counseling, in a context of elite anxiety over demographic change and immigration.

In another common account, eugenic "pseudo-science" was subsumed into the hard science of genetics, went dormant, and is waiting to re-emerge at some point in the future, once it acquires a solid scientific base. This possibility is certainly suggested by the appearance of numerous pro-eugenic books, such as Davenport's Dream, in the early 21st century, alongside the decoding of the human genome. Statistics is still implicated in this scenario: to the extent that intelligence and dispositional traits are genetic, they are polygenic - so widely distributed across the genome that their "location" is not likely to be known. 51 For neo-eugenicists, the biggest big data in our genes would be as elusive as ever, an invisible network inferred via mathematics. The human genome has been sequenced, but statistics will be the primary approach to discovering what it means; the future of statistics is bound to genetics and its enormous pool of data.

Eugenics as a Science of the Social

A provocative thesis is put forward by Sanem Guvenc-Salgirli, in a dissertation, Eugenics As Science Of The Social (2009). In Guvenc-Salgirli's formulation, eugenics was not pseudo-science, but a new scientific paradigm:

...It was not merely an ideology of the professional middle classes as Mackenzie argued, nor a mere social engineering, or social technology that aided the states for better control over the population. It was the science of the new social order that was going to be invented and created by the intellectual aristocrats. In imagining such order, it did not matter where the political interests rested. Left or right, all of them shared similar upbringing, education, culture and ambitions. In other words, all of them knew how to share a dinner table, and it was the economy of that dinner table that they wanted to preserve...

...if we are to argue that eugenics was a proper science then we need to inquire about its object. I claim that object to be the social world, and eugenics as science and eugenicists as scientists tried to create and define their object, i.e. they tried at the same time both to create and define the social world.

Eugenics was not a biological science, but eugenicists used genetics research, as much as they used psychiatry, sociology, and anthropology. But again, it was not a social science. It was not trying to explore and find out about the rules of the social world; it was trying to invent it... Eugenics was a proper science with an improper object; it was the science of the social ...

...They defined social life, the components of it through heredity. However, heredity was not a biological concept for them, but rather, it was social, and as such, it was not reserved for the individual human being per se. The individual was part of a bigger collective and defined by the characteristics of that collective. In other words, the individual was not a solitary human being. Within the individual the eugenicists saw the family and through family they saw a particular class; in short the individual signified the class it belonged to. As such, the unit of analysis of the science of eugenics was never the individual human being, but it was the collective that the individual was part of.

The totality of the social world, i.e. streets, promenades, restaurants, and even homes constituted the elements that needed redefining and reordering...

Through such dissemination, accompanied by a will to dominate the social morality, eugenics went beyond the confines of the genetic science. Moreover, eugenicists were well aware of the pitfalls of Mendelian genetics... Consequentially, there was the recognition that even the genetics of the first half of the twentieth century was not precise. And if eugenics was to become a science it needed such precision. The eugenicists invented statistics instead ... 52

Historian of science Theodore Porter notes that today statistical groups persist, but have also multiplied and narrowed, and are less often dependent on the imposition of broad yet bounding categories like race, nation or ethnicity:

In recent times, there has been a strong move to individualize statistical results. Statistics grew up as the science of mass observation and mass society, of collectives and wide averages. Although the controlled experiments described by Pearson's successor (and bitter rival) R. A. Fisher allowed statistics to be applied to relatively small populations, it was usually with the intention of universalizing the results. But now there are increasingly powerful tools of analysis and data management -- I'm thinking especially of computers -- that allow the manipulation of information regarding all kinds of sub-categories, even rather small ones. In a database of millions or hundreds of millions, few will be so odd or unique that they can't be grouped with others to support a statistical analysis. Beyond that, there is a great interest now in finding ways to combine categories, and thereby to individualize still further.

...Statistics has always proceeded by grouping and counting rather than trying to fathom the infinite complexities of the individual. This aspect of statistics will not melt away, but the ambition of our time is to tighten and narrow the categories rather than to apply great averages indiscriminately. 54

That this targeting apparatus has narrowed to the point of de-individuation, or can generate interesting "remixes" of categories in which "the ongoing formulation of the social replaces what historically have been considered social formations" 54.1, does not preclude its use for a conventional social purpose: perhaps as an act of exclusion that validates an existing or newly devised whole. Such a whole might go by a name not commonly understood to be explicitly social, like "the human genome," which carries metaphysical heft as effectively as "race."

Eugenics as an Information Science

Guvenc-Salgirli submitted Eugenics as Science of the Social in 2009, but the dissertation focused on medical practice in 1930s Istanbul and nowhere makes a connection to information science or to "big data." Building on her concept of eugenics as a science of the social, however, one could claim more specifically that eugenics was a creatively ambitious branch of information science: institutionally, eugenics manifested not as prisons, schools or hospitals, but as research organizations whose products were data archives and research materials for use by those institutions.

In 1919, the Carnegie Institution reported that the Eugenics Record Office had devised a new classification scheme for its records, incorporating the Dewey Decimal System into its library scheme:

"A new scheme for classifying filing and indexing all records books and correspondence of the Eugenics Record Office has been worked out and described in detail in a mimeographed pamphlet of 23 pages. These instructions provide for the three types of eugenical records the archives the library and the correspondence files. The new system is based upon the experience of the past decade in classifying and indexing eugenical material. The Dewey Decimal System is incorporated into the library scheme. The plan for classifying the archives is a new one in which the different types of records are distributed among 19 files, each designated by a distinctive letter. The material within each file is classified according to the trait book...

"This report brings the total number of index cards reported up to 684,064. Since each card has space for 40 entries (though in most cases there are only 1 or 2 entries) it is certain that the entries must be much over 1,000,000, and probably nearly 2,000,000. Of special field workers reports we have now 56,825 pages. Of the record of family traits there are on file approximately 3,000." 53

The Trait Book had its own internal decimal classification scheme as well.

The disciplines and research methods of eugenics and information science overlap, spanning information retrieval, classification, categorization, bibliography, taxonomy, nomenclature, archival and library science, mathematics, case studies, histories, interviews, biographies, longitudinal studies, and field research.

Justice Previsioned

In the interview excerpted above, Theodore Porter indirectly references the "original position" and "veil of ignorance" which philosopher John Rawls considered foundational to his theory of justice, implying that the foreknowledge embodied in big data poses a fundamental political risk:

...individuation has negative as well as positive consequences. Many of the most important institutions of the welfare state depend on a statistical perspective. Social insurance involves what we may call statistical communities, communities that depend on ignorance of details. It's like the social contract proposed by John Rawls, in which we are imagined to choose political arrangements before we know anything about our station in life. Since we are all subject to medical hazards and don't know who among us will suffer serious illness, we (more or less) willingly share the risks through taxes or other mandatory contributions.

...Individualized medical prognostication would solve some important problems, but would tend to create new forms of inequality and also new forms of discrimination by employers, business partners, and -- who knows? -- maybe lovers and potential spouses as well. 55

Building upon Guvenc-Salgirli, Porter and Rawls (or any social contract theory reliant on thick or thin veils), one might further claim that eugenics was an information science whose aim was to provide a knowledge infrastructure for the systematic re-creation and administration of the social. It was to do so neither by shouldering the political burden of negotiating the "claims or oppositions of special interests," nor by accepting the a priori duty of voluntarily de-personalizing the ground for those claims in order to forge a social contract. Instead, it would operate as an automated regime driven by the embedding of individuals in a web of quantification, rendering incontestable what was previously contestable via the testimony of one's phenotype and genotype, medical, social or family history, and intelligence quotient. Seen from a contemporary perspective, this challenges the basis of both the deliberative and the agonistic conceptions of democratic politics through which liberal democracies seek to constitute a just society.

Into the West

"Send forth the best ye breed" - This is Kipling's cynical advice to a nation which happily can never follow it. But could it be accepted literally and completely, the nation in time would breed only second rate men. By the sacrifice of their best, or the emigration of the best, and by such influences alone, have races fallen from first rate to second rate in the march of history.

- Blood of a Nation, David Starr Jordan, President of Stanford University, 1903.

"A Western man," says Dr Amos Griswold Warner, "is an Eastern man who has had some additional experiences." The Californian is a man ...who has learned a thing or two he did not know in the East, and perhaps has forgotten some things it would have been as well to remember... The thing that he is most likely to forget is that the escape from public opinion is not escape from the consequences of wrong action.

- David Starr Jordan, California and the Californians, 1907

Love Me or Leave You

In October 2013, Balaji S. Srinivasan, the co-founder of a genomics startup which claims to test more than 3% of all prospective births in the United States, and who has taught data-mining, statistics, and computational biology in the Department of Statistics at Stanford University, made waves by suggesting that Silicon Valley "exit" from the United States. As the New York Times put it, he:

"told a group of young entrepreneurs that the United States had become 'the Microsoft of nations': outdated and obsolescent. When technology companies calcify, Mr. Srinivasan said, you don't reform them. You exit and launch your own. Why not do so with America?" 56

Using the exit-voice-loyalty conceptual framework developed by economist Albert Hirschman, Srinivasan somewhat incoherently invoked tropes of national decline, broken governance, immigration, emigration, depoliticization and the engineering project of world betterment:

"A company or a country is in decline, you can try voice, or you can try exit..."

"we're not just a nation of immigrants, we're a nation of emigrants: we're shaped by both voice and exit, starting with the Puritans..."

"What do I mean by Silicon Valley's ultimate exit? It basically means: build an opt-in society, ultimately outside the US, run by technology. And this is actually where the Valley is going. This is where we're going over the next ten years..."

"The Paper Belt may stop us from leaving, and that's actually what I think of as one of the most important things over the next ten years, is to use technology, especially mobile, to reduce the barriers to exit. With it, we can build a world run by software..." 57

Having discovered that his expertise in sequencing genomes has not guaranteed an uncontested path to his goal (the commercialization of genomes and of the methods with which they are studied), he wants to "give people tools to reduce influence of bad policies on their lives without getting involved in politics." The Paper Belt is not merely federal regulation, but all state and local government as well. In this vision, once the tech elite has fully emigrated, the artifacts of innovation and wealth produced by its deregulated opt-in society would trickle down from the 1% to the rest of the world.

Srinivasan, who co-founded the direct-to-consumer genetic screening company Counsyl (which retains copyright on the results reports of the individual genetic tests it performs), is also an author of the "Long Tail" paper on the universal prevalence of lethal alleles referenced above.

Scientific Exceptionalism

In the Silicon Valley exit kerfuffle we can see the persistence of a historical relation between social forces and ideas about remaking the social, with one difference: the suggestion that if "science" cannot enlist the cooperation of the state toward this goal, scientists and engineers have a duty to emigrate (metaphorically or physically) in order to pursue it.

From a contemporary perspective, eugenics is perceived as unthinkable, not merely because its effects were so awful, but also because of the very public role of scientists in shaping a eugenic society. Theodore Porter, in his biography of Karl Pearson and in a paper, "How Science Became Technical," notes the irony in the example of Pearson, "whose vision of statistics was allied to a powerful sense of public reason," but who was also responsible for the narrowing of science to a field of technical expertise, a narrowing achieved in large part through the introduction of statistics into science. 58 It has become unthinkable for scientists to use a public voice to express normative opinions on contested social issues, to the point where it seems natural to assert that the "social" is an improper object of science; when a scientist does speak out, it is perceived as radical. Yet nineteenth-century scientists, from all branches, had no such qualms.

Power enlists science through a demand for narrow and "detached" technical analyses. Srinivasan's complaint about "voice" should be construed not as an attempt to reassert the role of scientists in public reason, but as a reaction to the fact that his expert cost-benefit analyses are subject to any contestation at all. The new society of emigres he imagines would not be one in which scientists regained their former public role and normative voice, but one in which the current trend toward narrow expert analysis continues unopposed, unencumbered by a demos of non-experts, and executed with an arrogance that seeks to bypass not only an engaged public but also the bureaucratic administrators whom the scientific elite once supported.

Conclusion

Eugenics was a cluster of anxieties, ambitions, prejudices, fantasies, techniques and prescriptions. Although its practice was supposed to yield quantifiable probabilities, the record shows (for instance, in the case of the self-described "socialist" Karl Pearson using statistics to marshal arguments against Jewish immigration to Britain) that judgements were clouded, even as they arrogated objectivity and advocated for progress.

The earliest statistical researchers and eugenicists knew nothing about molecular genetics. Observational data and imaginary traits served as proxies for the missing genetic knowledge. Knowledge of genes, or the material support provided by the physical sciences, was not required for eugenics to be put into political practice: stoking anxiety over heredity and fertility was enough to persuade citizens, while supercharged statistics swayed the state. The early genetic scientists made remarkable advances in understanding the genetic basis of life and in developing statistical science. Concurrently, those same scientists and their elite cohorts, under the mantle of eugenics, applied statistical science to questionable data and normative assumptions in pursuit of social ends, which they sought to enforce, most often but not exclusively, by lobbying the state. If eugenics has a parallel today, perhaps it is a statistics-driven science that seeks to measure elusive or imaginary intelligences, traits and behaviors in order to justify certain social ends, executed with an arrogance that arises from genuine scientific achievement. This parallel will not necessarily have a racist component resembling its infamous predecessor, or a normative rhetoric to justify it, and the power lobbied to support its social ends may not in every case be state power. If anything, what distinguishes the power at work in "big data" from that of the eugenics movement is the extent to which economic forces are at play, directly and rapidly: a swath of the social is being embedded in a continuously recombinant web of data in which the probabilities encoded in our genes are aggregated with ubiquitous surveillance and the ontological tagging of our activities, and the resulting datasets circulate in an automated high-speed trading system in which our social activities become information labor and "personal genomes" become private property.

Big data tends to be thought of either as a technical problem involving the scale of storage, retrieval and analysis, or as a social problem involving privacy and exploitation; both framings have their value, but both are incomplete. The social problem is not merely that data is circulating, exposed and exploited, but that data is being used to remake, or perhaps destroy, the social field, specifically by narrowing the range within which autonomous deliberation and contestation can intervene in it, threatening both human agency and the idea of justice.

The purpose of this paper is not to argue that big data is eugenics, or to argue that genetic science is evil, but to show that "Big Data 1.0" was constitutive of eugenics, and to suggest it is worth examining which stars in the constellation of eugenics went dark, which stars remain, and what new stars have appeared. Recent analyses have been tempted to see "big data" as representing a break from the past, a so-called "datalogical turn" in which algorithms "modulate the emergent forms of sociality in their emergence" 58.1 in a manner that moves beyond representation and statistical analysis, but we can see here that the aim of eugenics was precisely to modulate emergent sociality, using the means at its disposal in its time. Today the cluster of ideas and ambitions behind the development of eugenics is largely decoupled from early eugenicists' obsession with sexual fertility and race, at least for the moment - the hereditarians, some of whom think race is a useful concept, are still out there, with newly honed arguments, but still having to devote resources to a push-back against "political correctness." The other parts of the constellation are still operative in social policy, perhaps with some cosmetic adjustments:

When the sequencing of the human genome was declared complete in 2003, the National Human Genome Research Institute announced the "era of the genome." NHGRI is obviously not an organization of poets; a poet might have intuited what the etymologist knows, that era derives from the Latin aera, meaning "counters used for calculation," 62 and rejected the portentous phrase as doubly banal.

On the Genome.gov website, the official NHGRI announcement from 2003 contains this interesting quote:

"'The Human Genome Project represents one of the remarkable achievements in the history of science...,' said Eric Lander, Ph.D., director of the Whitehead-MIT Center for Genome Research. 'Biology is being transformed into an information science...'" 63

As we have seen, that is not an innovation.


1 Karl Pearson and Statistics: The Social Origins of Scientific Innovation, Bernard J. Norton, Social Studies of Science, Vol. 8, No. 1, (Feb., 1978), pp. 3-34.

1.1 Eugenics as Science of the Social: A Case from 1930s Istanbul, Sanem Guvenc-Salgirli, Binghamton University, State University of New York, 2009, ch. 4, pp. 169-219

1.2 The History of Statistics: The Measurement of Uncertainty Before 1900, Stephen Stigler, p. 265

2 In the Name of Eugenics, Daniel J. Kevles, pp. 179-184, 201-202

3 Francis Galton: Pioneer of Heredity and Biometry, Michael Bulmer, introduction

4 Kevles, p. 16.

5 Francis Galton: Pioneer of Heredity and Biometry, Michael Bulmer, introduction.

6 Kevles, p. 14

7 Inquiries Into Human Faculty and Its Development, Sir Francis Galton, p. 14

8 "Are Composite Photographs Typical Pictures?", McClure's Magazine, September 1894.

9 Kevles, p. 17

10 Memories of My Life, Sir Francis Galton, p. 320

11 The Oxford Handbook of the History of Eugenics, Alison Bashford et al., p. 38.

12 Bashford et al., p. 37

13 Aims and Methods of Eugenical Societies, Science, Oct 7, 1921

14 Statisticians of the Centuries, edited by C. C. Heyde, p. 254.

15 Porter, p. 260.

16 The Life, Letters and Labours of Francis Galton, vol. 3a, Karl Pearson, Cambridge University Press, 1930, p. 2.

17 http://www.wired.com/science/discoveries/magazine/16-07/pb_theory

18 Karl Pearson: The Scientific Life in a Statistical Age, Theodore M. Porter, p. 249.

19 See "Practical Eugenics," a Pearson pamphlet from 1912, for elaboration of the statistics and interpretation.

20 Emancipation Through Interaction - How Eugenics and Statistics Converged and Diverged, Francisco Louca, Journal of the History of Biology, 2009.

21 Emancipation Through Interaction - How Eugenics and Statistics Converged and Diverged, Francisco Louca, Journal of the History of Biology, 2009.

22 The History of Statistics, Their Development and Progress in Many Countries, ed. John Koren. 1919.

23 The Life, Letters and Labours of Francis Galton, Karl Pearson, vol. 2, pp. 305-306.

24 The Oxford Handbook of the History of Eugenics, p. 10

25 The Eugenics Review, Volume 7, p. 86

26 The Metropolitan Life Insurance Company: Its History, Its Present Position In The Insurance World. Its Home Office Building And Its Work Carried On Therein. 1914.

27 Detroit Medical Journal, January 1914, Volume 14, pp. 149-150

28 Crossing Frontiers: Gerontology Emerges as a Science, W. Andrew Achenbaum, Cambridge Univ. Press, 1995, p. 42.

29 Office Management, Joseph French Johnson, Alexander Hamilton Institute, 1919.

30 Transactions, vol. XXII, Actuarial Society of America, 1921.

31 A universal carrier test for the long tail of Mendelian disease, Reproductive BioMedicine Online, Volume 21, Issue 4, pp. 537-551, October 2010.

32 A universal carrier test for the long tail of Mendelian disease, Reproductive BioMedicine Online, Volume 21, Issue 4, pp. 537-551, October 2010.

33 Masculinity, Work, and the Fountain of Youth: Irving Fisher and the Life Extension Institute, 1914-31, Laura Davidow Hirshbein, CBMH/BCHM, Volume 16, 1999, p. 110.

34 Ibid., pp. 110-111.

35 Programme of the Eugenics Society of the U.S.A., The Eugenics Review, 1923 October; 15(3):

36 Psychometrics of Intelligence, Peter H. Schonemann, Encyclopedia of Social Measurement, p.194.

37 Schonemann, p. 194.

38 The British Journal of Psychology, Volume 5, Cambridge University Press, 1913, p. 78

39 The British Journal of Psychology, p. 78

40 The Measurement of Intelligence, Lewis Madison Terman, Riverside Press, p. 5

41 The Molecular Vision of Life: Caltech, The Rockefeller Foundation and the Rise of the New Biology, Lily E. Kay, Oxford, 1993, p. 83.

42 Gentlemen's Disagreement: Alfred Kinsey, Lewis Terman, and the Sexual Politics of Smart Men, Peter Hegarty, U. Chicago, 2013, p. 51

43 Hegarty, p. 51

44 Regional Advantage: Culture and Competition in Silicon Valley and Route 128, AnnaLee Saxenian, Harvard, 1996, p. 31

45 Eugenics and Education: Implications of Ideology, Memory, and History for Education in the United States, Ann Gibson Winfield, Dissertation, Graduate Faculty of North Carolina State University, 2004, p.11.

46 The Rugg Prototype For Democratic Education, Ronald W. Evans, International Journal of Social Education, Volume 22, Number 2, Fall 2007-2008, pp. 101-135

47 That Men May Understand, Harold Rugg, p. 182.

48 Emancipation Through Interaction - How Eugenics and Statistics Converged and Diverged, Francisco Louca, Journal of the History of Biology, 2009.

49 For the Betterment of the Race, Stefan Kuhl, Palgrave Macmillan, 2013, pp. 71-72

50 The Politics of Eugenics, conclusion.

51 Kevles, p. 296

52 Eugenics as Science of the Social: A Case from 1930s Istanbul, Sanem Guvenc-Salgirli, Binghamton University, State University of New York, 2009, ch. 4, pp. 169-219

53 Carnegie Institution of Washington, Report of the President, 1919, p. 151

54 Life on the Bell Curve: An Interview with Theodore Porter, Paul Fleming, Cabinet Magazine, Issue 15, Fall 2004.

54.1 The Datalogical Turn, Patricia Ticineto Clough, Karen Gregory, Benjamin Haber, and R. Joshua Scannell, Non-Representational Methodologies, Routledge 2015, p. 157.

55 Life on the Bell Curve, Fleming, Cabinet Magazine, Issue 15, Fall 2004

56 Silicon Valley Roused by Secession Call, Anand Giridharadas, New York Times, Oct. 28, 2013.

57 http://www.youtube.com/watch?v=cOubCHLXT6A

58 How Science Became Technical, Theodore Porter, Isis, 2009, 100:292-309.

58.1 The Datalogical Turn, Patricia Ticineto Clough, Karen Gregory, Benjamin Haber, and R. Joshua Scannell, Non-Representational Methodologies, Routledge 2015, p. 158.

59 The Rhetoric of Eugenics in Anglo-American Thought, Marouf Arif Hasian, Univ. of Georgia, 1996, p. 147

60 http://www.wired.com/business/2013/11/bill-gates-wired-essay/all/

61 See To Save Everything, Click Here, Evgeny Morozov, Public Affairs, 2013.

62 http://www.etymonline.com/index.php?term=era&allowed_in_frame=0

63 International Consortium Completes Human Genome Project, http://www.genome.gov/11006929