Data-driven, theory-averse research is fuelled by the rankings hamster wheel
Aug 18, 2024

Big data has contributed to a cultural shift towards evidence-based decision-making in academia, industry, and government, which prioritises empirical evidence over theory-based inquiry. It has also been associated with the boom in the publication of shorter journal articles and the decline in the publication of scholarly books, fuelled by the publish-or-perish academic rankings hamster wheel, writes John Howard.
The rapid development of technologies related to big data, machine learning, and artificial intelligence has accelerated the shift from theory-driven to data-driven research. Researchers now have enhanced capabilities to analyse vast datasets, revealing previously undiscovered patterns, correlations, and trends.
The surge in data generated by digital platforms, social media, online transactions, and sensors has further propelled this transformation, enabling empirical research on an unprecedented scale. However, this transition brings potential hazards and disadvantages, particularly concerning science, research, and innovation policy.
The shift from theory-driven to data-driven research began gradually in the mid-twentieth century with the introduction of mainframe computers in the 1950s and 1960s. These computers generated vast quantities of administrative data, such as taxation and income security data, allowing researchers to explore connections between different systems.
The development of relational database technologies in the 1970s and 1980s transformed data storage and retrieval, making it easier to manage large datasets. Structured Query Language (SQL), created in the 1970s, laid the foundation for extensive data-based research.
Starting in the 1990s, the Internet significantly increased the volume of data produced and accessible for research. The early 2000s saw major advances in machine learning and artificial intelligence, enabling researchers to uncover intricate patterns and apparent correlations in data.
Data science emerged as a distinct discipline in the late 2000s and early 2010s, with universities offering data science programs and businesses increasingly requiring data scientists.
Data-driven science can be massively helpful, as in the case of Google DeepMind and protein folding. E-science/AI can assist in creating imaginative, cross-disciplinary hypotheses. The potential complementarity between data and theory has stimulated the emergence of new craft skills among scientists.
Big data technologies have facilitated the processing of enormous amounts of data across distributed computing environments. Cloud computing services enabled scalable and cost-effective data storage and processing, while the proliferation of IoT devices led to an explosion of data collection, increasing the demand for data-driven research methodologies.
Big data has contributed to a cultural shift towards evidence-based decision-making in academia, industry, and government, which prioritises empirical evidence over theory-based inquiry. It has also been associated with the boom in the publication of shorter journal articles and the decline in the publication of scholarly books, fuelled by the publish-or-perish academic rankings hamster wheel.
The foundation of theory-based research lies in formulating hypotheses and subsequently testing them empirically. In contrast, data-driven research explores large datasets without preconceived hypotheses, reducing the focus on hypothesis-driven research and diminishing the potential for generating new ideas and refining existing ones.
The advent of big data has shifted the emphasis of research inquiries, prompting researchers to focus on the “what” rather than the “why” of phenomena. This shift may lead to a greater emphasis on providing detailed descriptions rather than explanations in research, potentially limiting the extent and context of understanding.
To be sure, the shift to data-based empirical research has unlocked previously inaccessible areas of investigation, such as social media data, which provides insights into real-time public opinion and behaviour, and genomic data, which facilitates advances in personalised medicine.
However, excessive dependence on empirical data and statistical methodologies may overshadow qualitative research, curiosity, and theoretical speculation, thereby restricting the scope of scientific investigation. This shift fundamentally alters the nature of evidence and the role of theory in advancing knowledge.
For example, empirical studies on the Research and Development Tax Incentive (RDTI) have shown that it effectively facilitates increased business investment in research and development (R&D). Nevertheless, R&D expenditure as a proportion of gross domestic product (GDP) has declined since 2008.
A theory-based evaluation of the RDTI’s performance in this context would involve a more extensive array of assessment methods, including hypothesis testing, encompassing both qualitative and quantitative research.
Relying too heavily on correlations without investigating causal mechanisms increases the likelihood of drawing inaccurate conclusions. Data-driven techniques can identify correlations and statistical significance between variables, but they do not establish causal relationships, and acting on them can lead to ineffective or potentially detrimental interventions.
Moreover, an excessive focus on data could undermine the fundamental principles of research by obscuring crucial contexts for interpreting occurrences, formulating hypotheses, and designing experiments.
Data-based research is inherently empirical and essential for testing and validating microtheories, providing granular and detailed data on specific variables to reach what are presented as evidence-based conclusions. Researchers claim that their models and explanations, grounded in observable reality rather than speculation, enhance the credibility and reliability of scientific knowledge.
Policymakers and advisers often claim that their work provides insights into specific aspects of larger systems to understand and model more detailed interactions. They argue that by continuously improving theoretical models based on empirical data, they can develop more effective and adaptable solutions to emerging challenges.
However, this claim is contestable. It assumes that microtheories can be universally applied to understand and effectively model larger systems. In particular, data-based and empiricist microtheories tend to be developed within specific contexts and are difficult to generalise to broader systemic factors and macro-level dynamics. Larger systems exhibit emergent properties that are not predictable from micro-level interactions.
Policymakers relying on microtheories might miss these emergent phenomena, leading to incomplete or incorrect policy prescriptions. Moreover, implying that understanding the parts (micro-level interactions) automatically leads to understanding the whole (macro-level systems) is dangerously reductionist. This approach ignores the intricacy of connections and the nonlinearity inherent in complex socio-cultural systems.
Moreover, data-driven research can stifle radical and disruptive advances that require long-term investment and risk-taking. Theory-based research, on the other hand, can challenge established paradigms, paving the way for findings that data alone may not reveal. However, it necessitates an environment that promotes strategic and long-term experimental endeavours.
This environment includes long-term funding and resources, grants for exploratory projects, investment in research infrastructure, and support for interdisciplinary collaboration. Researchers committed to theory-based research need the freedom to explore unconventional ideas without the pressure to produce immediate results.
Academic institutions and funding bodies should promote intellectual risk-taking, support long-term projects, and value theoretical research. It is crucial to recognise and reward contributions to theory development and create platforms for sharing insights across disciplines. A balanced approach is required to fully benefit from data-driven research and theory-driven inquiry.
The stakes are high. Data-driven research that lacks curiosity, insight, and moderation can lead to “Bad Theories,” resulting in poor policy practice and missed opportunities to create, apply, and use knowledge for economic and social progress.
A balance between data and theory is essential for achieving national goals in developing new industries, responding to climate change, and building renewable energy, as well as for reaching social goals covering diversity, equity, and inclusion.
A more complete version of this article can be downloaded from the Acton Institute website.