Scientific Method

Mark Cartwright

The Scientific Method was first used during the Scientific Revolution (1500-1700). The method combined theoretical knowledge, such as mathematics, with practical experimentation using scientific instruments, the analysis and comparison of results, and finally peer review, all to better determine how the world around us works. In this way, hypotheses were rigorously tested, and laws could be formulated which explained observable phenomena. The goal of this scientific method was not only to increase human knowledge but to do so in a way that practically benefitted everyone and improved the human condition.

A New Approach: Bacon's Vision

Francis Bacon (1561-1626) was an English philosopher, statesman, and author. He is considered one of the founders of modern scientific research and scientific method, even "the father of modern science", because he proposed a new combined method of empirical (observable) experimentation and shared data collection so that humanity might finally discover all of nature's secrets and improve itself. Bacon championed the need for systematic and detailed empirical study, as this was the only way to increase humanity's understanding and, for him, more importantly, gain control of nature. This approach sounds quite obvious today, but at the time, the highly theoretical approach of the Greek philosopher Aristotle (l. 384-322 BCE) still dominated thought. Verbal arguments had become more important than what could actually be seen in the world. Further, natural philosophers had become preoccupied with why things happen instead of first ascertaining what was happening in nature.

Bacon rejected the current backward-looking approach to knowledge, that is, the seemingly never-ending attempt to prove the ancients right. Instead, new thinkers and experimenters, said Bacon, should act like the new navigators who had pushed beyond the limits of the known world. Christopher Columbus (1451-1506) had shown there was land across the Atlantic Ocean. Vasco da Gama (c. 1469-1524) had pushed east around Africa to India. Scientists, as we would call them today, had to be similarly bold. Old knowledge had to be rigorously tested to see whether it was worth keeping. New knowledge had to be acquired by thoroughly testing nature without preconceived ideas. Reason had to be applied to data collected from experiments, and the same data had to be openly shared with other thinkers so that it could be tested again and compared to what others had discovered. Finally, this knowledge had to be used to improve the human condition; otherwise, it was no use pursuing it in the first place. This was Bacon's vision. What he proposed did indeed come about, but with three notable factors added to the scientific method: mathematics, hypotheses, and technology.

The Importance of Experiments & Instruments

Experiments had always been carried out by thinkers, from ancient figures like Archimedes (l. 287-212 BCE) to the alchemists of the Middle Ages, but their experiments were usually haphazard, and very often thinkers were trying to prove a preconceived idea. In the Scientific Revolution, experimentation became a more systematic and multi-layered activity involving many different people. This more rigorous approach to gathering observable data was also a reaction against traditional activities and methods such as magic, astrology, and alchemy, all ancient and secret worlds of knowledge-gathering that now came under attack.

The Alchemists by Pietro Longhi

At the outset of the Scientific Revolution, experiments were any sort of activity carried out to see what would happen, a sort of anything-goes approach to satisfying scientific curiosity. It is important to note, though, that the modern meaning of scientific experiment is rather different, summarised here by W. E. Burns: "the creation of an artificial situation designed to study scientific principles held to apply in all situations" (95). It is fair to say, though, that the modern approach to experimentation, with its highly specialised focus where only one specific hypothesis is being tested, would not have become possible without the pioneering experimenters of the Scientific Revolution.

The first well-documented practical experiment of our period was made by William Gilbert (1544-1603) using magnets; he published his findings in 1600 in On the Magnet. The work was pioneering because "Central to Gilbert's enterprise was the claim that you could reproduce his experiments and confirm his results: his book was, in effect, a collection of experimental recipes" (Wootton, 331).

There remained sceptics of experimentation, those who stressed that the senses could be misled where the reason of the mind could not. One such doubter was René Descartes (1596-1650), but if anything, he and other natural philosophers who questioned the value of the work of the practical experimenters were responsible for creating a lasting new division between philosophy and what we would today call science. The term "science" was still not widely used in the 17th century; instead, many experimenters referred to themselves as practitioners of "experimental philosophy". The first use in English of the term "experimental method" was in 1675.

The first truly international effort in coordinated experiments involved the development of the barometer. This process began with the efforts of the Italian Evangelista Torricelli (1608-1647) in 1643. Torricelli discovered that mercury could be raised within a glass tube when one end of that tube was placed in a container of mercury. The air pressure on the mercury in the container pushed the mercury in the tube up around 30 inches (76 cm) higher than the level in the container. In 1648, Blaise Pascal (1623-1662) and his brother-in-law Florin Périer conducted experiments using similar apparatus, but this time tested under different atmospheric pressures by setting up the devices at a variety of altitudes on the side of a mountain. The scientists noted that the level of the mercury in the glass tube fell the higher up the mountain readings were taken.
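In modern terms, which Torricelli and Pascal did not have, the column height follows from a hydrostatic balance: the atmosphere can support only as much mercury as matches its own pressure. A sketch of the arithmetic, using standard modern values rather than figures from the original experiments:

\[
h \;=\; \frac{P_{\text{atm}}}{\rho_{\text{Hg}}\, g} \;\approx\; \frac{101\,325\ \text{Pa}}{(13\,500\ \text{kg/m}^3)(9.81\ \text{m/s}^2)} \;\approx\; 0.76\ \text{m} \;\approx\; 30\ \text{inches}
\]

On Périer's mountain, the weight of the overlying air, and hence \(P_{\text{atm}}\), shrinks with altitude, which is why the mercury level fell as readings were taken higher up.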

Torricelli's Barometer

The Anglo-Irish chemist Robert Boyle (1627-1691) named the new instrument a barometer and conclusively demonstrated the effect of air pressure by using a barometer inside an air pump where a vacuum was established. Boyle formulated a principle which became known as 'Boyle's Law'. This law states that the pressure exerted by a certain quantity of air varies in inverse proportion to its volume (provided temperatures are constant). The story of the development of the barometer became typical throughout the Scientific Revolution: natural phenomena were observed, instruments were invented to measure and understand these observable facts, scientists collaborated (and sometimes competed), and so they extended each other's work until, finally, a universal law could be devised which explained what was being seen. This law could then be used as a predictive device in future experiments.
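In modern notation, which Boyle himself did not use, the law reads as follows; the worked example is illustrative rather than one of Boyle's own measurements:

\[
PV = k \quad (\text{fixed quantity of gas, constant temperature}), \qquad\text{so}\qquad P_1 V_1 = P_2 V_2 .
\]

Air at 1 atm compressed from 2 litres to 1 litre, for instance, ends at \(P_2 = P_1 V_1 / V_2 = 2\) atm; this predictive use is exactly the role such laws came to play in later experiments.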

Experiments like Robert Boyle's air pump demonstrations and Isaac Newton's use of a prism to demonstrate that white light is made up of different coloured light continued the trend of experimentation to prove, test, and adjust theories. Further, these endeavours highlight the importance of scientific instruments in the new method of inquiry. The scientific method was employed to invent useful and accurate instruments, which were, in turn, used in further experiments. The invention of the telescope (c. 1608), microscope (c. 1610), barometer (1643), thermometer (c. 1650), pendulum clock (1657), air pump (1659), and balance spring watch (1675) all allowed fine measurements to be made which previously had been impossible. New instruments meant that a whole new range of experiments could be carried out. Whole new specialisations of study became possible, such as meteorology, microscopic anatomy, embryology, and optics.

The scientific method came to involve the following key components:

  • conducting practical experiments
  • conducting experiments without prejudice of what they should prove
  • using inductive reasoning (creating a generalisation from specific examples) to form a hypothesis (untested theory), which is then tested by an experiment, after which the hypothesis might be accepted, altered, or rejected based on empirical (observable) evidence
  • conducting multiple experiments and doing so in different places and by different people to confirm the reliability of the results
  • an open and critical review of the results of an experiment by peers
  • the formulation of universal laws (inductive reasoning or logic) using, for example, mathematics
  • a desire to gain practical benefits from scientific experiments and a belief in the idea of scientific progress

(Note: the above criteria are expressed in modern linguistic terms, not necessarily those terms 17th-century scientists would have used since the revolution in science also caused a revolution in the language to describe it).

Newton's Prism

Scientific Institutions

The scientific method really took hold when it became institutionalised, that is, when it was endorsed and employed by official institutions like the learned societies where thinkers tested their theories in the real world and worked collaboratively. The first such society was the Accademia del Cimento in Florence, founded in 1657. Others soon followed, notably the Royal Academy of Sciences in Paris in 1667. Four years earlier, London had gained its own academy with the foundation of the Royal Society. The founding fellows of this society credited Bacon with the idea, and they were keen to follow his principles of scientific method and his emphasis on sharing and communicating scientific data and results. The Berlin Academy was founded in 1700 and the St. Petersburg Academy in 1724. These academies and societies became the focal points of an international network of scientists who corresponded, read each other's works, and even visited each other as the new scientific method took hold.

Official bodies were able to fund expensive experiments and assemble or commission new equipment. They showed these experiments to the public, a practice that illustrates that what was new here was not the act of discovery but the creation of a culture of discovery. Scientists went much further than reaching a real-time audience, ensuring their results were printed for a far wider (and more critical) readership in journals and books. Here, in print, the experiments were described in great detail, and the results were presented for all to see. In this way, scientists were able to create "virtual witnesses" to their experiments. Now, anyone who cared to could become a participant in the development of knowledge acquired through science.

Bibliography

  • Burns, William E. The Scientific Revolution in Global Perspective. Oxford University Press, 2015.
  • Burns, William E. The Scientific Revolution. ABC-CLIO, 2001.
  • Bynum, William F. & Browne, Janet & Porter, Roy. Dictionary of the History of Science. Princeton University Press, 1982.
  • Henry, John. The Scientific Revolution and the Origins of Modern Science. Red Globe Press, 2008.
  • Jardine, Lisa. Ingenious Pursuits. Nan A. Talese, 1999.
  • Moran, Bruce T. Distilling Knowledge. Harvard University Press, 2006.
  • Wootton, David. The Invention of Science. Harper, 2015.


In retrospect

Eighty years of stress

George Fink

Nature 539, 175–176 (2016)

The discovery in 1936 that rats respond to various damaging stimuli with a general response that involves alarm, resistance and exhaustion launched the discipline of stress research.

Theory and Observation in Science

Scientists obtain a great deal of the evidence they use by observing natural and experimentally generated objects and effects. Much of the standard philosophical literature on this subject comes from 20th century logical empiricists, their followers, and critics who embraced their issues and accepted some of their assumptions even as they objected to specific views. Their discussions of observational evidence tend to focus on epistemological questions about its role in theory testing. This entry follows their lead even though observational evidence also plays important and philosophically interesting roles in other areas including scientific discovery, the development of experimental tools and techniques, and the application of scientific theories to practical problems.

The issues that get the most attention in the standard philosophical literature on observation and theory have to do with the distinction between observables and unobservables, the form and content of observation reports, and the epistemic bearing of observational evidence on theories it is used to evaluate. This entry discusses these topics under the following headings:

  • 1. Introduction
  • 2. What Do Observation Reports Describe?
  • 3. Is Observation an Exclusively Perceptual Process?
  • 4. How Observational Evidence Might Be Theory Laden
  • 5. Salience and Theoretical Stance
  • 6. Semantic Theory Loading
  • 7. Operationalization and Observation Reports
  • 8. Is Perception Theory Laden?
  • 9. How Do Observational Data Bear on the Acceptability of Theoretical Claims?
  • 10. Data and Phenomena
  • 11. Conclusion
  • Other Internet Resources
  • Related Entries

1. Introduction

Reasoning from observations has been important to scientific practice at least since the time of Aristotle, who mentions a number of sources of observational evidence including animal dissection (Aristotle(a) 763a/30–b/15; Aristotle(b) 511b/20–25). But philosophers didn't talk about observation as extensively, in as much detail, or in the way we have become accustomed to until the 20th century, when logical empiricists transformed philosophical thinking about it.

The first transformation was accomplished by ignoring the implications of a long-standing distinction between observing and experimenting. To experiment is to isolate, prepare, and manipulate things in hopes of producing epistemically useful evidence. It had been customary to think of observing as noticing and attending to interesting details of things perceived under more or less natural conditions, or by extension, things perceived during the course of an experiment. To look at a berry on a vine and attend to its color and shape would be to observe it. To extract its juice and apply reagents to test for the presence of copper compounds would be to perform an experiment. Contrivance and manipulation influence epistemically significant features of observable experimental results to such an extent that epistemologists ignore them at their peril. Robert Boyle (1661), John Herschel (1830), Bruno Latour and Steve Woolgar (1979), Ian Hacking (1983), Harry Collins (1985), Allan Franklin (1986), Peter Galison (1987), Jim Bogen and Jim Woodward (1988), and Hans-Jörg Rheinberger (1997) are some of the philosophers and philosophically minded scientists, historians, and sociologists of science who gave serious consideration to the distinction between observing and experimenting. The logical empiricists tended to ignore it.

A second transformation, characteristic of the linguistic turn in philosophy, was to shift attention away from things observed in natural or experimental settings and concentrate instead on the logic of observation reports. The shift developed from the assumption that a scientific theory is a system of sentences or sentence-like structures (propositions, statements, claims, and so on) to be tested by comparison to observational evidence. Second, it was assumed that the comparisons must be understood in terms of inferential relations. If inferential relations hold only between sentence-like structures, it follows that theories must be tested, not against observations or things observed, but against sentences, propositions, etc. used to report observations (Hempel 1935, 50–51; Schlick 1935).

Friends of this line of thought theorized about the syntax, semantics, and pragmatics of observation sentences, and inferential connections between observation and theoretical sentences. In doing so they hoped to articulate and explain the authoritativeness widely conceded to the best natural, social, and behavioral scientific theories. Some pronouncements from astrologers, medical quacks, and other pseudoscientists gain wide acceptance, as do those of religious leaders who rest their cases on faith or personal revelation, and rulers and governmental officials who use their political power to secure assent. But such claims do not enjoy the kind of credibility that scientific theories can attain. The logical empiricists tried to account for this by appeal to the objectivity and accessibility of observation reports, and the logic of theory testing.

Part of what they meant by calling observational evidence objective was that cultural and ethnic factors have no bearing on what can validly be inferred about the merits of a theory from observation reports. So conceived, objectivity was important to the logical empiricists' criticism of the Nazi idea that Jews and Aryans have fundamentally different thought processes such that physical theories suitable for Einstein and his kind should not be inflicted on German students. In response to this rationale for ethnic and cultural purging of the German educational system, the logical empiricists argued that because of its objectivity, observational evidence, rather than ethnic and cultural factors, should be used to evaluate scientific theories (Galison 1990). Less dramatically, the efforts working scientists put into producing objective evidence attest to the importance they attach to objectivity. Furthermore, it is possible, in principle at least, to make observation reports and the reasoning used to draw conclusions from them available for public scrutiny. If observational evidence is objective in this sense, it can provide people with what they need to decide for themselves which theories to accept without having to rely unquestioningly on authorities.

Francis Bacon argued long ago that the best way to discover things about nature is to use experiences (his term for observations as well as experimental results) to develop and improve scientific theories (Bacon 1620, 49ff). The role of observational evidence in scientific discovery was an important topic for Whewell (1858) and Mill (1872) among others in the 19th century. Recently, Judea Pearl, Clark Glymour, and their students and associates addressed it rigorously in the course of developing techniques for inferring claims about causal structures from statistical features of the data they give rise to (Pearl 2000; Spirtes, Glymour, and Scheines 2000). But such work is exceptional. For the most part, philosophers followed Karl Popper who maintained, contrary to the title of one of his best known books, that there is no such thing as a 'logic of discovery' (Popper 1959, 31). Drawing a sharp distinction between discovery and justification, the standard philosophical literature devotes most of its attention to the latter.

Theories are customarily represented as collections of sentences, propositions, statements or beliefs, etc., and their logical consequences. Among these are maximally general explanatory and predictive laws (Coulomb's law of electrical attraction and repulsion, and Maxwell's equations of electromagnetism, for example), along with lesser generalizations that describe more limited natural and experimental phenomena (e.g., the ideal gas equations describing relations between temperatures and pressures of enclosed gases, and general descriptions of positional astronomical regularities). Observations are used in testing generalizations of both kinds.

Some philosophers prefer to represent theories as collections of ‘states of physical or phenomenal systems’ and laws. The laws for any given theory are

…relations over states which determine…possible behaviors of phenomenal systems within the theory’s scope. (Suppe 1977, 710)

So conceived, a theory can be adequately represented by more than one linguistic formulation because it is not a system of sentences or propositions. Instead, it is a non-linguistic structure which can function as a semantic model of its sentential or propositional representations (Suppe 1977, 221–230). This entry treats theories as collections of sentences or sentential structures with or without deductive closure. But the questions it takes up arise in pretty much the same way when theories are represented in accordance with this semantic account.

2. What Do Observation Reports Describe?

One answer to this question assumes that observation is a perceptual process so that to observe is to look at, listen to, touch, taste, or smell something, attending to details of the resulting perceptual experience. Observers may have the good fortune to obtain useful perceptual evidence simply by noticing what's going on around them, but in many cases they must arrange and manipulate things to produce informative perceptible results. In either case, observation sentences describe perceptions or things perceived.

Observers use magnifying glasses, microscopes, or telescopes to see things that are too small or far away to be seen, or seen clearly enough, without them. Similarly, amplification devices are used to hear faint sounds. But if to observe something is to perceive it, not every use of instruments to augment the senses qualifies as observational. Philosophers agree that you can observe the moons of Jupiter with a telescope, or a heart beat with a stethoscope. But minimalist empiricists like Bas van Fraassen (1980, 16–17) deny that one can observe things that can be visualized only by using electron (and perhaps even light) microscopes. Many philosophers don't mind microscopes but find it unnatural to say that high energy physicists observe particles or particle interactions when they look at bubble chamber photographs. Their intuitions come from the plausible assumption that one can observe only what one can see by looking, hear by listening, feel by touching, and so on. Investigators can neither look at (direct their gazes toward and attend to) nor visually experience charged particles moving through a bubble chamber. Instead they can look at and see tracks in the chamber, or in bubble chamber photographs.

The identification of observation and perceptual experience persisted well into the 20th century—so much so that Carl Hempel could characterize the scientific enterprise as an attempt to predict and explain the deliverances of the senses (Hempel 1952, 653). This was to be accomplished by using laws or lawlike generalizations along with descriptions of initial conditions, correspondence rules, and auxiliary hypotheses to derive observation sentences describing the sensory deliverances of interest. Theory testing was treated as a matter of comparing observation sentences describing observations made in natural or laboratory settings to observation sentences that should be true according to the theory to be tested. This makes it imperative to ask what observation sentences report. Even though scientists often record their evidence non-sententially, e.g., in the form of pictures, graphs, and tables of numbers, some of what Hempel says about the meanings of observation sentences applies to non-sentential observational records as well.

According to what Hempel called the phenomenalist account, observation reports describe the observer’s subjective perceptual experiences.

…Such experiential data might be conceived of as being sensations, perceptions, and similar phenomena of immediate experience. (Hempel 1952, 674)

This view is motivated by the assumption that the epistemic value of an observation report depends upon its truth or accuracy, and that with regard to perception, the only thing observers can know with certainty to be true or accurate is how things appear to them. This means that we can't be confident that observation reports are true or accurate if they describe anything beyond the observer's own perceptual experience. Presumably one's confidence in a conclusion should not exceed one's confidence in one's best reasons to believe it. For the phenomenalist it follows that reports of subjective experience can provide better reasons to believe claims they support than reports of other kinds of evidence. Furthermore, if C.I. Lewis had been right to think that probabilities cannot be established on the basis of dubitable evidence (Lewis 1950, 182), observation sentences would have no evidential value unless they report the observer's subjective experiences.

But given the expressive limitations of the language available for reporting subjective experiences, we can't expect phenomenalistic reports to be precise and unambiguous enough to test theoretical claims whose evaluation requires accurate, fine-grained perceptual discriminations. Worse yet, if experiences are directly available only to those who have them, there is room to doubt whether different people can understand the same observation sentence in the same way. Suppose you had to evaluate a claim on the basis of someone else's subjective report of how a litmus solution looked to her when she dripped a liquid of unknown acidity into it. How could you decide whether her visual experience was the same as the one you would use her words to report?

Such considerations led Hempel to propose, contrary to the phenomenalists, that observation sentences report ‘directly observable’, ‘intersubjectively ascertainable’ facts about physical objects

…such as the coincidence of the pointer of an instrument with a numbered mark on a dial; a change of color in a test substance or in the skin of a patient; the clicking of an amplifier connected with a Geiger counter; etc. (ibid.)

Observers do sometimes have trouble making fine pointer position and color discriminations, but such things are more susceptible to precise, intersubjectively understandable descriptions than subjective experiences. How much precision and what degree of intersubjective agreement are required in any given case depends on what is being tested and how the observation sentence is used to evaluate it. But all things being equal, we can't expect data whose acceptability depends upon delicate subjective discriminations to be as reliable as data whose acceptability depends upon facts that can be ascertained intersubjectively. And similarly for non-sentential records; a drawing of what the observer takes to be the position of a pointer can be more reliable and easier to assess than a drawing that purports to capture her subjective visual experience of the pointer.

The fact that science is seldom a solitary pursuit suggests that one might be able to use pragmatic considerations to finesse questions about what observation reports express. Scientific claims—especially those with practical and policy applications—are typically used for purposes that are best served by public evaluation. Furthermore, the development and application of a scientific theory typically requires collaboration and in many cases is promoted by competition. This, together with the fact that investigators must agree to accept putative evidence before they use it to test a theoretical claim, imposes a pragmatic condition on observation reports: an observation report must be such that investigators can reach agreement relatively quickly and relatively easily about whether it provides good evidence with which to test a theory (cf. Neurath 1913). Feyerabend took this requirement seriously enough to characterize observation sentences pragmatically in terms of widespread decidability. In order to be an observation sentence, he said, a sentence must be contingently true or false, and such that competent speakers of the relevant language can quickly and unanimously decide whether to accept or reject it on the basis of what happens when they look, listen, etc. in the appropriate way under the appropriate observation conditions (Feyerabend 1959, 18ff).

The requirement of quick, easy decidability and general agreement favors Hempel’s account of what observation sentences report over the phenomenalist’s. But one shouldn’t rely on data whose only virtue is widespread acceptance. Presumably the data must possess additional features by virtue of which it can serve as an epistemically trustworthy guide to a theory’s acceptability. If epistemic trustworthiness requires certainty, this requirement favors the phenomenalists. Even if trustworthiness doesn’t require certainty, it is not the same thing as quick and easy decidability. Philosophers need to address the question of how these two requirements can be mutually satisfied.

3. Is Observation an Exclusively Perceptual Process?

Many of the things scientists investigate do not interact with human perceptual systems as required to produce perceptual experiences of them. The methods investigators use to study such things argue against the idea—however plausible it may once have seemed—that scientists do or should rely exclusively on their perceptual systems to obtain the evidence they need. Thus Feyerabend proposed as a thought experiment that if measuring equipment was rigged up to register the magnitude of a quantity of interest, a theory could be tested just as well against its outputs as against records of human perceptions (Feyerabend 1969, 132–137).

Feyerabend could have made his point with historical examples instead of thought experiments. A century earlier Helmholtz estimated the speed of excitatory impulses traveling through a motor nerve. To initiate impulses whose speed could be estimated, he implanted an electrode into one end of a nerve fiber and ran a current into it from a coil. The other end was attached to a bit of muscle whose contraction signaled the arrival of the impulse. To find out how long it took the impulse to reach the muscle he had to know when the stimulating current reached the nerve. But

[o]ur senses are not capable of directly perceiving an individual moment of time with such small duration…

and so Helmholtz had to resort to what he called 'artificial methods of observation' (Olesko and Holmes 1994, 84). This meant arranging things so that current from the coil could deflect a galvanometer needle. Assuming that the magnitude of the deflection is proportional to the duration of current passing from the coil, Helmholtz could use the deflection to estimate the duration he could not see (ibid.). This 'artificial observation' is not to be confused, e.g., with using magnifying glasses or telescopes to see tiny or distant objects. Such devices enable the observer to scrutinize visible objects. The minuscule duration of the current flow is not a visible object. Helmholtz studied it by looking at and seeing something else. (Hooke (1705, 16–17) argued for and designed instruments to execute the same kind of strategy in the 17th century.) The moral of Feyerabend's thought experiment and Helmholtz's distinction between perception and artificial observation is that working scientists are happy to call things that register on their experimental equipment observables even if they don't or can't register on their senses.

Some evidence is produced by processes so convoluted that it’s hard to decide what, if anything has been observed. Consider functional magnetic resonance images (fMRI) of the brain decorated with colors to indicate magnitudes of electrical activity in different regions during the performance of a cognitive task. To produce these images, brief magnetic pulses are applied to the subject’s brain. The magnetic force coordinates the precessions of protons in hemoglobin and other bodily stuffs to make them emit radio signals strong enough for the equipment to respond to. When the magnetic force is relaxed, the signals from protons in highly oxygenated hemoglobin deteriorate at a detectably different rate than signals from blood that carries less oxygen. Elaborate algorithms are applied to radio signal records to estimate blood oxygen levels at the places from which the signals are calculated to have originated. There is good reason to believe that blood flowing just downstream from spiking neurons carries appreciably more oxygen than blood in the vicinity of resting neurons. Assumptions about the relevant spatial and temporal relations are used to estimate levels of electrical activity in small regions of the brain corresponding to pixels in the finished image. The results of all of these computations are used to assign the appropriate colors to pixels in a computer generated image of the brain. The role of the senses in fMRI data production is limited to such things as monitoring the equipment and keeping an eye on the subject. Their epistemic role is limited to discriminating the colors in the finished image, reading tables of numbers the computer used to assign them, and so on.

If fMRI images record observations, it's hard to say what was observed—neuronal activity, blood oxygen levels, proton precessions, radio signals, or something else. (If anything is observed, the radio signals that interact directly with the equipment would seem to be better candidates than blood oxygen levels or neuronal activity.) Furthermore, it's hard to reconcile the idea that fMRI images record observations with the traditional empiricist notion that, much as they may be needed to draw conclusions from observational evidence, calculations involving theoretical assumptions and background beliefs must not be allowed (on pain of loss of objectivity) to intrude into the process of data production. The production of fMRI images requires extensive statistical manipulation based on theories about the radio signals, and a variety of factors having to do with their detection, along with beliefs about relations between blood oxygen levels and neuronal activity, sources of systematic error, and so on.

In view of all of this, functional brain imaging differs, e.g., from looking and seeing, photographing, and measuring with a thermometer or a galvanometer in ways that make it uninformative to call it observation at all. And similarly for many other methods scientists use to produce non-perceptual evidence.

Terms like 'observation' and 'observation reports' don't occur nearly as much in scientific as in philosophical writings. In their place, working scientists tend to talk about data. Philosophers who adopt this usage are free to think about standard examples of observation as members of a large, diverse, and growing family of data production methods. Instead of trying to decide which methods to classify as observational and which things qualify as observables, philosophers can then concentrate on the epistemic influence of the factors that differentiate members of the family. In particular, they can focus their attention on what questions data produced by a given method can be used to answer, what must be done to use that data fruitfully, and the credibility of the answers they afford (Bogen 2016).

It is of interest that records of perceptual observation are not always epistemically superior to data from experimental equipment. Indeed it is not unusual for investigators to use non-perceptual evidence to evaluate perceptual data and correct for its errors. For example, Rutherford and Pettersson conducted similar experiments to find out if certain elements disintegrated to emit charged particles under radioactive bombardment. To detect emissions, observers watched a scintillation screen for faint flashes produced by particle strikes. Pettersson's assistants reported seeing flashes from silicon and certain other elements. Rutherford's did not. Rutherford's colleague, James Chadwick, visited Pettersson's laboratory to evaluate his data. Instead of watching the screen and checking Pettersson's data against what he saw, Chadwick arranged to have Pettersson's assistants watch the screen while, unbeknownst to them, he manipulated the equipment, alternating normal operating conditions with a condition in which particles, if any, could not hit the screen. Pettersson's data were discredited by the fact that his assistants reported flashes at close to the same rate in both conditions (Stuewer 1985, 284–288).

Related considerations apply to the distinction between observable and unobservable objects of investigation. Some data are produced to help answer questions about things that do not themselves register on the senses or experimental equipment. Solar neutrino fluxes are a frequently discussed case in point. Neutrinos cannot interact directly with the senses or measuring equipment to produce recordable effects. Fluxes in their emission were studied by trapping the neutrinos and allowing them to interact with chlorine to produce a radioactive argon isotope. Experimentalists could then calculate fluxes in solar neutrino emission from Geiger counter measurements of radiation from the isotope. The epistemic significance of the neutrinos' unobservability depends upon factors having to do with the reliability of the data the investigators managed to produce, and its validity as a source of information about the fluxes. Its validity will depend, among many other things, on the correctness of the investigators' ideas about how neutrinos interact with chlorine (Pinch 1985). But there are also unobservables that cannot be detected, and whose features cannot be inferred from data of any kind. These are the only unobservables that are epistemically unavailable. Whether they remain so depends upon whether scientists can figure out how to produce data to study them, as the histories of particle physics (see, e.g., Morrison 2015) and neuroscience (see, e.g., Valenstein 2005) illustrate.

4. How Observational Evidence Might Be Theory Laden

Empirically minded philosophers assume that the evidential value of an observation or observational process depends on how sensitive it is to whatever it is used to study. But this in turn depends on the adequacy of any theoretical claims its sensitivity may depend on. For example, we can challenge the use of a thermometer reading, e, to support a description, prediction, or explanation of a patient's temperature, t, by challenging theoretical claims, C, having to do with whether a reading from a thermometer like this one, applied in the same way under similar conditions, should indicate the patient's temperature well enough to count in favor of or against t. At least some of the C will be such that, regardless of whether an investigator explicitly endorses them or is even aware of them, her use of e would be undermined by their falsity. All observations and uses of observational evidence are theory laden in this sense (cf. Chang 2005; Azzouni 2004). As the example of the thermometer illustrates, analogues of Norwood Hanson's claim that seeing is a theory laden undertaking apply just as well to equipment generated observations (Hanson 1958, 19). But if all observations and observational processes are theory laden, how can they provide reality-based, objective epistemic constraints on scientific reasoning? One thing to say about this is that the theoretical claims on which the epistemic value of a parcel of observational evidence depends may be quite correct. If so, even if we don't know, or have no way to establish, their correctness, the evidence may be good enough for the uses to which we put it. But this is cold comfort for investigators who can't establish it. The next thing to say is that scientific investigation is an ongoing process during the course of which theoretical claims whose unacceptability would reduce the epistemic value of a parcel of evidence can be challenged and defended in different ways at different times as new considerations and investigative techniques are introduced. We can hope that the acceptability of the evidence can be established relative to one or more stretches of time even though success in dealing with challenges at one time is no guarantee that all future challenges can be satisfactorily dealt with. Thus as long as scientists continue their work, there need be no time at which the epistemic value of a parcel of evidence can be established once and for all. This should come as no surprise to anyone who is aware that science is fallible. But it is no grounds for skepticism. It can be perfectly reasonable to trust the evidence available at present even though it is logically possible for epistemic troubles to arise in the future.

Thomas Kuhn (1962), Norwood Hanson (1958), Paul Feyerabend (1959) and others cast suspicion on the objectivity of observational evidence in another way by arguing that one can't use empirical evidence to test a theory without committing oneself to that very theory. Although some of the examples they use to present their case feature equipment generated evidence, they tend to talk about observation as a perceptual process. Kuhn's writings contain three different versions of this idea.

K1. Perceptual Theory Loading. Perceptual psychologists Bruner and Postman found that subjects who were briefly shown anomalous playing cards, e.g., a black four of hearts, reported having seen their normal counterparts, e.g., a red four of hearts. It took repeated exposures to get subjects to say the anomalous cards didn't look right, and eventually, to describe them correctly (Kuhn 1962, 63). Kuhn took such studies to indicate that things don't look the same to observers with different conceptual resources. (For a more up-to-date discussion of theory and conceptual perceptual loading see Lupyan 2015.) If so, black hearts didn't look like black hearts until repeated exposures somehow allowed subjects to acquire the concept of a black heart. By analogy, Kuhn supposed, when observers working in conflicting paradigms look at the same thing, their conceptual limitations should keep them from having the same visual experiences (Kuhn 1962, 111, 113–114, 115, 120–121). This would mean, for example, that when Priestley and Lavoisier watched the same experiment, Lavoisier should have seen what accorded with his theory that combustion and respiration are oxidation processes, while Priestley's visual experiences should have agreed with his theory that burning and respiration are processes of phlogiston release.

K2. Semantic Theory Loading. Kuhn argued that theoretical commitments exert a strong influence on observation descriptions and what they are understood to mean (Kuhn 1962, 127ff; Longino 1979, 38–42). If so, proponents of a caloric account of heat won't describe or understand descriptions of observed results of heat experiments in the same way as investigators who think of heat in terms of mean kinetic energy or radiation. They might all use the same words (e.g., 'temperature') to report an observation without understanding them in the same way.

K3. Salience. Kuhn claimed that if Galileo and an Aristotelian physicist had watched the same pendulum experiment, they would not have looked at or attended to the same things. The Aristotelian's paradigm would have required the experimenter to measure

…the weight of the stone, the vertical height to which it had been raised, and the time required for it to achieve rest (Kuhn 1962, 123)

and ignore radius, angular displacement, and time per swing (Kuhn 1962, 124).

These last were salient to Galileo because he treated pendulum swings as constrained circular motions. The Galilean quantities would be of no interest to an Aristotelian who treats the stone as falling under constraint toward the center of the earth (Kuhn 1962, 123). Thus Galileo and the Aristotelian would not have collected the same data. (Absent records of Aristotelian pendulum experiments we can think of this as a thought experiment.)

5. Salience and Theoretical Stance

Taking K1, K2, and K3 in order of plausibility, K3 points to an important fact about scientific practice. Data production (including experimental design and execution) is heavily influenced by investigators' background assumptions. Sometimes these include theoretical commitments that lead experimentalists to produce non-illuminating or misleading evidence. In other cases they may lead experimentalists to ignore, or even fail to produce, useful evidence. For example, in order to obtain data on orgasms in female stumptail macaques, one researcher wired up females to produce radio records of orgasmic muscle contractions, heart rate increases, etc. But as Elisabeth Lloyd reports, "… the researcher … wired up the heart rate of the male macaques as the signal to start recording the female orgasms. When I pointed out that the vast majority of female stumptail orgasms occurred during sex among the females alone, he replied that yes he knew that, but he was only interested in important orgasms" (Lloyd 1993, 142). Although female stumptail orgasms occurring during sex with males are atypical, the experimental design was driven by the assumption that what makes features of female sexuality worth studying is their contribution to reproduction (Lloyd 1993, 139).

Fortunately, such things don’t always happen. When they do, investigators are often able eventually to make corrections, and come to appreciate the significance of data that had not originally been salient to them. Thus paradigms and theoretical commitments actually do influence saliency, but their influence is neither inevitable nor irremediable.

6. Semantic Theory Loading

With regard to semantic theory loading (K2), it's important to bear in mind that observers don't always use declarative sentences to report observational and experimental results. They often draw, photograph, make audio recordings, etc. instead, or set up their experimental devices to generate graphs, pictorial images, tables of numbers, and other non-sentential records. Obviously investigators' conceptual resources and theoretical biases can exert epistemically significant influences on what they record (or set their equipment to record), which details they include or emphasize, and which forms of representation they choose (Daston and Galison 2007, 115–190, 309–361). But disagreements about the epistemic import of a graph, picture or other non-sentential bit of data often turn on causal rather than semantical considerations. Anatomists may have to decide whether a dark spot in a micrograph was caused by a staining artifact or by light reflected from an anatomically significant structure. Physicists may wonder whether a blip in a Geiger counter record reflects the causal influence of the radiation they wanted to monitor, or a surge in ambient radiation. Chemists may worry about the purity of samples used to obtain data. Such questions are not, and are not well represented as, semantic questions to which K2 is relevant. Late 20th century philosophers may have ignored such cases and exaggerated the influence of semantic theory loading because they thought of theory testing in terms of inferential relations between observation and theoretical sentences.

With regard to sentential observation reports, the significance of semantic theory loading is less ubiquitous than one might expect. The interpretation of verbal reports often depends on ideas about causal structure rather than the meanings of signs. Rather than worrying about the meaning of words used to describe their observations, scientists are more likely to wonder whether the observers made up or withheld information, whether one or more details were artifacts of observation conditions, whether the specimens were atypical, and so on.

Kuhnian paradigms are heterogeneous collections of experimental practices, theoretical principles, problems selected for investigation, approaches to their solution, etc. Connections between components are loose enough to allow investigators who disagree profoundly over one or more theoretical claims to agree about how to design, execute, and record the results of their experiments. That is why neuroscientists who disagreed about whether nerve impulses consisted of electrical currents could measure the same electrical quantities, and agree on the linguistic meaning and the accuracy of observation reports including such terms as ‘potential’, ‘resistance’, ‘voltage’ and ‘current’.

The issues this section touches on are distant, linguistic descendants of issues that arose in connection with Locke's view that mundane and scientific concepts (the empiricists called them ideas) derive their contents from experience (Locke 1700, 104–121, 162–164, 404–408).

7. Operationalization and Observation Reports

Looking at a patient with red spots and a fever, an investigator might report having seen the spots, or measles symptoms, or a patient with measles. Watching an unknown liquid dripping into a litmus solution, an observer might report seeing a change in color, a liquid with a pH of less than 7, or an acid. The appropriateness of a description of a test outcome depends on how the relevant concepts are operationalized. What justifies an observer to report having observed a case of measles according to one operationalization might require her to say no more than that she had observed measles symptoms, or just red spots, according to another.

In keeping with Percy Bridgman’s view that

…in general, we mean by a concept nothing more than a set of operations; the concept is synonymous with the corresponding sets of operations. (Bridgman 1927, 5)

one might suppose that operationalizations are definitions or meaning rules such that it is analytically true, e.g., that every liquid that turns litmus red in a properly conducted test is acidic. But it is more faithful to actual scientific practice to think of operationalizations as defeasible rules for the application of a concept such that both the rules and their applications are subject to revision on the basis of new empirical or theoretical developments. So understood, to operationalize is to adopt verbal and related practices for the purpose of enabling scientists to do their work. Operationalizations are thus sensitive to, and subject to change on the basis of, findings that influence their usefulness (Feest 2005).

Definitional or not, investigators in different research traditions may be trained to report their observations in conformity with conflicting operationalizations. Thus instead of training observers to describe what they see in a bubble chamber as a whitish streak or a trail, one might train them to say they see a particle track or even a particle. This may reflect what Kuhn meant by suggesting that some observers might be justified or even required to describe themselves as having seen oxygen, transparent and colorless though it is, or atoms, invisible though they are (Kuhn 1962, 127ff). To the contrary, one might object that what one sees should not be confused with what one is trained to say when one sees it, and therefore that talking about seeing a colorless gas or an invisible particle may be nothing more than a picturesque way of talking about what certain operationalizations entitle observers to say. Strictly speaking, the objection concludes, the term 'observation report' should be reserved for descriptions that are neutral with respect to conflicting operationalizations.

If observational data are just those utterances that meet Feyerabend’s decidability and agreeability conditions, the import of semantic theory loading depends upon how quickly, and for which sentences, reasonably sophisticated language users who stand in different paradigms can non-inferentially reach the same decisions about what to assert or deny. Some would expect enough agreement to secure the objectivity of observational data. Others would not. Still others would try to supply different standards for objectivity.

The example of Pettersson’s and Rutherford’s scintillation screen evidence (above) attests to the fact that observers working in different laboratories sometimes report seeing different things under similar conditions. It’s plausible that their expectations influence their reports, and that those expectations are shaped by their training and by their supervisors’ and associates’ theory-driven behavior. But as happens in other cases as well, all parties to the dispute agreed to reject Pettersson’s data by appeal to results of mechanical manipulations both laboratories could obtain and interpret in the same way without compromising their theoretical commitments.

Furthermore, proponents of incompatible theories often produce impressively similar observational data. Much as they disagreed about the nature of respiration and combustion, Priestley and Lavoisier gave quantitatively similar reports of how long their mice stayed alive and their candles kept burning in closed bell jars. Priestley taught Lavoisier how to obtain what he took to be measurements of the phlogiston content of an unknown gas. A sample of the gas to be tested is run into a graduated tube filled with water and inverted over a water bath. After noting the height of the water remaining in the tube, the observer adds “nitrous air” (we call it nitric oxide) and checks the water level again. Priestley, who thought there was no such thing as oxygen, believed the change in water level indicated how much phlogiston the gas contained. Lavoisier reported observing the same water levels as Priestley even after he abandoned phlogiston theory and became convinced that changes in water level indicated free oxygen content (Conant 1957, 74–109).
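As a toy rendering of the nitrous air test just described (the volumes are invented; the point is only that both parties record the same numbers while reading them differently):

    # Hypothetical readings from the nitrous air test.
    level_before = 100.0  # water level in the graduated tube, arbitrary units
    level_after = 80.0    # level after adding "nitrous air" (nitric oxide)

    diminution = level_before - level_after  # the quantity both parties record

    # Same datum, rival interpretations:
    print(f"diminution of {diminution} units")
    print("Priestley: indicates the gas's phlogiston content")
    print("Lavoisier: indicates the gas's free oxygen content")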

The moral of these examples is that although paradigms or theoretical commitments sometimes have an epistemically significant influence on what observers perceive, it can be relatively easy to nullify or correct for their effects.

How do observational data bear on the acceptability of theoretical claims? Typical responses maintain that the acceptability of theoretical claims depends upon whether they are true (approximately true, probable, or significantly more probable than their competitors) or whether they “save” observable phenomena. They then try to explain how observational data argue for or against the possession of one or more of these virtues.

Truth. It’s natural to think that, computability, range of application, and other things being equal, true theories are better than false ones, good approximations are better than bad ones, and highly probable theoretical claims are better than less probable ones. One way to decide whether a theory or a theoretical claim is true, close to the truth, or acceptably probable is to derive predictions from it and use observational data to evaluate them. Hypothetico-deductive (HD) confirmation theorists propose that observational evidence argues for the truth of theories whose deductive consequences it verifies, and against those whose consequences it falsifies (Popper 1959, 32–34). But laws and theoretical generalizations seldom if ever entail observational predictions unless they are conjoined with one or more auxiliary hypotheses taken from the theory they belong to. When the prediction turns out to be false, HD has trouble explaining which of the conjuncts is to blame. And if a theory entails a true prediction, it will continue to do so in conjunction with arbitrarily selected irrelevant claims; HD has trouble explaining why the prediction doesn’t confirm the irrelevancies along with the theory of interest.
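Schematically, in standard notation (a textbook formalization, not anything specific to this entry): if theory T entails evidence e only together with auxiliaries A, then

    (T \land A) \vDash e, \qquad \lnot e \;\Rightarrow\; \lnot (T \land A)

so a failed prediction blames only the conjunction, not any particular conjunct; and since

    T \vDash e \;\Rightarrow\; (T \land X) \vDash e \quad \text{for arbitrary } X,

evidence verifying e appears to confirm the irrelevant conjunction T ∧ X as well.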

Ignoring details, large and small, bootstrapping confirmation theories hold that an observation report confirms a theoretical generalization if an instance of the generalization follows from the observation report conjoined with auxiliary hypotheses from the theory the generalization belongs to. Observation counts against a theoretical claim if the conjunction entails a counter-instance. Here, as with HD, an observation argues for or against a theoretical claim only on the assumption that the auxiliary hypotheses are true (Glymour 1980, 110–175).

Bayesians hold that the evidential bearing of observational evidence on a theoretical claim is to be understood in terms of likelihood or conditional probability. For example, whether observational evidence argues for a theoretical claim might be thought to depend upon whether the claim is more probable (and if so, how much more probable) than its denial, conditional on a description of the evidence together with background beliefs, including theoretical commitments. But by Bayes’ theorem, the conditional probability of the claim of interest will depend in part upon that claim’s prior probability. Once again, one’s use of evidence to evaluate a theory depends in part upon one’s theoretical commitments (Earman 1992, 33–86; Roush 2005, 149–186).
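In outline, using the standard Bayesian formalism (stated here for concreteness, not as a quotation from any of the cited authors): for theoretical claim T, evidence description e, and background b, Bayes’ theorem gives

    P(T \mid e, b) \;=\; \frac{P(e \mid T, b)\, P(T \mid b)}{P(e \mid b)}

so investigators who agree about the likelihood P(e | T, b) but assign different priors P(T | b) can rationally arrive at different posteriors for T on the very same evidence.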

Francis Bacon (Bacon 1620, 70) said that allowing one’s commitment to a theory to determine what one takes to be the epistemic bearing of observational evidence on that very theory is, if anything, even worse than ignoring the evidence altogether. HD, bootstrap, Bayesian, and related accounts of confirmation run the risk of earning Bacon’s disapproval: according to all of them, it can be reasonable for adherents of competing theories to disagree about how observational data bear on the same claims. As a matter of historical fact, such disagreements do occur. The moral of this fact depends upon whether and how such disagreements can be resolved. Because some of the components of a theory are logically and more or less probabilistically independent of one another, adherents of competing theories can often find ways to bring themselves into close enough agreement about auxiliary hypotheses or prior probabilities to draw the same conclusions from the evidence.

Saving observable phenomena. Theories are said to save observable phenomena if they satisfactorily predict, describe, or systematize them. How well a theory performs any of these tasks need not depend upon the truth or accuracy of its basic principles. Thus according to Osiander’s preface to Copernicus’ On the Revolutions, a locus classicus, astronomers ‘…cannot in any way attain to true causes’ of the regularities among observable astronomical events, and must content themselves with saving the phenomena in the sense of using

…whatever suppositions enable …[them] to be computed correctly from the principles of geometry for the future as well as the past…(Osiander 1543, XX)

Theorists are to use those assumptions as calculating tools without committing themselves to their truth. In particular, the assumption that the planets revolve around the sun must be evaluated solely in terms of how useful it is in calculating their observable relative positions to a satisfactory approximation.

Pierre Duhem’s Aim and Structure of Physical Theory articulates a related conception. For Duhem a physical theory

…is a system of mathematical propositions, deduced from a small number of principles, which aim to represent as simply, as completely, and as exactly as possible a set of experimental laws. (Duhem 1906, 19)

‘Experimental laws’ are general, mathematical descriptions of observable experimental results. Investigators produce them by performing measurements and other experimental operations and assigning symbols to perceptible results according to pre-established operational definitions (Duhem 1906, 19). For Duhem, the main function of a physical theory is to help us store and retrieve information about observables we would not otherwise be able to keep track of. If that’s what a theory is supposed to accomplish, its main virtue should be intellectual economy. Theorists are to replace reports of individual observations with experimental laws and devise higher-level laws (the fewer, the better) from which experimental laws (the more, the better) can be mathematically derived (Duhem 1906, 21ff).

A theory’s experimental laws can be tested for accuracy and comprehensiveness by comparing them to observational data. Let EL be one or more experimental laws that perform acceptably well on such tests. Higher level laws can then be evaluated on the basis of how well they integrate EL into the rest of the theory. Some data that don’t fit integrated experimental laws won’t be interesting enough to worry about. Other data may need to be accommodated by replacing or modifying one or more experimental laws or adding new ones. If the required additions, modifications or replacements deliver experimental laws that are harder to integrate, the data count against the theory. If the required changes are conducive to improved systematization the data count in favor of it. If the required changes make no difference, the data don’t argue for or against the theory.

It is an unwelcome fact for all of these ideas about theory testing that data are typically produced in ways that make it impossible to predict them from the generalizations they are used to test, or to derive instances of those generalizations from data and non ad hoc auxiliary hypotheses. Indeed, it’s unusual for many members of a set of reasonably precise quantitative data to agree with one another, let alone with a quantitative prediction. That is because precise, publicly accessible data typically cannot be produced except through processes whose results reflect the influence of causal factors that are too numerous, too different in kind, and too irregular in behavior for any single theory to account for them. When Bernard Katz recorded electrical activity in nerve fiber preparations, the numerical values of his data were influenced by factors peculiar to the operation of his galvanometers and other pieces of equipment, variations among the positions of the stimulating and recording electrodes that had to be inserted into the nerve, the physiological effects of their insertion, and changes in the condition of the nerve as it deteriorated during the course of the experiment. There were variations in the investigators’ handling of the equipment. Vibrations shook the equipment in response to a variety of irregularly occurring causes ranging from random error sources to the heavy tread of Katz’s teacher, A.V. Hill, walking up and down the stairs outside of the laboratory. That’s a short list. To make matters worse, many of these factors influenced the data as parts of irregularly occurring, transient, and shifting assemblies of causal influences.

With regard to kinds of data that should be of interest to philosophers of physics, consider how many extraneous causes influenced radiation data in solar neutrino detection experiments, or spark chamber photographs produced to detect particle interactions. The effects of systematic and random sources of error are typically such that considerable analysis and interpretation are required to take investigators from data sets to conclusions that can be used to evaluate theoretical claims.

This applies as much to clear cases of perceptual data as to machine-produced records. When 19th and early 20th century astronomers looked through telescopes and pushed buttons to record the time at which they saw a moon pass a crosshair, the values of their data points depended not only upon light reflected from the moon, but also upon features of perceptual processes, reaction times, and other psychological factors that varied non-systematically from time to time and observer to observer. No astronomical theory has the resources to take such things into account. Similar considerations apply to the probabilities of specific data points conditional on theoretical principles, and the probabilities of confirming or disconfirming instances of theoretical claims conditional on the values of specific data points.

Instead of testing theoretical claims by direct comparison to raw data, investigators use data to infer facts about phenomena, i.e., events, regularities, processes, etc. whose instances are uniform and uncomplicated enough to make them susceptible to systematic prediction and explanation (Bogen and Woodward 1988, 317). The fact that lead melts at temperatures at or close to 327.5 °C is an example of a phenomenon, as are widespread regularities among electrical quantities involved in the action potential, the periods and orbital paths of the planets, etc. Theories that cannot be expected to predict or explain such things as individual temperature readings can nevertheless be evaluated on the basis of how useful they are in predicting or explaining the phenomena such data are used to detect. The same holds for the action potential as opposed to the electrical data from which its features are calculated, and for the orbits of the planets in contrast to the data of positional astronomy. It’s reasonable to ask a genetic theory how probable it is (given similar upbringings in similar environments) that the offspring of a schizophrenic parent or parents will develop one or more symptoms the DSM classifies as indicative of schizophrenia. But it would be quite unreasonable to ask it to predict or explain one patient’s numerical score on one trial of a particular diagnostic test, or why a diagnostician wrote a particular entry in her report of an interview with the offspring of a schizophrenic parent (Bogen and Woodward 1988, 319–326).

The fact that theories are better at predicting and explaining facts about or features of phenomena than data isn’t such a bad thing. For many purposes, theories that predict and explain phenomena are more illuminating, and more useful practically, than theories (if there were any) that predicted or explained members of a data set. Suppose you could choose between a theory that predicted or explained the way in which neurotransmitter release relates to neuronal spiking (e.g., the fact that, on average, transmitters are released roughly once for every 10 spikes) and a theory that explained or predicted the numbers displayed on the relevant experimental equipment in one or a few single cases. For most purposes, the former theory would be preferable to the latter, at the very least because it applies to so many more cases. Similarly for theories that predict or explain something about the probability of schizophrenia conditional on some genetic factor, or the probability of faulty diagnoses of schizophrenia conditional on facts about the psychiatrist’s training: for most purposes, these would be preferable to a theory that predicted specific descriptions in a case history.

In view of all of this, together with the fact that a great many theoretical claims can only be tested directly against facts about phenomena, it behooves epistemologists to think about how data are used to answer questions about phenomena. Lacking space for a detailed discussion, the most this entry can do is to mention two main kinds of things investigators do in order to draw conclusions from data. The first is causal analysis carried out with or without the use of statistical techniques. The second is non-causal statistical analysis.

First, investigators must distinguish features of the data that are indicative of facts about the phenomenon of interest from those which can safely be ignored, and those which must be corrected for. Sometimes background knowledge makes this easy. Under normal circumstances investigators know that their thermometers are sensitive to temperature, and their pressure gauges, to pressure. An astronomer or a chemist who knows what spectrographic equipment does, and what she has applied it to will know what her data indicate. Sometimes it’s less obvious. When Ramon y Cajal looked through his microscope at a thin slice of stained nerve tissue, he had to figure out which if any of the fibers he could see at one focal length connected to or extended from things he could see only at another focal length, or in another slice.

Analogous considerations apply to quantitative data. It was easy for Katz to tell when his equipment was responding more to Hill’s footfalls on the stairs than to the electrical quantities it was set up to measure. It can be harder to tell whether an abrupt jump in the amplitude of a high frequency EEG oscillation was due to a feature of the subject’s brain activity or to an artifact of extraneous electrical activity in the laboratory or operating room where the measurements were made. The answers to questions about which features of numerical and non-numerical data are indicative of a phenomenon of interest typically depend at least in part on what is known about the causes that conspire to produce the data.

Statistical arguments are often used to deal with questions about the influence of epistemically relevant causal factors. For example, when it is known that similar data can be produced by factors that have nothing to do with the phenomenon of interest, Monte Carlo simulations, regression analyses of sample data, and a variety of other statistical techniques sometimes provide investigators with their best chance of deciding how seriously to take a putatively illuminating feature of their data.
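A minimal sketch of the Monte Carlo style of argument mentioned here, under invented assumptions (Gaussian instrument noise as the null model, and a hypothetical observed peak):

    import numpy as np

    rng = np.random.default_rng(0)
    observed_peak = 4.2   # hypothetical: amplitude of a putative signal in the data
    n_trials = 10_000
    n_samples = 100

    # Null model (assumption): no phenomenon, only Gaussian instrument noise.
    null_peaks = np.array([
        rng.normal(0.0, 1.0, n_samples).max() for _ in range(n_trials)
    ])

    # Fraction of pure-noise runs whose largest excursion matches the observed one.
    p_value = (null_peaks >= observed_peak).mean()
    print(f"chance of a peak this large from noise alone: {p_value:.4f}")

If pure noise rarely produces so large a peak, the feature is worth taking seriously; if it often does, the putatively illuminating feature may be an artifact.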

But statistical techniques are also required for purposes other than causal analysis. To calculate the magnitude of a quantity like the melting point of lead from a scatter of numerical data, investigators throw out outliers, calculate the mean and the standard deviation, etc., and establish confidence and significance levels. Regression and other techniques are applied to the results to estimate how far from the mean the magnitude of interest can be expected to fall in the population of interest (e.g., the range of temperatures at which pure samples of lead can be expected to melt).
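For concreteness, a toy version of the melting-point calculation described above (the readings are invented, and real practice involves more careful error analysis):

    import numpy as np

    # Hypothetical replicate readings of lead's melting point, in °C,
    # including one obvious outlier from a miscalibrated thermocouple.
    readings = np.array([327.3, 327.6, 327.4, 327.8, 327.5, 312.9, 327.6])

    # Robust outlier screen: discard points far from the median,
    # measured in units of the (scaled) median absolute deviation.
    med = np.median(readings)
    mad = 1.4826 * np.median(np.abs(readings - med))
    kept = readings[np.abs(readings - med) <= 3.0 * mad]

    mean = kept.mean()
    sem = kept.std(ddof=1) / np.sqrt(len(kept))  # standard error of the mean
    # Rough 95% confidence interval for the melting point.
    print(f"{mean:.1f} °C ± {1.96 * sem:.1f} °C")

The median/MAD screen stands in for the informal “throw out outliers” step; the interval estimates where the melting point of pure samples can be expected to fall.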

The fact that little can be learned from data without causal, statistical, and related argumentation has interesting consequences for received ideas about how the use of observational evidence distinguishes science from pseudo-science, religion, and other non-scientific cognitive endeavors. First, scientists aren’t the only ones who use observational evidence to support their claims; astrologers and medical quacks use it too. To find epistemically significant differences, one must carefully consider what sorts of data they use, where the data come from, and how they are employed. The virtues of scientific as opposed to non-scientific theory evaluation depend not only on its reliance on empirical data, but also on how the data are produced, analyzed, and interpreted to draw conclusions against which theories can be evaluated. Secondly, it doesn’t take many examples to refute the notion that adherence to a single, universally applicable “scientific method” differentiates the sciences from the non-sciences. Data are produced and used in far too many different ways to be treated informatively as instances of any single method. Thirdly, it is usually, if not always, impossible for investigators to draw conclusions to test theories against observational data without explicit or implicit reliance on theoretical principles. This means that when counterparts to Kuhnian questions about theory loading and its epistemic significance arise in connection with the analysis and interpretation of observational evidence, they must be answered by appeal to details that vary from case to case.

Grammatical variants of the term ‘observation’ have been applied to impressively different perceptual and non-perceptual processes and to records of the results they produce. Their diversity is a reason to doubt whether general philosophical accounts of observation, observables, and observational data can tell epistemologists as much as local accounts grounded in close studies of specific kinds of cases. Furthermore, scientists continue to find ways to produce data that can’t be called observational without stretching the term to the point of vagueness.

It’s plausible that philosophers who value the kind of rigor, precision, and generality to which logical empiricists and other exact philosophers aspired could do better by examining and developing techniques and results from logic, probability theory, statistics, machine learning, computer modeling, and the like than by trying to construct highly general theories of observation and its role in science. Logic and the rest seem unable to deliver satisfactory, universally applicable accounts of scientific reasoning, but they have illuminating local applications, some of which can be of use to scientists as well as philosophers.

  • Aristotle(a), Generation of Animals, in Complete Works of Aristotle (Volume 1), J. Barnes (ed.), Princeton: Princeton University Press, 1995, pp. 774–993.
  • Aristotle(b), History of Animals, in Complete Works of Aristotle (Volume 1), J. Barnes (ed.), Princeton: Princeton University Press, 1995, pp. 1111–1228.
  • Azzouni, J., 2004, “Theory, Observation, and Scientific Realism,” British Journal for the Philosophy of Science, 55(3): 371–392.
  • Bacon, Francis, 1620, Novum Organum with Other Parts of the Great Instauration, P. Urbach and J. Gibson (eds. and trans.), La Salle: Open Court, 1994.
  • Bogen, J., 2016, “Empiricism and After,” in P. Humphreys (ed.), Oxford Handbook of Philosophy of Science, Oxford: Oxford University Press, pp. 779–795.
  • Bogen, J., and Woodward, J., 1988, “Saving the Phenomena,” Philosophical Review, XCVII(3): 303–352.
  • Boyle, R., 1661, The Sceptical Chymist, Montana: Kessinger (reprint of 1661 edition).
  • Bridgman, P., 1927, The Logic of Modern Physics, New York: Macmillan.
  • Chang, H., 2005, “A Case for Old-fashioned Observability, and a Reconstructive Empiricism,” Philosophy of Science, 72(5): 876–887.
  • Collins, H.M., 1985, Changing Order, Chicago: University of Chicago Press.
  • Conant, J.B. (ed.), 1957, “The Overthrow of the Phlogiston Theory: The Chemical Revolution of 1775–1789,” in J.B. Conant and L.K. Nash (eds.), Harvard Studies in Experimental Science (Volume I), Cambridge: Harvard University Press, pp. 65–116.
  • Duhem, P., 1906, The Aim and Structure of Physical Theory, P. Wiener (tr.), Princeton: Princeton University Press, 1991.
  • Earman, J., 1992, Bayes or Bust?, Cambridge: MIT Press.
  • Feest, U., 2005, “Operationism in Psychology: What the Debate Is About, What the Debate Should Be About,” Journal of the History of the Behavioral Sciences, 41(2): 131–149.
  • Feyerabend, P.K., 1959, “An Attempt at a Realistic Interpretation of Experience,” in P.K. Feyerabend, Realism, Rationalism, and Scientific Method (Philosophical Papers I), Cambridge: Cambridge University Press, 1985, pp. 17–36.
  • Feyerabend, P.K., 1969, “Science Without Experience,” in P.K. Feyerabend, Realism, Rationalism, and Scientific Method (Philosophical Papers I), Cambridge: Cambridge University Press, 1985, pp. 132–136.
  • Franklin, A., 1986, The Neglect of Experiment, Cambridge: Cambridge University Press.
  • Galison, P., 1987, How Experiments End, Chicago: University of Chicago Press.
  • Galison, P., 1990, “Aufbau/Bauhaus: Logical Positivism and Architectural Modernism,” Critical Inquiry, 16(4): 709–753.
  • Galison, P., and Daston, L., 2007, Objectivity, Brooklyn: Zone Books.
  • Glymour, C., 1980, Theory and Evidence, Princeton: Princeton University Press.
  • Hacking, I., 1983, Representing and Intervening, Cambridge: Cambridge University Press.
  • Hanson, N.R., 1958, Patterns of Discovery, Cambridge: Cambridge University Press.
  • Hempel, C.G., 1935, “On the Logical Positivists’ Theory of Truth,” Analysis, 2(4): 50–59.
  • Hempel, C.G., 1952, “Fundamentals of Concept Formation in Empirical Science,” in Foundations of the Unity of Science (Volume 2), O. Neurath, R. Carnap, and C. Morris (eds.), Chicago: University of Chicago Press, 1970, pp. 651–746.
  • Herschel, J.F.W., 1830, Preliminary Discourse on the Study of Natural Philosophy, New York: Johnson Reprint Corp., 1966.
  • Hooke, R., 1705, “The Method of Improving Natural Philosophy,” in R. Waller (ed.), The Posthumous Works of Robert Hooke, London: Frank Cass and Company, 1971.
  • Jeffrey, R.C., 1983, The Logic of Decision, Chicago: University of Chicago Press.
  • Kuhn, T.S., 1962, The Structure of Scientific Revolutions, Chicago: University of Chicago Press; reprinted 1996.
  • Latour, B., and Woolgar, S., 1979, Laboratory Life: The Construction of Scientific Facts, Princeton: Princeton University Press, 1986.
  • Lewis, C.I., 1950, Analysis of Knowledge and Valuation, La Salle: Open Court.
  • Lloyd, E.A., 1993, “Pre-theoretical Assumptions in Evolutionary Explanations of Female Sexuality,” Philosophical Studies, 69: 139–153.
  • Longino, H., 1979, “Evidence and Hypothesis: An Analysis of Evidential Relations,” Philosophy of Science, 46(1): 35–56.
  • Lupyan, G., 2015, “Cognitive Penetrability of Perception in the Age of Prediction – Predictive Systems are Penetrable Systems,” Review of Philosophy and Psychology, 6(4): 547–569. doi:10.1007/s13164-015-0253-4
  • Morrison, M., 2015, Reconstructing Reality, New York: Oxford University Press.
  • Neurath, O., 1913, “The Lost Wanderers of Descartes and the Auxiliary Motive,” in O. Neurath, Philosophical Papers, Dordrecht: D. Reidel, 1983, pp. 1–12.
  • Olesko, K.M., and Holmes, F.L., 1994, “Experiment, Quantification and Discovery: Helmholtz’s Early Physiological Researches, 1843–50,” in D. Cahan (ed.), Hermann Helmholtz and the Foundations of Nineteenth Century Science, Berkeley: University of California Press, pp. 50–108.
  • Osiander, A., 1543, “To the Reader Concerning the Hypothesis of this Work,” in N. Copernicus, On the Revolutions, E. Rosen (tr., ed.), Baltimore: Johns Hopkins University Press, 1978, p. XX.
  • Pearl, J., 2000, Causality, Cambridge: Cambridge University Press.
  • Pinch, T., 1985, “Towards an Analysis of Scientific Observation: The Externality and Evidential Significance of Observation Reports in Physics,” Social Studies of Science, 15: 3–36.
  • Popper, K.R., 1959, The Logic of Scientific Discovery, K.R. Popper (tr.), New York: Basic Books.
  • Rheinberger, H.-J., 1997, Toward a History of Epistemic Things: Synthesizing Proteins in the Test Tube, Stanford: Stanford University Press.
  • Roush, S., 2005, Tracking Truth, Cambridge: Cambridge University Press.
  • Schlick, M., 1935, “Facts and Propositions,” in M. Macdonald (ed.), Philosophy and Analysis, New York: Philosophical Library, 1954, pp. 232–236.
  • Spirtes, P., Glymour, C., and Scheines, R., 2000, Causation, Prediction, and Search, Cambridge: MIT Press.
  • Stuewer, R.H., 1985, “Artificial Disintegration and the Cambridge-Vienna Controversy,” in P. Achinstein and O. Hannaway (eds.), Observation, Experiment, and Hypothesis in Modern Physical Science, Cambridge: MIT Press, pp. 239–307.
  • Suppe, F. (ed.), 1977, The Structure of Scientific Theories, Urbana: University of Illinois Press.
  • Valenstein, E.S., 2005, The War of the Soups and the Sparks, New York: Columbia University Press.
  • Van Fraassen, B.C., 1980, The Scientific Image, Oxford: Clarendon Press.
  • Whewell, W., 1858, Novum Organon Renovatum (Book II), in William Whewell: Theory of Scientific Method, R.E. Butts (ed.), Indianapolis: Hackett Publishing Company, 1989, pp. 103–249.
  • Confirmation, by Franz Huber, in the Internet Encyclopedia of Philosophy.
  • Transcript of Kitzmiller v. Dover Area School District (on the teaching of intelligent design).

Bacon, Francis | Bayes’ Theorem | constructive empiricism | Duhem, Pierre | empiricism: logical | epistemology: Bayesian | Lewis, Clarence Irving | Locke, John | logical positivism | physics: experiment in | science: and pseudo-science

Copyright © 2017 by James Bogen


The concept of observation in science and philosophy

Dudley Shapere, Philosophy of Science, Volume 49, Issue 4 (1982). Published online by Cambridge University Press: 01 April 2022. DOI: https://doi.org/10.1086/289075

Through a study of a sophisticated contemporary scientific experiment, it is shown how and why use of the term ‘observation’ in reference to that experiment departs from ordinary and philosophical usages which associate observation epistemically with perception. The role of “background information” is examined, and general conclusions are arrived at regarding the use of descriptive language in, and in talking about, science. These conclusions bring out the reasoning by which science builds on what it has learned, and, further, how that process of building consists not only in adding to our substantive knowledge, but also in increasing our ability to learn about nature, by extending our ability to observe it in new ways. The argument of this paper is thus a step toward understanding how it is that all our knowledge of nature rests on observation.

Author’s note: This paper is part of a chapter of a book, of the same title, to be published by Oxford University Press. The paper is a revision of one which has been circulated privately and read on numerous occasions, in various versions, over the past several years. The present version is based on one written in 1981 during a visit at the Institute for Advanced Study, Princeton, N.J., an opportunity for which I am grateful. I also wish to express my thanks to John Bahcall for his help with the technical material in this paper and related work.


The philosophy of scientific experimentation: a review

Hans Radder

Automated Experimentation, Volume 1, Article number 2 (2009)


Practicing and studying automated experimentation may benefit from philosophical reflection on experimental science in general. This paper reviews the relevant literature and discusses central issues in the philosophy of scientific experimentation. The first two sections present brief accounts of the rise of experimental science and of its philosophical study. The next sections discuss three central issues of scientific experimentation: the scientific and philosophical significance of intervention and production, the relationship between experimental science and technology, and the interactions between experimental and theoretical work. The concluding section identifies three issues for further research: the role of computing and, more specifically, automating, in experimental research, the nature of experimentation in the social and human sciences, and the significance of normative, including ethical, problems in experimental science.

The rise of experimental science

Over the past decades the historical development of experimental science has been studied in detail. One focus has been on the nature and role of experiment during the rise of the natural sciences in the sixteenth and seventeenth centuries. Earlier accounts of this so-called Scientific Revolution emphasized the universalization of the mathematical method or the mechanization of the world-view as the decisive achievement. In contrast, the more recent studies of sixteenth and seventeenth century science stress the great significance of a new experimental practice and a new experimental knowledge. Major figures were Francis Bacon, Galileo Galilei, and Robert Boyle. The story of Boyle's controversy with Thomas Hobbes, during the late 1650s and early 1660s, has become a paradigm of the recent historiography of scientific experimentation [1]. While Hobbes defended the 'old' axiomatic-deductive style of the geometric tradition, Boyle advocated the more modest acquisition of probable knowledge of experimental 'matters of fact'. Simultaneously at stake in this controversy were the technical details of Boyle's air-pump experiments, the epistemological justification of the experimental knowledge, and the social legitimacy of the new experimental style of doing science.

A more wide-ranging account of the role of experimentation in the natural sciences has been proposed by Thomas Kuhn [2]. He claims that the rise of modern physical science resulted from two simultaneous developments. On the one hand, a radical conceptual and world-view change occurred in what he calls the classical, or mathematical, sciences, such as astronomy, statics and optics. On the other, the novel type of Baconian, or experimental, sciences emerged, dealing with the study of light, heat, magnetism and electricity, among other things. Kuhn argues that it was not before the second half of the nineteenth century that a systematic interaction and merging of the experimental and mathematical traditions took place. An example is the transformation of the Baconian science of heat into an experimental-mathematical thermodynamics during the first half of the nineteenth century. At about the same time, the interactions between (at first, mainly experimental) science and technology increased substantially. Important results of this scientification of technology were chemical dyestuffs and artificial fertilizers.

Starting in the second half of the nineteenth century, extensive experimentation also took root in various other sciences. This happened in medicine, in particular in physiology, somewhat later in psychology, and still later in the social sciences. A characteristic feature of many experiments in those sciences is a strong reliance on statistical methods (see, e.g., [3]).

The rise of the philosophy of scientific experimentation

Alongside the actual practices of experimentation, a variety of authors--both philosophers and philosophy-minded scientists--have reflected upon the nature and function of scientific experiments. Among the better-known examples are Bacon's and Galileo's advocacy of the experimental method. John Stuart Mill (around the middle of the nineteenth century) and Ernst Mach (late nineteenth to early twentieth century) provided some methodological and epistemological analyses of experimentation. Claude Bernard promoted and analyzed the use of the experimental method in medicine. His Introduction to the Study of Experimental Medicine [4] influenced a number of twentieth century French writers, including Pierre Duhem, Gaston Bachelard and Georges Canguilhem. While those authors addressed some aspects of experimentation in their accounts of science, a substantial and coherent tradition in the philosophy of scientific experimentation did not yet arise.

Such a tradition did spring up in Germany, in the second half of the twentieth century. Within this German tradition two approaches may be distinguished. One developed Hugo Dingler's pioneering work [5]. Dingler emphasized the manipulation and intervention character of experimentation, and hence its kinship to technology. One of his aims was to show how the basic theoretical concepts of physics, such as length or mass, could be grounded in concrete experimental actions. During the 1960s and 1970s, this part of Dingler's views was taken up and systematically developed by several other German philosophers, including Paul Lorenzen, Klaus Holzkamp and Peter Janich. More recently, the emphasis on the methodical construction of theoretical concepts in terms of experimental actions has given way to a more culturalistic interpretation of experimental procedures and results [6].

A second approach within the German tradition took its departure even more directly from the kinship between experiment and technology. The major figure here is the early Jürgen Habermas. In his work from the 1960s, Habermas conceived of (empirical-analytical) science as 'anticipated technology', the crucial link being experimental action [7]. In the spirit of Karl Marx, Martin Heidegger and Herbert Marcuse, Habermas' aim was not merely to develop a theory of (scientific) knowledge but rather a critique of technocratic reason. More recently, attempts have been made to connect this German tradition to Anglo-Saxon philosophy of experiment [8, 9] and to contemporary social studies of science and technology [10]. Recent work on 'science as technology' by Srđan Lelas [11] can be characterized as, broadly, inspired by this second branch of the German tradition.

In the English-speaking world, a substantial number of studies of scientific experimentation have been written since the mid-1970s. They resulted from the Kuhnian 'programs in history and philosophy of science'. In their studies of (historical or contemporary) scientific controversies, sociologists of scientific knowledge often focused on experimental work (e.g., [12]), while so-called laboratory studies addressed the ordinary practices of experimental scientists (e.g., [13]). An approach that remained more faithful to the history and philosophy of science idea started with Ian Hacking's argument for the relative autonomy of experimentation and his plea for a philosophical study of experiment as a topic in its own right [14]. It includes work by Allan Franklin, Peter Galison, David Gooding and Hans-Jörg Rheinberger, among many others (see the edited volumes [15, 16] and [17]).

More recently, several philosophers have argued that a further step should be taken by combining the results of the historical and sociological study of experiment with more developed theoretical-philosophical analyses [18]. A mature philosophy of experiment, they claim, should not be limited to summing up the practical features of experimentation but should attempt to provide a systematic analysis of experimental practice and experimental knowledge. The latter is often lacking in the sociological and historical literature on scientific experimentation.

Intervention and production, and their philosophical implications

Looking at the specific features of experiments within the overall practice of science, there is one feature that stands out. In order to perform experiments, whether they are large-scale or small-scale, experimenters have to intervene actively in the material world; moreover, in doing so they produce all kinds of new objects, substances, phenomena and processes. More precisely, experimentation involves the material realization of the experimental system (that is to say, the object(s) of study, the apparatus, and their interaction) as well as an active intervention in the environment of this system. In this respect, experiment contrasts with theory even if theoretical work is always attended with material acts (such as the typing or writing down of a mathematical formula). Hence, a central issue for a philosophy of experiment is the question of the nature of experimental intervention and production, and their philosophical implications. To be sure, at times scientists devise and discuss so-called thought experiments [19]. However, such 'experiments'--in which the crucial aspect of intervention and production is missing--are better conceived as not being experiments at all but rather as particular types of theoretical argument, which may or may not be materially realizable in experimental practice.

Clearly, not just any kind of intervention in the material world counts as a scientific experiment. Quite generally, one may say that successful experiments require, at least, a certain stability and reproducibility, and meeting this requirement presupposes a measure of control of the experimental system and its environment as well as a measure of discipline of the experimenters and the other people involved in realizing the experiment.

Experimenters employ a variety of strategies for producing stable and reproducible experiments (see, e.g., [20, 21] and [6]). One such strategy is to attempt to realize 'pure cases' of experimental effects. For example, in some early electromagnetic experiments carried out in the 1820s, André Ampère investigated the interaction between an electric current and a freely suspended magnetic needle [22]. He systematically varied a number of factors of his experimental system and examined whether or not they were relevant, that is to say, whether or not they had a destabilizing impact on the experimental process.

Furthermore, realizing a stable object-apparatus system requires knowledge and control of the (actual and potential) interactions between this system and its environment. Depending on the aim and design of the experiment, specific interactions may be necessary (and hence required), allowed (but irrelevant), or forbidden (because disturbing). Thus, in his experiments on electromagnetism, Ampère anticipated a potential disturbance exerted by the magnetism of the earth. In response, he designed his experiment in such a way that terrestrial magnetism constituted an allowed rather than a forbidden interaction.

A further aspect of experimental stability is implied by the notion of reproducibility [9]. A successful performance of an experiment by the original experimenter is an achievement that may depend on certain idiosyncratic aspects of a local situation. Yet, a purely local experiment that cannot be carried out in other experimental contexts will, in the end, be unproductive for science. However, since the performance of an experiment is a complex process, no repetition will be strictly identical to the original experiment and many repetitions may be dissimilar in several respects. For this reason, we need to specify what we take or require to be reproducible (for instance, a particular aspect of the experimental process or a certain average over different runs). Furthermore, there is the question of who should be able to reproduce the experiment (for instance, the original experimenter, contemporary scientists, or even any scientist or human being). Investigating these questions leads to different types and ranges of experimental reproducibility, which can be observed to play different roles in experimental practice.

Laboratory experiments in physics, chemistry and molecular biology often allow one to control the objects under investigation to such an extent that the relevant objects in successive experiments may be assumed to be in identical states. Hence, statistical methods are employed primarily to further analyze or process the data (see, for instance, the error-statistical approach by Deborah Mayo [23]). In contrast, in field biology, medicine, psychology and social science, such a strict experimental control is often not feasible. To compensate for this, statistical methods in these areas are used directly to construct groups of experimental subjects that are presumed to possess identical average characteristics. It is only after such groups have been constructed that one can start the investigation of hypotheses about the research subjects. One can phrase this contrast in a different way by saying that in the former group of sciences statistical considerations mostly bear upon linking experimental data and theoretical hypotheses, while in the latter group it is often the case that statistics already play a role at the stage of producing the actual individual data.
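A minimal sketch of that second, group-constructing use of statistics (the subject pool and the attribute are invented): random assignment makes the groups' average characteristics approximately equal before any hypothesis about the treatment is tested.

    import random

    random.seed(1)
    # Hypothetical subject pool with a pre-existing attribute (here, age).
    subjects = [{"id": i, "age": random.randint(18, 65)} for i in range(200)]

    # Randomly assign subjects to two groups of equal size.
    random.shuffle(subjects)
    treatment, control = subjects[:100], subjects[100:]

    def mean_age(group):
        return sum(s["age"] for s in group) / len(group)

    # The averages should be close, so differences in outcomes can later
    # be attributed to the treatment rather than to the groups themselves.
    print(f"treatment mean age: {mean_age(treatment):.1f}")
    print(f"control mean age:   {mean_age(control):.1f}")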

The intervention and production aspect of scientific experimentation carries implications for several philosophical questions. A general lesson, already drawn by Bachelard, appears to be this: the intervention and production character of experimentation entails that the actual objects and phenomena themselves are, at least in part, materially realized through human interference. Hence, it is not just the knowledge of experimental objects and phenomena but also their actual existence and occurrence that prove to be dependent on specific, productive interventions by the experimenters. This fact gives rise to a number of important philosophical issues. If experimental objects and phenomena have to be realized through active human intervention, does it still make sense to speak of a 'natural' nature or does one merely deal with artificially produced laboratory worlds? If one does not want to endorse a fully-fledged constructivism, according to which the experimental objects and phenomena are nothing but artificial, human creations, one needs to develop a more differentiated categorization of reality. In this spirit, various authors (e.g., [20, 9]) have argued that an appropriate interpretation of experimental science needs some kind of dispositional concepts, such as powers, potentialities, or tendencies. These human-independent dispositions would then underlie and enable the human construction of particular experimental processes.

A further important question is whether scientists, on the basis of artificial experimental intervention, can acquire knowledge of a human-independent nature. Some philosophers claim that, at least in a number of philosophically significant cases, such 'back inferences' from the artificial laboratory experiments to their natural counterparts can be justified. Another approach accepts the constructed nature of much experimental science, but stresses the fact that its results acquire a certain endurance and autonomy with respect to both the context in which they have been realized in the first place and later developments. In this vein, Davis Baird [24] offers an account of 'objective thing knowledge', the knowledge encapsulated in material things, such as Watson and Crick's material double helix model or the indicator of Watt and Southern's steam engine.

Another relevant feature of experimental science is the distinction between the working of an apparatus and its theoretical accounts. In actual practice it is often the case that experimental devices work well, even if scientists disagree on how they work. This fact supports the claim that variety and variability at the theoretical level may well go together with a considerable stability at the level of the material realization of experiments. This claim can then be exploited for philosophical purposes, for example to vindicate entity realism [14] or referential realism [8].

The relationship between (experimental) science and technology

Traditionally, philosophers of science have defined the aim of science as, roughly, the generation of reliable knowledge of the world. Moreover, as a consequence of explicit or implicit empiricist influences, there has been a strong tendency to take the production of experimental knowledge for granted and to focus on theoretical knowledge. However, if one takes a more empirical look at the sciences, both at their historical development and at their current condition, this approach must be qualified as one-sided. After all, from Archimedes' lever-and-pulley systems to the cloned sheep Dolly, the development of (experimental) science has been intricately interwoven with the development of technology ([25, 26]). Experiments make essential use of (often specifically designed) technological devices, and, conversely, experimental research often contributes to technological innovations. Moreover, there are substantial conceptual similarities between the realization of experimental and that of technological processes, most significantly the implied possibility and necessity of the manipulation and control of nature. Taken together, these facts justify the claim that the science-technology relationship ought to be a central topic for the study of scientific experimentation.

One obvious way to study the role of technology in science is to focus on the instruments and equipment employed in experimental practice. Many studies have shown that the investigation of scientific instruments is a rich source of insights for a philosophy of scientific experimentation (see, e.g., [15, 17, 18] and [27]). One may, for example, focus on the role of visual images in experimental design and explore the wider problem of the relationship between thought and vision. Or one may investigate the problem of how the cognitive function of an intended experiment can be materially realized, and what this implies for the relationship between technological functions and material structures. Or one may study the modes of representation of instrumentally mediated experimental outcomes and discuss the question of the epistemic or social appraisal of qualitative versus quantitative results.

In addition to such studies, several authors have proposed classifications of scientific instruments or apparatus. One suggested distinction is that between instruments that represent a property by measuring its value (e.g., a device that registers blood pressure), instruments that create phenomena that do not exist in nature (e.g., a laser), and instruments that closely imitate natural processes in the laboratory (e.g., an Atwood machine, which mimics processes and properties of falling objects).

Such classifications form an excellent starting point for investigating further philosophical questions on the nature and function of scientific instrumentation. They demonstrate, for example, the inadequacy of the empiricist view of instruments as mere enhancers of human sensory capacities. Yet, an exclusive focus on the instruments as such may tend to ignore two things. First, an experimental setup often includes various 'devices', such as a concrete wall to shield off dangerous radiation, a support to hold a thermometer, a spoon to stir a liquid, curtains to darken a room, and so on. Such devices are usually not called instruments, but they are equally crucial to a successful performance and interpretation of the experiment and hence should be taken into account. Second, a strong emphasis on instruments may lead to a neglect of the environment of the experimental system, especially of the requirement to control the interactions between the experimental system and its environment. Thus, a comprehensive view of scientific experimentation needs to go beyond an analysis of the instrument as such by taking full account of the specific setting in which this instrument needs to function.

Finally, there is the issue of the general philosophical significance of the experiment-technology relationship. Some of the philosophers who emphasize the importance of technology for science endorse a 'science-as-technology' account. That is to say, they advocate an overall interpretation in which the nature of science--not just experimental but also theoretical science--is seen as basically or primarily technological (see, for instance, [5, 7] and [11]). Other authors, however, take a less radical view by criticizing the implied reduction of science to technology and by arguing for the sui generis character of theoretical-conceptual and formal-mathematical work. Thus, while stressing the significance of the technological--or perhaps, more precisely, the intervention and production--dimension of science, these views nevertheless see this dimension as complementary to a theoretical dimension (see, e.g., [8, 24] and [28]).

The role of theory in experimentation

This brings us to a further central theme in the study of scientific experimentation, namely the relationship between experiment and theory. The theme can be approached in two ways. One approach addresses the question of how theories or theoretical knowledge may arise from experimental practices. Thus, Franklin [21] has provided detailed descriptions and analyses of experimental confirmations and refutations of theories in twentieth century physics. Giora Hon [28] has put forward a classification of experimental error, and has argued that the notion of error may be exploited to elucidate the transition from the material, experimental processes to propositional, theoretical knowledge (see also [29]).

A second approach to the experiment-theory relationship examines the question of the role of existing theories, or theoretical knowledge, within experimental practices. Over the last 25 years, this question has been debated in detail. Are experiments, factually or logically, dependent on prior theories, and if so, in which respects and to what extent? The remainder of this section reviews some of the debates on this question.

The strongest version of the claim that experimentation is theory dependent says that all experiments are planned, designed, performed, and used from the perspective of one or more theories about the objects under investigation. In this spirit, Justus von Liebig and Karl Popper, among others, advocated the view that all experiments are explicit tests of existing theories. This view completely subordinates experimental research to theoretical inquiry. However, on the basis of many studies of experimentation published during the last 25 years, it can be safely concluded that this claim is false. For one thing, quite frequently the aim of experiments is just to realize a stable phenomenon or a working device. Yet, the fact that experimentation involves much more than theory testing does not, of course, mean that testing a theory may not be an important goal in particular scientific settings.

At the other extreme, there is the claim that, basically, experimentation is theory-free. The older German school of 'methodical constructivism' (see [ 6 ]) came close to this position. A somewhat more moderate view is that, in important cases, theory-free experiments are possible and do occur in scientific practice. This view admits that performing such 'exploratory' experiments does require some ideas about nature and apparatus, but not a well-developed theory about the phenomena under scrutiny. Ian Hacking [ 14 ] and Friedrich Steinle [ 22 ] make this claim primarily on the basis of case studies from the history of experimental science. Michael Heidelberger [ 30 ] aims at a more systematic underpinning of this view. He distinguishes between theory-laden and causally-based instruments and claims that experiments employing the latter type of instruments are basically theory-free.

Another view admits that not all concrete activities that can be observed in scientific practice are guided by theories. Yet, according to this view, if certain activities are to count as a genuine experiment, they require a theoretical interpretation (see [ 8 , 9 , 28 ] and [ 31 ]). More specifically, performing and understanding an experiment depends on a theoretical interpretation of what happens in materially realizing the experimental process. In general, quite different kinds of theory may be involved, such as general background theories, theories or theoretical models of the (material, mathematical, or computational) instruments, and theories or theoretical models of the phenomena under investigation.

One argument for such claims derives from the fact that an experiment aims to realize a reproducible correlation between an observable feature of the apparatus and a feature of the object under investigation. The point is that materially realizing this correlation, and knowing what can be learned about the object from inspecting the apparatus, depends on theoretical insights about the experimental system and its environment. These insights pertain to those aspects of the experiment that are relevant to obtaining a reproducible correlation. It is not necessary, and in practice it will usually not be the case, that the theoretical interpretation offers a full understanding of every detail of the experimental process.

A further argument for the significance of theory in experimentation notes that a single experimental run is not enough to establish a stable result, while a set of different runs will almost always produce values that are more or less variable. The questions then are: What does this fact tell us about the nature of the property that has been measured? Does the property vary within a fixed interval? Is it a probabilistic property? Or is its real value constant, with the variations due to random fluctuations? In experimental practice, answers to such questions are based on an antecedent theoretical interpretation of the nature of the property that has been measured.
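To make the underdetermination concrete, here is a minimal sketch in Python with invented numbers (nothing below comes from the studies discussed in this review): two quite different theoretical interpretations of a measured property generate statistically indistinguishable sets of runs, so the data alone cannot decide between them.

```python
# Hypothetical illustration: 20 runs of the 'same' measurement under two
# different theoretical interpretations of the measured property.
import numpy as np

rng = np.random.default_rng(42)

# Interpretation 1: the property has one constant true value (7.0) and the
# spread across runs is random measurement fluctuation around it.
runs_constant = 7.0 + rng.normal(0.0, 0.5, size=20)

# Interpretation 2: the property is genuinely probabilistic, and each run
# samples a fresh realisation of the quantity itself.
runs_probabilistic = rng.normal(7.0, 0.5, size=20)

# Both interpretations predict the same distribution of outcomes, so the
# runs alone cannot decide between them; an antecedent theoretical
# interpretation of the property is needed.
print(runs_constant.mean(), runs_constant.std(ddof=1))
print(runs_probabilistic.mean(), runs_probabilistic.std(ddof=1))
```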

Regarding these claims, it is important to note that, in actual practice, the theoretical interpretation of an experiment will not always be explicit and the experimenters will not always be aware of its use and significance. Once the performance of a particular experiment or experimental procedure becomes routine, the theoretical assumptions drop out of sight: they become like an (invisible) 'window to the world'. Yet, in a context of learning to perform and understand the experiment or in a situation where its result is very consequential or controversial, the implicit interpretation will be made explicit and subjected to empirical and theoretical scrutiny. This means that the primary locus of the theoretical interpretation is the relevant scientific community and not the individual experimenter.

In conclusion: further issues in scientific experimentation

As we have seen, the systematic philosophical study of scientific experimentation is a relatively recent phenomenon. Hence, there are a number of further issues that have received some attention but merit a much more detailed account. In concluding this review paper, three such issues will be briefly discussed.

First, recent scientific practice shows an ever-increasing use of 'computer experiments'. These involve various sorts of hybrids of material intervention, computer simulation, and theoretical and mathematical modeling techniques (see [ 32 ]). Often, more traditional experimental approaches are challenged and replaced by approaches resting fully or primarily on computer simulations (sometimes this replacement is based on budgetary considerations only). More generally, there is a large variety of uses of computer science and technology in performing, analyzing and interpreting experiments and in visualizing, storing and disseminating their results. Automated experimentation constitutes a significant part of these developments.

These new developments raise important questions for the scholarly study of scientific experimentation. First, although some pioneering work has been done (see, for instance, [ 33 ] about the role of databases, and, more generally, bioinformatics in research in the life sciences), we need many more empirical studies that chart this new terrain. Furthermore, new methodological questions arise about how to do this automated experimentation in innovative, yet plausible, ways. As the history of Artificial Intelligence teaches us, expectations about automation can sometimes be overenthusiastic and unfounded ([ 34 , 35 ]). For this reason, a critical assessment of what can, and what cannot, be achieved through automation is particularly important (for the cases of formal symbol manipulation and neural network approaches to AI, see [ 36 ], chaps. 5 and 12). Related to this is the epistemological question of the justifiability of the results of the new approaches. Should experiments always involve a substantial material component or are simulated experiments equally reliable and useful (see [ 37 ])? Finally, computer experiments are regularly applied to complex and large-scale systems, for instance in climate science. Often, in such contexts, scientific and policy problems are intimately connected. This connection also constitutes an important topic for the study of scientific experimentation (see, e.g., [ 38 ]).

A second issue that merits more attention is the nature and role of experimentation in the social and human sciences, such as economics, sociology, medicine, and psychology. Practitioners of those sciences often label substantial, or even large, parts of their activities as 'experimental'. So far, this fact is not reflected in the philosophical literature on experimentation, which has primarily focused on the natural sciences. Thus, a challenge for future research is to connect the primarily methodological literature on experimenting in economics, sociology, medicine, and psychology with the philosophy of science literature on experimentation in natural science (see, e.g., [ 39 ] and [ 40 ]).

One subject that will naturally arise in philosophical reflection upon the similarities and dissimilarities of natural and social or human sciences is this: In experiments on human beings, the experimental subjects will often have their own interpretation of what is going on in these trials, and this interpretation may influence their responses over and above the behavior intended by the experimenters. As a methodological problem (of how to avoid 'biased' responses) this is of course well known to practitioners of the human and social sciences. However, from a broader philosophical or socio-cultural perspective the problem is not necessarily one of bias. It may also reflect a clash between a scientific and a common-sense interpretation of human beings. In case of such a clash, social and ethical issues are at stake, since the basic question is who is entitled to define the nature of human beings: the scientists or the people themselves? The methodological, ethical, and social issues springing from this question will continue to be a significant theme for the study of experimentation in the human and social sciences.

This brings us to a last issue. The older German tradition explicitly addressed wider normative questions surrounding experimental science and technology. The views of Habermas, for example, have had a big impact on broader conceptualizations of the position of science and technology in society. Thus far, the more recent Anglophone approaches within the philosophy of scientific experimentation have primarily dealt with more narrowly circumscribed scholarly topics. In so far as normative questions have been taken into account, they have been mostly limited to epistemic normativity, for instance to questions of the proper functioning of instruments or the justification of experimental evidence. Questions regarding the connections between epistemic and social or ethical normativity are hardly addressed.

Yet, posing such questions is not far-fetched. For instance, those experiments that use animals or humans as experimental subjects are confronted with a variety of normative issues, often in the form of a tension between methodological and ethical requirements [ 41 ]. Other normatively relevant questions relate to the issue of the artificial and the natural in experimental science and science-based technology. Consider, for example, the question of whether experimentally isolated genes are natural or artificial entities. This question is often discussed in environmental philosophy, and different answers to it entail a different environmental ethics and politics. More specifically, the issue of the contrast between the artificial and the natural is crucial to debates about patenting, in particular the patenting of genes and other parts of organisms. The reason is that discoveries of natural phenomena are not patentable while inventions of artificial phenomena are [ 42 ].

Although philosophers of experiment cannot be expected to solve all of those broader social and normative problems, they may be legitimately asked to contribute to the debate on possible approaches and solutions. In this respect, the philosophy of scientific experimentation could profit from its kinship to the philosophy of technology, which has always shown a keen sensitivity to the interconnectedness between technological and social or normative issues.

References

1. Shapin S, Schaffer S: Leviathan and the Air-Pump: Hobbes, Boyle, and the Experimental Life. 1985, Princeton: Princeton University Press
2. Kuhn TS: Mathematical versus experimental traditions in the development of physical science. The Essential Tension. 1977, Chicago: University of Chicago Press, 31-65
3. Campbell DT, Stanley JC: Experimental and Quasi-Experimental Designs for Research. 1963, Chicago: Rand McNally College Publishing Company
4. Bernard C: An Introduction to the Study of Experimental Medicine. 1957, New York: Dover Publications
5. Dingler H: Das Experiment. Sein Wesen und seine Geschichte. 1928, München: Verlag Ernst Reinhardt
6. Janich P: Konstruktivismus und Naturerkenntnis. 1996, Frankfurt am Main: Suhrkamp
7. Habermas J: Knowledge and Human Interests. 1978, London: Heinemann, 2nd edition
8. Radder H: The Material Realization of Science. 1988, Assen: Van Gorcum
9. Radder H: In and about the World. 1996, Albany: State University of New York Press
10. Feenberg A: Questioning Technology. 1999, London: Routledge
11. Lelas S: Science and Modernity: Toward an Integral Theory of Science. 2000, Dordrecht: Kluwer
12. Collins HM: Changing Order: Replication and Induction in Scientific Practice. 1985, London: Sage
13. Latour B, Woolgar S: Laboratory Life: The Social Construction of Scientific Facts. 1979, London: Sage
14. Hacking I: Representing and Intervening. 1983, Cambridge: Cambridge University Press
15. Gooding D, Pinch T, Schaffer S, Eds: The Uses of Experiment. 1989, Cambridge: Cambridge University Press
16. Buchwald JZ, Ed: Scientific Practice: Theories and Stories of Doing Physics. 1995, Chicago: University of Chicago Press
17. Heidelberger M, Steinle F, Eds: Experimental Essays--Versuche zum Experiment. 1998, Baden-Baden: Nomos Verlagsgesellschaft
18. Radder H, Ed: The Philosophy of Scientific Experimentation. 2003, Pittsburgh: University of Pittsburgh Press
19. Brown JR: The Laboratory of the Mind: Thought Experiments in the Natural Sciences. 1991, London: Routledge
20. Bhaskar R: A Realist Theory of Science. 1978, Hassocks: Harvester Press
21. Franklin A: The Neglect of Experiment. 1986, Cambridge: Cambridge University Press
22. Steinle F: Exploratives vs. theoriebestimmtes Experimentieren: Ampères erste Arbeiten zum Elektromagnetismus. In [17]. 1998, 272-297
23. Mayo DG: Error and the Growth of Experimental Knowledge. 1996, Chicago: University of Chicago Press
24. Baird D: Thing Knowledge: A Philosophy of Scientific Instruments. 2004, Berkeley: University of California Press
25. Tiles M, Oberdiek H: Living in a Technological Culture. 1995, London: Routledge
26. Radder H: Science, technology and the science-technology relationship. Philosophy of Technology and Engineering Sciences. Edited by: Meijers AWM. 2009, Amsterdam: Elsevier, 65-91
27. Rothbart D: Philosophical Instruments. Minds and Tools at Work. 2007, Urbana: University of Illinois Press
28. Hon G: The idols of experiment: transcending the 'etc. list'. In [18]. 2003, 174-197
29. Hon G, Schickore J, Steinle F, Eds: Going Amiss in Experimental Research. 2009, New York: Springer
30. Heidelberger M: Theory-ladenness and scientific instruments in experimentation. In [18]. 2003, 138-151
31. Morrison M: Theory, intervention and realism. Synthese. 1990, 82: 1-22. doi:10.1007/BF00413667
32. Lenhard J, Küppers G, Shinn T, Eds: Simulation: Pragmatic Constructions of Reality. 2007, Dordrecht: Springer
33. Leonelli S: Packaging data for re-use: databases in model organism biology. How Well Do 'Facts' Travel?. Edited by: Howlett P, Morgan MS. Cambridge: Cambridge University Press
34. Dreyfus HL: What Computers Still Can't Do. 1992, Cambridge, MA: MIT Press
35. Collins HM: Artificial Experts: Social Knowledge and Intelligent Machines. 1990, Cambridge, MA: MIT Press
36. Radder H: The World Observed/The World Conceived. 2006, Pittsburgh: University of Pittsburgh Press
37. Morgan MS: Experiments without material intervention. Model experiments, virtual experiments, and virtually experiments. In [18]. 2003, 216-235
38. Petersen AC: Simulating Nature. A Philosophical Study of Computer-Simulation Uncertainties and their Role in Climate Science and Policy Advice. 2006, Apeldoorn: Het Spinhuis
39. Winston AS, Blais DJ: What counts as an experiment?: A transdisciplinary analysis of textbooks, 1930-1970. American Journal of Psychology. 1996, 109: 599-616. doi:10.2307/1423397
40. Guala F: The Methodology of Experimental Economics. 2005, Cambridge: Cambridge University Press
41. Resnik DB: The Ethics of Science. 1998, London: Routledge
42. Sterckx S, Ed: Biotechnology, Patents and Morality. 2000, Aldershot: Ashgate, 2nd edition

Acknowledgements

This article draws on material from an earlier publication. Copyright © 2006, from The Philosophy of Science: An Encyclopedia, edited by Sahotra Sarkar and Jessica Pfeifer. Reproduced by permission of Taylor and Francis Group, LLC, a division of Informa plc.

Author information

Authors and affiliations

Faculty of Philosophy, VU University Amsterdam, the Netherlands

Hans Radder

De Boelelaan 1105, 1081 HV, Amsterdam, the Netherlands


Corresponding author

Correspondence to Hans Radder .


Competing interests

The author declares that he has no competing interests.

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


About this article

Cite this article

Radder, H. The philosophy of scientific experimentation: a review. Autom Exp 1 , 2 (2009). https://doi.org/10.1186/1759-4499-1-2


Received : 06 May 2009

Accepted : 29 October 2009

Published : 29 October 2009



Keywords

  • Scientific Experimentation
  • Experimental Knowledge
  • Theoretical Interpretation
  • Experimental Practice
  • Philosophical Significance



Who stressed the use of experiments and observation in seeking knowledge?


Francis Bacon.

Both Bacon and Descartes rejected Aristotle's scientific assumptions. They also challenged the scholarly traditions of the medieval universities that sought to make the physical world fit in with the teachings of the Church. Both argued that truth is not known at the beginning of inquiry but at the end, after a long process of investigation.

Bacon and Descartes differed in their methods, however. Bacon stressed experimentation and observation. He wanted science to make life better for people by leading to practical technologies. Descartes emphasized human reasoning as the best road to understanding. In his Discourse on Method, he explains how he decided to discard all traditional authorities and search for provable knowledge. Left only with doubt, he concluded that the doubter had to exist and made his famous statement, "I think, therefore I am."



What is the purpose of using a control in scientific experiments?

The experimental control provides a baseline result, or set of results, against which the effects of the variables can be compared. It is designed to minimize the effects of variables other than the single independent variable. Control groups are often included in medical or psychological experiments so that the results of an experiment can be considered reliable and trustworthy. For example, suppose a drug cure is being tested. One group of patients is given the drug and the other group is not. The group without the drug is subject to all the same conditions as the other group, thereby eliminating any unforeseen environmental factors. This makes it possible to compare, and therefore measure, the impact the drug has.


Scholar who stressed experiment and observation: Francis Bacon.

Galen's ancient works about human anatomy were incorrect in many ways.


The Relative Merits of Observational and Experimental Research: Four Key Principles for Optimising Observational Research Designs


The main barrier to the publication of observational research is a perceived inferiority to randomised designs with regard to the reliability of their conclusions. This commentary addresses this issue and makes a set of recommendations. It analyses the issue of research reliability in detail and describes the three sources of research unreliability (certainty, risk and uncertainty), two of which (certainty and uncertainty) are not adequately addressed in most research texts. It establishes that randomised designs are as vulnerable as observational studies to these two sources of unreliability, and are therefore not automatically superior to observational research in all research situations. Two key principles for reducing research unreliability are taken from R.A. Fisher's early work on agricultural research. These principles and their application are described in detail, and they are then developed into four key principles that observational researchers should follow when designing observational research exercises in nutrition. It notes that there is an optimal sample size for any particular research exercise that should not be exceeded. It concludes that best practice in observational research is to replicate this optimally sized observational exercise multiple times in order to establish reliability and credibility.

1. Introduction

‘Does A cause B?’ is one of the most common questions asked within nutrition research. Usually ‘A’ is a dietary pattern, and ‘B’ is a health, development or morbidity outcome [1]. In agricultural nutrition, the standard approach to such questions is to use a randomised experimental design [2]. These research tools were in fact developed within agricultural science in the 1920s for exactly this purpose [3]. It remains extremely difficult to publish agricultural research that makes causality inferences without using such a design [4]. Other scientific disciplines have enthusiastically borrowed these experimental tools from agricultural science [5].

However, in human research, ethical or practical issues often make it impossible to use a randomised design to address such ‘does A cause B’ type questions [6]. As scientific and social imperatives require that these research questions still be addressed somehow, a variety of alternative approaches have been developed that are broadly grouped under the description of ‘observational research’ [7]. (Observational research is defined in two, confusingly different, ways within human research. In business research and some branches of psychology, observational research is research where human behaviour is observed in a non-intrusive manner (e.g., watching shopper behaviour in a supermarket or eye tracking), as opposed to an intrusive approach such as a questionnaire [8]. In disciplines such as medicine and nutrition, ‘observational research’ is research in which the subjects’ allocation to a treatment condition is not randomised, and may not be under the control of the researcher [9]. In every other respect an observational study may follow recognised experimental procedures; the lack of randomisation is the key point of difference. This article addresses the second, medical/nutrition, form of observational research.) Despite the absolute requirement to use these techniques in research environments that make randomisation a practical impossibility, researchers in human nutrition face the problem that observational approaches are often considered inferior to the ‘gold standard’ randomised experimental techniques [10, 11]. The situation is aggravated by the association of observational research with the rather unfortunately termed ‘retrospective convenience sampling’ [12].

This negative assessment of observational research continues to dominate, despite reviews of the relevant literature that have indicated that research based upon observational and randomised controlled experiments have a comparable level of reliability/consistency of outcome [ 13 , 14 , 15 ].

This lack of clear-cut advantage for randomisation in these reviews may well be due to the fact that any ‘randomised’ sample in which less than 100% of those selected to participate actually do participate is not truly randomised, as the willingness to participate may be linked to the phenomena being studied, which can create a non-participation bias [16]. In any society that is not a fully totalitarian state, 100% participation of a randomly selected sample is very rarely achievable [17]. In practice, participation rates in ‘random’ nutrition research samples may be well under 80%, but the use of such samples continues to be supported [18, 19].

This credibility gap between randomised and observational studies is both a problem and potentially a barrier to the production and publication of otherwise useful observational research. It is summed up well by Gershon [ 15 ]:

“Despite the potential for observational studies to yield important information, clinicians tend to be reluctant to apply the results of observational studies into clinical practice. Methods of observational studies tend to be difficult to understand, and there is a common misconception that the validity of a study is determined entirely by the choice of study design.” [ 15 ] (p. 860)

Closing up this credibility gap is thus a priority for observational researchers in a competitive publication arena where their research may be disadvantaged if their approach has a perceived lack of credibility. The gap may be closed by progress in two directions—(1) by increasing the relative credibility of observational research, and (2) by reducing the relative credibility of experimental research when applied to equivalent questions in equivalent situations.

The former approach is well summarised in the book by Rosenbaum [ 20 ] and many of the (9000+) published research articles that cite this work. The latter approach may appear at first to be both negative and destructive. It is nevertheless justified if randomised experimental techniques are perceived to have specific powers that they simply do not possess when applied to human nutritional research.

This commentary article adopts both approaches in order to assist those who seek to publish observational research studies, but it does so without recourse to statistics. It explains why the randomisation process does not confer experimental infallibility, but only an advantage that applies in certain situations. It demonstrates that, through an over-focus on statistical risk, it is perfectly possible to create a seemingly ‘low-risk’ randomised experiment that is actually extremely unreliable with regard to its outcomes.

It concludes that it is consequently perfectly possible for a well-designed observational study to comfortably outperform a poorly designed randomised experiment with regard to an equivalent research objective. It closes with a set of principles for researchers who are designing observational studies that will enable them to increase the actual and perceived reliability and value of their research.

2. Certainty, Risk and Uncertainty in Experimental and Observational Research

On 12 February 2002, in a press briefing, the then US Secretary of Defense, Donald Rumsfeld, made the following statement:

“… as we know, there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns—the ones we don’t know we don’t know … it is the latter category that tends to be the difficult ones.” [ 21 ] (p. 1)

While it has often been parodied, e.g., Seely [ 22 ], this statement efficiently sums up the situation faced by all researchers when they are setting up a research exercise. Any researcher will be dealing with three specific groups of knowledge when they are in this situation, which can be summarised for this purpose as below ( Table 1 ). It is critical that researchers fully understand these three groups and how they relate to each other within a human research environment.

Table 1. The division of knowledge in research design (after Rumsfeld).

1. What we know (Certainty): the information available via earlier research and observation. Not actually a certainty (p = 0), but routinely treated as such.

2. What we know we don’t know (Risk): the target relationship(s) of the research, and potentially a small number of other relationships and interactions. Usually quantified and described in the reporting process via a p value (p < α).

3. What we don’t know we don’t know (Uncertainty): all other relationships and interactions within the proposed dataset, including interactions of these unknown variables with the variables in Group 2 above. These cannot be specifically described or quantified, and their potential impact is usually not discussed in any depth, or at all, at any stage in the research design or reporting process.

2.1. What We Know (Group 1—Certainty)

While it is often treated as a certainty, Group 1 information is not actually so. Previous research results that may be used as Group 1 information are reported either qualitatively, with no measure of the probability of their being right, or quantitatively, via a statistically derived ‘p’ value (the chance of being incorrect), which is always greater than zero [23]. (The author is aware that the definition and use of p values is in dispute, e.g., Sorkin et al. [24], and that a liberty is taken by describing and applying them to the discussion in this rather general manner, but the issue is too complex to be addressed here.) Assuming that p = 0 for this pre-existing information does not usually cause serious issues with the design and outcomes of causal research as long as p is small enough, but this is not always so. Structural Equation Modelling (SEM) is one widely used instance where it can give rise to significant validity issues in research reporting [25]. The quote below is from an article specifically written to defend the validity of SEM as a tool of causal research:

“As we explained in the last section, researchers do not derive causal relations from an SEM. Rather, the SEM represents and relies upon the causal assumptions of the researcher. These assumptions derive from the research design, prior studies, scientific knowledge, logical arguments, temporal priorities, and other evidence that the researcher can marshal in support of them. The credibility of the SEM depends on the credibility of the causal assumptions in each application.” [ 26 ] (p. 309)

Thus, an SEM model relies upon a covariance matrix dataset, which contains no causal information whatsoever, combined with the ‘credible’ causal assumptions of the researcher; these are normally made ‘credible’ and supported by cited results from prior research. Bollen and Pearl acknowledge this credibility generation later on the same page of their article. When putting an assumption-based arrow on a covariance-based relationship in an SEM model, the researcher constructing it is assuming that p = 0 for that relationship. In fact, p is never zero, and is never reported as such by prior primary research. It may be a very low number, but even so, the accumulated risk of the entire model being wrong can become significant if the SEM model is large and many such assumptions are made within it.

A recent article in ‘Nutrients’ [27] presents an SEM with 78 unidirectional arrows (Figure 6, p. 18). Leaving all other matters aside, what is the chance of this model being ‘right’ with regard to just the causal direction of all 78 of these arrows? If one sanguinely assumes a p value of 0.01 for all 78 individual causal assumptions, and a similar level for p in the research itself, the probability of the model being ‘right’ can be calculated as 0.99⁷⁹ ≈ 45%: the model is more likely to be wrong than right. This is not a marginal outcome, and it is based upon a universally accepted probability calculation [28] and an authoritative account in support of SEM describing how SEM uses information with a high p value to establish causality [26]. It becomes even more alarming when one considers that, once published, such research can then be used as a ‘credible’ secondary causal assumption input to further related SEM-based primary research, with its reliability/validity as Group 1 ‘prior research’ information readjusted up from 45% to 100%.
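The compounding at work here can be checked in a few lines of code. The sketch below is illustrative only: the per-assumption error rates are assumed for the example, not taken from [27], and the assumptions are treated as independent for simplicity.

```python
# Compounding of assumption risk: if each of n causal assumptions is
# treated as certain but actually carries a small chance of being wrong,
# the probability that the whole model is right decays geometrically.

def model_reliability(p_wrong_each: float, n_assumptions: int) -> float:
    """Probability that all assumptions hold simultaneously,
    assuming (for illustration) that they are independent."""
    return (1.0 - p_wrong_each) ** n_assumptions

# 78 causal arrows plus the research result itself = 79 assumptions
print(model_reliability(0.01, 79))   # ~0.452, i.e. ~45%
print(model_reliability(0.05, 79))   # ~0.017, i.e. ~2%
```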

The conclusion is that ‘certainty’ in research is never actually so, and that consequently the more ‘certainty’ a researcher includes in their theoretical development, the less certain the platform from which they launch their own research becomes. This issue is not restricted to SEM-based research; SEM just makes the process and its consequences manifest. The broader lesson is that theoretical simplicity closely equates to theoretical and research reliability.

2.2. What We Know We Don’t Know (Group 2—Risk)

Identifying and acquiring specific information that we know we do not know is the basis of any contribution made by either experimental or observational causal research. These Group 2 relationships will thus be clearly defined by the researcher, and an enormous literature exists as to how such relationships may then be studied by either approach, and how the risk relating to the reliability of any conclusions may be quantified by statistics and expressed as a p value [ 29 ].

Typically, Group 2 relationships will be few in number in any causal research exercise, because a trade-off exists between the number of variables that may be studied and the amount of data required to generate a statistically significant result with regard to any conclusions drawn [30, 31, 32]. The amount of data required usually increases exponentially, as does the number of potential interactions between the variables [30, 31, 32]. So, for example, a 4² full factorial with four levels of each variable and 30 observations in each cell would require 480 observations to fully compare the relationships between two independent variables and one dependent variable. By contrast, a 4⁴ full factorial would require 7680 observations to study the relationships between four independent variables and one dependent variable to the same standard.
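This growth is easy to verify. The following minimal sketch simply reproduces the worked example above; the function name and the per-cell count of 30 are choices made for the illustration.

```python
# Observations needed for a full factorial: levels^factors cells,
# each with a fixed number of observations per cell.

def full_factorial_n(levels: int, factors: int, per_cell: int = 30) -> int:
    return (levels ** factors) * per_cell

print(full_factorial_n(4, 2))  # 16 cells  -> 480 observations
print(full_factorial_n(4, 4))  # 256 cells -> 7680 observations
```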

This has led to the development of techniques that use less data to achieve the same level of statistical significance when expressing the risk related to multiple causal relationships [33, 34]. Unsurprisingly, these techniques, such as Conjoint Analysis, have proved to be extremely popular with researchers [35, 36]. However, there is no ‘free lunch’; once again there is a trade-off. Conjoint Analysis, for example, is based upon a fractional factorial design [37]. The researcher specifies which relationships are of interest, and the programme removes the parts of the full factorial array that are not relevant to those relationships [36]. As with any fractional factorial design, the researcher thus chooses to ignore the excluded relationships, usually via the (credible) assumption that their main effects and interactions are not significant [38].

By doing so, the researcher chooses not to know something that they do not know. These relationships are removed from the risk calculations relating to the variables that are of interest to the researcher. They and their effects on the research outcomes do not, however, disappear. They are transformed from visible Group 2 knowledge (risk) into invisible Group 3 knowledge (uncertainty). If the researcher’s assumptions are wrong and these excluded relationships are significant, then they have the potential to significantly distort the outcomes of the apparently authoritative analysis of risk for the visible Group 2 relationships that is eventually reported by the researcher. While techniques such as Conjoint Analysis, which routinely rely upon highly fractionated factorial designs, are vulnerable in this regard [38], this is rarely acknowledged in results that rely upon them. As with the SEM example above, the p value associated with the conclusion is routinely readjusted to zero on citation, and it thus graduates to the status of Group 1 knowledge (certainty).
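As a concrete illustration of this trade-off, the sketch below constructs a generic half-fraction of a two-level, four-factor design with defining relation I = ABCD (a textbook fractional factorial, not the Conjoint Analysis procedure itself). In the retained runs, the main effect of factor D is indistinguishable from the ABC interaction: the excluded relationship has not disappeared, it has become invisible.

```python
# Sketch of how a fractional factorial 'chooses not to know':
# in a 2^(4-1) half fraction with defining relation I = ABCD,
# the main effect of D is aliased with the ABC interaction.
from itertools import product

full = list(product([-1, 1], repeat=4))             # 16 runs: A, B, C, D
half = [run for run in full
        if run[0] * run[1] * run[2] * run[3] == 1]  # keep runs with ABCD = +1

for a, b, c, d in half:
    assert d == a * b * c  # D = ABC in every retained run: effects confounded

print(len(half))  # 8 runs instead of 16; the D/ABC distinction is lost
```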

2.3. What We Don’t Know We Don’t Know (Group 3—Uncertainty)

This category of knowledge, as Donald Rumsfeld observed, is the one that creates the most difficulty. It is also invariably the largest category of knowledge in any ‘living’ research environment, and it is at its most complex in human research environments. Its impact on data cannot be separated or quantified and thus must be treated as uncertainty rather than risk.

To illustrate this, take the situation where a researcher wishes to study the causal relationship between fructose intake and attention span for adolescents. The sample will be 480 adolescents aged between 12 and 16. For each adolescent, measures for fructose intake and attention span are to be established by the researcher.

The researcher may also presume that factors other than fructose intake will have an effect on attention span, and they may seek to capture and control for the impact of these ‘extraneous’ variables by a variety of methods, such as high-order factorials with ANOVA, conjoint analysis, or linear mixed model designs. Whatever method is used, the capacity to include additional variables is always restricted by the amount of information about an independent variable set that can be extracted from any dataset while still attaching a meaningful measure of risk, via a p value, to the conclusions relating to them.

Thus, in this case the researcher designs the research to capture the main effects of three other extraneous independent variables in addition to fructose intake: parental education, household income and the child’s gender. These relationships thus become Group 2 information.

This accounts for four variables that might well significantly impact upon the relationship between fructose intake and attention span, but it leaves many others uncontrolled for and unaccounted for within the research environment. These Group 3 uncertainty inputs (variables) may include, but are by no means restricted to, the diet of the household (which includes many individual aspects), the number of siblings in the household, the school that the adolescent attends, and their level of physical activity. These Group 3 uncertainty variables may be collinear with one or more of the Group 2 variables, they may be anticollinear with them, or they may be simply unconnected (random).

Take ‘school attended’, for example. If the sample is drawn from a small number of otherwise equivalent schools, one of which has a ‘crusading’ attitude to attention span, this Group 3 variable is likely to have a significant impact upon the dataset, depending upon how it ends up distributed within the groups. If its effect is ‘random’ in relation to any one of the Group 2 variables, it will end up in the error term, increasing the possibility of a Type II error with regard to that Group 2 variable (as it might be with regard to gender if the crusading school is coeducational). If its impact is collinear with any one of the Group 2 variables, then its effect will end up in the variation that is attached to that variable, thus increasing the possibility of a Type I error (as it certainly will be with regard to gender if the crusading school is single-sex).
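A toy simulation can make the collinear case vivid. The sketch below uses invented effect sizes and a deliberately extreme scenario (a single-sex ‘crusading’ school, so school attended is perfectly collinear with gender); it illustrates the mechanism only and is not a model of any real study.

```python
# Toy simulation (assumed effect sizes, not from the article): an unmeasured
# 'school attended' effect that is collinear with gender inflates the
# false-positive rate for gender even when gender itself has no effect.
import numpy as np

rng = np.random.default_rng(0)
false_positives = 0
n_sims, n = 1000, 240

for _ in range(n_sims):
    gender = np.repeat([0, 1], n // 2)
    # single-sex 'crusading' school: attendance perfectly collinear with gender
    school = gender
    attention = 5.0 + 2.0 * school + rng.normal(0, 3.0, n)  # no gender effect
    g0, g1 = attention[gender == 0], attention[gender == 1]
    t = (g1.mean() - g0.mean()) / np.sqrt(
        g0.var(ddof=1) / len(g0) + g1.var(ddof=1) / len(g1))
    if abs(t) > 1.96:  # two-sided test at the nominal 5% level
        false_positives += 1

print(false_positives / n_sims)  # far above the nominal 5% Type I rate
```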

The key issue here is that the researcher simply does not know about these Group 3 uncertainty variables and their effects. Their ignorance of them is either absolute, or it is qualified because they have been forced to exclude them from the analysis. A researcher will be very fortunate indeed if none of the Group 3 uncertainty variables within their chosen human research environment has the capacity to significantly impact upon their research results. This author, for example, had an experimental research exercise on olive oil intake destroyed by a completely unsuspected but very strong aversion to Spanish olive oil within the research population. The incorporation of Spanish origin into the packaging of one of the four branded products involved (treated as an extraneous variable with which the ‘Spanish effect’ was fully collinear) produced a massive main effect for package treatment, and substantial primary and secondary interactions with other Group 2 variables, that rendered the dataset useless.

Group 3 uncertainty variables will always be present in any living environment. Because they are unknown and uncontrolled for, they are incorrigible via any statistical technique that might reduce them to risk. Consequently, the uncertainty that they generate has the capacity to affect the reliability of both experimental and observational studies to a significant degree. To illustrate this, the fructose and attention span example introduced above will be used. Table 2 shows how the Group 3 uncertainty variable (school attended) would affect comparable experimental and observational studies if its impact were significant.

Table 2. The impact of Group 3 uncertainty variables on experimental and observational research outcomes.

Experimental study: a 2³ factorial design. 480 subjects are recruited as eight matched groups of 60 on the basis of parental education, household income and gender. Within each group, 30 are randomly allocated to a high-fructose diet and 30 to a low one, and attention span is observed.

  • The school attended effect will uniformly increase variation within the two randomly allocated experimental groups for the high- and low-fructose diets. This increase in variation will end up in the error term of the analysis of variance, reducing the F ratio for fructose intake (trending towards a Type II error).
  • As the groups for parental education, household income and the child’s gender are not randomly allocated, the school effect will either end up in the error term of the analysis of variance, thereby depressing the F ratios for parental education, income and gender, if it is not collinear with them, or it will end up in the variation attached to those variables, thus increasing their F ratios, if it is collinear.
  • Therefore, results could trend towards a Type I or a Type II error with regard to any or all of these Group 2 variables, depending on the level and nature of the collinearity between them and the Group 3 variable.
  • The school effect would be likely to be strongly collinear with all three of these Group 2 variables if the attention-span-crusading school were perceived to be the ‘good’ school in the area.

Observational study: 480 subjects are recruited as eight matched groups of 60 on the basis of parental education, household income and gender. Each group of 60 is divided into two groups of 30 (high and low) on the basis of their reported fructose consumption, and attention span is observed.

  • The school attended variable will impact upon the parental education, household income and child’s gender variables exactly as it does in the experimental design above.
  • The impact of the school attended variable upon the fructose intake variable will depend upon its degree of collinearity with it. If it is not collinear, then the allocation to the two groups will effectively be random, and the variation will end up in the error term, depressing the F ratio for fructose intake and tending towards a Type II error.
  • If school attended has any collinearity with fructose intake, then the allocation will not be random and the impact of school attended will be apportioned into the variation associated with fructose intake.
  • Depending on whether the effect of school attended is complementary or anticomplementary to the effect of fructose intake, the result is a trend towards either a Type I error (inflated F ratio) or a Type II error (suppressed F ratio).

Experiments are distinguished from observational studies by the capacity of the researcher to randomly allocate subjects to treatment conditions that they control. Table 2 shows that randomisation may confer a significant advantage over non-randomly allocated observation in an equivalent causal research situation. However, Table 2 also shows that while experimentation may confer an advantage over observation in comparable situations, it is a case of ‘may’, not ‘will’. Randomisation does not confer infallibility, because researcher knowledge and control relate only to the Group 2 variables and the random allocation of subjects to them. Control does not extend to any Group 3 variable and is thus not absolute in any human research situation. The outcome is that significant uncertainty, unlike significant risk, cannot be eliminated by random allocation.

Therefore, it is perfectly possible to design an experiment that is less reliable than an observational exercise when investigating causal relationships. Because the uncertainty generated by Group 3 variables cannot be eliminated, how it is managed at the design phase can significantly affect the reliability of causal research conducted using either experimental or observational techniques. Perhaps more than any other, it is this aspect of agricultural research method (the management of uncertainty, and the generation by design of ‘clean’ data that minimise it) that has failed to transfer to the human research disciplines.

3. Managing Risk and Uncertainty in Experimental and Observational Research—Fisher’s Principles

The development of modern, systematic experimental technique for living environments is usually associated with the publication of ‘The Design of Experiments’ and ‘Statistical Methods for Research Workers’ by Sir Ronald Fisher [30, 38, 39]. Although Fisher’s work is most heavily recognised and cited for its role in risk reduction and the manipulation of Group 2 variables via random allocation between treatments, Fisher was also well aware of the potential impact of Group 3 variables and uncertainty on experimental reliability. In order to design ‘effective’ experimental research that dealt with the issue of Group 3 variables and uncertainty, Fisher proposed two ‘main’ principles:

“… the problem of designing economical and effective field experiments is reduced to two main principles (i) the division of the experimental area into plots as small as possible …; (ii) the use of [experimental] arrangements which eliminate a maximum fraction of soil heterogeneity, and yet provide a valid estimate of residual errors.” [ 40 ] (p. 510)

The overall objective of Fisher’s principles is very simple. They aim to minimise the contribution of Group 3 variation to the mean square for error in the analysis of variance table, as the mean square for error forms the denominator of the fraction that is used to calculate the F ratio for significance for any Group 2 variable, while the mean square for that Group 2 variable forms the numerator. Therefore, reducing Group 3 variation increases Group 2 F ratios and thus their significance in the ANOVA table, as expressed by a p value. Fisher’s principles achieve this by increasing sample homogeneity, which is in turn achieved by reducing sample size.
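In ANOVA notation, the quantity Fisher’s principles aim to protect can be written as follows. This is the standard textbook formulation, not a quotation from Fisher, and the Group 3 ‘leakage’ is split out of the error sum of squares purely for emphasis.

```latex
F_{\mathrm{Group\,2}}
  = \frac{MS_{\mathrm{Group\,2}}}{MS_{\mathrm{error}}},
\qquad
MS_{\mathrm{error}}
  = \frac{SS_{\mathrm{residual}} + SS_{\mathrm{Group\,3\ leakage}}}{df_{\mathrm{error}}}
```

Any Group 3 variation that leaks into the error sum of squares inflates the denominator and depresses every F ratio computed against it; conversely, Group 3 variation that is collinear with a treatment spuriously inflates the numerator for that treatment.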

Fisher’s second principle for experimental design for theory testing is also closely aligned with the much older and more general principle of parsimony in scientific theory generation known as ‘Occam’s Razor’, which is usually stated as: “Entities are not to be multiplied without necessity” (Non sunt multiplicanda entia sine necessitate) [41] (p. 483). Occam’s Razor, like Fisher’s principles, is not a ‘hard’ rule, but a general principle to be considered when conducting scientific research [42].

This is as far as Fisher ever went with regard to these two ‘main’ principles for dealing with Group 3 variation and uncertainty. Exactly why they were not developed further in his writing is a mystery, but Fisher may have assumed that these principles were so obvious to his audience of primarily agricultural researchers that no further development was necessary, and that the orally transmitted experimental ‘method’ discussed earlier in this article would suffice to ensure that these two principles were applied consistently to any experimental research design.

The author’s personal experience is that Fisher’s assumptions were justified with regard to agricultural research, but not with regard to the medical, biological and social sciences to which his experimental techniques were later transferred without their accompanying method. To a certain degree this may be because the application of Fisher’s principles for the reduction of experimental uncertainty is easier to visualise and understand in its original agricultural context, and so the principles will first be explained in that context here (Figure 1).

Figure 1. Fisher’s principles and Group 3 variables in the experimental environment.

Figure 1a shows a living environment, in this case an agricultural research paddock. On first inspection it might appear flat and uniform, but it actually contains significant non-uniformities with regard to soil, elevation, slope, sunlight and wind. The researcher either does not know about these non-uniformities (e.g., the old watercourse) or simply has to put up with them (slope, elevation and wind) in certain circumstances. These are all Group 3 variables in any research design. While Fisher used the term ‘soil heterogeneity’ for the input he wished to eliminate, he would have been more correct to use the term ‘environmental heterogeneity’.

In Figure 1b, a 3 × 4 fractionally replicated Latin Square experiment has been set up that is able to separate the main effects of three independent Group 2 variables and to detect the presence of non-additivity (interaction) between them [44]. The experiment follows Fisher’s first principle in that the individual plots (samples) are as small as it is possible to make them without creating significant ‘edge effects’ [43]. It also follows Fisher’s second principle in that this form of fractionally replicated Latin Square is the most efficient design for dealing with this set of three Group 2 variables and simple non-additivity [5]. In Figure 1b the researcher has used the small size to avoid the non-uniformity of sun and wind, and they have also fortuitously avoided any variations due to the old river bed, even if they were not aware of it.

In Figure 1c the researcher has breached Fisher’s first principle: the plot sizes of the experiment have been increased beyond the minimum on the basis of the ‘bigger the sample the better’ philosophy that dominates most experimental and observational research design. This increase in plot size may reduce random measurement error, reducing the proportion of variance ending up in the error term and thereby potentially increasing the F ratios for the Group 2 variables. However, the increase in accuracy will be subject to diminishing returns.

Furthermore, the design now includes all the variations in Group 3 variables in the environment. This may do one of two things. Firstly, variation generated by the Group 3 variables may simply increase apparent random variation, which will reduce the F ratios and induce a Type II error. Secondly, as is shown in this case, Group 3 variation may fortuitously create an apparently systematic variation via collinearity with a Group 2 variable. As the old watercourse lies under all the level I treatments for the third Group 2 independent variable, all the variation due to this Group 3 variable will become collinear with that of the third Group 2 independent variable. This will spuriously increase the F ratio for that variable (a potential Type I error), and simultaneously reduce that for the Youden & Hunter test for non-additivity of effects, thereby creating a significant potential for a Type II error with regard to interaction. (The Youden and Hunter test for non-additivity [44] estimates experimental error directly by comparing replications of some treatment conditions in the design. Non-additivity is then estimated via the residual variation in the ANOVA table. In this case, the three main design plots for Group 2 variable 3, treatment level I, are all in the watercourse, while the single replication at this level is in the bottom left corner of the design on the elevated slope. This replicated level I plot is likely to return a significantly different result than the three main plots, thus erroneously increasing the test’s estimate of overall error, and concomitantly erroneously reducing its estimate of non-additivity.)

In Figure 1d the researcher, who is only interested in three Group 2 main effects and the presence or absence of interaction between them, has breached Fisher’s second principle by using a less efficient ‘overkill’ design for this specific purpose. They are using a 3 × 3 × 3 full factorial, but with the initial small plot size. This design has theoretically greater statistical power with regard to Group 2 variation, and also has the capacity to identify and quantify first-, second- and third-order interactions between the variables: information that they do not need. The outcome of this is the same as breaching Fisher’s first principle, in that major variations in Group 3 variables are incorporated into the enlarged dataset that this design requires. It is purely a matter of chance whether this Group 3 variation will compromise the result by increasing apparent random error, but the risk grows rapidly with increasing sample size. The randomisation of plots over the larger area makes a Type I error much less likely, but the chance of a Type II error is still significantly increased.

The design of an experiment that breached both of Fisher’s principles by using both the larger design and the larger plot size cannot be shown in Figure 1 as it would be too large, but the experiment’s dataset would inevitably incorporate even greater Group 3 variation than is shown in the figure, with predictably dire results for the reliability of any research analysis of the Group 2 variables.

It is important to note that Fisher’s principles do not dictate that all experiments should be exceedingly small. Scale does endow greater reliability, but not as a simple matter of course: the scale must be achieved via the replication of individual exercises that do conform to Fisher’s principles. Internal ‘intra-study’ replication, where a small-sample experimental exercise is repeated multiple times to contribute to a single result, does not breach Fisher’s principles, and it increases accuracy, power and observable reliability. It is thus standard agricultural research practice. Intra-study replications in agricultural research routinely occur on a very large scale [45], but they are rare in the human research disciplines [46, 47]. The process is shown in Figure 1e, where the experiment from Figure 1b is replicated three times. With this design, variation in the environment can be partitioned in the analysis of variance table as a sum of squares for replication. A large or significant figure in this category (likely in the scenario shown in Figure 1e) may prompt the researcher to conduct further investigations into the potential impact of Group 3 variables on the overall result.
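A minimal sketch of this partition is given below. The numbers and the 3 × 12 layout are invented for the illustration; the computation is the standard block sum of squares from any ANOVA text, not a procedure specific to this article.

```python
# Minimal sketch (hypothetical numbers): computing a sum of squares for
# replication when the same small experiment is repeated k times.
import numpy as np

# results[i, j] = response for plot j within replicate i (3 replicates x 12 plots)
rng = np.random.default_rng(1)
results = rng.normal(10.0, 1.0, size=(3, 12))
results[2] += 2.0  # third replicate sits on different ground (a Group 3 shift)

grand_mean = results.mean()
rep_means = results.mean(axis=1)
ss_replication = results.shape[1] * np.sum((rep_means - grand_mean) ** 2)
print(ss_replication)  # a large value flags environmental (Group 3) variation
```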

Figure 1f shows a situation that arises in human rather than agricultural research, but places it in the same context as the other examples. In agricultural research, participation of the selected population is normally one hundred percent. In human research this is very rarely the case, and participation rates normally fall well below this level. Figure 1f shows a situation where only around 25% of the potentially available research population is participating as a sample.

A fractional participation rate proportionately increases the effective size of the area (shown by the dotted lines) of the actual plots from which the sample must be drawn. The reported sample numbers would make this look like the situation in Figure 1b, but when it is laid out as in Figure 1f, it can be seen that the actual situation is more analogous to Figure 1c: a very large underlying research population that incorporates the same level of Group 3 variance as Figure 1c, but without the advantage of a greater actual sample size, thereby magnifying the potential effect of Group 3 variables beyond that in Figure 1c. The outcome is an effective breach of Fisher’s first principle, and an increased chance that both Type I and Type II errors will occur.
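
As a worked illustration (the numbers are hypothetical): with a participation rate of ρ, a reported sample of n must be drawn from an underlying pool of n/ρ subjects. At ρ = 0.25, a reported sample of 50 implies an effective pool of 50/0.25 = 200, so the Group 3 heterogeneity entering the data is that of the 200-subject area, while the statistical power remains that of the 50-subject sample.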

Subject participation rate is therefore a crucial factor when assessing the potential impact of Group 3 variables on experimental research reliability. This derivative of Fisher’s first principle holds whether the experimental analysis of Group 2 variation is based upon a randomised sample or not.

Moving forward from these specific agricultural examples, the general application of Fisher’s principles with regard to the sample size used in any experiment can be visualised as in Figure 2.

Figure 2. Graphical representation of the interaction of risk, uncertainty and unreliability as a function of experimental sample size.

As sample size increases then, ceteris paribus, the risk (R) of making a Type I or Type II error with regard to any Group 2 variable decreases geometrically, and is expressed statistically, in a precise and authoritative manner, by the p value. As a consequence of this precision, this risk can be represented by a fine ‘hard’ solid line (R) in Figure 2.

By contrast, the uncertainty that is generated by the influence of Group 3 variables within the sample increases as the sample size itself increases. Unlike risk, it cannot be analysed, and no specific source or probability can be assigned to it—yet its increase in any living environment is inevitable as sample size increases. As it is fundamentally amorphous in nature it cannot be expressed as a ‘hard’ line, but is shown as a shaded area (U) in Figure 2 .

The overall unreliability of the research (T) is the sum of these two inputs. It is expressed in Figure 2 not as a line but as a shape that starts as a hard black line when the sample size is small and risk is the dominant input, and widens into a shaded area as sample size increases and uncertainty becomes the dominant input. The shape of the unreliability plot (T) is significant. As risk reduces geometrically, and uncertainty increases at least linearly with sample size, unreliability (T) takes the form of an arc, with a specific minimum point ‘O’ on the sample size axis where risk and uncertainty contribute equally to unreliability.

This indicates that there is a theoretical ‘optimal’ sample size at which unreliability is at its lowest, represented by the point (O) at the bottom of the arc (T). ‘O’, however, is not the optimal size of any experimental design. The point at which sample size reaches ‘O’ is also the point at which uncertainty becomes the dominant contributor to overall experimental unreliability. As uncertainty is amorphous, the exact or even approximate location of ‘O’, and the sample size that corresponds to it, cannot be reliably established by the researcher.
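
The arc can also be sketched formally. As a stylised illustration (the functional forms are assumptions for exposition, not taken from Figure 2), let risk decay geometrically with sample size n and uncertainty grow linearly:

$$T(n) = R(n) + U(n) = a\,e^{-kn} + b\,n, \qquad T'(n) = -ak\,e^{-kn} + b.$$

Setting $T'(n) = 0$ gives the minimum at $n_O = \frac{1}{k}\ln(ak/b)$. The risk parameters $a$ and $k$ are estimable from statistical theory, but $b$, the rate at which Group 3 uncertainty accumulates, is precisely the quantity that cannot be measured; $n_O$ is therefore well defined in principle yet unlocatable in practice.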

Given that ‘O’ cannot be reliably located, the researcher must endeavour to stay well on the safe, small-sample side of it. It is clear from Figure 2 that, if a choice must be made between them, it is better to favour risk over uncertainty, and to design an experiment in which specific risk contributes the maximum, and amorphous uncertainty the minimum, amount to overall experimental unreliability for a given and acceptable value of p.

The logical reaction of any experimental designer to this conclusion is to ‘hug’ the risk line (R). This means selecting the minimum sample size required to achieve an acceptable, not minimal, level of experimental risk, with further scale achieved by replication of the entire exercise. This point is represented by the vertical dotted line ‘S1’ for p = 0.10, if the designer takes this to be the required level of risk for the experiment. If the designer reduces p to 0.05 and increases the sample accordingly, then they reduce the apparent risk, but they do not know with any certainty whether they are doing the same for overall unreliability, as uncertainty now contributes more to the overall unreliability of the experiment (line S2). If risk is further reduced to p = 0.01, then the geometric increase in the required sample size increases the impact of Group 3 variable-derived uncertainty to the point that it generates an apparently lower-risk experiment that actually has a significantly higher, but amorphous and hidden, level of overall unreliability (represented by the double-headed arrow on line S3).
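
The sample-size escalation behind S1–S3 can be made concrete with a standard power calculation (an illustrative sketch assuming statsmodels, a two-sample t-test, a medium standardised effect of 0.5 and 80% power; none of these values come from the text):

```python
from statsmodels.stats.power import TTestIndPower

solver = TTestIndPower()
for alpha in (0.10, 0.05, 0.01):          # the S1, S2 and S3 lines
    n = solver.solve_power(effect_size=0.5, alpha=alpha, power=0.8)
    print(f"alpha = {alpha:.2f}: ~{n:.0f} subjects per group")
# Output is roughly 50, 64 and 96 per group: each reduction in stated risk
# demands a disproportionately larger sample, which in Figure 2 terms drags
# in more Group 3 uncertainty than the extra precision is worth.
```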

It is this logical design reaction to the situation outlined in Figure 1 that is expressed by Fisher in his two principles. It should be noted that the required risk is the cardinal input. The acceptable level of risk must be established first, and this choice should be driven by the research objectives and not by the research design process. Fisher’s principles are then applied to minimise the contribution of uncertainty to experimental designs that are capable of achieving that level of risk.

4. Certainty, Risk, Uncertainty and the Relative Merits of Experimentation and Observational Research

All the foregoing remarks apply equally to randomised experimental research and to observational research that uses any form of organised comparison as the basis for its conclusions. Indeed, many observational research designs are classical experimental designs in all facets bar the randomisation of their treatment conditions.

In both cases, poor design that does not address the potential contribution of Group 1 (certainty) and Group 3 (uncertainty) variation to the data can produce a highly unreliable research outcome that nevertheless reports a low level of risk. This outcome is made even more undesirable when the unreliable result is authoritatively presented as low-risk on the basis of a design and statistical analysis that focuses purely on the contribution of Group 2 (risk) variation to the data. The situation is further aggravated if the practice becomes widespread, and if there is a lack of routine testing of such unreliable results via either intra-study or inter-study replication.

The answer to this problem is the application of method to reduce uncertainty and thus unreliability; Fisher’s two principles form only a small part of this body of method. At present, method is widely considered to be of little importance. As Gershon et al. note [15], “Methods of observational studies tend to be difficult to understand…” Method is indeed difficult to report, as it is both complex and case-specific. My personal experience is that I have struggled to retain any methodological commentary in any article that I have published in the human research literature; it is just not perceived to be important by reviewers and editors, and thus presumably not worth understanding. Consequently, deletion is its routine fate.

One of the main barriers to the use, reporting and propagation of good method is that it is not a fixed entity. While techniques from Figure 1 such as the Latin square or ANOVA may be applied to thousands of research exercises via a single, specific set of written rules, method is applied to research designs on a case-by-case basis via flexible and often unwritten guidelines. This is why ‘Fisher’s principles’ are principles and not rules. Thus, this article concludes by developing Fisher’s principles into a set of four methodological principles for conducting observational research in nutrition, and for subsequently engaging with editors and reviewers:

Principle 1: Randomisation confers advantage over observation in specific situations rather than absolute infallibility. Therefore a researcher may make a reasonable choice between them when designing an experiment to maximise reliability.

Many observational studies are conducted because random allocation is not possible. If this is the case, then the use of observation may not need to be justified. If, however, the researcher faces the option of either a randomised or an observational approach, then they need to look very carefully at whether the random design actually offers the prospect of a more reliable result. Ceteris paribus it does, but if randomisation is going to require a larger/less efficient design, or makes recruitment more difficult, thereby increasing the effective size of the individual samples, then the level of uncertainty within the results will be increased to the degree that a reduction in reliability might reasonably be assumed. An observational approach may thus be justified via Fisher’s first or second principles.

Principle 2: Theoretical simplicity confers reliability. Therefore simpler theories and designs should be favoured.

All theoretical development involves an assumption of certainty for inputs when reality falls (slightly) short of this. This is not an issue when the inputs and assumptions related to the research theory are few, but can become an issue if a large number are involved.

There is no free lunch in science. The more hypotheses that the researcher seeks to test, the larger and more elaborate the research design and sample will have to be. Elaborate instruments make more assumptions and also tend to reduce participation, thus increasing effective individual sample size. All of these increase the level of uncertainty, and thus unreliability, for any research exercise.

The researcher should therefore use the simplest theory and related research design that is capable of addressing their specific research objectives.

Principle 3: There is an optimal sample size for maximum reliability; big is not always better. Therefore the minimum sample size necessary to achieve a determined level of risk for any individual exercise should be selected.

The researcher should aim to use the smallest and most homogeneous sample that is capable of delivering the required level of risk for a specific research design derived from Principle 2 above. Using a larger sample than is absolutely required inevitably decreases the level of homogeneity that the researcher can achieve within the sample, and thereby increases the uncertainty generated by Group 3 variables that are outside the researcher’s control or awareness. Unlike risk, uncertainty cannot be estimated, so the logical approach is not to increase sample size beyond the point at which risk is at the required level.

Principle 4: Scale is achieved by intra-study replication; more replication is always better. Therefore, multiple replications should be the norm in observational research exercises.

While there is an optimal sample size for an individual experimental/observational research exercise, the same does not apply to the research sample as a whole if scale is achieved by intra-study replication. Any observational exercise should be fully replicated at least once, and preferably multiple times, within any study that is being prepared for publication. Replication can be captured within a statistical exercise and can thus be used to significantly reduce the estimate of risk related to Group 2 variables.

Far more importantly for observational researchers, stability to replication also provides a subjective test of the overall reliability of their research, and thus of the potential uncertainty generated by Group 3 variables. A simple observational exercise that conforms with Principles 1–3 and is replicated three times with demonstrated stability to replication has far more value, and thus a far higher chance of being published, than a single, more elaborate and ‘messy’ observational exercise that might occupy the same resources and dataset.

Clearly the research may not prove stable to replication. However, this would be an important finding in and of itself, and the result may allow the researcher to develop some useful conclusions as to why it occurred, what its implications are, and which Group 3 variable might be responsible for it. The work thus remains publishable. This is a better situation than that faced by the author of the single large and messy exercise noted above: the Group 3 variation would remain undetected in their data. Consequently, the outcome would be an inconclusive/unpublishable result and potentially a Type I error.

5. Conclusions

Observational researchers will always have to face challenges with regard to the perceived reliability of their research. As they defend their work it is important for them to note that random designs are not infallible and that observational designs are therefore not necessarily less reliable than their randomised counterparts. Observation thus represents a logical path to reliability in many circumstances. If they follow the four principles above, then their work should have a demonstrably adequate level of reliability to survive these challenges and to make a contribution to the research literature.

Publishing experimental research of this type, which takes a balanced approach to maximising experimental reliability by minimising both risk and uncertainty, is likely to remain a challenging process in the immediate future. This is largely due to an unbalanced emphasis by reviewers, book authors and editors on statistical techniques that target the reduction of risk over any other source of experimental error [48].

Perhaps the key conclusion is that replication is an essential aspect of both randomised and observational research. The human research literature remains a highly hostile environment for inter-study replications of any type. Hopefully this will change. In the interim, however, intra-study replication faces no such barriers and confers massive advantages, particularly on observational researchers. Some may approach replication with trepidation. After forty years of commercial and academic research experience in both agricultural and human environments, my observation is that those who design replication-based research exercises that conform to Fisher’s principles have much to gain and little to fear from it.

6. Final Thought: The Application of Fisher’s Principles to Recall Bias and Within-Individual Variation

One reviewer raised an important point with regard to the application of Fisher’s principles to two important nutritional variables:

“There are some features on methods of data collection in nutritional studies that require attention, for example recall bias or within individual variation. The authors did not mention these at all.”

The author works in food marketing, where both of these issues can cause major problems. There are significant differences between them. Recall bias, as its name suggests, is a systematic variation, where a reported phenomenon is consistently either magnified or reduced upon recollection within a sample. Bias of any type is a real issue when an absolute measure of a phenomenon is required (e.g., total sugar intake). However, due to its systematic nature, it is not necessarily an issue if the research exercise involves a comparison between two closely comparable sample groups to measure the impact of an independent variable upon total sugar intake (e.g., an experiment/observational exercise in which the impact of education on total sugar intake is studied by recruiting two groups with high and low education and asking them to report their sugar intake). If the two groups were comparable in their systematic recall bias, then the recall effect would cancel out between the samples and would disappear in the analysis of the impact of education upon total sugar intake.

However, this requires that the two groups are truly comparable with regard to their bias. The chances of this occurring are increased in both random allocation (experimental) and systematic allocation (observational) environments if the sample sizes are kept as small as possible while all efforts are made to achieve homogeneity within them. Response bias is a Group 3 (uncertainty) variable. If the population from which the two samples above are drawn increases in size, then the two samples will inevitably become less homogeneous in their characteristics. This also applies to their bias, which thus ceases to be homogeneous response bias and instead becomes increasingly random response variation, the impact of which, along with all the other Group 3 uncertainty variables, ends up in the error term of any analysis, thus decreasing research reliability (see Figure 2). Response bias can thus best be managed using Fisher’s principles.
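
This cancellation, and its failure when the bias is not homogeneous, can be illustrated with a small simulation (a hedged sketch: the intakes, bias levels and sample sizes below are invented, and scipy is assumed):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 30
low_ed  = rng.normal(100.0, 8.0, n)           # true daily sugar intake (g)
high_ed = rng.normal(95.0, 8.0, n)            # education lowers intake by 5 g

# Homogeneous samples: everyone under-reports by the same 20%.
t, p = stats.ttest_ind(low_ed * 0.8, high_ed * 0.8)
print(f"shared bias:        p = {p:.3f}")     # identical t and p to the unbiased
                                              # comparison; the bias cancels

# Heterogeneous samples: each subject under-reports by a different amount.
t, p = stats.ttest_ind(low_ed * rng.uniform(0.5, 1.0, n),
                       high_ed * rng.uniform(0.5, 1.0, n))
print(f"heterogeneous bias: p = {p:.3f}")     # bias variation inflates the error
                                              # term and is likely to mask the gap
```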

Similar comments can be made about within-individual variation. The fact that people are not consistent in their behaviour is a massive issue in both nutrition and food marketing research. However, this seemingly random variation is usually produced by distinct and predictable changes in behaviour that are driven by both time and circumstance/opportunity. For example, you consistently eat different food for breakfast and dinner (a temporal pattern). You also consistently tend to eat more, and less responsibly, if you go out to eat (a circumstance/opportunity pattern). If time/circumstance/opportunity can be tightened up enough and made homogeneous within a group, then this seemingly random within-individual variation becomes a consistent within-individual bias, and can be eliminated as a factor between study groups in the manner shown above.

Thus, within-individual variation is a Group 3 (uncertainty) variable, and it too can be managed via Fisher’s principles. Although most research recruits demographically homogeneous samples, less attention is paid to recruiting samples that are also temporally and environmentally homogeneous, that is, recruiting at the same time and location. This temporal and environmental uniformity has the effect of turning a significant proportion of within-consumer variation into within-consumer bias for any sample. The effect of this bias is then eliminated by the experimental/observational comparison, and the small experiments/observational exercises are replicated as many times as necessary to create the required sample size and Group 2 risk. A simulation of this conversion is sketched below.
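
A minimal sketch of that conversion, with hypothetical magnitudes (the baseline intakes, a 400 kcal eating-out shift and the measurement noise are all invented):

```python
import numpy as np

rng = np.random.default_rng(11)
n = 200
base = rng.normal(2000.0, 150.0, n)           # each person's baseline intake (kcal)

# Mixed recruitment: about half are measured on an eating-out occasion that
# adds ~400 kcal, so the occasion term lands in the sample variance.
mixed = base + np.where(rng.random(n) < 0.5, 0.0, 400.0) + rng.normal(0, 100, n)

# Homogeneous recruitment: everyone is measured on the same occasion, so the
# 400 kcal shift is a constant bias shared by all rather than extra variance.
uniform = base + 400.0 + rng.normal(0, 100, n)

print(f"sd, mixed occasions: {mixed.std(ddof=1):6.0f} kcal")
print(f"sd, single occasion: {uniform.std(ddof=1):6.0f} kcal")
# The shared bias then cancels in any between-group comparison made under the
# same conditions, exactly as with recall bias above.
```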

Funding Statement

This research received no external funding.

Institutional Review Board Statement, Informed Consent Statement, Data Availability Statement and Conflicts of Interest

The author declares no conflict of interest.

The author declares no conflict of interest.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
