Imre Lakatos

Falsification and the Methodology of Scientific Research Programmes

A METHODOLOGY OF SCIENTIFIC RESEARCH PROGRAMMES

I have discussed the problem of objective appraisal of scientific growth in terms of progressive and degenerating problemshifts in series of scientific theories. The most important such series in the growth of science are characterized by a certain continuity which connects their members. This continuity evolves from a genuine research programme adumbrated at the start. The programme consists of methodological rules: some tell us what paths of research to avoid (negative heuristic), and others what paths to pursue (positive heuristic).1

Even science as a whole can be regarded as a huge research programme with Popper's supreme heuristic rule: `devise conjectures which have more empirical content than their predecessors.' Such methodological rules may be formulated, as Popper pointed out, as metaphysical principles.2 For instance, the universal anti-conventionalist rule against exception-barring may be stated as the metaphysical principle: `Nature does not allow exceptions'. This is why Watkins called such rules `influential metaphysics'.3

But what I have primarily in mind is not science as a whole, but rather particular research programmes, such as the one known as `Cartesian metaphysics'. Cartesian metaphysics, that is, the mechanistic theory of the universe - according to which the universe is a huge clockwork (and system of vortices) with push as the only cause of motion - functioned as a powerful heuristic principle. It discouraged work on scientific theories - like (the `essentialist' version of) Newton's theory of action at a distance - which were inconsistent with it (negative heuristic). On the other hand, it encouraged work on auxiliary hypotheses which might have saved it from apparent counterevidence - like Keplerian ellipses (positive heuristic).4

(a) Negative heuristic: the `hard core' of the programme.

All scientific research programmes may be characterized by their `hard core'. The negative heuristic of the programme forbids us to direct the modus tollens at this `hard core'. Instead, we must use our ingenuity to articulate or even invent `auxiliary hypotheses', which form a protective belt around this core, and we must redirect the modus tollens to these. It is this protective belt of auxiliary hypotheses which has to bear the brunt of tests and get adjusted and re-adjusted, or even completely replaced, to defend the thus-hardened core. A research programme is successful if all this leads to a progressive problemshift; unsuccessful if it leads to a degenerating problemshift.

The classical example of a successful research programme is Newton's gravitational theory: possibly the most successful research programme ever. When it was first produced, it was submerged in an ocean of `anomalies' (or, if you wish, `counterexamples'5), and opposed by the observational theories supporting these anomalies. But Newtonians turned, with brilliant tenacity and ingenuity, one counter-instance after another into corroborating instances, primarily by overthrowing the original observational theories in the light of which this `contrary evidence' was established. In the process they themselves produced new counter-examples which they again resolved. They `turned each new difficulty into a new victory of their programme'.6

In Newton's programme the negative heuristic bids us to divert the modus tollens from Newton's three laws of dynamics and his law of gravitation. This `core' is `irrefutable' by the methodological decision of its protagonists: anomalies must lead to changes only in the `protective' belt of auxiliary, `observational' hypotheses and initial conditions.7

I have given a contrived micro-example of a progressive Newtonian problemshift.8 If we analyse it, it turns out that each successive link in this exercise predicts some new fact; each step represents an increase in empirical content: the example constitutes a consistently progressive theoretical shift. Also, each prediction is in the end verified; although on three subsequent occasions they may have seemed momentarily to be `refuted'.9

While `theoretical progress' (in the sense here described) may be verified immediately,10 `empirical progress' cannot, and in a research programme we may be frustrated by a long series of `refutations' before ingenious and lucky content-increasing auxiliary hypotheses turn a chain of defeats - with hindsight - into a resounding success story, either by revising some false `facts' or by adding novel auxiliary hypotheses. We may then say that we must require that each step of a research programme be consistently content-increasing: that each step constitute a consistently progressive theoretical problemshift. All we need in addition to this is that at least every now and then the increase in content should be seen to be retrospectively corroborated: the programme as a whole should also display an intermittently progressive empirical shift. We do not demand that each step produce immediately an observed new fact. Our term `intermittently' gives sufficient rational scope for dogmatic adherence to a programme in face of prima facie `refutations'.

The idea of `negative heuristic' of a scientific research programme rationalizes classical conventionalism to a considerable extent. We may rationally decide not to allow `refutations' to transmit falsity to the hard core as long as the corroborated empirical content of the protecting belt of auxiliary hypotheses increases. But our approach differs from Poincaré's justificationist conventionalism in the sense that, unlike Poincaré, we maintain that if and when the programme ceases to anticipate novel facts, its hard core might have to be abandoned: that is, our hard core, unlike Poincaré's, may crumble under certain conditions. In this sense we side with Duhem, who thought that such a possibility must be allowed for;11 but for Duhem the reason for such crumbling is purely aesthetic,12 while for us it is mainly logical and empirical.

(b) Positive heuristic: the construction of the `protective belt' and the relative autonomy of theoretical science.

Research programmes, besides their negative heuristic, are also characterized by their positive heuristic.

Even the most rapidly and consistently progressive research programmes can digest their `counter-evidence' only piecemeal: anomalies are never completely exhausted. But it should not be thought that yet unexplained anomalies - `puzzles' as Kuhn might call them - are taken in random order, and the protective belt built up in an eclectic fashion, without any preconceived order. The order is usually decided in the theoretician's cabinet, independently of the known anomalies. Few theoretical scientists engaged in a research programme pay undue attention to `refutations'. They have a long-term research policy which anticipates these refutations. This research policy, or order of research, is set out - in more or less detail - in the positive heuristic of the research programme. The negative heuristic specifies the `hard core' of the programme which is `irrefutable' by the methodological decision of its protagonists; the positive heuristic consists of a partially articulated set of suggestions or hints on how to change, develop the `refutable variants' of the research programme, how to modify, sophisticate, the `refutable' protective belt.

The positive heuristic of the programme saves the scientist from becoming confused by the ocean of anomalies. The positive heuristic sets out a programme which lists a chain of ever more complicated models simulating reality: the scientist's attention is riveted on building his models following instructions which are laid down in the positive part of his programme. He ignores the actual counterexamples, the available `data'.13 Newton first worked out his programme for a planetary system with a fixed point-like sun and one single point-like planet. It was in this model that he derived his inverse square law for Kepler's ellipse. But this model was forbidden by Newton's own third law of dynamics, therefore the model had to be replaced by one in which both sun and planet revolved round their common centre of gravity. This change was not motivated by any observation (the data did not suggest an `anomaly' here) but by a theoretical difficulty in developing the programme. Then he worked out the programme for more planets as if there were only heliocentric but no interplanetary forces. Then he worked out the case where the sun and planets were not mass-points but mass-balls. Again, for this change he did not need the observation of an anomaly; infinite density was forbidden by an (inarticulated) touchstone theory, therefore planets had to be extended. This change involved considerable mathematical difficulties, held up Newton's work - and delayed the publication of the Principia by more than a decade. Having solved this `puzzle', he started work on spinning balls and their wobbles. Then he admitted interplanetary forces and started work on perturbations. At this point he started to look more anxiously at the facts. Many of them were beautifully explained (qualitatively) by this model, many were not. It was then that he started to work on bulging planets, rather than round planets, etc.

Newton despised people who, like Hooke, stumbled on a first naive model but did not have the tenacity and ability to develop it into a research programme, and who thought that a first version, a mere aside, constituted a `discovery'. He held up publication until his programme had achieved a remarkable progressive shift.14

Most, if not all, Newtonian puzzles, leading to a series of new variants superseding each other, were foreseeable at the time of Newton's first naive model and no doubt Newton and his colleagues did foresee them: Newton must have been fully aware of the blatant falsity of his first variants.15 Nothing shows the existence of a positive heuristic of a research programme more clearly than this fact: this is why one speaks of `models' in research programmes. A `model' is a set of initial conditions (possibly together with some of the observational theories) which one knows is bound to be replaced during the further development of the programme, and one even knows, more or less, how. This shows once more how irrelevant `refutations' of any specific variant are in a research programme: their existence is fully expected, the positive heuristic is there as the strategy both for predicting (producing) and digesting them. Indeed, if the positive heuristic is clearly spelt out, the difficulties of the programme are mathematical rather than empirical.16

One may formulate the `positive heuristic' of a research programme as a `metaphysical' principle. For instance one may formulate Newton's programme like this: `the planets are essentially gravitating spinning-tops of roughly spherical shape'. This idea was never rigidly maintained: the planets are not just gravitational, they have also, for example, electromagnetic characteristics which may influence their motion. Positive heuristic is thus in general more flexible than negative heuristic. Moreover, it occasionally happens that when a research programme gets into a degenerating phase, a little revolution or a creative shift in its positive heuristic may push it forward again.17 It is better therefore to separate the `hard core' from the more flexible metaphysical principles expressing the positive heuristic.

Our considerations show that the positive heuristic forges ahead with almost complete disregard of `refutations': it may seem that it is the `verifications'18 rather than the refutations which provide the contact points with reality. Although one must point out that any `verification' of the n+1-th version of the programme is a refutation of the n-th version, we cannot deny that some defeats of the subsequent versions are always foreseen: it is the `verifications' which keep the programme going, recalcitrant instances notwithstanding.

We may appraise research programmes, even after their `elimination', for their heuristic power: how many new facts did they produce, how great was their capacity to explain their refutations in the course of their growth?19 (We may also appraise them for the stimulus they gave to mathematics. The real difficulties for the theoretical scientist arise rather from the mathematical difficulties of the programme than from anomalies. The greatness of the Newtonian programme comes partly from the development - by Newtonians - of classical infinitesimal analysis which was a crucial precondition of its success.)

Thus the methodology of scientific research programmes accounts for the relative autonomy of theoretical science: a historical fact whose rationality cannot be explained by the earlier falsificationists. Which problems scientists working in powerful research programmes rationally choose is determined by the positive heuristic of the programme rather than by psychologically worrying (or technologically urgent) anomalies. The anomalies are listed but shoved aside in the hope that they will turn, in due course, into corroborations of the programme. Only those scientists have to rivet their attention on anomalies who are either engaged in trial-and-error exercises20 or who work in a degenerating phase of a research programme when the positive heuristic ran out of steam. (All this, of course, must sound repugnant to naive falsificationists who hold that once a theory is `refuted' by experiment (by their rule book), it is irrational (and dishonest) to develop it further: one has to replace the old `refuted' theory by a new, unrefuted one.)

(d) A new look at crucial experiments: the end of instant rationality.

It would be wrong to assume that one must stay with a research programme until it has exhausted all its heuristic power, that one must not introduce a rival programme before everybody agrees that the point of degeneration has probably been reached. (Although one can understand the irritation of a physicist when, in the middle of the progressive phase of a research programme, he is confronted by a proliferation of vague metaphysical theories stimulating no empirical progress.21) One must never allow a research programme to become a Weltanschauung, or a sort of scientific rigour, setting itself up as an arbiter between explanation and non-explanation, as mathematical rigour sets itself up as an arbiter between proof and non-proof. Unfortunately this is the position which Kuhn tends to advocate: indeed, what he calls `normal science' is nothing but a research programme that has achieved monopoly. But, as a matter of fact, research programmes have achieved complete monopoly only rarely and then only for relatively short periods, in spite of the efforts of some Cartesians, Newtonians and Bohrians. The history of science has been and should be a history of competing research programmes (or, if you wish, `paradigms'), but it has not been and must not become a succession of periods of normal science: the sooner competition starts, the better for progress. `Theoretical pluralism' is better than `theoretical monism': on this point Popper and Feyerabend are right and Kuhn is wrong.22

The idea of competing scientific research programmes leads us to the problem: how are research programmes eliminated? It has transpired from our previous considerations that a degenerating problemshift is no more a sufficient reason to eliminate a research programme than some old-fashioned `refutation' or a Kuhnian `crisis'. Can there be any objective (as opposed to socio-psychological) reason to reject a programme, that is, to eliminate its hard core and its programme for constructing protective belts? Our answer, in outline, is that such an objective reason is provided by a rival research programme which explains the previous success of its rival and supersedes it by a further display of heuristic power.23

However, the criterion of `heuristic power' strongly depends on how we construe `factual novelty'. Until now we have assumed that it is immediately ascertainable whether a new theory predicts a novel fact or not.24 But the novelty of a factual proposition can frequently be seen only after a long period has elapsed. In order to show this, I shall start with an example.

Bohr's theory logically implied Balmer's formula for hydrogen lines as a consequence.25 Was this a novel fact? One might have been tempted to deny this, since after all, Balmer's formula was well-known. But this is a half-truth. Balmer merely `observed' B1: that hydrogen lines obey the Balmer formula. Bohr predicted B2: that the differences in the energy levels in different orbits of the hydrogen electron obey the Balmer formula. Now one may say that B1 already contains all the purely `observational' content of B2. But to say this presupposes that there can be a pure `observational level', untainted by theory, and impervious to theoretical change. In fact, B1 was accepted only because the optical, chemical and other theories applied by Balmer were well corroborated and accepted as interpretative theories; and these theories could always be questioned. It might be argued that we can `purge' even B1 of its theoretical presuppositions, and arrive at what Balmer really `observed', which might be expressed in the more modest assertion, B0: that the lines emitted in certain tubes in certain well-specified circumstances (or in the course of a `controlled experiment'26) obey the Balmer formula. Now some of Popper's arguments show that we can never arrive at any hard `observational' rock-bottom in this way; `observational' theories can easily be shown to be involved in B0.27 On the other hand, given that Bohr's programme, after a long progressive development, had shown its heuristic power, its hard core would itself have become well corroborated28 and therefore qualified as an `observational' or interpretative theory. But then B2 will be seen not as a mere theoretical reinterpretation of B1, but as a new fact in its own right.
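For reference, the two statements can be written out side by side. This is a sketch in modern textbook notation, not notation used in the text above: the symbols R (the Rydberg constant), λ (wavelength), ν (frequency), h, c and the level index n are all assumptions introduced here for illustration.

```latex
% B1 (Balmer): an assertion about the wavelengths of the visible hydrogen lines
\[
  \frac{1}{\lambda} = R\left(\frac{1}{2^{2}} - \frac{1}{n^{2}}\right),
  \qquad n = 3, 4, 5, \dots
\]
% B2 (Bohr): an assertion about energy levels and the differences between them
\[
  E_{n} = -\frac{hcR}{n^{2}},
  \qquad
  h\nu = E_{n} - E_{2} = hcR\left(\frac{1}{2^{2}} - \frac{1}{n^{2}}\right)
\]
```

Dividing the second relation by hc, with ν = c/λ, gives back the first. In this notation B1 and B2 share one formula and differ only in what it is asserted of: spectral lines in B1, differences of energy levels in B2.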

These considerations lend new emphasis to the hindsight element in our appraisals and lead to a further liberalization of our standards. A new research programme which has just entered the competition may start by explaining `old facts' in a novel way but may take a very long time before it is seen to produce `genuinely novel' facts. For instance, the kinetic theory of heat seemed to lag behind the results of the phenomenological theory for decades before it finally overtook it with the Einstein-Smoluchowski theory of Brownian motion in 1905. After this, what had previously seemed a speculative reinterpretation of old facts (about heat, etc.) turned out to be a discovery of novel facts (about atoms).

All this suggests that we must not discard a budding research programme simply because it has so far failed to overtake a powerful rival. We should not abandon it if, supposing its rival were not there, it would constitute a progressive problemshift.29 And we should certainly regard a newly interpreted fact as a new fact, ignoring the insolent priority claims of amateur fact collectors. As long as a budding research programme can be rationally reconstructed as a progressive problemshift, it should be sheltered for a while from a powerful established rival.30

These considerations, on the whole, stress the importance of methodological tolerance, and leave the question of how research programmes are eliminated still unanswered. The reader may even suspect that laying this much stress on fallibility liberalizes or, rather, softens up, our standards to the extent that we will be landed with radical scepticism. Even the celebrated `crucial experiments' will then have no force to overthrow a research programme; anything goes.31

But this suspicion is unfounded. Within a research programme `minor crucial experiments' between subsequent versions are quite common. Experiments easily `decide' between the n-th and n+1-th scientific version, since the n+1-th is not only inconsistent with the n-th, but also supersedes it. If the n+1-th version has more corroborated content in the light of the same programme and in the light of the same well corroborated observational theories, elimination is a relatively routine affair (only relatively, for even here this decision may be subject to appeal). Appeal procedures too are occasionally easy: in many cases the challenged observational theory, far from being well corroborated, is in fact an inarticulate, naive, hidden assumption; it is only the challenge which reveals the existence of this hidden assumption, and brings about its articulation, testing and downfall. Time and again, however, the observational theories are themselves embedded in some research programme and then the appeal procedure leads to a clash between two research programmes: in such cases we may need a `major crucial experiment'.

When two research programmes compete, their first `ideal' models usually deal with different aspects of the domain (for example, the first model of Newton's semi-corpuscular optics described light-refraction, the first model of Huyghens's wave optics light-interference). As the rival research programmes expand, they gradually encroach on each other's territory and the n-th version of the first will be blatantly, dramatically inconsistent with the m-th version of the second.32 An experiment is repeatedly performed, and as a result, the first is defeated in this battle, while the second wins. But the war is not over: any research programme is allowed a few such defeats. All it needs for a comeback is to produce an n+1-th (or n+k-th) content-increasing version and a verification of some of its novel content.

If such a comeback, after sustained effort, is not forthcoming, the war is lost and the original experiment is seen, with hindsight, to have been `crucial'. But especially if the defeated programme is a young, fast-developing programme, and if we decide to give sufficient credit to its `prescientific' successes, allegedly crucial experiments dissolve one after the other in the wake of its forward surge. Even if the defeated programme is an old, established and `tired' programme, near its `natural saturation point',33 it may continue to resist for a long time and hold out with ingenious content-increasing innovations even if these are unrewarded with empirical success. It is very difficult to defeat a research programme supported by talented, imaginative scientists. Alternatively, stubborn defenders of the defeated programme may offer ad hoc explanations of the experiments or a shrewd ad hoc `reduction' of the victorious programme to the defeated one. But such efforts we should reject as unscientific.34

Our considerations explain why crucial experiments are seen to be crucial only decades later. Kepler's ellipses were generally admitted as crucial evidence for Newton and against Descartes only about one hundred years after Newton's claim. The anomalous behaviour of Mercury's perihelion was known for decades as one of the many yet unsolved difficulties in Newton's programme; but only the fact that Einstein's theory explained it better transformed a dull anomaly into a brilliant `refutation' of Newton's research programme.35 Young claimed that his double-slit experiment of 1802 was a crucial experiment between the corpuscular and the wave programmes of optics; but his claim was only acknowledged much later, after Fresnel developed the wave programme much further `progressively' and it became clear that the Newtonians could not match its heuristic power. The anomaly, which had been known for decades, received the honorific title of refutation, the experiment the honorific title of `crucial experiment' only after a long period of uneven development of the two rival programmes. Brownian motion was for nearly a century in the middle of the battlefield before it was seen to defeat the phenomenological research programme and turn the war in favour of the atomists. Michelson's `refutation' of the Balmer series was ignored for a generation until Bohr's triumphant research programme backed it up.

It may be worthwhile to discuss in detail some examples of experiments whose `crucial' character became evident only retrospectively. First I shall take the celebrated Michelson-Morley experiment of 1887 which allegedly falsified the ether theory and `led to the theory of relativity', then the Lummer-Pringsheim experiments which allegedly falsified the classical theory of radiation and `led to the quantum theory'. Finally I shall discuss an experiment which many physicists thought would turn out to decide against the conservation laws but which, in fact, ended up as their most triumphant corroboration.

Notes:

1 One may point out that the negative and positive heuristic give a rough (implicit) definition of the `conceptual framework' (and consequently of the language). The recognition that the history of science is the history of research programmes rather than of theories may therefore be seen as a partial vindication of the view that the history of science is the history of conceptual frameworks or of scientific languages.

2 Popper [1934], sections 11 and 70. I use `metaphysical' as a technical term of naive falsificationism: a contingent proposition is `metaphysical' if it has no `potential falsifiers'.

3 Watkins [1958]. Watkins cautions that `the logical gap between statements and prescriptions in the metaphysical-methodological field is illustrated by the fact that a person may reject a [metaphysical] doctrine in its fact-stating form while subscribing to the prescriptive version of it' (Ibid. pp. 356-7).

4 For this Cartesian research programme, cf. Popper [1958] and Watkins [1958], pp. 350-1.

5 For the clarification of the concepts of `counterexample' and `anomaly' cf. above, p. 110, and especially below, p. 159, text to footnote 1.

6 Laplace [1796], livre IV, chapter ii.

7 The actual hard core of a programme does not actually emerge fully armed like Athene from the head of Zeus. It develops slowly, by a long, preliminary process of trial and error. In this paper this process is not discussed.

8 Cf. above, pp. 100-1. For real examples, cf. my [1973].

9 The `refutation' was each time successfully diverted to `hidden lemmas'; that is, to lemmas emerging, as it were, from the ceteris paribus clause.

10 But cf. below, pp. 155-7.

11 Cf. above, p. 105.

12 Ibid.

13 If a scientist (or mathematician) has a positive heuristic, he refuses to be drawn into observation. He will `lie down on his couch, shut his eyes and forget about the data'. (Cf. my [1963-4], especially pp. 300 ff., where there is a detailed case study of such a programme.) Occasionally, of course, he will ask Nature a shrewd question: he will then be encouraged by Nature's YES, but not discouraged by its NO.

14 Reichenbach, following Cajori, gives a different explanation of what delayed Newton in the publication of his Principia: `To his disappointment he found that the observational results disagreed with his calculations. Rather than set any theory, however beautiful, before the facts, Newton put the manuscript of his theory into his drawer. Some twenty years later, after new measurements of the circumference of the earth had been made by a French expedition, Newton saw that the figures on which he had based his test were false and that the improved figures agreed with his theoretical calculation. It was only after this test that he published his law. . . The story of Newton is one of the most striking illustrations of the method of modern science' (Reichenbach [1951], pp. 101-2). Feyerabend criticizes Reichenbach's account (Feyerabend [1965], p. 229), but does not give an alternative rationale.

15 For a further discussion of Newton's research programme, cf. my [1973].

16 For this point cf. Truesdell [1960].

17 Soddy's contribution to Prout's programme or Pauli's to Bohr's (old quantum theory) programme are typical examples of such creative shifts.

18 A `verification' is a corroboration of excess content in the expanding programme. But, of course, a `verification' does not verify a programme: it shows only its heuristic power.

19 Cf. my [1963-4], pp. 324-30. Unfortunately in 1963-4 I had not yet made a clear terminological distinction between theories and research programmes, and this impaired my exposition of a research programme in informal, quasi-empirical mathematics. There are fewer such shortcomings in my [19;4].

20 Cf. below, p.1?5.


21 This is what must have irritated Newton most in the `sceptical proliferation of theories' by Cartesians. Cf. my [1973].

22 Nevertheless there is something to be said for at least some people sticking to a research programme until it reaches its `saturation point'; a new programme is then challenged to account for the full success of the old. It is no argument against this that the rival may, when it was first proposed, already have explained all the success of the first programme; the growth of a research programme cannot be predicted - it may stimulate important unforeseeable auxiliary theories of its own. Also, if a version An of a research programme P1 is mathematically equivalent to a version Am of a rival P2, one should develop both: their heuristic strength can still be very different.

23 I use `heuristic power' here as a technical term to characterize the power of a research programme to anticipate theoretically novel facts in its growth. I could of course use `explanatory power': cf. above, p.1t9, footnote 1.

24 Cf. above, p. hhb, text to footnote 2, and p. h34, text to footnote 3.

25 Cf. above, p.147.

26 Cf. above, p.111, footnote 6.

27 One of Popper's arguments is particularly important: `There is a widespread belief that the statement "I see that this table here is white", possesses some profound advantage over the statement "This table here is white", from the point of view of epistemology. But from the point of view of evaluating its possible objective tests, the first statement, in speaking about me, does not appear more secure than the second statement, which speaks about the table here' ([1934], section 27). Neurath makes a characteristically blockheaded comment on this passage: `For us such protocol statements have the advantage of having more stability. One may retain the statement: "People in the 16th century saw fiery swords in the sky" while crossing out "There were fiery swords in the sky" ' (Neurath [1935], p. 362).

28 This remark, incidentally, defines a `degree of corroboration' for the `irrefutable' hard cores of research programmes. Newton's theory (in isolation) had no empirical content, yet it was, in this sense, highly corroborated.

29 Incidentally, in the methodology of research programmes, the pragmatic meaning of `rejection' (of a programme) becomes crystal clear: it means the decision to cease working on it.

30 Some might, cautiously, regard this sheltered period of development as `prescientific' (or `theoretical'); and be prepared only when it starts producing `genuinely novel' facts to recognize its truly scientific (or `empirical') character - but then their recognition will have to be retroactive.

31 Incidentally, this conflict between fallibility and criticism can be rightly said to be the main problem - and driving force - of the Popperian research programme in the theory of knowledge.

32 An especially interesting case of such competition is competitive symbiosis, when a new programme is grafted on to an old one which is inconsistent with it; cf. above, p.142.

33 There is no such thing as a natural `saturation point'; in my [1963-4], especially on pp. 327-8, I was more of a Hegelian, and I thought there was; now I use the expression with an ironical emphasis. There is no predictable or ascertainable limitation on human imagination in inventing new, content-increasing theories or on the `cunning of reason' (List der Vernunft) in rewarding them with some empirical success even if they are false or even if the new theory has less verisimilitude - in Popper's sense - than its predecessor. (Probably all scientific theories ever uttered by men will be false: they still may be rewarded by empirical successes and even have increasing verisimilitude.)

34/ For an example, cf. above, p.126, footnote 2.


(d) A new look at crucial experiments: the end of instant rationality.

It would be wrong to assume that one must stay with a research programme until it has exhausted all its heuristic power, that one must not introduce a rival programme before everybody agrees that the point of degeneration has probably been reached. (Although one can understand the irritation of a physicist when, in the middle of the progressive phase of a research programme, he is confronted by a proliferation of vague metaphysical theories stimulating no empirical progress.21) One must never allow a research programme to become a Weltanschauung, or a sort of scientific rigour, setting itself up as an arbiter between explanation and non-explanation, as mathematical rigour sets itself up as an arbiter between proof and non-proof. Unfortunately this is the position which Kuhn tends to advocate: indeed, what he calls `normal science' is nothing but a research programme that has achieved monopoly. But, as a matter of fact, research programmes have achieved complete monopoly only rarely and then only for relatively short periods, in spite of the efforts of some Cartesians, Newtonians and Bohrians. The history of science has been and should be a history of competing research programmes (or, if you wish, `paradigms'), but it has not been and must not become a succession of periods of normal science: the sooner competition starts, the better for progress. `Theoretical pluralism' is better than `theoretical monism': on this point Popper and Feyerabend are right and Kuhn is wrong.22

The idea of competing scientific research programmes leads us to the problem: how are research programmes eliminated? It has transpired from our previous considerations that a degenerating problemshift is no more a sufficient reason to eliminate a research programme than some old-fashioned `refutation' or a Kuhnian `crisis'. Can there be any objective (as opposed to socio-psychological) reason to reject a programme, that is, to eliminate its hard core and its programme for constructing protective belts? Our answer, in outline, is that such an objective reason is provided by a rival research programme which explains the previous success of its rival and supersedes it by a further display of heuristic power.23

However, the criterion of `heuristic power' strongly depends on how we construe `factual novelty'. Until now we have assumed that it is immediately ascertainable whether a new theory predicts a novel fact or not.24 But the novelty of a factual proposition can frequently be seen only after a long period has elapsed. In order to show this, I shall start with an example.

Bohr's theory logically implied Balmer's formula for hydrogen lines as a consequence.25 Was this a novel fact? One might have been tempted to deny this, since after all, Balmer's formula was well-known. But this is a half-truth. Balmer merely `observed' B1: that hydrogen lines obey the Balmer formula. Bohr predicted B2: that the differences in the energy levels in different orbits of the hydrogen electron obey the Balmer formula. Now one may say that B1 already contains all the purely `observational' content of B2. But to say this presupposes that there can be a pure `observational level', untainted by theory, and impervious to theoretical change. In fact, B1 was accepted only because the optical, chemical and other theories applied by Balmer were well corroborated and accepted as interpretative theories; and these theories could always be questioned. It might be argued that we can `purge' even B1 of its theoretical presuppositions, and arrive at what Balmer really `observed', which might be expressed in the more modest assertion, B0: that the lines emitted in certain tubes in certain well-specified circumstances (or in the course of a `controlled experiment'26) obey the Balmer formula. Now some of Popper's arguments show that we can never arrive at any hard `observational' rock-bottom in this way; `observational' theories can easily be shown to be involved in B0.27 On the other hand, given that Bohr's programme, after a long progressive development, had shown its heuristic power, its hard core would itself have become well corroborated28 and therefore qualified as an `observational' or interpretative theory. But then B2 will be seen not as a mere theoretical reinterpretation of B1, but as a new fact in its own right.

These considerations lend new emphasis to the hindsight element in our appraisals and lead to a further liberalization of our standards. A new research programme which has just entered the competition may start by explaining `old facts' in a novel way but may take a very long time before it is seen to produce `genuinely novel' facts. For instance, the kinetic theory of heat seemed to lag behind the results of the phenomenological theory for decades before it finally overtook it with the Einstein-Smoluchowski theory of Brownian motion in 1905. After this, what had previously seemed a speculative reinterpretation of old facts (about heat, etc.) turned out to be a discovery of novel facts (about atoms).

All this suggests that we must not discard a budding research programme simply because it has so far failed to overtake a powerful rival. We should not abandon it if, supposing its rival were not there, it would constitute a progressive problemshift.29 And we should certainly regard a newly interpreted fact as a new fact, ignoring the insolent priority claims of amateur fact collectors. As long as a budding research programme can be rationally reconstructed as a progressive problemshift, it should be sheltered for a while from a powerful established rival.30

These considerations, on the whole, stress the importance of methodological tolerance, and leave the question of how research programmes are eliminated still unanswered. The reader may even suspect that laying this much stress on fallibility liberalizes or, rather, softens up, our standards to the extent that we will be landed with radical scepticism. Even the celebrated `crucial experiments' will then have no force to overthrow a research programme; anything goes.31

But this suspicion is unfounded. Within a research programme `minor crucial experiments' between subsequent versions are quite common. Experiments easily `decide' between the n-th and n+1-th scientific version, since the n+1-th is not only inconsistent with the n-th, but also supersedes it. If the n+1-th version has more corroborated content in the light of the same programme and in the light of the same well corroborated observational theories, elimination is a relatively routine affair (only relatively, for even here this decision may be subject to appeal). Appeal procedures too are occasionally easy: in many cases the challenged observational theory, far from being well corroborated, is in fact an inarticulate, naive, hidden assumption; it is only the challenge which reveals the existence of this hidden assumption, and brings about its articulation, testing and downfall. Time and again, however, the observational theories are themselves embedded in some research programme and then the appeal procedure leads to a clash between two research programmes: in such cases we may need a `major crucial experiment'.

When two research programmes compete, their first `ideal' models usually deal with different aspects of the domain (for example, the first model of Newton's semi-corpuscular optics described light-refraction, the first model of Huyghens's wave optics light-interference). As the rival research programmes expand, they gradually encroach on each other's territory and the n-th version of the first will be blatantly, dramatically inconsistent with the m-th version of the second.32 An experiment is repeatedly performed, and as a result, the first is defeated in this battle, while the second wins. But the war is not over: any research programme is allowed a few such defeats. All it needs for a comeback is to produce an n+1-th (or n+k-th) content-increasing version and a verification of some of its novel content.

If such a comeback, after sustained effort, is not forthcoming, the war is lost and the original experiment is seen, with hindsight, to have been `crucial'. But especially if the defeated programme is a young, fast-developing programme, and if we decide to give sufficient credit to its `prescientific' successes, allegedly crucial experiments dissolve one after the other in the wake of its forward surge. Even if the defeated programme is an old, established and `tired' programme, near its `natural saturation point',33 it may continue to resist for a long time and hold out with ingenious content-increasing innovations even if these are unrewarded with empirical success. It is very difficult to defeat a research programme supported by talented, imaginative scientists. Alternatively, stubborn defenders of the defeated programme may offer ad hoc explanations of the experiments or a shrewd ad hoc `reduction' of the victorious programme to the defeated one. But such efforts we should reject as unscientific.34

Our considerations explain why crucial experiments are seen to be crucial only decades later. Kepler's ellipses were generally admitted as crucial evidence for Newton and against Descartes only about one hundred years after Newton's claim. The anomalous behaviour of Mercury's perihelion was known for decades as one of the many yet unsolved difficulties in Newton's programme; but only the fact that Einstein's theory explained it better transformed a dull anomaly into a brilliant `refutation' of Newton's research programme.35 Young claimed that his double-slit experiment of 1802 was a crucial experiment between the corpuscular and the wave programmes of optics; but his claim was only acknowledged much later, after Fresnel developed the wave programme much further `progressively' and it became clear that the Newtonians could not match its heuristic power. The anomaly, which had been known for decades, received the honorific title of refutation, the experiment the honorific title of `crucial experiment' only after a long period of uneven development of the two rival programmes. Brownian motion was for nearly a century in the middle of the battlefield before it was seen to defeat the phenomenological research programme and turn the war in favour of the atomists. Michelson's `refutation' of the Balmer series was ignored for a generation until Bohr's triumphant research programme backed it up.

It may be worthwhile to discuss in detail some examples of experiments whose `crucial' character became evident only retrospectively. First I shall take the celebrated Michelson-Morley experiment of 1887 which allegedly falsified the ether theory and `led to the theory of relativity', then the Lummer-Pringsheim experiments which allegedly falsified the classical theory of radiation and `led to the quantum theory'. Finally I shall discuss an experiment which many physicists thought would turn out to decide against the conservation laws but which, in fact, ended up as their most triumphant corroboration.

footnotes

21 This is what must have irritated Newton most in the `sceptical proliferation of theories' by Cartesians.

22 Nevertheless there is something to be said for at least some people sticking to a research programme until it reaches its `saturation point'; a new programme is then challenged to account for the full success of the old. It is no argument against this that the rival may, when it was first proposed, already have explained all the success of the first programme; the growth of a research programme cannot be predicted - it may stimulate important unforeseeable auxiliary theories of its own. Also, if a version An of a research programme P1 is mathematically equivalent to a version Am of a rival P2, one should develop both: their heuristic strength can still be very different.

23 I use 'heuristic power' here as a technical term to characterize the power of a research programme to anticipate theoretically novel facts in its growth. I could of course use 'explanatory power': cf. above, p.1t9, footnote 1.

24 Cf. above, p. hhb, text to footnote z, and p. h34, text to footnote 3.

25/ Cf. above, p.147.

26/ Cf. above, p.111, footnote 6.

27/ One of Popper's arguments is particularly important: 'There is a widespread belief that the statement "I see that this table here is white", possesses some profound advantage over the statement "This table here is white", from the point of view of epistemology. But from the point of view of evaluating its possible objective tests, the first statement, in speaking about me, does not appear more secure than the second statement, which speaks about the table here' ([1934], section 27). Neurath makes a characteristically blockheaded comment on this passage: 'For us such protocol statements have the advantage of having more stability. One may retain the statement: "People in the 16th century saw fiery swords in the sky" while crossing out "There were fiery swords in the sky" ' (Neurath [1935], p. 362).

28/ This remark, incidentally, defines a 'degree of corroboration' for the 'irrefutable' hard cores of research programmes. Newton's theory (in isolation) had no empirical content, yet it was, in this sense, highly corroborated.

29/ Incidentally, in the methodology of research programmes, the pragmatic meaning of 'rejection' (of a programme) becomes crystal clear : it means the decision to cease working on it.

30/ Some might regard - cautiously - this sheltered period of development as `prescientific' (or `theoretical'); and be prepared only when it starts producing 'genuinely novel' facts to recognize its truly scientific (or 'empirical') character - but then their recognition will have to be retroactive.

31/ Incidentally, this conflict between fallibility and criticism can be rightly said to be the main problem - and driving force - of the Popperian research programme in the theory of knowledge.

32/ An especially interesting case of such competition is competitive symbiosis, when a new programme is grafted on to an old one which is inconsistent with it; cf. above, p.142.

33/ There is no such thing as a natural 'saturation point'; in my [1963-4], especially on pp. 327-8, I was more of a Hegelian, and I thought there was; now I use the expression with an ironical emphasis. There is no predictable or ascertainable limitation on human imagination in inventing new, content-increasing theories or on the 'cunning of reason' (List der Vernunft) in rewarding them with some empirical success even if they are false or even if the new theory has less verisimilitude - in Popper's sense - than its predecessor. (Probably all scientific theories ever uttered by men will be false: they still may be rewarded by empirical successes and even have increasing verisimilitude.)

34/ For an example, cf. above, p.126, footnote 2.

4) Conclusion. The requirement of continuous growth.

There are no such things as crucial experiments, at least not if these are meant to be experiments which can instantly overthrow a research programme. In fact, when one research programme suffers defeat and is superseded by another one, we may - with long hindsight - call an experiment crucial if it turns out to have provided a spectacular corroborating instance for the victorious programme and a failure for the defeated one (in the sense that it was never `explained progressively' - or, briefly, `explained' - within the defeated programme). But scientists, of course, do not always judge heuristic situations correctly. A rash scientist may claim that his experiment defeated a programme, and parts of the scientific community may even, rashly, accept his claim. But if a scientist in the `defeated' camp puts forward a few years later a scientific explanation of the allegedly `crucial experiment' within (or consistent with) the allegedly defeated programme, the honorific title may be withdrawn and the `crucial experiment' may turn from a defeat into a new victory for the programme.

Examples abound. There were many experiments in the eighteenth century which were, as a matter of historico-sociological fact, widely accepted as `crucial' evidence against Galileo's law of free fall, and Newton's theory of gravitation. In the nineteenth century there were several `crucial experiments' based on measurements of light velocity which `disproved' the corpuscular theory and which turned out later to be erroneous in the light of relativity theory. These `crucial experiments' were later deleted from the justificationist textbooks as manifestations of shameful shortsightedness or even of envy. (Recently they reappeared in some new textbooks, this time to illustrate the inescapable irrationality of scientific fashions.) However, in those cases in which ostensibly `crucial experiments' were indeed later borne out by the defeat of the programme, historians charged those who resisted them with stupidity, jealousy, or unjustified adulation of the father of the research programme in question. (Fashionable `sociologists of knowledge' - or `psychologists of knowledge' - tend to explain positions in purely social or psychological terms when, as a matter of fact, they are determined by rationality principles. A typical example is the explanation of Einstein's opposition to Bohr's complementarity principle on the ground that `in 1926 Einstein was forty-seven years old. Forty-seven may be the prime of life, but not for physicists'.1)

In the light of my considerations, the idea of instant rationality can be seen to be utopian. But this utopian idea is a hallmark of most brands of epistemology. Justificationists wanted scientific theories to be proved even before they were published; probabilists hoped a machine could flash up instantly the value (degree of confirmation) of a theory, given the evidence; naive falsificationists hoped that elimination at least was the instant result of the verdict of experiment.2 I hope I have shown that all these theories of instant rationality - and instant learning - fail. The case studies of this section show that rationality works much slower than most people tend to think, and, even then, fallibly. Minerva's owl flies at dusk. I also hope I have shown that the continuity in science, the tenacity of some theories, the rationality of a certain amount of dogmatism, can only be explained if we construe science as a battleground of research programmes rather than of isolated theories. One can understand very little of the growth of science when our paradigm of a chunk of scientific knowledge is an isolated theory like `All swans are white', standing aloof, without being embedded in a major research programme. My account implies a new criterion of demarcation between `mature science', consisting of research programmes, and `immature science' consisting of a mere patched up pattern of trial and error.3 For instance, we may have a conjecture, have it refuted and then rescued by an auxiliary hypothesis which is not ad hoc in the senses which we had earlier discussed. It may predict novel facts some of which may even be corroborated.4 Yet one may achieve such `progress' with a patched up, arbitrary series of disconnected theories. Good scientists will not find such makeshift progress satisfactory; they may even reject it as not genuinely scientific. They will call such auxiliary hypotheses merely `formal', `arbitrary', `empirical', `semi-empirical', or even `ad hoc'.

Mature science consists of research programmes in which not only novel facts but, in an important sense, also novel auxiliary theories, are anticipated; mature science - unlike pedestrian trial-and-error - has `heuristic power'. Let us remember that in the positive heuristic of a powerful programme there is, right at the start, a general outline of how to build the protective belts: this heuristic power generates the autonomy of theoretical science.

This requirement of continuous growth is my rational reconstruction of the widely acknowledged requirement of `unity' or `beauty' of science. It highlights the weakness of two - apparently very different - types of theorizing. First, it shows up the weakness of programmes which, like Marxism or Freudism, are, no doubt, unified, which give a major sketch of the sort of auxiliary theories they are going to use in absorbing anomalies, but which unfailingly devise their actual auxiliary theories in the wake of facts without, at the same time, anticipating others. (What novel fact has Marxism predicted since, say, 1917?) Secondly, it hits patched-up, unimaginative series of pedestrian `empirical' adjustments which are so frequent, for instance, in modern social psychology. Such adjustments may, with the help of so-called `statistical techniques', make some `novel' predictions and may even conjure up some irrelevant grains of truth in them. But this theorizing has no unifying idea, no heuristic power, no continuity. They do not add up to a genuine research programme and are, on the whole, worthless.5

My account of scientific rationality, although based on Popper's, leads away from some of his general ideas. I endorse to some extent both Le Roy's conventionalism with regard to theories and Popper's conventionalism with regard to basic propositions. In this view scientists (and as I have shown, mathematicians too) are not irrational when they tend to ignore counterexamples or, as they prefer to call them, `recalcitrant' or `residual' instances, and follow the sequence of problems as prescribed by the positive heuristic of their programme, and elaborate - and apply - their theories regardless.6 Contrary to Popper's falsificationist morality, scientists frequently and rationally claim `that the experimental results are not reliable, or that the discrepancies which are asserted to exist between the experimental results and the theory are only apparent and that they will disappear with the advance of our understanding'. When doing so, they may not be `adopting the very reverse of that critical attitude which...is the proper one for the scientist'. Indeed, Popper is right in stressing that `the dogmatic attitude of sticking to a theory as long as possible is of considerable significance. Without it we could never find out what is in a theory - we should give the theory up before we had a real opportunity of finding out its strength; and in consequence no theory would ever be able to play its role of bringing order into the world, of preparing us for future events, of drawing our attention to events we should otherwise never observe'. Thus the `dogmatism' of `normal science' does not prevent growth as long as we combine it with the Popperian recognition that there is good, progressive normal science and that there is bad, degenerating normal science, and as long as we retain the determination to eliminate, under certain objectively defined conditions, some research programmes.

The dogmatic attitude in science - which would explain its stable periods - was described by Kuhn as a prime feature of `normal science'. But Kuhn's conceptual framework for dealing with continuity in science is socio-psychological: mine is normative. I look at continuity in science through 'Popperian spectacles'. Where Kuhn sees `paradigms', I also see rational `research programmes'.

1/ Bernstein [1961], p. 129. In order to appraise progressive and degenerating elements in rival problemshifts one must understand the ideas involved. But the sociology of knowledge frequently serves as a successful cover for illiteracy: most sociologists of knowledge do not understand - or even care for - the ideas; they watch the socio-psychological patterns of behaviour. Popper used to tell a story about a 'social psychologist', Dr. X, studying scientists' group behaviour. He went into a physics seminar to study the psychology of science. He observed the 'emergence of a leader', the 'rallying round effect' in some and the 'defence-reaction' in others, the correlation between age, sex and aggressive behaviour, etc. (Dr. X claimed to have used some sophisticated small-sample techniques of modern statistics.) At the end of the enthusiastic account Popper asked Dr. X: 'What was the problem the group was discussing?' Dr. X was surprised: 'Why do you ask? I did not listen to the words! Anyway, what has that to do with the psychology of knowledge?'

2/ Of course, naive falsificationists may take some time to reach the 'verdict of experiment': the experiment has to be repeated and critically considered. But once the discussion ends up in an agreement among the experts, and thus a 'basic statement' becomes 'accepted', and it has been decided which specific theory was hit by it, the naive falsificationist will have little patience with those who still 'prevaricate'.

3/ The elaboration of this demarcation in the two following paragraphs was improved in the press, following invaluable discussions with Paul Meehl in Minneapolis in 1969.

4/ Earlier, in my [1968a], I distinguished, following Popper, two criteria of adhocness. I called ad hoc1 theories which had no excess content over their predecessors (or competitors), that is, which did not predict any novel facts; I called ad hoc2 theories which predicted novel facts but completely failed: none of their excess content got corroborated.

5/ After reading Meehl [1967] and Lykken [1968] one wonders whether the function of statistical techniques in the social sciences is not primarily to provide a machinery for producing phoney corroborations and thereby a semblance of 'scientific progress' where, in fact, there is nothing but an increase in pseudo-intellectual garbage. Meehl writes that 'in the physical sciences, the usual result of an improvement in experimental design, instrumentation, or numerical mass of data, is to increase the difficulty of the "observational hurdle" which the physical theory of interest must successfully surmount; whereas, in psychology and some of the allied behaviour sciences, the usual effect of such improvement in experimental precision is to provide an easier hurdle for the theory to surmount'. Or, as Lykken put it: 'Statistical significance (in psychology) is perhaps the least important attribute of a good experiment; it is never a sufficient condition for claiming that a theory has been usefully corroborated, that a meaningful empirical fact has been established, or that an experimental report ought to be published.' It seems to me that most theorizing condemned by Meehl and Lykken may be ad hoc3. Thus the methodology of research programmes might help us in devising laws for stemming this intellectual pollution which may destroy our cultural environment even earlier than industrial and traffic pollution destroys our physical environment.

6/ Thus the methodological asymmetry between universal and singular statements vanishes. We may adopt either by convention: in the `hard core' we decide to `accept' universal, in the `empirical basis' singular, statements. The logical asymmetry between universal and singular statements is fatal only for the dogmatic inductivist who wants to learn only from hard experience and logic. The conventionalist can, of course, `accept' this logical asymmetry: he does not have to be (although he may be) also an inductivist. He `accepts' some universal statements, but not because he claims to deduce (or induce) them from singular ones.