The importance of small samples in medical research

This is an open access journal, and articles are distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 License, which allows others to remix, tweak, and build upon the work non-commercially, as long as appropriate credit is given and the new creations are licensed under the identical terms.

Abstract

Almost all bio-statisticians and medical researchers believe that a large sample is always helpful in providing more reliable results. Whereas this is true for some specific cases, a large sample may not be helpful in more situations than we contemplate because of the higher possibility of errors and reduced validity. Many medical breakthroughs have occurred with self-experimentation and single experiments. Studies, particularly analytical studies, may provide more truthful results with a small sample because intensive efforts can be made to control all the confounders, wherever they operate, and sophisticated equipment can be used to obtain more accurate data. A large sample may be required only for the studies with highly variable outcomes, where an estimate of the effect size with high precision is required, or when the effect size to be detected is small. This communication underscores the importance of small samples in reaching a valid conclusion in certain situations and describes the situations where a large sample is not only unnecessary but may even compromise the validity by not being able to exercise full care in the assessments. What sample size is small depends on the context.

KEY WORDS: Medical research, n = 1, self-experiments, small sample

Introduction

Statisticians, particularly those assisting medical research, are infamous for insisting on a large sample. “The larger the sample, the more reliable is the result” is their dictum. Recent examples are the phase-III vaccine trials for coronavirus disease 2019 (COVID-19), where each company conducted trials on thousands of people to assess the efficacy of the vaccine and the incidence of side effects. We explain later why such a large sample is required in this case, but several other studies have used unnecessarily huge samples. For example, Schnitzer et al.[1] conducted a study on 47,935 patients with osteoarthritis and 10,639 patients with rheumatoid arthritis to compare the prescription rates of rofecoxib and celecoxib. With samples this big, even a trivial difference is almost certain to be statistically significant, as rightly mentioned by the authors. The sample size was not statistically determined but was based on the cases available in a large database of pharmacy claims in the US. Krishna et al.[2] studied the records of 782,320, 1,393,570, and 1,049,868 patients with allergic rhino-conjunctivitis, atopic eczema, and asthma, respectively, and twice as many controls, to find the relative risk of allergic diseases in such cases. This study, too, is based on a retrospective cohort extracted from a UK primary care database with no justification of the sample size. Among the clinical trials, the effect of tranexamic acid on the mortality of different types of trauma patients was studied with a sample of 10,060 in the treatment arm and 10,067 in the control arm.[3] This study included 274 hospitals in 40 countries, and no justification for the sample size is provided. There is an inclination to move to mega trials based on huge samples. Thus, large samples are used not just for retrospective data but also for prospective trials.
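The point that a trivial difference becomes statistically significant in a very large sample can be illustrated with a small calculation. The following is a minimal sketch using the usual pooled two-proportion z-test; the rates (30% versus 31%) and the group sizes are hypothetical and are not taken from any of the cited studies.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_p_value(p1, n1, p2, n2):
    """Two-sided p-value of the pooled two-proportion z-test."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical rates: the same one-percentage-point difference in both cases.
small = two_proportion_p_value(0.30, 200, 0.31, 200)        # 200 per group
large = two_proportion_p_value(0.30, 50_000, 0.31, 50_000)  # 50,000 per group

print(f"n = 200 per group:    p = {small:.3f}")
print(f"n = 50,000 per group: p = {large:.4f}")
```

Under these assumed values, the same difference is far from significant with 200 subjects per group and highly significant with 50,000 per group, although its medical relevance is unchanged.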

The above-mentioned examples show that a study is sometimes done on a large sample while ignoring the relevant statistical considerations: the desired precision and confidence level in the case of estimation, and the minimum effect size to be detected and the power in the case of hypothesis testing. Obtaining a large sample has become a lot easier in many cases these days because the data are available with individual institutions in electronic form, and these institutions form a consortium to achieve an impressively large sample, purportedly to increase the confidence in the results and to assert that their results have a high chance of being closer to the truth. This communication explains that this assertion could be false in some cases despite a very large sample, and that studies on small samples can produce more truthful results in many cases because they can be carried out with more care. As described next, studies even with n = 1 can sometimes provide breakthrough findings. We also identify specific situations where a large sample may be required.

The Significance of n = 1

Scientists would agree that only one (n = 1) counter-example is enough to dismiss a theory. Such an example provides evidence that something contrary to the existing knowledge 'can' happen. In mathematics, a result such as the Pythagoras theorem has no exception. Medicine is not such a lucky science, and variation in agent, host, and environmental factors and their interactions can upset any theory. Zhang et al.[4] have provided a counter-example to the conventional wisdom in biomedical optics that longer wavelengths aid deeper imaging in the tissue, and Hughes et al.[5] presented a counter-example showing that, in the center of the human ocular lens, there is no lipid turnover in the fiber cells during the entire human life span.

However paradoxical it may sound from the statistical viewpoint, many medical breakthroughs have occurred with a few or even a single observation (n = 1). Edward Jenner inoculated a boy with cowpox pus in 1796, which led to the smallpox vaccine and began immunology as a science.[6] The development of penicillin started from a single observation by Alexander Fleming, who noticed in 1928 that a mold had developed on a contaminated staphylococcus culture plate and concluded that possibly the mold prevented the growth of staphylococci and could be effective against gram-positive bacteria.[7] He produced a filtrate of the mold cultures, named penicillin, which had a significant antibacterial effect and saved countless lives. A heart transplant by Christiaan Barnard in 1967[8] opened enormous possibilities. Only one instance of death soon after consuming a specific substance is generally considered enough to suspect that this substance can be poisonous, and one person developing a disease on contact with an affected person opens the possibility of it being contagious.

Many studies are based on self-experimentation. Nicholas Senn's experiment in 1901, in which he implanted in himself a piece of a cancerous lymph node from a lip cancer patient and did not develop the disease, was a pointer to conclude that cancer is not microbial and not contagious.[9] William Harrington exchanged blood transfusions between himself and a thrombocytopenic patient in 1950 and thus discovered the immune basis of idiopathic thrombocytopenic purpura and provided evidence of the existence of auto-immunity.[10] Barry Marshall intentionally consumed Helicobacter pylori in 1984 and became ill. He took antibiotics, and his symptoms were relieved.[11] Thus, a cause-effect relationship was proposed based on just one observation. Sildenafil citrate (Viagra) was originally developed to treat cardiovascular problems, but as early as 1983 Giles Brindley had stunned the world by dropping his pants at a urological conference and showing an erect penis after injecting it with phenoxybenzamine. This showed that the erection mechanism lies not in the heart but in the penis.[12] An experiment on just one person made the world take note in a way that a study on thousands of subjects possibly would not have. Weisse identified 465 documented instances of self-experimentation.[13] Many of these experiments paved the way for discoveries despite n = 1.

Convincing studies with n = 1 may be few and far between but they do provide evidence of the possible existence of an effect. They may not be enough to make a generalized statement for the entire target population, but they make a noticeable statement. All case studies are based on single patients and they are successful in highlighting the unusual occurrences that one must be aware of.

n-of-1 trials

In certain situations, an n-of-1 trial can be done where, if the conditions allow, two or more treatment regimens are tried on the same patient after proper randomization and blinding, with a washout period where necessary. This does not require a big sample and can have just one patient. Such a trial can determine the optimal intervention for an individual patient and can be a good strategy for individualized medicine,[14] although the generalization suffers. Nevertheless, a series of n-of-1 trials can provide a meaningful evidence base. Sedgwick[15] described an n-of-1 trial of sustained-release paracetamol and celecoxib for osteoarthritis; 41 patients completed the trial. Wood et al.[16] described an n-of-1 trial of statin, placebo, and no treatment in 60 patients to assess the side effects. Stunnenberg[17] has provided a practical flowchart for n-of-1 trials based on an ethical framework.
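The randomization step of an n-of-1 trial can be sketched as follows. This is only an illustration under simple assumptions: the treatment labels, the number of cycles, and the function name n_of_1_schedule are hypothetical, and blinding and washout periods are not modeled.

```python
import random

def n_of_1_schedule(treatments, cycles, seed=None):
    """Return one randomly ordered block of treatments per cycle.

    Every treatment is given exactly once in each cycle, so the single
    patient serves as his or her own control across the cycles.
    """
    rng = random.Random(seed)
    schedule = []
    for _ in range(cycles):
        block = list(treatments)
        rng.shuffle(block)  # random order of regimens within the cycle
        schedule.append(block)
    return schedule

# Hypothetical labels for a three-arm, three-cycle trial on one patient.
schedule = n_of_1_schedule(["drug", "placebo", "no treatment"], cycles=3, seed=1)
for cycle, block in enumerate(schedule, start=1):
    print(f"Cycle {cycle}: {' -> '.join(block)}")
```

Randomizing within each cycle, rather than over the whole sequence, keeps the regimens balanced over time for the single patient.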

Small n

Sauro and Lewis[18] considered n < … small, and n < … or n < 8/[P(1 – P)] (where P is the proportion) for the qualitative outcome is considered small because the normal approximation based on the central limit theorem does not hold in most cases with such a sample size and an exact method of analysis is required.[20] This means that for P = 0.001 (1 in 1,000), n must be at least 8,000 for using the usual normal distribution-based methods. However, this is only for the purpose of choosing the method of statistical analysis. For research, what is a small sample depends on the context and no hard-and-fast definition can be given. The examples cited in this article illustrate what is small in different contexts.
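The figure of about 8,000 for P = 0.001 is consistent with a threshold of n ≥ 8/[P(1 – P)] for a qualitative outcome. Assuming that rule, the minimum n for different proportions can be worked out as in the sketch below; the function name is hypothetical.

```python
from math import ceil

def min_n_for_normal_methods(p):
    """Smallest n satisfying n >= 8 / [P(1 - P)] for a proportion P."""
    if not 0 < p < 1:
        raise ValueError("P must lie strictly between 0 and 1")
    return ceil(8 / (p * (1 - p)))

for p in (0.5, 0.1, 0.01, 0.001):
    print(f"P = {p}: n must be at least {min_n_for_normal_methods(p)}")
```

The requirement is lowest for P = 0.5 and rises steeply as the event becomes rarer, which is why the same n can be 'large' for a common outcome and 'small' for a rare one.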

Although multiple problems have been cited with studies on a small sample,[21,22,23] many examples exist of useful studies on small samples. Some big discoveries have started with case series such as the dissemination of Kaposi sarcoma in young homosexual men[24] and pneumocystis pneumonia.[25] Most preclinical studies are done on a small sample of animals, particularly for regimens with a potentially harmful outcome such as insecticides. Animal experiments can be done in highly controlled conditions to nearly eliminate all the confounders and thus establish the cause-effect relationship without studying a big sample. This shows that the crucial requirement for analytical research is not the sample size but the control of all the confounders. When they are under control, the variance decreases, and sufficient power is achieved with a smaller sample. Thus, a study with a small sample can provide more believable results than a study on a large sample with uncontrolled confounders. Small samples have a tremendous advantage as highly sophisticated and accurate measurements can be made with all the precautions in place. The measurement errors and biases can be easily identified and controlled in a small sample. The aggregation errors that occur due to the combining of small and large values are less likely with small samples. Studies on small samples give quick results, can be carried out in one center without the hassles of multicenter studies, and obtain ethics committee approval more easily. They may require exact methods of statistical analysis that can help in reaching more valid conclusions.
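The link between reduced variance and a smaller required sample can be made concrete with the standard normal-approximation formula for comparing two means, n per group = 2(z₁₋α/₂ + z_power)²σ²/δ². The sketch below assumes this formula; the standard deviations and the difference to be detected are hypothetical values chosen only for illustration.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(sd, delta, alpha=0.05, power=0.80):
    """Approximate n per group to detect a mean difference delta (two-sided test)."""
    z = NormalDist().inv_cdf
    return ceil(2 * ((z(1 - alpha / 2) + z(power)) * sd / delta) ** 2)

# Hypothetical: the same difference of 5 units with uncontrolled confounders
# (SD = 10) and with confounders tightly controlled (SD = 5).
uncontrolled = n_per_group(sd=10, delta=5)
controlled = n_per_group(sd=5, delta=5)
print(f"SD = 10: {uncontrolled} per group")
print(f"SD = 5:  {controlled} per group")
```

Because the required n is proportional to the variance, halving the standard deviation through better control cuts the required sample to about one-fourth for the same power.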

Among the clinical studies, phase-I trials are done on small samples where the objective is to test toxicity. In other setups, Hansen and Fulton[26] carried out a study on four children with a history of mild retinopathy of prematurity (ROP) and four controls and concluded that there is evidence of peripheral rod photoreceptor involvement in the subjects with ROP. Machado et al.[27] found severe acute respiratory syndrome coronavirus-2 viral ribonucleic acid (SARS-CoV-2 viral RNA) in the semen of 1 out of 15 patients with this disease and considered it enough to alert about a possible new mode of transmission. Hatchell et al.[28] studied six or fewer patients undergoing different reconstructions and concluded that vascularized nerve grafts for the facial nerve offer a practical and viable facial reconstruction surgery with acceptable donor site deficits. Most trials on surgical procedures are done on small samples because of the unavailability of many homogeneous cases, intra-operative variations, and the difficulty in obtaining patient consent for randomization for such trials.[29] A small sample has not impeded the progress of science in these disciplines.

No single study, whether based on a small sample or a large sample, is considered conclusive. A large number of small studies can be done easily in different setups, and if they point in the same direction, a safe, possibly more robust, conclusion can be drawn through a meta-analysis. Alvares et al.[30] combined the results of 26 small studies, with sample sizes of 8–30, by meta-analysis for assessing the effect of dietary nitrate on muscular strength. With a combined sample of more than 500 subjects, they found a trivial but statistically significant effect of dietary nitrate ingestion on muscular strength, although none of the individual studies reported any significant effect, possibly due to the small size of the sample.
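How several individually non-significant small studies can yield a significant pooled result is illustrated below with a fixed-effect, inverse-variance meta-analysis. This is a minimal sketch: the six effect sizes and standard errors are hypothetical and are not the data of Alvares et al., and a random-effects model may be more appropriate when the studies are heterogeneous.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p(effect, se):
    """Two-sided p-value for an effect estimate under the normal approximation."""
    return 2 * (1 - NormalDist().cdf(abs(effect / se)))

def pool_fixed_effect(studies):
    """Inverse-variance pooling of (effect, standard_error) pairs."""
    weights = [1 / se ** 2 for _, se in studies]
    pooled = sum(w * e for w, (e, _) in zip(weights, studies)) / sum(weights)
    pooled_se = sqrt(1 / sum(weights))
    return pooled, pooled_se

# Hypothetical small studies: (effect size, standard error).
studies = [(0.25, 0.20), (0.10, 0.25), (0.30, 0.22),
           (0.15, 0.18), (0.20, 0.21), (0.22, 0.19)]

individual_p = [two_sided_p(e, se) for e, se in studies]
pooled, pooled_se = pool_fixed_effect(studies)
pooled_p = two_sided_p(pooled, pooled_se)

print("Individual p-values:", [round(p, 2) for p in individual_p])
print(f"Pooled effect = {pooled:.3f}, SE = {pooled_se:.3f}, p = {pooled_p:.3f}")
```

With these assumed values, no single study reaches P < 0.05, but the pooled estimate does because its standard error is much smaller than that of any individual study.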

Anderson and Vingrys[31] argued that small samples may be enough to show the presence of an effect but not for estimating the effect size. If the objective is only to show that an effect exists, bearing the cost of a large sample can be avoided. An unrealized advantage of small studies is that only a relatively large effect would be statistically significant, and such a large effect may also be medically significant enough to change the current practice. In addition, there is wide acceptance of the call to move beyond P < 0.05.[32] At the same time, detecting a small medically significant effect can be important in some cases, and that brings in the question of studies based on a large sample with adequate power.

Large n

Most medical studies are carried out in less-than-ideal conditions primarily because ideal conditions in a medical setup simply do not exist in most situations. If there are many known and unknown confounders that can affect the outcome, a large sample is imperative to 'average out' their effect. This averaging out is an underlying assumption in most medical studies, but it is tricky because it holds only for a random sample. Large sample studies, including mega trials, are welcome if the data quality is assured. The second situation requiring a large sample is the need to have a high precision of the estimate of the effect size. We know that a large sample in this case handsomely improves the precision. However, the objective in most medical research is to be able to detect a medically significant effect (or not to miss an effect) when present, and this requires power calculations. The smaller the effect to be detected, the larger is the required sample. With the advancement of science, a small improvement may have become medically important, and a large sample is required to detect a small improvement. A large sample may be required also to study a rare event, particularly if it is highly variable. A study on methicillin-resistant Staphylococcus aureus (MRSA) positivity in general patients[33] is an example where a huge sample may be required. The identification of markers of Alzheimer's disease in its early phase[34] is another example where a large sample may be required because of wide variability. The efficacy of a vaccine is based on the difference in the incidence of the disease in the vaccinated and control groups. Both these incidences may be small and the difference even smaller. Thus, a trial on a large sample is required. A large sample is also necessary to identify the rare side effects in this case. A large sample is also justified for multicentric studies and for studies that investigate several outcomes.
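Why a vaccine trial needs thousands of subjects follows from the usual normal-approximation formula for comparing two proportions, n per group = (z₁₋α/₂ + z_power)²[p₁(1 – p₁) + p₂(1 – p₂)]/(p₁ – p₂)². The sketch below assumes this formula; the incidences are hypothetical and do not come from any actual vaccine trial.

```python
from math import ceil
from statistics import NormalDist

def n_per_group_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Approximate n per group to detect a difference between two incidences."""
    z = NormalDist().inv_cdf
    z_sum = z(1 - alpha / 2) + z(power)
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil(z_sum ** 2 * variance_sum / (p1 - p2) ** 2)

# Hypothetical incidences, both halved in the treated group.
rare = n_per_group_two_proportions(0.01, 0.005)   # 1% versus 0.5%
common = n_per_group_two_proportions(0.10, 0.05)  # 10% versus 5%
print(f"Rare outcome (1% vs 0.5%):   {rare} per group")
print(f"Common outcome (10% vs 5%):  {common} per group")
```

For the same relative reduction, the rare outcome requires more than ten times as many subjects per group under these assumptions, since the absolute difference to be detected is only half a percentage point.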

At the same time, there are instances where an unnecessarily large sample was studied. We cited some examples earlier. Celik et al.[35] found that most randomized controlled trials (RCTs) on rheumatoid arthritis enroll more patients than needed. This is a needless exposure of patients to a regimen that is under trial.

Kaplan et al.[36] sounded a caution that big data could lead to big inferential errors and can magnify bias. This can happen due to carelessness in collecting data or inadequate resources for a large study that could cause measurement errors, or due to unwittingly choosing a biased sample. They cite the example of pre-election opinion polls that rarely provide correct results despite a huge sample. Such surveys can rarely be done on a random or representative sample, and the response received is not necessarily the same as actual voting. In medicine, this can happen with records-based studies and clinical trials when the sample is biased or the quality of data is compromised. The investigator may not be aware that some impropriety has happened or may carelessly ignore it. The article by Munyangi et al.[37] had to be retracted due to questionable data, in addition to ethical issues, despite being based on a large clinical trial. Discrepancies exist even among mega trials;[38] thus, large-scale trials too are not a guarantee of infallible results. Charlton[39] has discussed how typical mega trials recruit pathologically and prognostically heterogeneous subjects and lose validity. Mega trials generally require a multicenter approach where adopting a common protocol is difficult because of the preferences of individual centers.[40] Heterogeneity in a series of small trials may provide a significant advantage over a mega trial[41] when they point to the same conclusion.
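The reason a huge sample cannot rescue biased data is that the total error of an estimated mean has two parts: a random part that shrinks with n and a systematic part that does not. A minimal sketch of this decomposition, root mean squared error = √(bias² + σ²/n), is given below; the bias, standard deviation, and sample sizes are hypothetical.

```python
from math import sqrt

def rmse_of_mean(n, sd, bias=0.0):
    """Root mean squared error of a sample mean with a fixed systematic bias."""
    return sqrt(bias ** 2 + sd ** 2 / n)

# Hypothetical: a careful unbiased study of 50 subjects versus a
# records-based study of 100,000 subjects with a systematic bias of 3 units.
careful_small = rmse_of_mean(n=50, sd=10)
careless_large = rmse_of_mean(n=100_000, sd=10, bias=3)
print(f"Unbiased, n = 50:      RMSE = {careful_small:.2f}")
print(f"Biased,   n = 100,000: RMSE = {careless_large:.2f}")
```

Under these assumptions, the small unbiased study is closer to the truth on average, and no increase in n can bring the error of the biased study below the size of its bias.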

On the other hand, there are studies with inadequate samples that failed to detect a medically significant improvement. Freiman[42] reexamined 71 negative trials and observed that 50 of these had more than a 10% chance of missing a 50% therapeutic improvement because of the small sample size, and Dimick[43] reported similar findings for surgical trials. Thus, a large sample may be required in certain situations.
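The chance of missing a real improvement is one minus the power, and it can be approximated for a given trial size. The sketch below uses the normal approximation for a two-sided comparison of two proportions (ignoring the negligible opposite tail); the event rates and group sizes are hypothetical and are not taken from the trials reexamined by Freiman.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(p_control, p_treated, n_per_group, alpha=0.05):
    """Approximate power of a two-sided test comparing two proportions."""
    nd = NormalDist()
    se = sqrt(p_control * (1 - p_control) / n_per_group
              + p_treated * (1 - p_treated) / n_per_group)
    return nd.cdf(abs(p_control - p_treated) / se - nd.inv_cdf(1 - alpha / 2))

# Hypothetical: a 50% improvement, from a 30% event rate to 15%.
power_small = approx_power(0.30, 0.15, n_per_group=40)
power_large = approx_power(0.30, 0.15, n_per_group=200)
print(f"n = 40 per group:  power = {power_small:.2f}, chance of missing = {1 - power_small:.2f}")
print(f"n = 200 per group: power = {power_large:.2f}, chance of missing = {1 - power_large:.2f}")
```

With 40 subjects per group, such a trial would miss this sizeable improvement more often than not, which is the kind of 'negative' result that reflects an inadequate sample rather than the absence of an effect.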

The case for small n

Concerns such as truthful research[44] and the effect of aleatory and epistemic uncertainties on the results[45] do not necessarily require a large sample. A big sample may be required in cases where the variability is high or the event under study is rare and a precise estimate is required. That does not ensure validity, as large studies tend to use less care in obtaining high-quality data. A large sample may not be needed for comparative studies that aim to detect a specified effect if they are adequately planned to control the effect of all known and unknown confounders, wherever they operate, on the pattern of a laboratory setup, except when a small effect is to be detected. Enrolling subjects could be expensive in many cases, and this expense can be avoided. The investigators should rather concentrate on the optimal design, accurate measurements, right analysis, and correct interpretation for the increased validity of the results, and not so much on the sample size. Validity is the key to truthful results. This approach may be more cost-effective in many situations. When a particular hypothesis is to be disproved or a potential effect is to be demonstrated, a small sample, even n = 1, may be enough.

The present emphasis on large-scale studies is misplaced in many cases, particularly in analytical studies, where design and accurate data are more important. The journals should avoid giving high weight to studies on a large sample, and the reviewers should rather focus on the design for the control of confounders and on good quality of data.

Financial support and sponsorship

Conflicts of interest

There are no conflicts of interest.