
Statistical significance level: The science behind the art in research publication - Part 2

By Felix Emeka Anyiam | Jan. 14, 2022 | Research skills, Statistics

In part one of this series, Felix Emeka Anyiam explained the basics of statistical significance. In part two, he talks about Tests of Significance (TOS), how they differ from statistical significance and when to apply a Fisher’s Exact Test (FET).

Before we talk about tests of significance, I would like to reiterate that accepting the null hypothesis doesn't invalidate your research, and some publishers will publish such findings. Never be anxious if you do not find a statistically significant difference. A good example is a new drug released to treat malaria, which is being used in many countries where malaria is endemic. Your country's government consults you to run a study on the drug, to see whether it can be used there as well. Based on prior history and recommendations, your null hypothesis is that the drug is safe. Wouldn't it be great news to know that the drug is truly safe after having done your research? In that case, you fail to reject the null hypothesis; in other words, you accept the null hypothesis that no toxic effect is present in the drug. So you see, accepting your null hypothesis is not always a bad thing.

To accept or reject a hypothesis you need statistically significant results, and you can get those using a variety of statistical procedures collectively known as a Test of Significance (TOS).

Test of Significance and how it differs from Statistical Significance

What is a test of significance? A test of significance (TOS), sometimes called a test statistic, quantifies differences in measured outcomes or variables. It is closely related to statistical significance. Simply put, a TOS contains information from your data analysis that will help you decide whether to reject or accept the null hypothesis. A good example of a TOS often used in student research is the Chi-Squared test. In Table 3, the TOS is the Chi-Squared (χ2), and the statistical significance is the p value. We used the TOS Chi-Squared to measure the difference in proportions (%) of positive COVID-19 cases among those with either good or poor knowledge of COVID-19 protective behaviours. It is good practice to include both in your table. A poor practice seen in research publications is that researchers include only their p values (the significance levels), which merely reflect a reference range, and ignore the TOS, that is, the true inferential value or procedure that supports the relationship between the two variables of statistical interest. The presentation of Tables 1 and 2 in our example in Part 1 is not entirely accurate in this regard, as we did not include our TOS values. Table 3 below is more appropriate.

[Table 3: Positive COVID-19 cases by knowledge of COVID-19 protective behaviours, reporting both the Chi-Squared TOS and the p value]

The TOS does not have a cut-off point that helps us decide whether to accept or reject our null hypothesis, but the p value does. With a p value ≤ 0.05, we can reject our null hypothesis. Other popular choices of significance level are 1% (0.01), 0.5% (0.005), and 0.1% (0.001). As a result, we cannot tell whether to accept or reject our null hypothesis simply by looking at our TOS (χ2 = 4.80), due to the absence of a cut-off point, but the p value guides this decision. Also, most data analysis software I have used, like SPSS, Epi-Info, R, STATA and SAS, will generate the TOS alongside the statistical significance. I will now show two examples, one from Epi-Info and the other from SPSS.

Figure 3: Generation of TOS and SS using Epi-Info statistical software

 

Figure 4: Generation of TOS and SS using SPSS statistical software

 
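For readers working in a scripting environment rather than SPSS or Epi-Info, the same pair of outputs (the TOS and the p value) can be generated in Python. The sketch below uses invented counts, not the data behind Table 3, purely to show how both values are reported together and how the 0.05 cut-off is applied.

```python
from scipy.stats import chi2_contingency

# Hypothetical 2x2 table (counts invented for illustration only).
# Rows: good vs poor knowledge of protective behaviours;
# columns: COVID-19 positive vs negative.
table = [[18, 82],
         [35, 65]]

chi2, p, dof, expected = chi2_contingency(table)
print(f"Chi-Squared (TOS) = {chi2:.2f}, degrees of freedom = {dof}")
print(f"p value (statistical significance) = {p:.4f}")

# Decision at the conventional 5% significance level
alpha = 0.05
if p <= alpha:
    print("Reject the null hypothesis: the proportions differ.")
else:
    print("Fail to reject the null hypothesis: no evidence of a difference.")
```

Notice that the software reports both the TOS (χ2) and the p value; as argued above, both belong in your table.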

When to apply a Fisher’s Exact Test (FET)

If any cell count in a test of association is less than 5 (< 5), as is the case in the example seen in Table 4 (where we are dealing with cell counts below 5), then Fisher's Exact Test (FET) is recommended over the Chi-Squared test, as it adjusts for smaller values or sample sizes. This is in contrast to Table 3, where no frequency (count) less than 5 is observed and a Chi-Squared test is applicable. FET is appropriate especially when dealing with small numbers. Because the Chi-Squared test is essentially an approximation of the exact test's results, erroneous results could be derived from very few observations (like 1, 2, 3 or 4), possibly leading to incorrect conclusions if a Chi-Squared test is applied. Therefore, it is incorrect practice for a researcher to state that they applied a FET while a Chi-Squared test value still appears alongside the FET, as shown in Table 5; a Chi-Squared test is not applicable when a FET is applied. The FET is one of a class of exact tests, named after its creator, Ronald Fisher, because the significance of a deviation from the null hypothesis (i.e., the p value) can be calculated exactly, rather than having to rely on an approximation, as is the case for a Chi-Squared test.

[Table 4: Example with cell counts below 5, where Fisher's Exact Test is the appropriate TOS]

[Table 5: Example of incorrect reporting, showing a Chi-Squared value alongside a Fisher's Exact Test]
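As an illustration, here is a minimal sketch using an invented small-count table (not the data behind Tables 4 and 5): the p value is computed exactly with Fisher's Exact Test, and no Chi-Squared value is reported alongside it.

```python
from scipy.stats import fisher_exact

# Hypothetical 2x2 table with several cell counts below 5 (invented data).
table = [[2, 9],
         [4, 3]]

odds_ratio, p = fisher_exact(table)
print(f"Fisher's Exact Test: odds ratio = {odds_ratio:.2f}, p value = {p:.4f}")

# Report the FET on its own; a Chi-Squared value should not be presented
# alongside it when the FET is the appropriate test.
```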

Different forms of TOS

Previously, we advocated the use of the Chi-Squared test when comparing differences in proportions between two categorical variables. Additionally, when comparing differences between two means from continuous data, the Student t-test is recommended, and for three or more means, ANOVA (Analysis of Variance), also referred to as the F-test or F-statistic, is recommended. Note that the Student t-test and the ANOVA test are only applicable with the parametric method of data analysis (that is, statistical tests or methods used when one's data come from a sample or population that is normally distributed) and are recommended when the researcher follows a probability sampling method in selecting the sample for the study. The term "probability sampling" refers to the fact that each member of the target population has a known likelihood of being selected for inclusion in the sample. Simple random sampling, systematic sampling, stratified sampling and cluster sampling are all examples of probability sampling procedures.
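For example, a one-way ANOVA (F-test) comparing three means can be run as in the sketch below, which uses simulated, normally distributed values rather than any real study data.

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(42)
# Three simulated, normally distributed groups (values invented for illustration).
group_a = rng.normal(loc=50, scale=5, size=30)
group_b = rng.normal(loc=52, scale=5, size=30)
group_c = rng.normal(loc=58, scale=5, size=30)

f_stat, p = f_oneway(group_a, group_b, group_c)
print(f"ANOVA: F (TOS) = {f_stat:.2f}, p value = {p:.4f}")
```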

In cases where researchers used a convenience sampling method (for instance, by surveying people who enter a supermarket, or by considering people who belong to a particular group, without relying on a rigorous scientific process), a non-parametric method of analysis is recommended, like the Mann–Whitney test for comparing two medians, or the Kruskal–Wallis test for comparing three or more medians. The reason is that once a probability sampling method is no longer followed, most of the potential respondents in the population will not have an equal chance of being selected for the study, and that automatically favours one group of people over another. This is when we say the data are skewed (pulled to the right or left), focused on one interest group, and thus not normally distributed. The normality of your data, which is the basis for choosing either a parametric or a non-parametric method of analysis, can be tested using the Shapiro–Wilk normality test in a statistical software package. The Shapiro–Wilk test determines whether or not a random sample is drawn from a normal distribution: a statistically significant p value means your data deviate significantly from a normal distribution.
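The sketch below, again on simulated (deliberately skewed) data, shows this workflow: a Shapiro–Wilk normality check, followed by the Mann–Whitney test for two groups and the Kruskal–Wallis test for three.

```python
import numpy as np
from scipy.stats import shapiro, mannwhitneyu, kruskal

rng = np.random.default_rng(0)
# Simulated right-skewed (non-normal) data for three groups (invented values).
group_a = rng.exponential(scale=2.0, size=40)
group_b = rng.exponential(scale=3.0, size=40)
group_c = rng.exponential(scale=3.5, size=40)

# Shapiro-Wilk: a p value <= 0.05 means the data deviate from a normal distribution.
w_stat, p_norm = shapiro(group_a)
print(f"Shapiro-Wilk: W = {w_stat:.3f}, p value = {p_norm:.4f}")

# Two groups: Mann-Whitney test.
u_stat, p_two = mannwhitneyu(group_a, group_b)
print(f"Mann-Whitney: U = {u_stat:.1f}, p value = {p_two:.4f}")

# Three or more groups: Kruskal-Wallis test.
h_stat, p_three = kruskal(group_a, group_b, group_c)
print(f"Kruskal-Wallis: H = {h_stat:.2f}, p value = {p_three:.4f}")
```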

In addition, for comparing means from two different groups, the Student t-test is recommended. There are two forms: the independent samples t-test and the paired samples t-test. An independent samples t-test compares the means of two different groups (e.g. the difference in mean age between males and females), while a paired samples t-test compares means from the same group at different times (say, the difference between the mean weight of a child when healthy and when sick).
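A short sketch with simulated data illustrates the two variants; the group names and values are invented for illustration only.

```python
import numpy as np
from scipy.stats import ttest_ind, ttest_rel

rng = np.random.default_rng(7)

# Independent samples: e.g. ages of two different groups.
ages_males = rng.normal(loc=34, scale=6, size=40)
ages_females = rng.normal(loc=31, scale=6, size=40)
t_ind, p_ind = ttest_ind(ages_males, ages_females)
print(f"Independent samples t-test: t = {t_ind:.2f}, p value = {p_ind:.4f}")

# Paired samples: e.g. the same children weighed when healthy and when sick.
weight_healthy = rng.normal(loc=20, scale=2, size=25)
weight_sick = weight_healthy - rng.normal(loc=0.8, scale=0.5, size=25)
t_pair, p_pair = ttest_rel(weight_healthy, weight_sick)
print(f"Paired samples t-test: t = {t_pair:.2f}, p value = {p_pair:.4f}")
```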

Below is a chart showing which TOS to use and for what purpose (Figure 5).

Figure 5: Forms of Tests of Significance

To conclude, the p value or significance level is considered statistically significant if it is ≤ 0.05, which has been conventionally accepted as the cut-off for deciding whether or not a difference exists. In theory, p ≤ 0.05 indicates strong evidence against the null hypothesis, as data this extreme would occur with less than a 5% probability if the null hypothesis (no difference) were true. However, note that most disciplines are now moving away from using p values towards confidence intervals. Secondly, the TOS should be stated even when the p value is the reference used to support or reject claims based on the set hypothesis. And finally, there is no need for a TOS, statistical significance or a hypothesis if there is no comparison of some sort between two or more variables, whether you are looking at differences in proportions, means or medians. Hypothesis testing is the bedrock of inferential statistics.

 

About the author: Felix Emeka Anyiam is a Research & Data Scientist at the Centre for Health and Development (CHD), University of Port Harcourt, Port Harcourt, Nigeria. He is an accomplished Researcher, Analytics and Data Science professional with a demonstrated ability to develop and implement data-driven solutions in the Urban and Health sectors; with over 10 years’ experience in teaching and leading Epidemiological, Biostatistical and Data Science projects, from descriptive to predictive analytics. He is also a trained Research Scientist, Scientific Writer and Epidemiologist in his day-to-day duties at the CHD, a Centre that evolved from several years of international research collaborations between the University of Port Harcourt and the Dalla Lana School of Public Health at the University of Toronto, Canada, and most recently, the University of Ottawa, Canada. The CHD aims to develop human and organisational capacity for health-related research and quality health-care provision in the Niger Delta region of Nigeria, built on sustainable local structure and international collaborations. Felix is one of the guest facilitators of AuthorAID online courses, a Biostatistician for the AuthorAID Online Journal Clubs pilot project, INASP, and he also advises on a potential curriculum for an online statistical/data analysis course.
