A) What is a hypothesis?
In statistics, a hypothesis is a premise or claim that we want to test or investigate. To do so, we may, for instance, conduct a survey and collect data. So, when we decide to investigate or test a claim, we need to draw a sample from which we can acquire information and draw conclusions.
Webster's New International Dictionary of the English Language defines the word "hypothesis" as «proposition, condition or principle which is assumed, perhaps without belief, in order to draw its logical consequences and by this method to test its accord with facts which are known or may be determined».
B) Functions of a hypothesis
A hypothesis can serve several functions:
a) A hypothesis can be formulated in order to describe some features of the whole population of a place.
b) It can establish a link between two or more variables and indicate how frequently a particular event occurs.
c) Sometimes, it is formulated in order to compare two different variables or two groups measured on the same variable.
d) Another type of hypothesis affirms that a particular feature can be a determining factor of another feature. In other words, if the modification of a given variable (independent variable) causes another variable to change (dependent variable), a hypothesis can be formulated to evaluate this connection and can be called “a causal hypothesis”.
e) When we measure the difference between statistical variables on the basis of relevant data, we are dealing with a statistical hypothesis.
f) A researcher may be unsure whether a phenomenon is real because of a lack of information. In this case, a hypothesis can be put forward provisionally in order to be checked and modified during the research process.
C) Main steps in hypothesis testing:
1. Determine the null hypothesis and the alternative hypothesis
Suppose that we make a claim about the average age of all unmarried pregnant women in a city, namely that this age is 24. We refer to this claim (not to the number itself) as the null hypothesis, represented by H0.
The opposite of the null hypothesis is called the alternative hypothesis, represented by H1 (some texts write Ha). In other words, the alternative hypothesis is a statement that contradicts the null hypothesis. In this case, the alternative hypothesis would be that the average age of unmarried pregnant women in the city is not 24.
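Written symbolically, the example above could be stated as follows. The symbol \mu is an assumption introduced here to denote the true average age in the population; it does not appear in the original text.

```latex
H_0 : \mu = 24
\qquad \text{versus} \qquad
H_1 : \mu \neq 24
```

Because H1 allows the true average to lie on either side of 24, this is a two-sided test.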
2. Gather information:
After defining the null hypothesis and the alternative hypothesis, it is important to identify the data sources that will provide you with relevant and reliable information. Data sources may include surveys, interviews, experiments, reports, publications, etc. Once you have selected your data sources, be sure to follow best practices for data collection, such as “defining your sample size, designing your data collection instruments, ensuring data quality and validity, and obtaining informed consent and permissions.” Afterwards, the collected data should be stored securely and in a consistent format.
3. Set the significance level:
The significance level, also known as α, corresponds to the “probability of rejecting the null hypothesis if it is true”[2]. Usually, the chosen significance level is 5%. This means that we accept a 5% risk of rejecting the null hypothesis when it is in fact true; it does not mean that a hypothesis is correct with 95% certainty. The significance level helps us judge whether the test results are convincing enough to trust or whether they could plausibly be due to chance alone. The range of possible values of the test statistic is divided into two parts: the acceptance region and the rejection region. If the sample statistic falls into the rejection region, such a result would be unlikely if H0 were true, so we reject H0 in favor of H1. Otherwise, we fail to reject H0.
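As a rough illustration of how α splits the range of a test statistic into two regions, here is a minimal sketch. It assumes a two-sided test whose statistic follows a standard normal distribution under H0; the function names are hypothetical and chosen only for this example.

```python
from statistics import NormalDist


def two_sided_critical_value(alpha: float) -> float:
    """Return z* such that P(|Z| > z*) = alpha for a standard normal Z."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def in_rejection_region(z: float, alpha: float = 0.05) -> bool:
    """True if the statistic z lies beyond the critical values -z* or +z*."""
    return abs(z) > two_sided_critical_value(alpha)


z_star = two_sided_critical_value(0.05)
print(round(z_star, 2))          # 1.96
print(in_rejection_region(2.5))  # True: beyond +1.96
print(in_rejection_region(1.0))  # False: between -1.96 and +1.96
```

With α = 5%, the rejection region consists of the two tails beyond roughly ±1.96, each holding 2.5% of the probability under H0.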
4. Calculation of the Test Statistic
Using the collected sample data, calculate a test statistic by comparing the sample statistic to the parameter value stated in the null hypothesis. The calculation of this test statistic assumes the null hypothesis is true and “incorporates a measure of standard error and assumptions (conditions) related to the sampling distribution.” *
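For a claim about a mean, one common choice of test statistic is the difference between the sample mean and the hypothesized value, divided by the standard error. The sketch below assumes that form; the sample of ages is invented purely for illustration, and the function name is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_statistic(sample: list[float], mu0: float) -> float:
    """(sample mean - hypothesized mean) / standard error of the mean."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)  # stdev uses the n - 1 denominator
    return (mean(sample) - mu0) / standard_error


# Hypothetical sample of ages; H0 claims the population mean is 24.
ages = [22, 25, 27, 21, 24, 26, 23, 28, 25, 29]
t = one_sample_statistic(ages, mu0=24)
print(round(t, 3))  # 1.225
```

Here the sample mean is 25, one year above the hypothesized 24, and that gap amounts to about 1.2 standard errors.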
5. Calculation of the p-value
The p-value is “the probability of getting the observed test statistic or something more extreme when H0 is true”*. It measures how strong the evidence against the null hypothesis is: the smaller the p-value, the stronger the evidence against H0. On its own, it does not confirm that a hypothesis is true.
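To make "the observed test statistic or something more extreme" concrete, the following sketch computes a two-sided p-value. It assumes the statistic is approximately standard normal under H0, which is reasonable for large samples; for small samples a t distribution would be the usual reference, but the Python standard library does not provide one. The function name is hypothetical.

```python
from statistics import NormalDist


def two_sided_p_value(z: float) -> float:
    """P(|Z| >= |z|) for a standard normal Z, i.e. the area in both tails."""
    upper_tail = 1 - NormalDist().cdf(abs(z))
    return 2 * upper_tail


print(round(two_sided_p_value(1.96), 3))  # 0.05
print(round(two_sided_p_value(0.0), 3))   # 1.0
```

A statistic of 0 means the sample agrees exactly with H0, so every possible outcome is "at least as extreme" and the p-value is 1.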
6. Making a Decision
“If P-value > α then fail to reject the null hypothesis. There is insufficient evidence to conclude [Ha in words].
If P-value < α then reject the null hypothesis. There is sufficient evidence to conclude [Ha in words].”*
Here Ha denotes the alternative hypothesis, written H1 in step 1.
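The decision rule quoted above can be written as a small function. This is only a sketch: the function name is hypothetical, and since the quoted rule does not say what to do when the p-value equals α exactly, the code makes the assumption of treating that case as "fail to reject".

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the rule: reject H0 only when the p-value is below alpha."""
    if p_value < alpha:
        return "reject H0"
    # Assumption: a p-value exactly equal to alpha is not enough to reject.
    return "fail to reject H0"


print(decide(0.03))  # reject H0
print(decide(0.22))  # fail to reject H0
```

Note that "fail to reject H0" is not the same as proving H0: it only says the data do not provide enough evidence against it.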
Summary of the steps:
- Determine the null hypothesis and the alternative hypothesis.
- Gather information.
- Choose the significance level (α).
- Calculate the test statistic.
- Calculate the p-value.
- Make the decision.