Descriptive statistics
Descriptive statistics is one of the two main branches of statistics. It provides a concise summary of data, either numerically or graphically. For example, the manager of a fast-food restaurant tracks the wait times for customers during the lunch hour for a week and then summarizes the data.
Numeric descriptive statistics
The manager calculates numeric descriptive statistics for the wait times, such as the mean, median, and standard deviation.
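As a sketch of these numeric summaries, here is how they might be computed in Python; the wait-time values are illustrative, not from the source:

```python
import statistics

# Hypothetical lunch-hour wait times in minutes (illustrative values)
wait_times = [3.2, 4.1, 2.8, 5.0, 3.7, 4.4, 3.9, 6.2, 2.5, 4.8]

mean_wait = statistics.mean(wait_times)      # average wait
median_wait = statistics.median(wait_times)  # middle value
stdev_wait = statistics.stdev(wait_times)    # sample standard deviation

print(f"mean = {mean_wait:.2f} min, median = {median_wait:.2f} min, "
      f"stdev = {stdev_wait:.2f} min")
```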
Graphical descriptive statistics
The manager examines graphs, such as a histogram of the wait times, to visualize the data.
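A histogram is a typical graphical summary. As a minimal sketch, assuming the same hypothetical wait times, a text histogram can be drawn in plain Python:

```python
# Hypothetical wait times in minutes (illustrative values)
wait_times = [3.2, 4.1, 2.8, 5.0, 3.7, 4.4, 3.9, 6.2, 2.5, 4.8]

# Count observations falling into 1-minute bins
bins = {}
for t in wait_times:
    lo = int(t)  # bin covers [lo, lo + 1)
    bins[lo] = bins.get(lo, 0) + 1

# Print a simple text histogram, one row per bin
for lo in sorted(bins):
    print(f"{lo}-{lo + 1} min | {'#' * bins[lo]}")
```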
Inferential statistics
Inferential statistics is one of the
two main branches of statistics. It uses a random sample of data taken from a
population to describe and make inferences about the population. Inferential
statistics are valuable when examination of each member of an entire population
is not convenient or possible. For example, measuring the diameter of each nail manufactured in a mill is impractical. Instead, we can measure the diameters of a representative random sample of nails and use the information from the sample to make generalizations about the diameters of all of the nails.
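As an illustrative sketch of this idea, we can simulate a hypothetical population of nail diameters and estimate its mean from a random sample; all values here are assumed, not from the source:

```python
import random
import statistics

random.seed(42)

# Hypothetical population: diameters (mm) of 100,000 manufactured nails
population = [random.gauss(4.0, 0.05) for _ in range(100_000)]

# Measuring every nail is impractical, so measure a random sample instead
sample = random.sample(population, 50)

pop_mean = statistics.fmean(population)
sample_mean = statistics.fmean(sample)

# The sample mean estimates the population mean without measuring every nail
print(f"population mean = {pop_mean:.4f} mm")
print(f"sample mean     = {sample_mean:.4f} mm")
```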
Difference between descriptive and inferential statistics:
1. Descriptive statistics uses the data to provide descriptions of the population, through numerical calculations, graphs, or tables. Inferential statistics makes inferences and predictions about a population based on a sample of data taken from the population in question.
2. Descriptive statistics consists of
the collection, organization, summarization, and presentation of data.
Inferential statistics consists of
generalizing from samples to populations, performing estimations and hypothesis tests, determining relationships among
variables, and making predictions.
Parametric Tests
Parametric tests assume that the population values are normally distributed.
Reasons to Use Parametric Tests
Reason 1: Parametric tests can perform well with skewed and nonnormal distributions
This may be a surprise, but parametric tests can perform well with continuous data that are nonnormal if we satisfy the sample size guidelines in the table below.
| Parametric analyses | Sample size guidelines for nonnormal data |
| --- | --- |
| 1-sample t test | Greater than 20 |
| 2-sample t test | Each group should be greater than 15 |
| One-Way ANOVA | For 2-9 groups, each group should be greater than 15; for 10-12 groups, each group should be greater than 20 |
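As a sketch of this guideline, assuming scipy is available, we can run a 1-sample t test on simulated skewed (exponential) data with a sample size above the 20-observation threshold; the data and hypothesized mean are illustrative:

```python
import random
from scipy import stats

random.seed(1)

# Simulated skewed data: exponential with true mean 1.0, n = 40 (above the
# "greater than 20" guideline for the 1-sample t test)
data = [random.expovariate(1.0) for _ in range(40)]

# 1-sample t test of H0: population mean equals 1.0
t_stat, p_value = stats.ttest_1samp(data, popmean=1.0)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```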
Reason 2: Parametric tests can perform well when the spread of each group is different
While nonparametric tests don't assume that our data follow a normal distribution, they do have other assumptions that can be hard to meet. For nonparametric tests that compare groups, a common assumption is that the data for all groups must have the same spread (dispersion). If our groups have a different spread, the nonparametric tests might not provide valid results. On the other hand, if we use the 2-sample t test or One-Way ANOVA, we can simply go to the Options subdialog and uncheck Assume equal variances.
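In scipy, the equivalent of unchecking Assume equal variances is Welch's t test, requested with `equal_var=False`. A minimal sketch with simulated groups of unequal spread (all values are illustrative):

```python
import random
from scipy import stats

random.seed(7)

# Two simulated groups with clearly different spreads (illustrative values)
group_a = [random.gauss(10.0, 1.0) for _ in range(30)]
group_b = [random.gauss(10.5, 4.0) for _ in range(30)]

# equal_var=False requests Welch's t test, which drops the
# equal-variance assumption
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)
print(f"Welch t = {t_stat:.3f}, p = {p_value:.3f}")
```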
Reason 3: Statistical power
Parametric tests usually have more statistical power than nonparametric tests. Thus, we are more likely to detect a significant effect when one truly exists.
Nonparametric Tests
Nonparametric tests are also called distribution-free tests because they don't assume that our data follow a specific distribution. We should use nonparametric tests when our data don't meet the assumptions of the parametric test, especially the assumption about normally distributed data. It's safe to say that most people who use statistics are more familiar with parametric analyses than with nonparametric analyses. In short, we use:
- Parametric analysis to test group means.
- Nonparametric analysis to test group medians.
Hypothesis Tests of the Mean and Median
Nonparametric tests are like a parallel universe to parametric tests, as the table below shows.
| Parametric tests (means) | Nonparametric tests (medians) |
| --- | --- |
| 1-sample t test | 1-sample Sign, 1-sample Wilcoxon |
| 2-sample t test | Mann-Whitney test |
| One-Way ANOVA | Kruskal-Wallis, Mood's median test |
| Factorial DOE with one factor and one blocking variable | Friedman test |
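To illustrate this parallel, assuming scipy is available, we can run a 2-sample t test and its Mann-Whitney counterpart on the same simulated data (all values are illustrative):

```python
import random
from scipy import stats

random.seed(3)

# Two simulated groups whose centers clearly differ (illustrative values)
group_a = [random.gauss(5.0, 1.0) for _ in range(25)]
group_b = [random.gauss(6.5, 1.0) for _ in range(25)]

# Parametric test of the means
t_stat, t_p = stats.ttest_ind(group_a, group_b)

# Nonparametric counterpart (compares the groups' distributions/medians)
u_stat, u_p = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")

print(f"2-sample t test:   p = {t_p:.4g}")
print(f"Mann-Whitney test: p = {u_p:.4g}")
```

With a shift this large, both tests should detect the difference; the point is that they form a parametric/nonparametric pair for the same question.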
Reasons to Use Nonparametric Tests
Reason 1: Our area of study is better represented by the median
This is my favorite reason to use a nonparametric test, and the one that isn't mentioned often enough! The fact that we can perform a parametric test with nonnormal data doesn't imply that the mean is the best measure of central tendency for our data.
For example, the center of a skewed distribution, such as income, can be better measured by the median, where 50% of values are above the median and 50% are below. If we add a few billionaires to a sample, the mean increases greatly even though the income for the typical person doesn't change.
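A quick sketch of this effect with hypothetical income values (all numbers are illustrative):

```python
import statistics

# Hypothetical annual incomes, in thousands of dollars
incomes = [35, 42, 48, 51, 55, 60, 62, 70, 75, 80]

mean_before = statistics.mean(incomes)
median_before = statistics.median(incomes)

# Add one billionaire (1,000,000 thousand = $1B)
incomes_with_billionaire = incomes + [1_000_000]

mean_after = statistics.mean(incomes_with_billionaire)
median_after = statistics.median(incomes_with_billionaire)

# The mean explodes; the median barely moves
print(f"mean:   {mean_before:.1f} -> {mean_after:.1f}")
print(f"median: {median_before} -> {median_after}")
```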
When our distribution is skewed enough, the mean is strongly affected by changes far out in the distribution's tail, whereas the median continues to reflect the center of the distribution more closely. In such cases, a random sample of 100 from each of two skewed distributions can produce means that are significantly different but medians that are not significantly different.
Reason 2: We have a very small sample size
If we don’t meet the sample size guidelines for the parametric tests and we are not confident that we have normally distributed data, we should use a nonparametric test. When we have a really small sample, we might not even be able to ascertain the distribution of our data because the distribution tests will lack sufficient power to provide meaningful results.
In this scenario, we are in a tough spot with no valid alternative. Nonparametric tests have less power to begin with and it’s a double whammy when we add a small sample size on top of that!
Reason 3: We have ordinal data, ranked data, or outliers that we can’t remove
Typical parametric tests can only assess continuous data, and their results can be significantly affected by outliers. Conversely, some nonparametric tests can handle ordinal data and ranked data without being seriously affected by outliers. Be sure to check the assumptions for the nonparametric test, because each one has its own data requirements.