Basic Statistics

08/06/04

1. Populations and Parameters

          A population is any large collection of objects or individuals, such as Koreans, Japanese, flowers, or students, about which information is desired. A parameter is any summary number, like an average or a percentage, that describes the entire population. In the majority of cases, however, it is impossible to know a population parameter exactly; all we can do is estimate it. For example, what is the average height of Koreans? What is the average weight of Americans? Can you measure the height of every Korean or the weight of every American?

2. Samples and Statistics

          A sample is a representative group drawn from the population. A statistic is any summary number, like an average or a percentage, that describes the sample. For example, a sample proportion (commonly written $\hat{p}$) is a statistic.
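To make the distinction between a parameter and a statistic concrete, here is a minimal Python sketch. It assumes a made-up population of simulated heights (not real data); the names `population`, `sample`, and the sizes 10,000 and 50 are illustrative choices only. In practice the whole population is not available, which is exactly why the statistic is used as an estimate.

```python
import random
import statistics

# Hypothetical population: 10,000 simulated heights in cm (not real data).
rng = random.Random(0)
population = [rng.gauss(170, 7) for _ in range(10_000)]

# Parameter: a summary number that describes the entire population.
parameter_mean = statistics.mean(population)

# Statistic: the same summary computed from a random sample of 50 individuals.
sample = rng.sample(population, 50)
statistic_mean = statistics.mean(sample)

print(f"population mean (parameter): {parameter_mean:.2f}")
print(f"sample mean (statistic):     {statistic_mean:.2f}")
```

The two numbers will usually be close but not identical; the gap between them is the estimation error.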

3. Mathematical Notation

          It is useful to introduce some mathematical notation for numerical summaries. For convenience, variables are commonly given one-letter names: x, y, z, ... Since the number of letters is limited, we use subscripts to identify the individual values in a set of data: x1, x2, x3, ..., xn. Some numerical summaries involve summation, which is represented by the symbol Σ (capital sigma).

4. Central Tendency

          Central tendency describes the center of a data set, or the location of a 'typical value' in it. Several measures of central tendency are in common use; we will deal with the average (arithmetic mean), the median, and the mode.

1. Mean

          The most commonly used measure of centrality is the mean. If you have a data set x1, x2, x3, ..., xn, you can compute its mean using the following equation.

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

For example, the mean of 45, 50, 55, 60, and 65 is (45 + 50 + 55 + 60 + 65)/5 = 55. A physical interpretation can be given to the mean, which helps to explain its properties. Imagine a horizontal dot plot in which every dot is a point mass of the same weight and the axis itself is a beam of negligible mass. The mean is the position on the axis at which the beam balances.
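The mean and its balance-point interpretation can be checked with a few lines of Python. This is only a sketch using the example data from the text; the variable names are illustrative.

```python
data = [45, 50, 55, 60, 65]

# Arithmetic mean: the sum of the values divided by how many there are.
mean = sum(data) / len(data)
print(mean)  # 55.0

# Balance-point property: the deviations from the mean sum to zero,
# so the "weights" on either side of the mean cancel out.
deviations = [x - mean for x in data]
print(sum(deviations))  # 0.0
```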

2. Median

          The median is defined as the score corresponding to the 50th percentile. It is the middle score in the distribution when the scores are put in order of size. If there is an even number of scores, the median is computed by averaging the two middle scores. When the distribution of scores is symmetric, the mean and the median are equal. So the data set we already used to compute the mean (45, 50, 55, 60, and 65) gives the same value, 55, for both the mean and the median. However, for the data set 3, 4, 4, 4, 5, 6, 7, and 39, the mean is 9.0 while the median is 4.5. The median is insensitive to extreme values: it stays at 4.5 even if the extreme value 39 is changed to 390000000000000000.
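The following Python sketch reproduces the numbers in this example with the standard `statistics` module, and shows that replacing the extreme value changes the mean enormously but leaves the median untouched. The variable names are illustrative.

```python
import statistics

scores = [3, 4, 4, 4, 5, 6, 7, 39]

mean_value = sum(scores) / len(scores)
median_value = statistics.median(scores)  # average of the two middle scores, 4 and 5
print(mean_value)    # 9.0
print(median_value)  # 4.5

# Replace the extreme value 39 with a far larger one.
extreme_scores = [3, 4, 4, 4, 5, 6, 7, 390000000000000000]
median_extreme = statistics.median(extreme_scores)
print(median_extreme)  # 4.5
```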

3. Mode

        The mode is the most frequently obtained score in the data set. The mode is at best a rough measure and is generally less useful than the mean or median.
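As a small illustration, the mode can be found by counting how often each score occurs. The sketch below uses `collections.Counter` from the Python standard library on the data set from the median example; if several scores tie for the highest count, this approach reports only one of them.

```python
from collections import Counter

scores = [3, 4, 4, 4, 5, 6, 7, 39]

# Count the occurrences of each score and take the most frequent one.
counts = Counter(scores)
mode, frequency = counts.most_common(1)[0]
print(mode, frequency)  # 4 3
```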

5. Variability

          In addition to general location, a distribution of scores has another important attribute, called variability. Variability refers to how spread out or scattered the scores in a distribution are. The minimum possible variability is zero, which occurs only if all of the scores are exactly the same. We will talk about a few measures of variability: range, variance, and standard deviation.

1. Range

         The range, R, of a data set is the difference between the largest and the smallest values in the set; that is: R = Xmax-Xmin. The range is very easy to compute.
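A one-line computation is enough for the range. The sketch below applies it to the two example data sets used earlier in these notes; the variable names are illustrative.

```python
data = [45, 50, 55, 60, 65]
scores = [3, 4, 4, 4, 5, 6, 7, 39]

# R = Xmax - Xmin
range_data = max(data) - min(data)
range_scores = max(scores) - min(scores)
print(range_data)    # 20
print(range_scores)  # 36
```

Note that the range of the second data set is determined entirely by the single extreme value 39.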

2. Variance

          Another measure of the spread of values in a data set is the variance, which is based on the squared deviations of the values from their mean. The variance of a data set x1, x2, x3, ..., xn with mean $\bar{x}$ can be computed as

$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$$

This is a basic measure of the variability of any set of data. However, when the data of a sample are used to estimate the variance of the population from which the sample was drawn, the population variance estimate ($s^2$) is computed as

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$

There is a reason why we use n-1 rather than n. When you divide by n, the sample variance tends to underestimate the population variance, because the deviations are measured from the sample mean rather than from the true population mean. Statisticians have proven that dividing by n-1 removes this bias on average. Although the variance does reflect the spread of values, it is not an easily interpreted quantity, and should not be used as a descriptive numerical summary of the data. The problem is that the variance is defined in terms of squared deviations, so its units are the square of the units of the sample values. For example, if your data are on a kg scale, the variance you compute will be in square kg.
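The difference between dividing by n and dividing by n-1 can be seen in a short Python sketch. It uses the example data 45, 50, 55, 60, 65 (treated here, as an assumption, as weights in kg) and cross-checks the hand computation against `statistics.pvariance` (divides by n) and `statistics.variance` (divides by n-1) from the standard library.

```python
import statistics

data = [45, 50, 55, 60, 65]  # assumed to be in kg for this illustration
n = len(data)
mean = sum(data) / n

# Sum of squared deviations from the mean: 100 + 25 + 0 + 25 + 100 = 250
sum_sq = sum((x - mean) ** 2 for x in data)

variance_n = sum_sq / n               # divide by n
variance_n_minus_1 = sum_sq / (n - 1) # divide by n - 1 (population estimate)

print(variance_n)          # 50.0  (kg squared)
print(variance_n_minus_1)  # 62.5  (kg squared)

# Cross-check with the standard library.
assert variance_n == statistics.pvariance(data)
assert variance_n_minus_1 == statistics.variance(data)
```

The n-1 version is always the larger of the two, which compensates for the tendency of the n version to come out too small.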

3. Standard deviation

          Since the unit of the variance is not convenient, we simply take the square root of the variance, which puts the measure in the same unit as the raw data. This variability measure is called the standard deviation. Therefore, the standard deviation (SD) of the data can be computed as

$$s = \sqrt{s^2} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}$$
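As a quick check, the standard deviation of the example data can be computed as the square root of the n-1 variance. This sketch again treats the values as weights in kg, which is an assumption made only for illustration.

```python
import math
import statistics

data = [45, 50, 55, 60, 65]  # assumed to be in kg for this illustration

# Square root of the n - 1 variance (62.5 kg squared) gives the SD in kg.
sd = math.sqrt(statistics.variance(data))
print(round(sd, 3))  # 7.906

# statistics.stdev performs the same computation directly.
assert math.isclose(sd, statistics.stdev(data))
```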

6. Confidence Intervals

1. General form of confidence interval

          Although we want to know the actual population mean μ, all we have is an estimate of it, the sample mean. With a confidence interval, we give a range within which we can be confident that the actual population mean falls, such as L < μ < U. This range of values is called a confidence interval. The general form of most confidence intervals is Sample estimate ± Margin of error. Therefore, the lower limit L is the estimate - margin of error, and the upper limit U is the estimate + margin of error. We are confident that the value of the population parameter is somewhere between L and U.

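2. (1-α)100% t-interval for population mean

          Formula in words: Sample mean ± (t-multiplier × standard error)

          Formula in notation:

$$\bar{x} \pm t_{\alpha/2,\,n-1} \cdot \frac{s}{\sqrt{n}}$$

          The t-multiplier depends on the α level and the degrees of freedom, which is the sample size minus 1.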
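A worked example may help here. The sketch below computes a 95% t-interval for the example data 45, 50, 55, 60, 65, taking the standard error to be the sample standard deviation divided by the square root of the sample size. The t-multiplier 2.776 (95% confidence, 4 degrees of freedom) is hard-coded from a standard t-table rather than computed, and the constant name is an illustrative choice.

```python
import math
import statistics

data = [45, 50, 55, 60, 65]
n = len(data)

mean = statistics.mean(data)
sd = statistics.stdev(data)            # uses n - 1 in the denominator
standard_error = sd / math.sqrt(n)

# Assumed value: t-multiplier for 95% confidence with n - 1 = 4 degrees
# of freedom, read from a standard t-table and rounded to 3 decimals.
T_MULTIPLIER_95_DF4 = 2.776

margin_of_error = T_MULTIPLIER_95_DF4 * standard_error
lower = mean - margin_of_error   # L = estimate - margin of error
upper = mean + margin_of_error   # U = estimate + margin of error

print(f"standard error: {standard_error:.3f}")
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

For a different sample size or confidence level, the multiplier has to be looked up again, since it changes with both α and the degrees of freedom.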

3. Determining t-multiplier

Typical t-multipliers in science

Confidence Coefficient    Confidence Level
(1-α)                     (1-α) x 100%        (1-α/2)
0.90                      90%                 0.950
0.95                      95%                 0.975
0.99                      99%                 0.995

7. Hypothesis Testing

1. General idea

2. Making the decision

3. Important point

4. Errors in hypothesis testing

5. Possible hypotheses about mean

This site was last updated 10/18/03