Statistics, estimators and estimation

1. Introduction
2. Types of statistical estimation
3. Calculating the sample size

1. Introduction

The notion of a random sample plays a prominent role in statistics.

A random sample of size n is:

A collection of n random variables.

All with the same distribution.

All mutually independent.

This definition idealizes the act of observing the same random variable n times, with each repetition independent of the others.

The collection from which we extract the random sample is called the POPULATION. Our purpose in taking a sample is to make an inference. This term is used in statistics to denote the process by which we make claims about values of the population based on the numbers we see in the sample.

Perhaps an example will clarify the ideas. Suppose we observe the manufacturing process of the little balls that are placed in roll-on deodorant containers. Not all balls will have the same diameter; if we choose a ball at random, we obtain a value for the diameter, which is a random variable. We can assume that the diameters are normally distributed and, because of our experience with the process, we know that the population standard deviation is approximately 4 mm. But, also from experience, we know that the average diameter can drift as the production machinery goes out of adjustment. So we have:

A population: all the balls being produced.

A population parameter whose value is known (or nearly so): the standard deviation.

Another parameter whose value is unknown: the mean.

To learn the value of the unknown parameter, we take a sample of the balls. Suppose there are 100 balls in the sample. With a precision instrument, and very carefully, we measure the diameters of the 100 balls in the sample and calculate their average.

What does the value of the sample mean tell us about the mean of the population?

On the one hand, the sample mean will almost certainly NOT be exactly equal to the population mean.

On the other hand, once we draw the sample, we have no better information about the population mean than the sample itself. Any other information is nothing more than gossip.

Finally, it would be very strange if, the population of balls having, say, an average diameter of 45 mm, we obtained 100 balls in the sample with an average of, say, 32 mm. Note that we do not say impossible, merely odd or strange.

So, if someone asked us how large the average diameter of the population of balls is, we would answer with the value we saw in the sample.

We should add to our answer some warning such as "more or less" or "approximately".

A quantity calculated from the data of a sample is called a STATISTIC. When we use a statistic to play the role of stating something about the value of a population parameter, we call it an ESTIMATOR. When we want to be a little pedantic, we call it a point estimate (by "point" we mean that we estimate the parameter using a single value).

Returning to the roll-on balls: if the sample of 100 balls yields an average value of 43.5 mm, we would say that we estimate the population average at 43.5 mm.

Construct your own example, like that of the balls. In your example, describe:

A population.

A parameter for the population.

A sample.

A statistic that serves as an estimator.

Probabilistic characteristics of an estimator

When the formula for an estimator is applied to a random sample, the result is uncertain; that is, estimators are random variables.

For example, suppose we receive a shipment of objects, each of which may:

Be ready for use, or

Be defective.

We select some of them at random to get an idea of the proportion of defectives in the shipment. The parameter of interest is the proportion of defectives in the entire population, but what we observe is the proportion of defectives in the sample. The value of the proportion in the sample is a random variable whose distribution is directly related to the binomial (if we counted the number of defectives instead, it would be binomial).
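This can be sketched with a small simulation (Python; the 10% defective rate and the sample size of 50 are hypothetical numbers chosen for illustration):

```python
import random

random.seed(1)

TRUE_P = 0.10   # hypothetical population proportion of defectives
N = 50          # sample size

def sample_proportion():
    """Draw a sample of N objects and return the fraction that are defective."""
    defects = sum(1 for _ in range(N) if random.random() < TRUE_P)
    return defects / N

# The estimator is a random variable: each sample gives a different value.
values = [sample_proportion() for _ in range(5)]
print(values)  # five proportions, scattered around 0.10
```

Each run of `sample_proportion()` plays the role of drawing one random sample; the variation among the printed values is exactly the randomness of the estimator.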

Like any random variable, an estimator has:

A probability distribution.

An expected value.

A standard deviation (and a variance).

Expected value of an estimator and bias

The expected value of an estimator indicates a value around which the estimator's value is most likely to fall. For example, if we knew that the expected value of a statistic is 4, this would mean that, on taking a sample:

We do not believe the value of the statistic will be exactly 4,

But we do believe that the value of the statistic will be close to 4.

Since it is very likely that the value of the estimator is close to its expected value, a very desirable property is that the expected value of the estimator coincide with the parameter to be estimated. At the least, we want the expected value not to differ much from the parameter being estimated.

For that reason, an important quantity is what is technically called the bias. The bias is the difference between the expected value of the estimator and the parameter it estimates.

If the bias is 0, we say that the estimator is unbiased, and this is a good characteristic for an estimator. An unbiased estimator has a high probability of taking a value close to the value of the parameter.
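A small simulation makes the notion of bias concrete (Python; the population values are hypothetical). Dividing the sum of squared deviations by n gives a biased estimator of the variance, while dividing by n − 1 removes the bias:

```python
import random

random.seed(7)

SIGMA2 = 4.0   # hypothetical population variance (std. deviation 2)
n = 5          # deliberately small sample, to make the bias visible

def variances(sample):
    """Return the variance with divisor n (biased) and n - 1 (unbiased)."""
    m = sum(sample) / len(sample)
    ss = sum((x - m) ** 2 for x in sample)
    return ss / len(sample), ss / (len(sample) - 1)

biased_avg = unbiased_avg = 0.0
TRIALS = 20_000
for _ in range(TRIALS):
    sample = [random.gauss(10.0, 2.0) for _ in range(n)]
    b, u = variances(sample)
    biased_avg += b / TRIALS
    unbiased_avg += u / TRIALS

# Expected values: (n-1)/n * sigma^2 = 3.2 for the biased one, 4.0 for the other.
print(round(biased_avg, 2), round(unbiased_avg, 2))
```

The long-run average of the divisor-n estimator settles near 3.2 rather than 4.0; that systematic gap of 0.8 is its bias.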

Variance of an estimator

Another important property of an estimator is its variance (or its square root, the standard deviation).

The importance of the standard deviation is that it lets us quantify how close the numerical value of the estimator is likely to be to its expected value.

The lower the standard deviation (or variance) of an estimator, the more likely its value in a specific sample is to be close to its expected value. To clarify this, consider two estimators T1 and T2; assume both are unbiased and that the variance of T1 is less than that of T2. This simply means that, for a fixed parameter value, values of T1 near the parameter are more probable than values of T2. We will therefore tend to find T1 closer to the parameter value than T2. This makes our preference go to T1.

When an estimator has a smaller variance than another, we say that it is the more efficient estimator.

On the board we will see that some common estimators are unbiased:

The sample proportion as an estimator of the population proportion.

The sample mean as an estimator of the population expected value.

The sample variance as an estimator of the population variance.

The probability distribution of a statistic

Perhaps the most important result about a statistic is the Central Limit Theorem. This theorem indicates that, for the sample mean:

The expected value is the mean of the population.

The variance is equal to the population variance divided by the number of elements in the sample.

The probability distribution is (approximately) normal.

This theorem is very important because it allows us to compute probabilities about where the sample mean will fall. It is just a matter of using the normal table, being careful to standardize with the appropriate standard deviation: that of the population divided by the square root of the number of elements in the sample.

In class we will work through examples of these calculations in detail.
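The three statements above can be checked by simulation (Python; the uniform population and the sizes are illustrative choices). A uniform population on [0, 1) has mean 0.5 and variance 1/12, so means of samples of size 30 should average 0.5 and have variance (1/12)/30 ≈ 0.00278:

```python
import random
import statistics

random.seed(3)

# Hypothetical population: uniform on [0, 1), so mu = 0.5 and sigma^2 = 1/12.
n = 30            # elements per sample
SAMPLES = 20_000  # number of samples drawn

means = [statistics.fmean(random.random() for _ in range(n))
         for _ in range(SAMPLES)]

print(round(statistics.fmean(means), 3))     # close to 0.5, the population mean
print(round(statistics.variance(means), 5))  # close to (1/12)/30 ≈ 0.00278
```

Note that the population here is not normal at all, yet a histogram of `means` would already look bell-shaped: that is the content of the theorem.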

Error in the estimate of a direct measurement

The estimate of the error of a measurement always has a subjective component. Indeed, no one knows better than an experienced observer how much confidence the measurement just taken deserves. There is no set of well-founded and unalterable rules for determining the error of a measurement in every conceivable case. Often it is as important to record how the error was obtained as it is to record its value.

However, the application of some statistical methods allows us to objectify, to a great extent, the estimate of random errors. Statistics allows us to obtain parameters for a population (in this case, the set of all measurements that could be taken of a magnitude) from a sample (the limited number of measurements we can actually take).

Best value of a set of measurements

Suppose we measure a magnitude n times. Due to the existence of random errors, the n measurements will in general all be different.

The most reasonable method of determining the best value from these measurements is to take the mean value. Indeed, if the errors are due to chance, readings that fall short are as likely as readings that overshoot, so on taking the average they cancel out, at least partially. The mean value is defined by:

x̄ = (x1 + x2 + … + xn) / n

and this is the value that will be reported as the result of the measurements.

2. Types of statistical estimation

Estimation of parameters:

A major problem of statistical inference is the estimation of population parameters, or briefly parameters (such as the population mean and variance), from the corresponding sample statistics, or simply statistics (such as the sample mean and variance).

Unbiased estimates:

If the mean of the sampling distribution of a statistic equals the corresponding population parameter, the statistic is called an unbiased estimator of the parameter; otherwise, it is called a biased estimator. The corresponding values of the statistic are called an unbiased estimate and a biased estimate, respectively.

Example 1: The mean of the sampling distribution of means is μ, the population mean. Therefore, the sample mean is an unbiased estimate of the population mean.

Example 2: The mean of the sampling distribution of the variance computed with divisor N is

E(ŝ²) = ((N − 1)/N) σ²

where σ² is the population variance. We find, then, that ŝ² is a biased estimate of σ². However, the modified sample variance

s² = (N/(N − 1)) ŝ²

is an unbiased estimate of σ². In the language of expectations, a statistician would say that ŝ² is biased because E(ŝ²) ≠ σ².

Efficient Estimation:

If the sampling distributions of two statistics have the same mean (or expected value), the statistic with the smaller variance is called an efficient estimator of the mean, while the other is called an inefficient estimator.

If we consider all possible statistics whose sampling distributions have the same mean, the one with minimum variance is sometimes called the maximally efficient, or best, estimator.

Example:

The sampling distributions of the mean and the median both have the same mean, namely the population mean. However, the variance of the sampling distribution of means is smaller than the variance of the sampling distribution of medians. Therefore the sample mean gives an efficient estimate of the population mean, while the sample median gives an inefficient estimate of it.

Of all the statistics that estimate the population mean, the sample mean provides the best (most efficient) estimate.

In practice, however, inefficient estimates are often used because of the relative ease with which some of them can be obtained.
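The mean-versus-median comparison can be checked numerically (Python; the population parameters are illustrative). For samples from a normal population, both estimators center on the population mean, but the median is noticeably more variable:

```python
import random
import statistics

random.seed(5)

n = 25            # sample size
TRIALS = 10_000   # number of samples
SIGMA = 10.0      # hypothetical population standard deviation

means, medians = [], []
for _ in range(TRIALS):
    sample = [random.gauss(50.0, SIGMA) for _ in range(n)]
    means.append(statistics.fmean(sample))
    medians.append(statistics.median(sample))

v_mean = statistics.variance(means)      # close to sigma^2 / n = 4.0
v_median = statistics.variance(medians)  # larger, near (pi/2) * sigma^2 / n

print(round(v_mean, 2), round(v_median, 2))
```

For large normal samples, the variance of the median is about π/2 ≈ 1.57 times that of the mean, which is why the mean is preferred as an estimator here.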

Point estimates and interval estimates; their reliability:

An estimate of a population parameter given by a single number is called a point estimate of the parameter. An estimate of a population parameter given by two numbers, between which the parameter may be considered to lie, is called an interval estimate of the parameter.

Interval estimates indicate the precision of an estimate and are therefore preferable to point estimates.

Example:

If we say that a distance was measured as 5.28 meters (m), we are giving a point estimate. If, on the other hand, we say that the distance is 5.28 ± 0.03 m (that is, that it lies between 5.25 and 5.31 m), we are giving an interval estimate.

The margin of error, or the precision of the estimate, informs us of its reliability.

Confidence-interval estimates for population parameters:

Let μ_S and σ_S be the mean and standard deviation (standard error) of the sampling distribution of a statistic S. Then, if the sampling distribution of S is approximately normal (which, as we have seen, is true for many statistics when the sample size is N ≥ 30), we can expect the actual sample statistic S to lie in the intervals μ_S ± σ_S, μ_S ± 2σ_S or μ_S ± 3σ_S about 68.27%, 95.45% and 99.73% of the time, respectively.

Table 1 shows the values of z_c corresponding to the confidence levels used in practice. For confidence levels not in the table, the values of z_c can be found from tables of areas under the normal curve.

Confidence level  99.73%  99%   98%   96%   95.45%  95%   90%    80%   68.27%  50%
z_c               3.00    2.58  2.33  2.05  2.00    1.96  1.645  1.28  1.00    0.6745
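For confidence levels not in the table, the z_c value can also be obtained from Python's standard library instead of a printed normal table (a sketch; `statistics.NormalDist` is available from Python 3.8):

```python
from statistics import NormalDist

def z_c(confidence):
    """Two-sided critical value: area `confidence` lies between -z and +z."""
    return NormalDist().inv_cdf((1 + confidence) / 2)

print(round(z_c(0.95), 2))    # 1.96
print(round(z_c(0.98), 2))    # 2.33
print(round(z_c(0.9973), 2))  # 3.0
```

The transformation (1 + confidence)/2 converts the central area into the cumulative area needed by the inverse CDF.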

Confidence intervals for the mean:

If the statistic S is the sample mean x̄, then the 95% and 99% confidence limits for estimating the population mean μ are x̄ ± 1.96σ_x̄ and x̄ ± 2.58σ_x̄, respectively. More generally, the confidence limits for the mean are given by x̄ ± z_c σ_x̄, with z_c taken from Table 1 according to the desired confidence level.

If the sampling is from an infinite population, or with replacement from a finite one, the limits are given by:

x̄ ± z_c σ/√N

If the sampling is without replacement from a population of finite size N_p, they are:

x̄ ± z_c (σ/√N) √((N_p − N)/(N_p − 1))

Example

Find the 98% and 90% confidence limits for the mean diameter of a population of ball bearings, given a sample of N = 200 bearings with mean diameter 0.824 cm and standard deviation 0.042 cm.

Solution:

Let z = z_c be such that the area under the normal curve to its right is 1%. By symmetry, the area to the left of −z_c is also 1%. Since the total area under the curve is 1, the area between 0 and z_c is 0.49, and therefore z_c = 2.33. The 98% confidence limits are then x̄ ± 2.33 σ/√N = 0.824 ± 2.33 (0.042/√200) = 0.824 ± 0.0069 cm.

For the 90% limits, z_c = 1.645, so the limits are 0.824 ± 1.645 (0.042/√200) = 0.824 ± 0.0049 cm.
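The 98% computation can be reproduced with Python's standard library (a sketch; `statistics.NormalDist` supplies the inverse normal that replaces the printed table):

```python
import math
from statistics import NormalDist

x_bar, s, n = 0.824, 0.042, 200          # sample mean, std. deviation (cm), size

z = NormalDist().inv_cdf(0.99)           # 98% two-sided: 1% in each tail
half_width = z * s / math.sqrt(n)

print(f"{x_bar} ± {half_width:.4f} cm")  # 0.824 ± 0.0069 cm
```

Replacing 0.99 with 0.95 gives the 90% limits in the same way.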

Generally, the population standard deviation σ is not known. In practice, the sample standard deviation s is used in its place, which is satisfactory when N ≥ 30; otherwise the approximation is poor and the theory of small samples (the Student t distribution) must be employed.

3. Calculating the sample size

In determining the size of the sample to be drawn, several factors must be taken into account: the type of sampling, the parameter to be estimated, the allowable sampling error, the population variance and the confidence level. So, before presenting some simple cases of sample size calculation, let us define these factors.

Parameter. A measure or datum obtained on the population.

Statistic. A measure or datum obtained on a sample, and therefore an estimate of a parameter.

Sampling error, or standard error of estimation. It is the difference between a statistic and its corresponding parameter. It is a measure of the variability of the estimates from repeated samples around the population value, and gives us a clear idea of how far, and with what probability, an estimate based on a sample may stray from the value that would have been obtained by a complete census. Some error is always committed, but the nature of the investigation tells us to what extent we may commit it (the results are subject to sampling error and to confidence intervals that vary from sample to sample). It can be fixed at the outset or computed at the end. The more precise we require a statistic to be, the smaller its error must be. We could say that it is the standard deviation of the sampling distribution of a statistic, and that it measures the statistic's reliability.

Confidence level. The probability that the estimate made fits reality. Every piece of information we collect is distributed according to a probability law (Gaussian or Student), and we call the confidence level the probability that the interval constructed around a statistic captures the true value of the parameter.

Population variance. The more homogeneous a population is, the smaller its variance, and the smaller the number of interviews needed to build a reduced-scale model of the universe, or population. It is usually unknown and must be estimated using data from previous studies.

Sample size for estimating the population mean

Let us consider the steps needed to determine the size of a sample under simple random sampling. This requires two prior assumptions: first, the confidence level at which we want to work; and second, the maximum error we are willing to admit in our estimate. The steps are then:


1. – Obtain a first sample size n₀, imagining that the population is infinite (N → ∞):

n₀ = z² σ² / e²

where:

z: value from the normal table corresponding to the chosen confidence level

σ²: population variance

e: maximum allowable error

2. – Check whether the sampling fraction is negligible, that is, whether n₀/N ≤ 0.05.

If this condition is satisfied, the process ends here and n₀ is the sample size we need.

If it is not satisfied, we move on to the third step:

3. – Obtain the sample size with the finite-population correction:

n = n₀ / (1 + n₀/N)

For example: the Department of Labour is planning a study to learn the average number of weekly hours worked by women in domestic service. The sample will be drawn from a population of 10,000 women listed in the Social Security records, of which it is known, through a pilot study, that the variance is 9648. Working with a confidence level of 0.95 and being willing to admit a maximum error of 0.1 hours, what should be the size of the sample we use?

We look up in the normal table the value corresponding to the chosen confidence level, z = 1.96, and follow the steps outlined above.

1. – n₀ = z² σ² / e² = (1.96)² (9648) / (0.1)² ≈ 3,706,376

2. – n₀/N = 3,706,376 / 10,000 ≈ 370.6 > 0.05, so the condition is not satisfied and we go to the third step.

3. – n = n₀ / (1 + n₀/N) = 3,706,376 / (1 + 370.6) ≈ 9973.1

Rounding up, the sample should contain 9,974 women.
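The three-step procedure can be wrapped in a small Python function (a sketch; the 5% threshold and the round-up convention are the usual ones, and the example's numbers are reused):

```python
import math

def sample_size_mean(var, e, N, z=1.96, small_fraction=0.05):
    """Sample size for estimating a mean under simple random sampling.

    var: population variance; e: maximum error; N: population size;
    z: normal value for the chosen confidence level.
    """
    n0 = z ** 2 * var / e ** 2            # step 1: pretend N -> infinity
    if n0 / N <= small_fraction:          # step 2: negligible sampling fraction?
        return math.ceil(n0)
    return math.ceil(n0 / (1 + n0 / N))   # step 3: finite-population correction

print(sample_size_mean(var=9648, e=0.1, N=10_000))  # 9974 (9973.1 rounded up)
```

`math.ceil` is used because a sample size must be a whole number and rounding down would violate the error requirement.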

Sample size for estimating the population proportion

To calculate the sample size for estimating a population proportion, the same factors as in the case of the mean must be taken into account. The formula that allows us to determine the sample size is:

n = z² P(1 − P) N / (e²(N − 1) + z² P(1 − P))

z: value from the normal table corresponding to the chosen confidence level

P: proportion of one category of the variable

e: maximum error

N: population size

Continuing with the study proposed in the previous section, suppose we wish to estimate the proportion of women who work 10 hours a day or more. A pilot study concluded that P = 0.30; we set the confidence level at 0.95 and the maximum error at 0.02.

n = (1.96)² (0.30)(0.70)(10,000) / ((0.02)² (9,999) + (1.96)² (0.30)(0.70)) = 8,067.4 / 4.806 ≈ 1,678.5

Rounding up, a sample of 1,679 women is required.
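The proportion formula, with the same hypothetical survey numbers, in Python:

```python
import math

def sample_size_proportion(p, e, N, z=1.96):
    """Sample size for estimating a proportion in a finite population of size N."""
    pq = z ** 2 * p * (1 - p)
    return math.ceil(pq * N / (e ** 2 * (N - 1) + pq))

print(sample_size_proportion(p=0.30, e=0.02, N=10_000))  # 1679 (1678.5 rounded up)
```

When no pilot estimate of P is available, P = 0.5 is commonly used, since it maximizes P(1 − P) and therefore gives the most conservative (largest) sample size.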

Work submitted by:

Lida Burbano