Basic statistical annotations


* Presentation tables:

* Graphical methods:

* Bibliography

* Presentation tables:

First we define what a table is, and then work through the different kinds of tables:

A table is the joint, orderly and usually totaled arrangement of the quantities or frequencies obtained when tabulating data for the categories or values of one or several interrelated variables. Tables systematize quantitative results and give a synthetic, global numerical overview of the observed phenomenon and of the relationships between its various characteristics or variables. They are the concrete and definitive culmination of the quantitative part of a research study.

With this definition in hand, we can work through each type of table:

* Data entry table: a table that shows only the data obtained from a scientific investigation or experiment. It is the simplest kind of table and is used when no further information about the data is needed. It is built by tabulating the data, which is relatively simple. We deal with a set of statistical data obtained by recording the results of a series of n repetitions of a random experiment or observation, assuming that the repetitions are mutually independent and carried out under uniform conditions. The result of each observation must be expressible numerically. These tables can hold one or more variables, so the statistical material consists of n observed values of each variable Xj.

The observed values are usually first recorded in a list. If the number of observations does not exceed 20 or 30, the data are then recorded in ascending order of magnitude.

From the data in this table, different graphs can be drawn and certain numerical characteristics, such as the mean and the median, can be calculated.

Example: arrange the following values in a data table

10, 1, 6, 9, 2, 5, 7, 4, 3, 8
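
The tabulation described above can be sketched in a few lines of Python. This is only an illustration using the ten values of the example; the variable names are assumptions made for this sketch.

```python
from statistics import mean, median

# The ten observed values from the example above.
observations = [10, 1, 6, 9, 2, 5, 7, 4, 3, 8]

# Fewer than 20 or 30 observations, so record them in ascending order.
ordered = sorted(observations)

print("i   x")
for i, x in enumerate(ordered, start=1):
    print(f"{i:<3} {x}")

# Numerical characteristics that can be calculated from the table.
data_mean = mean(observations)
data_median = median(observations)
print("mean:", data_mean)
print("median:", data_median)
```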

* Frequency tables: a frequency table is formed by the categories or values of a variable and their frequencies; it is the same thing as a frequency distribution. It is built by tabulation and grouping, a simple method that extends the one used for the data entry table. When working with a single variable whose number of distinct observed values (not counting repetitions) is small, the tabulation procedure described above is followed, and the frequency f of a given value of X is the number of times that value is repeated. When the data set is larger, working directly with the individual observed values becomes laborious, so some grouping is generally carried out as a preliminary step before any other processing. The rules for grouping depend on whether the variable is discrete or continuous. For a discrete variable it is usually convenient to make a table whose first column contains all the values of X that appear in the material and whose second column contains the frequency f with which each value of X appeared in the observations.

For a continuous variable, the grouping procedure is somewhat more complicated. A suitable interval on the axis of the variable is chosen so that it contains the n observed values, and it is divided into a number of class intervals. All observations belonging to the same class interval are grouped and counted, and the resulting count is the frequency of that class interval. A table is then formed whose first column shows the limits of each class interval and whose second column shows the corresponding frequencies.

These tables are the most widely used and give more information than data entry tables: a table of this type gives, in abbreviated form, complete information about the distribution of the observed values. Both graphical and arithmetic methods can be applied to them more fully.

Example: group the following values in a frequency table: 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 5
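
For a discrete variable such as this one, the two-column table (value X, frequency f) can be sketched as follows. The extra relative-frequency column is an addition made for illustration.

```python
from collections import Counter

# The eleven observed values from the example above.
values = [1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 5]

# Frequency f of each distinct value of X.
frequency = Counter(values)
n = len(values)

print("X   f   relative f")
for x in sorted(frequency):
    f = frequency[x]
    print(f"{x}   {f}   {f / n:.3f}")
```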

Example: group the following heights in a frequency table: 160, 168, 175, 183, 170, 164, 170, 184, 171, 168, 187, 161, 183, 175, 185, 186, 187, 164, 165, 175, 162, 188, 169, 163, 166, 172, 173, 167, 174, 176, 178, 179, 177
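
Heights are a continuous variable, so they are grouped into class intervals. The sketch below assumes six classes of width 5 starting at 160; this choice is not given in the text and other reasonable choices would give a different table.

```python
heights = [160, 168, 175, 183, 170, 164, 170, 184, 171, 168, 187,
           161, 183, 175, 185, 186, 187, 164, 165, 175, 162, 188,
           169, 163, 166, 172, 173, 167, 174, 176, 178, 179, 177]

LOWER = 160      # assumed lower limit of the first class
WIDTH = 5        # assumed class width
N_CLASSES = 6    # enough classes to cover the largest value, 188

# Count the observations that fall in each class [lo, hi).
counts = [0] * N_CLASSES
for h in heights:
    counts[(h - LOWER) // WIDTH] += 1

print("class interval   class mark   f")
for k, f in enumerate(counts):
    lo = LOWER + k * WIDTH
    hi = lo + WIDTH
    print(f"[{lo}, {hi})       {(lo + hi) / 2:>8}     {f}")
```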

* Crosstabs: also called contingency tables, these are tables of data on two variables. The row headers hold the categories or values of one variable and the column headers those of the other; each cell holds the frequency, that is, the number of elements that combine the two categories or values that cross in that cell. Tabulating grouped material from simultaneous observations of two random variables requires a table of this kind; the rules for grouping are the same as in the case of a single variable.

These tables give statistical information about two interrelated events and are useful when one experiment depends on another. They appear again later in applications of bivariate statistical analysis.
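
A contingency table can be sketched by counting how often each pair of categories occurs together. The paired observations below are hypothetical and serve only to show the layout, with row and column totals added.

```python
from collections import Counter

# Hypothetical simultaneous observations of two variables (not from the text).
pairs = [("low", "yes"), ("low", "no"), ("high", "yes"),
         ("high", "yes"), ("low", "no"), ("high", "no")]

# Frequency of each combination of categories.
cells = Counter(pairs)
rows = sorted({a for a, _ in pairs})
cols = sorted({b for _, b in pairs})

# Header row: one column per category of the second variable.
print(f"{'':6}" + "".join(f"{c:>6}" for c in cols) + f"{'total':>6}")
for r in rows:
    row_counts = [cells[(r, c)] for c in cols]
    print(f"{r:6}" + "".join(f"{v:>6}" for v in row_counts)
          + f"{sum(row_counts):>6}")

# Column totals and grand total.
col_totals = [sum(cells[(r, c)] for r in rows) for c in cols]
print(f"{'total':6}" + "".join(f"{v:>6}" for v in col_totals)
      + f"{len(pairs):>6}")
```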



* Graphical methods:

First we define what a statistical chart or diagram is.

A diagram is a kind of schematic drawing made up of lines, figures or maps, used to represent statistical data to scale or in proportion, or the elements of a system, the steps of a process, or the divisions and subdivisions of a classification. Among the functions of diagrams are the following:

* They make data, systems and processes visible.

* They show their variations and their historical or spatial evolution.

* They can show the relationships between the various elements of a system or process and represent the correlation between two or more variables.

* They systematize and synthesize data, systems and processes.

* They clarify and complement tables and theoretical or quantitative expositions.

* Studying them and the relationships they show may suggest new hypotheses.

Some of the most important diagrams are the tree diagram, area or surface chart, strip chart, bar chart, block diagram, pie chart, polar pie chart, scatter plot, stem-and-leaf plot, histogram, and box-and-whisker plot or boxplot.

2.1 Univariate graphs: to work with univariate graphs we must first know what univariate statistical analysis is, and then work through the individual methods.

Univariate statistical analysis operates on data for a single variable or frequency distribution and tries to determine its statistical properties. It gives the analyst measures that represent the center or average of the distribution, indices of the dispersion of the data, procedures for normalizing the data, measures of the inequality of some data relative to others and, finally, measures of the asymmetry of the distribution.

* Dot plots: a simple line diagram formed by the straight lines or curves that result from plotting a frequency distribution on coordinate axes, with the values of the variable on the X axis and the frequency corresponding to each value on the Y axis. It mainly gives information about frequencies and is used when only the frequency information is needed.

When the sample is grouped into intervals, each class interval is represented by its class mark, which is the midpoint of the interval.

Example: duration of neon tubes

* Stem-and-leaf plots: a quick way to get an informative visual representation of a data set. To construct one, first select one or more leading digits as the stem values; the trailing digit or digits become the leaves. Then list the stem values in a vertical column, record the leaf of each observation next to the corresponding stem value and, finally, indicate the units of the stems and leaves somewhere on the diagram. It is used for long lists and is a way of displaying a summary of the data. Its disadvantage is that it shows only the data values; frequency information and other important summaries do not appear explicitly.

Example: make a stem-and-leaf plot for the following distances, in yards, from a golf course

6435 6464 6433 6470 6526 6527 6506 6583 6605 6694 6614 6790 6770 6700 6798 6770 6745 6713 6890 6870 6873 6850 6900 6927 6936 6904 7051 7005 7011 7040 7050 7022 7131 7169 7168 7105 7113 7165 7280 7209
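
The construction can be sketched as follows. Taking the thousands and hundreds digits as the stem and the tens and ones digits as a two-digit leaf is an assumption made for this sketch; the text does not fix which digits to use.

```python
from collections import defaultdict

yardages = [6435, 6464, 6433, 6470, 6526, 6527, 6506, 6583, 6605, 6694,
            6614, 6790, 6770, 6700, 6798, 6770, 6745, 6713, 6890, 6870,
            6873, 6850, 6900, 6927, 6936, 6904, 7051, 7005, 7011, 7040,
            7050, 7022, 7131, 7169, 7168, 7105, 7113, 7165, 7280, 7209]

# Stem = leading two digits, leaf = trailing two digits (assumed split).
stems = defaultdict(list)
for y in yardages:
    stems[y // 100].append(y % 100)

# One row per stem, leaves in ascending order next to it.
for stem in sorted(stems):
    leaves = " ".join(f"{leaf:02d}" for leaf in sorted(stems[stem]))
    print(f"{stem} | {leaves}")

# Indicate the units of stems and leaves on the diagram.
print("Stem: thousands and hundreds digits; leaf: tens and ones digits")
```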

* Bar charts: the diagram used to plot ungrouped discrete frequency distributions. It is so called because the frequency of each category of the distribution is represented by a stroke or column of proportional length, separated from the others. There are three main types of bar chart:

* Simple bars: used to plot a single series of data.

* Multiple bars: highly recommended for comparing one statistical series with another; bars of different colors or hatching are simply drawn side by side on the same coordinate plane.

* Compound bars: in this method the bars of the second series are placed on top of the bars of the first series.

The bar chart gives comparative information, which is its main use; it also shows information about the frequencies.


* Histograms: used to illustrate samples grouped into intervals. A histogram consists of adjoining rectangles drawn on the X axis, whose bases have their endpoints at the class limits and their centers at the class marks. The height of each rectangle reflects the frequency of its interval and is computed with the following formula:

Height of rectangle = relative frequency / length of base

The histogram is used to represent continuous variables that have been grouped into class intervals. Its disadvantage is that it does not work for discrete variables; otherwise it is a useful and practical way to show statistical data.
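
The height formula matters most when the class intervals have different lengths. The sketch below applies it to hypothetical class intervals and frequencies (not taken from the text) and checks that the rectangle areas add up to 1.

```python
# Hypothetical class intervals (lower limit, upper limit) and frequencies.
classes = [(0, 10), (10, 20), (20, 40), (40, 80)]
frequencies = [5, 15, 20, 10]
n = sum(frequencies)

# Height of rectangle = relative frequency / length of base.
rect_heights = []
for (lo, hi), f in zip(classes, frequencies):
    relative = f / n
    rect_heights.append(relative / (hi - lo))

for (lo, hi), h in zip(classes, rect_heights):
    print(f"[{lo}, {hi}): height = {h:.4f}")

# Each area equals the relative frequency, so the areas sum to 1.
total_area = sum(h * (hi - lo) for h, (lo, hi) in zip(rect_heights, classes))
print("total area:", round(total_area, 6))
```

With these numbers the class [20, 40) has the largest frequency but not the tallest rectangle, because its base is twice as long as that of [10, 20).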


* Box-and-whisker plots or boxplots: the steps to build one are:

* Draw and mark a horizontal measurement axis.

* Build a rectangle whose left edge is above the lower fourth and whose right edge is above the upper fourth.

* Draw a vertical line segment inside the box above the median.

* Extend lines from each end of the box to the farthest observations that are still within 1.5fs of the corresponding edge.

* Draw an open circle to identify each observation that falls between 1.5fs and 3fs from the edge to which it is closest; these points are called mild outliers.

* Draw a filled circle to identify each observation that falls more than 3fs from the closest edge; these points are called extreme outliers.

where fs = upper fourth – lower fourth

This diagram is used when as much information as possible about the distribution of the data is needed. Its advantage over other diagrams is that it shows the center and the dispersion of the data; its main disadvantage is that it gives no information about the frequencies of the data.

Example: make a boxplot for the following data: 2.68 3.06 4.31 4.71 5.71 5.99 6.06 7.04 7.17 7.46 7.50 8.27 8.42 8.73 8.84 9.14 9.19 9.21 9.39 11.28 15.19 21.06
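
The quantities needed to draw this boxplot can be computed as sketched below. The sketch assumes that the fourths are the medians of the lower and upper halves of the sorted data (with the overall median included in both halves when n is odd); the text does not spell out this convention.

```python
from statistics import median

data = [2.68, 3.06, 4.31, 4.71, 5.71, 5.99, 6.06, 7.04, 7.17, 7.46, 7.50,
        8.27, 8.42, 8.73, 8.84, 9.14, 9.19, 9.21, 9.39, 11.28, 15.19, 21.06]

xs = sorted(data)
n = len(xs)
half = (n + 1) // 2                      # size of each half (assumed convention)

center = median(xs)
lower_fourth = median(xs[:half])         # median of the smallest half
upper_fourth = median(xs[n - half:])     # median of the largest half
fs = upper_fourth - lower_fourth         # fourth spread


def distance_outside(x):
    """Distance from x to the nearest edge of the box (0 if inside the box)."""
    if x < lower_fourth:
        return lower_fourth - x
    if x > upper_fourth:
        return x - upper_fourth
    return 0.0


# Classify the observations following the steps listed above.
inside = [x for x in xs if distance_outside(x) <= 1.5 * fs]
mild = [x for x in xs if 1.5 * fs < distance_outside(x) <= 3 * fs]
extreme = [x for x in xs if distance_outside(x) > 3 * fs]
whiskers = (min(inside), max(inside))    # where the lines from the box end

print("median:", round(center, 3))
print("fourths:", lower_fourth, upper_fourth, "fs:", round(fs, 2))
print("whiskers:", whiskers)
print("mild outliers:", mild)
print("extreme outliers:", extreme)
```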

* Pie charts: a graph based on the proportionality between the frequency and the central angle of a circle, so that the total frequency corresponds to the full angle of 360°. To construct it, the following formula is applied:

X = relative frequency * 360° / Σ relative frequencies

It is used when the frequencies are large and the values of the variable are few. Its advantages are that it is easy to make and easy to understand; its disadvantage is that when the variable takes many values the diagram becomes almost unreadable and conveys little. It mainly gives information about the frequencies of the data in a simple, understandable form.

Example: draw a pie chart showing how often each of the five vowels appears in this paragraph:
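
The central angles for such a chart can be sketched as follows. The function name `vowel_angles` and the sample sentence are assumptions made for illustration; any paragraph can be passed in instead.

```python
from collections import Counter


def vowel_angles(text):
    """Central angle, in degrees, of the pie slice for each of the five vowels."""
    counts = Counter(ch for ch in text.lower() if ch in "aeiou")
    total = sum(counts.values())
    if total == 0:
        return {}
    # Angle = frequency * 360 / total frequency.
    return {v: counts[v] * 360 / total for v in "aeiou"}


# Placeholder sentence, used only to show the call.
sample = "draw a pie chart showing how often each of the five vowels appears"
for vowel, angle in vowel_angles(sample).items():
    print(f"{vowel}: {angle:.1f} degrees")
```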

2.2 Bivariate graphs: to work with scatterplots we must first know what bivariate statistical analysis is and what benefits it has.

Bivariate statistical analysis operates on data for two variables in order to discover and study their joint statistical properties. It focuses mainly on standardizing the frequencies or raw data values; on determining the existence, degree and direction of the co-variation between the two variables, which is done by calculating the relevant correlation coefficients; on calculating the covariance, that is, the averaged product of the deviations of the two variables from their respective means; and, finally, on establishing the nature and form of the association between the two variables in the case of interval variables.
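
The covariance and a correlation coefficient can be sketched as follows. The sketch assumes the sample form (dividing by n - 1) and Pearson's coefficient, neither of which is specified in the text, and the paired data are hypothetical.

```python
from math import sqrt


def covariance(xs, ys):
    """Sample covariance: averaged product of deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))


# Hypothetical paired observations of two interval variables.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

print("covariance:", covariance(x, y))
print("correlation:", round(correlation(x, y), 4))
```

The sign of the coefficient gives the direction of the co-variation and its absolute value, between 0 and 1, gives the degree.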

* Scatterplot: a diagram that represents graphically, as points in a plane, the pairs of values of a joint bivariate distribution. These diagrams should be used when we have a bivariate table, that is, a double-entry data table, for statistical analysis. The advantage is that a joint bivariate distribution can be plotted simply; the main disadvantage is that it does not work well when a pair of values is repeated, since the repeated points coincide.



* Bibliography

* Sierra Bravo, R. Practical Dictionary of Statistics. Ed. Auditorium S.A., Madrid, Spain, pp. 56-57, 177-187, 427-432.

* Serrano Rodríguez, Javier. Introduction to Statistics. Ed. LIDA American university, Bogotá, Colombia, pp. 30-49.

* Devore, Jay L. Probability and Statistics for Engineering and Science. Ed. Thomson, 4th Edition, pp. 7-37.