Graphical representation of distribution series: polygon, histogram. Indicators of the center of a distribution and of the variability of a characteristic

Let a sample be drawn from the general population in which the value $x_1$ was observed $n_1$ times, $x_2$ was observed $n_2$ times, ..., $x_k$ was observed $n_k$ times, and let $n = n_1 + n_2 + \dots + n_k$ be the sample size. The observed values $x_i$ are called variants, and the sequence of variants written in ascending order is called a variation series.

The number of times a variant is observed is called its frequency $n_i$, and the ratio of the frequency to the sample size, $w_i = n_i / n$, is called its relative frequency.

Definition. The statistical (empirical) distribution law of a sample, or simply the statistical distribution of the sample, is the sequence of variants and their corresponding frequencies $n_i$ or relative frequencies $w_i$.

The statistical distribution of a sample is conveniently represented as a frequency distribution table, called a discrete statistical distribution series:

(the sum of all relative frequencies is equal to one).

Example 1. When heart rate was measured in homogeneous groups of subjects, the following sample was obtained: 71, 72, 74, 70, 70, 72, 71, 74, 71, 72, 71, 73, 72, 72, 72, 74, 72, 73, 72, 74. Based on these results, compile the statistical series of frequencies and of relative frequencies.

Solution. 1) Statistical series of frequency distribution:

Control: 0.1 + 0.2 + 0.4 + 0.1 + 0.2 = 1.
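The series for Example 1 can be recomputed directly from the raw sample. The following is a minimal sketch using only the Python standard library; the variable names are illustrative and not part of the original solution.

```python
from collections import Counter

# Heart-rate sample from Example 1 (20 observations).
sample = [71, 72, 74, 70, 70, 72, 71, 74, 71, 72,
          71, 73, 72, 72, 72, 74, 72, 73, 72, 74]

n = len(sample)                                # sample size
counts = Counter(sample)                       # variant -> frequency n_i
variants = sorted(counts)                      # variants in ascending order
frequencies = [counts[x] for x in variants]    # n_i
relative = [f / n for f in frequencies]        # w_i = n_i / n

for x, f, w in zip(variants, frequencies, relative):
    print(f"x = {x}  n_i = {f}  w_i = {w:.1f}")

# Control: the relative frequencies must sum to one.
print("sum of w_i =", round(sum(relative), 10))
```

The relative frequencies obtained this way agree with the control sum 0.1 + 0.2 + 0.4 + 0.1 + 0.2 = 1.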

A frequency polygon is a broken line whose segments connect the points $(x_1, n_1), (x_2, n_2), \dots, (x_k, n_k)$. To construct a frequency polygon, the variants $x_i$ are plotted on the abscissa axis and the corresponding frequencies $n_i$ on the ordinate axis. The points are connected by segments and a frequency polygon is obtained.

A polygon of relative frequencies is a broken line whose segments connect the points $(x_1, w_1), (x_2, w_2), \dots, (x_k, w_k)$. To construct a polygon of relative frequencies, the variants $x_i$ are plotted on the abscissa axis and the corresponding relative frequencies $w_i$ on the ordinate axis. The points are connected by segments and a polygon of relative frequencies is obtained.

Example 2. Construct a frequency polygon and a relative frequency polygon based on the data in Example 1.

Solution: Using the discrete statistical distribution series compiled in example 1, we will construct a frequency polygon and a relative frequency polygon:


2. Statistical interval distribution series. Histogram.

A discrete statistical series (or an empirical distribution function) is usually used when the sample does not contain too many distinct variants, or when discreteness is for some reason important to the researcher. If the characteristic X of the general population that interests us is distributed continuously, or if it is impractical (or impossible) to take its discreteness into account, then the variants are grouped into intervals.


The statistical distribution can also be specified as a sequence of intervals and the frequencies corresponding to them (the sum of frequencies falling within this interval is taken as the frequency corresponding to the interval).

1. The range: $R = x_{\max} - x_{\min}$.

2. $k$ is the number of groups.

3. $k = 1 + 3.322 \lg n$ (Sturges formula).

4. $a = x_{\min}$, $b = x_{\max}$.

It is convenient to present the resulting grouping in the form of a frequency table, which is called statistical interval distribution series:

Partial intervals ...
Frequencies ...

An analogous table can be formed by replacing the frequencies $n_i$ with relative frequencies.

The results of a grouping are presented in the form of distribution series.

A distribution series is one of the types of groupings.

A distribution series is an ordered distribution of the units of the population being studied into groups according to a certain varying characteristic.

Depending on the characteristic underlying the formation of the distribution series, attributive and variational distribution series are distinguished:

  • Attributive distribution series are those constructed according to qualitative characteristics.
  • Variational distribution series are those constructed in ascending or descending order of the values of a quantitative characteristic.
A variation series of a distribution consists of two columns:

The first column gives the quantitative values of the varying characteristic, which are called variants. A discrete variant is expressed as an integer. An interval variant is given as a range "from ... to ...". Depending on the type of variants, either a discrete or an interval variation series can be constructed.
The second column gives the number of occurrences of each variant, expressed as frequencies or relative frequencies:

Frequencies are absolute numbers that show how many times a given value of the characteristic occurs in the population. The sum of all frequencies must equal the number of units in the entire population.

Relative frequencies are frequencies expressed as a percentage of the total. The sum of all relative frequencies must equal 100% when expressed as percentages, or one when expressed as fractions.

Graphic representation of distribution series

The distribution series are visually presented using graphical images.

The distribution series are depicted as:
  • Polygon
  • Histograms
  • Cumulates
  • Ogives

Polygon

When constructing a polygon, the values of the varying characteristic are plotted on the horizontal axis (x-axis), and the frequencies or relative frequencies are plotted on the vertical axis (y-axis).

The polygon in Fig. 6.1 is based on data from the micro-census of the population of Russia in 1994.

Fig. 6.1. Household size distribution

Condition: Data is provided on the distribution of 25 employees of one of the enterprises according to tariff categories:
4; 2; 4; 6; 5; 6; 4; 1; 3; 1; 2; 5; 2; 6; 3; 1; 2; 3; 4; 5; 4; 6; 2; 3; 4
Task: Construct a discrete variation series and depict it graphically as a distribution polygon.
Solution:
In this example, the variant is the employee's tariff category. To determine the frequencies, count the number of employees in each tariff category.

The polygon is used for discrete variation series.

To construct a distribution polygon (Fig. 1), we plot the quantitative values of the varying characteristic (the variants) on the abscissa (X) axis, and the frequencies or relative frequencies on the ordinate axis.
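As an illustration, the vertices of the distribution polygon for the 25 tariff categories can be derived as follows. This is a sketch only: the helper name `polygon_points` is hypothetical, and the resulting list of points can be passed to any plotting tool to draw the broken line.

```python
from collections import Counter

# Tariff categories of the 25 employees from the example above.
categories = [4, 2, 4, 6, 5, 6, 4, 1, 3, 1, 2, 5, 2,
              6, 3, 1, 2, 3, 4, 5, 4, 6, 2, 3, 4]


def polygon_points(data, relative=False):
    """Return polygon vertices (variant, frequency) in ascending order.

    With relative=True the second coordinate is the relative frequency.
    """
    counts = Counter(data)
    n = len(data)
    return [(x, counts[x] / n if relative else counts[x])
            for x in sorted(counts)]


points = polygon_points(categories)
for x, f in points:
    print(f"category {x}: frequency {f}")
```

Connecting consecutive points with straight segments gives the polygon; the frequencies sum to 25, the number of employees.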

If the values ​​of a characteristic are expressed in the form of intervals, then such a series is called interval.
Interval series distributions are depicted graphically in the form of a histogram, cumulate or ogive.

Statistical table

Condition: Data are given on the size of the deposits of 20 individuals in one bank (thousand rubles): 60; 25; 12; 10; 68; 35; 2; 17; 51; 9; 3; 130; 24; 85; 100; 152; 6; 18; 7; 42.
Task: Construct an interval variation series with equal intervals.
Solution:

  1. The initial population consists of 20 units (N = 20).
  2. Using the Sturges formula, we determine the required number of groups: n = 1 + 3.322*lg 20 ≈ 5.32, which is rounded to 5
  3. Let's calculate the value of the equal interval: i=(152 - 2) /5 = 30 thousand rubles
  4. Let's divide the initial population into 5 groups with an interval of 30 thousand rubles.
  5. We present the grouping results in the table:

When a continuous characteristic is recorded in this way and the same value occurs twice (as the upper limit of one interval and the lower limit of the next), the value is assigned to the group in which it is the upper limit.
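The grouping steps for the deposit data can be sketched as follows. The function names are hypothetical, rounding the Sturges value to the nearest integer is an assumption (the text only states the result 5), and boundary values are assigned to the interval in which they are the upper limit, as described above.

```python
import math

# Deposit sizes of the 20 individuals (thousand rubles).
deposits = [60, 25, 12, 10, 68, 35, 2, 17, 51, 9,
            3, 130, 24, 85, 100, 152, 6, 18, 7, 42]


def sturges(n):
    """Number of groups by the Sturges formula (rounding is an assumption)."""
    return round(1 + 3.322 * math.log10(n))


def group_equal_intervals(data, k):
    """Split data into k equal intervals; return (edges, counts).

    A value equal to an inner boundary goes to the interval where it is
    the upper limit; the minimum goes to the first interval.
    """
    lo, hi = min(data), max(data)
    h = (hi - lo) / k                      # interval width
    counts = [0] * k
    for v in data:
        idx = math.ceil((v - lo) / h) - 1  # upper limit is inclusive
        idx = min(max(idx, 0), k - 1)      # clamp the minimum and any rounding
        counts[idx] += 1
    edges = [lo + i * h for i in range(k + 1)]
    return edges, counts


k = sturges(len(deposits))
edges, counts = group_equal_intervals(deposits, k)
for left, right, c in zip(edges, edges[1:], counts):
    print(f"{left:g} - {right:g}: {c}")
```

For these data the sketch yields five intervals of width 30 thousand rubles, and the group frequencies sum to 20.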

Histogram

To construct a histogram, the boundaries of the intervals are marked along the abscissa axis, and rectangles are built on them whose heights are proportional to the frequencies (or relative frequencies).

Fig. 6.2 shows a histogram of the distribution of the Russian population in 1997 by age group.

Fig. 6.2. Distribution of the Russian population by age groups

Condition: The distribution of 30 employees of the company by monthly salary is given

Task: Display the interval variation series graphically in the form of a histogram and cumulate.
Solution:

  1. The unknown boundary of the open (first) interval is determined from the width of the second interval: 7000 - 5000 = 2000 rubles. Using the same width, we find the lower limit of the first interval: 5000 - 2000 = 3000 rubles.
  2. To construct a histogram in a rectangular coordinate system, we plot along the abscissa axis the segments whose lengths correspond to the intervals of the variation series.
    These segments serve as the bases of the rectangles, and the corresponding frequency (relative frequency) serves as their height.
  3. Let's build a histogram:

To construct a cumulate, the accumulated frequencies (or accumulated relative frequencies) must be calculated. They are obtained by successively summing the frequencies (relative frequencies) of the preceding intervals and are denoted S. The accumulated frequencies show how many units of the population have a value of the characteristic no greater than the one under consideration.

Cumulate

The distribution of a characteristic in a variation series by accumulated frequencies (relative frequencies) is depicted using a cumulate.

A cumulate, or cumulative curve, unlike a polygon, is constructed from accumulated frequencies or accumulated relative frequencies. The values of the characteristic are plotted on the abscissa axis, and the accumulated frequencies or relative frequencies on the ordinate axis (Fig. 6.3).

Fig. 6.3. Cumulate of the household size distribution

4. Let's calculate the accumulated frequencies:
The cumulative frequency of the first interval is calculated as follows: 0 + 4 = 4, for the second: 4 + 12 = 16; for the third: 4 + 12 + 8 = 24, etc.
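The running sums quoted above can be reproduced with a short sketch. Only the first three interval frequencies (4, 12, 8) are stated in the text, so only those are used here; the frequencies of the remaining intervals are not assumed.

```python
from itertools import accumulate

# Frequencies of the first three salary intervals, as quoted in the text.
frequencies = [4, 12, 8]

# Accumulated frequencies S: each entry is the sum of all frequencies so far.
cumulative = list(accumulate(frequencies))
print(cumulative)
```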

When constructing a cumulate, the accumulated frequency (relative frequency) of an interval is assigned to its upper limit:

Ogive

An ogive is constructed in the same way as a cumulate, the only difference being that the accumulated frequencies are plotted on the abscissa axis and the values of the characteristic on the ordinate axis.

A variety of the cumulate is the concentration curve, or Lorenz curve. To construct a concentration curve, a percentage scale from 0 to 100 is plotted on both axes of a rectangular coordinate system. The accumulated relative frequencies are plotted on the abscissa axis, and the accumulated shares (in percent) of the total volume of the characteristic on the ordinate axis.

A uniform distribution of the characteristic corresponds to the diagonal of the square on the graph (Fig. 6.4). With an uneven distribution, the graph is a concave curve whose shape depends on the level of concentration of the characteristic.

Fig. 6.4. Concentration curve
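The coordinates of a concentration curve can be computed as in the following sketch. For illustration it reuses the 20 bank deposits from the interval-series example and treats each depositor as one unit; this choice of data and the function name `concentration_curve` are assumptions made here, not part of the original figure.

```python
# Deposit sizes of the 20 individuals (thousand rubles), reused for illustration.
deposits = [60, 25, 12, 10, 68, 35, 2, 17, 51, 9,
            3, 130, 24, 85, 100, 152, 6, 18, 7, 42]


def concentration_curve(values):
    """Return points (cumulative % of units, cumulative % of volume).

    Units are ordered from the smallest value to the largest, so the
    curve starts at (0, 0), ends at (100, 100) and lies on or below
    the diagonal.
    """
    ordered = sorted(values)
    n = len(ordered)
    total = sum(ordered)
    points = [(0.0, 0.0)]
    running = 0
    for i, v in enumerate(ordered, start=1):
        running += v
        points.append((100 * i / n, 100 * running / total))
    return points


points = concentration_curve(deposits)
for x, y in points[::5]:
    print(f"{x:5.1f}% of depositors hold {y:5.1f}% of deposits")
```

The further the points fall below the diagonal, the more strongly the characteristic is concentrated in a small share of the units.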

Frequency polygon

Let us be given a distribution series written using a table:

Figure 1.

Definition 1

Frequency polygon-- a broken line that connects the points $(x_i,n_i)$ ($i=1,2,\dots ,m$).

That is, to construct a frequency polygon, it is necessary to plot the variant values ​​on the abscissa axis, and the corresponding frequencies along the ordinate axis. The resulting points are connected by a broken line:

Figure 2. Frequency polygon.

In addition to ordinary frequency, there is also the concept of relative frequency.

We obtain the following table of distribution of relative frequencies:

Figure 3.

Definition 2

Relative frequency polygon-- a broken line that connects the points $(x_i,W_i)$ ($i=1,2,\dots ,m$).

That is, to construct a relative frequency polygon, it is necessary to plot the variant values on the abscissa axis, and the corresponding relative frequencies along the ordinate axis. The resulting points are connected by a broken line:

Figure 4. Relative frequency polygon.

Frequency histogram

In addition to the polygon, for continuous values there is the concept of a histogram.

In a frequency histogram, each rectangle has as its base a partial interval of length $h$ and has height $\frac{n_i}{h}$. The area of one such rectangle is therefore $\frac{n_i h}{h}=n_i$, and the area of the entire figure is $\sum n_i=n$, that is, the sample size.

Definition 4

Relative frequency histogram-- a stepped figure consisting of rectangles whose bases are partial intervals of length $h$ and whose heights are $\frac{W_i}{h}$:

Figure 6. Relative frequency histogram.

Note that the area of one such rectangle is $\frac{W_i h}{h}=W_i$. Therefore, the area of the entire figure is $\sum W_i=1$.
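The two area identities can be checked numerically. The sketch below uses the interval frequencies 11, 4, 2, 1, 2 with width h = 30, which is what grouping the 20 bank deposits from the earlier example into five equal intervals gives when boundary values go to the interval where they are the upper limit; the function name is hypothetical.

```python
import math


def histogram_heights(counts, h):
    """Return (frequency densities n_i/h, relative frequency densities W_i/h)."""
    n = sum(counts)
    freq_heights = [c / h for c in counts]
    rel_heights = [c / n / h for c in counts]
    return freq_heights, rel_heights


counts = [11, 4, 2, 1, 2]   # interval frequencies n_i
h = 30                      # common interval width

freq_heights, rel_heights = histogram_heights(counts, h)

# Area of each histogram: sum of base * height over all rectangles.
area_freq = sum(h * y for y in freq_heights)
area_rel = sum(h * y for y in rel_heights)

print("area of frequency histogram:", round(area_freq, 10))
print("area of relative frequency histogram:", round(area_rel, 10))
```

Up to floating-point rounding, the first area equals the sample size and the second equals one.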

Examples of problems for constructing a polygon and a histogram

Example 1

Let the frequency distribution have the form:

Figure 7.

Construct a polygon of relative frequencies.

Let us first construct the relative frequency distribution series using the formula $W_i=\frac{n_i}{n}$

Grouping is the division of a population into groups that are homogeneous with respect to some characteristic.

Purpose of the service. Using the online calculator you can:

  • build a variation series, build a histogram and polygon;
  • find indicators of variation (average, mode (including graphically), median, range of variation, quartiles, deciles, quartile differentiation coefficient, coefficient of variation and other indicators);

Instructions. To group a series, select the type of variation series to be obtained (discrete or interval) and indicate the amount of data (number of rows). The resulting solution is saved in a Word file (see the example of statistical data grouping).

If the grouping has already been carried out and a discrete or interval variation series is given, use the online calculator Variation Indices. Testing the hypothesis about the type of distribution is carried out using the service Studying the distribution form.

Types of statistical groupings

Variation series. When a discrete random variable is observed, the same value may occur several times. Each such value x i of the random variable is recorded together with n i, the number of times it appears in n observations; this is the frequency of the value.
In the case of a continuous random variable, grouping is used in practice.
  1. A typological grouping is the division of the qualitatively heterogeneous population under study into classes, socio-economic types, or homogeneous groups of units. To build this grouping, use the Discrete variation series parameter.
  2. A structural grouping is one in which a homogeneous population is divided into groups that characterize its structure according to some varying characteristic. To build this grouping, use the Interval series parameter.
  3. A grouping that reveals the relationships between the phenomena being studied and their characteristics is called an analytical grouping (see analytical grouping of series).

Example No. 1. Based on the data in Table 2, construct distribution series for 40 commercial banks of the Russian Federation. Using the resulting distribution series, determine: profit on average per commercial bank, credit investments on average per commercial bank, modal and median value of profit; quartiles, deciles, range of variation, mean linear deviation, standard deviation, coefficient of variation.

Solution:
In the "Type of statistical series" section, select Discrete series. Click Insert from Excel. Number of groups: according to the Sturges formula.

Principles for constructing statistical groupings

A series of observations arranged in ascending order is called a variation series. The grouping characteristic is the characteristic by which a population is divided into separate groups; it is called the basis of the grouping. A grouping can be based on either quantitative or qualitative characteristics.
After determining the basis of the grouping, the question of the number of groups into which the population under study should be divided should be decided.

When personal computers are used to process statistical data, the units of the object are grouped using standard procedures.
One such procedure is based on the Sturges formula for determining the optimal number of groups:

k = 1 + 3.322*lg(N)

where k is the number of groups, N is the number of population units, and lg is the base-10 logarithm.

The length of the partial intervals is calculated as h = (x max - x min)/k

Then the numbers of observations falling into these intervals are counted and taken as the frequencies n i. Low frequencies, whose values are less than 5 (n i < 5), should be merged; in that case the corresponding intervals must be merged as well.
The midpoints of the intervals, x i = (c i-1 + c i)/2, are taken as the new values of the variants.
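The text does not say which neighbouring intervals should be merged, so the sketch below shows one possible greedy strategy, chosen here as an assumption: intervals are accumulated from left to right until the combined frequency reaches 5, and any leftover tail is attached to the last merged interval. The function names are hypothetical, and the example input is the five-interval grouping of the 20 bank deposits from the earlier example.

```python
def merge_small(edges, counts, min_count=5):
    """Merge adjacent intervals until each has at least min_count observations.

    Returns a list of [left, right, count]. Greedy left-to-right merging
    is an illustrative choice, not the only valid one.
    """
    merged = []
    left, acc = edges[0], 0
    for i, c in enumerate(counts):
        acc += c
        if acc >= min_count:
            merged.append([left, edges[i + 1], acc])
            left, acc = edges[i + 1], 0
    if acc > 0:                       # tail that never reached min_count
        if merged:
            merged[-1][1] = edges[-1]
            merged[-1][2] += acc
        else:
            merged.append([left, edges[-1], acc])
    return merged


def midpoints(intervals):
    """Midpoint x_i = (c_{i-1} + c_i) / 2 of each interval."""
    return [(a + b) / 2 for a, b, _ in intervals]


edges = [2, 32, 62, 92, 122, 152]
counts = [11, 4, 2, 1, 2]

merged = merge_small(edges, counts)
print(merged)
print(midpoints(merged))
```

With so small a sample the merging is drastic; for larger samples most intervals already satisfy n i >= 5 and only the tails are affected.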

Example No. 3. As a result of a 5% random sample, the following distribution of products by moisture content was obtained. Calculate: 1) average percentage of humidity; 2) indicators characterizing humidity variations.
The solution was obtained using a calculator: Example No. 1

Construct a variation series. Based on the found series, construct a distribution polygon, histogram, and cumulate. Determine the mode and median.

Example. According to the results of sample observation (sample A, Appendix):
a) make a variation series;
b) calculate relative frequencies and accumulated relative frequencies;
c) build a polygon;
d) create an empirical distribution function;
e) plot the empirical distribution function;
f) calculate numerical characteristics: arithmetic mean, dispersion, standard deviation. Solution
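Sample A from the appendix is not reproduced here, so the following sketch illustrates items d) and f) on the heart-rate sample from Example 1 instead. It relies on the standard `statistics` module; the empirical distribution function is taken as the share of observations not exceeding x, which is one common convention and an assumption here (some textbooks use a strict inequality).

```python
import statistics
from bisect import bisect_right

# Heart-rate sample from Example 1 (used because sample A is not available).
sample = [71, 72, 74, 70, 70, 72, 71, 74, 71, 72,
          71, 73, 72, 72, 72, 74, 72, 73, 72, 74]

mean = statistics.mean(sample)               # arithmetic mean
var_biased = statistics.pvariance(sample)    # dispersion, divides by n
var_unbiased = statistics.variance(sample)   # unbiased estimate, divides by n - 1
std = statistics.pstdev(sample)              # standard deviation (from pvariance)
mode = statistics.mode(sample)
median = statistics.median(sample)

ordered = sorted(sample)


def ecdf(x):
    """Empirical distribution function: share of observations <= x."""
    return bisect_right(ordered, x) / len(ordered)


print(f"mean = {mean}, mode = {mode}, median = {median}")
print(f"dispersion = {var_biased:.3f}, unbiased = {var_unbiased:.3f}, std = {std:.3f}")
print([round(ecdf(x), 2) for x in sorted(set(sample))])
```

The same calls apply unchanged to any other raw sample, including the ones the tasks refer to.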

Based on the data given in Table 4 (Appendix 1) and corresponding to your option, do:

  1. Based on the structural grouping, construct variational frequency and cumulative distribution series using equal closed intervals, taking the number of groups equal to 6. Present the results in table form and display graphically.
  2. Analyze the variation series of the distribution by calculating:
    • arithmetic mean value of the characteristic;
    • mode, median, 1st quartile, 1st and 9th decile;
    • standard deviation;
    • the coefficient of variation.
  3. Draw conclusions.

Required: rank the series, construct an interval distribution series, calculate the average value, variability of the average value, mode and median for the ranked and interval series.

1) Based on the initial data, construct a discrete variation series; present it in the form of a statistical table and statistical graphs. 2) Based on the initial data, construct an interval variation series with equal intervals. Choose the number of intervals yourself and explain this choice. Present the resulting variation series in the form of a statistical table and statistical graphs. Indicate the types of tables and graphs used.

In order to determine the average duration of customer service in a pension fund, the number of clients of which is very large, a survey of 100 clients was conducted using a random non-repetitive sampling scheme. The survey results are presented in the table. Find:
a) the boundaries within which, with probability 0.9946, the average service time for all clients of the pension fund is contained;
b) the probability that the share of all fund clients with a service duration of less than 6 minutes differs from the share of such clients in the sample by no more than 10% (in absolute value);
c) the volume of repeated sampling, in which with a probability of 0.9907 it can be stated that the share of all fund clients with a service duration of less than 6 minutes differs from the share of such clients in the sample by no more than 10% (in absolute value).
2. According to the data of task 1, using Pearson's $\chi^2$ test at a significance level of α = 0.05, test the hypothesis that the random variable X (customer service time) is distributed according to the normal law. Construct a histogram of the empirical distribution and the corresponding normal curve in one drawing.

A sample of 100 elements is given. Necessary:

  1. Construct a ranked variation series;
  2. Find the maximum and minimum terms of the series;
  3. Find the range of variation and the number of optimal intervals for constructing an interval series. Find the length of the interval of the interval series;
  4. Construct an interval series. Find the frequencies of sample elements falling into the composed intervals. Find the midpoints of each interval;
  5. Construct a histogram and frequency polygon. Compare with normal distribution (analytically and graphically);
  6. Plot the empirical distribution function;
  7. Calculate sample numerical characteristics: sample mean and central sample moment;
  8. Calculate approximate values ​​of standard deviation, skewness and kurtosis (using the MS Excel analysis package). Compare approximate calculated values ​​with exact ones (calculated using MS Excel formulas);
  9. Compare selected graphical characteristics with the corresponding theoretical ones.

The following sample data are available (10% sample, mechanical) on product output and the amount of profit, million rubles. According to the original data:
Task 13.1.
13.1.1. Construct a statistical series of distribution of enterprises by the amount of profit, forming five groups with equal intervals. Construct distribution series graphs.
13.1.2. Calculate the numerical characteristics of the distribution series of enterprises by the amount of profit: arithmetic mean, standard deviation, dispersion, coefficient of variation V. Draw conclusions.
Task 13.2.
13.2.1. Determine the boundaries within which, with probability 0.997, the amount of profit of one enterprise in the general population lies.
13.2.2. Using Pearson's $\chi^2$ test at the significance level α, test the hypothesis that the random variable X (the amount of profit) is distributed according to a normal law.
Task 13.3.
13.3.1. Determine the coefficients of the sample regression equation.
13.3.2. Establish the presence and nature of the correlation between the cost of manufactured products (X) and the amount of profit per enterprise (Y). Construct a scatterplot and regression line.
13.3.3. Calculate the linear correlation coefficient. Using Student's t-test, test the significance of the correlation coefficient. Draw a conclusion about the close relationship between factors X and Y using the Chaddock scale.
Guidelines. Task 13.3 is performed using this service.

Task. The following data represents the time spent by clients on concluding contracts. Construct an interval variation series of the presented data, a histogram, find an unbiased estimate of the mathematical expectation, a biased and unbiased estimate of the variance.

Example. According to Table 2:
1) Construct distribution series for 40 commercial banks of the Russian Federation:
A) in terms of profit;
B) by the amount of credit investments.
2) Using the obtained distribution series, determine:
A) average profit per commercial bank;
B) credit investments on average per commercial bank;
C) modal and median value of profit; quartiles, deciles;
D) modal and median value of credit investments.
3) Using the distribution series obtained in step 1, calculate:
a) range of variation;
b) average linear deviation;
c) standard deviation;
d) coefficient of variation.
Complete the necessary calculations in tabular form. Analyze the results. Draw conclusions.
Plot graphs of the resulting distribution series. Determine the mode and median graphically.

Solution:
To build a grouping with equal intervals, we will use the service Grouping statistical data.

Figure 1 – Entering parameters

Description of parameters
Number of rows: the number of input values. If the sample is small, enter its size. If the sample is large enough, click the Insert from Excel button.
Number of groups: 0 means that the number of groups will be determined by the Sturges formula.
If a specific number of groups is required, enter it (for example, 5).
Type of series: Discrete series.
Significance level: for example 0.954 . This parameter is set to determine the confidence interval of the mean.
Sample: For example, 10% mechanical sampling was carried out. We indicate the number 10. For our data we indicate 100.

For clarity, various statistical distribution graphs are constructed, in particular, a polygon and a histogram.

Definition. A frequency polygon is a broken line whose segments connect the points (x 1, n 1), (x 2, n 2), ..., (x k, n k).

To construct a frequency polygon, the variants x i are plotted on the abscissa axis, and the corresponding frequencies n i are plotted on the ordinate axis. The points (x i, n i) are connected by straight segments and a frequency polygon is obtained.

Definition. A polygon of relative frequencies is a broken line whose segments connect the points (x 1, w 1), (x 2, w 2), ..., (x k, w k).

To construct a polygon of relative frequencies, the variants x i are plotted on the abscissa axis, and the corresponding relative frequencies w i are plotted on the ordinate axis. The points (x i, w i) are connected by straight segments and a polygon of relative frequencies is obtained.

The figure shows a polygon of relative frequencies of the following distribution:

Fig. 6. Relative frequency polygon.

In the case of a continuous characteristic, it is advisable to construct a histogram, for which the interval in which all observed values ​​of the characteristic are contained is divided into several partial intervals of length h and for each partial interval n i is found - the sum of the frequencies of the variants falling in the i-th interval.

Definition. A frequency histogram is a stepped figure consisting of rectangles whose bases are partial intervals of length h and whose heights are equal to the ratio $n_i/h$ (the frequency density).

Fig. 7. Frequency histogram.

To construct a frequency histogram, the partial intervals are laid out on the abscissa axis, and segments parallel to the abscissa axis are drawn above them at a distance of $n_i/h$.

The area of the i-th partial rectangle is $h \cdot n_i/h = n_i$, the sum of the frequencies of the variants in the i-th interval; therefore, the area of the frequency histogram equals the sum of all frequencies, that is, the sample size n.

Figure 2 shows the frequency histogram of a distribution of sample size n = 100 given in Table 1.

Table 1. Columns: partial interval of length h = 5; frequency density.

Definition. A relative frequency histogram is a stepped figure consisting of rectangles whose bases are partial intervals of length h and whose heights are equal to the ratio $w_i/h$ (the relative frequency density).

To construct a histogram of relative frequencies, the partial intervals are laid out on the abscissa axis, and segments parallel to the abscissa axis are drawn above them at a distance of $w_i/h$. The area of the i-th partial rectangle is $h \cdot w_i/h = w_i$, the relative frequency of the variants falling into the i-th interval. Consequently, the area of the histogram of relative frequencies equals the sum of all relative frequencies, that is, one.

As a result of sampling, the following frequency distribution table was obtained.

Construct the frequency polygon and the relative frequency polygon of the distribution.

First, let's build the frequency polygon.

Fig. 8. Frequency polygon.

To construct a polygon of relative frequencies, we will find relative frequencies by dividing the frequencies by the sample size n.

n = 3 + 10 + 7 = 20.

We get $w_1 = 3/20 = 0.15$, $w_2 = 10/20 = 0.5$, $w_3 = 7/20 = 0.35$.

Let's construct a polygon of relative frequencies.

Fig. 9. Relative frequency polygon.

2. Construct histograms of frequencies and relative frequency distributions.

Let's find the frequency density:

Columns: partial interval of length h = 3; sum of the frequencies of the variants in the partial interval; frequency density.
