
General population and sample study. Statistical significance

A sample is a set of data drawn from a population by a defined procedure for exploratory analysis. Representativeness is the property of a part to reproduce the picture of the whole. In other words, it is the possibility of extending conclusions about a part to the whole that contains it.

Representativeness of a sample means that the sample fully and reliably reflects the characteristics of the population of which it is a part. It can also be defined as the property of a sample to represent as fully as possible those characteristics of the population that matter for the purpose of the study.

Let us assume that the general population is all the students of a school (900 people in 30 classes of 30 students each). The object of the study is the attitude of schoolchildren towards smoking. A sample of 90 students taken from only a few classes (for example, only the senior classes) will represent the entire population much worse than a sample of the same size that includes 3 students from each class. The main reason is the unequal age distribution. Thus, in the first case the representativeness of the sample will be low, and in the second case it will be high.
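The school example can be sketched in a few lines of Python. This is only an illustration: the grade structure (3 classes per grade, grades 1 to 10) and the use of the mean grade as a stand-in for age are assumptions, not part of the original example.

```python
import random
from statistics import mean

rng = random.Random(0)

# Hypothetical school: 30 classes of 30 students, 3 classes per grade (grades 1-10).
# The grade structure is an assumption made for illustration only.
students = [
    {"class_id": c, "grade": c // 3 + 1, "student": s}
    for c in range(30)
    for s in range(30)
]
population_mean_grade = mean(st["grade"] for st in students)

# Sample A: 90 students taken only from the three senior classes.
seniors = [st for st in students if st["grade"] == 10]
sample_a = rng.sample(seniors, 90)

# Sample B: 3 students drawn at random from every class.
sample_b = []
for c in range(30):
    in_class = [st for st in students if st["class_id"] == c]
    sample_b.extend(rng.sample(in_class, 3))

mean_a = mean(st["grade"] for st in sample_a)
mean_b = mean(st["grade"] for st in sample_b)
print(population_mean_grade, mean_a, mean_b)
```

Both samples contain 90 students, but only the second one reproduces the age (grade) structure of the whole school.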

In sociology, a sample is described as either representative or non-representative.

An example of an unrepresentative sample is a classic case that occurred in 1936 in the United States during the presidential election.

The Literary Digest, which had been very successful in predicting the results of previous elections, was wrong in its forecast this time, although it sent several million test ballots to its subscribers and to respondents selected from phone books and car registration lists. Of the roughly one quarter of ballots that were returned completed, 57% favored the Republican candidate, Alf Landon, and 41% favored the incumbent President, the Democrat Franklin Roosevelt.

In fact, Roosevelt won the election with just over 60% of the vote. The Literary Digest's mistake was as follows. The editors wanted to increase the representativeness of the sample. Since they knew that most of their subscribers identified as Republicans, they decided to expand the sample to include respondents selected from phone books and car registration lists. But they did not take the realities of the time into account and actually selected even more Republican supporters, because at that time mainly the middle and upper classes could afford cars and telephones, and these were mostly Republicans, not Democrats.

There are different types of sampling: simple random, serial, typical, mechanical and combined.

Simple random sampling consists of selecting units from the entire population at random, without any system.

Mechanical sampling is used when the general population is ordered in some way, for example, when there is a fixed sequence of units (staff numbers of workers, electoral lists, telephone numbers of respondents, numbers of apartments and houses, etc.).

Typical selection is used when the entire population can be divided into groups by type. When working with the population, these can be, for example, educational, age, social groups; when studying enterprises - an industry or a separate organization, etc.

Serial selection is convenient when units are combined into small series or groups. Such a series can be batches of finished products, school classes, and other groups.

Combined sampling involves the use of all previous types of sampling in one or another combination.

Sample

Sample, or sample population, is a set of cases (subjects, objects, events, samples) selected from the general population by a defined procedure to take part in the study.

Sample characteristics:

  • Qualitative characteristics of the sample - who exactly we choose and what sampling methods we use for this.
  • Quantitative characteristics of the sample - how many cases we select, in other words, sample size.

Necessity of sampling

  • The object of study is very extensive. For example, consumers of a global company’s products are represented by a huge number of geographically dispersed markets.
  • There is a need to collect primary information.

Sample size

Sample size is the number of cases included in the sample population. For statistical reasons, it is recommended that the number of cases be at least 30-35.

Dependent and independent samples

When comparing two (or more) samples, an important parameter is their dependence. If a one-to-one pairing can be established for each case in the two samples (that is, one case from sample X corresponds to one and only one case from sample Y and vice versa), and the basis of this pairing is relevant to the trait being measured, such samples are called dependent. Examples of dependent samples:

  • pairs of twins,
  • two measurements of any trait before and after experimental exposure,
  • husbands and wives
  • and so on.

If there is no such relationship between the samples, they are considered independent, for example:

  • men and women,
  • psychologists and mathematicians.

Accordingly, dependent samples always have the same size, while the size of independent samples may differ.

Samples are compared using various statistical tests:

  • Student's t-test
  • Wilcoxon T-test
  • Mann-Whitney U test
  • Sign test
  • and others.

Representativeness

The sample may be considered representative or non-representative.

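Example of a non-representative sample

The classic example is the 1936 Literary Digest poll described above.

Types of plan for constructing groups from samples

There are several main types of group building plans:

  • A study with experimental and control groups, which are placed in different conditions.
  • A study with experimental and control groups using a pairwise selection strategy.
  • A study using only one group, the experimental group.
  • A study using a mixed (factorial) design, in which all groups are placed in different conditions.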
Sampling types

Samples are divided into two types:

  • probabilistic
  • non-probabilistic

Probability samples

  1. Simple probability sampling:
    • Simple sampling with replacement. This method assumes that each respondent is equally likely to be included in the sample. Cards with respondent numbers are prepared from the list of the general population. They are placed in a deck and shuffled; a card is drawn at random, its number is written down, and the card is returned to the deck. The procedure is repeated as many times as the required sample size. Disadvantage: the same unit can be selected more than once.

The procedure for constructing a simple random sample includes the following steps:

1. Obtain a complete list of the members of the population and number it. Such a list is called a sampling frame.

2. Determine the expected sample size, that is, the expected number of respondents.

3. Draw as many numbers from a random number table as there are sample units required. If the sample should contain 100 people, 100 random numbers are taken from the table. These random numbers can also be generated by a computer program.

4. Select from the base list those observations whose numbers correspond to the random numbers drawn.
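The four steps above can be sketched with Python's standard `random` module. The function name `simple_random_sample`, the frame of 900 numbered units and the seed are illustrative assumptions; the `replace` flag switches between drawing with replacement (cards returned to the deck) and without replacement.

```python
import random

def simple_random_sample(frame, n, replace=False, seed=None):
    """Draw n units from the sampling frame, with or without replacement."""
    rng = random.Random(seed)
    if replace:
        # With replacement: the same unit may be drawn more than once.
        return [rng.choice(frame) for _ in range(n)]
    # Without replacement: every unit appears at most once.
    return rng.sample(frame, n)

# Step 1: a numbered list of the population (the sampling frame).
frame = list(range(1, 901))
# Step 2: the expected sample size.
n = 100
# Steps 3-4: generate random numbers and select the matching units.
sample = simple_random_sample(frame, n, seed=42)
sample_with_repeats = simple_random_sample(frame, n, replace=True, seed=42)
print(sorted(sample)[:10])
```

Fixing the seed makes the draw reproducible, which is convenient for documenting how a sample was obtained.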

Simple random sampling has obvious advantages. The method is easy to understand. The results of the study can be generalized to the population being studied. Most approaches to statistical inference assume that the data were collected using a simple random sample. However, the method has at least four significant limitations:

1. It is often difficult to create a sampling frame that would allow simple random sampling.

2. Simple random sampling may produce a sample that is very large or is spread over a large geographic area, which significantly increases the time and cost of data collection.

3. The results of simple random sampling are often characterized by lower precision and a larger standard error than the results of other probability methods.

4. Simple random sampling may produce a non-representative sample. Although samples obtained by simple random sampling represent the population adequately on average, some of them misrepresent the population being studied very badly. This is especially likely when the sample size is small.

    • Simple sampling without replacement. The sampling procedure is the same, except that the cards with respondent numbers are not returned to the deck.
  2. Systematic probability sampling. A simplified version of simple probability sampling. Respondents are selected from the list of the general population at a fixed interval (K), starting from a randomly chosen point. The most reliable result is achieved with a homogeneous population; otherwise the step size may coincide with some internal cyclic pattern of the list, which biases the sample. Disadvantages: the same as for simple probability sampling.
  3. Serial (cluster) sampling. The selection units are statistical series (a family, a school, a team, etc.). The selected series are examined in full. The series themselves can be selected by random or systematic sampling. Disadvantage: the sample may be more homogeneous than the general population.
  4. Regional (zoned) sampling. If the population is heterogeneous, it is recommended to divide it into homogeneous parts before applying probability sampling with any selection technique; such a sample is called a regional (zoned) sample. Zoning groups can be natural formations (for example, city districts) or can be based on any characteristic relevant to the study. The characteristic on which the division is based is called the stratification (zoning) characteristic.
  5. "Convenience" sample. The "convenience" sampling procedure consists of establishing contacts with "convenient" sampling units: a group of students, a sports team, friends and neighbors. Because the units are not chosen at random, this is strictly a non-probability method. It is quite reasonable when the goal is to learn how people react to a new concept, and it is often used to pretest questionnaires.
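Systematic selection at an interval K can be sketched as follows. The sketch assumes that K is the frame size divided by the sample size and that the starting point is chosen at random within the first interval; the function name and the data are hypothetical. The second call shows the risk mentioned above: when the list has a cycle whose length equals K, every selected unit is of the same kind.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Select n units at a fixed interval K from a random starting point."""
    if not 0 < n <= len(frame):
        raise ValueError("sample size must be between 1 and the frame size")
    rng = random.Random(seed)
    k = len(frame) // n        # selection interval K
    start = rng.randrange(k)   # random start within the first interval
    return [frame[start + i * k] for i in range(n)]

# An ordered list of 900 numbered units, sample of 90, so K = 10.
frame = list(range(1, 901))
sample = systematic_sample(frame, 90, seed=7)

# A list with a cyclic pattern of length 10 (for example, the position of an
# apartment on its floor). The interval coincides with the cycle.
cyclic_frame = [i % 10 for i in range(900)]
cyclic_sample = systematic_sample(cyclic_frame, 90, seed=7)
print(set(cyclic_sample))  # a single value: the sample sees one position only
```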

Non-probability samples

Selection in such a sample is carried out not according to the principles of randomness, but according to subjective criteria - availability, typicality, equal representation, etc.

  1. Quota sampling - the sample is constructed as a model that reproduces the structure of the general population in the form of quotas (proportions) of the characteristics being studied. The number of sample elements with different combinations of studied characteristics is determined so that it corresponds to their share (proportion) in the general population. So, for example, if our general population consists of 5,000 people, of which 2,000 are women and 3,000 are men, then in the quota sample we will have 20 women and 30 men, or 200 women and 300 men. Quota samples are most often based on demographic criteria: gender, age, region, income, education, and others. Disadvantages: usually such samples are not representative, because it is impossible to take into account several social parameters at once. Pros: readily available material.
  2. Snowball method. The sample is constructed as follows. Each respondent, starting with the first, is asked for contact information of his friends, colleagues, acquaintances who would fit the selection conditions and could take part in the study. Thus, with the exception of the first step, the sample is formed with the participation of the research objects themselves. The method is often used when it is necessary to find and interview hard-to-reach groups of respondents (for example, respondents with a high income, respondents belonging to the same professional group, respondents with any similar hobbies/interests, etc.)
  3. Spontaneous sampling – sampling of the so-called “first person you come across”. Often used in television and radio polls. The size and composition of spontaneous samples is not known in advance, and is determined only by one parameter - the activity of respondents. Disadvantages: it is impossible to establish which population the respondents represent, and as a result, it is impossible to determine representativeness.
  4. Route survey. Often used when the unit of study is the family. All streets are numbered on the map of the locality in which the survey will be carried out. Large numbers are then selected using a table (generator) of random numbers. Each large number is treated as consisting of 3 components: the street number (the first 2-3 digits), the house number and the apartment number. For example, in the number 14832, 14 is the street number on the map, 8 is the house number, and 32 is the apartment number.
  5. Regional sampling with selection of typical objects. If, after zoning, a typical object is selected from each group, i.e. an object that is close to the average in terms of most of the characteristics studied in the study, such a sample is called regionalized with the selection of typical objects.

  6. Modal sampling.
  7. Expert sampling.
  8. Heterogeneous sample.
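The quota example from item 1 (2,000 women and 3,000 men in a population of 5,000) can be checked with a short sketch. The function name `proportional_quotas` is hypothetical, and simple rounding is an assumption: with other group sizes the rounded quotas may not add up exactly to the sample size.

```python
def proportional_quotas(group_sizes, sample_size):
    """Split the sample across groups in proportion to their population share."""
    total = sum(group_sizes.values())
    return {
        group: round(sample_size * size / total)
        for group, size in group_sizes.items()
    }

population = {"women": 2000, "men": 3000}
quotas_50 = proportional_quotas(population, 50)
quotas_500 = proportional_quotas(population, 500)
print(quotas_50, quotas_500)
```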

Group Building Strategies

Groups are selected for participation in a psychological experiment using various strategies, which are needed to preserve internal and external validity to the greatest possible extent.

Randomization

Randomization, or random selection, is used to create simple random samples. The use of such a sample is based on the assumption that each member of the population is equally likely to be included in the sample. For example, to make a random sample of 100 university students, you can put pieces of paper with the names of all university students in a hat, and then take 100 pieces of paper out of it - this will be a random selection (Goodwin J., p. 147).

Pairwise selection

Pairwise selection is a strategy for constructing sampling groups in which the groups are made up of subjects who are equivalent in terms of secondary parameters that are significant for the experiment. This strategy is effective for experiments using experimental and control groups, with the best option being the involvement of twin pairs (mono- and dizygotic), as it allows you to create...

Stratometric sampling

Stratometric sampling is randomization with the allocation of strata (or clusters). With this method of sampling, the general population is divided into groups (strata) with certain characteristics (gender, age, political preferences, education, income level, etc.), and subjects with the corresponding characteristics are selected.

Approximate Modeling

Approximate modeling is the drawing of limited samples and the generalization of conclusions from such a sample to a wider population. For example, when 2nd year university students take part in a study, its data are extended to "people aged 17 to 21". The admissibility of such generalizations is extremely limited.

Approximate modeling is the formation of a model that, for a clearly defined class of systems (processes), describes its behavior (or desired phenomena) with acceptable accuracy.


Wikimedia Foundation. 2010.


General population

The total set of objects of observation (people, households, enterprises, settlements, etc.) that possess a certain set of characteristics (gender, age, income, number, turnover, etc.) and are limited in space and time. Examples of populations:

  • All residents of Moscow (10.6 million people according to the 2002 census)
  • Male Muscovites (4.9 million people according to the 2002 census)
  • Legal entities of Russia (2.2 million at the beginning of 2005)
  • Retail outlets selling food products (20 thousand at the beginning of 2008), etc.

Sample (Sample Population)

A portion of a population selected for study in order to draw conclusions about the entire population. In order for the conclusion obtained by studying the sample to be extended to the entire population, the sample must have the property of representativeness.

Representativeness of the sample

The property of a sample to correctly reflect the population. The same sample can be representative of one population and unrepresentative of another.
Example:

  • A sample consisting entirely of Muscovites who own a car does not represent the entire population of Moscow.
  • A sample of Russian enterprises with up to 100 employees does not represent all enterprises in Russia.
  • A sample of Muscovites shopping at the market does not represent the purchasing behavior of all Muscovites.

At the same time, these samples (subject to other conditions) can perfectly represent Muscovites who own cars, small and medium-sized Russian enterprises, and buyers who make purchases in markets, respectively.
It is important to understand that sample representativeness and sampling error are different things. Representativeness, unlike sampling error, does not depend on the sample size.
Example:
No matter how many car-owning Muscovites we survey, this sample will never represent all Muscovites.
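A small simulation illustrates the point. All numbers are invented: the sketch assumes a population of 100,000 people in which 40% own a car, and some yes/no trait that is held by 80% of car owners but only 40% of everyone else. Sampling car owners alone gives an estimate that settles near 0.8 as the sample grows, not near the true population share of 0.56.

```python
import random

rng = random.Random(1)

# Hypothetical population (all numbers are invented for illustration):
# 40,000 car owners, 80% of whom have the trait (coded 1);
# 60,000 non-owners, 40% of whom have it.
owners = [1] * 32_000 + [0] * 8_000
non_owners = [1] * 24_000 + [0] * 36_000
true_share = (sum(owners) + sum(non_owners)) / (len(owners) + len(non_owners))

estimates = {}
for n in (100, 1_000, 10_000):
    sample = rng.sample(owners, n)   # only car owners can be selected
    estimates[n] = sum(sample) / n

print(true_share, estimates)
```

A larger sample only makes the estimate for car owners more precise; it does not bring it closer to the figure for the whole population.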

Sampling error (confidence interval)

The deviation of the results obtained using sample observation from the true data of the general population.
There are two types of sampling error - statistical and systematic. Statistical error depends on sample size. The larger the sample size, the lower it is.
Example:
For a simple random sample of 400 units, the maximum statistical error (at a 95% confidence level) is 5%; for a sample of 600 units it is 4%; for a sample of 1100 units it is 3%. Usually, when people talk about sampling error, they mean statistical error.
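These figures can be reproduced with the usual formula for the maximum error of a proportion in a simple random sample. The sketch assumes the worst case p = 0.5 and the normal coefficient z = 1.96 for 95% confidence; neither the formula nor the function name appears in the text above.

```python
import math

def max_margin_of_error(n, z=1.96):
    """Maximum sampling error of a proportion (worst case p = 0.5)."""
    return z * math.sqrt(0.25 / n)

for n in (400, 600, 1100):
    # Printed as a percentage, rounded to one decimal place.
    print(n, round(100 * max_margin_of_error(n), 1))
```

Because the error shrinks with the square root of n, halving it requires roughly four times as many respondents.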
Systematic error depends on various factors that constantly influence the study and bias the results of the study in a certain direction.
Example:

  • Any probability sample will underestimate the share of high-income people with an active lifestyle, because such people are much harder to find at any specific place (for example, at home).
  • Respondents refuse to answer questions (the share of refusals in Moscow ranges from 50% to 80% across surveys).

In some cases, when the true distributions are known, the systematic error can be compensated for by introducing quotas or reweighting the data, but in most real studies even estimating it can be quite problematic.
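Reweighting can be sketched as follows. The example assumes that the true population shares of two groups are known (20% high income, 80% other) and that the high-income group is under-represented in the sample; all figures and names are hypothetical.

```python
def poststratification_weights(sample_counts, population_shares):
    """Weight of a group = its population share / its share in the sample."""
    n = sum(sample_counts.values())
    return {g: population_shares[g] / (sample_counts[g] / n) for g in sample_counts}

# Hypothetical data: high-income respondents are 10% of the sample
# but 20% of the population.
sample_counts = {"high_income": 10, "other": 90}
population_shares = {"high_income": 0.2, "other": 0.8}
weights = poststratification_weights(sample_counts, population_shares)

# Each respondent is a (group, answer) pair, where answer is 1 for "yes".
respondents = (
    [("high_income", 1)] * 9 + [("high_income", 0)] * 1
    + [("other", 1)] * 45 + [("other", 0)] * 45
)

unweighted = sum(a for _, a in respondents) / len(respondents)
weighted = (
    sum(weights[g] * a for g, a in respondents)
    / sum(weights[g] for g, _ in respondents)
)
print(unweighted, weighted)
```

The weighted estimate restores the known group proportions, but it corrects only for the characteristic used to build the weights.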

Sample types

Samples are divided into two types:

  • probabilistic
  • non-probabilistic

1. Probability samples
1.1 Random sampling (simple random sampling)
Such a sample assumes that the general population is homogeneous, that all elements are equally accessible, and that a complete list of all elements is available. Elements are usually selected using a table of random numbers.
1.2 Mechanical (systematic) sampling
A type of random sample drawn from a list ordered by some characteristic (alphabetical order, phone number, date of birth, etc.). The first element is selected at random, and then every k-th element is selected, where k is the step. With a sample size of n, the population size in this case is N = n*k.
1.3 Stratified (zoned) sampling
It is used when the population is heterogeneous. The general population is divided into groups (strata). In each stratum, selection is carried out randomly or mechanically.
1.4 Serial (cluster or nested) sampling
In serial sampling, the units of selection are not the objects themselves but groups (clusters or nests). Groups are selected randomly. All objects within the selected groups are examined.
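The difference between stratified and serial (cluster) selection can be sketched side by side. In the stratified case a random draw is made inside every group; in the serial case whole groups are drawn and examined in full. The function names, the equal sampling fraction and the data are illustrative assumptions.

```python
import random

def stratified_sample(strata, fraction, seed=None):
    """Draw the same fraction of units at random from every stratum."""
    rng = random.Random(seed)
    return {
        name: rng.sample(units, round(len(units) * fraction))
        for name, units in strata.items()
    }

def cluster_sample(clusters, n_clusters, seed=None):
    """Select whole clusters at random and take every unit inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical strata: age groups of different sizes.
strata = {
    "18-30": list(range(0, 300)),
    "31-45": list(range(300, 800)),
    "46-60": list(range(800, 1000)),
}
by_stratum = stratified_sample(strata, 0.1, seed=3)

# Hypothetical clusters: 30 school classes of 30 students each.
clusters = {f"class_{i:02d}": [(i, s) for s in range(30)] for i in range(30)}
by_cluster = cluster_sample(clusters, 3, seed=3)

print({k: len(v) for k, v in by_stratum.items()})
print({k: len(v) for k, v in by_cluster.items()})
```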

2. Non-probability samples
Selection in such a sample is carried out not according to the principles of randomness, but according to subjective criteria - availability, typicality, equal representation, etc.
2.1. Quota sampling
First, a number of groups of objects are identified (for example, men aged 20-30, 31-45 and 46-60; persons with income up to 30 thousand rubles, from 30 to 60 thousand rubles, and over 60 thousand rubles). For each group, the number of objects to be examined is specified. This number is most often set either in proportion to the previously known share of the group in the general population, or equal for every group. Within groups, objects are selected randomly. Quota sampling is used quite often.
2.2. Snowball method
The sample is constructed as follows. Each respondent, starting with the first, is asked for contact information of his friends, colleagues, acquaintances who would fit the selection conditions and could take part in the study. Thus, with the exception of the first step, the sample is formed with the participation of the research objects themselves. The method is often used when it is necessary to find and interview hard-to-reach groups of respondents (for example, respondents with a high income, respondents belonging to the same professional group, respondents with any similar hobbies/interests, etc.)
2.3 Spontaneous sampling
The most accessible respondents are surveyed. Typical examples of spontaneous samples are questionnaires printed in newspapers and magazines, questionnaires handed to respondents for self-completion, and most online surveys. The size and composition of spontaneous samples are not known in advance and are determined by a single parameter: the activity of respondents.
2.4 Sample of typical cases
Units of the general population that have an average (typical) value of the characteristic are selected. This raises the problem of selecting a feature and determining its typical value.


Dependent and independent samples

When comparing two (or more) samples, an important parameter is their dependence. If a one-to-one pairing can be established for each case in the two samples (that is, one case from sample X corresponds to one and only one case from sample Y and vice versa), and the basis of this pairing is relevant to the trait being measured, such samples are called dependent. Examples of dependent samples:

  1. pairs of twins,
  2. two measurements of any trait before and after experimental exposure,
  3. husbands and wives
  4. and so on.

If there is no such relationship between the samples, then these samples are considered independent, for example:

  1. men and women,
  2. psychologists and mathematicians.

Accordingly, dependent samples always have the same size, while the size of independent samples may differ.

Samples are compared using various statistical tests:

  • Student's t-test
  • Wilcoxon T-test
  • Mann-Whitney U test
  • Sign test
  • and others.
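Whether samples are dependent or independent determines which form of a test is used. The sketch below computes the t statistic in two ways with the standard library: from within-pair differences for dependent samples, and in Welch's form for independent samples (the choice of Welch's variant is an assumption here). The "before" and "after" scores are invented.

```python
import math
from statistics import mean, stdev

def paired_t(before, after):
    """t statistic for dependent samples, based on within-pair differences."""
    if len(before) != len(after):
        raise ValueError("dependent samples must have the same size")
    diffs = [a - b for a, b in zip(after, before)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

def welch_t(x, y):
    """t statistic for independent samples (Welch's form; sizes may differ)."""
    se = math.sqrt(stdev(x) ** 2 / len(x) + stdev(y) ** 2 / len(y))
    return (mean(x) - mean(y)) / se

# Hypothetical scores of the same five subjects before and after exposure.
before = [10, 12, 9, 14, 11]
after = [12, 13, 11, 15, 13]

t_paired = paired_t(before, after)
t_independent = welch_t(after, before)
print(round(t_paired, 2), round(t_independent, 2))
```

On the same numbers the paired statistic is much larger, because pairing removes the variation between subjects. Treating dependent samples as independent would hide a consistent shift.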

Representativeness

The sample may be considered representative or non-representative.

Example of a non-representative sample

One of the best-known historical examples of unrepresentative sampling comes from the 1936 presidential election in the United States. The Literary Digest, which had successfully predicted the outcomes of several previous elections, got its forecast wrong after sending ten million test ballots to its subscribers, to people selected from telephone books throughout the country, and to people on automobile registration lists. In the 25% of ballots that were returned (almost 2.5 million), the votes were distributed as follows:

  • 57% preferred the Republican candidate, Alf Landon;
  • 40% chose the incumbent Democratic President, Franklin Roosevelt.

In the actual election, as is known, Roosevelt won with more than 60% of the votes. The Literary Digest's mistake was this: wanting to increase the representativeness of the sample, since they knew that most of their subscribers considered themselves Republicans, the editors expanded the sample to include people selected from telephone books and registration lists. However, they did not take the realities of their time into account and in fact recruited even more Republicans: during the Great Depression it was mainly the middle and upper classes who could afford to own phones and cars, and most of them were Republicans, not Democrats.

Types of plan for constructing groups from samples

There are several main types of group building plans:

  • A study with experimental and control groups, which are placed in different conditions.
  • Study with experimental and control groups using a pairwise selection strategy
  • A study using only one group - an experimental group.
  • A study using a mixed (factorial) design - all groups are placed in different conditions.

Group Building Strategies

The selection of groups for participation in a psychological experiment is carried out using various strategies to ensure the greatest possible respect for internal and external validity.

  • Randomization (random selection)
  • Attracting real groups


In statistics, there are two main research methods: continuous (complete) and sample-based. A sample study must meet two requirements: the sample population must be representative, and the number of observation units must be sufficient. When observation units are selected, offset errors are possible, i.e. events whose occurrence cannot be accurately predicted. These errors are objective and natural. When the accuracy of a sample study is assessed, the size of the error that can arise during sampling is estimated. This is the random representativeness error (m): the actual difference between the average or relative values obtained in a sample study and the corresponding values that would be obtained in a study of the general population.

Assessing the reliability of the research results involves determining:

1. errors of representativeness

2. confidence limits of average (or relative) values ​​in the population

3. reliability of the difference between average (or relative) values ​​(according to the t criterion)

Calculation of the representativeness error ($m_M$) of the arithmetic mean (M):

$$ m_M = \frac{\sigma}{\sqrt{n}} $$

where σ is the standard deviation and n is the sample size (n > 30).

Calculation of the representativeness error ($m_P$) of a relative value (P):

$$ m_P = \sqrt{\frac{P \cdot Q}{n}} $$

where P is the corresponding relative value (calculated, for example, in %); Q = 100 - P (in %) is the complement of P; n is the sample size (n > 30).

In clinical and experimental work it is quite often necessary to use a small sample, in which the number of observations is less than or equal to 30. With a small sample, the number of observations is reduced by one when calculating the representativeness errors of both average and relative values, i.e.

$$ m_M = \frac{\sigma}{\sqrt{n - 1}}; \quad m_P = \sqrt{\frac{P \cdot Q}{n - 1}}. $$

The magnitude of the representativeness error depends on the sample size: the greater the number of observations, the smaller the error. To assess the reliability of a sample indicator, the following rule is adopted: the indicator (or average value) must be at least 3 times greater than its error, in which case it is considered reliable.
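The error calculations and the reliability rule can be sketched in Python. The sketch assumes the usual forms of the formulas, m = σ/√n for a mean and m = √(P·Q/n) for a relative value in percent, with n replaced by n - 1 when the sample is small (n ≤ 30). The function names and numbers are illustrative.

```python
import math

def error_of_mean(sigma, n):
    """Representativeness error of a mean; uses n - 1 for small samples."""
    denominator = n if n > 30 else n - 1
    return sigma / math.sqrt(denominator)

def error_of_proportion(p, n):
    """Representativeness error of a relative value P given in percent."""
    q = 100 - p                      # complement of P
    denominator = n if n > 30 else n - 1
    return math.sqrt(p * q / denominator)

def is_reliable(value, error):
    """Rule of thumb from the text: the value is at least 3 times its error."""
    return value >= 3 * error

m_mean_large = error_of_mean(sigma=10, n=100)   # large sample
m_mean_small = error_of_mean(sigma=6, n=10)     # small sample, n - 1 = 9
m_prop = error_of_proportion(p=20, n=100)
print(m_mean_large, m_mean_small, m_prop, is_reliable(20, m_prop))
```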

Knowing the magnitude of the error is not enough to be confident in the results of a sample study, since the error in a specific sample study may be significantly greater (or smaller) than the average representativeness error. To express the accuracy with which a researcher wants to obtain a result, statistics uses the concept of the probability of an error-free forecast, which characterizes the reliability of the results of sample biomedical statistical studies. Typically, in biomedical statistical studies the probability of an error-free forecast is set at 95% or 99%. In the most critical cases, when particularly important theoretical or practical conclusions must be drawn, a probability of 99.7% is used.

Each probability of an error-free forecast corresponds to a certain value of the marginal error of random sampling (Δ, delta), which is determined by the formula:

$$ \Delta = t \cdot m $$

where t is the confidence coefficient, which for a large sample equals 2 at a 95% probability of an error-free forecast, 2.6 at 99%, and 3 at 99.7%; for a small sample it is determined from a special table of Student's t values.

Using the marginal sampling error (Δ), one can determine the confidence limits within which, with a given probability of an error-free forecast, the actual value of the statistical quantity (average or relative) characterizing the entire population lies.

To determine confidence limits, the following formulas are used:

1) for average values:

$$ M_{gen} = M_{sample} \pm t \cdot m_M $$

where $M_{gen}$ are the confidence limits of the average value in the general population; $M_{sample}$ is the average value obtained in a study of the sample population; t is the confidence coefficient, whose value is determined by the probability of an error-free forecast with which the researcher wants to obtain the result; $m_M$ is the representativeness error of the average value.

2) for relative values:

$$ P_{gen} = P_{sample} \pm t \cdot m_P $$

where $P_{gen}$ are the confidence limits of the relative value in the general population; $P_{sample}$ is the relative value obtained in a study of the sample population; t is the confidence coefficient; $m_P$ is the representativeness error of the relative value.
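A minimal sketch of the confidence-limit calculation for large samples follows. It assumes the usual rounded normal coefficients (about 2 for 95%, 2.6 for 99%, 3 for 99.7%) and the form value ± t·m; the table name, function name and input numbers are hypothetical.

```python
# Rounded confidence coefficients for large samples (an assumption based on
# the normal distribution: 1.96, 2.58 and about 3.0).
T_LARGE_SAMPLE = {0.95: 2.0, 0.99: 2.6, 0.997: 3.0}

def confidence_limits(value, error, probability=0.95):
    """Return (lower, upper) limits: value -/+ t * error."""
    t = T_LARGE_SAMPLE[probability]
    delta = t * error            # marginal error of the sample
    return value - delta, value + delta

# Mean of 120 with a representativeness error of 1.0, at 95%.
mean_limits = confidence_limits(120.0, 1.0)
# Relative value of 20% with an error of 4 percentage points, at 99%.
share_limits = confidence_limits(20.0, 4.0, probability=0.99)
print(mean_limits, share_limits)
```

A higher probability of an error-free forecast widens the interval, which is the price of greater confidence.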

Confidence limits show the limits within which the sample value can fluctuate due to random causes.

With a small number of observations (n ≤ 30), the value of the coefficient t used to calculate the confidence limits is found in a special Student's table. The values of t are located at the intersection of the column for the chosen probability of an error-free forecast and the row for the available number of degrees of freedom (n′), which is equal to n - 1.
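The table lookup can be sketched as follows. Only a few rows of the two-sided Student's table are included, so the function works only for the listed degrees of freedom; the function name and the input numbers are hypothetical.

```python
# A small excerpt of the two-sided Student's t table:
# probability of an error-free forecast -> degrees of freedom -> t.
T_TABLE = {
    0.95: {4: 2.776, 9: 2.262, 14: 2.145, 19: 2.093, 29: 2.045},
    0.99: {4: 4.604, 9: 3.250, 14: 2.977, 19: 2.861, 29: 2.756},
}

def small_sample_limits(value, error, n, probability=0.95):
    """Confidence limits for a small sample, with t taken from the table."""
    degrees_of_freedom = n - 1
    # Raises KeyError if this row is not in the excerpt above.
    t = T_TABLE[probability][degrees_of_freedom]
    return value - t * error, value + t * error

# Ten observations give 9 degrees of freedom, so t = 2.262 at 95%.
limits = small_sample_limits(value=50.0, error=2.0, n=10)
print(limits)
```

For the same probability, t is larger than in the large-sample case, so small samples give wider confidence limits.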