A Guide to Statistics

In previous sections, you learned about the concepts behind descriptive statistics. Specifically, we covered the measures of central tendency and variability and how to calculate each one. We also walked through the types of variables used in statistics and the kinds of analysis and visualization you can carry out with data. Here, we’ll help you review everything related to descriptive statistics.

 

What are Descriptive Statistics?

The field of statistics is generally divided into two branches: descriptive and inferential statistics. Descriptive statistics is, luckily, exactly what it sounds like: it involves analysing data on a descriptive basis. If this sounds confusing, let’s contrast it with inferential statistics in the table below.

 

Descriptive Statistics | Inferential Statistics
Makes statements about what is within the data | Makes predictions about data points outside the data set by using the information within the data
Conveys information through measures like mean and standard deviation | Conveys information through predictive models
Visualizations generally include bar charts, pie charts, histograms and line graphs | Visualizations generally include line graphs and scatterplots

 

While this general information is by no means exhaustive, it can be a great starting point for understanding the differences between the two branches of statistics. The goal of descriptive statistics is to either summarize the characteristics of a data set or to analyse a data set by utilizing its descriptive properties.

 


Population

The units used in descriptive statistics can be anything. People using descriptive statistics can strive to measure things like:

  • Rainfall
  • Trees in parks
  • Tourists at a beach

Analysis based on descriptive statistics alone is not only vastly diverse; it also accounts for most of the statistics that many people use. The units that people strive to measure, however, need to be clearly defined in order to properly understand any data.

In statistics, the elements people want to study are split into a population and a sample. A population is the actual group of elements that you want to study. A population could be anything and take on any form. In the previous examples, the population would take the following form.

 

Elements | Population
Rainfall | Total rain produced
Trees in a park | All the trees in a park
Tourists at a beach | Total number of tourists at a beach

 

While this may seem simple, and it is, populations are notoriously hard to measure. Surveying the total number of trees in a local city park might be an easy task, but imagine the same task applied to a national forest. Often, there are not enough financial resources or time to measure an entire population. That is why in statistics you’ll often encounter samples.

 

Sample

A sample is a part of a population and is made up of the same kind of elements and units. A sample is drawn from a population in order to make the data collection process cheaper and less time-consuming. Continuing the previous examples, let’s take a look at the differences between a population and a sample.

 

Population | Sample
Total rain produced | Rainfall produced in an hour in one location of a city
All the trees in a park | Number of trees measured in a one-kilometre radius
Total number of tourists at a beach | Number of tourists arriving at the beach at three specific times in a day

 

As you can guess, samples tend to include a fraction of the elements that are included in a population. There are many different methods for drawing a sample, which include:

  • Simple Random Sampling
  • Stratified Sampling
  • Cluster Sampling
  • Quota Sampling

Each sampling method has its advantages and disadvantages. The sampling method that is desired in most cases is simple random sampling, also known as SRS.

The reason is that it involves a completely random selection of elements from a population: every element has the same chance of being chosen, which helps avoid selection bias in the estimation of statistical measures. An SRS can be conducted with or without replacement.
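To make the difference between the two kinds of SRS concrete, here is a minimal sketch using Python's standard `random` module. The population of tree ID numbers and the sample size of 5 are made up for illustration.

```python
import random

# Hypothetical population: ID numbers for 20 trees in a small park.
population = list(range(1, 21))

rng = random.Random(42)  # fixed seed so the draw is repeatable

# Without replacement: each tree can be selected at most once.
srs_without = rng.sample(population, k=5)

# With replacement: the same tree may be selected more than once.
srs_with = rng.choices(population, k=5)

print("Without replacement:", srs_without)
print("With replacement:   ", srs_with)
```

Sampling without replacement is the usual choice when surveying physical elements, since measuring the same tree twice adds no new information.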

Because the true population measure, or the measure we would have calculated had we measured the entire population, is usually unknown, measures calculated from samples are treated as estimates of the corresponding population measures. A measure from a population is called a “parameter” while a measure from a sample is called a “statistic.”
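The sketch below illustrates the distinction with simulated data. The population of 1,000 tree heights is invented, and in practice the parameter would not be available; it is computed here only so it can be compared with the statistic.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the example is repeatable

# Hypothetical population: heights (in metres) of 1,000 trees.
population = [rng.uniform(5.0, 30.0) for _ in range(1000)]

# Parameter: computed from every element of the population.
mu = statistics.mean(population)

# Statistic: computed from a simple random sample of 50 trees.
sample = rng.sample(population, k=50)
x_bar = statistics.mean(sample)

print(f"Population mean (parameter): {mu:.2f}")
print(f"Sample mean (statistic):     {x_bar:.2f}")
```

The two values will generally be close but not identical, which is why the sample mean is treated as an estimate.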

 

Measures of Central Tendency

“Measures of central tendency” is a long name for something simple: measuring the centre. People like to measure the centre point of a data set because it generally indicates what the most “typical” value of the data looks like.

There are three basic measures of central tendency: the mean, median and mode. Some rules of thumb for remembering when each of them is used are:

  • When the data includes extreme values or outliers, the median is better
  • When the data doesn’t include outliers and you want to measure the average, use the mean
  • When you want to know the value or category with the highest frequency, use the mode
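As a rough illustration of these rules of thumb, the following sketch uses Python's built-in `statistics` module on a made-up set of daily tourist counts in which one day is an extreme value.

```python
import statistics

# Hypothetical daily tourist counts at a beach; the last day is an outlier.
counts = [120, 130, 125, 135, 130, 900]

mean_count = statistics.mean(counts)      # pulled upward by the outlier
median_count = statistics.median(counts)  # stays near a typical day
mode_count = statistics.mode(counts)      # the most frequent value

print("Mean:  ", mean_count)
print("Median:", median_count)
print("Mode:  ", mode_count)
```

Here the single outlier drags the mean well above every ordinary day, while the median and mode still describe a typical day.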

Below are the formulas for each measure.

Measure | Sample | Population
Mean | \( \bar{x} = \frac{\Sigma x_{i}}{n} \) | \( \mu = \frac{\Sigma x_{i}}{N} \)
Median | Midpoint of ordered data points, the average of the two midpoint values if it’s an even number of values | Calculated the same as the sample
Mode | The value or category with the highest frequency | Calculated the same as the sample
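The definitions above can be written out by hand in a few lines. This is only a sketch: the function names and the data are illustrative, and the `mode` helper returns a single value even when several values tie for the highest frequency.

```python
from collections import Counter


def mean(values):
    """Sum of the values divided by how many there are."""
    return sum(values) / len(values)


def median(values):
    """Midpoint of the ordered values (average of the two middle ones if even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def mode(values):
    """Value with the highest frequency (first one encountered if tied)."""
    return Counter(values).most_common(1)[0][0]


data = [2, 3, 3, 5, 7, 10]  # hypothetical measurements
print(mean(data), median(data), mode(data))
```

The same functions apply whether `data` holds a sample or a whole population; only the notation changes.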

 

Measures of Variability

Unlike measures of central tendency, measures of variability strive to capture how the data are spread around the centre values. The two most basic measures of variability are variance and standard deviation. Other common measures include:

  • Coefficient of Variation
  • Covariance
  • Standard Error
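Two of these measures can be sketched briefly. The formulas used below are the standard textbook definitions rather than ones given in this guide: the coefficient of variation is the sample standard deviation divided by the sample mean, and the standard error of the mean is the sample standard deviation divided by the square root of the sample size. The data are made up.

```python
import math
import statistics

data = [4, 8, 6, 5, 7]  # hypothetical sample

x_bar = statistics.mean(data)
s = statistics.stdev(data)  # sample standard deviation (divides by n - 1)

# Coefficient of variation: spread relative to the size of the mean.
cv = s / x_bar

# Standard error of the mean: how much the sample mean is expected to vary.
se = s / math.sqrt(len(data))

print(f"Coefficient of variation: {cv:.3f}")
print(f"Standard error:           {se:.3f}")
```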

The spread of a data set describes how close to or how far from the centre the data lie. While variance is used throughout statistics, standard deviation tends to be preferred when describing the spread of a data set because it is expressed in the same units as the data, which makes it easier to interpret.

Below you’ll find the formulas for standard deviation and variance for populations and samples.

Measure | Sample | Population
Variance | \( s^2 = \frac{\Sigma(x_{i}-\bar{x})^2}{n-1} \) | \( \sigma^2 = \frac{\Sigma(x_{i}-\mu)^2}{N} \)
Standard Deviation | \( s = \sqrt{ \frac{\Sigma(x_{i}-\bar{x})^2}{n-1} } \) | \( \sigma = \sqrt{ \frac{\Sigma(x_{i}-\mu)^2}{N} } \)

Notice that the standard deviation is simply the square root of the variance.
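As a quick check of these formulas, the sketch below computes each measure by hand on a small made-up data set and compares the results with Python's `statistics` module. The sample versions divide by one less than the number of values, while the population versions divide by the number of values.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements

n = len(data)
centre = sum(data) / n  # x-bar for a sample, mu if data is the whole population
squared_devs = sum((x - centre) ** 2 for x in data)

sample_var = squared_devs / (n - 1)  # treats data as a sample
pop_var = squared_devs / n           # treats data as the whole population

# The standard deviation is the square root of the variance.
sample_sd = math.sqrt(sample_var)
pop_sd = math.sqrt(pop_var)

# The standard library offers the same four measures.
assert math.isclose(sample_var, statistics.variance(data))
assert math.isclose(pop_var, statistics.pvariance(data))
assert math.isclose(sample_sd, statistics.stdev(data))
assert math.isclose(pop_sd, statistics.pstdev(data))

print(sample_var, sample_sd)
print(pop_var, pop_sd)
```

Dividing by the smaller number makes the sample variance slightly larger than the population variance computed from the same values.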

 

Notation of Measures of Central Tendency and Variability

As you may have noticed, the measures for the population and sample have different notations. These notations are standardized throughout the statistical world, meaning you will encounter them everywhere from your textbooks to computer programs. Below, we’ve summarized the notations of the mean, standard deviation and variance.

Measure | Sample | Population
Mean | \( \bar{x} \) | \( \mu \)
Standard Deviation | \( s \) | \( \sigma \)
Variance | \( s^2 \) | \( \sigma^2 \)

 

Types of Variables

There are many variable types, each used in different statistical analyses. The most common distinction is made between two types of variables: qualitative and quantitative variables, also known as categorical and numerical variables.

Qualitative variables are those that involve categories. They are called qualitative because they describe a variable’s characteristics, or qualities. These include variables like:

  • Colour
  • Shape
  • Gender

Quantitative variables, on the other hand, are those that measure quantities of something. These include variables like:

  • Height
  • Age
  • Weight

Quantitative and qualitative variables can be further broken down into sub-groups. Below you’ll find a summary.

Data: a collection of observations, measurements or ideas on specific variables

  • Quantitative: numeric information about a place, person or thing
  • Qualitative: descriptive information about a place, person or thing
      • Ordinal: ordered based on a specific scale
      • Nominal: not ordered on a scale

 

Data Visualization

Data visualization is an integral part of descriptive statistics and consists of displaying information visually. The most common visualizations in descriptive statistics include:

  • Bar charts
  • Pie charts
  • Line graphs
  • Histograms
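The idea behind a bar chart can be sketched without any plotting library: count how often each category occurs, then draw one bar per category. The example below prints a text-only version using the standard library, with made-up colour observations. A real chart would normally be drawn with a plotting package instead.

```python
from collections import Counter

# Hypothetical observations of a qualitative variable.
colours = ["red", "blue", "red", "green", "blue", "red"]

# Frequency of each category.
freq = Counter(colours)

# One text "bar" per category, longest first.
for category, count in freq.most_common():
    print(f"{category:<6} {'#' * count} ({count})")
```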


Danica

Located in Prague and studying to become a Statistician, I enjoy reading, writing, and exploring new places.
