February 29, 2020
A Guide to Statistics
In previous sections, you learned about the concepts involved in descriptive statistics. Specifically, we showed you the different measures of central tendency and variability, as well as how to calculate each. In addition, we walked you through the types of variables used in statistics and the kinds of analyses and visualizations you can create from data. Here, we'll help you review everything related to descriptive statistics.
What are Descriptive Statistics?
The field of statistics is generally divided into two branches: descriptive and inferential statistics. Descriptive statistics is, luckily, exactly what it sounds like: it describes and summarizes the data you have. If this sounds confusing, let's contrast it with inferential statistics in the table below.
Descriptive Statistics  Inferential Statistics 
Makes statements about what is within the data  Makes predictions about data points outside the data set by using the information within the data
Conveys information through measures like mean and standard deviation  Conveys information through predictive models 
While this comparison is by no means exhaustive, it is a good starting point for understanding the differences between the two branches of statistics. The goal of descriptive statistics is either to summarize the characteristics of a data set or to analyse a data set by using its descriptive properties.
Population
The units used in descriptive statistics can be anything. People using descriptive statistics can strive to measure things like:
 Rainfall
 Trees in parks
 Tourists at a beach
The analysis that can be done with descriptive statistics alone is not only vastly diverse, it also makes up the majority of the statistics many people use. However, the units people measure need to be clearly defined in order to properly understand any data.
In statistics, the elements people want to study are described in terms of a population and a sample. A population is the entire group of elements that you want to study. A population could be anything and take on any form. For the previous examples, the populations would be as follows.
Elements  Population 
Rainfall  Total rain produced 
Trees in a park  All the trees in a park 
Tourists at a beach  Total number of tourists at a beach 
While this may seem simple, and it is, populations are notoriously hard to measure. Counting the total number of trees in a park might be an easy task for a local city park, but imagine the same task applied to a national forest. Often, there are not enough financial resources or time to measure an entire population. That is why in statistics you'll often encounter samples.
Sample
A sample is a subset of a population whose elements and units are typically the same as the population's. A sample is drawn from a population to make the data collection process cheaper and more time-efficient. Using the previous examples, let's look at the differences between a population and a sample.
Population  Sample 
Total rain produced  Rainfall produced in an hour in one location of a city 
All the trees in a park  Number of trees within a one-kilometre radius
Total number of tourists at a beach  Number of tourists arriving at the beach at three specific times in a day 
As you can guess, samples include only a fraction of the elements in a population. There are many different methods for drawing a sample, including:
 Simple Random Sampling
 Stratified Sampling
 Cluster Sampling
 Quota Sampling
As you can imagine, each sampling method has its advantages and disadvantages. The method preferred in most cases is simple random sampling, also known as SRS.
SRS is preferred because every element of the population has an equal chance of being selected, which helps avoid selection bias in the estimates of statistical measures. An SRS can be conducted with replacement (an element can be selected more than once) or without replacement (each element can be selected at most once).
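To make the two options concrete, here is a minimal Python sketch using the standard `random` module. The population of 20 tree IDs is hypothetical, and the fixed seed only makes the draw repeatable: `sample` draws without replacement, while `choices` draws with replacement.

```python
import random

# Hypothetical population: ID numbers of 20 trees in a park.
population = list(range(1, 21))

# A seeded generator makes the example reproducible; any seed works.
rng = random.Random(42)

# Without replacement: each tree can be selected at most once.
srs_without = rng.sample(population, k=5)

# With replacement: each tree is "put back" after a draw,
# so the same tree may appear more than once.
srs_with = rng.choices(population, k=5)

print("Without replacement:", srs_without)
print("With replacement:   ", srs_with)
```

Sampling without replacement is the usual choice when the population is small, since it never wastes a draw on an element already measured.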
Because the true population measure (the measure we would have calculated had we measured the entire population) is unknown, measures calculated from samples are always treated as estimates of the population measure. A measure calculated from a population is called a “parameter,” while a measure calculated from a sample is called a “statistic.”
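The sketch below illustrates the difference between a parameter and a statistic, assuming a hypothetical, simulated population of daily tourist counts. Because the whole simulated population is available, we can compare the population mean (the parameter) with the mean of one simple random sample (the statistic); with real data, usually only the statistic would be known.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical, simulated population: daily tourist counts at a beach for one year.
population = [rng.randint(50, 500) for _ in range(365)]

# Parameter: computed from every element of the population (mu).
mu = statistics.mean(population)

# Statistic: computed from a simple random sample of 30 days (x-bar).
sample = rng.sample(population, k=30)
x_bar = statistics.mean(sample)

print(f"Population mean (parameter): {mu:.1f}")
print(f"Sample mean (statistic):     {x_bar:.1f}")
print(f"Estimation error:            {x_bar - mu:+.1f}")
```

Re-running with a different seed gives a different sample and therefore a different statistic, while the parameter stays fixed.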
Measures of Central Tendency
“Measures of central tendency” is a long name for something simple: measuring the centre. People like to measure the centre of a data set because it generally indicates the most “typical” value in the data.
There are three basic measures of central tendency: the mean, median and mode. Some rules of thumb for choosing between them are:
 When the data includes extreme values or outliers, the median is better
 When the data doesn’t include outliers and you want to measure the average, use the mean
 When you want to know the value or category with the highest frequency, use the mode
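Here is a quick Python illustration of these rules of thumb using the standard `statistics` module. The rainfall readings and colours are made-up values, chosen so that one reading is an obvious outlier.

```python
import statistics

# Hypothetical hourly rainfall readings (mm); 40 is an extreme value.
rainfall = [2, 3, 3, 4, 5, 40]

print(statistics.mean(rainfall))    # 9.5, pulled upward by the outlier
print(statistics.median(rainfall))  # 3.5, barely affected

# The mode also works for categories, not just numbers.
colours = ["red", "blue", "red", "green", "red", "blue"]
print(statistics.mode(colours))     # red
```

Without the reading of 40, the mean would be 3.4 and the median 3, so the single outlier moves the mean by more than 6 mm but the median by only 0.5 mm.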
Below are the formulas for each measure.
Measure  Sample  Population
Mean  $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$  $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$
Median  The middle value of the ordered data points; with an even number of values, the average of the two middle values  Calculated the same as the sample
Mode  The value or category with the highest frequency  Calculated the same as the sample 
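To connect the table to actual calculations, here is a from-scratch Python sketch of each measure. The helper names `mean`, `median` and `modes` are illustrative, not from a particular library; `modes` returns a list because a data set can have more than one value tied for the highest frequency.

```python
from collections import Counter

def mean(values):
    """Sum of the values divided by how many there are."""
    return sum(values) / len(values)

def median(values):
    """Middle of the ordered data; average of the two middle values if the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def modes(values):
    """All values or categories that share the highest frequency."""
    counts = Counter(values)
    top = max(counts.values())
    return [v for v, c in counts.items() if c == top]

data = [7, 1, 3, 3, 9, 4]
print(mean(data))    # 4.5
print(median(data))  # ordered: 1, 3, 3, 4, 7, 9 -> (3 + 4) / 2 = 3.5
print(modes(data))   # [3]
```

The standard library's `statistics.mean`, `statistics.median` and `statistics.multimode` compute the same quantities and are preferable in real code.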
Measures of Variability
Unlike measures of central tendency, measures of variability strive to capture how the data are spread around the centre values. The two most basic measures of variability are variance and standard deviation. Other common measures include:
 Coefficient of Variation
 Covariance
 Standard Error
The spread of a data set describes how closely together or how far apart the data lie around the centre. While variance is used throughout statistics, standard deviation tends to be preferred when describing the spread of a data set because it is expressed in the same units as the data (variance is in squared units), which makes it easier to interpret.
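The other measures listed above can each be sketched in a few lines of Python. The heights and weights are hypothetical, and the definitions are the common textbook ones: the coefficient of variation is the standard deviation divided by the mean, the standard error of the mean is the sample standard deviation divided by the square root of n, and the sample covariance sums the products of paired deviations and divides by n - 1.

```python
import math
import statistics

# Hypothetical paired measurements for five people.
heights_cm = [160.0, 165.0, 170.0, 175.0, 180.0]
weights_kg = [55.0, 60.0, 66.0, 70.0, 79.0]
n = len(heights_cm)

mean_h = statistics.mean(heights_cm)
sd_h = statistics.stdev(heights_cm)  # sample standard deviation (divides by n - 1)

# Coefficient of variation: standard deviation relative to the mean (unitless).
cv_h = sd_h / mean_h

# Standard error of the mean: typical sample-to-sample variation of the sample mean.
se_h = sd_h / math.sqrt(n)

# Sample covariance: products of paired deviations from each mean, divided by n - 1.
mean_w = statistics.mean(weights_kg)
cov_hw = sum((h - mean_h) * (w - mean_w)
             for h, w in zip(heights_cm, weights_kg)) / (n - 1)

print(f"CV of height:    {cv_h:.3f}")
print(f"SE of mean:      {se_h:.2f} cm")
print(f"Cov(height, wt): {cov_hw:.1f} cm*kg")  # positive: they tend to rise together
```

Python 3.10 and later also provide `statistics.covariance`, which computes the same sample covariance.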
Below you’ll find the formulas for standard deviation and variance for populations and samples.
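Measure  Sample  Population
Variance  $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$  $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$
Standard Deviation  $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$  $\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}$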
Notice that the standard deviation is simply the square root of the variance.
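The variance and standard deviation formulas can be checked with a short Python sketch. The functions `population_variance` and `sample_variance` are hypothetical helpers that follow the formulas term by term, and the same eight values are treated once as a complete population and once as a sample; the standard library's `statistics.pvariance` and `statistics.variance` implement the same two definitions.

```python
import math
import statistics

def population_variance(values):
    """sigma^2: squared deviations from mu, divided by N."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

def sample_variance(values):
    """s^2: squared deviations from x-bar, divided by n - 1."""
    x_bar = sum(values) / len(values)
    return sum((x - x_bar) ** 2 for x in values) / (len(values) - 1)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

sigma_sq = population_variance(data)  # 32 / 8 = 4.0
s_sq = sample_variance(data)          # 32 / 7, about 4.571

# Standard deviation is the square root of the variance.
sigma = math.sqrt(sigma_sq)           # 2.0
s = math.sqrt(s_sq)

# The standard library uses the same two definitions.
print(math.isclose(sigma_sq, statistics.pvariance(data)))  # True
print(math.isclose(s_sq, statistics.variance(data)))       # True
```

The only difference is the denominator: dividing by n - 1 instead of n compensates for the fact that deviations are measured from the sample mean rather than the unknown population mean, which would otherwise make the sample variance tend to underestimate the population variance.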
Notation of Measures of Central Tendency and Variability
As you may have noticed, the measures for the population and the sample use different notation. This notation is standard throughout statistics, so you will encounter it everywhere from textbooks to statistical software. Below, we've summarized the notation for the mean, standard deviation and variance.
Measure  Sample  Population
Mean  $\bar{x}$  $\mu$
Standard Deviation  $s$  $\sigma$
Variance  $s^2$  $\sigma^2$
Types of Variables
There are many types of variables, each used in different kinds of statistical analysis. The most common distinction is made between two types: qualitative and quantitative variables, also known as categorical and numerical variables.
Qualitative variables are those that involve categories. They are called qualitative because they describe a variable’s characteristics, or qualities. These include variables like:
 Colour
 Shape
 Gender
Quantitative variables, on the other hand, measure quantities of something. These include variables like:
 Height
 Age
 Weight
Quantitative and qualitative variables can be further broken down into subgroups. Below you’ll find a summary.
Data: a collection of observations, measurements or ideas on specific variables
 Quantitative: numeric information about a place, person or thing
 Qualitative: descriptive information about a place, person or thing
  Ordinal: ordered based on a specific scale
  Nominal: not ordered on a scale
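The difference between ordinal and nominal variables affects which measures make sense. In this sketch, the survey ratings and colours are hypothetical: the ordinal ratings are mapped to their position on the scale so that a median category can be found (`statistics.median_low` returns an actual category position rather than averaging two), while the nominal colours only support counts and the mode.

```python
import statistics

# Hypothetical ordinal variable: survey ratings on an ordered scale.
SCALE = ["poor", "fair", "good", "excellent"]
ratings = ["good", "poor", "excellent", "good", "fair"]

# Ordinal: replace each rating with its position on the scale, so the
# ratings can be ordered and a middle (median) category found.
positions = [SCALE.index(r) for r in ratings]
median_rating = SCALE[statistics.median_low(positions)]

# Hypothetical nominal variable: colours have no natural order,
# so counting and the mode are the meaningful summaries.
colours = ["blue", "green", "blue", "red"]
most_common_colour = statistics.mode(colours)

print(median_rating)       # good
print(most_common_colour)  # blue
```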
Data Visualization
Data visualization is an integral part of descriptive statistics and refers to displaying information visually. The most common visualizations in descriptive statistics include:
 Bar charts
 Pie charts
 Line graphs
 Histograms
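Bar charts, pie charts and histograms are all drawn from frequency counts. The sketch below computes those counts with `collections.Counter` and prints a crude text version of a bar chart and a histogram. The data and the 10-year bin width are hypothetical, and in practice you would pass the counts to a plotting library such as matplotlib.

```python
from collections import Counter

# Hypothetical categorical data for a bar or pie chart.
shapes = ["circle", "square", "circle", "triangle", "circle", "square"]
shape_counts = Counter(shapes)

# Hypothetical numeric data for a histogram, grouped into 10-year bins.
ages = [21, 25, 33, 38, 41, 45, 47, 52, 58, 63]
bin_width = 10
age_bins = Counter((age // bin_width) * bin_width for age in ages)

# Crude text "bar chart": one '#' per observation in each category.
for shape, count in shape_counts.most_common():
    print(f"{shape:<10}{'#' * count}")

# Crude text "histogram": bins printed in numeric order.
for start in sorted(age_bins):
    print(f"{start}-{start + bin_width - 1}  {'#' * age_bins[start]}")
```

The key difference shows up in the code: a bar chart counts distinct categories, while a histogram first groups numeric values into ranges (bins) and then counts each range.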