March 26, 2020
What is data?
Data are the basis of all statistical analysis. Put simply, data are a collection of measurements, observations or knowledge. The goal of collecting data is to obtain information about one or more variables in a particular population. A variable is a characteristic that belongs to some person, thing or idea.
For example, you as a person have many different variables. Physically, you can measure variables such as your height, weight, hair colour or shoe size. Then there are intangible characteristics, which can’t be seen or touched. These can include traits such as kindness and humour, or abilities such as leadership and organizational skills.
Statistics has two main branches, descriptive and inferential, and data are used in both. Descriptive statistics, which covers everything taught in this section, involves describing what the data look like, for example by finding measures such as the mean, standard deviation and mode. Inferential statistics uses data from a sample to draw conclusions about a wider population. If you’d like to learn more about these concepts, they will be explained in the following sections.
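As a rough illustration of what descriptive statistics look like in practice, here is a minimal Python sketch using the standard library's `statistics` module. The list `heights_cm` is a made-up set of values chosen for illustration and is not data from this article.

```python
import statistics

# Hypothetical heights in cm (illustrative values, not data from this article)
heights_cm = [160, 160, 165, 170, 170, 170, 175]

mean_height = statistics.mean(heights_cm)    # arithmetic average
mode_height = statistics.mode(heights_cm)    # most frequently occurring value
stdev_height = statistics.stdev(heights_cm)  # sample standard deviation

print(f"mean: {mean_height:.1f} cm")
print(f"mode: {mode_height} cm")
print(f"standard deviation: {stdev_height:.1f} cm")
```

Each of these three numbers summarizes the whole list in a different way: the mean and mode describe a typical value, while the standard deviation describes how spread out the values are.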
- List five variables about yourself that are physical.
- Think about one variable regarding yourself that is intangible. Once you have chosen it, think about the ways that you can measure it. Don’t be afraid of being creative - creativity is one of the best tools for data analysts!
Different types of data
Now that you have an idea of what variables are and what they can be used for, we can delve into the world of data. Data takes on many different shapes and sizes, and there are some specific qualities of data that you should familiarize yourself with. The first is that data is generally divided into qualitative and quantitative data.
Qualitative data, also called categorical data, deals with variables that describe a quality rather than a numeric amount. Eye colour is a good example of a qualitative variable. If you were to collect qualitative data in your classroom, you could write down the colour of each classmate’s eyes. Each observation would take the form of a word or letter code, such as B for blue, Br for brown or G for green.
Quantitative data, also known as numeric data, deals with variables that can be counted or measured. In this case, instead of collecting data on eye colour, you could collect data about your classmates’ heights. These would be recorded in a unit of measurement, such as centimetres or metres.
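One way to see the distinction is to look at what you can sensibly do with each kind of variable. The sketch below is a hypothetical example: the `classroom` records are invented, and the letter codes follow the B/Br/G convention described above. The qualitative variable is tallied by category, while the quantitative variable is averaged.

```python
from collections import Counter
from statistics import mean

# Hypothetical classroom records: (eye colour code, height in cm)
classroom = [("Br", 160), ("B", 170), ("G", 165), ("Br", 160)]

eye_colours = [colour for colour, _ in classroom]  # qualitative variable
heights_cm = [height for _, height in classroom]   # quantitative variable

# Categories can be tallied, but averaging them would be meaningless
colour_counts = Counter(eye_colours)

# Numeric measurements support arithmetic such as taking an average
average_height = mean(heights_cm)

print(colour_counts)
print(average_height)
```

Note that the eye colours can still be tallied into frequencies, but the tally is a summary of the categories, not a measurement of any one person.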
Here are some of the main differences between the two types of data and an example of the type of information you can get from each.
|Qualitative Data||Quantitative Data|
|An easy way to remember qualitative data is that it measures a certain quality of a person, place or thing. A qualitative variable often describes something, such as eye colour.||An easy way to remember quantitative data is that it measures a quantity, or amount of something, in a person, place or thing. A quantitative variable is often given as numeric information, such as height.|
|Other Names Include:||Other Names Include:|
|Categorical data||Numeric data|
We have measured two variables in a classroom. Each row represents one person: in the first row, the person with brown eyes has a height of 160 cm; in the third row, the person with blue eyes has a recorded height of 170 cm, and so on.
|Qualitative Data||Quantitative Data|
|Example: We have recorded the following eye colours of students in a classroom.||Example: We have recorded the following heights of students in a classroom, measured in cm.|
Using the table above, what can you say about the data? Give at least three different descriptions. (*Here’s a hint: how many people have blue eyes? Do all brown-eyed people in this classroom have different heights?*)
You are trying to label the variables in your study by what type they are. Given the following questions from the survey, state what type of variable each question deals with.
|1. How old are you?|
|2. Where do you live? Give the name of your city|
|3. How many siblings do you have?|
|4. What is your height?|
|5. What is your birth date?|
|6. Do you have a pet?|
|7. What grade level are you in?|
Solutions to Practice Problems
Below, you’ll find the solutions to the practice problems above. As you will notice, many of the answers provided aren’t necessarily the only solutions to the problem. As you learn statistics, keep in mind that while the math involved in the discipline tends to be black and white, the interpretation of the results is often more nuanced.
In this problem, you were asked to list five physical variables about yourself, as well as one intangible one and how to measure it. Some examples of tangible and intangible characteristics are recorded in the table below.
|Tangible Characteristics||Intangible Characteristics|
While tangible characteristics can be very easy to measure, intangible characteristics require what are known as proxy measures, which means measuring something related to the variable we want to measure in order to form an appropriate estimate.
Take strength, for example. If we wanted to measure physical strength, there are many different ways to do so. You could count how many sit-ups someone can do, record how much weight someone can bench-press, and so on.
In this problem, we were asked to make three different interpretations of the data given. This is easier to do if we reorder the data into the table below.
An example of three interpretations can be:
- Everyone with brown eyes is the same height, and everyone with blue eyes is the same height
- There are 3 people with brown eyes, 4 with blue, and only 1 person for each colour of green, hazel and grey
- People with blue eyes are taller than people with other eye colours
Although this data is clearly not something we would find in the real world, it helps show that any interpretation of a data set applies only to the observations in the sample. We can use these observations to make predictions about what observations outside the sample may look like, but such a prediction is only ever an estimate.
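The three interpretations above can be checked mechanically by grouping heights by eye colour. The sketch below uses a hypothetical `sample` that matches what the text states (3 brown-eyed people at 160 cm, 4 blue-eyed people at 170 cm, and one person each with green, hazel and grey eyes). The row order after the third row and the heights for green, hazel and grey are assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical sample: (eye colour, height in cm).
# Brown = 160 and blue = 170 come from the text; the heights for
# green, hazel and grey, and the row order, are assumed.
sample = [
    ("brown", 160), ("green", 165), ("blue", 170), ("blue", 170),
    ("brown", 160), ("hazel", 162), ("blue", 170), ("grey", 168),
    ("brown", 160), ("blue", 170),
]

# Group the heights by eye colour
heights_by_colour = defaultdict(list)
for colour, height in sample:
    heights_by_colour[colour].append(height)

# Interpretation: how many people have each eye colour
counts = {c: len(h) for c, h in heights_by_colour.items()}

# Interpretation: does everyone within a colour share one height?
same_height = {c: len(set(h)) == 1 for c, h in heights_by_colour.items()}

# Interpretation: is every blue-eyed person taller than everyone else?
blue_is_tallest = all(
    min(heights_by_colour["blue"]) > max(h)
    for c, h in heights_by_colour.items()
    if c != "blue"
)

print(counts)
print(same_height)
print(blue_is_tallest)
```

All three checks describe only these ten observations. Whether blue-eyed people are taller in general is a question the sample alone cannot settle.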
Here, you were asked to identify each variable to arrive at the following answers.
|1. How old are you?||Quantitative|
|2. Where do you live? Give the name of your city||Categorical|
|3. How many siblings do you have?||Quantitative. It can also be categorical if put into groups (for example, people who have 1 sibling, then those who have 2, and so on; or those who have fewer than 3, those who have more than 3, etc.).|
|4. What is your height?||Quantitative|
|5. What is your birth date?||Categorical if it is a month, day or year. Quantitative if it is an exact date (down to the hour, minutes, or seconds you were born).|
|6. Do you have a pet?||Categorical|
|7. What grade level are you in?||Categorical|
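The answer to question 3 notes that a quantitative variable can become categorical once it is put into groups. A minimal sketch of that idea follows. The function name `sibling_group`, the cut-off points and the sample `sibling_counts` are all hypothetical choices for illustration; a real survey would pick groups that suit its own question.

```python
def sibling_group(count: int) -> str:
    """Map a sibling count (quantitative) to a group label (categorical)."""
    if count < 0:
        raise ValueError("sibling count cannot be negative")
    if count == 0:
        return "no siblings"
    if count <= 2:
        return "1-2 siblings"
    return "3 or more siblings"

# Hypothetical survey responses to "How many siblings do you have?"
sibling_counts = [0, 1, 3, 2, 5]

# After grouping, each response is a label rather than a number
groups = [sibling_group(n) for n in sibling_counts]
print(groups)
```

Once grouped, the responses can be tallied by label, but they can no longer be averaged, which is the practical difference between the two variable types.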