March 26, 2020
The term raw score may sound a bit odd, but it is actually quite simple to understand. A raw score is an unadulterated, or unmodified, data point. Raw scores usually take the form of quantitative variables, such as test scores, heights, or car speeds. It is called a raw score because it has not been transformed to make it comparable with the other data points in a data set.
Compare that to data points that are transformed in order to be comparable with each other. One such transformation is standardizing the data so that they can be compared on a standard normal distribution. Another is to put the data points into percentiles.
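To make the idea of standardizing concrete, here is a minimal sketch using Python's standard library. The scores are hypothetical and chosen only for illustration; the sketch assumes the population standard deviation is the one wanted.

```python
from statistics import mean, pstdev

# Hypothetical raw productivity scores out of 100 (illustrative only).
raw_scores = [50, 58, 62, 66, 68, 70, 75, 79, 84]

mu = mean(raw_scores)       # average of the raw scores
sigma = pstdev(raw_scores)  # population standard deviation (an assumption)

# Standardize each raw score: z = (x - mean) / standard deviation
z_scores = [(x - mu) / sigma for x in raw_scores]

for x, z in zip(raw_scores, z_scores):
    print(f"raw={x:3d}  z={z:+.2f}")
```

A positive z-score means the raw score sits above the mean, and a negative one means it sits below.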
Let’s say that we have measured productivity as a quantitative variable. This variable is a score out of 100, where the mean productivity score is 68 and your own productivity score is 75.
If we wanted to compare your score with the rest of the data set, we would be limited in our interpretation because we only have information about the mean. If we wanted to know whether or not you scored in the best 10% of people whose productivity was measured, we wouldn’t be able to do so based solely on knowing the mean.
One way we can compare data points within a data set is to split them up into percentiles. You can think of a percentile as a slice containing a fixed percentage of the data. Each percentile contains a specific percentage of the data, and each percentile rank is cumulative.
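As a rough illustration of what "cumulative" means here, the sketch below computes the percentage of scores at or below a given score. The data and the helper name `percent_at_or_below` are hypothetical, and "at or below" is only one of several conventions for percentile rank.

```python
def percent_at_or_below(scores, value):
    """Return the percentage of scores that are less than or equal to value."""
    count = sum(1 for s in scores if s <= value)
    return 100 * count / len(scores)

# Hypothetical productivity scores out of 100 (illustrative only).
scores = [50, 58, 62, 66, 68, 70, 75, 79, 84, 88]

rank = percent_at_or_below(scores, 75)
in_top_10_percent = rank >= 90

print(f"A score of 75 is at or above {rank:.0f}% of the scores")
print(f"In the best 10%? {in_top_10_percent}")
```

Unlike the mean, this tells us directly where a single score sits relative to everyone else.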
The way you split the data is completely arbitrary. However, there are a couple of common ways people split the data, which are summarized in the table below.
| Split | Description |
|---|---|
| Quartiles | Splitting the data into fourths, where each quartile contains 25% of the data |
| Deciles | Splitting the data into tenths, where each decile contains 10% of the data |
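Both splits can be sketched with `statistics.quantiles` from Python's standard library. The data are hypothetical, and note that this function interpolates between neighbouring values by default, so its cut points can differ slightly from ones found by locating a single position in the ordered data.

```python
from statistics import quantiles

# Hypothetical data set: the whole numbers 1 to 100 (illustrative only).
data = list(range(1, 101))

# n=4 splits the data into quartiles and returns the 3 cut points between them.
quartile_cuts = quantiles(data, n=4)

# n=10 splits the data into deciles and returns the 9 cut points between them.
decile_cuts = quantiles(data, n=10)

print("Quartile cut points:", quartile_cuts)
print("Decile cut points:  ", decile_cuts)
```

The 2nd quartile cut point and the 5th decile cut point coincide, since both mark the median.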
As you can see from the table above, deciles are a type of percentile where the data is split into tenths. Each decile contains 10% of the data, and each decile and its corresponding percentile are listed in the table below.
You can think of a percentile as splitting a data set into individual 1% slices. At the 10th percentile, which is also the 1st decile, there are ten 1% slices, giving us 10% of the data. If we split the data into groups of 15%, the 45th percentile would be the point below which 45% of the data lie, that is, the upper boundary of the third group.
Take the table below as an example.
Splitting the data into deciles in this example is easy because there are only 10 data points, so the data is, in a way, already split into deciles for us. The first data point signifies the 1st decile, which is the 10th percentile. The 6th observation, likewise, signifies the 6th decile, or the 60th percentile.
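The correspondence between position, decile, and percentile in a ten-observation data set can be sketched as follows. The values are hypothetical, and the helper `decile_rows` is an illustrative name, not something from the original example.

```python
def decile_rows(sorted_values):
    """Pair each ordered observation with its decile and percentile.

    Assumes the values are already sorted in ascending order. The decile
    is a whole number only when there are exactly 10 observations.
    """
    n = len(sorted_values)
    rows = []
    for position, value in enumerate(sorted_values, start=1):
        percentile = 100 * position / n  # share of the data at or below this position
        decile = 10 * position / n
        rows.append((position, value, decile, percentile))
    return rows

# Hypothetical ordered data set with exactly 10 observations.
observations = [12, 18, 23, 27, 31, 36, 42, 47, 55, 63]

for position, value, decile, percentile in decile_rows(observations):
    print(f"obs {position:2d}: value={value:3d}  decile={decile:.0f}  percentile={percentile:.0f}")
```

With ten observations, each position number is also its decile, and multiplying by 10 gives the percentile.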
At this point you may be wondering what the point of splitting data in this manner is. After all, it’s pretty clear in the previous example that the 9th observation marks the point at or below which 90% of the data lie. Well, in real life, data rarely has only 10 observations. In fact, data sets typically contain thousands of observations. And that’s not even including big data, which can include billions or even trillions of observations.
In these cases, splitting the data into deciles can be extremely helpful in telling us about different groups of the population. While measures of central tendency are great at condensing information, they do nothing to tell us about what the rest of the data look like.
Deciles, on the other hand, are measures that give us a summary of what each segment of the data looks like. Take the table below as an example.
| Decile | Value |
|---|---|
| 5th Decile & Median | 85 |
| 10th Decile & Maximum | 150 |
While the mean only gives us information about the centre of the data, or what the average value is, the deciles let us see the approximate values of the 10 segments of the data. In general, the position of a percentile is given by the formula below.
Where n is the sample size and P is the percentile you want to find. For deciles, the formula is simply,
These formulas will give you the position of the percentile, not the percentile itself. Meaning, if R were equal to 5, you would simply locate the 5th value in an ordered data set. R is usually called the percentile rank. Deciles have a variety of real-world applications and are used in everything from analysing drought data to creating rankings.
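For reference, one simple form of these position formulas is written out below. It is an assumption, chosen because it agrees with the ten-observation example (the first data point is the 1st decile and the maximum is the 10th decile); many textbooks use $(n + 1)$ in place of $n$. The symbol $k$ for the decile number is also introduced here for illustration.

```latex
% Position R of the P-th percentile in an ordered data set of size n
R = \frac{P}{100} \times n

% Position R of the k-th decile (k = 1, \dots, 10), i.e. the case P = 10k
R = \frac{k}{10} \times n
```

As a quick check, with $n = 10$ and $P = 60$ this gives $R = 6$, so the 60th percentile is the 6th value in the ordered data.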
You are interested in obtaining information on the salaries of the three lowest-paid tenths of a company's employees. Given the following table, use the formula to find the percentile rank of deciles 1, 2, and 3.
Solution Example 1
Looking at the table below, you can find the answers for each decile.
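The same kind of calculation can be sketched in code. The salaries below are hypothetical and are not the figures from the exercise. The sketch also assumes the position convention R = (decile / 10) × n and rounds any fractional position up to the next whole position, both of which vary between textbooks.

```python
import math

def decile_position(decile, n):
    """Position (rank) of a decile in an ordered data set of size n.

    Assumes the convention R = (decile / 10) * n; other texts use (n + 1).
    """
    return decile * n / 10

def value_at_position(sorted_values, position):
    """Value at a 1-based position, rounding a fractional position up (an assumption)."""
    index = max(1, math.ceil(position))
    return sorted_values[index - 1]

# Hypothetical annual salaries for 20 employees (illustrative only).
salaries = sorted([
    28000, 30000, 31000, 32500, 34000, 35000, 36500, 38000, 40000, 42000,
    45000, 47000, 50000, 52000, 55000, 60000, 65000, 72000, 80000, 95000,
])

for decile in (1, 2, 3):
    rank = decile_position(decile, len(salaries))
    salary = value_at_position(salaries, rank)
    print(f"Decile {decile}: position {rank:.0f}, salary {salary}")
```

Sorting the data first matters, because the position only makes sense in an ordered data set.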