Imagine you’re Caesar Augustus, Julius Caesar’s heir. You’re in ancient Rome and yes, you’re wearing one of those leather gladiator skirts. Ruling an empire has been glamorized by 21st-century blockbuster films, but you, being Caesar Augustus, know that the quality of life for most people can actually be quite dismal. So, you decide to turn to an unlikely ally in keeping your citizens happy and healthy: statistical analysis. Read this guide to find out why!
Statistics in a Nutshell
The majority of people today understand the basics of data analysis and statistical methods but aren’t usually aware of just how much statistical inference has shaped the world around us. While modern fields like biostatistics and machine learning do a lot to turn statistical data into products and services that make our lives easier, statisticians have been around since, well, before Rome.
Caesar Augustus issued a decree to conduct a census of the Roman world, so that officials could use categorical and numerical demographic data to make better decisions on policy, health and commerce. Fast-forward many centuries, past the invention of Bayesian statistics, and the work of the statistician is to describe data and to make inferences about a whole population from a sample of it.
Having expanded well beyond collecting demographic and registry data, statistics now provides important indicators on agriculture, the economy and more.
How to Analyse Data Like a Statistician
Now that you understand a bit about the origins of mathematical statistics, it’s worth exploring how the field of probability and statistics is structured. Whether you need help collecting sample data or simply want to know more about the normal distribution, you can troubleshoot any question by recalling the two major divisions within the discipline: inferential and descriptive statistics.
The most common statistical analyses are descriptive. Also known as exploratory analysis, descriptive statistics strive to both summarise and display quantitative or qualitative data. Nearly every study design includes at least a preliminary exploratory analysis using descriptive statistics before constructing a confidence interval or running a linear regression.
This branch covers measures of central tendency, such as the sample mean, median and mode. Alongside these indicators are measures of spread, such as the variance and standard deviation of raw data, and measures of how two variables move together, such as the covariance.
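To make these measures concrete, here is a minimal sketch using Python’s built-in statistics module. The exam scores and study hours are made-up numbers, and the helper names describe and sample_covariance are hypothetical ones chosen for this example.

```python
import statistics

# Hypothetical sample data, invented purely for illustration.
scores = [52, 61, 61, 70, 75, 83, 90]   # exam scores
hours = [2, 3, 4, 5, 6, 8, 9]           # hours studied by the same students


def describe(values):
    """Return common measures of central tendency and spread."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "variance": statistics.variance(values),  # sample variance (divides by n - 1)
        "std_dev": statistics.stdev(values),      # square root of the sample variance
    }


def sample_covariance(xs, ys):
    """Sample covariance of two equally long lists (divides by n - 1)."""
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    products = ((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sum(products) / (len(xs) - 1)


summary = describe(scores)
cov = sample_covariance(hours, scores)

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
print(f"covariance(hours, scores): {cov:.2f}")
```

A positive covariance here would suggest that students who studied longer tended to score higher, which is exactly the kind of pattern an exploratory analysis is meant to surface before any modelling begins.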
The other branch, inferential statistics, uses probability theory and the notion of a probability distribution to test a null hypothesis against an alternative hypothesis through parametric and non-parametric models, including general linear or regression models. Under assumptions such as the Gauss-Markov assumptions for classical linear regression, you can conduct a multivariate analysis to estimate the coefficients that relate one or more independent variables to a dependent variable.
All this is to say that inferential statistics is, in a nutshell, fitting a model to a sample of data in order to draw conclusions about, or make predictions for, values outside that data set.
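As a rough illustration of that idea, the sketch below fits a straight line by ordinary least squares, computes a t statistic for the null hypothesis that the slope is zero, and predicts a value outside the sample. The data points and the function names fit_line and slope_t_statistic are assumptions made for this example; in practice you would more likely use statistical software than hand-written formulas.

```python
import math

# Hypothetical sample: one independent variable x and one dependent variable y.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]


def fit_line(xs, ys):
    """Ordinary least squares estimates for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((a - mean_x) ** 2 for a in xs)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def slope_t_statistic(xs, ys, slope, intercept):
    """t statistic for the null hypothesis that the true slope is zero."""
    n = len(xs)
    mean_x = sum(xs) / n
    sxx = sum((a - mean_x) ** 2 for a in xs)
    # Sum of squared residuals around the fitted line.
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(xs, ys))
    # Standard error of the slope, using n - 2 degrees of freedom.
    standard_error = math.sqrt(sse / (n - 2) / sxx)
    return slope / standard_error


slope, intercept = fit_line(x, y)
t_stat = slope_t_statistic(x, y, slope, intercept)
prediction = intercept + slope * 6  # x = 6 lies outside the observed sample

print(f"slope={slope:.3f}, intercept={intercept:.3f}")
print(f"t statistic for slope: {t_stat:.1f}")
print(f"predicted y at x=6: {prediction:.2f}")
```

A large t statistic, judged against a t distribution with n - 2 degrees of freedom, is evidence against the null hypothesis of no linear relationship. The prediction at x = 6 is only as trustworthy as the assumption that the same linear pattern holds beyond the observed range.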
Advice for Learning Statistics
Whether you’re learning about the binomial distribution, trying to interpret effect size correctly or need help creating an awesome data visualization, there are plenty of online resources for every skill level of statistician. Now that you’re familiar with the basics of statistics and have untangled some of the many different paths you can take when analysing data, you’ll need some tools to help you accomplish things like perfecting your experimental design and statistical methodology or understanding how to run a regression analysis using statistical software.
Whether you’re a seasoned mathematician or are curious to learn more about the world of data scientists, here are some of the best resources for understanding the world of data and statistical theory by theme.
Statistical Models and Programming
Because the field of statistics is so broad, you will typically end up on software-specific or issue-specific websites when trying to troubleshoot any conceptual or technical statistics issues you might be having. For this reason, finding a website that covers broad swaths of information, from constructing confidence intervals to machine learning, can be especially helpful if you’re looking for efficiency. Here are some recommendations!
Eurostat’s Statistics Explained
Okay, so starting with something related to a European Union database is quite tricky when dealing with the UK, seeing as it’s not only the UK’s data that’s being held in limbo with regard to official statistics but also the jobs of UK-related official statisticians. Acknowledging that this issue deserves full-length explanations in its own right, and does have many, we can move on and examine Eurostat’s Statistics Explained page.
Think of it as the Wikipedia of official statistics, where Eurostat not only explains how to calculate various indicators, such as consumer prices, but also gives examples using the EU’s data. The topics you can explore and gather data on range from sustainable development goals to sports and tourism. Whether you’re looking for categorical or numerical data for your next research project or want to find graphics to use in a newspaper article, you’ll find plenty to work with here.
Towards Data Science
If you’re looking for less Eurocentric data, head over to Towards Data Science (TDS) to find more technical explanations on subjects like statistical significance, analysis of variance (ANOVA) and more. The website is organized into six subject areas: data science, machine learning, programming, AI, visualization and journalism.
This resource is perfect for both students and professionals who either want to learn more about specific topics or are looking for examples of how to execute specific tasks. For example, students might be more interested in explanations of chi-square tables or how to correctly differentiate between outliers and influential observations within a data set. Professionals, on the other hand, might be more interested in learning how to improve their data visualizations by using different Python libraries such as Pandas or Matplotlib.
Whether you’re stuck on a specific bit of code for running an ordinary least squares regression model or are having trouble with Excel commands, Stack Overflow is the best place to go for answers from real people. Designed as a public forum for developers, it lets you search through over 16 million questions related to coding issues in a range of different software.
Similar to Stack Overflow, Stack Exchange is a forum where anybody can ask or answer a question on a variety of topics. The difference, however, is that Stack Exchange has a website specific to statistics called Cross Validated. Here, you’ll be able to unpack more mathematical and conceptual questions related to statistical data analysis and statistical techniques. From how to analyse ordinal data to how to correctly interpret a correlation coefficient, someone has most likely had your question before, and answered it.
Issue Specific Resources
Whether you want to run a parametric model to find an estimator or want to learn how to wield software to run tests on observational data, you’ll find plenty of resources that explain a specific issue thoroughly. This can mean that you’re looking either for a website dedicated to helping you build knowledge of a specific piece of software, such as Tableau, or for one that helps you understand a specific topic in more depth, such as randomization in clinical trials. Here are some of the most popular sites to explore.
If you’re looking for data visualization help, chances are you either want a comparison between the different types of visualization tools out there or need help using a specific piece of software. In the latter case, you should check out Stack Overflow or Stack Exchange, where you’ll be able to search for solutions to your visualization question for languages like R, Python, C and more. If you’re looking to compare software based on your skill level, ranging from non-techie to advanced programmer, here are some data visualization tools you should check out:
- Datawrapper - for people looking to make a wide range of visualizations without needing to know how to program
- Tableau Public - Tableau is for those with a little more experience looking to make highly customisable graphs, charts, maps and more. While some of Tableau’s features are price-locked, this version should be enough for non-commercial uses. Students can download a more robust version for free for one year!
Whether you’re looking for help with big data and analytics, computer science, engineering or something else, this online tutorial site is a great resource to check out. You’ll find help not only with programming languages but also with topics such as AI or agile software development.
If you learn better with one-on-one tutoring, check out Superprof’s community of over 140,000 maths tutors for everything related to random variables, inferences and more!