Methods that summarize or describe the characteristics of data are called descriptive statistics. Below, s-star.org presents the transcript of the video Statistics 101: Describing a Categorical Variable.
Hello, and welcome to the next video in my series on basic statistics. If you are new to the channel, welcome.
If you are a returning viewer, great to have you back. When you get to the end of the video, if you liked it, please give it a thumbs up, leave a comment, or share it with other people who you think might benefit from watching it. So let's go ahead and get started. This video is the first in a series about basic descriptive statistics.
The good thing is that the content is not all that difficult. But it will give you a firm foundation to build on as you go further into stats and study more complex topics. Understanding your data is paramount to knowing what kinds of questions you can ask about your data, what kinds of statistical tests you can run on that data, and, of course, how to interpret your findings. So this first video is about summarizing data for a categorical variable.
Now, this video is brought to you by The Great Courses Plus. If you are watching this video, you're here to learn something, and the great thing is that The Great Courses Plus is all about helping you learn many varied things. There's more information in the description below, and at the end of the video I'll talk a bit more about how The Great Courses Plus can help you with your learning.
So first, let's talk about what categorical data actually is. Categorical data use labels, names, or other descriptors to identify exclusive categories, or types of things. That word "exclusive" is important. Here are some examples. Let's say region: maybe a region of a country, state, or territory. Common ones are north, south, east, and west.
Now, you could have others like northeast, southwest, and so forth. But let's say you have an office or a factory and you want to label where it's located. In this case, it cannot be in both north and east.
The way these categories are set up, you'd have to create a different category called northeast. That's what we mean by exclusive. How about a machine in a factory? We could have machine one, machine two, and machine three. Or a car make: a Ford, a Toyota, a Lamborghini, or my favorite, a Koenigsegg.
I'm sure you've also heard about quantitative data. In comparison to categorical data, quantitative data are numerical values that represent a frequency, a measurement, or something else like that. Let's look at our regions above: north, south, east, and west. We could have a quantitative data point for sales in each of those regions: maybe 12 million for north, 35 million for south, 104 million for east, and 88 million for west, for example. Likewise, the three machines above could have production of 983, 1,085, and 899 units.
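To make the pairing of an exclusive category with a quantitative value concrete, here is a minimal Python sketch using the region sales and machine production figures from the example. Treating the sales as "millions" with no currency, and using an `Enum` named `Region`, are illustrative assumptions rather than anything from the video. The enum enforces exclusivity: each value is exactly one label, and "northeast" is rejected until someone defines it as its own category.

```python
from enum import Enum

# Categorical variable: each office gets exactly one of these exclusive labels.
class Region(Enum):
    NORTH = "north"
    SOUTH = "south"
    EAST = "east"
    WEST = "west"

# Quantitative variable: sales figures (assumed to be in millions) paired with
# each region label, using the example numbers from the video.
sales_millions = {
    Region.NORTH: 12,
    Region.SOUTH: 35,
    Region.EAST: 104,
    Region.WEST: 88,
}

# Production units for the three machines, keyed by a categorical label.
units_by_machine = {"machine 1": 983, "machine 2": 1085, "machine 3": 899}

# Arithmetic is meaningful on the quantitative values...
total_sales = sum(sales_millions.values())
print("total sales (millions):", total_sales)

# ...while a label can only be one of the defined categories.
print(Region("east"), sales_millions[Region("east")])
try:
    Region("northeast")  # not defined, so it would need its own category
except ValueError:
    print("northeast is not one of the defined categories")
```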
It's important to point out that, again, those quantitative values go with the exclusive categories above. Or we could have the top speed, in miles per hour, of the fastest car of each make. The fastest Ford, the Ford GT, will go 216 miles an hour. The fastest Toyota ever made, which I believe is the Supra Twin Turbo (I'm a car person, if you can't tell), will go 156. The Lamborghini Aventador will go 217 miles an hour, and the Koenigsegg Agera RS just set a record of 286 miles per hour here in the US. So you can see that the categories above are exclusive, and those are what we're talking about in this video. Later we'll talk more about quantitative data, which are numerical values. It's fundamental to understand the difference between the two types of data as we move forward.
Here are some fake data about smartphone users in the United States. Let's say we did a fictitious study of 100 smartphone users in the US and asked them who made their primary smartphone. We say primary because some people have more than one. This is what we have: a mix of Apple, Samsung, LG, Motorola, and Other, which captures everything else. This is actually approximately the distribution of cell phone makes here in the US.
Now, the question for this video is how we make sense of this. Right now it's just a bunch of rectangles with brands, or makes of phones, in them. So let's make some sense of it. The first thing we could do, and it's very simple, is create what's called a frequency distribution. Another way of saying that is we just count them. We go back to the data on the previous slide and count how many people had each make of phone: Apple, 45; HTC, 2; LG, 10; Motorola, 4; Samsung, 28; and other brands, 11. As we said, we had 100 observations on the previous slide, so when we add up our frequencies, they had better add up to 100, or we made some type of error. Always check that your frequencies total the number of observations you had. But again, this is very simple: count them up and write each frequency in the right-hand column.
Now, to visualize this, we can make a frequency bar chart, again not all that complicated. Along the bottom, on the x axis, we have the brand of phone: Apple, HTC, and so on. On the left, on the y axis, we have the frequency. Then all we do is create a simple bar chart so we can visualize the distribution of phones among these brands. Now, I do want to caution you here. You've probably heard of something else called a histogram, which looks very similar to a bar chart. But there are some differences. A histogram is for quantitative data.
The charts also look different: on a histogram, there is no space between the bars. All we're doing here is a basic bar chart that counts up the frequency of each phone brand, with space between the bars. You can do that in Excel or any other stats program.
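The counting step, the check that the frequencies add up to the number of observations, and the frequency bar chart can be sketched in Python. The video only gives the counts, not the individual responses, so the 100 responses below are rebuilt from those counts. `text_bar_chart` is a hypothetical helper that prints a text chart to avoid a plotting dependency. Each category gets its own separate bar, as in a bar chart, rather than the adjacent bins of a histogram.

```python
from collections import Counter

# Rebuild the 100 fictitious responses from the counts given in the video
# (the individual responses are not shown, so this list is an assumption).
responses = (["Apple"] * 45 + ["HTC"] * 2 + ["LG"] * 10
             + ["Motorola"] * 4 + ["Samsung"] * 28 + ["Other"] * 11)

# Frequency distribution: count the observations in each category.
freq = Counter(responses)

# Sanity check from the video: the frequencies must add up to the number of
# observations, otherwise a response was dropped or double-counted.
assert sum(freq.values()) == len(responses) == 100

def text_bar_chart(counts, order):
    """Return one text line per category, with bar length equal to its count."""
    label_width = max(len(label) for label in order)
    lines = []
    for label in order:
        bar = "#" * counts[label]
        lines.append(f"{label:<{label_width}} | {bar} {counts[label]}")
    return lines

brands = ["Apple", "HTC", "LG", "Motorola", "Samsung", "Other"]
for line in text_bar_chart(freq, brands):
    print(line)
```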
Now we could also talk about another measure called the relative frequency, and again, it's not all that difficult. The relative frequency of a class (one of the categories of the variable) is the frequency of that class divided by the total number of observations. For example, in this case, the relative frequency of Samsung is 28 observations out of 100. There were 28 people in our data who had a Samsung phone out of 100 people total, and therefore the relative frequency is 0.28.
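The same calculation can be written as a formula. Here $f_i$ is the frequency of class $i$ and $n$ is the total number of observations; this notation is assumed for illustration and is not from the video. Because the frequencies sum to $n$, the relative frequencies always sum to 1:

```latex
\text{relative frequency}_i = \frac{f_i}{n},
\qquad \text{e.g. Samsung: } \frac{28}{100} = 0.28,
\qquad \sum_i \frac{f_i}{n} = \frac{n}{n} = 1
```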
Now we can create a relative frequency distribution, which is very similar to the other chart we created. Here we have our smartphone brand, our frequency (what we did a couple of slides ago), and then the relative frequency for each one. I'll admit I made this very easy by having 100 observations. That's not always going to be the case; in fact, it will rarely be the case. But it's still the frequency divided by the total number of observations, and that gives you your relative frequency. In this case, 45 people out of 100 had an Apple phone, and 45 divided by 100 is 0.45. You can see how that goes down the chart; very straightforward. Now we can do the same thing for a relative frequency bar chart. It looks exactly the same. The only difference is that on the left-hand side, on the y axis, we have the relative frequency instead of the actual frequency counts.
But the basic idea is the same: very straightforward, very simple. With other data, where the number of observations isn't 100, the numbers will look a bit different.
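Here is a minimal Python sketch of the relative frequency table. It takes $n$ from the counts themselves, so it works whether or not there are exactly 100 observations. `relative_frequencies` is a hypothetical helper name. The second call uses a hypothetical subset (only the Apple and Samsung users, $n = 73$) drawn from the video's counts to show a case where the denominator is not 100.

```python
from collections import Counter

def relative_frequencies(counts):
    """Divide each class frequency by the total number of observations n."""
    n = sum(counts.values())
    if n == 0:
        raise ValueError("no observations to summarize")
    return {label: count / n for label, count in counts.items()}

# Frequencies from the smartphone example in the video (n = 100).
phone_counts = Counter({"Apple": 45, "HTC": 2, "LG": 10,
                        "Motorola": 4, "Samsung": 28, "Other": 11})

table = relative_frequencies(phone_counts)
for brand, rel in table.items():
    # brand, frequency, relative frequency
    print(f"{brand:<8} {phone_counts[brand]:>3} {rel:.2f}")

# Hypothetical subset: only the Apple and Samsung users, so n = 73, not 100.
subset = relative_frequencies(Counter({"Apple": 45, "Samsung": 28}))
print({brand: round(rel, 3) for brand, rel in subset.items()})
```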
But you can see how it works. Now let's talk about pie charts for a second. This is what it would look like if we did a pie chart for our data.
In this example, if you notice, it's very difficult to read. Pie charts should generally not be used; the only time you should really use one is when you have only two categories. It's very hard to judge proportions in a pie chart when you have a lot of different categories. So don't use pie charts, and definitely don't use any 3D charts unless it's absolutely necessary. When you're in Excel or something else and you see all those cool 3D charts, don't use them, and definitely don't use a 3D pie chart. That is my soapbox on pie charts.
This video is brought to you by The Great Courses Plus, where you can get unlimited access to over 8,000 different video lectures taught by award-winning professors from the Ivy League and other top schools around the world. You can learn about anything that interests you: science, literature, and, yes, statistics. One example is the lecture "Data and Distributions: Getting the Picture" from Professor Michael Starbird's course "Meaning from Data: Statistics Made Clear." Right now, The Great Courses Plus is offering my viewers a free trial. Go to thegreatcoursesplus.com/brandonfoltz to access the 8,000-lecture video library, or click the link in the description below.
Okay, that wraps up our first video on descriptive statistics, where we talked about how to summarize data for a categorical variable. Again, it's very simple and very straightforward, but it's a building block for what we're going to do going forward. So thank you very much for watching.