title: Advanced Statistics
subtitle: DLMDSAS01
authors: Prof. Dr. Unknown
publisher: IU International University of Applied Sciences
date: 2023
Our learning objectives are as follows:
p. 44
Data analysis starts with the collection of raw data. Assuming no issues with data quality, we can visualize the data next. However, raw data points alone cannot tell us much. Summarizations of the dataset can help describe its main characteristics; these are called descriptive statistics.
We define the arithmetic mean:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
This is helpful, but not a lot of information. Other useful metrics include those that quantify the dispersion, i.e. how widely the data are spread out, and those that quantify the shape of the distribution, e.g. whether the data are symmetric or not.
Note: if you reduce the description of data to a few key metrics, you lose a lot of details that may be critical to understanding the data. Don’t rely only on metrics.
Descriptive statistics used to describe samples are aptly called sample statistics. A population typically cannot be measured directly, so we want to infer the population metrics from the sample(s).
Large samples typically begin to reflect the behaviour of the population. Statisticians then try to infer or model the underlying probability distribution.
Given a vector of data points $\mathbf{x} = (x_1, x_2, \dots, x_n)$, the arithmetic mean is given by:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
We call it the arithmetic mean because it is a sum of all data points. This is the most common, but there are other variations. The geometric mean is defined as:

$$\bar{x}_{\text{geom}} = \left(\prod_{i=1}^{n} x_i\right)^{1/n}$$
The geometric mean is often suited to describe growth or growth rates. We can also define the harmonic mean as follows:

$$\bar{x}_{\text{harm}} = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$$
This is like an upside-down arithmetic mean and is used when we describe rates or ratios. Finally, there is the root mean square:

$$x_{\text{RMS}} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} x_i^2}$$
Yes, the divisor sits inside the square root as well. This mean is used in electrical engineering, or to compare model predictions to observed values.
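As a quick check of these four definitions, here is a minimal Python sketch; the function names and the sample data are my own, not from the book:

```python
import math

def arithmetic_mean(xs):
    # Sum of all points divided by the count.
    return sum(xs) / len(xs)

def geometric_mean(xs):
    # n-th root of the product; assumes all values are positive.
    return math.prod(xs) ** (1 / len(xs))

def harmonic_mean(xs):
    # Count divided by the sum of reciprocals (the "upside-down" mean).
    return len(xs) / sum(1 / x for x in xs)

def root_mean_square(xs):
    # Square root of the mean of the squares; note the divisor
    # sits inside the square root.
    return math.sqrt(sum(x * x for x in xs) / len(xs))

data = [1.0, 2.0, 4.0, 8.0]
print(arithmetic_mean(data))   # 3.75
print(geometric_mean(data))    # ~2.83
print(harmonic_mean(data))     # ~2.13
print(root_mean_square(data))  # ~4.61
```

For positive data these satisfy the classic ordering $\bar{x}_{\text{harm}} \le \bar{x}_{\text{geom}} \le \bar{x} \le x_{\text{RMS}}$, which the printed values illustrate.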
These are all forms of expected value, as if the probability of each event were the same. Now consider a discrete random variable $X$ whose events follow the probability mass function $f$:

$$E[X] = \sum_{i} x_i \, f(x_i)$$
This is like a weighted average: we multiply each $x_i$ with its associated probability.
Extend that concept to a continuous random variable $X$ with density function $f$:

$$E[X] = \int_{-\infty}^{\infty} x \, f(x) \, dx$$
Again, this is like a sum of products of points and probabilities, just with a near-infinite number of points.
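A small Python sketch of both computations; the fair-die pmf and the uniform density are my own example choices, not from the book:

```python
# Expected value as a probability-weighted sum (discrete) and a
# Riemann-sum approximation of the integral (continuous).

# Discrete: fair six-sided die, pmf f(x) = 1/6 for x in 1..6.
values = [1, 2, 3, 4, 5, 6]
pmf = [1 / 6] * 6
e_discrete = sum(x * p for x, p in zip(values, pmf))
print(e_discrete)  # 3.5

# Continuous: uniform density f(x) = 1 on [0, 1], approximated
# with many small rectangles ("a near-infinite number of points").
n = 1_000_000
dx = 1 / n
e_continuous = sum((i * dx) * 1.0 * dx for i in range(n))
print(round(e_continuous, 3))  # ~0.5
```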
We will shift quickly into transformed random variables. Suppose we transform the values of $X$ with a given function $g$:

$$E[g(X)] = \int_{-\infty}^{\infty} g(x) \, f(x) \, dx$$
A stricter notation would be $E[g \circ X]$, because it is a composition of mappings from the sample space into another space.
We will now provide another rule you can probably discover yourself (take $g(x) = ax + b$ above):

$$E[aX + b] = a\,E[X] + b$$
The book goes into an example with the exponential distribution:

$$f(x) = \lambda e^{-\lambda x}$$

where $\lambda > 0$ and $x \ge 0$, and $f(x) = 0$ for $x < 0$. Also, $E[X] = \frac{1}{\lambda}$. Finding the expected value may seem difficult, but use integration by parts.
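Sketching that integration by parts, with $u = x$ and $dv = \lambda e^{-\lambda x}\,dx$, so $v = -e^{-\lambda x}$:

$$
\begin{aligned}
E[X] &= \int_0^\infty x \,\lambda e^{-\lambda x}\,dx
  = \Big[-x\,e^{-\lambda x}\Big]_0^\infty + \int_0^\infty e^{-\lambda x}\,dx \\
 &= 0 + \Big[-\tfrac{1}{\lambda}\,e^{-\lambda x}\Big]_0^\infty
  = \frac{1}{\lambda}.
\end{aligned}
$$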
p. 49
The median is literally the middle number of an ordered sample. You may be given a more formal definition:

$$\tilde{x} = \begin{cases} x_{\left(\frac{n+1}{2}\right)} & n \text{ odd} \\[4pt] \frac{1}{2}\left(x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}\right) & n \text{ even} \end{cases}$$

where $x_{(k)}$ denotes the $k$-th value of the ordered sample.
For discrete distributions, the median is defined as the value $x_{\text{med}}$ satisfying:

$$P(X \le x_{\text{med}}) \ge \frac{1}{2} \quad \text{and} \quad P(X \ge x_{\text{med}}) \ge \frac{1}{2}$$
It is the halfway point of the distribution. In the continuous case:

$$\int_{-\infty}^{x_{\text{med}}} f(x)\,dx = \frac{1}{2}$$
Generally speaking, the mean and median are the same for symmetric distributions. They begin to deviate for asymmetric probability distributions though. We say that the median is more robust compared to the mean. This means that the median is less sensitive to outliers or behaviour in the tails of the distributions.
We can illustrate with an example. Suppose we have the following set of numbers: [1, 2, 3, 4, 5]. Both the mean and the median are 3. However, let’s append an outlier to our set: [1, 2, 3, 4, 5, 42]. The mean is now 9.5, but the median is only 3.5.
These are calculations on plain samples of numbers, not involving probability distributions yet.
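The same example as a quick sanity check with Python’s standard library:

```python
from statistics import mean, median

sample = [1, 2, 3, 4, 5]
print(mean(sample), median(sample))  # 3 3

# Append an outlier: the mean jumps, the median barely moves.
sample.append(42)
print(mean(sample), median(sample))  # 9.5 3.5
```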
p. 52
The $p$-quantile is a point $x_p$ where a fraction $p$ of the points lie below it, and $1-p$ lie above. The median is a quantile with $p = \frac{1}{2}$. The formal definition looks like:

$$P(X \le x_p) \ge p \quad \text{and} \quad P(X \ge x_p) \ge 1 - p$$
What are quantiles used for?
They can be used to describe the overall shape of a distribution.
The cumulative distribution function (CDF) for a probability distribution is defined as:

$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$$
It is the probability of a value being less than or equal to $x$. We can then say that the median is given by:

$$x_{\text{med}} = F^{-1}\!\left(\tfrac{1}{2}\right)$$
$F^{-1}$ is the inverse of the cumulative distribution function. It may not always be defined, but it can be proven to exist when the density function is continuous and its domain of definition is an interval.
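As a concrete example, for the exponential distribution from earlier the inverse CDF, and hence the median, can be written in closed form:

$$F(x) = 1 - e^{-\lambda x}
\quad\Rightarrow\quad
F^{-1}(p) = -\frac{\ln(1-p)}{\lambda}
\quad\Rightarrow\quad
x_{\text{med}} = F^{-1}\!\left(\tfrac{1}{2}\right) = \frac{\ln 2}{\lambda}$$

Note that $\ln 2/\lambda < 1/\lambda$, so the median sits below the mean, as expected for a right-tailed distribution.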
The mode is the most likely value to occur, or just the one that has the most occurrences. It specifies the highest point of a probability distribution.
Mathematically, you have:

$$x_{\text{mode}} = \operatorname*{arg\,max}_x f(x)$$
The mode is not considered a stable location parameter; even small changes in the distribution can shift the mode noticeably. If several values are tied for the most occurrences, there are multiple modes.
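Python’s statistics module handles the multi-mode case directly; a minimal sketch with toy data of my own:

```python
from statistics import mode, multimode

print(mode([1, 2, 2, 3]))          # 2
print(multimode([1, 1, 2, 2, 3]))  # [1, 2] -- two tied modes
```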
The mean, median, and mode only provide a bit of information. What about how “wide” the distribution is, or how symmetric a distribution is?
The variance is a dispersion parameter and measures how much the values fluctuate around the mean.
The sample variance (see Sample Variance | Statistics How To) is defined as:

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$
I think the book accidentally calls it sample variance but means population variance, or just variance:

$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2$$
Why is the sample variance denominator slightly smaller? It has to do with biased and unbiased estimates of the population statistic; this is called Bessel’s Correction | Wiki. In a sample, we do not know the population mean, so all of our calculations are done with the sample mean. You can think of it in terms of degrees of freedom in the residuals vector (not the errors): because we use the sample mean, the residuals must sum to zero. All residuals are free to be what they want to be, except one, which brings the sum to zero.
The standard deviation | wiki is a bit different: taking $s = \sqrt{s^2}$ reintroduces bias, and there is no one-size-fits-all correction. So we typically go about our day with a biased estimator. For large samples, however, the bias is small.
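To see Bessel’s correction in action, here is a small simulation sketch; the parameters are my own, and the population is standard normal, so the true variance is 1:

```python
import random

# Compare the 1/n variance estimator against Bessel's 1/(n-1).
random.seed(0)
n, trials = 5, 100_000
biased_sum = unbiased_sum = 0.0
for _ in range(trials):
    xs = [random.gauss(0, 1) for _ in range(n)]
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)
    biased_sum += ss / n
    unbiased_sum += ss / (n - 1)

print(biased_sum / trials)    # ~0.8 (underestimates; expected (n-1)/n = 0.8)
print(unbiased_sum / trials)  # ~1.0 (unbiased)
```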
With some effort, you can prove:

$$\operatorname{Var}(X) = E[X^2] - (E[X])^2$$
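A sketch of that proof, expanding the square and using linearity of expectation with $\mu = E[X]$:

$$
\begin{aligned}
\operatorname{Var}(X) &= E\big[(X-\mu)^2\big]
  = E\big[X^2 - 2\mu X + \mu^2\big] \\
 &= E[X^2] - 2\mu\,E[X] + \mu^2
  = E[X^2] - (E[X])^2.
\end{aligned}
$$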
Now we dive into variance of probability distributions:

$$\operatorname{Var}(X) = \int_{-\infty}^{\infty} (x - \mu)^2 \, f(x)\,dx$$
which is nearly the same for discrete distributions:

$$\operatorname{Var}(X) = \sum_{i} (x_i - \mu)^2 \, f(x_i)$$
The variance is an example of a more general family of quantities called moments.
A good introduction to moments, and how I learned them back in the day, is Moment-generating function | Wiki. We transform our value through some function $g$.
The book takes $g(x) = x^n$, such that the $n$-th (raw) moment is:

$$E[X^n] = \int_{-\infty}^{\infty} x^n \, f(x)\,dx$$
The central moment is then defined as:

$$\mu_n = E\big[(X - \mu)^n\big]$$

which really just means $g(x) = (x - \mu)^n$.
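A minimal Python sketch of raw and central sample moments; the helper names and toy data are mine:

```python
def raw_moment(xs, n):
    # E[X^n] with equal weights, i.e. g(x) = x^n.
    return sum(x ** n for x in xs) / len(xs)

def central_moment(xs, n):
    # E[(X - mu)^n], i.e. g(x) = (x - mu)^n.
    mu = raw_moment(xs, 1)
    return sum((x - mu) ** n for x in xs) / len(xs)

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(raw_moment(data, 1))      # 5.0  (the mean)
print(central_moment(data, 2))  # 4.0  (the population variance)
```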
An important application of the moments is that a probability distribution is defined by all its moments. So if we know all of the moments of a distribution, we can recreate the distribution.
Moments are an interesting topic and probably used more extensively at higher levels. But even in actuarial science, I merely learned about them but never actually worked with them.
p. 60
Skewness is a measure of how symmetric a distribution is. If the skewness is negative, the distribution has a tail to the left; if positive, a tail to the right; and if 0, it is perfectly symmetric. Mathematically, it is expressed as:

$$\operatorname{skew}(X) = E\!\left[\left(\frac{X - \mu}{\sigma}\right)^{3}\right] = \frac{\mu_3}{\sigma^3}$$
I like how we bring moments, here the third central moment $\mu_3$, into our expression.
An interesting note: if the skewness is negative (left tail), then we probably have mean $<$ median; the reverse also holds. The images in the book give examples. The tail strongly affects the mean and only mildly affects the median.
Skewness | Wiki is a great article that also tackles moments in skewness.
The book only lists the latter of the next two equations, but for a sample of $n$ values, two natural estimators of the population skewness are:

$$b_1 = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^3}{\left[\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2\right]^{3/2}}$$

or

$$g_1 = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^3}{\left[\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2\right]^{3/2}}$$
The latter is a Method of Moments | Wiki estimator, which estimates population parameters from sample moments. It is an easy(ish) way to derive simple and consistent estimators; however, these estimators are often biased.
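Here is a sketch of both estimators side by side; the names b1 and g1 follow the Wiki article, and the toy data is mine:

```python
import math

def skew_estimators(xs):
    n = len(xs)
    xbar = sum(xs) / n
    m2 = sum((x - xbar) ** 2 for x in xs) / n
    m3 = sum((x - xbar) ** 3 for x in xs) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
    b1 = m3 / s ** 3     # uses the (n-1)-based standard deviation
    g1 = m3 / m2 ** 1.5  # method-of-moments estimator
    return b1, g1

print(skew_estimators([1, 2, 3, 4, 5, 42]))  # both positive: right tail
```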
So, if skewness tells you which side the tail is on, then Kurtosis will tell you how pronounced the tails of a distribution are. Kurtosis | Wiki is a measure of the “tailedness” of a probability distribution. Different measures of this value have different interpretations. The standard measure, from Karl Pearson, is the scaled version of the fourth moment of the distribution.
That is from Wiki, but the book gives a similar definition:

$$\operatorname{kurt}(X) = E\!\left[\left(\frac{X - \mu}{\sigma}\right)^{4}\right] = \frac{\mu_4}{\sigma^4}$$

The kurtosis of the standard normal distribution is $3$.
Be careful using kappa to mean kurtosis. The Cumulant | Wiki is like an alternative to a moment of a distribution, and is known to use kappa. The course book does not cover cumulants though.
The book provides this discrete calculation here:

$$\operatorname{kurt} = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^4}{\left(\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2\right)^{2}}$$

A high value of kurtosis indicates the existence of outliers in the sample. Excess kurtosis is calculated by subtracting $3$.
As you may have noticed, the general formula for these standardized moments is:

$$E\!\left[\left(\frac{X - \mu}{\sigma}\right)^{n}\right] = \frac{\mu_n}{\sigma^n}$$

Higher-order quantities beyond kurtosis are rarely used in practice.
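A quick simulation sketch using the discrete formula above; for a standard-normal sample (my own choice) the kurtosis should come out near 3, and the excess kurtosis near 0:

```python
import random

random.seed(1)
xs = [random.gauss(0, 1) for _ in range(100_000)]
n = len(xs)
xbar = sum(xs) / n
m2 = sum((x - xbar) ** 2 for x in xs) / n
m4 = sum((x - xbar) ** 4 for x in xs) / n
kurt = m4 / m2 ** 2
print(kurt)      # ~3.0 for a normal sample
print(kurt - 3)  # excess kurtosis, ~0.0
```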
These metrics are helpful for gaining insight into the behaviour of a sample or distribution. However, they neglect many details, so do not rely too heavily on them; the course book goes on to show why.
The examples are interesting in that the metrics appear similar, if not the same, yet the regression line really only fits one set of points.
Descriptive statistics are a valuable tool in any statistician’s toolbox. However, don’t forget to visualize your data as well.
Consider the RV . What is the sample mean, median, mode, and variance?
The sample mean is . The quiz calls it “variable expectation”, which I cannot find in the course book. That term suggests the expected value of the population; we only have a sample, so the terminology is not quite correct.
The sample median is 8.
Sample mode is 11.
The sample variance would require the sample mean $\bar{x}$ and the $n-1$ denominator from earlier.
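A sketch of the whole calculation with the statistics module; the actual quiz sample is elided in these notes, so the list below is a hypothetical stand-in chosen only to match the median of 8 and mode of 11 quoted above:

```python
from statistics import mean, median, mode, variance

xs = [2, 5, 8, 11, 11]  # hypothetical stand-in data, not the quiz sample
print(mean(xs))      # 7.4  -- sample mean
print(median(xs))    # 8    -- middle of the ordered sample
print(mode(xs))      # 11   -- most frequent value
print(variance(xs))  # 15.3 -- sample variance with the n-1 denominator
```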
Suppose now we have random variables $X$ and $Y$, which are independent and identically distributed RVs. Apparently .
The quiz goes on to claim several assertions to be true.
The other questions involve a percentile (just remember that the median is the 50th percentile), something weird about the standard deviation that I’m not sure I believe, and skewness.
If $X$ has skewness $-a$ and $Y$ has skewness $+a$, how can we compare them?
Their distributions are not necessarily mirror images of each other, nor symmetric. However, because $X$ has a negative skew, it is left-tailed, meaning it will tend to have higher values compared to $Y$, which has a right tail. This assumes, of course, that they are bound to roughly the same interval of numbers.