Friday, 23 June 2017

Statistical Functions in Excel I


In the first post of Statistical Functions in Excel, we will cover a part of the functions that can be used for statistical analysis available in Excel. While there are several functions with a variety of functionality, we will focus on Statistical functions only in this post. It is assumed that the reader will know basics of Statistics and Like before, we will explore these functions in Excel 2016.


Let's take a quick look at some of these functions that are available OOTB on the Status Bar.












If we enter 1 to 4 in cells as shown above and select them, then, we see the Average, Count and Sum shown on the Status Bar on the right bottom of the window. We can customize the measures shown in the Status Bar by right clicking anywhere on the Status Bar and, then, clicking Numerical Count, Minimum and Maximum from the Menu Items shown. This is shown below:




















If we wish to add any more functions, then, we can add the functions in the Status Bar using VBA. But, that is a story for another day. Continuing with the above Menu Items figure, we see above that Average, Count, and Sum are checked. So, we see them in the Status Bar once we select cells containing numbers. Once we click Numerical Count, Minimum and Maximum, we see a tick against these measures as shown below:




















To confirm that these measures appear on the Status Bar, we can select cells as shown below and see that the corresponding measures are shown:












Count gives the count of all cells selected while Numerical Count gives the count of all cells containing numbers only. Min and Max give the minimum and maximum of the selected numbers. Now, that we have seen the OOTB functionality, let us quickly see the Statistical functions in alphabetical order:

AVEDEV: Calculates the average of the absolute deviations of data points from their mean. Mathematically, it can be written as shown below where x is a It can take numbers in the form of arguments, but not exceeding 255 in count, like, AVEDEV(1,2,3,4) that will return 1. If more than 255 arguments are passed, then, we get below error:









A work around is to use enter these values in cells and use cell range like, AVEDEV(A1:A2000). This is applicable to many of the statistical functions that we shall see shortly.

AVERAGE: Calculates the average. We can select cells and, then, insert numbers as shown below:



The result of the function is 2.5 and that compares well with the value in the Status bar as shown below:














AVERAGEA: Calculates the average like the AVERAGE function, but with a twist. While AVERAGE ignores any text, AVERAGEA counts any text as a member contributing zero to the sum. This is clearly seen below where for the selection below, AVERAGE returns 2.5 and AVERAGEA returns 2.222222222.









For the selection below, AVERAGE function returns the same value 2.5 whereas AVERAGEA returns a different value because boolean values are treated as numbers by AVERAGEA. TRUE is treated as 1 and FALSE as 0.








AVERAGEIF: Calculates the average of contents in a range of cells based on some criteria. Example is shown below:













Only those value that qualify the condition that value is not equal to 2 or 3.

AVERAGEIFS: Similar to above function. But, we can give multiple criteria. Example is shown below:











CORREL: Calculates the correlation of two arrays containing the same number of elements. Example is show below:













COUNT: Calculates the number of cells having numbers only. See example below:














COUNTA: Calculates the number of cells having any content. See example below:














The reason COUNTA returns 5 is because cell E4 has a blank space.

COUNTBLANK: Calculates the number of blanks in the cells. See example below:















Following from the last example, result is 1 because cell E4 has a blank space.

COUNTIF: Calculates the number of cells having a content based on a condition like we saw in the case of AVERAGEIF. See example below:














The number of cells containing FOO is counted here.

COUNTIFS: Similar to COUNTIF, but based on more than one condition like AVERAGEIFS.

COVARIANCE.P: Calculates the covariance of a population. See example below:











COVARIANCE.P: Calculates the covariance of a sample. See example below:











DEVSQ: Calculates the sum of the squares of the deviations from the average  See example below:













FORECAST, FORECAST.ETS, FORECAST.ETS.CONFINT, FORECAST.ETS.SEASONALITY, FORECAST.ETS.STAT, and FORECAST.LINEAR are functions that can be used for prediction analysis where future data can be predicted based on historical values.

FREQUENCY: is a wonderful function that can be used to classify raw data into bins. As an example, see below numbers where numbers 1 to 9 and they are classified to 4 bins:

less than 3;
between 3 and 6;
between 6 and 9;
greater than 9;



















Enter the formula as shown below:


















On pressing Enter, the function returns 3 that is in the first bin (less than 3). To get the number of elements in the other bins, after selecting the cell containing the function, drag the selection down such that the selection equals the number of bin. Here, since there are 4 bins, we select 3 cells below the cell containing the function as shown below:


















Once this is done, then, click on F2 and, then, CTRL+Shift+Enter. This will populate the selected cells with the number of elements as shown below:


















GEOMEAN: Calculates the geometric mean of an array of numbers. See example below:











GROWTH: Predicts a value based on the assumption that the growth is exponential.

HARMEAN: Calculates the harmonic mean of an array of numbers. See example below:











INTERCEPT: Calculates the point at which a line will intersect the y-axis when given existing y values and x values. See example below:












KURT: Calculates the kurtosis of an array. See example below:
















LINEST: is a line fitting function and calculates the best straight line.

LOGEST: is a curve fitting function and calculates the best exponential curve.

MAX: Calculates the maximum value from an array of numbers. See example below:














MAXA: Calculates the maximum value from any array of numbers, strings and boolean values.

MAXIFS: Calculates the maximum value in an array of numbers based on a criteria.

MEDIAN: Calculates the median of an array of numbers. See example below:















The rest of the functions will be covered in a following post.