## Data Modeling: Automated statistical analysis

Nothing fancy here, just some code I worked up for doing statistical analysis for a project I’m working on. This code could be integrated into an algorithm or loop and then used to calculate values on multiple datasets automatically. If you’re a researcher or statistician, it’s not hard to see how something like this would be very useful.

This is v1.0 of the code. For v2.0 (or more likely v3.0), I’m planning to add in data transforms (natural log and square root), normality testing (Anderson-Darling and Kolmogorov-Smirnov), Durbin-Watson, Chi Square, Mann-Whitney U, GEEs, Student t, ANOVA, Box-Cox and a few other analytic tricks such as an algorithm that can decide which test would be most appropriate based on the data and and algorithm that can detect when a data transform is required and then select the proper transform.

The attached file contains two functions for doing automated statistical analysis in MS Access. The code is in VBA but could easily be adapted to another language since most of it is just SQL.

The included functions are as follows. Notice that they return an array variable with multiple data items and you then reference a position in the array for the resultant computation.

Public Function GetStatistics(SelectSQL As String, StDevField As String, Optional StoredQuery As Boolean, Optional SingleValue As Variant) As Variant
‘The function returns: 0 = Sum, 1 = Mean, 2 = Variance, 3 = Standard Deviation, 4 = N, 5 = Skewness, 6 = Kurtosis, 7 = Median

Public Function OLS(SelectSQL As String, XValue As String, YValue As String, Optional StoredQuery As Boolean, Optional SingleValue As Variant) As Variant
‘The function returns: 0 = Slope, 1 = Intercept, 2 = r, 3 = R-Squared 