Creating my own header files for statistics


I have always been interested in 'Probability & Statistics', so I went ahead and wrote my own header files to help me solve the problems in the textbook I was studying.

I've developed two header files:
  1. measures.h: Contains the functions that compute the common statistical measures such as mean, median, variance, deviation etc.
  2. data.h: Contains the definitions of two data types, 'univariateData' and 'bivariateData'. They process the data right on input and report details about it. Both are immutable (the stored data can't be edited).
*Note: Don't include measures.h if data.h is already included, because data.h already includes measures.h.
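
To give a rough idea of what functions like these can look like, here is a minimal sketch of a few measures over a plain double[] array. The signatures (an array plus its length) and the choice of the sample variance (dividing by n - 1) are assumptions made for illustration; the actual functions in measures.h may be declared differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// NOTE: hypothetical signatures, not necessarily those used in measures.h.

// Arithmetic mean of n values.
double mean(const double data[], std::size_t n) {
    assert(n > 0);
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) sum += data[i];
    return sum / static_cast<double>(n);
}

// Median: the middle value of the sorted data, or the average of the
// two middle values when n is even.
double median(const double data[], std::size_t n) {
    assert(n > 0);
    std::vector<double> sorted(data, data + n);  // copy, so the caller's array is untouched
    std::sort(sorted.begin(), sorted.end());
    return (n % 2 == 1) ? sorted[n / 2]
                        : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
}

// Sample variance (assumption: divides by n - 1, not n).
double variance(const double data[], std::size_t n) {
    assert(n > 1);
    const double m = mean(data, n);
    double squares = 0.0;
    for (std::size_t i = 0; i < n; ++i) squares += (data[i] - m) * (data[i] - m);
    return squares / static_cast<double>(n - 1);
}

// Standard deviation: square root of the sample variance.
double standardDeviation(const double data[], std::size_t n) {
    return std::sqrt(variance(data, n));
}
```

Copying the array inside median() keeps the input unchanged, which fits well with the idea of immutable data types.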

The cool part is that just 2 lines of code plus the user's input are now enough to analyze any univariate or bivariate data and display the results!


About the data: The given data is about the use of alternative-fueled vehicles in the USA between the years 1992 and 1999. Here, alternative-fueled vehicles are vehicles that use ethanol, methanol, electricity, natural gas or LPG as their source of energy.

Here's the data (in tabular form):


YEAR NUMBER OF VEHICLES
1992 251,352
1993 314,848
1994 324,472
1995 333,049
1996 352,421
1997 367,526
1998 383,847
1999 407,542

Analyzing using the data.h file: 

Clearly, the given data is bivariate, as two variables are recorded for each observation: the year and the number of such vehicles in use in that particular year.

So now we can use the 'bivariateData' data type provided by data.h to store and analyze the data. Furthermore, since the given data is bivariate, we can also construct a regression line and use it to predict values!

Here's the program that I've written along with a snap of the execution:



Snap of execution:








*Note: In the regression equation (in the above snap), YEAR and NO_OF_VEHICLES were swapped. It should read NO_OF_VEHICLES = (number) * YEAR + (number). I fixed this later.
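
For anyone curious how a line of the form NO_OF_VEHICLES = (number) * YEAR + (number) can be obtained, here is a minimal sketch using ordinary least squares on the table above. The names RegressionLine, fitLine, kYears and kVehicles are made up for this illustration and are not necessarily what data.h uses, and least squares is an assumption about the fitting method.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type: y = slope * x + intercept.
struct RegressionLine {
    double slope;
    double intercept;
    double predict(double x) const { return slope * x + intercept; }
};

// Ordinary least-squares fit of y on x:
//   slope     = sum((x - meanX) * (y - meanY)) / sum((x - meanX)^2)
//   intercept = meanY - slope * meanX
RegressionLine fitLine(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size() && x.size() >= 2);
    const double n = static_cast<double>(x.size());

    double meanX = 0.0, meanY = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        meanX += x[i];
        meanY += y[i];
    }
    meanX /= n;
    meanY /= n;

    double sxy = 0.0, sxx = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double dx = x[i] - meanX;
        sxy += dx * (y[i] - meanY);
        sxx += dx * dx;
    }
    assert(sxx > 0.0);  // the x values must not all be identical

    const double slope = sxy / sxx;
    return {slope, meanY - slope * meanX};
}

// The data from the table: YEAR and NUMBER OF VEHICLES.
const std::vector<double> kYears = {1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999};
const std::vector<double> kVehicles = {251352, 314848, 324472, 333049,
                                       352421, 367526, 383847, 407542};
```

Calling fitLine(kYears, kVehicles) on this table gives a slope of roughly 18,891 vehicles per year, and predict() on the returned line then estimates the number of vehicles for any given year.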

So there you have it: the data has been stored and analyzed right on input! The same can be done with univariate data. However, we cannot predict values for univariate data, because a regression line needs two variables and so cannot be constructed for it. We can only analyze it.

So that's it for now. If you'd like to download the header files, here's the link to my Google Drive folder: https://drive.google.com/drive/folders/0B13xGrhLEVkjQm5aWV9IS3lJZ00?usp=sharing

By the way, you can call functions such as mean(), median() and covariance() directly on any double[] array, provided that you pass the correct arguments. Have a look at the comments in the header files for the details.
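
As an illustration of what "the correct arguments" might mean for a two-array function, here is a minimal sketch of a covariance function that takes two double[] arrays and their common length. This signature is an assumption made for the example (as is dividing by n - 1 for the sample covariance); the comments in the header files remain the authority on the real one.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical signature: sample covariance of two arrays of equal length n.
// Assumption: divides by n - 1 (sample covariance), not n.
double covariance(const double x[], const double y[], std::size_t n) {
    assert(n > 1);

    // Means of both arrays.
    double meanX = 0.0, meanY = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        meanX += x[i];
        meanY += y[i];
    }
    meanX /= static_cast<double>(n);
    meanY /= static_cast<double>(n);

    // Sum of the products of the deviations from the means.
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += (x[i] - meanX) * (y[i] - meanY);
    }
    return sum / static_cast<double>(n - 1);
}
```

Since a plain double[] does not carry its own size, the length has to be passed along explicitly, and both arrays must hold at least that many elements.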
