per hour for our data, why don't you make a histogram of the residuals
    (that is, the difference between the original hourly entry data and the predicted values).
    Try different binwidths for your histogram.

    Based on this residual histogram, do you have any insight into how our model
    performed?  Reading a bit on this webpage might be useful:

    http://www.itl.nist.gov/div898/handbook/pri/section2/pri24.htm
    """

    plt.figure()
    (turnstile_weather["ENTRIESn_hourly"] - predictions).plot(kind="hist", bins=10)
    plt.title("Histogram of Residuals")
    plt.ylabel("Frequency")
    plt.xlabel("Prediction Error")
    plt.show()

    """
    # QQ Plot
    z = (turnstile_weather['ENTRIESn_hourly'] - np.mean(turnstile_weather['ENTRIESn_hourly']))/np.std(turnstile_weather['ENTRIESn_hourly'])
    stats.probplot(z,dist="norm",plot=plt)
    plt.show()
    """


if __name__ == "__main__":
    file_path = "../data/turnstile_weather_v2.csv"
    file_pointer = open(file_path)
    turnstile_weather = pandas.read_csv(file_pointer)
    plot_residuals(turnstile_weather, predictions(turnstile_weather))
def compute_r_squared(data, predictions):
    '''
    In exercise 5, we calculated the R^2 value for you. But why don't you try and
    and calculate the R^2 value yourself.
    
    Given a list of original data points, and also a list of predicted data points,
    write a function that will compute and return the coefficient of determination (R^2)
    for this data.  numpy.mean() and numpy.sum() might both be useful here, but
    not necessary.

    Documentation about numpy.mean() and numpy.sum() below:
    http://docs.scipy.org/doc/numpy/reference/generated/numpy.mean.html
    http://docs.scipy.org/doc/numpy/reference/generated/numpy.sum.html
    '''
    
    # your code here
    data_avg = np.mean(data)
    partial_denominator = data - data_avg
    denominator = np.sum(partial_denominator*partial_denominator)
    partial_numerator = data - predictions
    numerator = np.sum(partial_numerator*partial_numerator)
    r_squared = 1 - float(numerator/denominator)
    return r_squared

if __name__ == '__main__':
    file_path = "../data/turnstile_weather_v2.csv"
    file_pointer = open(file_path)
    turnstile_weather = pandas.read_csv(file_pointer)
    print(compute_r_squared(turnstile_weather['ENTRIESn_hourly'],predictions(turnstile_weather)))