Example #1
def display_usage_stats():
    trip_data = pd.read_csv('201309_trip_summary.csv')
    usage_stats(trip_data)
    usage_plot(trip_data, 'subscription_type')
    usage_plot(trip_data,
               'duration', ['duration < 60'],
               boundary=0,
               bin_width=5)
Example #2
def display_new_usage_stats():
    trip_data = pd.read_csv('babs_y1_y2_summary.csv')
    usage_stats(trip_data)
    usage_plot(trip_data,
               'start_month', ['start_month < 12'],
               boundary=1,
               bin_width=1)
    usage_plot(trip_data,
               'start_hour', ['start_hour < 23'],
               boundary=0,
               bin_width=1)
# TODO: plot a bar chart showing the number of trips per subscription_type.
# Remember that when .plot is used, the chart type can be chosen with
# the kind parameter, e.g. plot(kind='bar').
# Count rows per subscription type; after count(), every remaining column holds the row count.
data = trip_data.groupby('subscription_type', as_index=False).count()
# x and y must be column labels, not lists of labels.
graphic = data.plot.bar(x='subscription_type', y='duration')
graphic.set_title('Number of Trips by Subscription Type')
graphic.set_xlabel('Subscription Type')
graphic.set_ylabel('Number of Trips')
graphic.legend(['Number of Trips'])

# So that you can check whether your charts are correct, we will use the `usage_plot()` function. The second argument of the function lets us count trips across a selected variable, displaying the information in a chart. The expression below shows what your chart above should look like.

# In[16]:

# What your chart should look like; the line below draws the reference plot.
usage_plot(trip_data, 'subscription_type')

# >*Note*: Your chart probably does not look exactly the same, mainly in its title and axis labels. These are details, but they make all the difference when you present the charts you have analyzed. This Nanodegree will not focus on these matters, but keep in mind that getting your charts right is extremely important.

# It looks like subscribers made about 50% more trips in the first month than other customer types. Let's try another variable. What does the distribution of trip durations look like?

# In[17]:

# TODO: plot a chart based on the trip durations
plt.hist(trip_data['duration'])
plt.xlabel('Duration')
plt.ylabel('Number of Trips')
plt.title('Number of Trips by Duration')
plt.show()

# In[18]:
trip_in = ['201309_trip_data.csv']
trip_out = '201309_trip_summary.csv'
summarise_data(trip_in, station_data, trip_out)

# Load in the data file and print out the first few rows
sample_data = pd.read_csv(trip_out)
display(sample_data.head())

# Verify the dataframe by counting data points matching each of the time features.
question_3(sample_data)

trip_data = pd.read_csv('201309_trip_summary.csv')

usage_stats(trip_data)

usage_plot(trip_data, 'subscription_type')

usage_plot(trip_data, 'duration')

usage_plot(trip_data, 'duration', ['duration < 60'])

usage_plot(trip_data, 'duration', ['duration < 60'], boundary=0, bin_width=5)
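
# The `boundary` and `bin_width` arguments above control how the histogram bins are laid out.
# The implementation inside `babs_visualizations.py` is not shown here, so the sketch below is
# only an assumption about their meaning: `boundary` is a value that one bin edge must fall on,
# and `bin_width` is the size of every bin. The helper names `bin_edges` and `bin_counts` and
# the sample durations are hypothetical.

```python
import math


def bin_edges(values, boundary, bin_width):
    """Return evenly spaced bin edges aligned to `boundary` that cover all values."""
    # Shift the lowest edge down to the nearest multiple of bin_width from boundary.
    steps_below = math.floor((min(values) - boundary) / bin_width)
    lowest = boundary + steps_below * bin_width
    edges = [lowest]
    # Keep adding edges until the last edge lies strictly above the largest value.
    while edges[-1] <= max(values):
        edges.append(lowest + len(edges) * bin_width)
    return edges


def bin_counts(values, edges):
    """Count how many values fall into each half-open bin [edge, next_edge)."""
    width = edges[1] - edges[0]
    counts = [0] * (len(edges) - 1)
    for value in values:
        counts[int((value - edges[0]) // width)] += 1
    return counts


# Hypothetical trip durations in minutes, all below the 60-minute filter.
durations = [3, 7, 12, 58]
edges = bin_edges(durations, boundary=0, bin_width=5)
counts = bin_counts(durations, edges)
print(edges)
print(counts)
```

# With these settings every bar covers a five-minute interval that starts on a multiple of five,
# which makes the axis ticks easier to read than automatically chosen bins.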

station_data = [
    '201402_station_data.csv', '201408_station_data.csv',
    '201508_station_data.csv'
]
trip_in = [
    '201402_trip_data.csv', '201408_trip_data.csv', '201508_trip_data.csv'
]
trip_out = 'babs_y1_y2_summary.csv'
# Now that you have some data saved to a file, let's look at some initial trends in the data. Some code has already been written for you in the `babs_visualizations.py` script to help summarize and visualize the data; this has been imported as the functions `usage_stats()` and `usage_plot()`. In this section we'll walk through some of the things you can do with the functions, and you'll use the functions for yourself in the last part of the project. First, run the following cell to load the data, then use the `usage_stats()` function to see the total number of trips made in the first month of operations, along with some statistics regarding how long trips took.

# In[14]:

trip_data = pd.read_csv('201309_trip_summary.csv')

usage_stats(trip_data)


# You should see that there are over 27,000 trips in the first month, and that the average trip duration is larger than the median trip duration (the point where 50% of trips are shorter, and 50% are longer). In fact, the mean is larger than the 75th percentile: more than 75% of trips are shorter than the average. This will be interesting to look at later on.
# 
# Let's start looking at how those trips are divided by subscription type. One easy way to build an intuition about the data is to plot it. We'll use the `usage_plot()` function for this. The second argument of the function allows us to count up the trips across a selected variable, displaying the information in a plot. The expression below will show how many customer and how many subscriber trips were made. Try it out!

# In[15]:

usage_plot(trip_data, 'subscription_type')
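
# For a categorical variable, the bar heights in a plot like the one above are just the number
# of rows per distinct value. The sketch below reproduces that count with the standard library
# only. It is not the code inside `usage_plot()`; the helper `count_by` and the sample rows are
# hypothetical stand-ins for the contents of the summary CSV.

```python
from collections import Counter

# Hypothetical sample rows; the real ones come from 201309_trip_summary.csv.
trips = [
    {"subscription_type": "Subscriber", "duration": 8.2},
    {"subscription_type": "Customer", "duration": 41.0},
    {"subscription_type": "Subscriber", "duration": 11.5},
    {"subscription_type": "Subscriber", "duration": 6.9},
    {"subscription_type": "Customer", "duration": 95.3},
]


def count_by(rows, key):
    """Count how many rows fall under each distinct value of `key`."""
    return Counter(row[key] for row in rows)


counts = count_by(trips, "subscription_type")
for label, n_trips in counts.most_common():
    print(label, n_trips)
```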


# It seems that subscribers made about 50% more trips in the first month than customers. Let's try a different variable now. What does the distribution of trip durations look like?

# In[16]:

usage_plot(trip_data, 'duration')
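
# The gap between the mean and the median reported by `usage_stats()` is what a long right tail
# produces. The sketch below illustrates the effect on made-up durations (they are hypothetical,
# not values from the data set): a single very long trip lifts the mean above the 75th
# percentile while the median barely moves.

```python
import statistics

# Hypothetical trip durations in minutes: mostly short rides plus one very long one.
durations = [5, 7, 8, 9, 10, 11, 12, 14, 20, 1000]

mean = statistics.mean(durations)
median = statistics.median(durations)
# quantiles(n=4) returns the three quartile cut points; index 2 is the 75th percentile.
q75 = statistics.quantiles(durations, n=4)[2]

print("mean:", mean)
print("median:", median)
print("75th percentile:", q75)
```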


# Looks pretty strange, doesn't it? Take a look at the duration values on the x-axis. Most rides are expected to be 30 minutes or less, since there are overage charges for taking extra time in a single trip. The first bar spans durations up to about 1000 minutes, or over 16 hours. Based on the statistics we got out of `usage_stats()`, we should have expected some very long trips that pull the average so far above the median: the plot shows this in a dramatic, but unhelpful way.
# 
# When exploring the data, you will often need to work with visualization function parameters in order to make the data easier to understand. Here's where the third argument of the `usage_plot()` function comes in. Filters can be set for data points as a list of conditions. Let's start by limiting things to trips of less than 60 minutes.

# In[17]:

usage_plot(trip_data, 'duration', ['duration < 60'])
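
# The third argument above is a list of condition strings. How `usage_plot()` parses them is not
# shown in this listing, so the sketch below assumes each condition has the form
# "field operator value" separated by spaces, with a numeric value. The helper `apply_filters`,
# the `OPS` table and the sample rows are hypothetical.

```python
import operator

# Map the textual comparison operators to the matching functions.
OPS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
    "!=": operator.ne,
}


def apply_filters(rows, conditions):
    """Keep only the rows that satisfy every 'field operator value' condition."""
    kept = rows
    for condition in conditions:
        field, op, value = condition.split()
        compare = OPS[op]
        threshold = float(value)
        kept = [row for row in kept if compare(row[field], threshold)]
    return kept


# Hypothetical sample rows with a duration in minutes and a start hour.
trips = [
    {"duration": 5.0, "start_hour": 8},
    {"duration": 45.0, "start_hour": 17},
    {"duration": 59.9, "start_hour": 13},
    {"duration": 60.0, "start_hour": 9},
    {"duration": 1000.0, "start_hour": 15},
]

short_trips = apply_filters(trips, ["duration < 60"])
short_afternoon_trips = apply_filters(trips, ["duration < 60", "start_hour > 12"])
print(len(short_trips), len(short_afternoon_trips))
```

# Conditions are applied one after another, so a row has to pass all of them to be kept.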
Example #7
#
# Now that you have some data saved to a file, let's look at some initial trends in the data. Some code has already been written for you in the `babs_visualizations.py` script to help summarize and visualize the data; this has been imported as the functions `usage_stats()` and `usage_plot()`. In this section we'll walk through some of the things you can do with the functions, and you'll use the functions for yourself in the last part of the project. First, run the following cell to load the data, then use the `usage_stats()` function to see the total number of trips made in the first month of operations, along with some statistics regarding how long trips took.

# In[22]:

trip_data = pd.read_csv('201309_trip_summary.csv')

usage_stats(trip_data)

# You should see that there are over 27,000 trips in the first month, and that the average trip duration is larger than the median trip duration (the point where 50% of trips are shorter, and 50% are longer). In fact, the mean is larger than the 75th percentile: more than 75% of trips are shorter than the average. This will be interesting to look at later on.
#
# Let's start looking at how those trips are divided by subscription type. One easy way to build an intuition about the data is to plot it. We'll use the `usage_plot()` function for this. The second argument of the function allows us to count up the trips across a selected variable, displaying the information in a plot. The expression below will show how many customer and how many subscriber trips were made. Try it out!

# In[35]:

usage_plot(trip_data, 'start_hour')

# It seems that subscribers made about 50% more trips in the first month than customers. Let's try a different variable now. What does the distribution of trip durations look like?

# In[49]:

usage_plot(trip_data, 'start_hour', ['start_hour > 12'])

# Looks pretty strange, doesn't it? Take a look at the duration values on the x-axis. Most rides are expected to be 30 minutes or less, since there are overage charges for taking extra time in a single trip. The first bar spans durations up to about 1000 minutes, or over 16 hours. Based on the statistics we got out of `usage_stats()`, we should have expected some very long trips that pull the average so far above the median: the plot shows this in a dramatic, but unhelpful way.
#
# When exploring the data, you will often need to work with visualization function parameters in order to make the data easier to understand. Here's where the third argument of the `usage_plot()` function comes in. Filters can be set for data points as a list of conditions. Let's start by limiting things to trips of less than 60 minutes.

# In[50]:

usage_plot(trip_data, 'duration', ['duration < 60'], bin_width=2)