# <codecell>

ops.applymap?

# <codecell>

ops.apply?

# <codecell>

import YT_api_generate as yt

# <codecell>

ops = yt.format_durations(ops)

# <markdowncell>

# ## THIS IS THE KEY BIT TO LINK TITLES AND DURATIONS

# <codecell>

dur_ti = ops.groupby(ops.duration_time)['title'].value_counts()
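
# <markdowncell>

# A minimal sketch of the same grouping on toy data, to show the shape of the result. The frame `toy` and its values are invented for illustration, and `dt.time` values are only an assumption about what `yt.format_durations` produces; the `duration_time` and `title` column names come from the cell above.

# <codecell>

import datetime as dt
import pandas as pd

# Toy stand-in for `ops` (hypothetical data): three uploads share one duration
toy = pd.DataFrame({
    'title': ['Op A', 'Op A', 'Op A mirror', 'Op B'],
    'duration_time': [dt.time(0, 3, 12), dt.time(0, 3, 12),
                      dt.time(0, 3, 12), dt.time(0, 5, 0)],
})

# For each duration, count how often each title occurs.
# The result is a Series indexed by (duration_time, title).
toy_dur_ti = toy.groupby('duration_time')['title'].value_counts()

# Summing over the first index level gives the number of videos per duration;
# a duration shared by several videos is a candidate set of mirrors.
toy_per_duration = toy_dur_ti.groupby(level=0).sum()
print(toy_per_duration[toy_per_duration > 1])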

# <codecell>

t=dt.time?

# <codecell>
# <codecell>

# this shows some of the problems
ops_df.title_clean.value_counts()

# <markdowncell>

# Some of the difficulties of finding mirrors start to appear here. We have around 100 videos with no title, and 175 videos titled simply 'anonymous'. Which operation do they belong to, if any?
# 
# The good news is that there are many cases where the standard 'Anonymous Operation X' pattern holds. We can work with those as long as we keep in mind that we are losing several hundred potential mirrors in the process.
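# 
# One way to pick out the titles that follow the standard pattern is a regular expression. This is only a sketch: `OP_PATTERN`, `extract_operation` and the sample titles are hypothetical, and the way `title_short` is actually derived may differ.

# <codecell>

import re
import pandas as pd

# Hypothetical pattern: 'anonymous', then 'operation', then a one-word name
OP_PATTERN = re.compile(r'anonymous\s+operation\s+(\w+)', re.IGNORECASE)

def extract_operation(title):
    """Return the lower-cased operation name, or None if the title does not match."""
    if not isinstance(title, str):
        return None
    match = OP_PATTERN.search(title)
    return match.group(1).lower() if match else None

# Invented titles covering the cases described above
sample_titles = pd.Series(['Anonymous Operation Payback',
                           'ANONYMOUS OPERATION payback (mirror)',
                           'anonymous',
                           ''])
sample_ops = sample_titles.map(extract_operation)

# Titles that do not fit the pattern come back as missing values
print(sample_ops.isna().sum())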

# <codecell>


ops_cl = yt.format_durations(ops_df)
ti_du = ops_cl.groupby('title_short')['duration_time'].value_counts()
print(ti_du.sum())

# <markdowncell>

# Working with these 3500 videos, we can get an idea of the extent and distribution of mirrors. 
# 
# ## The main mirrors
# 
# Looking at the top 100 mirrored operation videos, we get this kind of distribution:

# <codecell>

f = plt.figure(figsize=(10, 16))
print(len(ti_du[ti_du>5]))
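
# <markdowncell>

# A sketch of how the most mirrored (title, duration) pairs could be ranked before plotting. `toy_counts` is invented data with the same index layout as `ti_du`; durations are plain strings here for brevity, which is an assumption. The threshold of 5 and the cut-off of 100 are taken from the cells above.

# <codecell>

import pandas as pd

# Toy stand-in for `ti_du` (hypothetical values): counts per (title_short, duration_time)
toy_counts = pd.Series(
    [12, 7, 3, 9],
    index=pd.MultiIndex.from_tuples(
        [('op a', '00:03:12'), ('op a', '00:03:15'),
         ('op b', '00:05:00'), ('op c', '00:01:30')],
        names=['title_short', 'duration_time']),
)

# Keep pairs seen more than 5 times, largest first, at most 100 of them
toy_top = toy_counts[toy_counts > 5].sort_values(ascending=False).head(100)
print(toy_top)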