Example 1
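import numpy as np
import plotille


# `seed` is a pytest fixture defined elsewhere; presumably it seeds the NumPy RNG
# so that the rendered histogram is reproducible.
def test_histogram(seed):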
    x = np.random.normal(size=10000)
    expected = """ (Counts)  ^
224.400000 |
218.790000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
213.180000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
207.570000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
201.960000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡄⢠⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
196.350000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⣿⡇⣆⢸⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
190.740000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⡇⣿⣷⣿⢸⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
185.130000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⠀⣤⡇⣿⣿⣿⣿⢸⢠⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
179.520000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢸⠀⣿⡇⣿⣿⣿⣿⣿⢸⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
173.910000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢸⠀⣿⣧⣿⣿⣿⣿⣿⢸⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
168.300000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢸⢀⣿⣿⣿⣿⣿⣿⣿⣸⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
162.690000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢸⣾⣿⣿⣿⣿⣿⣿⣿⣿⡇⣶⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
157.080000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢸⣿⣿⣿⣿⣿⣿⣿⣿⣿⡇⣿⡄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
151.470000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢸⣿⣿⣿⣿⣿⣿⣿⣿⣿⣷⣿⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
145.860000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣦⣼⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
140.250000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡄⠀⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
134.640000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⣴⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⡇⡄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
129.030000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣇⡇⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
123.420000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⡇⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
117.810000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⡇⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
112.200000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢰⣇⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⡇⣿⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
106.590000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢸⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣷⣿⠀⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
100.980000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢸⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⡇⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
95.3700000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⢸⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣇⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
89.7600000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢸⢸⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
84.1500000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢰⢸⣾⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣧⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
78.5400000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣿⣾⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⡆⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
72.9300000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
67.3200000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢠⡇⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
61.7100000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢸⡇⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣧⢸⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
56.1000000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣼⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
50.4900000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
44.8800000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢠⡀⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
39.2700000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢸⣧⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
33.6600000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢰⣸⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⡀⡆⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
28.0500000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡆⣾⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣧⡇⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
22.4400000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢠⣇⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣧⣿⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
16.8300000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣾⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣶⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
11.2200000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢸⠀⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣤⡄⢠⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
5.61000000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡄⣠⢠⢰⣼⣠⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣷⣾⡄⣀⢀⠀⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
         0 | ⠀⠀⠀⠀⠀⠀⢠⡄⢠⡄⠀⠀⣤⣦⣦⣿⣿⣾⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣷⣿⣼⡆⣧⣤⢠⢠⠀⠀⢠⡄⠀⠀⠀⠀⠀⠀
-----------|-|---------|---------|---------|---------|---------|---------|---------|---------|-> (X)
           | -4.707264 -3.529968 -2.352673 -1.175377 0.0019187 1.1792144 2.3565101 3.5338058 4.7111015"""
    print(plotille.histogram(x))
    assert expected == plotille.histogram(x)
    assert expected == plotille.histogram(list(x))
Example 2
def do_histogram(self, arg: str):
    try:
        data = self.batch_tp.query_single_result(arg)
        print(plotille.histogram(data))
        self.print_percentiles(data)
    except TraceProcessorException as ex:
        logging.error("Query failed: {}".format(ex))
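The `print_percentiles` method called above is not shown in this example. As a rough illustration only, a nearest-rank percentile summary could look like the sketch below; `percentile` and `summarize_percentiles` are hypothetical names, and the real method may compute or format its output differently.

```python
import math
from typing import Dict, Sequence


def percentile(sorted_values: Sequence[float], p: int) -> float:
    """Nearest-rank percentile p (0 < p <= 100) of a non-empty sorted sequence."""
    if not sorted_values:
        raise ValueError("percentile() needs at least one value")
    # Nearest-rank method: the smallest rank covering p percent of the data.
    rank = max(1, math.ceil(p * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarize_percentiles(values: Sequence[float],
                          points: Sequence[int] = (50, 90, 99)) -> Dict[int, float]:
    """Map each requested percentile to its nearest-rank value (hypothetical helper)."""
    ordered = sorted(values)
    return {p: percentile(ordered, p) for p in points}
```

Printing the returned dictionary next to the histogram gives a compact numeric summary of the same data.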
Example 3
import logging
import pprint

import tabulate


def print_stats(stats):
  header = [stats['filename'], 'count']
  table = []
  table.append(['sequences', stats['sequences']])
  logging.info(pprint.pformat(stats))
  for k, v in sorted(stats['type_counts'].items()):
    table.append(['   ' + k, v])
    # Optional per-type character breakdowns, listed in a fixed order.
    for count_key in ('special_char_count', 'ambiguous_char_count', 'unknown_char_count'):
      if k in stats.get(count_key, {}):
        for k2, v2 in sorted(stats[count_key][k].items()):
          table.append(['      ' + k2, v2])

  tabulate.PRESERVE_WHITESPACE = True
  print(tabulate.tabulate(table, headers=header, tablefmt='pretty', colalign=('left', 'right')))
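  # `merge` is a helper defined elsewhere in the original module.
  # The key is optional above, so guard the lookup here as well.
  if 'unknown_char_count' in stats and merge(stats['unknown_char_count']):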
    print("WARNING: The file contains unknown characters for DNA, RNA and AA sequences. ")
    print("         It will probably fail in applications with strict alphabet checking.")

  if 'seq_lenghts' in stats:
    import plotille
    print('')
    print('Sequence length distribution')
    print(plotille.histogram(stats['seq_lenghts'], height=25, x_min=0))
  print('')
Example 4
def test_empty_histogram(seed, empty):
    expected = """ (Counts)  ^
         1 |
0.97500000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
0.95000000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
0.92500000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
0.90000000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
0.87500000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
0.85000000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
0.82500000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
0.80000000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
0.77500000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
0.75000000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
0.72500000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
0.70000000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
0.67500000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
0.65000000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
0.62500000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
0.60000000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
0.57500000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
0.55000000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
0.52500000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
0.50000000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
0.47500000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
0.45000000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
0.42500000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
0.40000000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
0.37500000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
0.35000000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
0.32500000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
0.30000000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
0.27500000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
0.25000000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
0.22500000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
0.20000000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
0.17500000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
0.15000000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
0.12500000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
0.10000000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
0.07500000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
0.05000000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
0.02500000 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
         0 | ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
-----------|-|---------|---------|---------|---------|---------|---------|---------|---------|-> (X)
           | 0         0.1250000 0.2500000 0.3750000 0.5000000 0.6250000 0.7500000 0.8750000 1        """
    print(plotille.histogram([]))
    assert expected == plotille.histogram([])
Example 5
def main():
    import argparse, logging, math, sys
    import plotille
    import reprint
    from pklaus.network.ping.ping_wrapper import PingWrapper
    parser = argparse.ArgumentParser(
        description='Ping a host and create histogram.')
    parser.add_argument('host', help='The host to ping')
    parser.add_argument('--count',
                        '-c',
                        type=int,
                        default=60,
                        help='Number of times the host should be pinged')
    parser.add_argument('--interval',
                        '-i',
                        type=float,
                        default=1.0,
                        help='Interval between individual pings.')
    parser.add_argument('--debug',
                        '-d',
                        action='store_true',
                        help='Enable debug output for this script')
    args = parser.parse_args()

    if args.debug:
        logging.basicConfig(format='%(levelname)s:%(message)s', level='DEBUG')

    ping_wrapper = PingWrapper(
        f'ping {args.host} -c{args.count} -i{args.interval}')
    try:
        round_trip_times = []
        with reprint.output(initial_len=24, interval=0) as output_lines:
            for round_trip_time in ping_wrapper.run():
                round_trip_times.append(round_trip_time)
                if len(round_trip_times) < 2: continue
                x_min = math.floor(min(round_trip_times))
                x_max = math.ceil(max(round_trip_times))
                hist_string = plotille.histogram(round_trip_times,
                                                 width=60,
                                                 height=20,
                                                 x_min=x_min,
                                                 x_max=x_max)
                output_lines.change(hist_string.split('\n'))
    except KeyboardInterrupt:
        sys.exit()

    sys.exit(ping_wrapper.returncode)
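`PingWrapper` comes from the external `pklaus` package and its implementation is not shown here. The sketch below is only an assumption about the general idea: pulling the round-trip time out of one line of `ping` output and deriving integer axis bounds the way the loop above does. `parse_round_trip_time` and `histogram_bounds` are hypothetical helpers, not part of that package.

```python
import math
import re
from typing import List, Optional, Tuple

# Matches the "time=12.3 ms" (or "time<1 ms") field printed by common ping tools.
_RTT_PATTERN = re.compile(r"time[=<]([0-9]+(?:\.[0-9]+)?)\s*ms")


def parse_round_trip_time(line: str) -> Optional[float]:
    """Return the round-trip time in milliseconds, or None if the line has none."""
    match = _RTT_PATTERN.search(line)
    return float(match.group(1)) if match else None


def histogram_bounds(times: List[float]) -> Tuple[int, int]:
    """Round the observed range outward to whole milliseconds."""
    return math.floor(min(times)), math.ceil(max(times))
```

The two bounds correspond to the `x_min` and `x_max` arguments passed to `plotille.histogram` in the example.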
Example 6
    def calculate_crop_area(tokens,
                            width,
                            tolerance=.1,
                            edge_percentage=20,
                            show_histogram=True):
        PDFTokenizer.log.info(
            f'Going to calculate crop area for {len(tokens)} tokens')
        x_values = []
        for token in tokens:
            #PDFTokenizer.log.info(f'token.rect: {token.rect}')
            for i in range(int(token.rect.x0), int(token.rect.x1)):
                x_values.append(i)

        if len(x_values) == 0:
            PDFTokenizer.log.warning(
                'Unable to calculate crop area, will use full page width')
            return 0, width

        #PDFTokenizer.log.debug(f'min(x_values): {min(x_values)}')
        #PDFTokenizer.log.debug(f'max(x_values): {max(x_values)}')
        counts, bin_edges = numpy.histogram(x_values, bins=100)
        #PDFTokenizer.log.debug(f'counts: {counts}')
        #PDFTokenizer.log.debug(f'bin_edges: {bin_edges}')
        if show_histogram:
            print(plotille.histogram(x_values, bins=int(max(x_values))))

        cutoff = max(counts) * tolerance
        PDFTokenizer.log.info(
            f'Cutoff set to {max(counts)} * {tolerance} = {cutoff}')

        edge_left, edge_right = 0, max(x_values) + 1
        for i, c in enumerate(counts[:edge_percentage], start=1):
            #PDFTokenizer.log.debug(f'{i}: {c} < {cutoff} ? => {edge_left}')
            if c < cutoff:
                edge_left = (width * i) / 100
        for i, c in enumerate(counts[-edge_percentage:], start=1):
            #PDFTokenizer.log.debug(f'{i}: {c} < {cutoff} ? => {edge_right}')
            if c < cutoff:
                edge_right = (width * (100 - i)) / 100

        return edge_left, edge_right
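The cutoff idea used above (treat a bin as sparse when its count falls below `tolerance` times the largest count) can be tried out on plain lists. The sketch below is a deliberately simplified variant, not the method's exact logic: it scans all bins rather than only the outer `edge_percentage`, and `find_dense_span` is a hypothetical name.

```python
from typing import Optional, Sequence, Tuple


def find_dense_span(counts: Sequence[int],
                    tolerance: float = 0.1) -> Optional[Tuple[int, int]]:
    """Return (first, last) indices of bins whose count reaches the cutoff.

    The cutoff is tolerance * max(counts). Returns None when there is
    nothing to measure (no bins, or all bins empty).
    """
    if not counts or max(counts) == 0:
        return None
    cutoff = max(counts) * tolerance
    dense = [i for i, count in enumerate(counts) if count >= cutoff]
    return dense[0], dense[-1]
```

Multiplying the returned indices by the bin width would turn them back into page coordinates.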
Example 7
import json


def main(args):
    data = []
    for input_f in args.input:
        with open(input_f) as f:
            probs = []
            for line in f:
                jsondict = json.loads(line)
                if 'label_prob' in jsondict:
                    prob = jsondict['label_prob']
                elif 'label_probs' in jsondict:
                    prob = max(jsondict['label_probs'])
                else:
                    # Skip records without a probability instead of reusing a stale `prob`.
                    continue
                probs.append(prob)
            data.append(probs)
    if args.media == 'terminal':
        probs = data[0]  # terminal mode supports only 1 distribution
        import plotille
        if args.hist_type == 'histogram':
            print(plotille.histogram(probs, lc='cyan'))
        else:
            print(plotille.hist(probs, lc='cyan'))
    else:
        import matplotlib.pyplot as plt
        import seaborn as sns
        sns.set()
        colors = sns.color_palette("muted", len(data))
        for i, probs in enumerate(data):
            ax = sns.distplot(probs,
                              bins=40,
                              kde=False,
                              color=colors[i],
                              label=args.input[i])
        if args.title:
            ax.set_title(args.title)
        ax.spines["top"].set_visible(False)
        ax.spines["bottom"].set_visible(False)
        ax.spines["left"].set_visible(False)
        ax.spines["right"].set_visible(False)
        plt.show()
Example 8
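import json
import math
from multiprocessing.pool import ThreadPool

import plotille
from tqdm import tqdm


# `batch` and `retrieve` are helpers defined elsewhere in the original module.
def main(args):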

    samples = []
    with open(args.input) as in_f:
        for line in in_f:
            samples.append(json.loads(line))
    samples = samples[args.start_index:]
    workers = ThreadPool(processes=args.num_workers)
    evidence_lens = []
    with open(args.output, 'a') as out_f:
        for batch_samples in tqdm(batch(samples, n=args.batch_size), total=math.ceil(len(samples) / args.batch_size)):
            choices_text = [choice['text'] for sample in batch_samples
                            for choice in sample['question']['choices']]
            evidences = workers.starmap(
                retrieve, [(text, args.max_evidences) for text in choices_text])
            for sample, evidences in zip(batch_samples, batch(evidences, 5)):
                for i, choice in enumerate(sample['question']['choices']):
                    choice['evidence'] = evidences[i]
                    evidence_lens.append(len(evidences[i]))
                out_f.write(json.dumps(sample) + '\n')
    print(plotille.histogram(evidence_lens))
    if args.stats_file:
        with open(args.stats_file, 'w') as stats_f:
            stats_f.write(','.join(map(str, evidence_lens)))
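The `batch` helper used above is not part of the snippet. A minimal sketch of what such a helper might look like follows, assuming it yields consecutive chunks of at most `n` items; the original implementation may differ.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batch(items: Iterable[T], n: int = 1) -> Iterator[List[T]]:
    """Yield consecutive lists of at most n items (hypothetical helper)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    iterator = iter(items)
    while True:
        # islice consumes up to n items; an empty chunk means the input is exhausted.
        chunk = list(islice(iterator, n))
        if not chunk:
            return
        yield chunk
```

With this shape, `math.ceil(len(samples) / args.batch_size)` is the number of chunks produced, which matches the `total` passed to `tqdm` in the example.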
Example 9
def main():
    #############################################################################################
    # Preparation #
    #############################################################################################
    # Get options from user #
    logging.basicConfig(level=logging.DEBUG,format='%(asctime)s - %(levelname)s - %(message)s',datefmt='%m/%d/%Y %H:%M:%S')
    opt = get_options()
    # Verbose logging #
    if not opt.verbose:
        logging.getLogger().setLevel(logging.INFO)


    # Private modules containing Pyroot #
    from NeuralNet import HyperModel
    from import_tree import LoopOverTrees
    from produce_output import ProduceOutput
    from make_scaler import MakeScaler
    from submit_on_slurm import submit_on_slurm
    from generate_mask import GenerateMask, GenerateSampleMasks, GenerateSliceIndices, GenerateSliceMask
    from split_training import DictSplit
    from concatenate_csv import ConcatenateCSV
    from threadGPU import utilizationGPU
    import parameters

    # Needed because PyROOT messes with argparse

    logging.info("="*98)
    logging.info("   _    _ _    _ __  __            _     _            _                           _             ")
    logging.info(r"  | |  | | |  | |  \/  |          | |   (_)          | |                         (_)            ")
    logging.info(r"  | |__| | |__| | \  / | __ _  ___| |__  _ _ __   ___| |     ___  __ _ _ __ _ __  _ _ __   __ _ ")
    logging.info(r"  |  __  |  __  | |\/| |/ _` |/ __| '_ \| | '_ \ / _ \ |    / _ \/ _` | '__| '_ \| | '_ \ / _` |")
    logging.info(r"  | |  | | |  | | |  | | (_| | (__| | | | | | | |  __/ |___|  __/ (_| | |  | | | | | | | | (_| |")
    logging.info(r"  |_|  |_|_|  |_|_|  |_|\__,_|\___|_| |_|_|_| |_|\___|______\___|\__,_|_|  |_| |_|_|_| |_|\__, |")
    logging.info("                                                                                           __/ |")
    logging.info("                                                                                          |___/ ")
    logging.info("="*98)

    # Make path model #
    path_model = os.path.join(parameters.main_path,'model')
    if not os.path.exists(path_model):
        os.mkdir(path_model)

    #############################################################################################
    # Splitting into sub-dicts and slurm submission #
    #############################################################################################
    if opt.submit != '':
        if opt.split != 0:
            DictSplit(opt.split,opt.submit,opt.resubmit)
            logging.info('Splitting jobs done')
        
        # Arguments to send #
        args = '' 
        if opt.generator:           args += ' --generator '
        if opt.GPU:                 args += ' --GPU '
        if opt.resume:              args += ' --resume '
        if opt.nocache:             args += ' --nocache'
        if opt.model!='':           args += ' --model %s '%opt.model
        if len(opt.output)!=0:      args += ' --output '+ ' '.join(opt.output)+' '

        if opt.submit!='':
            logging.info('Submitting jobs with args "%s"'%args)
            name = opt.submit
            if opt.resubmit:
                name += '_resubmit'
            submit_on_slurm(name=name,debug=opt.debug,args=args)
        sys.exit()

    #############################################################################################
    # CSV concatenation #
    #############################################################################################
    if opt.csv!='':
        logging.info('Concatenating csv files from : %s'%(opt.csv))
        dict_csv = ConcatenateCSV(opt.csv)

        sys.exit()

    #############################################################################################
    # Reporting given scan in csv file #
    #############################################################################################
    if opt.report != '':
        instance = HyperModel(opt.report)
        instance.HyperReport(parameters.eval_criterion)

        sys.exit()

    #############################################################################################
    # Output of given files from given model #
    #############################################################################################
    if opt.model != '' and len(opt.output) != 0:
        # Create directory #
        path_output = os.path.join(parameters.path_out,opt.model)
        if not os.path.exists(path_output):
            os.mkdir(path_output)

        # Instantiate #
        inst_out = ProduceOutput(model=os.path.join(parameters.path_model,opt.model),generator=opt.generator)
        # Loop over output keys #
        for key in opt.output:
            # Create subdir #
            path_output_sub = os.path.join(path_output,key+'_output')
            if not os.path.exists(path_output_sub):
                os.mkdir(path_output_sub)
            try:
                inst_out.OutputNewData(input_dir=samples_path,list_sample=samples_dict[key],path_output=path_output_sub)
            except Exception as e:
                logging.critical('Could not process key "%s" due to "%s"'%(key,e))
        sys.exit()
    #############################################################################################
    # Data Input and preprocessing #
    #############################################################################################
    # Memory Usage #
    #pid = psutil.Process(os.getpid())
    logging.info('Current pid : %d'%os.getpid())

    # Input path #
    logging.info('Starting tree importation')

    # Import variables from parameters.py
    variables = parameters.inputs+parameters.LBN_inputs+parameters.outputs+parameters.other_variables
    variables = [v for i,v in enumerate(variables) if v not in variables[:i]] # avoid repetitions while keeping order
        
    list_inputs  = [var.replace('$','') for var in parameters.inputs]
    list_outputs = [var.replace('$','') for var in parameters.outputs]

    parametric = 'param' in list_inputs

    if parametric:
        def findMassInSignal(sampleName):
            if "HH" not in sampleName:
                return None
            return float(re.findall(r'M-\d+',sampleName)[0].replace('M-',''))
    else:
        findMassInSignal = None
        

    # Load samples #
    with open (parameters.config,'r') as f:
        sampleConfig = yaml.safe_load(f) # yaml.load() without a Loader raises TypeError in PyYAML >= 6

    if not opt.generator:
        if opt.nocache:
            logging.warning('No cache will be used nor saved')
        if os.path.exists(parameters.train_cache) and not opt.nocache:
            logging.info('Will load training data from cache')
            logging.info('... Training set : %s'%parameters.train_cache)
            train_all = pd.read_pickle(parameters.train_cache)
            if os.path.exists(parameters.test_cache) and not opt.nocache and not parameters.crossvalidation:
                logging.info('Will load testing data from cache')
                logging.info('... Testing  set : %s'%parameters.test_cache)
                test_all = pd.read_pickle(parameters.test_cache)
        else:
            # Import arrays #
            data_dict = {}
            xsec_dict = dict()
            event_weight_sum_dict = dict()
            for era in parameters.eras:
                with open(parameters.xsec_json.format(era=era),'r') as handle:
                    xsec_dict[era] = json.load(handle)
                with open(parameters.event_weight_sum_json.format(era=era),'r') as handle:
                    event_weight_sum_dict[era] = json.load(handle)

            for node in parameters.nodes:
                logging.info('Starting data importation for class {}'.format(node))
                data_node = None
                for cat in parameters.categories:
                    logging.info('... Starting data importation for jet category {}'.format(cat))
                    strSelect = [f'{cat}_{channel}_{node}' for channel in parameters.channels]
                    data_cat = None
                    for era in parameters.eras:
                        samples_dict = sampleConfig['sampleDict'][int(era)]
                        if len(samples_dict.keys())==0:
                            logging.info('\tSample dict for era {} is empty'.format(era))
                            continue
                        list_sample = [sample for key in strSelect for sample in samples_dict[key]]
                        data_cat_era = LoopOverTrees(input_dir                 = sampleConfig['sampleDir'],
                                                      variables                 = variables,
                                                      weight                    = parameters.weight,
                                                      list_sample               = list_sample,
                                                      cut                       = parameters.cut,
                                                      xsec_dict                 = xsec_dict,
                                                      event_weight_sum_dict     = event_weight_sum_dict,
                                                      lumi_dict                 = parameters.lumidict,
                                                      eras                      = era,
                                                      paramFun                  = findMassInSignal,
                                                      tree_name                 = parameters.tree_name,
                                                      additional_columns        = {'tag':node,'era':era})
                                                      #stop                      = 300000) # TODO : remove 
                        era_str = '{:5s} class - {:15s} category - era {}  : sample size = {:10d}'.format(node,cat,era,data_cat_era.shape[0])
                        if data_cat is None:
                            data_cat = data_cat_era
                        else:
                            data_cat = pd.concat([data_cat,data_cat_era],axis=0)
                        if data_cat_era.shape[0] == 0:
                            logging.info(era_str)
                            continue
                        if parameters.weight is not None:
                            era_str += ', weight sum = {:.3e} (with normalization = {:.3e})'.format(data_cat_era[parameters.weight].sum(),data_cat_era['event_weight'].sum())
                        logging.info(era_str)
                    cat_str = '{:5s} class - {:15s} category : sample size = {:10d}'.format(node,cat,data_cat.shape[0])
                    if data_cat.shape[0] == 0:
                        logging.info(cat_str)
                        continue
                    if parameters.weight is not None:
                        cat_str += ', weight sum = {:.3e} (with normalization = {:.3e})'.format(data_cat[parameters.weight].sum(),data_cat['event_weight'].sum())
                    logging.info(cat_str)
                    if data_node is None:
                        data_node = data_cat
                    else:
                        data_node = pd.concat([data_node,data_cat],axis=0)
                data_dict[node] = data_node
                all_eras_str = '{:5s} class - all categories : sample size = {:10d}'.format(node,data_node.shape[0])
                if parameters.weight is not None:
                    all_eras_str +=  ', weight sum = {:.3e} (with normalization = {:.3e})'.format(data_node[parameters.weight].sum(),data_node['event_weight'].sum())
                logging.info(all_eras_str)

            # Data splitting #
            # Initialised once, before the loop, so that every node's split is kept
            train_dict = {}
            test_dict = {}
            for node,data in data_dict.items():
                if parameters.crossvalidation: # Cross-validation
                    if parameters.splitbranch not in data.columns:
                        raise RuntimeError('Asked for cross validation mask but cannot find the slicing array')
                    try:
                        data['mask'] = (data[parameters.splitbranch] % parameters.N_slices).to_numpy()
                        # Will contain numbers : 0,1,2,...N_slices-1
                    except ValueError:
                        logging.critical("Problem with the masking")
                        raise ValueError
                else: # Classic separation
                    mask = GenerateMask(data.shape[0],parameters.suffix+'_'+node)
                    try:
                        train_dict[node] = data[mask==True]
                        test_dict[node]  = data[mask==False]
                    except ValueError:
                        logging.critical("Problem with the mask you imported, has the data changed since it was generated ?")
                        raise ValueError
             
            if parameters.crossvalidation:
                train_all = pd.concat(data_dict.values(),copy=True).reset_index(drop=True)
                test_all = pd.DataFrame(columns=train_all.columns) # Empty to not break rest of script
            else:
                train_all = pd.concat(train_dict.values(),copy=True).reset_index(drop=True)
                test_all  = pd.concat(test_dict.values(),copy=True).reset_index(drop=True)
            del data_dict 
            if not parameters.crossvalidation:
                del train_dict, test_dict

            if parametric:
                # Assign random parameters to background in same proportions as signal #
                param_idx = train_all['param']>0
                countPerParam = train_all[param_idx]['param'].value_counts() # Number of occurrences of each parameter
                freq = countPerParam / countPerParam.sum() # frequency of all parameters in the training data
                prop = (freq*(~param_idx).sum()).round().astype(np.int32) # proportions to be applied in the non-param samples
                if prop.sum() != (~param_idx).sum(): # resolve rounding errors
                    diff = (~param_idx).sum() - prop.sum()
                    prop.iloc[-1] += diff
                rep = pd.DataFrame(prop.index.repeat(prop.values),columns=['param']) # All repetitions in correct proportions
                rep = rep.sample(frac=1) # shuffle to uniformly distribute for each background
                rep.index = train_all.loc[~param_idx,'param'].index # Needs to have same index (otherwise nan)
                train_all.loc[~param_idx,'param'] = rep # Assign the prepared params to the non-param events
                if not parameters.crossvalidation: # do the same for testing data. TODO: make cleaner
                    param_idx = test_all['param']>0
                    prop = (freq*(~param_idx).sum()).round().astype(np.int32)
                    if prop.sum() != (~param_idx).sum(): 
                        diff = (~param_idx).sum() - prop.sum()
                        prop.iloc[-1] += diff
                    rep = pd.DataFrame(prop.index.repeat(prop.values),columns=['param']) 
                    rep = rep.sample(frac=1)
                    rep.index = test_all.loc[~param_idx,'param'].index
                    test_all.loc[~param_idx,'param'] = rep 
    
            #logging.info('Current memory usage : %0.3f GB'%(pid.memory_info().rss/(1024**3)))

            # Randomize order, we don't want only one type per batch #
            random_train = np.arange(0,train_all.shape[0]) # needed to randomize x,y and w in same fashion
            np.random.shuffle(random_train) # Not needed for testing
            train_all = train_all.iloc[random_train]
              
            # Add target #
            label_encoder = LabelEncoder()
            onehot_encoder = OneHotEncoder(sparse_output=False) # 'sparse' was renamed to 'sparse_output' in scikit-learn 1.2
            label_encoder.fit(parameters.nodes)
            # From strings to labels #
            train_integers = label_encoder.transform(train_all['tag']).reshape(-1, 1)
            if not parameters.crossvalidation:
                test_integers = label_encoder.transform(test_all['tag']).reshape(-1, 1)
            # From labels to one-hot vectors #
            onehot_encoder.fit(np.arange(len(list_outputs)).reshape(-1, 1))
            train_onehot = onehot_encoder.transform(train_integers)
            if not parameters.crossvalidation:
                test_onehot = onehot_encoder.transform(test_integers)
            # From arrays to pd DF #
            train_cat = pd.DataFrame(train_onehot,columns=label_encoder.classes_,index=train_all.index)
            if not parameters.crossvalidation:
                test_cat = pd.DataFrame(test_onehot,columns=label_encoder.classes_,index=test_all.index)
            # Add to full #
            train_all = pd.concat([train_all,train_cat],axis=1)
            train_all[list_inputs+list_outputs] = train_all[list_inputs+list_outputs].astype('float32')
            if not parameters.crossvalidation:
                test_all = pd.concat([test_all,test_cat],axis=1)
                test_all[list_inputs+list_outputs] = test_all[list_inputs+list_outputs].astype('float32')


            # Caching #
            if not opt.nocache:
                train_all.to_pickle(parameters.train_cache)
                logging.info('Data saved to cache')
                logging.info('... Training set : %s'%parameters.train_cache)
                if not parameters.crossvalidation:
                    test_all.to_pickle(parameters.test_cache)
                    logging.info('... Testing  set : %s'%parameters.test_cache)
         
        logging.info("Sample size seen by network : %d"%train_all.shape[0])
        #logging.info('Current memory usage : %0.3f GB'%(pid.memory_info().rss/(1024**3)))
        if parameters.crossvalidation: 
            N = train_all.shape[0]
            logging.info('Cross-validation has been requested on set of %d events'%N)
            for i in range(parameters.N_models):
                slices_apply , slices_eval, slices_train = GenerateSliceIndices(i)
                logging.info('... Model %d :'%i)
                for slicename, slices in zip (['Applied','Evaluated','Trained'],[slices_apply , slices_eval, slices_train]):
                    selector = GenerateSliceMask(slices,train_all['mask'])
                    n = train_all[selector].shape[0]
                    logging.info('     %10s on %10d [%3.2f%%] events'%(slicename,n,n*100/N)+' (With mask indices : ['+','.join([str(s) for s in slices])+'])')
        else:
            logging.info("Sample size for the output  : %d"%test_all.shape[0])

        # Preprocessing #
        # The purpose is to create a scaler object and save it
        # The preprocessing will be implemented in the network with a custom layer
        MakeScaler(train_all,list_inputs) 

        # Weight equalization #
        N = train_all.shape[0]
        sumweight_per_group = {}
        factor_per_node = {}
        factorsum = sum([factor for factor,_ in parameters.weight_groups])
        for factor, group in parameters.weight_groups:
            if not isinstance(group,tuple):
                group = (group,)
            if group not in sumweight_per_group.keys():
                sumweight_per_group[group] = 0
            for node in group:
                sumweight_per_group[group] += train_all[train_all['tag']==node]['event_weight'].sum() 
                if not parameters.crossvalidation:
                    sumweight_per_group[group] += test_all[test_all['tag']==node]['event_weight'].sum() 
                factor_per_node[node] = factor
        sumweight_per_node = {node:sw for group,sw in sumweight_per_group.items() for node in group}

        totweight_per_node = {}
        train_all['learning_weight'] = pd.Series(np.zeros(train_all.shape[0]),index=train_all.index)
        if not parameters.crossvalidation:
            test_all['learning_weight'] = pd.Series(np.zeros(test_all.shape[0]),index=test_all.index)
        for node in parameters.nodes:
            sum_event_weight = train_all[train_all['tag']==node]['event_weight'].sum()
            if not parameters.crossvalidation:
                sum_event_weight += test_all[test_all['tag']==node]['event_weight'].sum()
            logging.info('Sum of weight for {:6s} samples : {:.2e}'.format(node,sum_event_weight))
            train_all.loc[train_all['tag']==node,'learning_weight'] = train_all[train_all['tag']==node]['event_weight'] * (factor_per_node[node]/factorsum) * (N/sumweight_per_node[node])
            if not parameters.crossvalidation:
                test_all.loc[test_all['tag']==node,'learning_weight'] = test_all[test_all['tag']==node]['event_weight'] * (factor_per_node[node]/factorsum) * (N/sumweight_per_node[node])
            sum_learning_weight = train_all[train_all['tag']==node]['learning_weight'].sum()
            if not parameters.crossvalidation:
                sum_learning_weight += test_all[test_all['tag']==node]['learning_weight'].sum()
            logging.info('\t -> After equalization  : {:.2e} (factor {:.2e})'.format(sum_learning_weight,sum_learning_weight/sum_event_weight))
            totweight_per_node[node] = sum_learning_weight
        logging.info("Proportions are as follows")
        for node in totweight_per_node.keys():
            logging.info("\t Node {:6s} : total weight = {:.2e} [{:3.2f}%]".format(node,totweight_per_node[node],totweight_per_node[node]/sum(totweight_per_node.values())*100))

        # Parameterized learning reweighting #
        if parametric:
            params = pd.unique(train_all['param'])
            sum_weight_params = {param:train_all[train_all['param']==param]['learning_weight'].sum() for param in params}
            total_weight = train_all['learning_weight'].sum()
            logging.info("Parameterized learning reweighting")
            for param in sorted(params):
                sum_weight_param = train_all[train_all['param']==param]['learning_weight'].sum()
                train_all.loc[train_all['param']==param,'learning_weight'] *= total_weight/sum_weight_param 
                logging.info('\t Param {:8s} : {:8d} events - total weight = {:.2e} -> {:.2e} [factor = {:3.3f}]'.format(str(param),train_all.loc[train_all['param']==param,'learning_weight'].shape[0],sum_weight_param,train_all[train_all['param']==param]['learning_weight'].sum(),total_weight/sum_weight_param))
            train_all['learning_weight'] *= N/train_all['learning_weight'].sum() # Rescale to N events

        # Quantile correction #
        if opt.scan != '':
            quantile_lim = train_all['learning_weight'].quantile(parameters.quantile)
            idx_to_repeat = train_all['learning_weight'] >= quantile_lim
            events_excess = train_all[idx_to_repeat]

            logging.info("{} events have learning weight above the {:0.2f} quantile at {:3f} (compared to mean {:3f})".format(events_excess.shape[0],parameters.quantile,quantile_lim,train_all['learning_weight'].mean()))
            logging.info("-> These events will be repeated and their learning weights reduced accordingly to avoid instability :")
            tags = sorted(pd.unique(train_all['tag']))
            prop_tag = {tag:(train_all[train_all['tag']==tag].shape[0],train_all[train_all['tag']==tag]['learning_weight'].sum()) for tag in tags}
            for tag in tags:
                events_excess_tag = events_excess[events_excess['tag']==tag]['learning_weight']
                logging.info("\t{:6s} class : {:8d} events in [{:8.2f} : {:8.2f}]".format(tag,events_excess_tag.shape[0],events_excess_tag.min(),events_excess_tag.max()))

            factor = (events_excess['learning_weight']/quantile_lim).values.astype(np.int32)
            train_all.loc[idx_to_repeat,'learning_weight'] /= factor
            arr_to_repeat = train_all[idx_to_repeat].values
            repetition = np.repeat(np.arange(arr_to_repeat.shape[0]), factor-1)
            df_repeated = pd.DataFrame(np.take(arr_to_repeat,repetition,axis=0),columns=train_all.columns)
            df_repeated = df_repeated.astype(train_all.dtypes.to_dict()) # otherwise dtypes are object
            train_all = pd.concat((train_all,df_repeated),axis=0,ignore_index=True).sample(frac=1)
            logging.info("Effect of the repetitions :")
            for tag in tags:
                logging.info("\t{:6s} class : {:8d} events [learning weights = {:12.2f}] -> {:8d} events [learning weights = {:12.2f}]".format(tag,*prop_tag[tag],train_all[train_all['tag']==tag].shape[0],train_all[train_all['tag']==tag]['learning_weight'].sum()))
                
            logging.info("")


        # Plot learning weights in terminal #
        if opt.scan != '':
            for node in ["all"] + parameters.nodes:
                logging.info('Class {}'.format(node))
                height = 10
                bins = 100
                if node == 'all':
                    content = train_all['learning_weight']
                else:
                    content = train_all[train_all[node]==1]['learning_weight']
                x_max = content.max()
                y_max = np.histogram(content.array,bins=bins)[0].max()
                base = len(str(int(y_max)))
                plot = plotille.histogram(X         = content,
                                          bins      = bins,
                                          X_label   = "Learning weight",
                                          Y_label   = "Events",
                                          color_mode= 'byte',
                                          lc        = 25,
                                          x_min     = 0.,
                                          x_max     = x_max + (bins - x_max % bins),
                                          y_min     = 0.,
                                          y_max     = math.ceil(y_max / 10**(base-2))*10**(base-2),
                                          height    = height,
                                          width     = 100)

                for line in plot.split('\n'):
                    logging.info(line)
                logging.info('')

        # Check of batch content #
        if opt.scan != '':
            n_trials = 20
            for batch_size in parameters.p['batch_size']:
                logging.info("With a batch size of {} (average of {} trials)".format(batch_size,n_trials))
                tags = sorted(pd.unique(train_all['tag']))
                n_per_tag = {tag:[] for tag in tags}
                w_per_tag = {tag:[] for tag in tags}
                mw_per_tag = {tag:[] for tag in tags}
                sw_per_tag = {tag:[] for tag in tags}
                for _ in range(n_trials):
                    sample = train_all.sample(n=batch_size)
                    for tag in tags:
                        n_per_tag[tag].append(sample[sample['tag']==tag].shape[0])
                        w_per_tag[tag].append(sample[sample['tag']==tag]['learning_weight'].sum())
                        mw_per_tag[tag].append(sample[sample['tag']==tag]['learning_weight'].mean())
                        sw_per_tag[tag].append(sample[sample['tag']==tag]['learning_weight'].var())
                for tag in tags:
                    logging.info("\t{:6s} class : N = {:8.0f}, Sum of weights = {:12.5f} [{:3.2f}%], Mean of weights = {:10.5f}, Variance of weights = {:10.5f}".format(tag,
                            sum(n_per_tag[tag])/n_trials,
                            sum(w_per_tag[tag])/n_trials,
                            sum(w_per_tag[tag])/sum([sum(w_per_tag[t]) for t in tags])*100,
                            sum(mw_per_tag[tag])/n_trials,
                            sum(sw_per_tag[tag])/n_trials))
                    



    else:
        logging.info("You asked for generator so no data input has been done")
        list_samples = [os.path.join(sampleConfig['sampleDir'],sample) for era in parameters.eras for samples in sampleConfig['sampleDict'][int(era)].values() for sample in samples ]
        # Produce mask if not cross val #
        if not parameters.crossvalidation:
            logging.info("Will generate masks for each sample")
            GenerateSampleMasks(list_samples,parameters.suffix)
        # Produce scaler #
        MakeScaler(list_inputs  = list_inputs,
                   generator    = True,
                   batch        = 100000,
                   list_samples = list_samples,
                   additional_columns={'era':0.,'tag':''}) 
            
        train_all = None
        test_all = None

    list_inputs += [inp for inp in parameters.LBN_inputs if inp not in list_inputs]

    if opt.interactive:
        import IPython
        IPython.embed()


    #############################################################################################
    # DNN #
    #############################################################################################
    # Start the GPU monitoring thread #
    if opt.GPU:
        thread = utilizationGPU(print_time = 900,
                                print_current = False,
                                time_step=0.01)
        thread.start()

    if opt.scan != '':
        instance = HyperModel(opt.scan,list_inputs,list_outputs)
        if parameters.crossvalidation:
            modelIds = list(range(parameters.N_models))
            if opt.modelId != -1:
                for modelId in opt.modelId:
                    if modelId not in modelIds:
                        raise RuntimeError("You asked model id {} but only these ids are available : [".format(modelId)+','.join([str(m) for m in modelIds])+']')
                modelIds = opt.modelId
            for i in modelIds:
                logging.info("*"*80)
                logging.info("Starting training of model %d"%i)
                instance.HyperScan(data=train_all,
                                   task=opt.task,
                                   generator=opt.generator,
                                   resume=opt.resume,
                                   model_idx=i)
                instance.HyperDeploy(best='eval_error')
        else:
            instance.HyperScan(data=train_all,
                               task=opt.task,
                               generator=opt.generator,
                               resume=opt.resume)
            instance.HyperDeploy(best='eval_error')

        
    if len(opt.model) != 0: 
        # Make path #
        model_name = opt.model[0]
        if parameters.crossvalidation:
            if parameters.N_models != len(opt.model):
                raise RuntimeError('Cross validation requires %d models but you provided %d'%(parameters.N_models,len(opt.model)))
            model_name = model_name[:-1]
        path_output_train = os.path.join(parameters.path_out,model_name,'train')
        path_output_test = os.path.join(parameters.path_out,model_name,'test')
        if not os.path.exists(path_output_train):
            os.makedirs(path_output_train)
        if not os.path.exists(path_output_test):
            os.makedirs(path_output_test)

        # Instance of output class #
        inst_out = ProduceOutput(model=[os.path.join(parameters.main_path,'model',model) for model in opt.model],
                                 generator=opt.generator,
                                 list_inputs=list_inputs)

        # Use it on test samples #
        if opt.test:
            logging.info('  Processing test output sample  '.center(80,'*'))
            if parameters.crossvalidation: # in cross validation the testing set is inside the training DF
                inst_out.OutputFromTraining(data=train_all,path_output=path_output_test)
                inst_out.OutputFromTraining(data=train_all,path_output=path_output_train,crossval_use_training=True)
            else:
                inst_out.OutputFromTraining(data=test_all,path_output=path_output_test)
                inst_out.OutputFromTraining(data=train_all,path_output=path_output_train)
            logging.info('')
             
    if opt.GPU:
        # Closing monitor thread #
        thread.stopLoop()
        thread.join()
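The quantile correction step in the script above can be illustrated in isolation. The following is a minimal sketch in plain Python, not the script's own code: the helper name `split_heavy_events` and the sample numbers are hypothetical. It assumes the same rule as the script, namely that an event whose weight is at or above the limit is repeated `int(weight / limit)` times in total and each copy carries the weight divided by that factor.

```python
def split_heavy_events(weights, limit):
    """Repeat events at or above `limit`, sharing the weight among the copies.

    Hypothetical helper for illustration; mirrors the integer truncation
    done in the script with astype(np.int32).
    """
    result = []
    for w in weights:
        if w >= limit:
            factor = int(w / limit)  # truncates, so factor >= 1 here
            # One original plus (factor - 1) repetitions, each with w / factor
            result.extend([w / factor] * factor)
        else:
            result.append(w)
    return result


# Made-up weights: only the last event is heavy enough to be split
weights = [0.5, 1.0, 2.5, 7.0]
limit = 2.0
result = split_heavy_events(weights, limit)
```

Because the factor is truncated, the total weight is preserved and every resulting weight stays below twice the limit, which is what keeps single events from dominating a batch.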
def plot_histogram(frame_data: np.ndarray) -> str:
    return plotille.histogram(
        frame_data.flatten(), bins=40, width=78, height=25, x_min=0
    )
Example no. 11
def test_empty():
    x = histogram([])
    assert x == """\
Example no. 12
def test_single_value():
    x = histogram([0])
    assert x == """\