Example #1
def aggregate_by_name(js):
    '''
    We take a JSON of the form {'expenses':[expenseJSON1,expenseJSON2,...]}, where
    each expenseJSON is a dictionary of the form {'user':user1,'amount':amount1,
    'date':date1}: user1 is a string user id, amount1 is an integer, and date1 is
    a date string of the form 'YYYY-MM-DD'.  We use 'merge_ls_dct_no_key' to group
    the expenses by user and delete the now-redundant 'user' key from each one.

    Here we go, line by line:

    (merge_ls_dct_no_key,_,'user'),

    We are taking a list of expenseJSON's, and building a dict from the 'user' field
    of these expenseJSON's, with a list of expenseJSON's associated with each user.
    
    However, unlike 'merge_ls_dct', this function deletes the 'user' key from the
    expenseJSON in the list, since it is already in the key. 

    describe_aggregation,

    We run 'describe_aggregation' on the resulting dictionary.
    '''
    return p(
        js['expenses'],
        (merge_ls_dct_no_key, _, 'user'),
        describe_aggregation,
    )
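The grouping step described in aggregate_by_name can be sketched in plain Python. This is only an illustration of the behavior the docstring describes, not the library's implementation: group_without_key is a hypothetical name, and unlike the helper described above it copies each record rather than deleting the key in place.

```python
from collections import defaultdict


def group_without_key(records, key):
    """Group dicts by `key`, dropping `key` from each grouped dict.

    Hypothetical stand-in for the behavior described for merge_ls_dct_no_key.
    """
    grouped = defaultdict(list)
    for record in records:
        # Copy everything except the grouping key, leaving the input untouched.
        rest = {k: v for k, v in record.items() if k != key}
        grouped[record[key]].append(rest)
    return dict(grouped)


expenses = [
    {'user': 'alice', 'amount': 10, 'date': '2019-02-15'},
    {'user': 'bob', 'amount': 5, 'date': '2019-02-16'},
    {'user': 'alice', 'amount': 7, 'date': '2019-03-01'},
]
by_user = group_without_key(expenses, 'user')
```

Each user id now maps to a list of expenses that no longer repeat the 'user' field.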
Example #2
def fib3(n):
    '''
    Finally, we have the most concise implementation.  _if(cond,expr) returns a switch
    dict:

    {cond:expr,
     'else':_}
    '''
    return p(n, _if(_ > 1, l(fib2, _ - 1) + (fib2, _ - 2)))
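To make the switch-dict idea concrete, here is a minimal plain-Python sketch. It assumes ordinary callables as conditions and branches instead of pype expressions; make_if and eval_switch are hypothetical names chosen for this illustration and are not part of pype.

```python
def make_if(cond, branch):
    """Build a two-branch switch whose 'else' returns the value unchanged."""
    return {cond: branch, 'else': lambda value: value}


def eval_switch(switch, value):
    """Run the branch of the first condition that holds, else the 'else' branch."""
    for cond, branch in switch.items():
        if cond != 'else' and cond(value):
            return branch(value)
    return switch['else'](value)


def fib(n):
    # Mirrors the shape {cond: expr, 'else': _} described in the docstring.
    switch = make_if(lambda k: k > 1, lambda k: fib(k - 1) + fib(k - 2))
    return eval_switch(switch, n)


first_eight = [fib(i) for i in range(8)]
```

For n of 0 or 1 no condition matches, so the 'else' branch hands n back unchanged, which is exactly the base case of the recursion.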
Example #3
def fib3_opt(n):
    '''
    Finally, we have the most concise implementation.  _if(cond,expr) returns a switch
    dict:

    {cond:expr,
     'else':_}

    However, in this case, notice that we do not use a PypeVal expression such as
    l(fib2,_-1).  This is because the optimizer automatically wraps the first operand
    of a binary operator in a PypeVal, which makes the code much cleaner.
    '''
    return p(n, _if(_ > 1, (fib3_opt, _ - 1) + (fib3_opt, _ - 2)))
Example #4
def train_classifier(texts, y):
    '''
    Here is a perfect example of the "feel it ... func it" philosophy:

    The function arguments and function body declare the three variables the
    pype call uses: texts, a list of strings; y, a list of floats; and vectorizer,
    a scikit-learn object that vectorizes text.  This reiterates the advice that you
    should use the function body and function arguments to declare your scope
    whenever you can.

    Line-by-line, here we go:

    {'vectorizer':vectorizer.fit,
     'X':vectorizer.transform},

    We build a dict whose first value is the fitted vectorizer.  Luckily, the
    'fit' function returns the trained vectorizer instance, so we do not need to
    use _do, and the instance is assigned to the 'vectorizer' key.  Because
    dictionaries preserve the order in which their keys were declared (an
    implementation detail of CPython 3.6, guaranteed from Python 3.7), 'fit' runs
    on the texts before 'transform' does.  We need this fitted vectorizer later
    to classify unknown texts.

    After this, we apply the 'transform' to convert the texts into a training matrix
    keyed by 'X', whose rows are texts and whose columns are words. 

    _a('classifier',(Classifier().fit,_['X'],y)),

    Finally, we can build a classifier.  _a, or _assoc, means we are adding a 
    key-value pair to the previous dictionary.  This will be a new instance of our
    Classifier, which is trained through the fit function on the text-word matrix 'X'
    and the labels vector y.

    _d('X'),

    Since we don't need the X matrix anymore, we delete it from the returned JSON,
    which now only contains 'vectorizer' and 'classifier', the two things we will
    need to classify unknown texts.
    '''
    vectorizer = Vectorizer()

    return p(
        texts,
        {
            'vectorizer': vectorizer.fit,
            'X': vectorizer.transform
        },
        _a('classifier', (Classifier().fit, _['X'], y)),
        _d('X'),
    )
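Two points from the docstring above can be checked without scikit-learn: a fit method that returns the fitted object can be stored directly, and the entries of a dict literal are evaluated in the order they are written. The sketch below assumes a toy bag-of-words class, ToyVectorizer, which is hypothetical and only imitates the fit/transform shape.

```python
class ToyVectorizer:
    """Hypothetical bag-of-words vectorizer; not the scikit-learn class."""

    def fit(self, texts):
        words = sorted({word for text in texts for word in text.split()})
        self.vocabulary = {word: i for i, word in enumerate(words)}
        return self  # returning self lets the fitted object be stored directly

    def transform(self, texts):
        rows = []
        for text in texts:
            row = [0] * len(self.vocabulary)
            for word in text.split():
                if word in self.vocabulary:
                    row[self.vocabulary[word]] += 1
            rows.append(row)
        return rows


texts = ['buy now', 'hello friend', 'buy buy']
vectorizer = ToyVectorizer()

# Entries are evaluated in the order written, so fit runs before transform.
model = {
    'vectorizer': vectorizer.fit(texts),
    'X': vectorizer.transform(texts),
}
X = model.pop('X')  # drop the training matrix once it is no longer needed
```

If transform were listed first, it would fail because the vocabulary would not exist yet, which is why the ordering guarantee matters here.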
Example #5
def sum_by_month(js):
    '''
    Line-by-line:

    _['expenses'],

    Extract the 'expenses' field from the JSON.

    [_a('month',(date_string_to_month_string,_['date']))],

    Because this expression is wrapped in a list, it is mapped over the expenses,
    adding a month string to each one.  These date-string-to-month-string
    mappings are cached in pype.time_helpers.  If the date string is '2019-02-15',
    its month string is '2019-02-01'.  

    This avoids the object creation that datetime parsing requires, which becomes
    a performance bottleneck for large data volumes.

    (merge_ls_dct_no_key,_,'month'),

    Aggregate the JSONs by month, eliminating the 'month' key from each JSON.

    [[_['amount']]],

    The value is a dictionary keyed by the month string, with lists of JSON's as
    values.  Because each JSON contains an integer amount, we iterate through each
    list and extract this integer amount.  The result is a JSON keyed by month strings,
    with lists of integers as values. 

    [sum],

    Now, we sum these values.  
    '''
    return p(
        js,
        _['expenses'],
        [_a('month', (date_string_to_month_string, _['date']))],
        (merge_ls_dct_no_key, _, 'month'),
        [[_['amount']]],
        [sum],
    )
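For comparison, the same aggregation can be written as an ordinary loop. This is a sketch under stated assumptions: month_string is a hypothetical helper that uses string slicing in place of the cached lookup in pype.time_helpers, and sum_by_month_plain is an illustrative name rather than a function from the original code.

```python
from collections import defaultdict


def month_string(date_string):
    """Map 'YYYY-MM-DD' to the first day of its month, e.g. '2019-02-01'."""
    return date_string[:7] + '-01'


def sum_by_month_plain(js):
    """Sum expense amounts per month string."""
    totals = defaultdict(int)
    for expense in js['expenses']:
        totals[month_string(expense['date'])] += expense['amount']
    return dict(totals)


sample = {
    'expenses': [
        {'user': 'alice', 'amount': 10, 'date': '2019-02-15'},
        {'user': 'bob', 'amount': 5, 'date': '2019-02-16'},
        {'user': 'alice', 'amount': 7, 'date': '2019-03-01'},
    ]
}
monthly = sum_by_month_plain(sample)
```

Like the cached lookup, the slicing approach never builds a datetime object, so it sidesteps the parsing cost the docstring mentions.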
Example #6
def fib2(n):
    '''
    Notice here we have a more concise description, using the l helper from pype.vals.
    We use this because we want the expression in 'else' to be passed over by the 
    Python interpreter, but evaluated by the pype interpreter.  Therefore, we need 
    to turn this into a PypeVal.  As we know, PypeVals override their operators to 
    produce LamTups, which are hashable objects that contain a pype-interpretable
    expression.  For example, the expression:

    _ <= 1

    is evaluated by the interpreter as:

    L(<built-in function le>, G('_pype_mirror_',), 1)

    This is because _ is a PypeVal, whose operator is overridden to produce a LamTup.

    LamTups are hashable, so they can be used as set elements and dictionary keys.

    In the pype interpreter, when a LamTup is encountered, a function delam is run to 
    extract a lambda expression:

    delam(_ <= 1) => delam(L(<built-in function le>, G('_pype_mirror_',), 1)) =>
    (<built-in function le>, G('_pype_mirror_',), 1)

    We can see that the pype interpreter can now evaluate the final expression:

    p(0,_ <= 1) <=> p(0,(<built-in function le>, G('_pype_mirror_',), 1)) <=> True

    However, in the case of lambda expressions, the double parentheses make this
    very awkward: 

    v((fib2,_-1)) + (fib2,_-2)

    So the l helper builds a PypeVal which encloses this tuple:

    l(fib2,_-1) + (fib2,_-2)
    '''
    return p(n, {_ <= 1: _, 'else': l(fib2, _ - 1) + (fib2, _ - 2)})
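The mechanism described in the fib2 docstring, operators that build a hashable record instead of computing a result, can be imitated in a few lines. The sketch below is an assumption-laden illustration: Deferred, Mirror, and evaluate are hypothetical names and do not reproduce pype's LamTup, PypeVal, or delam.

```python
import operator


class Deferred(tuple):
    """Hashable (function, *args) record, built instead of computing a result."""

    def __add__(self, other):
        return Deferred((operator.add, self, other))


class Mirror:
    """Placeholder for the value that will be supplied later."""

    def __le__(self, other):
        return Deferred((operator.le, self, other))

    def __sub__(self, other):
        return Deferred((operator.sub, self, other))


def evaluate(expr, value):
    """Recursively evaluate a deferred expression against `value`."""
    if isinstance(expr, Mirror):
        return value
    if isinstance(expr, Deferred):
        func, *args = expr
        return func(*(evaluate(arg, value) for arg in args))
    return expr  # plain constants evaluate to themselves


X = Mirror()
condition = X <= 1            # builds a record; nothing is compared yet
table = {condition: 'small'}  # allowed because the record is hashable
total = (X - 1) + (X - 2)     # nested records combine through __add__
```

Because Deferred subclasses tuple, it inherits tuple hashing, which is what allows a condition to serve as a dictionary key in a switch dict.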
Example #7
def fib1(n):
    '''
    Here is an example of the recursive fibonacci sequence, and a demonstration of
    how to use dict builds as scopes for the succeeding expressions.
    
    Line-by-line:

    {_ <= 1:_,
     'else':_p({'fib1':(fib1,_-1),
                'fib2':(fib1,_-2)},
                _.fib1+_.fib2)}

    This is a switch dict, as we can tell from its 'else' key.

    _ <= 1:_,

    If n is less than or equal to 1, return n.

    'else':_p({'fib1':(fib1,_-1),
               'fib2':(fib1,_-2)},
               _.fib1+_.fib2)}

    _p means we are building an embedded pype.  Its first step is a dict build that
    assigns fib1(n-1) to 'fib1' and fib1(n-2) to 'fib2'.

    _.fib1+_.fib2

    This adds the two values in the previous dict build.  
    '''
    return p(
        n, {
            _ <= 1:
            _,
            'else':
            _p({
                'fib1': (fib1, _ - 1),
                'fib2': (fib1, _ - 2)
            }, _.fib1 + _.fib2)
        })
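The "dict build as scope" idea in fib1 has a direct plain-Python analogue: name the intermediate results in a dictionary, then let the next step read them back by name. The function below is a hypothetical illustration, not code from the original module.

```python
def fib_scoped(n):
    """Recursive Fibonacci that stores its two sub-results in a dict first."""
    if n <= 1:
        return n
    # Name the intermediate results, like the dict build inside _p.
    scope = {'fib1': fib_scoped(n - 1), 'fib2': fib_scoped(n - 2)}
    # The following step reads them back by name, like _.fib1 + _.fib2.
    return scope['fib1'] + scope['fib2']


first_ten = [fib_scoped(i) for i in range(10)]
```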
Example #8
    '''
    return p(
        js['expenses'],
        [_a('date', (parse, _['date']))],
        [_a('month', (get_month, _['date']))],
        [_d('date')],
        [_a('month', (date_to_str, _['month']))],
        (merge_ls_dct_no_key, _, 'user'),
        [(merge_ls_dct_no_key, _, 'month')],
        [describe_aggregation],
    )


if __name__ == '__main__':

    js = p(sys.argv[1], open, _.read, json.loads)
    '''
    For performance, pype does not treat its input as immutable, so a function may
    modify the object it is given.  Therefore, if you want to run different
    functions on the same object, it is best to deepcopy it beforehand.
    '''
    js1 = deepcopy(js)
    js2 = deepcopy(js)
    js3 = deepcopy(js)

    print("Printing JS before Aggregation")

    print("Aggregating by Name")

    aggregatedByName = aggregate_by_name(js1)
Example #9
def sum_by_month_numpy(js):
    '''
    Now, we throw numpy into the mix.  

    [_a('month_int',_p(_['date'],
                       date_string_to_month_string,
                       month_string_to_int))],

    Instead of adding a month string, we add a month integer, which is derived
    from the concatenated year, month, and day values of the month string.
    '2019-02-01' would be turned into 20190201.  This is done by cache lookups in
    pype.time_helpers:

    _p(_['date'],
       date_string_to_month_string,
       month_string_to_int))

    date_string_to_month_string looks up the month string, as described above.
    month_string_to_int maps this month string to an integer, as described above.

    (zip_ls,[_['month_int']],[_['amount']]),

    We create a list of tuples containing the month integer and the amount.  Notice
    that we are using two maps on the JSON list: [_['month_int']] creates a list
    of month integers, and [_['amount']] creates a list of amounts.  zip_ls is a
    helper that takes the zip of two lists and converts the result into a list.

    np.array

    Cast the list into an array with two columns, the first for the month integers
    and the second for amounts.
    
    aggregate_by_key

    This is a function from pype.numpy helpers which returns three things:

    1) A matrix whose i-th row holds all the values corresponding to the i-th key.
       The matrix is padded with zeros so that lists of varying length can fit into
       it.  
    2) A list of keys, where the i-th key corresponds with the i-th row of (1).
    3) A list of counts for the keys, where the i-th count is the count of the i-th
       key.

    (zip,_p( _1,
            [int_to_month_string]),
            (sum_by_row,_0)),
    
    Notice we are using zip instead of zip_ls, because the following tup_dct can 
    take a zip object, whereas numpy.array requires an explicit list.

    _p( _1,
        [int_to_month_string]),

    _1 takes the second element in the tuple produced by aggregate_by_key, which is
    the month integers. [int_to_month_string] iterates through these keys and converts
    them back to month strings.

    (sum_by_row,_0)

    _0 is the matrix containing all the amounts, with the i-th row corresponding
    to the i-th key in _1.  sum_by_row just sums the rows of this matrix, so now
    we have a zip over tuples (month string, amount sum).  

    tup_dct

    Converts these tuples into a dictionary mapping the first element of the tuple
    to the second.
    '''
    return p(
        js,
        _['expenses'],
        [
            _a('month_int',
               _p(_['date'], date_string_to_month_string, month_string_to_int))
        ],
        (zip_ls, [_['month_int']], [_['amount']]),
        np.array,
        aggregate_by_key,
        (zip, _p(_1, [int_to_month_string]), (sum_by_row, _0)),
        tup_dct,
    )
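The three return values described for aggregate_by_key can be illustrated without numpy. The sketch below is a guess at the described behavior using plain lists: aggregate_by_key_plain is a hypothetical name, sorting the keys is an assumption, and month_string_to_int is reimplemented here with string replacement instead of the cached lookup.

```python
def month_string_to_int(month_string):
    """Concatenate year, month, and day digits: '2019-02-01' -> 20190201."""
    return int(month_string.replace('-', ''))


def aggregate_by_key_plain(pairs):
    """Group (key, value) pairs into zero-padded rows, keys, and counts."""
    groups = {}
    for key, value in pairs:
        groups.setdefault(key, []).append(value)
    keys = sorted(groups)  # assumption: keys come back in sorted order
    counts = [len(groups[key]) for key in keys]
    width = max(counts, default=0)
    # Pad every row with zeros so all rows share the same length.
    matrix = [groups[key] + [0] * (width - len(groups[key])) for key in keys]
    return matrix, keys, counts


pairs = [(20190201, 10), (20190301, 7), (20190201, 5)]
matrix, keys, counts = aggregate_by_key_plain(pairs)
totals = dict(zip(keys, (sum(row) for row in matrix)))
```

Zero padding leaves the row sums unchanged, which is why summing each row of the padded matrix still gives the correct total per key.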
Example #10
            _a('month_int',
               _p(_['date'], date_string_to_month_string, month_string_to_int))
        ],
        (zip_ls, [_['month_int']], [_['amount']]),
        np.array,
        aggregate_by_key,
        (zip, _p(_1, [int_to_month_string]), (sum_by_row, _0)),
        tup_dct,
    )


if __name__ == '__main__':

    js = p(
        sys.argv[1],
        open,
        _.read,
        json.loads,
    )
    js1 = p(js, deepcopy, sum_by_month_imperative)
    js2 = p(js, deepcopy, sum_by_month)
    js3 = p(js, deepcopy, sum_by_month_numpy)

    print('Original JSON is:')
    pp.pprint(js)
    print('*' * 30)
    print('Output of imperative implementation is:')
    pp.pprint(js1)
    print('*' * 30)
    print('Output of pure pype implementation is:')
    pp.pprint(js2)
    print('*' * 30)
Example #11
def classify():
    '''
    This is the function that is run on a JSON containing one field, 'texts', which
    is a list of strings.  This function will return a list of JSONs, each containing
    the label given by the classifier for a text (1 or -1) and the original text.
    Notice that, in this routine, we need access to 'texts' in (zip,_,texts).  

    Line-by-line:

    global MODEL

    We need this to refer to the model we trained at the initialization of the 
    microservice.  

    texts=request.get_json(force=True)['texts']

    This extracts the 'texts' list from the json embedded in the request.  

    MODEL['vectorizer'].transform,

    This uses the vectorizer to convert the list of strings in texts to a text-word
    matrix that can be fed into the classifier.

    MODEL['classifier'].predict,

    This runs the prediction on the text-word matrix, producing an array of 1's and
    -1's, where 1 indicates that the classification is positive (it is spam), and -1
    indicates that the classification is negative (it is not spam).

    (zip,_,texts),

    We know that the n-th label produced by the classifier is for the n-th string in
    texts, so we zip them together to produce an iterable of tuples (label,text).  

    [{'label':_0,
      'text':_1,
      'description':{_0 == 1: 'not spam',
                     'else':'spam'}}],

    Here, we are performing a mapping over the (label,text) tuples produced by the 
    zip.  For each tuple, we build a dictionary with three items.  The first is the
    label, which is numeric, either 1.0 or -1.0.  The second is the actual text
    string.  

    However, to help the user, we also include a description of what the label means:

    'description':{_0 == 1: 'not spam',
                   'else':'spam'}

    The value is a switch dict.  Since _0 is a Getter object, it overrides the ==
    operator to produce a LamTup, which Python will accept, but which the pype
    interpreter will run as an expression.  _0 == 1 simply means "the first element
    of the (label,text) tuple, label, is 1".  If this is true, 'description' is set
    to 'not spam'.  Otherwise, it is set to 'spam'.

    jsonify

    This just turns the resulting JSON, a list of dictionaries, into something that
    can be returned to the client over HTTP.  
    '''
    global MODEL

    texts = request.get_json(force=True)['texts']

    return p(texts, MODEL['vectorizer'].transform, MODEL['classifier'].predict,
             (zip, _, texts), [{
                 'label': _0,
                 'text': _1,
                 'description': {
                     _0 == 1: 'not spam',
                     'else': 'spam'
                 }
             }], jsonify)
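The zip-and-map step at the end of classify can be tried without Flask or a trained model. The sketch below assumes the labels are already available as a plain list; describe is a hypothetical helper, and it follows the same branch order as the switch dict in the code above.

```python
def describe(labels, texts):
    """Pair each label with its text and add a readable description."""
    return [
        {
            'label': label,
            'text': text,
            # Same branches as the switch dict: label 1 first, then 'else'.
            'description': 'not spam' if label == 1 else 'spam',
        }
        for label, text in zip(labels, texts)
    ]


results = describe([1, -1], ['see you at noon', 'win a free prize'])
```

Each resulting dictionary is plain data, so it can be serialized to JSON as-is before being returned to the client.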