patrickporter/vs-tm-server

Very Simple TM Server

A moderately scalable translation memory server written in Python.

It includes a CherryPy wrapper that exposes methods for interacting with the translation memory (TM) provider. The TM provider performs fuzzy matching using character-based Levenshtein distance, modified to allow custom scoring for replacements that differ only in case (i.e. uppercase to lowercase and vice versa). By default, data is stored in an SQLite database, but MySQL is optionally supported. Data is loaded into session-based memory for faster searching.

Sessions are managed via a cookie header and authenticated with usernames and passwords. Each translation memory is assigned an 'owner' who can read, write, and delete it. A TM can also be assigned a 'read group' and a 'readwrite group' so that other users can interact with it. Admin users can read, write, and delete all TMs.
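
To illustrate the case-aware scoring described above, here is a minimal pure-Python sketch. It is not the server's actual implementation (which relies on python-Levenshtein); the function names and the normalization of the distance into a score between 0 and 1 are assumptions made for this example.

```python
def case_aware_distance(a, b, casecost=0.2):
    """Levenshtein distance where a case-only replacement costs `casecost`."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = [float(j) for j in range(len(b) + 1)]
    for i, ca in enumerate(a, 1):
        cur = [float(i)]
        for j, cb in enumerate(b, 1):
            if ca == cb:
                sub = 0.0          # identical characters
            elif ca.lower() == cb.lower():
                sub = casecost     # same letter, different case
            else:
                sub = 1.0          # ordinary replacement
            cur.append(min(prev[j] + 1.0,        # deletion
                           cur[j - 1] + 1.0,     # insertion
                           prev[j - 1] + sub))   # replacement or match
        prev = cur
    return prev[-1]


def match_score(a, b, casecost=0.2):
    """Assumed normalization: 1.0 is an exact match, 0.0 is no similarity."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - case_aware_distance(a, b, casecost) / longest
```

With a casecost below 1, "Hello" and "hello" end up closer to each other than "Hello" and "Jello", which is the skew the description refers to.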

NOTE: This project is currently very experimental. In its current form it should only be used on a private LAN, because it sends and receives all data unencrypted, including usernames and passwords. To run it on a public server, see http://cherrypy.readthedocs.org/en/latest/deploy.html#ssl-support regarding running CherryPy behind SSL. Some security risks have been addressed, for example protection against SQL injection and session fixation, but these measures should probably be reviewed. Among other potentially unaddressed security issues, the server can currently serve a simple password form for login, but this form is not protected at all against potential XSS attacks.
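
As a rough illustration of the SSL setup that the linked CherryPy page describes, the settings below use CherryPy's built-in SSL support. The certificate and key file names and the port are placeholders, and how this project loads its configuration is not documented here, so treat this as a sketch only.

```python
# Hypothetical values: replace the file names with your own certificate and key.
ssl_config = {
    "server.socket_host": "0.0.0.0",
    "server.socket_port": 8443,               # placeholder port
    "server.ssl_module": "builtin",           # use Python's ssl module
    "server.ssl_certificate": "cert.pem",     # placeholder path
    "server.ssl_private_key": "privkey.pem",  # placeholder path
}

# With CherryPy installed, the settings would be applied before starting
# the engine, for example:
#     import cherrypy
#     cherrypy.config.update(ssl_config)
```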

Requirements:
Python 3.4
CherryPy
python-Levenshtein
MySQL Server (optional)

Usage / API methods (GET or POST):

name:
    check_server_status
description:
    Checks which, if any, TMs have been loaded to memory, which is necessary for searching.
params:
    [none]
returns:
    JSON dict: {'status': ...}

name:
    list_tms
description:
    Lists the translation memory documents (TMX files) that have been imported into the database and are available for loading into memory and searching.
params:
    [none]
returns:
    JSON list of JSON dicts: [{'created_datetime': ..., 'can_read': ..., 'tm_id': ..., 'name': ..., 'can_write': ..., 'sourcelang': ..., 'targetlang': ..., 'orig_filename': ..., 'last_updated_datetime': ...} ... ]

name:
    load_tm
description:
    Loads data for a given translation memory document from DB to memory for faster searching. Returns an HTTP error if the tm_id in question does not exist.
params:
    tm_id
returns:
    JSON dict: {'status': ...}
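
The following sketch shows one way a client might call these methods over GET while keeping the session cookie between requests. The base URL, the assumption that each method is served at the server root under its own name, and the helper names are all hypothetical; the login step is left out because its parameters are not documented here.

```python
import http.cookiejar
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8080"  # assumption: default CherryPy host and port


def build_url(method, **params):
    """Build a GET URL such as <BASE_URL>/load_tm?tm_id=3."""
    url = "{}/{}".format(BASE_URL, method)
    query = urllib.parse.urlencode(params)
    return url + "?" + query if query else url


def make_opener():
    """Return an opener that stores and resends the session cookie."""
    jar = http.cookiejar.CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def call(opener, method, **params):
    """Call an API method and decode its JSON response."""
    with opener.open(build_url(method, **params)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Against a running server, `call(make_opener(), "load_tm", tm_id=3)` would then return the decoded status dict, provided the session has been authenticated.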

name:
    search
description:
    Searches for exact and fuzzy matches, scores and ranks them, and returns them in descending order of match score. threshold is the minimum match score to return. maxresults is the maximum number of results to return (0 means no maximum). casecost is the cost applied in the Levenshtein distance calculation to replacements that are merely a change of case: a casecost of less than one skews results in favor of strings that differ only in case.
params:
    searchtext
    threshold (default 0.75)
    maxresults (default 0, i.e. unlimited)
    casecost (default 0.2)
returns:
    JSON dict: {'data': {'matches': [{'sourcetext': ..., 'targettext': ..., 'matchscore': ..., 'created_by': ..., 'created_date': ..., 'changed_by': ..., 'changed_date': ..., 'last_used_date': ...}, ...]}}
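
As an example of consuming the search result, the sketch below extracts the matches from a response body and keeps those at or above a client-side cutoff. The sample response is made up for illustration, it omits the metadata fields, and it assumes matchscore is on the same 0 to 1 scale as threshold.

```python
import json

# Made-up sample following the documented response shape (metadata omitted).
SAMPLE_RESPONSE = json.dumps({
    "data": {
        "matches": [
            {"sourcetext": "Open the file",
             "targettext": "Ouvrir le fichier",
             "matchscore": 0.92},
            {"sourcetext": "Open a file",
             "targettext": "Ouvrir un fichier",
             "matchscore": 0.78},
        ]
    }
})


def matches_above(response_text, min_score):
    """Return (sourcetext, targettext, matchscore) tuples at or above min_score."""
    payload = json.loads(response_text)
    return [
        (m["sourcetext"], m["targettext"], m["matchscore"])
        for m in payload["data"]["matches"]
        if m["matchscore"] >= min_score
    ]
```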

name:
    sync_memory_add_only
description:
    Updates the in-memory data for the session from the DB, adding new TUs (translation units) only. No TUs will be deleted from the in-memory data, even if some of them have been deleted in the DB.
params:
    [none]
returns:
    JSON dict: {'status': ...}

name:
    sync_memory_add_delete
description:
    Updates the in-memory data for the session from the DB, adding new TUs and removing deleted TUs, if the DB contains any changes.
params:
    [none]
returns:
    JSON dict: {'status': ...}
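
The difference between the two sync methods can be sketched with plain dictionaries. This is only a model of the documented behavior, not the server's code: keying TUs by (source, target) pairs and the function names are assumptions.

```python
def sync_add_only(memory, db):
    """Add TUs that exist in the DB but not in memory; never remove any."""
    for key, tu in db.items():
        memory.setdefault(key, tu)
    return memory


def sync_add_delete(memory, db):
    """Add new TUs and also drop TUs that no longer exist in the DB."""
    for key in list(memory):
        if key not in db:
            del memory[key]
    for key, tu in db.items():
        memory.setdefault(key, tu)
    return memory


# Hypothetical data: "Bye" was deleted from the DB, "Thanks" was added.
in_memory = {("Hello", "Bonjour"): {}, ("Bye", "Au revoir"): {}}
in_db = {("Hello", "Bonjour"): {}, ("Thanks", "Merci"): {}}

after_add_only = sync_add_only(dict(in_memory), in_db)
after_add_delete = sync_add_delete(dict(in_memory), in_db)
```

In this example, after_add_only still contains the deleted "Bye" pair, while after_add_delete mirrors the DB exactly.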

name:
    save_in_memory_tms
description:
    Saves all the translation units currently in memory, from all TMs in memory, to one new translation memory in the DB.
params:
    tm_name
    sourcelang (default None)
    targetlang (default None)
returns:
    JSON dict: {'status': ...}

name:
    delete_tm
description:
    Permanently deletes all the data related to a previously-loaded translation memory document from the DB.
params:
    tm_id
returns:
    JSON dict: {'status': ...}

name:
    delete_tu
description:
    Permanently deletes a TU, identified by its source text/target text pair, from the specified TM.
params:
    tm_id
    source
    target
returns:
    JSON dict: {'status': ...}

name:
    add_or_update_tu
description:
    Adds or updates a source text/target text pair in the specified translation memory.
params:
    tm_id
    source
    target
    allow_multiple (default False)
    overwrite_with_new (default True)
returns:
    JSON dict: {'status': ...}

name:
    import_tmx
description:
    Uploads a TMX file and then imports its info and translation units into the DB. The DB import runs asynchronously in a background thread, so the method returns as soon as the file upload is complete, with a status of 'currently loading'.
params:
    file
    tm_name
returns:
    JSON dict: {'filename' : ..., 'content-type' : ..., 'status' : ...}
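
For readers unfamiliar with TMX, the sketch below shows how translation units might be read from a TMX document using only the standard library. It is not the server's importer; the function name and the sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Made-up sample document with a single translation unit.
SAMPLE_TMX = """<tmx version="1.4">
  <header srclang="en" datatype="plaintext" segtype="sentence"
          adminlang="en" creationtool="example" creationtoolversion="1"
          o-tmf="none"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Hello world</seg></tuv>
      <tuv xml:lang="fr"><seg>Bonjour le monde</seg></tuv>
    </tu>
  </body>
</tmx>"""


def parse_tmx(text, sourcelang, targetlang):
    """Return (source, target) pairs for TUs that contain both languages."""
    root = ET.fromstring(text)
    pairs = []
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text nested inside inline tags.
                segs[tuv.get(XML_LANG)] = "".join(seg.itertext())
        if sourcelang in segs and targetlang in segs:
            pairs.append((segs[sourcelang], segs[targetlang]))
    return pairs
```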

name:
    export_tmx
description:
    Retrieves the specified TM and exports it as a TMX file.
params:
    tm_id
returns:
    TMX file object
