Modelling semantics from syntax in the Korean language

choi-calvin/korean-som

Modeling semantics from syntax in the Korean language

This repository contains the code used to produce the results in the paper "Modeling semantics from syntax in the Korean language". Transcripts of Korean conversations between children and their parents are fed into a contextual self-organizing map (SOM) to learn semantic representations of the Korean language.

Requirements and Installation

The scripts require Python 3 and Jupyter Notebook to run.

The Python libraries required by the scripts are listed in requirements.txt. Install them by running pip install -r requirements.txt within the Python environment of your choice.

In addition to the standard KoNLPy library, the scripts require the Korean MeCab segmentation library. Instructions for installing this can be found here.
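Before opening the notebooks, it can help to confirm that both dependencies are actually usable. The following is a minimal sketch rather than part of the repository: the helper names has_module and mecab_is_usable are hypothetical, and the check only assumes KoNLPy's documented Mecab tagger class and its morphs method.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(name) is not None


def mecab_is_usable() -> bool:
    """Try to build KoNLPy's Mecab tagger and segment a short phrase."""
    if not has_module("konlpy"):
        return False
    try:
        from konlpy.tag import Mecab  # KoNLPy wrapper around MeCab

        # Constructing or calling the tagger raises if the MeCab
        # backend or its Korean dictionary is not installed.
        Mecab().morphs("안녕하세요")
        return True
    except Exception:
        return False


if __name__ == "__main__":
    print("konlpy importable:", has_module("konlpy"))
    print("Mecab tagger usable:", mecab_is_usable())
```

If the second line reports False while the first reports True, the KoNLPy package is present but the separate MeCab installation step is most likely still missing.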

Replicating the results

This section goes through the directory structure and files included in this repository.

Data

A copy of the data used in the paper is stored in Ko/. This was retrieved from

Jo, J. & Ko, E.-S. (2018). Korean mothers attune the frequency and acoustic saliency of sound symbolic words to the linguistic maturity of their children. Frontiers in Psychology 9:2225. doi: 10.3389/fpsyg.2018.02225

at this link on March 19, 2021.

Tutorial

A notebook tutorial can be found in tutorial.ipynb. It walks through how the contextual SOMs were trained on the Ko corpus, covering data extraction, preprocessing, training, and evaluation.
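To give a rough idea of what training a contextual SOM involves, here is a small pure-Python sketch. It is not the code from tutorial.ipynb. It assumes one common formulation: each word receives a random code, a word's input vector is the average of the codes of its immediate neighbours, and a standard SOM with a Gaussian neighbourhood is trained on those vectors. All function names, parameters, and the toy utterances are hypothetical and are not taken from the Ko corpus.

```python
import math
import random


def build_context_vectors(utterances, dim=8, seed=0):
    """Map each word to its averaged (previous word, next word) context."""
    rng = random.Random(seed)
    vocab = sorted({w for utt in utterances for w in utt})
    # Assumption: every word gets a fixed random code of length `dim`.
    codes = {w: [rng.gauss(0.0, 1.0) for _ in range(dim)] for w in vocab}
    sums = {w: [0.0] * (2 * dim) for w in vocab}
    counts = {w: 0 for w in vocab}
    zero = [0.0] * dim  # used when a word has no neighbour on one side
    for utt in utterances:
        for i, word in enumerate(utt):
            prev_code = codes[utt[i - 1]] if i > 0 else zero
            next_code = codes[utt[i + 1]] if i + 1 < len(utt) else zero
            for j, value in enumerate(prev_code + next_code):
                sums[word][j] += value
            counts[word] += 1
    return {w: [s / counts[w] for s in sums[w]] for w in vocab}


def best_matching_unit(weights, vector):
    """Return the (row, col) of the unit whose weights are closest."""
    return min(weights, key=lambda rc: math.dist(weights[rc], vector))


def train_som(vectors, rows=4, cols=4, epochs=50, seed=0):
    """Train a rows x cols SOM on the context vectors (online updates)."""
    rng = random.Random(seed)
    data = list(vectors.values())
    dim = len(data[0])
    weights = {(r, c): [rng.uniform(-0.5, 0.5) for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs  # shrinks from 1 toward 0
        rate = 0.5 * frac
        radius = max(rows, cols) / 2 * frac + 0.5
        rng.shuffle(data)
        for vec in data:
            br, bc = best_matching_unit(weights, vec)
            for (r, c), w in weights.items():
                grid_dist_sq = (r - br) ** 2 + (c - bc) ** 2
                influence = math.exp(-grid_dist_sq / (2 * radius ** 2))
                for j in range(dim):
                    w[j] += rate * influence * (vec[j] - w[j])
    return weights


# Toy, already-segmented utterances (illustrative only).
utterances = [
    ["엄마", "가", "밥", "을", "먹어"],
    ["아기", "가", "우유", "를", "먹어"],
    ["엄마", "가", "우유", "를", "마셔"],
]
vectors = build_context_vectors(utterances)
som = train_som(vectors)
placement = {w: best_matching_unit(som, v) for w, v in vectors.items()}
```

Here placement maps each word to a grid cell. In this formulation, words that occur in similar contexts have similar input vectors and so tend to land on nearby cells, which is what allows a map trained only on word order to be read for semantic structure.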

Paper results

The exact code used to produce the results in the paper can be found in paper_results/. This directory includes contextual_som.py, a Python script that aggregates the cells of tutorial.ipynb, and main.ipynb, the exact notebook used to produce the results in the paper. Note that because the algorithm involves randomness, re-running main.ipynb may produce outputs that differ slightly from those in the paper.
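The run-to-run variation comes from random choices such as weight initialisation. The sketch below shows how fixing a seed makes such draws repeatable. It is an illustration only: init_weights is a hypothetical helper, and this README does not state which random generator paper_results/ uses or whether it can be seeded.

```python
import random


def init_weights(n_units, dim, seed=None):
    """Draw initial SOM weights; a fixed seed makes the draw repeatable."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.5, 0.5) for _ in range(dim)]
            for _ in range(n_units)]


seeded_a = init_weights(4, 3, seed=42)
seeded_b = init_weights(4, 3, seed=42)
unseeded = init_weights(4, 3)  # seeded from the OS, differs between runs

# Two runs with the same seed start from identical weights.
print(seeded_a == seeded_b)
```

If the scripts draw their random numbers from NumPy instead, calling numpy.random.seed at the top of the notebook would play the same role, provided every random draw goes through that generator.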

Contributions

All code was written by Calvin Choi.
