Skip to content

Commit

Permalink
Add files via upload
Browse files Browse the repository at this point in the history
  • Loading branch information
wannaphong authored Aug 12, 2024
1 parent 4d48db9 commit fa4ef11
Showing 1 changed file with 218 additions and 0 deletions.
218 changes: 218 additions & 0 deletions source/notebooks/thai_semantic_representations.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,218 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"collapsed_sections": [
"7874DDJAneOV"
]
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"source": [
"# Automatic Derivation of Semantic Representations for Thai Serial Verb Constructions: A Grammar-Based Approach\n",
"\n",
"Vipasha Bansal\n",
"\n",
"ACL2024 SRW\n",
"\n",
"Abstract:\n",
"\n",
"> Deep semantic representations are useful for many NLU tasks (Droganova and Zeman, 2019; Schuster and Manning, 2016). Manual annotation to build these representations is time-consuming, and so automatic approaches are preferred (Droganova and Zeman, 2019; Bender et al. 2015). This paper demonstrates how rich semantic representations can be automatically derived for Thai Serial Verb Constructions (SVCs), where the semantic relationship between component verbs is not immediately clear from the surface forms. I present the first fully-implemented, unified analysis for Thai SVCs, deriving appropriate semantic representations (MRS; Copestake et al. 2005) from syntactic features, implemented within a DELPH-IN computational grammar (Slayden 2009). This analysis increases verified coverage of SVCs by 73% and decreases ambiguity by 46%.\n",
"\n",
"GitHub: https://github.com/VipashaB94/ThaiGrammar\n",
"\n",
"Paper: [Wait]\n",
"\n",
"\n",
"\n",
"---\n",
"\n",
"\n",
"\n",
"This notebook will guide you in running Thai semantic representations from the work.\n",
"\n",
"The notebook created by Wannaphong Phatthiyaphaibun, PyThaiNLP."
],
"metadata": {
"id": "tG27M8_sn0xy"
}
},
{
"cell_type": "markdown",
"source": [
"## Install\n",
"\n",
"Get latest ACE from http://sweaglesw.org/linguistics/ace/"
],
"metadata": {
"id": "7874DDJAneOV"
}
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "N4yiJEb-ms6w",
"outputId": "261e926a-eacd-4649-cbee-f9e61476179a"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"--2024-08-12 12:50:35-- http://sweaglesw.org/linguistics/ace/download/ace-0.9.34-x86-64.tar.gz\n",
"Resolving sweaglesw.org (sweaglesw.org)... 216.129.123.154, 2001:1868:a100:105:beae:c5ff:fe24:d767\n",
"Connecting to sweaglesw.org (sweaglesw.org)|216.129.123.154|:80... connected.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: 2526613 (2.4M) [application/x-gzip]\n",
"Saving to: ‘ace-0.9.34-x86-64.tar.gz’\n",
"\n",
"ace-0.9.34-x86-64.t 100%[===================>] 2.41M 4.37MB/s in 0.6s \n",
"\n",
"2024-08-12 12:50:36 (4.37 MB/s) - ‘ace-0.9.34-x86-64.tar.gz’ saved [2526613/2526613]\n",
"\n",
"ace-0.9.34/\n",
"ace-0.9.34/LICENSE\n",
"ace-0.9.34/post/\n",
"ace-0.9.34/post/english-postagger.hmm\n",
"ace-0.9.34/erg-files/\n",
"ace-0.9.34/erg-files/config.tdl\n",
"ace-0.9.34/erg-files/ace-erg-qc.txt\n",
"ace-0.9.34/RELEASE-NOTES\n",
"ace-0.9.34/ace\n",
"ace-0.9.34/doc/\n",
"ace-0.9.34/doc/config.wiki\n",
"ace-0.9.34/doc/options.wiki\n",
"Cloning into 'ThaiGrammar'...\n",
"remote: Enumerating objects: 199, done.\u001b[K\n",
"remote: Counting objects: 100% (199/199), done.\u001b[K\n",
"remote: Compressing objects: 100% (160/160), done.\u001b[K\n",
"remote: Total 199 (delta 66), reused 0 (delta 0), pack-reused 0\u001b[K\n",
"Receiving objects: 100% (199/199), 1.39 MiB | 2.53 MiB/s, done.\n",
"Resolving deltas: 100% (66/66), done.\n",
" Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m186.8/186.8 kB\u001b[0m \u001b[31m2.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m43.4/43.4 kB\u001b[0m \u001b[31m1.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25h Building wheel for progress (setup.py) ... \u001b[?25l\u001b[?25hdone\n"
]
}
],
"source": [
"!wget http://sweaglesw.org/linguistics/ace/download/ace-0.9.34-x86-64.tar.gz\n",
"!mkdir run_ace\n",
"!tar -xvzf ace-0.9.34-x86-64.tar.gz -C run_ace\n",
"!git clone https://github.com/VipashaB94/ThaiGrammar.git\n",
"!pip install -q pydelphin"
]
},
{
"cell_type": "markdown",
"source": [
"## Usage\n",
"\n",
"We use ACE for delphin."
],
"metadata": {
"id": "aXjtb9ftnsF1"
}
},
{
"cell_type": "code",
"source": [
"from delphin import ace"
],
"metadata": {
"id": "fvE8146VnJKP"
},
"execution_count": 2,
"outputs": []
},
{
"cell_type": "code",
"source": [
"ace.compile('./ThaiGrammar/thaigrammar/ace/config.tdl', 'thai.dat',executable=\"./run_ace/ace-0.9.34/ace\")"
],
"metadata": {
"id": "-6opwCGVnTYH"
},
"execution_count": 3,
"outputs": []
},
{
"cell_type": "code",
"source": [
"response = ace.parse('thai.dat', 'สุรี ไป ซื้อ หนังสือ',executable=\"./run_ace/ace-0.9.34/ace\")\n",
"response['results']"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "wlEN8Q-jnWQN",
"outputId": "ff7c4502-df3a-4f5c-fab0-23a745d87b85"
},
"execution_count": 4,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"[{'result-id': 0,\n",
" 'derivation': '(328 subj-head 0.000000 0 4 (322 bare-np 0.000000 0 1 (5 สุรี_33142 0.000000 0 1 (\"สุรี\"))) (327 head-comp 0.000000 1 4 (324 drop-obj 0.000000 1 2 (323 deic-purpose-trans-svc-lex 0.000000 1 2 (6 ไป_4158 0.000000 1 2 (\"ไป\")))) (326 head-comp 0.000000 2 4 (7 ซื้อ_4236 0.000000 2 3 (\"ซื้อ\")) (325 bare-np 0.000000 3 4 (8 หนังสือ_4404 0.000000 3 4 (\"หนังสือ\"))))))',\n",
" 'mrs': '[ LTOP: h0 INDEX: e2 [ e SF: prop ] RELS: < [ named_rel<-1:-1> LBL: h4 CARG: \"สุรี\" ARG0: x3 ] [ \"exist_q_rel\"<-1:-1> LBL: h6 ARG0: x3 RSTR: h7 BODY: h8 ] [ \"_go_v_1_rel\"<-1:-1> LBL: h1 ARG0: e9 ARG1: x3 ARG2: x10 [ x COG-ST: type-id ] ] [ \"purpose_rel\"<-1:-1> LBL: h1 ARG0: e2 ARG1: e9 ARG2: e11 ] [ \"_buy_v_1_rel\"<-1:-1> LBL: h1 ARG0: e11 ARG1: x3 ARG2: x12 [ x PERS: 3 ] ] [ \"_book_n_1_rel\"<-1:-1> LBL: h13 ARG0: x12 ] [ \"exist_q_rel\"<-1:-1> LBL: h14 ARG0: x12 RSTR: h15 BODY: h16 ] > HCONS: < h0 qeq h1 h7 qeq h4 > ICONS: < > ]',\n",
" 'tree': '(\"S\" (\"NP\" (\"N\" (\"สุรี\"))) (\"VP\" (\"V\" (\"V-M\" (\"V\" (\"ไป\")))) (\"VP\" (\"V\" (\"ซื้อ\")) (\"NP\" (\"N\" (\"หนังสือ\"))))))',\n",
" 'flags': [(':ascore', 0.0), (':probability', 1.0)]}]"
]
},
"metadata": {},
"execution_count": 4
}
]
},
{
"cell_type": "code",
"source": [
"response = ace.parse('thai.dat', 'ผม จะ เป็น คน ดี',executable=\"./run_ace/ace-0.9.34/ace\")\n",
"response['results']"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "BD8ZQezQnbCM",
"outputId": "a521789f-8e8a-419b-be14-6877ec1d1677"
},
"execution_count": 5,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"[{'result-id': 0,\n",
" 'derivation': '(603 subj-head 0.000000 0 5 (598 bare-np 0.000000 0 1 (6 ผม_4375 0.000000 0 1 (\"ผม\"))) (602 head-comp 0.000000 1 5 (7 จะ_33089 0.000000 1 2 (\"จะ\")) (601 head-comp 0.000000 2 5 (8 เป็น_33088 0.000000 2 3 (\"เป็น\")) (600 bare-np 0.000000 3 5 (599 head-adj-int 0.000000 3 5 (12 คน_4133 0.000000 3 4 (\"คน\")) (13 ดี_4290 0.000000 4 5 (\"ดี\")))))))',\n",
" 'mrs': '[ LTOP: h0 INDEX: e2 [ e TENSE: fut SF: prop ] RELS: < [ \"pron_rel\"<-1:-1> LBL: h4 ARG0: x3 [ x PERS: 1 NUM: sg GEND: m SPECI: + ] ] [ \"exist_q_rel\"<-1:-1> LBL: h5 ARG0: x3 RSTR: h6 BODY: h7 ] [ \"_be_v_id_rel\"<-1:-1> LBL: h1 ARG0: e2 ARG1: x3 ARG2: x8 [ x PERS: 3 ] ] [ \"_person_n_1_rel\"<-1:-1> LBL: h9 ARG0: x8 ] [ \"_good_a_1_rel\"<-1:-1> LBL: h9 ARG0: e10 ARG1: x8 ] [ \"exist_q_rel\"<-1:-1> LBL: h11 ARG0: x8 RSTR: h12 BODY: h13 ] > HCONS: < h0 qeq h1 h6 qeq h4 h12 qeq h9 > ICONS: < > ]',\n",
" 'tree': '(\"S\" (\"NP\" (\"N\" (\"ผม\"))) (\"VP\" (\"V\" (\"จะ\")) (\"VP\" (\"V\" (\"เป็น\")) (\"NP\" (\"N\" (\"N\" (\"คน\")) (\"ADJ\" (\"ดี\")))))))',\n",
" 'flags': [(':ascore', 0.0), (':probability', 1.0)]}]"
]
},
"metadata": {},
"execution_count": 5
}
]
}
]
}

0 comments on commit fa4ef11

Please sign in to comment.