-
Notifications
You must be signed in to change notification settings - Fork 10
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
1 parent
4d48db9
commit fa4ef11
Showing
1 changed file
with
218 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,218 @@ | ||
{ | ||
"nbformat": 4, | ||
"nbformat_minor": 0, | ||
"metadata": { | ||
"colab": { | ||
"provenance": [], | ||
"collapsed_sections": [ | ||
"7874DDJAneOV" | ||
] | ||
}, | ||
"kernelspec": { | ||
"name": "python3", | ||
"display_name": "Python 3" | ||
}, | ||
"language_info": { | ||
"name": "python" | ||
} | ||
}, | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"source": [ | ||
"# Automatic Derivation of Semantic Representations for Thai Serial Verb Constructions: A Grammar-Based Approach\n", | ||
"\n", | ||
"Vipasha Bansal\n", | ||
"\n", | ||
"ACL2024 SRW\n", | ||
"\n", | ||
"Abstract:\n", | ||
"\n", | ||
"> Deep semantic representations are useful for many NLU tasks (Droganova and Zeman, 2019; Schuster and Manning, 2016). Manual annotation to build these representations is time-consuming, and so automatic approaches are preferred (Droganova and Zeman, 2019; Bender et al. 2015). This paper demonstrates how rich semantic representations can be automatically derived for Thai Serial Verb Constructions (SVCs), where the semantic relationship between component verbs is not immediately clear from the surface forms. I present the first fully-implemented, unified analysis for Thai SVCs, deriving appropriate semantic representations (MRS; Copestake et al. 2005) from syntactic features, implemented within a DELPH-IN computational grammar (Slayden 2009). This analysis increases verified coverage of SVCs by 73% and decreases ambiguity by 46%.\n", | ||
"\n", | ||
"GitHub: https://github.com/VipashaB94/ThaiGrammar\n", | ||
"\n", | ||
"Paper: [Wait]\n", | ||
"\n", | ||
"\n", | ||
"\n", | ||
"---\n", | ||
"\n", | ||
"\n", | ||
"\n", | ||
"This notebook will guide you in running Thai semantic representations from the work.\n", | ||
"\n", | ||
"The notebook created by Wannaphong Phatthiyaphaibun, PyThaiNLP." | ||
], | ||
"metadata": { | ||
"id": "tG27M8_sn0xy" | ||
} | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"source": [ | ||
"## Install\n", | ||
"\n", | ||
"Get latest ACE from http://sweaglesw.org/linguistics/ace/" | ||
], | ||
"metadata": { | ||
"id": "7874DDJAneOV" | ||
} | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 1, | ||
"metadata": { | ||
"colab": { | ||
"base_uri": "https://localhost:8080/" | ||
}, | ||
"id": "N4yiJEb-ms6w", | ||
"outputId": "261e926a-eacd-4649-cbee-f9e61476179a" | ||
}, | ||
"outputs": [ | ||
{ | ||
"output_type": "stream", | ||
"name": "stdout", | ||
"text": [ | ||
"--2024-08-12 12:50:35-- http://sweaglesw.org/linguistics/ace/download/ace-0.9.34-x86-64.tar.gz\n", | ||
"Resolving sweaglesw.org (sweaglesw.org)... 216.129.123.154, 2001:1868:a100:105:beae:c5ff:fe24:d767\n", | ||
"Connecting to sweaglesw.org (sweaglesw.org)|216.129.123.154|:80... connected.\n", | ||
"HTTP request sent, awaiting response... 200 OK\n", | ||
"Length: 2526613 (2.4M) [application/x-gzip]\n", | ||
"Saving to: ‘ace-0.9.34-x86-64.tar.gz’\n", | ||
"\n", | ||
"ace-0.9.34-x86-64.t 100%[===================>] 2.41M 4.37MB/s in 0.6s \n", | ||
"\n", | ||
"2024-08-12 12:50:36 (4.37 MB/s) - ‘ace-0.9.34-x86-64.tar.gz’ saved [2526613/2526613]\n", | ||
"\n", | ||
"ace-0.9.34/\n", | ||
"ace-0.9.34/LICENSE\n", | ||
"ace-0.9.34/post/\n", | ||
"ace-0.9.34/post/english-postagger.hmm\n", | ||
"ace-0.9.34/erg-files/\n", | ||
"ace-0.9.34/erg-files/config.tdl\n", | ||
"ace-0.9.34/erg-files/ace-erg-qc.txt\n", | ||
"ace-0.9.34/RELEASE-NOTES\n", | ||
"ace-0.9.34/ace\n", | ||
"ace-0.9.34/doc/\n", | ||
"ace-0.9.34/doc/config.wiki\n", | ||
"ace-0.9.34/doc/options.wiki\n", | ||
"Cloning into 'ThaiGrammar'...\n", | ||
"remote: Enumerating objects: 199, done.\u001b[K\n", | ||
"remote: Counting objects: 100% (199/199), done.\u001b[K\n", | ||
"remote: Compressing objects: 100% (160/160), done.\u001b[K\n", | ||
"remote: Total 199 (delta 66), reused 0 (delta 0), pack-reused 0\u001b[K\n", | ||
"Receiving objects: 100% (199/199), 1.39 MiB | 2.53 MiB/s, done.\n", | ||
"Resolving deltas: 100% (66/66), done.\n", | ||
" Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n", | ||
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m186.8/186.8 kB\u001b[0m \u001b[31m2.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", | ||
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m43.4/43.4 kB\u001b[0m \u001b[31m1.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", | ||
"\u001b[?25h Building wheel for progress (setup.py) ... \u001b[?25l\u001b[?25hdone\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"!wget http://sweaglesw.org/linguistics/ace/download/ace-0.9.34-x86-64.tar.gz\n", | ||
"!mkdir run_ace\n", | ||
"!tar -xvzf ace-0.9.34-x86-64.tar.gz -C run_ace\n", | ||
"!git clone https://github.com/VipashaB94/ThaiGrammar.git\n", | ||
"!pip install -q pydelphin" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"source": [ | ||
"## Usage\n", | ||
"\n", | ||
"We use ACE for delphin." | ||
], | ||
"metadata": { | ||
"id": "aXjtb9ftnsF1" | ||
} | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"source": [ | ||
"from delphin import ace" | ||
], | ||
"metadata": { | ||
"id": "fvE8146VnJKP" | ||
}, | ||
"execution_count": 2, | ||
"outputs": [] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"source": [ | ||
"ace.compile('./ThaiGrammar/thaigrammar/ace/config.tdl', 'thai.dat',executable=\"./run_ace/ace-0.9.34/ace\")" | ||
], | ||
"metadata": { | ||
"id": "-6opwCGVnTYH" | ||
}, | ||
"execution_count": 3, | ||
"outputs": [] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"source": [ | ||
"response = ace.parse('thai.dat', 'สุรี ไป ซื้อ หนังสือ',executable=\"./run_ace/ace-0.9.34/ace\")\n", | ||
"response['results']" | ||
], | ||
"metadata": { | ||
"colab": { | ||
"base_uri": "https://localhost:8080/" | ||
}, | ||
"id": "wlEN8Q-jnWQN", | ||
"outputId": "ff7c4502-df3a-4f5c-fab0-23a745d87b85" | ||
}, | ||
"execution_count": 4, | ||
"outputs": [ | ||
{ | ||
"output_type": "execute_result", | ||
"data": { | ||
"text/plain": [ | ||
"[{'result-id': 0,\n", | ||
" 'derivation': '(328 subj-head 0.000000 0 4 (322 bare-np 0.000000 0 1 (5 สุรี_33142 0.000000 0 1 (\"สุรี\"))) (327 head-comp 0.000000 1 4 (324 drop-obj 0.000000 1 2 (323 deic-purpose-trans-svc-lex 0.000000 1 2 (6 ไป_4158 0.000000 1 2 (\"ไป\")))) (326 head-comp 0.000000 2 4 (7 ซื้อ_4236 0.000000 2 3 (\"ซื้อ\")) (325 bare-np 0.000000 3 4 (8 หนังสือ_4404 0.000000 3 4 (\"หนังสือ\"))))))',\n", | ||
" 'mrs': '[ LTOP: h0 INDEX: e2 [ e SF: prop ] RELS: < [ named_rel<-1:-1> LBL: h4 CARG: \"สุรี\" ARG0: x3 ] [ \"exist_q_rel\"<-1:-1> LBL: h6 ARG0: x3 RSTR: h7 BODY: h8 ] [ \"_go_v_1_rel\"<-1:-1> LBL: h1 ARG0: e9 ARG1: x3 ARG2: x10 [ x COG-ST: type-id ] ] [ \"purpose_rel\"<-1:-1> LBL: h1 ARG0: e2 ARG1: e9 ARG2: e11 ] [ \"_buy_v_1_rel\"<-1:-1> LBL: h1 ARG0: e11 ARG1: x3 ARG2: x12 [ x PERS: 3 ] ] [ \"_book_n_1_rel\"<-1:-1> LBL: h13 ARG0: x12 ] [ \"exist_q_rel\"<-1:-1> LBL: h14 ARG0: x12 RSTR: h15 BODY: h16 ] > HCONS: < h0 qeq h1 h7 qeq h4 > ICONS: < > ]',\n", | ||
" 'tree': '(\"S\" (\"NP\" (\"N\" (\"สุรี\"))) (\"VP\" (\"V\" (\"V-M\" (\"V\" (\"ไป\")))) (\"VP\" (\"V\" (\"ซื้อ\")) (\"NP\" (\"N\" (\"หนังสือ\"))))))',\n", | ||
" 'flags': [(':ascore', 0.0), (':probability', 1.0)]}]" | ||
] | ||
}, | ||
"metadata": {}, | ||
"execution_count": 4 | ||
} | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"source": [ | ||
"response = ace.parse('thai.dat', 'ผม จะ เป็น คน ดี',executable=\"./run_ace/ace-0.9.34/ace\")\n", | ||
"response['results']" | ||
], | ||
"metadata": { | ||
"colab": { | ||
"base_uri": "https://localhost:8080/" | ||
}, | ||
"id": "BD8ZQezQnbCM", | ||
"outputId": "a521789f-8e8a-419b-be14-6877ec1d1677" | ||
}, | ||
"execution_count": 5, | ||
"outputs": [ | ||
{ | ||
"output_type": "execute_result", | ||
"data": { | ||
"text/plain": [ | ||
"[{'result-id': 0,\n", | ||
" 'derivation': '(603 subj-head 0.000000 0 5 (598 bare-np 0.000000 0 1 (6 ผม_4375 0.000000 0 1 (\"ผม\"))) (602 head-comp 0.000000 1 5 (7 จะ_33089 0.000000 1 2 (\"จะ\")) (601 head-comp 0.000000 2 5 (8 เป็น_33088 0.000000 2 3 (\"เป็น\")) (600 bare-np 0.000000 3 5 (599 head-adj-int 0.000000 3 5 (12 คน_4133 0.000000 3 4 (\"คน\")) (13 ดี_4290 0.000000 4 5 (\"ดี\")))))))',\n", | ||
" 'mrs': '[ LTOP: h0 INDEX: e2 [ e TENSE: fut SF: prop ] RELS: < [ \"pron_rel\"<-1:-1> LBL: h4 ARG0: x3 [ x PERS: 1 NUM: sg GEND: m SPECI: + ] ] [ \"exist_q_rel\"<-1:-1> LBL: h5 ARG0: x3 RSTR: h6 BODY: h7 ] [ \"_be_v_id_rel\"<-1:-1> LBL: h1 ARG0: e2 ARG1: x3 ARG2: x8 [ x PERS: 3 ] ] [ \"_person_n_1_rel\"<-1:-1> LBL: h9 ARG0: x8 ] [ \"_good_a_1_rel\"<-1:-1> LBL: h9 ARG0: e10 ARG1: x8 ] [ \"exist_q_rel\"<-1:-1> LBL: h11 ARG0: x8 RSTR: h12 BODY: h13 ] > HCONS: < h0 qeq h1 h6 qeq h4 h12 qeq h9 > ICONS: < > ]',\n", | ||
" 'tree': '(\"S\" (\"NP\" (\"N\" (\"ผม\"))) (\"VP\" (\"V\" (\"จะ\")) (\"VP\" (\"V\" (\"เป็น\")) (\"NP\" (\"N\" (\"N\" (\"คน\")) (\"ADJ\" (\"ดี\")))))))',\n", | ||
" 'flags': [(':ascore', 0.0), (':probability', 1.0)]}]" | ||
] | ||
}, | ||
"metadata": {}, | ||
"execution_count": 5 | ||
} | ||
] | ||
} | ||
] | ||
} |