
Commit 2c15b90

Committed Oct 1, 2022
Greater independence from the commons app (CommonSpaces)
1 parent 0c81c8c commit 2c15b90

File tree

7 files changed: +62 -30 lines changed

README.md (+24 -15)

@@ -3,7 +3,7 @@ Text-analysis support for *Django* clients, talking through HTTP API to an exten
 
 This is a **Django** app implementing a repertoire of **Text Analysis** functions with general objectives of linguistic education, to be used in the context of both L1 and L2, by learners and teachers and by editors of text resources.
 
-*commons-textanalysis* currently supports, in quite a similar way, **8 European languages**: English, Italian, Spanish, Greek, French, Portuguese, Croatian and Lithuanian.
+*commons-textanalysis* currently supports, in quite a similar way, **8 European languages**: English, Italian, Spanish, Greek, French, Croatian, Danish and Lithuanian.
 Since this largely depends on the availability of *spaCy* (statistical) *language models* and on other *language resources* needed by different text analysis methods, support of additional languages is expected to be added as those resources become available. This, in turn, will depend on the interest shown by users and contributors.
 
 ***Origin***
@@ -27,24 +27,33 @@ At present, CommonSpaces hosts also a few *mini-sites* dedicated to the communit
 
 It also includes many **language resources**, mostly concerning specific languages, available as *open data*. The role of these resources is to make the analysis methods, and often even the algorithms, able to work in a very similar way for different languages.
 
-**Functionality**
+***Functionality***
 
 Currently the following output views are implemented:
-1. <u>Keywords in Context</u> thanks to the exploitation of a function of **tmtoolkit** in *commons-language*;
-2. <u>ord lists by POS</u>; sorted lists are produced based on lexical resources concerning word frequencies and/or *CEFR* vocabulary levels;
-3. <u>Annotated text<u>, interactively showing individual attributes of the text *tokens* and the result of *Named Entity Recognition* (NER); at present it reuses some code of **NlpBuddy**;
-4. <u>Noun chunks</u> this comes directly from *spaCy*;
-5. <u>Text readability</u> this is a provisional view putting together some raw (shallow) text features - mainly counts and means -, lexical features and syntactic features, with the results of classical *readability formulas* also based on raw text features;
-6. <u>Text Cohesion</u> this view puts together *text coherence* scores computed with the *entity graph method* (Guinaudeau and Strube), as implemented by **TRUNAJOD**, with *local cohesion* scores based on the lexicon shared among contiguous paragraphs (visual detail is provided) and on *similarity scores* coming directly from spaCy;
-7. <u>Text Summarization</u> this is the result of a very simple extractive algorithm;
-8. <u>Text Analysis Dashboard</u> this is a tentative view putting together some results from 2, 3, 5 and 7; it also includes a sophisticated visualisation of the text structure derived with *dependency parsing*.
-
-**Plans**
-
-This package is work in progress; main activities planned are:
+1. *Keywords in Context*; thanks to the exploitation of a function of **tmtoolkit** in *commons-language*;
+2. *Word lists by POS*; sorted lists are produced based on lexical resources concerning word frequencies and/or *CEFR* vocabulary levels;
+3. *Annotated text*; interactively shows individual attributes of the text *tokens* and the result of *Named Entity Recognition* (NER); at present it reuses some code of **NlpBuddy**;
+4. *Noun chunks*; this comes directly from *spaCy*;
+5. *Text readability*; this is a provisional view putting together some raw (shallow) text features - mainly counts and means -, lexical features and syntactic features, with the results of classical *readability formulas* also based on raw text features;
+6. *Text Cohesion*; this view puts together *text coherence* scores computed with the *entity graph method* (Guinaudeau and Strube), as implemented by **TRUNAJOD**, with *local cohesion* scores based on the lexicon shared among contiguous paragraphs (visual detail is provided) and on *similarity scores* coming directly from spaCy;
+7. *Text Summarization*; this is the result of a very simple extractive algorithm;
+8. *Text Analysis Dashboard*; this is a tentative view putting together some results from 2, 3, 5 and 7; it also includes a sophisticated visualisation of the text structure derived with *dependency parsing*.
+
+***Interfaces***
+
+There are 3 levels of interfaces in *commons-textanalysis*, in correspondence with the components of its architecture:
+- an **upward** interface utilizing the generic HTTP API exposed by the *commons-language* service;
+- the **interactive** (user) interface for selecting a text analysis function and executing it on the text inserted in the *input box* (possibly by *copy-and-paste*), or specified through a URL;
+- a **downward** interface exposing an application-level API through a list of *url patterns*, for the convenience of other applications, such as the collaborative learning platform *CommonSpaces*.
+
+Moreover, *commons-textanalysis* acts as a *pass-through* for a set of functions provided by *commons-language*, aimed at building and exploiting **corpora** of texts; here the term *corpus* is strictly related to the *DocBin* object type in *spaCy*, which "lets you efficiently serialize the information from a collection of *Doc* objects". Currently, said functionality isn't available interactively through *commons-textanalysis*, but is exploited only by *CommonSpaces*.
+
+***Plans***
+
+This package is *work in progress*; main activities planned are:
 - complete the restructuring of the software stack, in order to make *commons-textanalysis* completely independent from the software of the *Commons Platform*, of which it was originally part;
 - document the API;
-- retrieve language resources allowing to enable additional languages;
+- retrieve and adapt language resources allowing to enable additional languages;
 - improve and extend the current functionality;
 - reorganize the output views to improve their usability;
 - clean up the code, also to make possible contributions easier.
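The *DocBin*-based corpus handling mentioned in the README can be illustrated with a short round-trip sketch. This is not code from this repository; it assumes only the spaCy library itself and uses a blank English pipeline, so no trained model needs to be downloaded:

```python
# Sketch of the spaCy DocBin round-trip underlying the "corpora" pass-through
# functions; illustrative only, not code from this repository.
import spacy
from spacy.tokens import DocBin

nlp = spacy.blank("en")  # tokenizer-only pipeline; no trained model required

doc_bin = DocBin(store_user_data=True)
for text in ["First document.", "Second document."]:
    doc_bin.add(nlp(text))

# Serialize the whole collection to a single bytes blob ...
data = doc_bin.to_bytes()

# ... and restore the Doc objects against the same vocabulary.
restored = list(DocBin().from_bytes(data).get_docs(nlp.vocab))
print([doc.text for doc in restored])
```

In a client/server setup such as *commons-language* plus *commons-textanalysis*, the `to_bytes()` blob is what travels over the wire or sits in storage, which is why the README ties the notion of *corpus* to this object type.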

textanalysis/templates/text_cohesion.html (+1 -1)

@@ -160,7 +160,7 @@ <h3 class="text-center">{% trans "Text Analysis" %} - {% trans "Text cohesion" %
   $this = this;
   if (! this['obj_id'])
     this['obj_id'] = 0;
-  fetch('/text_cohesion/'{% if file_key %}+this['file_key']+'/'{% endif %}{% if obj_type %}+this['obj_type']+'/'+this['obj_id']+'/'{% endif %}, {
+  fetch('/textanalysis/cohesion/'{% if file_key %}+this['file_key']+'/'{% endif %}{% if obj_type %}+this['obj_type']+'/'+this['obj_id']+'/'{% endif %}, {
     method: 'GET',
     headers: {"X-Requested-With": "XMLHttpRequest"}, // this is just a patch
   })

textanalysis/templates/text_dashboard.html (+2 -2)

@@ -332,9 +332,9 @@ <h3 style="color: red; text-align: center; margin: 10px;">{{ error }}</h3>
 mounted: function () {
   $this = this;
   if (`{{ file_key }}`)
-    url = `/text_dashboard/{{ file_key }}/`;
+    url = `/textanalysis/text_dashboard/{{ file_key }}/`;
   else
-    url = `/text_dashboard/{{ obj_type }}/`+encodeURIComponent(`{{ obj_id }}`);
+    url = `/textanalysis/text_dashboard/{{ obj_type }}/`+encodeURIComponent(`{{ obj_id }}`);
   $.ajax({
     type:'GET',
     url: url,

textanalysis/templates/text_wordlists.html (+1 -1)

@@ -167,7 +167,7 @@ <h4 style="color: red; text-align: center; margin: 10px;">{{ error }}</h4>
   $this = this;
   if (! this['obj_id'])
     this['obj_id'] = 0;
-  fetch('/text_wordlists/'{% if file_key %}+this['file_key']+'/'{% endif %}{% if obj_type %}+this['obj_type']+'/'+this['obj_id']+'/'{% endif %}, {
+  fetch('/textanalysis/wordlists/'{% if file_key %}+this['file_key']+'/'{% endif %}{% if obj_type %}+this['obj_type']+'/'+this['obj_id']+'/'{% endif %}, {
     method: 'GET',
     headers: {"X-Requested-With": "XMLHttpRequest"}, // this is just a patch
   })

textanalysis/urls.py (new file, +25)

@@ -0,0 +1,25 @@
+from django.conf.urls import url
+
+from textanalysis import views
+
+urlpatterns = [
+    url(r"^ta_input/$", views.ta_input, name="ta_input"),
+    # wordlists and cohesion must be before parametric urls
+    url(r"^wordlists/$", views.text_wordlists, name="text_wordlists_0"),
+    url(r"^wordlists/(?P<file_key>[\w\d-]+)/$", views.text_wordlists, name="text_wordlists_1"),
+    url(r"^wordlists/(?P<file_key>[\w\d-]+)/(?P<obj_type>[\w\.-]+)/(?P<obj_id>[\d-]+)/$", views.text_wordlists, name="text_wordlists_3"),
+    url(r"^wordlists/(?P<obj_type>[\w\.-]+)/(?P<obj_id>[\d-]+)/$", views.text_wordlists, name="text_wordlists_2"),
+    url(r"^cohesion/$", views.text_cohesion, name="text_cohesion_0"),
+    url(r"^cohesion/(?P<file_key>[\w\d-]+)/$", views.text_cohesion, name="text_cohesion_1"),
+    url(r"^cohesion/(?P<file_key>[\w\d-]+)/(?P<obj_type>[\w\.-]+)/(?P<obj_id>[\d-]+)/$", views.text_cohesion, name="text_cohesion_3"),
+    url(r"^cohesion/(?P<obj_type>[\w\.-]+)/(?P<obj_id>[\d-]+)/$", views.text_cohesion, name="text_cohesion_2"),
+    url(r"^text_dashboard/(?P<file_key>[\w\.-]+)/$", views.text_dashboard, name="text_dashboard_1"),
+    url(r"^text_dashboard/(?P<obj_type>[\w\.-]+)/(?P<obj_id>[\d-]+)/$", views.text_dashboard, name="text_dashboard"),
+    url(r"^text_dashboard/(?P<obj_type>[\w\.-]+)/(?P<obj_id>[\d]+)$", views.text_dashboard, name="text_dashboard_unterminated"),
+    url(r"^text_dashboard/(?P<obj_type>[\w\.-]+)/(?P<obj_id>.+)$", views.text_dashboard, name="text_dashboard_by_url"),
+    url(r"^text_dashboard/(?P<obj_type>[\w\.-]+)/(?P<obj_id>[\w\d-]+)/$", views.text_dashboard, name="text_dashboard_by_url"),
+    #
+    url(r"^(?P<function>[\w\.-]+)/(?P<file_key>[\w\.-]+)/$", views.ta, name="text_analysis_1"),
+    url(r"^(?P<function>[\w\.-]+)/(?P<file_key>[\w\.-]+)/(?P<obj_type>[\w\.-]+)/(?P<obj_id>[\d-]+)/$", views.ta, name="text_analysis_3"),
+    url(r"^(?P<function>[\w\.-]+)/(?P<obj_type>[\w\.-]+)/(?P<obj_id>[\d-]+)/$", views.ta, name="text_analysis_2"),
+]
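The comment "wordlists and cohesion must be before parametric urls" matters because Django's resolver walks `urlpatterns` in order and dispatches to the first regex that matches; a path like `wordlists/abc123/` would otherwise be captured by the generic `^(?P<function>...)/(?P<file_key>...)/$` pattern and routed to `views.ta`. A stdlib-only sketch of that first-match-wins behavior (pattern strings mirror urls.py, but the resolver itself is illustrative, not Django's):

```python
import re

# Ordered (pattern, view-name) pairs mirroring the structure of urls.py;
# like Django's resolver, we walk the list top-down and stop at the first match.
urlpatterns = [
    (r"^wordlists/(?P<file_key>[\w\d-]+)/$", "text_wordlists_1"),
    (r"^(?P<function>[\w\.-]+)/(?P<file_key>[\w\.-]+)/$", "text_analysis_1"),
]

def resolve(path):
    for pattern, name in urlpatterns:
        match = re.match(pattern, path)
        if match:
            return name, match.groupdict()
    raise ValueError("no match")

# With the wordlists pattern first, the specific view wins:
print(resolve("wordlists/abc123/"))  # ('text_wordlists_1', {'file_key': 'abc123'})
```

Reversing the list order would route the same path to the generic `ta` view with `function='wordlists'`, which is exactly what the ordering comment guards against.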

textanalysis/views.py (+9 -11)

@@ -852,6 +852,7 @@ def ajax_delete_corpus(request):
 @csrf_exempt
 def text_wordlists(request, file_key='', obj_type='', obj_id=''):
     var_dict = {'file_key': file_key, 'obj_type': obj_type, 'obj_id': obj_id}
+    var_dict['VUE'] = True
     # if request.is_ajax():
     if is_ajax(request):
         keys = ['verb_frequencies', 'noun_frequencies', 'adjective_frequencies', 'adverb_frequencies',
@@ -862,7 +863,6 @@ def text_wordlists(request, file_key='', obj_type='', obj_id=''):
         data.update([[key, dashboard_dict[key]] for key in keys])
         return JsonResponse(data)
     else:
-        # return render(request, 'vue/text_wordlists.html', var_dict)
         return render(request, 'text_wordlists.html', var_dict)
 
 """
@@ -921,6 +921,7 @@ def text_annotations(request, params):
 @csrf_exempt
 def text_cohesion(request, file_key='', obj_type='', obj_id=''):
     var_dict = {'file_key': file_key, 'obj_type': obj_type, 'obj_id': obj_id}
+    var_dict['VUE'] = True
     if is_ajax(request):
         keys = ['paragraphs', 'repeated_lemmas',
                 'cohesion_by_entity_graph', 'cohesion_by_repetitions', 'cohesion_by_similarity',
@@ -1031,7 +1032,8 @@ def text_readability(request, params):
         var_dict['readability_indexes']['gagatsis_1985'] = index
     return render(request, 'text_readability.html', var_dict)
 
-def text_analysis_input(request):
+# def text_analysis_input(request):
+def ta_input(request):
     var_dict = {}
     if request.POST:
         form = TextAnalysisInputForm(request.POST)
@@ -1045,8 +1047,8 @@ def text_analysis_input(request):
             # return render(request, 'vue/text_dashboard.html', var_dict)
             return render(request, 'text_dashboard.html', var_dict)
         else:
-            # return text_analyze(request, function, 'text', 0)
-            return text_analyze(request, function, obj_type='text', obj_id=0)
+            # return text_analyze(request, function, obj_type='text', obj_id=0)
+            return ta(request, function, obj_type='text', obj_id=0)
     else:
         # do not present the input form if the language server is down
         endpoint = nlp_url + '/api/configuration'
@@ -1061,10 +1063,10 @@ def text_analysis_input(request):
         var_dict['form'] = form
     else:
         var_dict['error'] = off_error
-    return render(request, 'text_analysis_input.html', var_dict)
+    return render(request, 'ta_input.html', var_dict)
 
-# def text_analyze(request, function, obj_type, obj_id, file_key='', text=''):
-def text_analyze(request, function, obj_type='', obj_id='', file_key='', text=''):
+# def text_analyze(request, function, obj_type='', obj_id='', file_key='', text=''):
+def ta(request, function, obj_type='', obj_id='', file_key='', text=''):
     var_dict = { 'obj_type': obj_type, 'obj_id': obj_id, 'file_key': file_key, 'title': '' }
     if file_key:
         if obj_type == 'corpus':
@@ -1074,11 +1076,9 @@ def text_analyze(request, function, obj_type='', obj_id='', file_key='', text=''
     if obj_type == 'text':
         var_dict['obj_id'] = 0
     if function == 'dashboard':
-        # return render(request, 'vue/text_dashboard.html', var_dict)
         return render(request, 'text_dashboard.html', var_dict)
     elif function == 'context':
         var_dict['VUE'] = True
-        # return render(request, 'vue/context_dashboard.html', var_dict)
         return render(request, 'context_dashboard.html', var_dict)
     elif function == 'annotations':
         var_dict['VUE'] = True
@@ -1091,12 +1091,10 @@ def text_analyze(request, function, obj_type='', obj_id='', file_key='', text=''
         return text_readability(request, params=var_dict)
     elif function == 'cohesion':
         var_dict['VUE'] = True
-        # return text_cohesion(request, params=var_dict)
        return render(request, 'text_cohesion.html', var_dict)
     elif function == 'nounchunks':
         var_dict['VUE'] = True
         return text_nounchunks(request, params=var_dict)
     elif function == 'wordlists':
         var_dict['VUE'] = True
-        # return render(request, 'vue/text_wordlists.html', var_dict)
         return render(request, 'text_wordlists.html', var_dict)
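The views above call a module-level `is_ajax(request)` helper in place of the commented-out `request.is_ajax()`, which was deprecated in Django 3.1 and removed in 4.0. The helper's body is not shown in this diff; a plausible minimal stand-in (an assumption, mirroring what the removed method did) checks the same `X-Requested-With` header the templates send with their `fetch` calls:

```python
# Hypothetical stand-in for the is_ajax() helper referenced in views.py; the
# repository's real helper may differ. Django's removed HttpRequest.is_ajax()
# performed exactly this header comparison.
def is_ajax(request):
    return request.headers.get("x-requested-with") == "XMLHttpRequest"

# Minimal fake request object so the helper can be exercised without Django;
# lowercasing the keys mimics the case-insensitive lookup of request.headers.
class FakeRequest:
    def __init__(self, headers=None):
        self.headers = {k.lower(): v for k, v in (headers or {}).items()}

print(is_ajax(FakeRequest({"X-Requested-With": "XMLHttpRequest"})))  # True
print(is_ajax(FakeRequest()))  # False
```

This also explains the `// this is just a patch` comment in the templates: the header is set explicitly on each `fetch` request precisely so this server-side check keeps working.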
