Greater independence from the commons app (CommonSpaces)

gtoffoli · gtoffoli · commit 2c15b90b0c27 · 2022-10-01T17:16:40.000+02:00
diff --git a/README.md b/README.md
@@ -3,7 +3,7 @@ Text-analysis support for *Django* clients, talking through HTTP API to an exten
 
 This is a **Django** app implementing a repertoire of **Text Analysis** functions with general objectives of linguistic education, to be used in the context of both L1 and L2, by learners and teachers and by editors of text resources.
 
-*commons-textanalysis* currently supports, quite in similar way, **8 European languages**: English, Italian, Spanish, Greek, French, Portuguese, Croatian and Lithuanian.
+*commons-textanalysis* currently supports, in quite a similar way, **8 European languages**: English, Italian, Spanish, Greek, French, Croatian, Danish and Lithuanian.
 Since this largely depends on the availability of *spaCy* (statistical) *language models* and on other *language resources* needed by different text analysis methods, support for additional languages is expected to be added as those resources become available. This, in turn, will depend on the interest shown by users and contributors.
 
 ***Origin***
@@ -27,24 +27,33 @@ At present, CommonSpaces hosts also a few *mini-sites* dedicated to the communit
 
 It also includes many **language resources**, mostly concerning specific languages, being available as *open data*. The role of these resources is to make the analysis methods, and often even the algorithms, able to work in a very similar way for different languages.
 
-**Functionality**
+***Functionality***
 
 Currently the following output views are implemented:
-1. <u>Keywords in Context</u> thanks to the exploitation of a function of **tmtoolkit** in *commons-language*;
-2. <u>ord lists by POS</u>; sorted lists are produced based on lexical resources concerning word frequencies and/or *CEFR* vocabulary levels;
-3. <u>Annotated text<u>, interactively showing individual attributes of the text *tokens* and the result of *Named Entity Recognition* (NER); at present it reuses some code of **NlpBuddy**;
-4. <u>Noun chunks</u> this comes directly from *spaCy*;
-5. <u>Text readability</u> this is a provisional view putting together some raw (shallow) text features - mainly counts and means -, lexical features and syntactic features, with the results of classical *readability formulas* also based on raw text features;
-6. <u>Text Cohesion</u> this view puts together *text coherence* scores computed with the *entity graph method* (Guinaudeau and Strube), as implemented by **TRUNAJOD**, with *local cohesion* scores based on the lexicon shared among contiguous paragraphs (visual detail is provided) and on *similarity scores* coming directly from spaCy;
-7. <u>Text Summarization</u> this is the result of a very simple extractive algorithm;
-8. <u>Text Analysis Dashboard</u> this is a tentative view putting together some results from 2, 3, 5 and 7; it also includes a sophisticated visualisation of the text structure derived with *dependency parsing*.
-
-**Plans**
-
-This package is work in progress; main activities planned are:
+1. *Keywords in Context*; thanks to the exploitation of a function of **tmtoolkit** in *commons-language*;
+2. *Word lists by POS*; sorted lists are produced based on lexical resources concerning word frequencies and/or *CEFR* vocabulary levels;
+3. *Annotated text*; interactively shows individual attributes of the text *tokens* and the result of *Named Entity Recognition* (NER); at present it reuses some code of **NlpBuddy**;
+4. *Noun chunks*; this comes directly from *spaCy*;
+5. *Text readability*; this is a provisional view putting together some raw (shallow) text features - mainly counts and means -, lexical features and syntactic features, with the results of classical *readability formulas* also based on raw text features;
+6. *Text Cohesion*; this view puts together *text coherence* scores computed with the *entity graph method* (Guinaudeau and Strube), as implemented by **TRUNAJOD**, with *local cohesion* scores based on the lexicon shared among contiguous paragraphs (visual detail is provided) and on *similarity scores* coming directly from spaCy;
+7. *Text Summarization*; this is the result of a very simple extractive algorithm;
+8. *Text Analysis Dashboard*; this is a tentative view putting together some results from 2, 3, 5 and 7; it also includes a sophisticated visualisation of the text structure derived with *dependency parsing*.
+
+***Interfaces***
+
+There are 3 levels of interfaces in *commons-textanalysis*, in correspondence with the components of its architecture:
+- an **upward** interface utilizing the generic HTTP API exposed by the *commons-language service*;
+- the **interactive** (user) interface for selecting a text analysis function and executing it on the text inserted in the *input box* (possibly by *copy-and-paste*), or specified through a URL;
+- a **downward** interface exposing application-level API through a list of *url patterns*, for the convenience of other applications, such as the collaborative learning platform *CommonSpaces*.
+
+Moreover, *commons-textanalysis* acts as a *pass-through* for a set of functions provided by *commons-language*, aimed at building and exploiting **corpora** of texts; here the term *corpus* is strictly related to the *DocBin* object type in *spaCy*, which "lets you efficiently serialize the information from a collection of *Doc* objects". Currently, this functionality isn't available interactively through *commons-textanalysis*; it is exploited only by *CommonSpaces*.
+
+***Plans***
+
+This package is *work in progress*; main activities planned are:
 - complete the restructuring of the software stack, in order to make *commons-textanalysis* completely independent from the software of the *Commons Platform*, of which it was originally part;
 - document the API;
-- retrieve language resources allowing to enable additional languages;
+- retrieve and adapt the language resources needed to enable additional languages;
 - improve and extend the current functionality;
 - reorganize the output views to improve their usability;
 - clean up the code, also to make contributions easier.
diff --git a/textanalysis/templates/ta_input.html b/textanalysis/templates/ta_input.html
diff --git a/textanalysis/templates/text_cohesion.html b/textanalysis/templates/text_cohesion.html
@@ -160,7 +160,7 @@ <h3 class="text-center">{% trans "Text Analysis" %} - {% trans "Text cohesion" %
           $this = this;
           if (! this['obj_id'])
             this['obj_id'] = 0;
-            fetch('/text_cohesion/'{% if file_key %}+this['file_key']+'/'{% endif %}{% if obj_type %}+this['obj_type']+'/'+this['obj_id']+'/'{% endif %}, {
+            fetch('/textanalysis/cohesion/'{% if file_key %}+this['file_key']+'/'{% endif %}{% if obj_type %}+this['obj_type']+'/'+this['obj_id']+'/'{% endif %}, {
             method: 'GET',
             headers: {"X-Requested-With": "XMLHttpRequest"}, // this is just a patch
           })
diff --git a/textanalysis/templates/text_dashboard.html b/textanalysis/templates/text_dashboard.html
@@ -332,9 +332,9 @@ <h3 style="color: red; text-align: center; margin: 10px;">{{ error }}</h3>
         mounted: function () {
           $this = this;
           if (`{{ file_key }}`)
-            url = `/text_dashboard/{{ file_key }}/`;
+            url = `/textanalysis/text_dashboard/{{ file_key }}/`;
           else
-            url = `/text_dashboard/{{ obj_type }}/`+encodeURIComponent(`{{ obj_id }}`);
+            url = `/textanalysis/text_dashboard/{{ obj_type }}/`+encodeURIComponent(`{{ obj_id }}`);
 		  $.ajax({
 	        type:'GET',
 	        url: url,
diff --git a/textanalysis/templates/text_wordlists.html b/textanalysis/templates/text_wordlists.html
@@ -167,7 +167,7 @@ <h4 style="color: red; text-align: center; margin: 10px;">{{ error }}</h4>
           $this = this;
           if (! this['obj_id'])
             this['obj_id'] = 0;
-          fetch('/text_wordlists/'{% if file_key %}+this['file_key']+'/'{% endif %}{% if obj_type %}+this['obj_type']+'/'+this['obj_id']+'/'{% endif %}, {
+          fetch('/textanalysis/wordlists/'{% if file_key %}+this['file_key']+'/'{% endif %}{% if obj_type %}+this['obj_type']+'/'+this['obj_id']+'/'{% endif %}, {
             method: 'GET',
             headers: {"X-Requested-With": "XMLHttpRequest"}, // this is just a patch
           })
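Review note: the two templates above build their `fetch` URL by appending an optional `file_key` segment and then an optional `obj_type`/`obj_id` pair to the new `/textanalysis/` prefix. A minimal Python sketch of the same composition, useful for a client other than the bundled Vue pages, could look like the following. The helper names and the host `http://localhost:8000` are illustrative assumptions and are not part of this commit; the request is only built, not sent.

```python
from urllib.request import Request


def build_analysis_url(function, file_key="", obj_type="", obj_id=0,
                       prefix="/textanalysis/"):
    """Compose the path as the templates do: optional file_key,
    then optional obj_type/obj_id, each followed by a slash."""
    url = f"{prefix}{function}/"
    if file_key:
        url += f"{file_key}/"
    if obj_type:
        url += f"{obj_type}/{obj_id}/"
    return url


def build_ajax_request(base_url, function, **kwargs):
    """Return an unsent GET request carrying the X-Requested-With header
    that the templates add to their fetch calls."""
    url = base_url.rstrip("/") + build_analysis_url(function, **kwargs)
    return Request(url, headers={"X-Requested-With": "XMLHttpRequest"},
                   method="GET")


# Hypothetical host, used only to show the resulting URL.
request = build_ajax_request("http://localhost:8000", "cohesion",
                             obj_type="text", obj_id=0)
print(request.full_url)
```

Only the `wordlists` and `cohesion` functions are covered here; the dashboard template uses the `text_dashboard/` path segment instead.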
diff --git a/textanalysis/urls.py b/textanalysis/urls.py
@@ -0,0 +1,25 @@
+from django.conf.urls import url
+
+from textanalysis import views
+
+urlpatterns = [
+    url(r"^ta_input/$", views.ta_input, name="ta_input"),
+    # wordlists and cohesion must be before parametric urls
+    url(r"^wordlists/$", views.text_wordlists, name="text_wordlists_0"),
+    url(r"^wordlists/(?P<file_key>[\w\d-]+)/$", views.text_wordlists, name="text_wordlists_1"),
+    url(r"^wordlists/(?P<file_key>[\w\d-]+)/(?P<obj_type>[\w\.-]+)/(?P<obj_id>[\d-]+)/$", views.text_wordlists, name="text_wordlists_3"),
+    url(r"^wordlists/(?P<obj_type>[\w\.-]+)/(?P<obj_id>[\d-]+)/$", views.text_wordlists, name="text_wordlists_2"),
+    url(r"^cohesion/$", views.text_cohesion, name="text_cohesion_0"),
+    url(r"^cohesion/(?P<file_key>[\w\d-]+)/$", views.text_cohesion, name="text_cohesion_1"),
+    url(r"^cohesion/(?P<file_key>[\w\d-]+)/(?P<obj_type>[\w\.-]+)/(?P<obj_id>[\d-]+)/$", views.text_cohesion, name="text_cohesion_3"),
+    url(r"^cohesion/(?P<obj_type>[\w\.-]+)/(?P<obj_id>[\d-]+)/$", views.text_cohesion, name="text_cohesion_2"),
+    url(r"^text_dashboard/(?P<file_key>[\w\.-]+)/$", views.text_dashboard, name="text_dashboard_1"),
+    url(r"^text_dashboard/(?P<obj_type>[\w\.-]+)/(?P<obj_id>[\d-]+)/$", views.text_dashboard, name="text_dashboard"),
+    url(r"^text_dashboard/(?P<obj_type>[\w\.-]+)/(?P<obj_id>[\d]+)$", views.text_dashboard, name="text_dashboard_unterminated"),
+    url(r"^text_dashboard/(?P<obj_type>[\w\.-]+)/(?P<obj_id>.+)$", views.text_dashboard, name="text_dashboard_by_url"),
+    url(r"^text_dashboard/(?P<obj_type>[\w\.-]+)/(?P<obj_id>[\w\d-]+)/$", views.text_dashboard, name="text_dashboard_by_url"),
+    #
+    url(r"^(?P<function>[\w\.-]+)/(?P<file_key>[\w\.-]+)/$", views.ta, name="text_analysis_1"),
+    url(r"^(?P<function>[\w\.-]+)/(?P<file_key>[\w\.-]+)/(?P<obj_type>[\w\.-]+)/(?P<obj_id>[\d-]+)/$", views.ta, name="text_analysis_3"),
+    url(r"^(?P<function>[\w\.-]+)/(?P<obj_type>[\w\.-]+)/(?P<obj_id>[\d-]+)/$", views.ta, name="text_analysis_2"),
+]
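Review note: the comment "wordlists and cohesion must be before parametric urls" matters because URL resolution stops at the first pattern that matches. The sketch below replays a subset of the patterns from `textanalysis/urls.py` with the standard `re` module to show what would happen if the generic `function` patterns came first. It is a simplified stand-in for the resolver, not Django code; `resolve` and `ROUTES` are hypothetical names, and view functions are replaced by their route names.

```python
import re

# A representative subset of the patterns, in their declared order.
ROUTES = [
    (r"^wordlists/$", "text_wordlists_0"),
    (r"^wordlists/(?P<file_key>[\w\d-]+)/$", "text_wordlists_1"),
    (r"^cohesion/(?P<obj_type>[\w\.-]+)/(?P<obj_id>[\d-]+)/$", "text_cohesion_2"),
    (r"^(?P<function>[\w\.-]+)/(?P<file_key>[\w\.-]+)/$", "text_analysis_1"),
    (r"^(?P<function>[\w\.-]+)/(?P<obj_type>[\w\.-]+)/(?P<obj_id>[\d-]+)/$",
     "text_analysis_2"),
]


def resolve(path, routes=ROUTES):
    """Return (name, kwargs) for the first pattern matching `path`, else None."""
    for pattern, name in routes:
        match = re.search(pattern, path)
        if match:
            return name, match.groupdict()
    return None


# Declared order: the dedicated wordlists route wins.
print(resolve("wordlists/abc-123/"))
# Reversed order: the generic route captures "wordlists" as `function`.
print(resolve("wordlists/abc-123/", routes=list(reversed(ROUTES))))
```

With the declared order, `wordlists/abc-123/` reaches `text_wordlists_1`; with the order reversed, the same path falls into `text_analysis_1` with `function="wordlists"`, so the request would be handled by `views.ta` rather than `views.text_wordlists`.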
diff --git a/textanalysis/views.py b/textanalysis/views.py
@@ -852,6 +852,7 @@ def ajax_delete_corpus(request):
 @csrf_exempt
 def text_wordlists(request, file_key='', obj_type='', obj_id=''):
     var_dict = {'file_key': file_key, 'obj_type': obj_type, 'obj_id': obj_id}
+    var_dict['VUE'] = True
     # if request.is_ajax():
     if is_ajax(request):
         keys = ['verb_frequencies', 'noun_frequencies', 'adjective_frequencies', 'adverb_frequencies', 
@@ -862,7 +863,6 @@ def text_wordlists(request, file_key='', obj_type='', obj_id=''):
         data.update([[key, dashboard_dict[key]] for key in keys])
         return JsonResponse(data)
     else:
-        # return render(request, 'vue/text_wordlists.html', var_dict)
         return render(request, 'text_wordlists.html', var_dict)
 
 """
@@ -921,6 +921,7 @@ def text_annotations(request, params):
 @csrf_exempt
 def text_cohesion(request, file_key='', obj_type='', obj_id=''):
     var_dict = {'file_key': file_key, 'obj_type': obj_type, 'obj_id': obj_id}
+    var_dict['VUE'] = True
     if is_ajax(request):
         keys = ['paragraphs', 'repeated_lemmas',
                 'cohesion_by_entity_graph', 'cohesion_by_repetitions', 'cohesion_by_similarity', 
@@ -1031,7 +1032,8 @@ def text_readability(request, params):
         var_dict['readability_indexes']['gagatsis_1985'] = index
     return render(request, 'text_readability.html', var_dict)
 
-def text_analysis_input(request):
+# def text_analysis_input(request):
+def ta_input(request):
     var_dict = {}
     if request.POST:
         form = TextAnalysisInputForm(request.POST)
@@ -1045,8 +1047,8 @@ def text_analysis_input(request):
                 # return render(request, 'vue/text_dashboard.html', var_dict)
                 return render(request, 'text_dashboard.html', var_dict)
             else:
-                # return text_analyze(request, function, 'text', 0)
-                return text_analyze(request, function, obj_type='text', obj_id=0)
+                # return text_analyze(request, function, obj_type='text', obj_id=0)
+                return ta(request, function, obj_type='text', obj_id=0)
     else:
         # do not present the input form if the language server is down
         endpoint = nlp_url + '/api/configuration'
@@ -1061,10 +1063,10 @@ def text_analysis_input(request):
             var_dict['form'] = form
         else:
             var_dict['error'] = off_error
-    return render(request, 'text_analysis_input.html', var_dict)
+    return render(request, 'ta_input.html', var_dict)
 
-# def text_analyze(request, function, obj_type, obj_id, file_key='', text=''):
-def text_analyze(request, function, obj_type='', obj_id='', file_key='', text=''):
+# def text_analyze(request, function, obj_type='', obj_id='', file_key='', text=''):
+def ta(request, function, obj_type='', obj_id='', file_key='', text=''):
     var_dict = { 'obj_type': obj_type, 'obj_id': obj_id, 'file_key': file_key, 'title': '' }
     if file_key:
         if obj_type == 'corpus':
@@ -1074,11 +1076,9 @@ def text_analyze(request, function, obj_type='', obj_id='', file_key='', text=''
         if obj_type == 'text':
                 var_dict['obj_id'] = 0
     if function == 'dashboard':
-        # return render(request, 'vue/text_dashboard.html', var_dict)
         return render(request, 'text_dashboard.html', var_dict)
     elif function == 'context':
         var_dict['VUE'] = True
-        # return render(request, 'vue/context_dashboard.html', var_dict)
         return render(request, 'context_dashboard.html', var_dict)
     elif function == 'annotations':
         var_dict['VUE'] = True
@@ -1091,12 +1091,10 @@ def text_analyze(request, function, obj_type='', obj_id='', file_key='', text=''
         return text_readability(request, params=var_dict)
     elif function == 'cohesion':
         var_dict['VUE'] = True
-        # return text_cohesion(request, params=var_dict)
         return render(request, 'text_cohesion.html', var_dict)
     elif function == 'nounchunks':
         var_dict['VUE'] = True
         return text_nounchunks(request, params=var_dict)
     elif function == 'wordlists':
         var_dict['VUE'] = True
-        # return render(request, 'vue/text_wordlists.html', var_dict)
         return render(request, 'text_wordlists.html', var_dict)
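Review note: `text_wordlists` and `text_cohesion` serve both the HTML page and its JSON data from the same URL, branching on `is_ajax(request)`. The helper itself is not shown in this diff. The sketch below assumes it checks the `X-Requested-With` header that the templates send, which is what Django's removed `HttpRequest.is_ajax()` did. The `respond` function and the stand-in request objects are hypothetical and exist only to make the sketch runnable without Django.

```python
from types import SimpleNamespace


def is_ajax(request):
    """Assumed behaviour: a request counts as AJAX when it carries
    the header X-Requested-With: XMLHttpRequest."""
    return request.headers.get("X-Requested-With") == "XMLHttpRequest"


def respond(request):
    """Mimic the branch in text_wordlists: JSON for AJAX calls,
    the HTML template otherwise."""
    return "json" if is_ajax(request) else "html"


# Stand-ins for Django requests. A real request.headers mapping is
# case-insensitive, while the plain dict used here is not.
page_load = SimpleNamespace(headers={})
vue_fetch = SimpleNamespace(headers={"X-Requested-With": "XMLHttpRequest"})

print(respond(page_load), respond(vue_fetch))
```

Under this assumption, the first browser visit renders the template, and the `fetch` call issued by the mounted Vue component retrieves the data from the same path.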