TO-DO List

This document lists all the things I should look at more carefully in the future.

If an element ID contains _ then the sneaking part in relation_matrix won't work.
Insert some sample files in input and output folders.
Update the code to support the latest version of the NLTK ParentedTree implementation.
The writers should use an XML library instead of writing strings to a file.
Write an installation script.
Implement an error measurement framework (in ManTIME class) to get statistics from the models.
Implement a shuffle method and k-fold cross-validation for the data.
Do I really need to load Stanford Core NLP every time for every document? Once the problem with long texts (dasmith/stanford-corenlp-python#13) is solved, I should switch to the new stanford-core-nlp.
Unit-test the code with a proper testing framework (py.test).
Comment the code better and more verbosely, following the Google commenting style.
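
Note on the XML writer item above: a minimal sketch of what building the output with an XML library could look like, assuming Python's standard `xml.etree.ElementTree` is acceptable. The helper `render_timeml`, the chunk format, and the sample tag attributes are hypothetical and only for illustration; they are not taken from the ManTIME code.

```python
import xml.etree.ElementTree as ET


def render_timeml(text_chunks):
    """Build a small TimeML-like document from (text, tag, attributes) chunks.

    Hypothetical helper: a chunk whose tag is None is plain text.
    """
    root = ET.Element("TimeML")
    body = ET.SubElement(root, "TEXT")
    last = None  # last annotated element appended to <TEXT>
    for text, tag, attributes in text_chunks:
        if tag is None:
            # Plain text goes before the first child (body.text) or right
            # after the previous annotated element (its tail).
            if last is None:
                body.text = (body.text or "") + text
            else:
                last.tail = (last.tail or "") + text
        else:
            last = ET.SubElement(body, tag, attributes)
            last.text = text
    # The library escapes special characters such as & and <.
    return ET.tostring(root, encoding="unicode")


chunks = [
    ("Profits rose in ", None, None),
    ("Q3 & Q4", "TIMEX3", {"tid": "t1", "type": "DATE"}),
    (".", None, None),
]
print(render_timeml(chunks))
```

The main gain over string concatenation is that escaping and well-formedness are handled by the library, so an ampersand in the source text cannot break the output file.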

Done:

Make the code general with respect to different annotation standards for CRF (IO, BIO, WIO, WBIO, WBIOE, BIOE).
Can the same two objects be connected by two different types of temporal relations? No.
Can an event be anchored to two different MAKEINSTANCE tags? Yes (not supported yet).
Move the output folder up.
Complete the InTextEntity class development.
Implement features for the temporal relation extraction task looking at my notes from the literature.
Implement the classifier for Temporal Links.
Adapt the writers to output temporal links too.
Implement the feature extractor for Temporal Links.
Probably some variables in Document and Sentence objects can be deleted.
What's id_token in Word class?
Do we need EventInstance? Yes, we do.
In the attribute training phase, the multi-word expressions should be represented as one sample. The features will be merged according to the order of appearance.
Add useful folders (models, output, buffer) to the Git repo.
Pickle the num2py arrays and remove the dependency.
Activate the post processing pipeline.
Fix the logging messages (info, warning, debug).
The method search_subsequence is called many times. A more adequate ADT should be used.
Implement an HTML (CSS3) writer (timesheet.js, TimelineJS).
Make the features as light as possible (in terms of storage space).
Show the progress as #_files_processed/#files.
Convert the gazetteers to Unicode.
Should the attribute data matrix be made of positive samples only?
Correct some morphological gazetteer features according to English grammar. Are all the things called prepositions actually prepositions? (ask Marilena Di Bari)
Implement buffering at the feature level.
Fix unicode-related bug at utilities.py:76.
Have a look at the argparse setup; it's not correct right now.
Filter out useless features such as female gazetteers, male gazetteers, US cities. (commented)
Look carefully at all the features and possibly cut them. (commented)
Instead of the settings.py file, use environment variables (os.environ).
Implement the i2b2 reader.
Implement the i2b2 writer.
Implement a caching system for Stanford Core NLP.
Remove the output produced by CRF++ in the training phase.
Integrate [Norma](https://github.com/filannim/timex-normaliser).
Introduce model folders instead of files.
Fix and connect the post-processing pipeline.
Attribute models should include the identification features (heavier but hopefully better).
Split identification models (TIMEXes and EVENTs).
CRF-based attribute extraction.
There are some print statements somewhere (WARNING cases). I should use something more appropriate for them (log).
Suppress the output that the Stanford Parser writes to stdout/stderr (if everything goes OK).
Implement AttributeDataMatrix writer.
Implement TempEval-3 writer.
Implement TempEval-3 reader.
Implement the classifier for events and timexes.
Implement the universal feature extractor for events and timexes.
Find documentation about how to comment the code so that nice Python-doc style web pages can be automatically generated.
Love ManTIME and refactor it!
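
Note on the annotation standards item at the top of this list: a minimal sketch of how one set of entity spans can be encoded under the IO, BIO and BIOE schemes. The function `encode_labels`, the span format (start, exclusive end, entity type) and the sample sentence are hypothetical, and the W variants are left out because this list does not define what W stands for.

```python
def encode_labels(n_tokens, spans, scheme="BIO"):
    """Return one label per token for the given spans.

    spans: list of (start, end, kind) with end exclusive (assumed format).
    scheme: "IO", "BIO" or "BIOE".
    """
    if scheme not in ("IO", "BIO", "BIOE"):
        raise ValueError("unsupported scheme: %s" % scheme)
    labels = ["O"] * n_tokens
    for start, end, kind in spans:
        for i in range(start, end):
            if scheme == "IO":
                prefix = "I"  # no boundary information at all
            elif i == start:
                prefix = "B"  # first token of the entity
            elif scheme == "BIOE" and i == end - 1:
                prefix = "E"  # last token of a multi-token entity
            else:
                prefix = "I"
            labels[i] = "%s-%s" % (prefix, kind)
    return labels


tokens = ["He", "left", "on", "the", "first", "of", "May", "."]
spans = [(3, 7, "TIMEX3")]
for scheme in ("IO", "BIO", "BIOE"):
    print(scheme, list(zip(tokens, encode_labels(len(tokens), spans, scheme))))
```

Keeping the spans as the single source of truth and deriving the labels per scheme means the CRF training data can be regenerated for any scheme without touching the readers.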
