Viet-trie

Scrape all Vietnamese words from VDict.com and store them in a trie data structure.
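The sketch below shows one way such a trie could store multi-syllable entries like "đàn bà". It is a minimal illustration, not the repository's code. It assumes each word is split into syllables on spaces and that each trie edge is one syllable. `SyllableTrie`, `TrieNode`, `add_word`, and the NFC normalization step are hypothetical. Only the `has_word` name mirrors the test further down.

```python
import unicodedata
from typing import Dict, Iterable, List


def _syllables(text: str) -> List[str]:
    # NFC-normalize so precomposed and decomposed diacritics compare equal,
    # then lowercase and split on whitespace into syllables.
    return unicodedata.normalize("NFC", text).lower().split()


class TrieNode:
    def __init__(self) -> None:
        # Children are keyed by the next syllable, not by single characters.
        self.children: Dict[str, "TrieNode"] = {}
        # True if the path from the root to this node spells a complete word.
        self.is_word = False


class SyllableTrie:
    """Hypothetical syllable-level trie; not the repository's VietTrie."""

    def __init__(self, words: Iterable[str] = ()) -> None:
        self.root = TrieNode()
        for word in words:
            self.add_word(word)

    def add_word(self, word: str) -> None:
        node = self.root
        for syllable in _syllables(word):
            # Create the child node on first use, then descend into it.
            node = node.children.setdefault(syllable, TrieNode())
        node.is_word = True

    def has_word(self, word: str) -> bool:
        node = self.root
        for syllable in _syllables(word):
            next_node = node.children.get(syllable)
            if next_node is None:
                return False  # path breaks off: not a stored word or prefix
            node = next_node
        return node.is_word


trie = SyllableTrie(["đàn bà", "đàn ông", "việt nam"])
print(trie.has_word("đàn bà"))   # True
print(trie.has_word("english"))  # False
print(trie.has_word("đàn"))      # False: only a prefix of stored words
```

Keying edges by syllable keeps the trie shallow, because Vietnamese words are sequences of space-separated syllables. A character-level trie would also work, but it would need extra handling for the spaces inside words.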
Use the constructed trie to efficiently tokenize Vietnamese sentences.
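One common way to tokenize with such a trie is greedy longest matching. At each syllable, walk the trie as far as the sentence allows and emit the longest dictionary word found. If nothing matches, fall back to a single syllable. Whether `VietTrie.extract_words` works exactly this way is an assumption. `build_trie` and `longest_match_tokenize` are hypothetical names for this standalone sketch, and the small word list is made up for illustration.

```python
from typing import Dict, Iterable, List

_END = object()  # sentinel key: "a dictionary word ends at this node"


def build_trie(words: Iterable[str]) -> Dict:
    """Build a nested-dict trie whose edges are syllables."""
    root: Dict = {}
    for word in words:
        node = root
        for syllable in word.lower().split():
            node = node.setdefault(syllable, {})
        node[_END] = True
    return root


def longest_match_tokenize(trie: Dict, sentence: str) -> List[str]:
    """Greedy longest match; unknown syllables become one-syllable tokens."""
    syllables = sentence.lower().split()
    tokens: List[str] = []
    i = 0
    while i < len(syllables):
        node = trie
        longest = 1  # fallback: emit the single syllable at position i
        for j in range(i, len(syllables)):
            node = node.get(syllables[j])
            if node is None:
                break  # no dictionary word continues with this syllable
            if _END in node:
                longest = j - i + 1  # remember the longest word seen so far
        tokens.append(" ".join(syllables[i:i + longest]))
        i += longest
    return tokens


words = ["thiên nhiên", "việt", "việt nam", "rất", "là", "hùng vĩ"]
trie = build_trie(words)
print(longest_match_tokenize(trie, "thiên nhiên việt nam rất là hùng vĩ"))
# ['thiên nhiên', 'việt nam', 'rất', 'là', 'hùng vĩ']
print(longest_match_tokenize(trie, "xin chào việt nam"))
# ['xin', 'chào', 'việt nam']
```

Greedy matching is fast, since each position walks at most one trie path. However, it can choose a long word where a shorter split would be the correct reading. Both "việt" and "việt nam" are in the toy list above, and the longer match wins.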

Test:

from viet_trie import VietTrie  # assumed import path (viet_trie.py in this repository)
print(f"VietTrie.has_word(đàn bà) --> {VietTrie.has_word('đàn bà')}")
print(f"VietTrie.has_word(đàn ông) --> {VietTrie.has_word('đàn ông')}")
print(f"VietTrie.has_word(english) --> {VietTrie.has_word('english')}")
print(f"VietTrie.has_word(việt nam) --> {VietTrie.has_word('việt nam')}")
print(f"Extract words from this sentence: thiên nhiên việt nam rất là hùng vĩ -> {VietTrie.extract_words('thiên nhiên việt nam rất là hùng vĩ')}")
print(f"Extract words from this sentence: mày lúc nào cũng í a í ới nhức hết cả đầu -> {VietTrie.extract_words('mày lúc nào cũng í a í ới nhức hết cả đầu')}")
print(f"Extract words from this sentence: chạy chậm ì à ì ạch -> {VietTrie.extract_words('chạy chậm ì à ì ạch')}")

Output:

VietTrie.has_word(đàn bà) --> True
VietTrie.has_word(đàn ông) --> True
VietTrie.has_word(english) --> False
VietTrie.has_word(việt nam) --> True
Extract words from this sentence: thiên nhiên việt nam rất là hùng vĩ -> ['thiên nhiên', 'việt nam', 'rất', 'là', 'hùng vĩ']
Extract words from this sentence: mày lúc nào cũng í a í ới nhức hết cả đầu -> ['mày', 'lúc', 'nào', 'cũng', 'í a í ới', 'nhức', 'hết cả', 'đầu']
Extract words from this sentence: chạy chậm ì à ì ạch -> ['chạy', 'chậm', 'ì à ì ạch']

Repository: vudung45/Viet-trie