ICU stuff #115

Merged on Jun 30, 2021 (56 commits)
Commits (56; the file diffs below show the changes from 34 of them):
3cda4e7 start making a normalization tool (mr-martian, May 22, 2021)
c9170d0 move normalization to a different repo, simpler ICU check (mr-martian, May 26, 2021)
dda9502 missed a line (mr-martian, May 26, 2021)
7a38b4e makefile cleanup (mr-martian, May 26, 2021)
ad15367 the long march part 1 (mr-martian, May 27, 2021)
44ef72a the long march part 2 (mr-martian, Jun 1, 2021)
b99672a the long march part 3 (compiles, but tests fail) (mr-martian, Jun 1, 2021)
c7208ea lt-comp seems to be working now (mr-martian, Jun 1, 2021)
2e9fac4 lt-print seems to be working (mr-martian, Jun 1, 2021)
0624071 lt-proc (unweighted) working (mr-martian, Jun 2, 2021)
3a6ab4d all tests now pass (mr-martian, Jun 2, 2021)
ab3ecd0 add a non-BMP test (mr-martian, Jun 2, 2021)
1679bb5 add the file used in the test :p (mr-martian, Jun 2, 2021)
ac7867f use utf-32 sometimes and some type cleanup (mr-martian, Jun 3, 2021)
b5d6e07 utf-32 in monodix and some type cleanup (mr-martian, Jun 3, 2021)
0eb748f cleverness is to be avoided (investigating #85) (mr-martian, Jun 3, 2021)
3e4bc43 yet more type cleanup (mr-martian, Jun 3, 2021)
257d33c finish eliminating wchar and make more use of helper functions (mr-martian, Jun 3, 2021)
eee5c37 no more need for windows compatibility header (mr-martian, Jun 3, 2021)
4bd6719 get python bindings to compile (mr-martian, Jun 3, 2021)
3f8cbb3 see if we can get the Travis tests working (mr-martian, Jun 3, 2021)
db59366 eliminate use of wide streams (mr-martian, Jun 4, 2021)
3a293af extracting string constants (mr-martian, Jun 4, 2021)
d6c16f9 don't need the whole converter for 1 codepoint (mr-martian, Jun 4, 2021)
397a7f2 drop unused helpers, add copywrite headers, use _unlocked everywhere (mr-martian, Jun 4, 2021)
6ee59d5 typo (mr-martian, Jun 4, 2021)
8fca95e blah (mr-martian, Jun 4, 2021)
e14b45b ok fine, I'll put the tab back (mr-martian, Jun 4, 2021)
feb2f39 missed some bad casts (mr-martian, Jun 4, 2021)
2d2abd8 typo (mr-martian, Jun 4, 2021)
ce3eb90 my continuing battle with indentation and yaml (mr-martian, Jun 4, 2021)
8a7a0fc .gitignore cleanup and darn it yaml! (mr-martian, Jun 4, 2021)
e317a78 assorted nits (mr-martian, Jun 4, 2021)
dd11193 helper functions for use in apertium (mr-martian, Jun 6, 2021)
43e225b more helper stuff (mr-martian, Jun 7, 2021)
51b0651 add << UChar for newer g++ and switch << back to std::ostream (mr-martian, Jun 10, 2021)
d49b84b another helper (not symmetric - should probably fix that) (mr-martian, Jun 11, 2021)
f7d6a4b unbundle utfcpp (mr-martian, Jun 11, 2021)
e30103c fix tests? (mr-martian, Jun 11, 2021)
76d0bd1 typo in package name (mr-martian, Jun 11, 2021)
7d9c359 try again (mr-martian, Jun 11, 2021)
5360b40 it helps to edit the right test file (mr-martian, Jun 11, 2021)
210c645 another helper (rather than define this in every repo) (mr-martian, Jun 11, 2021)
62bb1df move string_utils into lttoolbox and casehandle better (mr-martian, Jun 14, 2021)
8d1b620 add caseless compare helper to replace (tolower(a) == tolower(b)) (mr-martian, Jun 14, 2021)
3c17f4c move xml iterator to lttoolbox (mr-martian, Jun 14, 2021)
b342dcb another helper (mr-martian, Jun 14, 2021)
b86d375 Merge branch 'master' into icu (mr-martian, Jun 15, 2021)
129e45b typo in merge (mr-martian, Jun 15, 2021)
04d5a87 incorporate optimizations from #114 (mr-martian, Jun 15, 2021)
d423e9a small bugs (mr-martian, Jun 15, 2021)
9b075d6 move constant initializers to header and make more use of helpers (mr-martian, Jun 16, 2021)
2cc8be3 make to_ustring() use unsigned chars (mr-martian, Jun 17, 2021)
a2acea4 version bump (mr-martian, Jun 17, 2021)
5bd42f2 final elimination of wide strings (mr-martian, Jun 17, 2021)
96bab35 InputFile block reading should respect null flush (mr-martian, Jun 18, 2021)
2 changes: 2 additions & 0 deletions .gitignore
@@ -73,10 +73,12 @@
/lttoolbox/lt-expand
/python/Makefile
/python/Makefile.in
/python/lttoolbox.i
/python/lttoolbox_wrap.cpp
/python/lttoolbox.py
/python/setup.py
/python/build*
*.egg-info/
*.egg
**/.mypy_cache/
*~
5 changes: 5 additions & 0 deletions .travis.yml
@@ -6,6 +6,11 @@ compiler:
  - clang
  - gcc

addons:
  homebrew:
    packages:
      - icu4c
TinoDidriksen marked this conversation as resolved.

before_install:
  - if [ $TRAVIS_OS_NAME = linux ]; then sudo apt-get install -y swig; else brew install swig; fi
script:
2 changes: 1 addition & 1 deletion CMakeLists.txt
@@ -110,7 +110,7 @@ endif()
# Unlocked I/O functions
include(CheckSymbolExists)
set(CMAKE_REQUIRED_DEFINITIONS -D_POSIX_C_SOURCE=200112 -D_GNU_SOURCE)
-foreach(func fread_unlocked fwrite_unlocked fgetc_unlocked fputc_unlocked fputs_unlocked fgetwc_unlocked fputwc_unlocked fputws_unlocked)
+foreach(func fread_unlocked fwrite_unlocked fgetc_unlocked fputc_unlocked fputs_unlocked)
string(TOUPPER ${func} _uc)
CHECK_SYMBOL_EXISTS(${func} "stdio.h" HAVE_DECL_${_uc})
if(HAVE_DECL_${_uc})
27 changes: 3 additions & 24 deletions configure.ac
@@ -38,29 +38,8 @@ AC_ARG_ENABLE(profile,
[CXXFLAGS="-pg -g -Wall"; CFLAGS="-pg -g -Wall"; LDFLAGS="-pg"])


-PKG_CHECK_MODULES(LTTOOLBOX, [libxml-2.0 >= 2.6.17])
-
-# Check for wide strings
-AC_DEFUN([AC_CXX_WSTRING],[
-AC_CACHE_CHECK(whether the compiler supports wide strings,
-ac_cv_cxx_wstring,
-[AC_LANG_SAVE
-AC_LANG_CPLUSPLUS
-AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[#include <string>]],[[
-std::wstring test = L"test";
-]])],
-[ac_cv_cxx_wstring=yes], [ac_cv_cxx_wstring=no])
-AC_LANG_RESTORE
-])
-])
-
-AC_CXX_WSTRING
-
-if test "$ac_cv_cxx_wstring" = no
-then
-AC_MSG_ERROR([Missing wide string support])
-fi
-
+PKG_CHECK_MODULES(LIBXML, [libxml-2.0 >= 2.6.17])
+PKG_CHECK_MODULES(ICU, [icu-i18n, icu-io, icu-uc])

# Checks for libraries.
AC_CHECK_LIB(xml2, xmlReaderForFile)
@@ -78,7 +57,7 @@ AC_TYPE_SIZE_T
AC_FUNC_ERROR_AT_LINE

AC_CHECK_DECLS([fread_unlocked, fwrite_unlocked, fgetc_unlocked, \
-fputc_unlocked, fputs_unlocked, fgetwc_unlocked, fputwc_unlocked, fputws_unlocked, ungetwc_unlocked])
+fputc_unlocked, fputs_unlocked])

AC_CHECK_FUNCS([setlocale strdup getopt_long])

1 change: 0 additions & 1 deletion lttoolbox/CMakeLists.txt
@@ -57,7 +57,6 @@ if(WIN32)
win32/regex.c
win32/regex.h
win32/unistd.h
-${CMAKE_SOURCE_DIR}/utf8/utf8_fwrap.h
${LIBLTTOOLBOX_SOURCES}
)
if(NOT VCPKG_TOOLCHAIN)
38 changes: 9 additions & 29 deletions lttoolbox/Makefile.am
@@ -1,15 +1,15 @@

h_sources = alphabet.h att_compiler.h buffer.h compiler.h compression.h \
-deserialiser.h entry_token.h expander.h fst_processor.h lt_locale.h \
-ltstr.h match_exe.h match_node.h match_state.h my_stdio.h node.h \
+deserialiser.h entry_token.h expander.h fst_processor.h input_file.h lt_locale.h \
+match_exe.h match_node.h match_state.h my_stdio.h node.h \
pattern_list.h regexp_compiler.h serialiser.h sorted_vector.h state.h \
transducer.h trans_exe.h xml_parse_util.h exception.h tmx_compiler.h \
-string_to_wostream.h
+ustring.h
cc_sources = alphabet.cc att_compiler.cc compiler.cc compression.cc entry_token.cc \
-expander.cc fst_processor.cc lt_locale.cc match_exe.cc \
+expander.cc fst_processor.cc input_file.cc lt_locale.cc match_exe.cc \
match_node.cc match_state.cc node.cc pattern_list.cc \
regexp_compiler.cc sorted_vector.cc state.cc transducer.cc \
-trans_exe.cc xml_parse_util.cc tmx_compiler.cc
+trans_exe.cc xml_parse_util.cc tmx_compiler.cc ustring.cc

library_includedir = $(includedir)/$(PACKAGE_NAME)-$(VERSION_API)/$(PACKAGE_NAME)
library_include_HEADERS = $(h_sources)
@@ -27,33 +27,16 @@ lttoolboxlib = $(prefix)/lib

lttoolbox_DATA = dix.dtd dix.rng dix.rnc acx.rng xsd/dix.xsd xsd/acx.xsd

-lt_print_SOURCES = lt_print.cc
-lt_print_LDADD = liblttoolbox$(VERSION_MAJOR).la
-lt_print_LDFLAGS = -llttoolbox$(VERSION_MAJOR) $(LTTOOLBOX_LIBS)
+LDADD = liblttoolbox$(VERSION_MAJOR).la
+AM_LDFLAGS = -llttoolbox$(VERSION_MAJOR) $(LIBXML_LIBS) $(ICU_LIBS)

+lt_print_SOURCES = lt_print.cc
lt_trim_SOURCES = lt_trim.cc
-lt_trim_LDADD = liblttoolbox$(VERSION_MAJOR).la
-lt_trim_LDFLAGS = -llttoolbox$(VERSION_MAJOR) $(LTTOOLBOX_LIBS)
-
lt_comp_SOURCES = lt_comp.cc
-lt_comp_LDADD = liblttoolbox$(VERSION_MAJOR).la
-lt_comp_LDFLAGS = -llttoolbox$(VERSION_MAJOR) $(LTTOOLBOX_LIBS)
-
lt_proc_SOURCES = lt_proc.cc
-lt_proc_LDADD = liblttoolbox$(VERSION_MAJOR).la
-lt_proc_LDFLAGS = -llttoolbox$(VERSION_MAJOR) $(LTTOOLBOX_LIBS)
-
lt_expand_SOURCES = lt_expand.cc
-lt_expand_LDADD = liblttoolbox$(VERSION_MAJOR).la
-lt_expand_LDFLAGS = -llttoolbox$(VERSION_MAJOR) $(LTTOOLBOX_LIBS)
-
lt_tmxcomp_SOURCES = lt_tmxcomp.cc
-lt_tmxcomp_LDADD = liblttoolbox$(VERSION_MAJOR).la
-lt_tmxcomp_LDFLAGS = -llttoolbox$(VERSION_MAJOR) $(LTTOOLBOX_LIBS)
-
lt_tmxproc_SOURCES = lt_tmxproc.cc
-lt_tmxproc_LDADD = liblttoolbox$(VERSION_MAJOR).la
-lt_tmxproc_LDFLAGS = -llttoolbox$(VERSION_MAJOR) $(LTTOOLBOX_LIBS)

#lt-validate-dictionary: Makefile.am validate-header.sh
# @echo "Creating lt-validate-dictionary script"
@@ -67,10 +50,7 @@ lt_tmxproc_LDFLAGS = -llttoolbox$(VERSION_MAJOR) $(LTTOOLBOX_LIBS)

man_MANS = lt-comp.1 lt-expand.1 lt-proc.1 lt-tmxcomp.1 lt-tmxproc.1 lt-print.1 lt-trim.1

-INCLUDES = -I$(top_srcdir) $(LTTOOLBOX_CFLAGS)
-if WINDOWS
-INCLUDES += -I$(top_srcdir)/utf8
-endif
+INCLUDES = -I$(top_srcdir) -I$(top_srcdir)/utf8 $(LIBXML_CFLAGS) $(ICU_CFLAGS)
CLEANFILES = *~

EXTRA_DIST = dix.dtd dix.rng dix.rnc acx.rng xsd/dix.xsd xsd/acx.xsd $(man_MANS)