
Commit 69ac675

Indirect realization of Chinese indexing and search support
Signed-off-by: ming300
1 parent 1a0a2b7 commit 69ac675

3 files changed: +79 -6 lines

example/example_data.json (+58)
@@ -3011,6 +3011,64 @@
"community_owned": false,
"title": "JavaScript Encode",
"body": "<p>Surfing on web i find Ext.Gantt plugin for ExtJS, that extension have a special encode. Anybody know how to encode like that or another complicated form. </p>\n\n<p><a href=\"http://www.ext-scheduler.com/js/sch-gantt-all-debug.js\" rel=\"nofollow\">Encoded Gantt Chart</a></p>\n"
},
{
"tags": [
"球门","两度","两","度","挽救","巴西","哥伦比亚","伦比","亚","轻松","晋级","今天","凌晨","世界杯","世界","杯","正式","进入","淘汰赛","淘汰","赛","的","争夺","在","率先","先进","进行","两场","场","1","8","决赛","中","东道主","东道","主","巴西队","队","通过","点球","大战","以","4","3","2","智利队","智利","惊险","八强","八","强","另一","一场","一","比赛","2-0","0","轻取","乌拉圭","乌拉","圭","与","会师","1-1","这是","本届","第一场","第一","也是","加时赛","和","第","18","分钟","分","钟","开出","角球","蒂","戈","席尔瓦","前","点","后","蹭","大卫","路易斯","路易","斯","后点","垫","射","破门","32","后场","失误","桑切斯","切","接","巴","尔","加","传球","低","扳平","比分","90","内","两队","战成","双方","均无","建树","最终","四","轮","威廉","踢","偏","浩","克","被","扑","皮","尼","利","均被","扑出","第五轮","第五","五","马尔","稳稳","命中","之后","出场","哈","拉","将","球","打中","中立","立柱","弹出","遗憾","失利","28","罗德里","德里","里格斯","里格","禁区","区外","胸部","停","转身","抽射","打出","一记","记","波","50","门前","包抄","梅开二度","二度","二","目前","他","5","个","进球","暂","列","射手榜","射手","手","榜首","首位","而","上届","金球奖","金球","奖得主","得主","弗","兰","则","告别"
],
"answer_count": 1,
"accepted_answer_id": 6410095,
"favorite_count": 1,
"question_timeline_url": "/questions/1111111/timeline",
"question_comments_url": "/questions/1111111/comments",
"question_answers_url": "/questions/1111111/answers",
"question_id": 1111111,
"owner": {
"user_id": 509789,
"user_type": "registered",
"display_name": "richardhell",
"reputation": 18,
"email_hash": "9e4f23b0072f4f7d3e2649e3e1a2816b"
},
"creation_date": 1308533799,
"last_edit_date": 1308535618,
"last_activity_date": 1308579900,
"up_vote_count": 1,
"down_vote_count": 0,
"view_count": 54,
"score": 1,
"community_owned": false,
"title": "球门两度挽救巴西 哥伦比亚轻松晋级",
"body": "<h2>球门两度挽救巴西 哥伦比亚轻松晋级</h2><br/><p>今天凌晨,巴西世界杯正式进入淘汰赛的争夺。在率先进行的两场1/8决赛中,东道主巴西队通过点球大战,以4:3(点球3:2)淘汰智利队,惊险晋级八强。另一场比赛,哥伦比亚队2-0轻取乌拉圭队,与巴西队会师1/4决赛。<br/>巴西1-1智利(点球3:2)。这是本届世界杯第一场淘汰赛,也是第一场加时赛和点球大战。第18分钟,巴西队开出角球,蒂亚戈·席尔瓦前点后蹭,大卫·路易斯后点垫射破门。第32分钟,巴西队后场失误,桑切斯接巴尔加斯的传球低射扳平比分。90分钟内两队战成1-1。加时赛双方均无建树,最终进入点球大战。<br/>前四轮中,巴西队威廉点球踢偏,浩克点球被扑,智利队皮尼利亚和桑切斯的点球均被扑出。第五轮,内马尔稳稳命中,之后出场的哈拉将球打中立柱弹出,智利队遗憾失利。<br/>哥伦比亚2-0乌拉圭。第28分钟,罗德里格斯禁区外胸部停球转身抽射打出一记世界波。第50分钟,罗德里格斯门前包抄梅开二度。目前,他以5个进球暂列射手榜首位,而上届世界杯金球奖得主弗兰则遗憾告别。</p>\n"
},
{
"tags": [
"巴西","世界杯","世界","杯","荷兰","惊天","逆转","哥斯达黎加","再创","历史","1","8","决赛","今天","凌晨","再","战","两场","两","场","2-1","2","墨西哥","通过","点球","以","总比分","总比","比分","6-4","6","4","5-3","5","3","淘汰","希腊","和","将在","中","相遇","荷兰队","队","防守","悍将","德","容","开场","9","分钟","分","钟","便","因","伤","被","换下","比赛","在下","下半场","下半","半场","掀起","高潮","第","50","多","斯","桑","托","禁区","区外","外突","突施冷箭","突施","冷箭","打破","僵局","尽管","赛后","被评为","评为","本场","最佳","的","门将","奥","乔","亚","用","自己","神奇","扑救","阻挡","挡了","弗莱","近距离","近距","距离","头球","罗","本","单刀","但","顽强","仍然在","仍然","在","最后","时刻","连","入","球","上演","绝杀","88","内","凌空","抽射","扳平","92","赢得","得点","他","将","让给","了","亨特","拉尔","后者","一蹴而就","一","蹴","而就","橙","衣","军团","惊险","晋级","八强","八","强","52","加队","队长","鲁","伊","右脚","推射","射球","球门","左下角","左下","下角","得手","手","后卫","对","这个","失球","毫无","反应","66","杜","阿尔特","累计","两张","张","黄牌","被罚","罚出","出场","让","少","打","一人","人","最终","停","补","时","阶段","由","帕","塔","索","普洛斯","门前","射","破门","加时赛","双方","均无","建树","大战","前","七个","七","个","4-3","领先","希腊队","第四个","第四","四个","四","耶","卡斯","扑出","随后","乌","马","尼亚","罚中","制胜","胜点"
],
"answer_count": 1,
"accepted_answer_id": 1111112,
"favorite_count": 1,
"question_timeline_url": "/questions/1111112/timeline",
"question_comments_url": "/questions/1111112/comments",
"question_answers_url": "/questions/1111112/answers",
"question_id": 1111112,
"owner": {
"user_id": 509789,
"user_type": "registered",
"display_name": "richardhell",
"reputation": 18,
"email_hash": "9e4f23b0072f4f7d3e2649e3e1a2816b"
},
"creation_date": 1308533799,
"last_edit_date": 1308535618,
"last_activity_date": 1308579900,
"up_vote_count": 1,
"down_vote_count": 0,
"view_count": 54,
"score": 1,
"community_owned": false,
"title": "巴西世界杯,荷兰惊天逆转 哥斯达黎加再创历史",
"body": "<h2>巴西世界杯,荷兰惊天逆转 哥斯达黎加再创历史</h2><br/><p>巴西世界杯1/8决赛今天凌晨再战两场。荷兰2-1逆转墨西哥,哥斯达黎加通过点球,以总比分6-4(点球5-3)淘汰希腊。荷兰和哥斯达黎加将在1/4决赛中相遇。<br/>荷兰2-1墨西哥。荷兰队防守悍将德容开场9分钟便因伤被换下。比赛在下半场掀起高潮。第50分钟,多斯桑托斯禁区外突施冷箭打破僵局。尽管赛后被评为本场最佳的墨西哥门将奥乔亚用自己神奇的扑救阻挡了德弗莱近距离的头球和罗本的单刀,但顽强的荷兰队仍然在最后时刻连入两球上演逆转绝杀。第88分钟,斯内德禁区外凌空抽射扳平比分!第92分钟,罗本赢得点球,他将点球让给了亨特拉尔,后者一蹴而就。橙衣军团惊险晋级八强。<br/>哥斯达黎加6-4希腊(点球5-3)。第52分钟,哥斯达黎加队长鲁伊斯右脚推射球门左下角得手,希腊门将和中后卫对这个失球毫无反应。但第66分钟,杜阿尔特累计两张黄牌被罚出场让哥斯达黎加少打一人。希腊最终在伤停补时阶段由帕帕斯塔索普洛斯门前补射破门扳平比分。<br/>加时赛双方均无建树。点球大战中,前七个点球哥斯达黎加4-3领先,希腊队第四个出场的耶卡斯点球被门将扑出!随后出场的乌马尼亚罚中制胜点球。哥斯达黎加晋级八强。</p>\n"
}
]
}
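The two new records pair an unsegmented Chinese `title`/`body` with a hand-segmented `tags` list, which appears to be what makes this support "indirect": the searchable Chinese terms come from those pre-segmented tags rather than from lunr splitting the text. A minimal sketch of the idea, assuming each tag ends up as its own index term (in the diff below, `lunr.tokenizer` only lowercases array elements, and a space is still a separator). The `Map`-based index, `buildIndex` and `search` are hypothetical stand-ins, not lunr's own index, and the two records are abbreviated to a few of their tags.

```javascript
// Abbreviated copies of the two new fixture records: question_id plus a few tags.
var docs = [
  { question_id: 1111111, tags: ["巴西", "世界杯", "哥伦比亚"] },
  { question_id: 1111112, tags: ["巴西", "世界杯", "荷兰", "哥斯达黎加"] }
];

// Hypothetical inverted index: term -> Set of question ids.
// Each tag becomes one term, lowercased but otherwise unchanged.
function buildIndex(docs) {
  var index = new Map();
  docs.forEach(function (doc) {
    doc.tags.forEach(function (tag) {
      var term = tag.toLowerCase();
      if (!index.has(term)) index.set(term, new Set());
      index.get(term).add(doc.question_id);
    });
  });
  return index;
}

// Exact-term lookup; returns matching ids in ascending order.
function search(index, term) {
  var hits = index.get(term.toLowerCase());
  return hits ? Array.from(hits).sort(function (a, b) { return a - b; }) : [];
}

var index = buildIndex(docs);
console.log(search(index, "巴西"));       // both questions
console.log(search(index, "荷兰"));       // only 1111112
console.log(search(index, "巴西世界杯")); // no hit: not a tag by itself
```

Only whole segments match: a query that spans two tags finds nothing, so recall depends entirely on how the tags were segmented.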

example/example_index.json (+1 -1)
Large diffs are not rendered by default.

lunr.js (+20 -5)
@@ -177,7 +177,8 @@ lunr.EventEmitter.prototype.hasHandler = function (name) {
  * @returns {Array}
  */
 lunr.tokenizer = function (obj) {
-  if (!arguments.length || obj == null || obj == undefined) return []
+  if (!arguments.length || obj == null || obj == undefined) return [];
+
   if (Array.isArray(obj)) return obj.map(function (t) { return t.toLowerCase() })
 
   var str = obj.toString().replace(/^\s+/, '')
@@ -188,12 +189,15 @@ lunr.tokenizer = function (obj) {
       break
     }
   }
+
 
-  return str
-    .split(/\s+/)
+  var rs= str
+    .split(/[\ |\~|\`|\!|\@|\#|\$|\%|\^|\&|\*|\uFE30-\uFFA0|\(|\)|\-|\_|\+|\=|\||\\|\[|\]|\{|\}|\;|\:|\"|\'|\,|\<|\.|\>|\/|\?]+/)
     .map(function (token) {
-      return token.toLowerCase()
-    })
+      return token.replace(/[\ |\~|\`|\!|\@|\#|\$|\%|\^|\&|\*|\uFE30-\uFFA0|\(|\)|\-|\_|\+|\=|\||\\|\[|\]|\{|\}|\;|\:|\"|\'|\,|\<|\.|\>|\/|\?]/g, '').toLowerCase()
+    });
+  return rs;
+
 }
 /*!
  * lunr.Pipeline
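To see what the patched tokenizer does with Chinese text, here is a stand-alone sketch that mirrors it outside lunr. `tokenize`, `SEPARATOR`, `SPLIT_RE` and `STRIP_RE` are illustrative names, not lunr API. The character class lists the same characters as the commit's regex with the redundant `|` alternations dropped (inside `[...]` a `|` is a literal, so it is kept once as a separator), and trimming via `String.prototype.trim` is assumed equivalent to the original `replace` plus trailing-whitespace loop.

```javascript
// Characters the commit's tokenizer splits on: a literal space, ASCII punctuation,
// and U+FE30-U+FFA0 (includes fullwidth punctuation such as U+FF0C, but also
// fullwidth letters and digits).
var SEPARATOR = /[ ~`!@#$%^&*\uFE30-\uFFA0()\-_+=|\\\[\]{};:"',<.>\/?]/;
var SPLIT_RE = new RegExp(SEPARATOR.source + "+");
var STRIP_RE = new RegExp(SEPARATOR.source, "g");

// Sketch of the patched lunr.tokenizer (illustrative, not lunr's own export).
function tokenize(obj) {
  if (obj == null) return [];
  // Arrays (e.g. pre-segmented tags) are only lowercased, never split.
  if (Array.isArray(obj)) {
    return obj.map(function (t) { return t.toLowerCase(); });
  }
  var str = obj.toString().trim();
  return str
    .split(SPLIT_RE)
    .map(function (token) {
      // Mirrors the commit; effectively a no-op, since split() already
      // consumed every separator character.
      return token.replace(STRIP_RE, "").toLowerCase();
    });
}

console.log(tokenize("巴西1-1智利(点球3:2)。这是本届世界杯"));
// → ["巴西1", "1智利", "点球3", "2", "。这是本届世界杯"]
console.log(tokenize("巴西\uFF0C智利")); // fullwidth comma splits: ["巴西", "智利"]
console.log(tokenize("巴西\u3001智利")); // ideographic comma does not
console.log(tokenize("(hello) world"));  // → ["", "hello", "world"]
console.log(tokenize("foo\tbar"));       // → ["foo\tbar"]
```

A few consequences of the committed regex, as reproduced here: Chinese runs stay whole unless broken by a listed separator (the ideographic full stop U+3002 and comma U+3001 are not listed), tabs and newlines are no longer separators because only a literal space is in the class, and text starting with punctuation yields an empty first token.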
@@ -1649,11 +1653,22 @@ lunr.Pipeline.registerFunction(lunr.stopWordFilter, 'stopWordFilter')
  * @see lunr.Pipeline
  */
 lunr.trimmer = function (token) {
+  //by ming300 check token is chinese then not replace
+  if(isChineseChar(token)){
+    return token;
+  }
   return token
     .replace(/^\W+/, '')
     .replace(/\W+$/, '')
 }
 
+/**
+ *check it contains Chinese (including Japanese and Korean)
+ */
+function isChineseChar(str){
+  var reg = /[\u4E00-\u9FA5\uF900-\uFA2D]/;
+  return reg.test(str);
+}
 lunr.Pipeline.registerFunction(lunr.trimmer, 'trimmer')
 /*!
  * lunr.stemmer
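The trimmer needs this guard because `\W` matches every character outside `[A-Za-z0-9_]`, so the original `^\W+` and `\W+$` replacements erase a token made only of Chinese characters. The sketch below reproduces the trimmer before and after the commit under the hypothetical names `oldTrimmer` and `newTrimmer`. `isChineseChar` uses the commit's ranges, U+4E00-U+9FA5 and U+F900-U+FA2D (CJK ideographs). Despite the comment in the diff, these ranges do not cover Japanese kana or Korean Hangul.

```javascript
// Same test as the commit's isChineseChar: true if str contains at least one
// CJK ideograph in U+4E00-U+9FA5 or U+F900-U+FA2D.
function isChineseChar(str) {
  return /[\u4E00-\u9FA5\uF900-\uFA2D]/.test(str);
}

// lunr.trimmer before the commit: strip leading and trailing non-word characters.
function oldTrimmer(token) {
  return token.replace(/^\W+/, "").replace(/\W+$/, "");
}

// lunr.trimmer after the commit: tokens containing a CJK ideograph are
// returned untouched; everything else is trimmed as before.
function newTrimmer(token) {
  if (isChineseChar(token)) return token;
  return oldTrimmer(token);
}

console.log(JSON.stringify(oldTrimmer("巴西")));    // ""
console.log(JSON.stringify(newTrimmer("巴西")));    // "巴西"
console.log(JSON.stringify(newTrimmer("!hello?"))); // "hello"
console.log(JSON.stringify(newTrimmer("すし")));    // "" (kana are not in the ranges)
```

One side effect of the guard: a token that contains a Han character keeps any punctuation the tokenizer did not split on, so `newTrimmer("。这是")` returns `"。这是"` unchanged.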
