[Bug]: Search stops when the keyword is in a title (h1, h2, h3...), even if the keyword also exists in the content #1372

Closed
liuwenzhuang opened this issue Aug 30, 2024 · 7 comments · Fixed by #1380
Labels
🐞 bug Something isn't working

Comments

@liuwenzhuang
Contributor

Version information

System:
    OS: Linux 5.15 Ubuntu 20.04.3 LTS (Focal Fossa)
    CPU: (8) x64 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
    Memory: 10.97 GB / 15.62 GB
    Container: Yes
    Shell: 5.0.17 - /bin/bash
  npmPackages:
    rspress: ^1.28.0 => 1.28.2

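Details

The Markdown document looks roughly like this (sample content kept in the original Chinese, since the search keywords come from it):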

# 标题

## 批量操作手续费

点击【批量操作手续费】,可以进行【新增】、【修改】、【删除】三项操作,点【新增】操作可以批量增加产品。

手续费设置界面,选择需设置的产品,配置对应比例,点击【保存】。【固定费率】如无特殊要求,建议优先选【否】,配置区间手续费。点击【确认新增】,完成批量新增操作。

保存是标题中没有的文案,多次出现的保存可以搜索到。

保存在这里。

## 二级标题

或者保存在这里。

Searching for 手续费 ("handling fee") returns only a single result: 标题 > 批量操作手续费
Searching for “保存” ("save"), which does not appear in any heading, returns multiple results:

Reproduction link

https://codesandbox.io/p/devbox/naughty-darwin-pqq75j?workspaceId=edafb2d3-8406-4734-ac75-3017fd4fe3d8

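Steps to reproduce

  1. pnpm dev
  2. Open http://localhost:3000 in a browser
  3. Click the search box and type “手续费”

Expected result: the search should find both the heading 批量操作手续费 and the occurrences of 手续费 in the content.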

@liuwenzhuang liuwenzhuang added the 🐞 bug Something isn't working label Aug 30, 2024
@Timeless0911
Collaborator

Timeless0911 commented Aug 30, 2024

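This may be related to flexsearch: its tokenization of certain combinations of Chinese words and symbols is flawed in some cases. You could try using `` code syntax instead of 【】, and you can also inspect the index JSON file generated by the build. Search problems involving Chinese are generally hard to fix.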

@AndersChuang

But if you delete the heading “批量操作手续费”, or change it to a heading that does not contain “手续费”, searching for “手续费” again returns multiple results.

@liuwenzhuang
Contributor Author

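But if you delete the heading “批量操作手续费”, or change it to a heading that does not contain “手续费”, searching for “手续费” again returns multiple results.

That's right.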

@Timeless0911

@Timeless0911
Collaborator

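I looked at the code, and this is by design: once a heading has matched within a section, there is no need to also show the body content, because that would bloat the results.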

// If we have matched header, we don't need to match content
// Because the header is already in the content
if (matchedHeader) {
  return;
}
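
To make the reported symptom concrete, here is a minimal sketch of the behavior described above. It is not the rspress source: the `Section` and `Hit` types, the `searchPageWithEarlyReturn` function, and the English sample data are all hypothetical, and plain substring matching stands in for the real flexsearch index.

```typescript
// Hypothetical, simplified page model; NOT rspress's real data structures.
interface Section {
  header: string;
  link: string;
  content: string;
}

interface Hit {
  kind: "header" | "content";
  link: string;
  text: string;
}

// Models the design described above: if any header on the page matches,
// the function returns before any content is examined.
function searchPageWithEarlyReturn(sections: Section[], query: string): Hit[] {
  const headerHits = sections
    .filter((s) => s.header.includes(query))
    .map((s): Hit => ({ kind: "header", link: s.link, text: s.header }));

  if (headerHits.length > 0) {
    // Early return: content matches anywhere on the page are dropped.
    return headerHits;
  }

  return sections
    .filter((s) => s.content.includes(query))
    .map((s): Hit => ({ kind: "content", link: s.link, text: s.content }));
}

// Hypothetical sample page with two sibling sections.
const demoSections: Section[] = [
  {
    header: "Batch fee operations",
    link: "#batch-fee-operations",
    content: "Click Save to store the fee.",
  },
  {
    header: "Second heading",
    link: "#second-heading",
    content: "The fee is also mentioned here. Save again.",
  },
];

console.log(searchPageWithEarlyReturn(demoSections, "fee").length);  // 1
console.log(searchPageWithEarlyReturn(demoSections, "Save").length); // 2
```

In this model, "fee" appears in a heading, so only that heading is returned and the mention under "Second heading" is lost, while "Save" appears in no heading and is found in both sections.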

@liuwenzhuang
Contributor Author

liuwenzhuang commented Aug 31, 2024

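I looked at the code, and this is by design: once a heading has matched within a section, there is no need to also show the body content, because that would bloat the results.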

// If we have matched header, we don't need to match content
// Because the header is already in the content
if (matchedHeader) {
  return;
}

But content under other sibling sections can no longer be found either.
@Timeless0911

@liuwenzhuang
Contributor Author

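Personally, I think there are two problems here:
First, the traversal of the toc in #matchHeader does not run to completion: once an earlier header matches, the later headers are never checked.
Second, if the goal is to keep the header and the content of the same section from both matching, the filtering should happen in #matchContent, or at the very end based on link, rather than skipping content matching altogether as soon as one header has matched.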
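
The second suggestion can be sketched as follows. This is only an illustration under stated assumptions, not the change that was eventually merged: `TocSection`, `SearchHit`, `searchPageFiltered`, and the sample data are hypothetical names, and substring matching stands in for the real index lookup. The idea is to check every header, check every section's content, and then drop only those content hits whose link already appears among the header hits.

```typescript
// Hypothetical, simplified page model; NOT rspress's real data structures.
interface TocSection {
  header: string;
  link: string;
  content: string;
}

interface SearchHit {
  kind: "header" | "content";
  link: string;
  text: string;
}

function searchPageFiltered(sections: TocSection[], query: string): SearchHit[] {
  // 1. Traverse the whole toc; do not stop at the first matching header.
  const headerHits = sections
    .filter((s) => s.header.includes(query))
    .map((s): SearchHit => ({ kind: "header", link: s.link, text: s.header }));

  // 2. Remember which sections are already represented by a header hit.
  const matchedLinks = new Set(headerHits.map((h) => h.link));

  // 3. Match content everywhere, filtering by link so that a section whose
  //    header already matched does not appear a second time.
  const contentHits = sections
    .filter((s) => s.content.includes(query) && !matchedLinks.has(s.link))
    .map((s): SearchHit => ({ kind: "content", link: s.link, text: s.content }));

  return [...headerHits, ...contentHits];
}

// Hypothetical sample page with two sibling sections.
const pageSections: TocSection[] = [
  {
    header: "Batch fee operations",
    link: "#batch-fee-operations",
    content: "Click Save to store the fee.",
  },
  {
    header: "Second heading",
    link: "#second-heading",
    content: "The fee is also mentioned here. Save again.",
  },
];

for (const hit of searchPageFiltered(pageSections, "fee")) {
  console.log(`${hit.kind}: ${hit.link}`);
}
```

With this sample data, a search for "fee" yields the header hit for `#batch-fee-operations` plus a content hit for `#second-heading`, so the sibling section is no longer lost. Whether content inside the same section as a matching header should also be shown is a separate design choice that this sketch deliberately leaves as filtered out.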

@Timeless0911
Collaborator

Thanks for your detailed analysis. I carefully reviewed the related code just now, and I think this should be fixed/enhanced. Are you interested in contributing a PR?

liuwenzhuang added a commit to liuwenzhuang/rspress that referenced this issue Sep 2, 2024
liuwenzhuang added a commit to liuwenzhuang/rspress that referenced this issue Sep 2, 2024
3 participants