Application level Search Feedback | 应用级搜索反馈 #6482

Open
arvinxx opened this issue Feb 24, 2025 · 94 comments

@arvinxx
Contributor

arvinxx commented Feb 24, 2025

As of 1.64.0, we support application-level web search through SearXNG. Feedback on your experience and suggestions are welcome.

Set the environment variable: SEARXNG_URL=https://searxng-instance.com

Image

A one-click SearXNG deployment template is available on Zeabur: https://zeabur.com/templates/77FSH6

@arvinxx arvinxx pinned this issue Feb 24, 2025
@lobehub lobehub deleted a comment from lobehubbot Feb 24, 2025
@130aac8

130aac8 commented Feb 24, 2025

  1. The model should not override the default search engine configuration if the SearXNG instance is well-adjusted and optimized. Default results from the instance could be better, as it may aggregate multiple sources like Google, DuckDuckGo, Brave, Mojeek, and others, providing more comprehensive results. Allowing the model to select specific engines (e.g., Brave or DuckDuckGo) could lead to fewer results, especially if the selected engine becomes inaccessible due to anti-bot mechanisms. Overriding the default configuration might result in no available results.

  2. Embed search results to improve relevance and allow configuration of the number of results or relevance thresholds provided to the AI.

  3. Enable configuration of the SearXNG instance directly within the user interface.

  4. (Optional) Allow the model to fetch the full content of one or more links to enhance the depth of information and provide more valuable insights.

@Kac001
Kac001 commented Feb 25, 2025

Solved. The cause was that the provided Zeabur template (https://zeabur.com/templates/77FSH6) does not enable JSON output.
Enter the searxng container and edit the /etc/searxng/settings.yml file.
Find

  formats:
    - html

and change it to

  formats:
    - html
    - json

Finally, restart the container.
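
A quick way to confirm the fix is to request the JSON endpoint directly. The sketch below is not taken from LobeChat; it only builds the `/search?...&format=json` URL (the same shape that appears later in this thread) and maps a response status to a likely cause. The function names are hypothetical, and the 403 interpretation is an assumption about how SearXNG rejects formats that are not listed under `formats`.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def build_json_search_url(base_url: str, query: str) -> str:
    """Return the /search URL that asks a SearXNG instance for JSON output."""
    parts = urlsplit(base_url)
    path = parts.path.rstrip("/") + "/search"
    params = urlencode({"q": query, "format": "json"})
    return urlunsplit((parts.scheme, parts.netloc, path, params, ""))


def diagnose(status: int, content_type: str) -> str:
    """Map an HTTP response to a likely cause (heuristic, not exhaustive)."""
    if status == 200 and "application/json" in content_type:
        return "ok"
    if status == 403:
        # Assumption: a format missing from settings.yml is refused with 403.
        return "json format probably not enabled in settings.yml"
    if status == 429:
        return "rate limited by the instance"
    return "unexpected response"
```

Fetching the URL (for example with `urllib.request.urlopen`) and passing the status and `Content-Type` header to `diagnose` would complete the check. The network call is left out so the sketch stays self-contained.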

@Kac001
Kac001 commented Feb 25, 2025

You can also deploy it yourself with Docker.
1. Deploy the searxng container, changing the port as needed:

docker run -d --name searxng -p 8080:8080 \
  -v <local directory>:/etc/searxng \
  -e SEARXNG_SETTINGS_FILE=/etc/searxng/settings.yml \
  searxng/searxng

2. Edit settings.yml.
Find

formats:
- html

and change it to

formats:
- html
- json

3. Restart the container, then update the LobeChat configuration:
SEARXNG_URL=https://<server ip>:8080

@arvinxx
Contributor Author

arvinxx commented Feb 25, 2025

@Kac001 Would you be interested in improving the docker-compose setup? 😆 It would be great to get a one-click deployment script in there!

@arvinxx
Contributor Author

arvinxx commented Feb 25, 2025

@130aac8 Thanks for your advice! Let me reply point by point.

> The model should not override the default search engine configuration if the SearXNG instance is well-adjusted and optimized. Default results from the instance could be better, as it may aggregate multiple sources like Google, DuckDuckGo, Brave, Mojeek, and others, providing more comprehensive results. Allowing the model to select specific engines (e.g., Brave or DuckDuckGo) could lead to fewer results, especially if the selected engine becomes inaccessible due to anti-bot mechanisms. Overriding the default configuration might result in no available results.

I think you are right. I will adjust the prompts so that a specific engine is only selected when the user names it in the query. What you describe has bothered me in some cases too.

> Embed search results to improve relevance and allow configuration of the number of results or relevance thresholds provided to the AI.

I don't think this is needed, as SearXNG already has a ranking algorithm. We use its confidence scores to pick out the useful results, so embedding is unnecessary.

> Enable configuration of the SearXNG instance directly within the user interface.

Supporting SearXNG configuration in the UI is the next step. We will also support more search providers, such as Tavily and Exa.

> (Optional) Allow the model to fetch the full content of one or more links to enhance the depth of information and provide more valuable insights.

Yes! That is also a next step for improving search. We already have a plugin named web-crawler that provides the same functionality, so you can combine the two features.

@Kac001
Kac001 commented Feb 25, 2025

@arvinxx Do you mean the docker-compose/setup.sh script?

@arvinxx
Contributor Author

arvinxx commented Feb 25, 2025

@Kac001 This file: https://github.com/lobehub/lobe-chat/blob/main/docker-compose/local/docker-compose.yml

@ChenLuoi
The current search implementation appears to pass the title and content of the top 5 search engine results to the AI for analysis, but in many cases the experience is poor.
For example, when I searched for "today's top news", the summary was not the news itself but descriptions of the major news websites.
Fetching the page content and sending it to the AI as well might give better results, at the cost of much longer response times and higher token usage.

@130aac8
130aac8 commented Feb 25, 2025

> I don't think this is needed, as SearXNG already has a ranking algorithm. We use its confidence scores to pick out the useful results, so embedding is unnecessary.

Yes, it indeed has a ranking algorithm, but if you take a closer look at its source code, you'll find that it doesn't suit the scenarios we currently require. Its algorithm is based on a per-search engine, weight-based approach, similar to the principles of traditional search engines. It factors in the credibility of the sources and incorporates the ranking positions of upstream search engines into the weight calculation, considering that it itself is a meta-search engine.

This method works well for traditional usage scenarios where users search for keywords, quickly browse through the results, and manually select the links they want to open. However, in our scenario, where the input for LLMs is limited and billed by token usage, this approach falls short. This limitation might also explain why you choose to extract only the top five results to submit to the model. Such a small number of results means that the quality of our search results must be exceptionally high. Additionally, since the search only returns partial content and the LLM cannot currently access the target links, the limited information from five results significantly restricts the depth and quality of the model's responses.

In my tests, directly extracting the top five results was not ideal. The relevance and depth of the content are limited: search resources and AI costs were consumed, yet the responses were relatively brief and might contain only a few key points. A lot of valuable information remains buried in other search results that are not included.

Besides the per-engine weight configuration, SearXNG also has a hostname-based priority setting. However, these scoring methods are quite restrictive and often fail to deliver the best search results for every query. This is especially problematic when the user's query requires more than a simple "yes" or "no" answer. In such cases, the current implementation in LobeChat frequently leads to missing information and insufficient depth.

We can also take inspiration from how similar products handle this challenge. For instance, products like Perplexity typically use at least 10–15 search results as the information source for their models. Even then, embeddings are often used to include as much relevant information as possible within the limited input.

I suggest making some optimizations here. For example, consider optionally incorporating embeddings, increasing the number of search results to at least 10–15, or allowing users to configure the number of search results submitted to the model.
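
To make the suggestion concrete, here is a minimal sketch of re-ranking results against the query with a configurable result count and relevance threshold. It is not how LobeChat or SearXNG work. The function names and parameters (`select_results`, `top_k`, `min_score`) are hypothetical, and a bag-of-words cosine similarity stands in for a real embedding model so that the example runs without external dependencies. The `title` and `content` fields follow what the thread says is passed to the model.

```python
import math
import re
from collections import Counter


def _vector(text: str) -> Counter:
    """Bag-of-words term counts; a stand-in for a real embedding vector."""
    return Counter(re.findall(r"\w+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_results(query, results, top_k=10, min_score=0.0):
    """Rank results by similarity to the query and keep at most top_k."""
    q = _vector(query)
    scored = [(_cosine(q, _vector(r["title"] + " " + r["content"])), r) for r in results]
    kept = [pair for pair in scored if pair[0] >= min_score]
    # Sort on the score only; the sort is stable, so ties keep upstream order.
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [r for _, r in kept[:top_k]]
```

Swapping `_vector` and `_cosine` for calls to an embedding model would give the embedding-based variant. The trade-off is one extra model call per search in exchange for spending the limited context on the most relevant snippets.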

@SAnBlog
SAnBlog commented Feb 25, 2025

> The current search implementation appears to pass the title and content of the top 5 search engine results to the AI for analysis, but in many cases the experience is poor. For example, when I searched for "today's top news", the summary was not the news itself but descriptions of the major news websites. Fetching the page content and sending it to the AI as well might give better results, at the cost of much longer response times and higher token usage.

Could the number of results be made selectable in the web search settings?

@KoellM
KoellM commented Feb 25, 2025

I am also getting TRPCClientError. My setup is self-deployed with basic auth.
The URL is
https://username:[email protected]/search?format=json&q=claude%20latest%20model
It can be opened directly in a browser and returns the search response.

A request to trpc/tools/search.query returns

[
  {
    "error": {
      "json": {
        "message": "Request cannot be constructed from a URL that includes credentials: https://username:[email protected]/search?format=json&q=claude%20latest%20model",
        "code": -32603,
        "data": {
          "code": "SERVICE_UNAVAILABLE",
          "httpStatus": 503,
          "path": "search.query"
        }
      }
    }
  }
]
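
The error message indicates that the request was rejected because the credentials were embedded in the URL. A common general-purpose workaround, sketched below, is to move the `user:password` part into an `Authorization: Basic` header and send the request to the cleaned URL. This is only an illustration under that assumption and not LobeChat's code; the helper name `split_credentials` and the host `search.example.com` are hypothetical.

```python
import base64
from urllib.parse import unquote, urlsplit, urlunsplit


def split_credentials(url: str):
    """Return (url_without_credentials, headers) for a user:pass@host URL."""
    parts = urlsplit(url)
    headers = {}
    if parts.username is not None:
        user = unquote(parts.username)
        password = unquote(parts.password or "")
        token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
        headers["Authorization"] = f"Basic {token}"
    # Rebuild the network location from host and port only.
    # Note: IPv6 literals would need their brackets restored; not handled here.
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    clean = urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))
    return clean, headers


clean_url, auth_headers = split_credentials(
    "https://user:secret@search.example.com/search?format=json&q=test"
)
```

The returned headers can then be passed to whatever HTTP client makes the search request, so that the URL itself carries no credentials.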

@arvinxx
Contributor Author

arvinxx commented Feb 25, 2025

@KoellM We are not planning to support a complex case like yours for now…

@tom-kate

tom-kate commented Feb 25, 2025

@xccado @KoellM If you get TRPCClientError on a self-deployed instance, add - json to the mapped settings.yml file. The exact format is:

search:
  formats:
    - html
    - json

That is, formats under the search key.

@xccado
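xccado commented Feb 25, 2025

> @xccado @KoellM If you get TRPCClientError on a self-deployed instance, add - json to the mapped settings.yml file. The exact format is:
>
> search:
>   formats:
>     - html
>     - json
>
> That is, formats under the search key.

OK, that solved it. This method is correct.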

@SidneyLYZhang
Image
I also self-deployed SearXNG. After setting it up, queries show: Failed to search: TOO MANY REQUESTS …

@tom-kate
> Image I also self-deployed SearXNG. After setting it up, queries show: Failed to search: TOO MANY REQUESTS …

Set limiter: false in settings.yml.

@SidneyLYZhang
> > Image I also self-deployed SearXNG. After setting it up, queries show: Failed to search: TOO MANY REQUESTS …
>
> Set limiter: false in settings.yml.

Solved, thanks.

@breakstring
Um, two questions:

  1. Is the request to SearXNG made by the client or by the server?
  2. If SearXNG needs an API key or another authentication method, how should that be configured?

@arvinxx

@breakstring
> Um, two questions:
>
>   1. Is the request to SearXNG made by the client or by the server?
>   2. If SearXNG needs an API key or another authentication method, how should that be configured?
>
> @arvinxx

  1. I saw the requests in the logs and figured it out: they are sent by LobeChat itself.
  2. As for question 2, I was worried about having to expose SearXNG publicly. Since it does not need to be exposed, running without authentication has no impact on me for now.

@lobehubbot
Member

When using searchWithSearXNG, the official DeepSeek API (deepseek-chat) keeps searching in an endless loop and is not yet usable at all. This needs further optimization…

@lamcodes
lamcodes commented Feb 26, 2025

Image

With sonnet3.5 the response comes back empty and the chat stays stuck here. On the backend I can see that the call has finished and a log entry exists.

With claude-3-7-sonnet I get this error:
"error": {
  "error": {
    "error": {
      "type": "invalid_request_error",
      "message": "messages.1.content.0.type: Expected thinking or redacted_thinking, but found text. When thinking is enabled, a final assistant message must start with a thinking block (preceeding the lastmost set of tool_use and tool_result blocks). We recommend you include thinking blocks from previous turns. To avoid this requirement, disable thinking. Please consult our documentation at https://****/en/docs/build-with-claude/extended-thinking (request id: 20250226083719768385422koglCkYC)"
    }
  },
  "status": 400,
  "headers": {
    "date": "Wed, 26 Feb 2025 00:37:24 GMT",
    "server": "nginx",
    "connection": "keep-alive",
    "content-type": "application/json; charset=utf-8",
    "content-length": "551",
    "x-rixapi-request-id": "20250226083719768385422koglCkYC"
  }
}

@arvinxx
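Contributor Author

arvinxx commented Feb 26, 2025

> When using searchWithSearXNG, the official DeepSeek API (deepseek-chat) keeps searching in an endless loop and is not yet usable at all. This needs further optimization…

That is because DeepSeek V3's function calling ability is weak. Switching to another model will fix it.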

@AmossXu
AmossXu commented Feb 26, 2025

Can models that do not support function calling, such as DeepSeek R1, use web search at the moment?

@cachexy123
When will application-level search be supported? 😢

@breakstring
> When will application-level search be supported? 😢

Isn't it supported already? This thread is collecting usage feedback…

@cachexy123
> > When will application-level search be supported? 😢
>
> Isn't it supported already? This thread is collecting usage feedback…

Right now search still only works with models that support function calling.

@SAnBlog
SAnBlog commented Feb 26, 2025

When deployed in mainland China, the icons in the search results fail to load. Other than using a proxy, can this address be customized?
