Titles syncing, content is not, and URLs are incorrect. #2

Open
luca-git opened this issue Jan 1, 2025 · 25 comments

Comments

@luca-git

luca-git commented Jan 1, 2025

Hi, thanks for building this, it’s badly needed. I’ve been able to build on Linux and Windows, but it syncs only the titles, with no content and wrong URLs. Is it working properly for you? Any settings hints you wouldn’t mind sharing?

@GuentherE

GuentherE commented Jan 1, 2025

Same here. Titles are fetched, but links are invalid, e.g. site:: [www.reddit.com]()

should be either:
site:: [Updated to latest release, and this is all I get now on local machine or remote](https://www.reddit.com/r/immich/comments/1hfk9ai/updated_to_latest_release_and_this_is_all_i_get/)
or
url:: https://www.reddit.com/r/immich/comments/1hfk9ai/updated_to_latest_release_and_this_is_all_i_get/

The date saved is also empty: date-saved:: [[]]

@GuentherE

There is also another tool for logseq: https://codeberg.org/strubbl/wallabag-logseq

It is just a command line tool and not a plugin, but it imports all archived articles into a single logseq page.

Maybe someone can reuse some code to fix our import problem ...?

@hnykda
Owner

hnykda commented Jan 2, 2025

Thanks for reporting! I will take a look at it.

There is also another tool for logseq

I did find that tool before trying to implement this one, but I couldn't see how it would help. It's written in Go and is a CLI, not a javascript-like logseq plugin.

@hnykda
Owner

hnykda commented Jan 2, 2025

@luca-git did you have the syncContent option set to true? Without that, the content is not synced.

image

@hnykda
Owner

hnykda commented Jan 2, 2025

I pushed a new version. It removes some customizable templating and separate-pages-sync for now and hardcodes the template in a way that it can always be rendered. Can you try it for me, please?

@luca-git
Author

luca-git commented Jan 3, 2025

Yep, it was properly checked. I'll try this over the weekend :)

@luca-git
Author

luca-git commented Jan 5, 2025

Testing it: it has been syncing since 1pm today, 12.5 hours so far! It used to take 3 hours to sync only titles, so this was kind of expected, but it's too much IMO. I have only 900 articles, so there is definitely an issue. If it's finished tomorrow morning I'll follow up; otherwise I'll have to find a way to set up Omnivore somewhere :(

@hnykda
Owner

hnykda commented Jan 5, 2025

You do you, but from my limited eyeball testing it seemed that the bottleneck was logseq, not wallabag/omnivore (your wallabag server could of course be very slow if you run it on a potato, but I would be somewhat surprised). It could perhaps be sped up by increasing the batch size (currently at 30) to e.g. 100, but I don't think you would notice. I am adding a proper release so you don't have to use this unpacked version - maybe it will be faster when built for production?

I believe this is the reason for the push towards logseq DB version (that has been in development for more than a year), so it's faster for bigger databases/ingestions.
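For readers following along, here is a minimal sketch of what the batch size mentioned above controls. It is not the plugin's actual code: `chunk`, `syncInBatches`, and `writeBatch` are hypothetical names, and `writeBatch` stands in for whatever call inserts blocks into Logseq.

```typescript
// Hypothetical helper: split a list into fixed-size batches.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new Error("batch size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Hypothetical driver: writeBatch is an assumption standing in for the
// real Logseq insert call. Returns how many write calls were made.
async function syncInBatches<T>(
  items: T[],
  batchSize: number,
  writeBatch: (batch: T[]) => Promise<void>,
): Promise<number> {
  let calls = 0;
  for (const batch of chunk(items, batchSize)) {
    await writeBatch(batch);
    calls += 1;
  }
  return calls;
}

// For the 900 articles mentioned in this thread:
const articles = Array.from({ length: 900 }, (_, i) => i);
const callsAt30 = chunk(articles, 30).length;   // 30 write calls
const callsAt100 = chunk(articles, 100).length; // 9 write calls
console.log(callsAt30, callsAt100);
```

A larger batch only reduces the number of write calls. If Logseq spends most of its time indexing the content itself, the total time may barely change, which matches the expectation stated above.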

find a way to setup Omnivore somewhere

Btw, that was my first go-to strategy. But after trying it, I decided to go with wallabag instead.

@luca-git
Author

luca-git commented Jan 5, 2025

Thanks for your work. I had to stop the process after 24 hours. I'm by no means a Logseq expert, so I'm sure you are right. I believe syncing from Omnivore was way faster, though I can't say why. I'll try the release when available. (BTW, after force-stopping it, the created page, presumably partial, had no content, same as the previous one.) I'll do a test on Ubuntu next. Who knows.

@luca-git
Author

luca-git commented Jan 5, 2025

Where is this new version getting its configs? I see my passwords and secrets are still there in the built version you just released. Maybe it's remembering some wrong settings as well?

{
  "generalSettings": "",
  "wallabagUrl": "redacted",
  "clientId": "redacted",
  "clientSecret": "redacted",
  "userLogin": "redacted",
  "userPassword": "redacted",
  "syncContent": true,
  "frequency": 60,
  "syncAt": "2025-01-01T21:27:22",
  "graph": "personali",
  "pageName": "Wallabag",
  "disabled": true,
  "filter": "import all my articles",
  "customQuery": "",
  "highlightOrder": "the time that highlights are updated",
  "isSinglePage": true,
  "createTemplate": "",
  "createTemplateDesc": "",
  "articleTemplate": "- [{{{title}}}]({{{url}}})\n      site:: [{{{siteName}}}]({{{url}}})\n      author:: {{{author}}}\n      date-saved:: [[{{{date}}}]]\n      id-wallabag:: {{{id}}}",
  "highlightTemplate": "> {{{text}}} [⤴️]({{{highlightUrl}}}) {{#labels}} #[[{{{name}}}]] {{/labels}}\n\n{{#note.length}}note:: {{{note}}}{{/note.length}}",
  "advancedSettings": "",
  "headingBlockTitle": "## 🔖 Articles",
  "loading": true,
  "syncJobId": 0,
  "version": "1.0.1",
  "apiVersion": "\"2.6.10\"",
  "apiToken": "redacted",
  "refreshToken": "redacted",
  "expireDate": 1736085091542,
  "isTokenExpired": false
}
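The `articleTemplate` above suggests one possible explanation for the empty links and empty `date-saved` reported earlier in the thread. In Mustache, a variable with no value renders as an empty string, so if the data passed to the template lacked `url` or `date`, the output would be `[...]()` and `[[]]`. The sketch below is a simplified stand-in for the Mustache rendering, not the plugin's code. `renderTemplate` and `missingFields` are hypothetical helpers, and the assumption that the fields were missing is unconfirmed.

```typescript
type TemplateData = Record<string, string | undefined>;

// Simplified stand-in for Mustache triple-brace variables: a missing value
// renders as an empty string. Sections such as {{#labels}} are not handled.
function renderTemplate(template: string, data: TemplateData): string {
  return template.replace(
    /\{\{\{(\w+)\}\}\}/g,
    (_match: string, key: string) => data[key] ?? "",
  );
}

// Hypothetical debugging helper: list template variables that have no value.
function missingFields(template: string, data: TemplateData): string[] {
  const pattern = /\{\{\{(\w+)\}\}\}/g;
  const missing = new Set<string>();
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(template)) !== null) {
    const key = match[1] ?? "";
    if (!data[key]) missing.add(key);
  }
  return Array.from(missing);
}

// Template copied from the settings.json posted above.
const articleTemplate =
  "- [{{{title}}}]({{{url}}})\n" +
  "      site:: [{{{siteName}}}]({{{url}}})\n" +
  "      author:: {{{author}}}\n" +
  "      date-saved:: [[{{{date}}}]]\n" +
  "      id-wallabag:: {{{id}}}";

// Assumed input: only the title and site name were filled in.
const partialArticle: TemplateData = {
  title: "Updated to latest release",
  siteName: "www.reddit.com",
};

const rendered = renderTemplate(articleTemplate, partialArticle);
const missing = missingFields(articleTemplate, partialArticle);
console.log(rendered); // contains "site:: [www.reddit.com]()" and "date-saved:: [[]]"
console.log(missing);  // url, author, date, id
```

Logging the missing fields before rendering would show whether the problem lies in the data fetched from wallabag or in the template itself.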

@hnykda
Owner

hnykda commented Jan 5, 2025

I have removed as much as I could (especially things that were not used/implemented). But it should use the same config as any other plugin. You still have some old fields there, but most of them are now ignored. You might remove this and start over, that way it will source only fields that are actually used.
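As an illustration of sourcing "only fields that are actually used", a stale settings object can be reduced to a known set of keys. This is a sketch under assumptions: `pickKnownSettings` is a hypothetical helper, and the `knownKeys` list is a guess based on the field names in the config posted above, not the plugin's real schema.

```typescript
// Hypothetical helper: keep only the keys the current version understands.
function pickKnownSettings(
  stored: Record<string, unknown>,
  knownKeys: readonly string[],
): Record<string, unknown> {
  const result: Record<string, unknown> = {};
  for (const key of knownKeys) {
    if (Object.prototype.hasOwnProperty.call(stored, key)) {
      result[key] = stored[key];
    }
  }
  return result;
}

// Assumed list of current fields; the real plugin may use different ones.
const knownKeys = ["wallabagUrl", "clientId", "clientSecret", "syncContent", "pageName"];

// A trimmed-down version of the old settings.json from this thread.
const oldSettings: Record<string, unknown> = {
  wallabagUrl: "redacted",
  syncContent: true,
  pageName: "Wallabag",
  isSinglePage: true,                    // old field, dropped below
  filter: "import all my articles",      // old field, dropped below
};

const cleaned = pickKnownSettings(oldSettings, knownKeys);
console.log(Object.keys(cleaned)); // wallabagUrl, syncContent, pageName
```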

@luca-git
Author

luca-git commented Jan 5, 2025

I ran it this afternoon on Linux: same issue as before. I'm not sure how to get rid of the old config, as I uninstalled and reinstalled the plugin using the compiled one, but no luck. Is it working on your end?

@hnykda
Owner

hnykda commented Jan 5, 2025

By the same issue, do you mean that it's taking long?

To wipe the settings, you can go to Settings -> Plugins -> Wallabag sync -> Edit settings.json -> Remove everything and save. Then add any option to the config via UI and it will regenerate a new config.

@luca-git
Author

luca-git commented Jan 5, 2025

I mean it's not syncing content after a very long process, only titles. I figured out where the configs were and started afresh a couple of hours ago. Will update as soon as I know.

@luca-git
Author

luca-git commented Jan 6, 2025

After resetting the configs and a lengthy process, the wallabag page is now just empty. If it works for you, it's possibly me, on both Windows and Linux for some reason. Giving up.

@hnykda
Owner

hnykda commented Jan 6, 2025

Darn. I am sorry, and I appreciate you trying and helping out, really. Will work on this further.

@luca-git
Author

luca-git commented Jan 6, 2025

Thanks for your efforts on this, it's important. Omnivore was such an amazing tool, my workflow would improve a lot if you succeed.

@GuentherE

GuentherE commented Jan 7, 2025

Congratulations Daniel!

I uninstalled the old version and deleted everything before installing the new one.

I've tried the latest develop version and it's working as expected: content, URLs, and highlights were synced. It took about 15 minutes for about 700 articles.

Bildschirmfoto_2025-01-07_19-46-52

Author and published-at were not filled in wallabag either, but they were filled in for other fetched articles, so it's also working.

Bildschirmfoto_2025-01-07_20-48-21

Only tags are not synced.

Now it would be great to get the multiple page feature back ... ;-)

@GuentherE

GuentherE commented Jan 7, 2025

The import process caused a lot of CPU load. Logseq is probably processing the large amount of content into its full-text engine. The next sync caused the high CPU load again and ran even longer. It seems that the plugin starts all over and fetches all articles again. Shouldn't it just fetch newer articles?

Without fetching content, the whole first import takes only about 20 seconds. The next sync took a lot more time (about 3 minutes).

@hnykda
Owner

hnykda commented Jan 7, 2025

Oh, thanks @GuentherE for testing, that is encouraging.

Author and published-at were not filled in wallabag either, but they were filled in for other fetched articles, so it's also working.

Yeah, we do capture an author if it's available in the wallabag data, so if it's not there, it's rather an issue with the wallabag parser, and you might propose a feature there to get higher coverage.

Only tags are not synced.

Yeah, we do get them from wallabag, but they are not part of the template. I think this should be an easy fix.

Now it would be great to get the multiple page feature back ...

Yeah, as mentioned in the other issue, this is bigger. I am pretty sure it would take me like half a day, but that is very expensive for me.

The import process caused a lot of CPU load. Logseq is probably processing the large amount of content into its full-text engine. The next sync caused the high CPU load again and ran even longer. It seems that the plugin starts all over and fetches all articles again. Shouldn't it just fetch newer articles? Without fetching content, the whole first import takes only about 20 seconds. The next sync took a lot more time (about 3 minutes).

This is useful profiling, thanks for that! It is indeed true that we fetch everything again, and then do some kind of a comparison to see if stuff got changed in the source (wallabag) 🤔 . We should add lastSyncedAt and use that in the filter when syncing again I believe.
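A minimal sketch of the `lastSyncedAt` idea, assuming each entry carries an ISO 8601 `updatedAt` timestamp. The `Entry` shape and both helper names are hypothetical, not the plugin's code. wallabag's API documentation also appears to list a `since` parameter (a Unix timestamp) for the entries endpoint, which would allow filtering on the server side. That is an assumption worth checking against the wallabag version in use.

```typescript
// Hypothetical entry shape; the real wallabag payload has more fields.
interface Entry {
  id: number;
  title: string;
  updatedAt: string; // ISO 8601, e.g. "2025-01-07T19:46:52Z"
}

// Keep only entries changed after the last successful sync.
// A null lastSyncedAt means "first sync": everything is returned.
function entriesChangedSince(entries: Entry[], lastSyncedAt: string | null): Entry[] {
  if (lastSyncedAt === null) return entries;
  const cutoff = Date.parse(lastSyncedAt);
  return entries.filter((entry) => Date.parse(entry.updatedAt) > cutoff);
}

// Convert the stored timestamp into Unix seconds, the form a server-side
// `since` filter would presumably expect (assumption, see above).
function toSinceParam(lastSyncedAt: string): number {
  return Math.floor(Date.parse(lastSyncedAt) / 1000);
}

const entries: Entry[] = [
  { id: 1, title: "old article", updatedAt: "2025-01-01T10:00:00Z" },
  { id: 2, title: "edited article", updatedAt: "2025-01-06T08:30:00Z" },
  { id: 3, title: "new article", updatedAt: "2025-01-07T19:46:52Z" },
];

const lastSyncedAt = "2025-01-05T12:00:00Z";
const changed = entriesChangedSince(entries, lastSyncedAt);
console.log(changed.map((entry) => entry.id)); // ids 2 and 3
console.log(toSinceParam("2025-01-01T00:00:00Z")); // 1735689600
```

Storing `lastSyncedAt` only after a sync completes successfully would avoid skipping entries if a run is interrupted, as happened earlier in this thread.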

@luca-git
Author

luca-git commented Jan 9, 2025

Still trying to wrap my head around the issue. Could you kindly share how the writing dir is built? I've noticed that there might be an issue there in my case. Should I insert the whole path to my graph into the plugin's path setting? (The one that has 4 dirs in it: assets, journals, logseq, pages?) Thanks.

@hnykda
Owner

hnykda commented Jan 9, 2025

Sorry, writing dir? I am confused here. You can now install the current version of the plugin the official way, through the built-in plugin marketplace (remove the dev one before doing that).

@luca-git
Author

luca-git commented Jan 9, 2025

There's a parameter regarding the graph, I guess it uses that to choose where to write.

@hnykda
Owner

hnykda commented Jan 9, 2025

Yes - what about it? That's not related to this plugin 🤔 . You can create as many graphs as you want, wherever you want. E.g. you can create a folder "logseqgraphs" on your desktop, create a new graph called "testingwallabag" in it, then open that graph and set that parameter to its name.

@luca-git
Author

luca-git commented Jan 9, 2025

I was hoping a wrong dir there was causing the issue, but I tested it and I still get the good ol' wallabag page, completely empty. When I have time I'll ask roocline.


3 participants