Background tasks (task module) #100
Comments
I'm in!
After working with Gregor today to gain a better understanding of the vision for tasks (thanks @gr2m!), here's what I propose as the low-hanging-fruit todos to get started:
More to come. Please comment on additions/changes to this list and I'll update accordingly. Thanks!
👍 sounds good!
@gr2m you wrote:
Can you explain the buggy behavior? I'm running it now and it would be helpful to know what you were experiencing before I dive deeper. So far the PouchDB instance starts, client bundling occurs, and on the surface it seems to be running ok aside from the 404 errors noted as expected. Granted I haven't tried to do anything useful with it yet. :)
When I run … the …
But note that the setup that spawns … I'd spend a moment to try to fix the setup with …
An update: After a thorough review of the hoodie-standalone-tasks repo, and an attempt to repair it, Gregor and I determined that it's just going to take too much work to bring it up to date and functional. Instead, I've started the process of creating a new standalone "hoodie-task" working example using hoodie-store as the basis. So far, I've managed to get the REST portion of the tasks-server working with the endpoints as originally conceived - minus any queue id filtering, which will be necessary in the end product. Aside from filtering, next up will be the task server-side API, which will surely include support for task events much like the store API does today. And since this is all based on the store-server module, I believe much of that will already be baked in through the Hoodie Store API. I also plan to leverage the other Store APIs that should allow for server-side manipulation of the lifecycle of tasks, including creation, updates, closing, etc. All of this will likely make more sense when the code is pushed, but I wanted to get the basic thought process out there for comments sooner rather than later. More to come!
@gr2m (and anyone else that's interested) - please have a look at this approach and let me know if I'm on the right track. Thanks much!
@gr2m - as it relates to the start() method here:
I'm wondering if we should go with something like this instead:
For me personally, I would think of queue.start() as a method for starting the entire queue, rather than initiating a single task item. Your thoughts?
With that ^ in mind, we'd likely want something like the following as well:
Todos updated
I think neither is perfect: both add and start look like they just add the task, while the method only resolves after the full life cycle was executed. But I think the functionality we will need is clear, and I would suggest focusing on that first, to get the task module itself working. Once we have everything in place, we should have an extended discussion about the naming of the methods.
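To make the naming question concrete, here is a purely illustrative, in-memory sketch (this is not the real Hoodie API; the TaskQueue class and its handler are stand-ins): add() enqueues one task and only resolves after that task's full life cycle has run, while start()/stop() act as the queue-level switch proposed above.

```js
// Toy in-memory model, not the real Hoodie API – only meant to illustrate the
// semantics under discussion.
class TaskQueue {
  constructor (handler) {
    this.handler = handler   // function that performs a single task
    this.items = []          // pending tasks plus their promise callbacks
    this.running = false
  }

  // enqueue one task; the returned promise resolves once the task's
  // full life cycle has been executed
  add (task) {
    return new Promise((resolve, reject) => {
      this.items.push({task, resolve, reject})
      this._drain()
    })
  }

  // queue-level switch, in the sense of queue.start() above
  start () {
    this.running = true
    this._drain()
  }

  stop () {
    this.running = false
  }

  _drain () {
    if (!this.running) return
    while (this.items.length) {
      const {task, resolve, reject} = this.items.shift()
      Promise.resolve(this.handler(task)).then(resolve, reject)
    }
  }
}

// usage
const queue = new TaskQueue(task => console.log('processing', task.type))
queue.add({type: 'sendEmail'}).then(() => console.log('task finished'))
queue.start()
```

Whatever the final names end up being, the distinction worth pinning down is between "resolves when the task is done" and "switches queue processing on/off".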
While working on the task modules, there have been some design concepts - some old and some new - discussed between Gregor and me that we feel should be documented and up for discussion while development is underway. Here's the (work-in-progress) list of those items that collectively describe the new tasks approach. I'll do my best to update it as discussions occur in the various channels and as development progresses. Please be sure to reach out either here or in the Slack #Dev channel if any of you have questions, ideas, or comments. Thanks much!
Additionally:
(more to come - including links to relevant issues/materials - this is a quick list off the top of my head)
Meaning using a CouchDB replication filter to sync only the task docs that belong to a specific user to that user's client? There are two problems with this:
Why different APIs on client and server? Could we get a full list of APIs on both sides, including methods and the events they issue? Maybe something like this: … Sorry if that exists elsewhere; pointers appreciated ;)
@janl - thanks for the questions!
I'm admittedly still coming up to speed on how we might accomplish the filtering, so I'll defer to @gr2m to explain the conceptual approach, but I believe the intent is to handle that on the db side of the proxied routes, to ensure there's no leakage. As you can see, that hasn't been determined yet, so your feedback is important here.
The (limited) documentation on …
Valid points! We'll definitely have to give some thought to what you're suggesting.
I had the discussion with @glynnbird who is working on envoy, a solution that emulates a database-per-user while still storing all data in a single database. What we were thinking is to let only the initial bootstrapping requests through. Once they get to "since=now", we would handle them in the Node server. That way we would have a single continuous /_changes request to the database, and then use events in the Node server to respond to the continuous changes requests from the clients. This must be well tested and secure of course, but it certainly would scale better than having thousands of separate databases with open changes requests, I think? And the other problem with separate databases is that we won't be able to query data across them easily, which is more important for tasks than for user data.
Users won't have access to the database at all. They can't do …
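A minimal sketch of that fan-out idea, assuming a CouchDB tasks database reachable as hoodie-tasks and a hypothetical createdBy field on each task document (neither name is settled in this thread):

```js
// One continuous _changes feed for the whole server, fanned out as events.
// `hoodie-tasks` and `createdBy` are assumptions, not settled names.
const PouchDB = require('pouchdb')
const EventEmitter = require('events')

const tasksDb = new PouchDB('http://localhost:5984/hoodie-tasks')
const taskEvents = new EventEmitter()

// a single continuous _changes request against the database …
tasksDb.changes({since: 'now', live: true, include_docs: true})
  .on('change', function (change) {
    // … fanned out as events, keyed by the user the task belongs to
    taskEvents.emit('change:' + change.doc.createdBy, change)
  })
  .on('error', console.error)

// A client's continuous changes request (after its initial bootstrap through
// "since=now") would then be answered from these events instead of opening
// another _changes request against CouchDB:
taskEvents.on('change:user123', function (change) {
  console.log('push to user123 over its open changes response:', change.id)
})
```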
How is this more important for tasks? What use-cases or implementation details are you considering? What would be the problem with a task db per user? One concern is listening to database changes, and that's easier with one db vs. one per user. With CouchDB 2.0, we can already build an efficient "listen to all databases' _changes feeds" fairly easily; for CouchDB 1.x it'd be semi-efficient, but we have had this code in Hoodie before. I'd say the "query across dbs" is a very different concern from what we are trying to do here, so I'd like to suggest keeping that a separate issue as long as it isn't required for implementing a feature. E.g. later, we might consider this so important that we put all users' dbs behind a single envoy database. FWIW, I haven't had time to look into envoy thoroughly, but from a glance (and from the README), it is in very early stages and I wouldn't bet a major Hoodie feature on it just yet. If it proves useful, this could relatively easily be added to CouchDB proper, and then we can just use it from there.
Then I don't understand how you want to do filtered replication. It means accessing the db. Or do you mean tasks should work online only?
@inator Thanks for the pointers. I wanted to make clear that I'm okay with having different APIs on client and server, because the environments are different. But I think the same concepts should have the same name on both sides. E.g. on the client we do …
@janl don't bother looking at all of envoy, I don't think that more than 10 lines will be relevant for our case. In …
I would also separate the discussion on tasks and user databases. Tasks will be simpler: we only expose replication APIs and don't even need to add/remove …
In the admin dashboard, I want to see all tasks that failed across all user accounts with their error messages. And I want to query all tasks that succeeded. I don’t know how to make this possible with a database per user.
It must work with PouchDB adapters for persistence, too, so we will have to implement it within Hoodie either way
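For illustration, the kind of admin query a single shared tasks database makes possible could look like the following PouchDB map/reduce sketch; the state and error fields are assumptions rather than an agreed task document schema:

```js
// Rough sketch of "all tasks that failed across all user accounts with their
// error messages" over one shared tasks database.
// `state` and `error` are assumed fields, not an agreed schema.
const PouchDB = require('pouchdb')

const tasksDb = new PouchDB('hoodie-tasks')

const designDoc = {
  _id: '_design/tasks',
  views: {
    'by-state': {
      // stored as a string, evaluated by CouchDB/PouchDB at query time
      map: function (doc) {
        emit(doc.state)
      }.toString()
    }
  }
}

tasksDb.put(designDoc)
  .then(function () {
    return tasksDb.query('tasks/by-state', {key: 'failed', include_docs: true})
  })
  .then(function (result) {
    result.rows.forEach(function (row) {
      console.log(row.id, row.doc.error)
    })
  })
  .catch(console.error)
```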
My question was more meant like this: how important is it to see failed/succeeded tasks globally, as opposed to e.g. only failed tasks per user, or a log file of errors? And then: there are many features that come to mind which would require a query across all user dbs. How important are these features? Should we rather make a strategic decision now to use only one large db for all user data and one for tasks, so we can do all these things? I'm not necessarily advocating for this, but if we do this, we better do it now.
Those are not all the APIs you need for replication. The complexity of proxying every request needed for replication and making them safe, so we can guarantee that no data is accidentally leaked, is way too high for us at this point. We talked about all this when moving the authentication to the Hoodie layer, and the only reason I was, in the end, happy to go with it is that we do the auth business in the HTTP headers, and it doesn't matter what kind of HTTP requests are made. There are two options here that are safe:
The statement that makes me nervous is "only the replication API". There is no such thing. CouchDB replication just uses the APIs that regular end-users can use. There is no "we allow replication, but not regular access to the database". That means we have to secure every single request to CouchDB and, as stated above, I believe the complexity for this is too high for Hoodie at this point in general. And there are alternatives for solving the required backend features.
What if user …
I meant Envoy. Ideally, to PouchDB, this looks like db-per-user, but behind the scenes it isn't. If anywhere, this functionality should live in CouchDB and PouchDB need not know about it.
One more thing, since you mention scale. Either way, this would mean a continuous _changes stream from client to server for the regular per-user db and one for the tasks database (no matter whether it's a single one or one per user). Are we prepared to do this?
Yes, I thought about this: all requests would be authenticated with a random token that is generated the first time you create a task; the token is stored in a …
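A small sketch of how such a token could be generated and checked using Node's crypto module; where the token is persisted (the truncated part above) is intentionally left open:

```js
// Sketch of the random-token idea. Where the token gets persisted is left
// open here on purpose; the function names are only stand-ins.
const crypto = require('crypto')

// generated the first time a client creates a task, then sent along with
// every subsequent tasks request
function createTaskToken () {
  return crypto.randomBytes(32).toString('hex')
}

// server side: only accept requests carrying a known token
const knownTokens = new Set()

function authenticateTaskRequest (headers) {
  const token = (headers.authorization || '').replace(/^Bearer /, '')
  return knownTokens.has(token)
}

// usage
const token = createTaskToken()
knownTokens.add(token)
console.log(authenticateTaskRequest({authorization: 'Bearer ' + token})) // true
console.log(authenticateTaskRequest({authorization: 'Bearer nope'}))     // false
```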
I think this is very important and we should come up with a decision soon, as the migration might be complex. I don't mean to push for this too hard; I know this is a security concern. I would suggest the following: we move on with the task module in its insecure state. We will not merge it into …
@janl - Sorry for the delayed response. Makes perfect sense! I opened this issue to look at making nomenclature and method names consistent where appropriate.
Heads up - there's been some offline discussion on this, and ultimately @gr2m and I have elected to move forward with this plan:
Please be sure to let us know if there are any concerns with the approach (and the other task-related concepts outlined above). Thanks much!
Feature list updated:
thanks for keeping us in the loop, Dale!
Todos updated
Hey @inator, how are things? Have you made any progress on the task module?
This is a high-level issue for a complicated task, meant for discussion and to find a "champion" who wants to take ownership of bringing background tasks back to Hoodie. Comment below if you are interested in helping us :)
❓ The Motivation
Background tasks can be used to trigger server-side logic provided by plugins. Tasks are simply JSON documents, just like data. And just as with data, we have offline sync, so users can start tasks even without internet connectivity, which is more robust than relying on REST APIs.
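For illustration only, a task document could look roughly like this (the exact fields are not specified in this issue):

```js
// Purely hypothetical example of a task document; none of these fields are
// defined anywhere in this issue.
const exampleTask = {
  _id: 'task/3kf8s',       // regular CouchDB/PouchDB document id
  type: 'sendEmail',       // tells the server-side plugin what to do
  to: 'pat@example.com',   // task-specific payload
  state: 'pending',        // life-cycle state updated by the server
  createdBy: 'user123'     // who started the task
}

console.log(JSON.stringify(exampleTask, null, 2))
```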
🤔 What you will need to know to work on this issue
Node.js, hapi and PouchDB; you should also have a general understanding of Hoodie's sync architecture.
We had background tasks implemented in the "old" hoodie: tasks were simply stored in the user's database and synchronized to the user's database in the backend. We found out eventually that this was a bad design decision, as we want to use tasks even if the user has not yet signed up, and listening to all users' databases has a very bad performance impact. So for the new hoodie, we decided to go with a dedicated "tasks" database instead.
We already have @hoodie/task-client, which is very little code; it is using @hoodie/store-client under the hood. And we have @hoodie/task-server, which exposes only the CouchDB APIs relevant for sync. The idea is that the server filters /_changes so that a user can only see the tasks belonging to them, but this is not yet implemented. I would recommend comparing with hoodie-store-server's implementation and starting from there. Note that hoodie-store-server also abstracts away whether the app is using CouchDB as its backend and, if not, falls back to PouchDB. We need the same for the task server module.
And lastly there is hoodie-standalone-task, a module we created to test the client & server together. I would recommend using it for local development and testing. Once we have it all working in hoodie-standalone-task we can add the server & client code to hoodie itself (in @hoodie/client and @hoodie/server respectively). hoodie-standalone-task is only a temporary repository, so whoever is championing this, I'm happy to give you full admin rights so you can just merge your own pull requests yourself as needed. For example, I think it would make sense to extend demo/index.html to have some UI that might help with debugging, like logging events into a table. Unfortunately there seems to be a bug in hoodie-standalone-task right now; I couldn't get it to work myself, but it worked in the past. So the first task will be to get it working again :)
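As a possible starting point for that filtering work, here is a rough sketch against a local PouchDB; it assumes a hypothetical createdBy field on task documents, and in a CouchDB setup the equivalent would be a design-document filter referenced from the proxied /_changes route:

```js
// Sketch of per-user /_changes filtering, run against a local PouchDB.
// The `createdBy` field is an assumption, not an agreed schema.
const PouchDB = require('pouchdb')

function changesForUser (tasksDb, userId, since) {
  return tasksDb.changes({
    since: since || 0,
    include_docs: true,
    // evaluated per change; only this user's tasks are let through
    filter: function (doc) {
      return doc.createdBy === userId
    }
  })
}

// usage, with a local PouchDB standing in for the tasks backend
const tasksDb = new PouchDB('hoodie-tasks')

changesForUser(tasksDb, 'user123').then(function (response) {
  console.log(response.results.length + ' task changes for user123')
})
```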