Tasks with status SPOOLED but missing spooled task files never run #28

Open
rbannon-tc opened this issue Sep 29, 2021 · 16 comments

@rbannon-tc

Hi,

I suppose this is kind of an odd one...not necessarily an issue...

So, I've got a Docker service using a Dockerized Django app that uses your taskmanager (which is great, btw).

I start up the service (after collecting all the commands). I create a scheduled task, start it, and then watch it run at the next scheduled time. Everything seems fine.

Then, I bring down the service and back up again. I believe this causes all the spool files in the container to be deleted (I'm not using a volume). The tasks I created are still there with the status SPOOLED, but they never run again at their scheduled times. (Again, I'm guessing because the spool files are no longer there.) In order to get them to run, I have to "Start" each of the tasks again (which I believe creates the spool files).

This is a mild inconvenience. I could create a volume that attaches to the container so the spool files are never removed, but there is also the case where the tasks themselves are in the Django database but (for some reason) that volume is gone (i.e. the spool files are gone).

Is this a known issue/bug/feature/whatever?

Tx,

Ryan

PS my requirements.txt

alabaster==0.7.12
appdirs==1.4.4
asgiref==3.4.1
attrs==21.2.0
autopage==0.4.0
autopep8==1.5.7
Babel==2.9.1
base58==2.1.0
bcrypt==3.2.0
beautifulsoup4==4.9.1
certifi==2021.5.30
cffi==1.14.6
charset-normalizer==2.0.4
cliff==3.9.0
cmd2==2.1.2
colorama==0.4.4
coreapi==2.3.3
coreschema==0.0.4
cryptography==3.4.8
debtcollector==2.3.0
decorator==5.0.9
Django==3.1.6
django-appconf==1.0.4
django-auth-ldap==2.2.0
django-better-admin-arrayfield==1.3.0
django-compressor==2.4
django-debug-toolbar==2.2
django-extensions==3.0.5
django-filter==2.3.0
django-guardian==2.3.0
django-mysql==3.8.1
django-nested-admin==3.3.2
django-netfields==1.2.2
django-sass-processor==1.0.0
django-simple-history==2.11.0
django-sslserver==0.22
django-uwsgi==0.2.2
django-uwsgi-taskmanager==2.2.12
django-widget-tweaks==1.4.8
djangorestframework==3.11.1
djangorestframework-csv==2.1.0
djangorestframework-guardian==0.3.0
djangorestframework-simplejwt==4.8.0
docutils==0.17.1
dogpile.cache==1.1.4
drf-yasg==1.20.0
et-xmlfile==1.1.0
fabric==2.5.0
file-read-backwards==2.0.0
futurist==2.4.0
gnocchiclient==7.0.6
idna==3.2
imagesize==1.2.0
inflection==0.5.1
invoke==1.6.0
iso8601==0.1.16
itypes==1.2.0
jdcal==1.4.1
Jinja2==3.0.1
jmespath==0.10.0
jsonpatch==1.32
jsonpointer==2.1
jsonschema==3.2.0
keystoneauth1==4.3.1
libsass==0.20.0
lxml==4.5.2
Markdown==3.2.2
MarkupSafe==2.0.1
monotonic==1.6
msgpack==1.0.2
munch==2.5.0
mysqlclient==2.0.1
netaddr==0.8.0
netifaces==0.11.0
openpyxl==3.0.6
openstacksdk==0.59.0
os-client-config==2.1.0
os-service-types==1.7.0
osc-lib==2.4.2
oslo.config==8.7.1
oslo.context==3.3.1
oslo.i18n==5.1.0
oslo.log==4.6.0
oslo.serialization==4.2.0
oslo.utils==4.10.0
packaging==21.0
paramiko==2.7.2
pbr==5.6.0
prettytable==0.7.2
psycopg2-binary==2.8.5
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycodestyle==2.7.0
pycparser==2.20
Pygments==2.10.0
pyinotify==0.9.6
PyJWT==2.1.0
PyNaCl==1.4.0
pyOpenSSL==20.0.1
pyparsing==2.4.7
pyperclip==1.8.2
pyrsistent==0.18.0
python-cinderclient==7.1.0
python-dateutil==2.8.2
python-designateclient==4.1.0
python-glanceclient==3.2.1
python-keystoneclient==4.1.0
python-ldap==3.3.1
python-monkey-business==1.0.0
python-neutronclient==7.2.0
python-novaclient==17.2.0
python-octaviaclient==2.1.0
python-openstackclient==5.3.1
pytz==2021.1
PyYAML==5.3.1
rcssmin==1.0.6
requests==2.26.0
requests-aws==0.1.8
requestsexceptions==1.4.0
rfc3986==1.5.0
rgwadmin==2.3.1
rjsmin==1.1.0
ruamel.yaml==0.17.16
ruamel.yaml.clib==0.2.6
simplejson==3.17.5
six==1.16.0
snowballstemmer==2.1.0
soupsieve==2.2.1
Sphinx==3.2.1
sphinx-rtd-theme==0.5.0
sphinxcontrib-applehelp==1.0.2
sphinxcontrib-devhelp==1.0.2
sphinxcontrib-htmlhelp==2.0.0
sphinxcontrib-jsmath==1.0.1
sphinxcontrib-qthelp==1.0.3
sphinxcontrib-serializinghtml==1.1.5
sqlparse==0.4.1
stevedore==3.4.0
toml==0.10.2
ujson==4.1.0
unicodecsv==0.14.1
uritemplate==3.0.1
urllib3==1.26.6
uWSGI==2.0.19.1
warlock==1.3.3
wcwidth==0.2.5
wrapt==1.12.1

@guglielmo
Member

Hi,
your hypothesis is correct: the spooler is just a spooler and creates files containing the instructions for execution;
the next execution time is kept in the creation date of the file (which is moved to the future).
If the file is deleted or the file system suddenly disappears before the task is removed from the DB, then the task remains orphaned and the status is inconsistent with reality.

I would say that's a non-blocking issue, but I really don't know how to cope with it. If you have any ideas, let's discuss them.

I usually add a volume named uwsgi_spooler, attached to the directory in the container where the spooler files live.

Another issue arising from a restart could be the loss of the log files, which are kept under ${MEDIA_ROOT}/taskmanager/logs by default (so that the log contents are visible during and after execution).

So, this is the docker_compose.yml I would recommend:

services:
    web:
        ...
        volumes:
        - public:/app/public
        - uwsgi_spooler:/var/lib/uwsgi
        - ...
        command: /usr/local/bin/uwsgi --socket=:8000 --master --env DJANGO_SETTINGS_MODULE=config.settings --pythonpath=/app --module=config.wsgi --callable=application --processes=4 --spooler=/var/lib/uwsgi --spooler-processes=4
...

volumes:
    public:
        name: opdmservice_public
    uwsgi_spooler:
        name: opdmservice_uwsgi_spooler

@rbannon-tc
Author

rbannon-tc commented Sep 30, 2021

Thanks for the quick response!

I agree, for sure a non-blocker. And yes, creating a volume is a good way to deal with it.

The ONLY thing I can think of is having a Django command: read the database for tasks that are SPOOLED. If a task is spooled, check if the spool file exists. If it does not, change the task to SCHEDULED.

Seems kind of hacky...I'm guessing the taskmanager does NOT interact with the file system at all, so this would be an exception, UNLESS uwsgi has some method to check if a spool ID exists, in which case that could be used to check for the files? Not sure...? Such a command could be run as part of the container startup...
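
A minimal sketch of the check proposed above, written against plain Python objects rather than the real taskmanager models. `TaskRecord`, its `spool_file` field, and the status strings are assumptions made for illustration; how taskmanager actually records the spool file of a task is not shown in this thread. The sketch only touches the file system and the task records, so it does not need the uwsgi spooler itself.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable, List


@dataclass
class TaskRecord:
    """Hypothetical stand-in for a task row; not the taskmanager model."""
    name: str
    status: str
    spool_file: str  # assumed: file name of the spool file inside spool_dir


def find_despooled(tasks: Iterable[TaskRecord], spool_dir: Path) -> List[TaskRecord]:
    """Return tasks marked SPOOLED whose spool file is missing on disk."""
    return [
        t for t in tasks
        if t.status == "SPOOLED" and not (spool_dir / t.spool_file).is_file()
    ]


def mark_for_restart(tasks: Iterable[TaskRecord], spool_dir: Path) -> List[str]:
    """Reset orphaned tasks to SCHEDULED and return their names."""
    orphaned = find_despooled(tasks, spool_dir)
    for t in orphaned:
        t.status = "SCHEDULED"
    return [t.name for t in orphaned]
```

In a Django management command the same loop would presumably run over a queryset of SPOOLED tasks and save each changed record.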

Tx,

Ryan

@guglielmo
Member

A manual clean-up task could be ok.

The task would find all tasks in the SPOOLED state that have no spooler file, and put each of them through a stop/start sequence.
This would re-create the spooler files and put the tasks back into the scheduled status, ready to run at the correct time.

This task could be launched during the container startup.

@rbannon-tc
Author

Yes...that would be amazing!

@rbannon-tc
Author

BTW great product!

@guglielmo
Member

This is now a task: restart_despooled_tasks, available in 2.2.13, just published on PyPI.
I haven't had time to test the task yet, though. Please do let me know if it works.

@rbannon-tc
Author

@guglielmo thanks...gonna try this out today and let you know.

@rbannon-tc
Author

rbannon-tc commented Oct 6, 2021

@guglielmo getting the following error

Traceback (most recent call last):
  File "/opt/iaas-openstack-ops/openstack-ops/.env/lib/python3.8/site-packages/taskmanager/management/base.py", line 100, in execute
    output = super(LoggingBaseCommand, self).execute(*args, **options)
  File "/opt/iaas-openstack-ops/openstack-ops/.env/lib/python3.8/site-packages/django/core/management/base.py", line 371, in execute
    output = self.handle(*args, **options)
  File "/opt/iaas-openstack-ops/openstack-ops/.env/lib/python3.8/site-packages/taskmanager/management/commands/restart_despooled_tasks.py", line 57, in handle
    t.start()
AttributeError: 'Task' object has no attribute 'start'
Traceback (most recent call last):
  File "./manage.py", line 21, in <module>
    main()
  File "./manage.py", line 17, in main
    execute_from_command_line(sys.argv)
  File "/opt/iaas-openstack-ops/openstack-ops/.env/lib/python3.8/site-packages/django/core/management/__init__.py", line 401, in execute_from_command_line
    utility.execute()
  File "/opt/iaas-openstack-ops/openstack-ops/.env/lib/python3.8/site-packages/django/core/management/__init__.py", line 395, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/opt/iaas-openstack-ops/openstack-ops/.env/lib/python3.8/site-packages/django/core/management/base.py", line 330, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/opt/iaas-openstack-ops/openstack-ops/.env/lib/python3.8/site-packages/taskmanager/management/base.py", line 100, in execute
    output = super(LoggingBaseCommand, self).execute(*args, **options)
  File "/opt/iaas-openstack-ops/openstack-ops/.env/lib/python3.8/site-packages/django/core/management/base.py", line 371, in execute
    output = self.handle(*args, **options)
  File "/opt/iaas-openstack-ops/openstack-ops/.env/lib/python3.8/site-packages/taskmanager/management/commands/restart_despooled_tasks.py", line 57, in handle
    t.start()
AttributeError: 'Task' object has no attribute 'start'

UPDATE: there are tasks in the database and the spool files had been deleted. This IS the case I was describing at the beginning of this thread.

UPDATE 2: I ran it again after some other commands and it worked. I'll have to investigate further.

@guglielmo
Member

Actually, start is not the correct method to start a task; the correct one is launch.
So, I've released a new version (2.2.14) that should fix this.

@rbannon-tc
Author

rbannon-tc commented Oct 6, 2021

@guglielmo

I had a task created, started it, waited for it to run, then deleted the spool file after it completed. Then, ran the command. Got the following:

[06/Oct/2021 19:10:43] ERROR [request_id=, module=django.commands]: exec_command_task() got an unexpected keyword argument 'at'
Traceback (most recent call last):
  File "/opt/iaas-openstack-ops/openstack-ops/.env/lib/python3.8/site-packages/taskmanager/management/base.py", line 100, in execute
    output = super(LoggingBaseCommand, self).execute(*args, **options)
  File "/opt/iaas-openstack-ops/openstack-ops/.env/lib/python3.8/site-packages/django/core/management/base.py", line 371, in execute
    output = self.handle(*args, **options)
  File "/opt/iaas-openstack-ops/openstack-ops/.env/lib/python3.8/site-packages/taskmanager/management/commands/restart_despooled_tasks.py", line 57, in handle
    t.launch()
  File "/opt/iaas-openstack-ops/openstack-ops/.env/lib/python3.8/site-packages/taskmanager/models.py", line 405, in launch
    task_id = exec_command_task.spool(self, **kwargs)
TypeError: exec_command_task() got an unexpected keyword argument 'at'
Traceback (most recent call last):
  File "./manage.py", line 21, in <module>
    main()
  File "./manage.py", line 17, in main
    execute_from_command_line(sys.argv)
  File "/opt/iaas-openstack-ops/openstack-ops/.env/lib/python3.8/site-packages/django/core/management/__init__.py", line 401, in execute_from_command_line
    utility.execute()
  File "/opt/iaas-openstack-ops/openstack-ops/.env/lib/python3.8/site-packages/django/core/management/__init__.py", line 395, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/opt/iaas-openstack-ops/openstack-ops/.env/lib/python3.8/site-packages/django/core/management/base.py", line 330, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/opt/iaas-openstack-ops/openstack-ops/.env/lib/python3.8/site-packages/taskmanager/management/base.py", line 100, in execute
    output = super(LoggingBaseCommand, self).execute(*args, **options)
  File "/opt/iaas-openstack-ops/openstack-ops/.env/lib/python3.8/site-packages/django/core/management/base.py", line 371, in execute
    output = self.handle(*args, **options)
  File "/opt/iaas-openstack-ops/openstack-ops/.env/lib/python3.8/site-packages/taskmanager/management/commands/restart_despooled_tasks.py", line 57, in handle
    t.launch()
  File "/opt/iaas-openstack-ops/openstack-ops/.env/lib/python3.8/site-packages/taskmanager/models.py", line 405, in launch
    task_id = exec_command_task.spool(self, **kwargs)
TypeError: exec_command_task() got an unexpected keyword argument 'at'

@guglielmo guglielmo reopened this Oct 6, 2021
@guglielmo
Member

Will have a look at it, and will make some tests before releasing it, then.

@guglielmo
Member

guglielmo commented Oct 6, 2021

It seems to me that there's a conceptual problem I didn't take into consideration:
the mechanisms to create the spooler files are ingrained into the uwsgi spooler sub-module,
which is not available during a django management command execution.

It is only reachable within a web app (the admin app, for example),
so the kind of automation we were looking for is not possible, and that's a limitation for this solution.
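
One way to make this limitation explicit is to check whether the code is running inside a uWSGI process before attempting to spool. The `uwsgi` module is provided by the uWSGI server itself and cannot be imported from a plain Python interpreter such as the one running `manage.py`. A minimal sketch; the function names are illustrative and not part of taskmanager:

```python
def running_under_uwsgi() -> bool:
    """Return True only when executed inside a uWSGI process."""
    try:
        import uwsgi  # noqa: F401  (embedded in the uWSGI server)
    except ImportError:
        return False
    return True


def require_uwsgi(action: str) -> None:
    """Fail early with a clear message instead of an obscure error later."""
    if not running_under_uwsgi():
        raise RuntimeError(
            f"{action} needs the uWSGI spooler, which is unavailable outside uWSGI"
        )
```

A management command could call `require_uwsgi("restart_despooled_tasks")` first, so the user sees the real cause rather than a `TypeError` from deep inside the launch call.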

@rbannon-tc
Author

@guglielmo understood...thanks for trying!

@rbannon-tc
Author

rbannon-tc commented Oct 21, 2021

@guglielmo hey, I was thinking...could the taskmanager use a django signal to get notified on startup? I haven't played with Django signals much, but I reckon there's a way for a Django app to know when Django has been started...or a startup hook?

@guglielmo
Member

There is no handle to create a signal when the uwsgi server starts.
Everything starts anew with each request and is destroyed with every response.

You could intercept each request, using the https://docs.djangoproject.com/en/3.2/ref/signals/#request-started signal,
then check the existence of the spooler directory and proceed from there.
But I guess you could run into both performance and security issues: you would need to create the spooler directory with the same permissions that the uwsgi process uses, and, although I'm no expert, I don't think that is good practice.
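
To illustrate the performance concern, here is a sketch of a receiver that could be connected to `request_started` and that inspects the spooler directory only on the first request handled by a process. It only records a missing directory and does not create one, so it leaves the permission question open. The function name, the module-level state, and the default path (taken from the compose file earlier in this thread) are assumptions for illustration:

```python
import threading
from pathlib import Path

_checked = False
_lock = threading.Lock()
missing_spool_dirs = []  # filled in when the directory is absent


def check_spooler_once(sender=None, spool_dir="/var/lib/uwsgi", **kwargs):
    """Signal-style receiver; returns True only on the call that did the check."""
    global _checked
    if _checked:  # fast path after the first request: no lock, no disk access
        return False
    with _lock:
        if _checked:
            return False
        _checked = True
    if not Path(spool_dir).is_dir():
        missing_spool_dirs.append(spool_dir)
    return True


# In a Django project this could be wired up with:
#   from django.core.signals import request_started
#   request_started.connect(check_spooler_once)
```

After the first call the receiver costs a single boolean test per request, but each worker process would still run the check once on its own.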

@rbannon-tc
Author

Oh, I understand more now...yeah, that would most definitely not work. Sorry, I didn't fully understand. Apologies.
