use local files instead of arguments #1

Open
krtschmr opened this issue Sep 10, 2017 · 34 comments

Comments

@krtschmr

The panel can be ignored entirely, since all the data is available in local files (that is the data that gets reported anyway).

We can simply run the script with no configuration needed, which makes it much easier.

@transeos
Owner

Sorry, I got busy with another project.
I made the change just now and will commit it within an hour.

Also, accessing the panel checks whether the internet connection is working. If the internet is down, there is no need to reboot repeatedly every 8 minutes. Obviously there are other ways to check for internet access; I thought checking for the panel info was not a bad approach.
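
A connectivity check does not have to go through the panel. Below is a minimal sketch in Python 3, assuming a plain TCP connect is an acceptable test. The function name `has_internet` and its default host and port are illustrative choices, not part of the script in this repository:

```python
import socket


def has_internet(host="1.1.1.1", port=53, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout.

    The default host and port (a public DNS resolver) are assumptions;
    any reliably reachable endpoint would do.
    """
    try:
        # create_connection resolves the address and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers refused connections, timeouts and unreachable networks
        return False
```

A monitor could skip the reboot while `has_internet()` returns False, since rebooting the rig will not fix a dead uplink.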

@krtschmr
Author

krtschmr commented Sep 10, 2017

If the panel has wrong stats, has not updated (which sometimes happens for a few rigs when they go into zombie load!), or we can't get the data for any other reason, then we don't actually get any current data.

I sometimes have a rig that goes into high load but hashes like a champ. It is simply under high load, so I can't ssh into it and it can't update (but it hashes!). If a card fails, we will never see it via the panel, only via claymore-ethminer.exe or by checking locally. I think that's the better approach. The data is local, so why gather it remotely?

Checking whether we have internet is a nice thing though.

@transeos
Owner

It happened to me once while mining xmr. Instead of gpu mining, it probably started cpu mining.

I think if the panel is not getting refreshed even once in 8 min, there is some problem which should be looked at.

@transeos
Owner

I've pushed another change to handle the above situation.

@krtschmr
Author

invalid url @ 2017-09-11 03:26:50.628405
invalid url @ 2017-09-11 03:30:50.651621

seems like something is wrong here

@transeos
Owner

Please change the rig name and panel address to xxx if required and show me the output of "/home/ethos/gpu_crash.log".

@krtschmr
Author

I sent you an email.

@transeos
Owner

trying a workaround

@krtschmr
Author

krtschmr commented Sep 10, 2017

I wonder how this can actually happen, since the URL should always be in the files.

Maybe the best option is to drop the JSON lookup and switch to local reads; then we avoid this source of error.

How many rigs do you have to try it out on?

@transeos
Owner

I've made the rig name and panel URL optional arguments, so that you can use them on those rigs where you are running into the error.

@transeos reopened this Sep 10, 2017
@krtschmr
Author

krtschmr commented Sep 11, 2017

Today from the ethosdistro channel:
"Configmaker / Stats panel temporarily down. Will update with ETA when available. website/update/update2/get/paste are online"

So I will fork and do everything locally :P

@transeos
Owner

I'm also facing some issues.

@krtschmr
Author

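    import commands  # Python 2 only
    # take the last line of miner_hashes.file as the current per-GPU hashrates
    miner_hashes = map( float, commands.getstatusoutput("cat /var/run/ethos/miner_hashes.file")[1].split("\n")[-1].split() )
    numGpus = int(commands.getstatusoutput("cat /var/run/ethos/gpucount.file")[1])
    # count the GPUs that report a hashrate above zero
    numRunningGpus = len(filter(lambda a: a > 0, miner_hashes))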

we can use these and everything should work?
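
The same local reads can be done without shelling out to `cat`. This is a sketch in Python 3 (the snippet above is Python 2), assuming the file layout implied by that snippet: whitespace-separated hashrates on the last line of `miner_hashes.file` and a single integer in `gpucount.file`. The function name `read_gpu_stats` is hypothetical:

```python
def read_gpu_stats(hashes_path="/var/run/ethos/miner_hashes.file",
                   count_path="/var/run/ethos/gpucount.file"):
    """Return (per-GPU hashrates, expected GPU count, hashing GPU count)."""
    with open(hashes_path) as f:
        lines = f.read().strip().split("\n")
    # assumption: the last line holds the most recent hashrates
    miner_hashes = [float(x) for x in lines[-1].split()]
    with open(count_path) as f:
        num_gpus = int(f.read().strip())
    # a GPU counts as running when it reports a hashrate above zero
    num_running = sum(1 for h in miner_hashes if h > 0)
    return miner_hashes, num_gpus, num_running
```

Passing the paths as parameters makes it possible to try the function against sample files on a machine that is not a rig.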

@krtschmr
Author

krtschmr commented Sep 11, 2017

Idea:
doesn't this short script do roughly the same thing?

https://pastebin.com/s4VewKJB
Edit:
this one is even better:
https://pastebin.com/8Zu5G5rA

@transeos
Owner

Thanks.

I'll have a look later.

@krtschmr
Author

https://github.com/krtschmr/ethos_monitor/blob/master/check_crash.py

This works perfectly now, including an auto-update before it reboots, in case we changed anything. I'll run this version on my farm now (but somehow my farm has been stable since then. Weird ;) )

@transeos
Owner

Sorry, I'll be too busy in the next 2 days to review this change.

@ghost

ghost commented Feb 4, 2018

@krtschmr does it work on ethos 1.2.9?

@krtschmr
Author

krtschmr commented Feb 5, 2018

@LazyScream Absolutely. However, 1.2.9 wasn't stable for my farm, so I kept my rigs at 1.2.7.
The script itself will keep working until they make major changes to the GPU statistics.

@ghost

ghost commented Feb 5, 2018

@krtschmr
I found that you do not need "rigname" and "ethosdistro.com/?json=yes" in your release.
So I just put check_crash.py under /home/ethos and add "@reboot /home/ethos/ethos_monitor/check_crash.py" via crontab -e, and your script will run automatically, right?

@krtschmr
Author

krtschmr commented Feb 5, 2018

almost :-)

wget https://raw.githubusercontent.com/krtschmr/ethos_monitor/master/check_crash.py
crontab -e
# add this line to the crontab:
@reboot /home/ethos/check_crash.py
# save with ctrl+o, then exit the editor
python check_crash.py & # start it now, or you can run "r" to reboot

@ghost

ghost commented Feb 5, 2018

@krtschmr
OK, thanks all!
And do you have any ideas for adding "Pushover" to this script?

@krtschmr
Author

krtschmr commented Feb 5, 2018

ya, google knows


# Python 3 only (http.client and urllib.parse do not exist under Python 2)
import http.client
import urllib.parse

conn = http.client.HTTPSConnection("api.pushover.net:443")
conn.request("POST", "/1/messages.json",
  urllib.parse.urlencode({
    "token": "APP_TOKEN",    # your Pushover application token
    "user": "USER_KEY",      # your Pushover user key
    "message": "RIG OFFLINE!!! OMG, we are broke!",
  }), { "Content-type": "application/x-www-form-urlencoded" })
conn.getresponse()

@ghost

ghost commented Feb 5, 2018

Can I paste the code anywhere in the script?
Do I need to replace APP_TOKEN and USER_KEY?
////
I got an error:
File "./check_crash.py", line 22, in
import http.client
ImportError: No module named http.client

@krtschmr
Author

krtschmr commented Feb 5, 2018

I really can't help with that; I'm not a Python specialist. Obviously you need to bundle the http package first.
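
For context on the traceback: `http.client` is the Python 3 name of the module. Python 2 ships it as `httplib`, and `urlencode` lives in `urllib` rather than `urllib.parse`. The wording of the ImportError suggests the script is being run with Python 2. Below is a sketch that imports whichever names exist. The helper names `build_pushover_request` and `send_pushover` are hypothetical; the endpoint and form fields are taken from the snippet earlier in this thread:

```python
try:
    # Python 3 module names
    from http.client import HTTPSConnection
    from urllib.parse import urlencode
except ImportError:
    # Python 2 equivalents
    from httplib import HTTPSConnection
    from urllib import urlencode


def build_pushover_request(token, user, message):
    """Return (path, body, headers) for the Pushover messages endpoint."""
    body = urlencode({"token": token, "user": user, "message": message})
    headers = {"Content-type": "application/x-www-form-urlencoded"}
    return "/1/messages.json", body, headers


def send_pushover(token, user, message):
    """POST the message to Pushover and return the HTTP status code."""
    path, body, headers = build_pushover_request(token, user, message)
    conn = HTTPSConnection("api.pushover.net", 443)
    try:
        conn.request("POST", path, body, headers)
        return conn.getresponse().status
    finally:
        conn.close()
```

Calling `send_pushover("APP_TOKEN", "USER_KEY", "rig offline")` with real credentials in place of the placeholders would send the notification; splitting out the request builder lets the encoding be checked without touching the network.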

@Trigun87

Trigun87 commented Feb 27, 2018

I made a reboot function with a Telegram warning:

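import commands  # Python 2 only
import os
from urllib import urlopen, quote  # Python 2 locations of these functions

def RebootRig():
  # DumpActivity and miner_hashes are expected to be defined in the surrounding script
  DumpActivity("Rebooting (" + str(miner_hashes) + ")")
  # the first field of /proc/uptime is the uptime in seconds
  uptime = int(float(commands.getstatusoutput("cat /proc/uptime")[1].split()[0]))
  m, s = divmod(uptime, 60)
  h, m = divmod(m, 60)
  # format as H:MM:SS with whole numbers instead of floats
  msg = quote("Rig1 Reboot uptime %d:%02d:%02d" % (h, m, s))
  # replace XXX:APIKEY and ID with your bot token and chat id
  urlopen("https://api.telegram.org/botXXX:APIKEY/sendmessage?chat_id=ID&text=" + msg).read()
  os.system("sudo hard-reboot")
  os.system("sudo reboot")  # only reached if hard-reboot did not take effect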

And now I'm using @krtschmr's version.
Now I only need to test whether the uptime variable is working ^_^ (just use Telegram's BotFather to create a new bot and get the API key)
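
To test the uptime message without triggering a reboot, the formatting and the URL building can be pulled into small functions. This is a sketch in Python 3 (the snippets in this thread are Python 2); the function names are hypothetical, and the bot token and chat id are placeholders as in the comment above:

```python
from urllib.parse import urlencode


def format_uptime(seconds):
    """Format a number of seconds as H:MM:SS with whole numbers."""
    total = int(seconds)
    minutes, secs = divmod(total, 60)
    hours, minutes = divmod(minutes, 60)
    return "%d:%02d:%02d" % (hours, minutes, secs)


def read_uptime(path="/proc/uptime"):
    """Return the system uptime in seconds (first field of /proc/uptime)."""
    with open(path) as f:
        return float(f.read().split()[0])


def build_telegram_url(bot_token, chat_id, text):
    """Build a Telegram Bot API sendMessage URL with an encoded query string."""
    query = urlencode({"chat_id": chat_id, "text": text})
    return "https://api.telegram.org/bot%s/sendMessage?%s" % (bot_token, query)
```

For example, `format_uptime(read_uptime())` on a rig gives the text to append to the message, and the resulting URL can be printed and inspected before it is ever opened.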

@Trigun87

Trigun87 commented Feb 27, 2018

I think I found a bug in @krtschmr's version...
In the disconnectCount part, the script checks 12 times (without waiting), then hits the break and the script stops.
I think you need to add a reboot or a continue or something else there, plus a time.sleep too.
I changed it this way:

  # runs inside the script's main loop; 13 is the number of GPUs expected on this rig
  if (numRunningGpus != numGpus or numGpus != 13):

    if (waitForReconnect == 1 and numRunningGpus == 0):
      # all GPUs dead, probably a TCP disconnect / pool issue
      # wait up to 12 checks (12 x 15 s = 3 minutes) for it to resolve; most likely to appear with nicehash
      disconnectCount += 1
      if (disconnectCount > 12):
        DumpActivity("Waiting for hashes back: " + str(disconnectCount))
        RebootRig()
        break
    else:
      # only some GPUs are down, or the GPU count is wrong: reboot right away
      RebootRig()
      break
  else:
    # everything is hashing, so reset the counter
    disconnectCount = 0
  time.sleep(15)
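
The behaviour described in the code comments (wait up to 12 checks when every GPU is idle, reboot at once when only some are) can also be written as a pure function, which makes it easy to test without a rig. This is a sketch in Python 3; `next_action` and its return values are hypothetical names, not taken from either fork:

```python
def next_action(num_running, num_expected, disconnect_count, max_waits=12):
    """Decide what one monitoring pass should do.

    Returns (action, new_disconnect_count), where action is
    "ok", "wait" or "reboot".
    """
    if num_running == num_expected:
        # all expected GPUs are hashing: reset the counter
        return "ok", 0
    if num_running == 0:
        # all GPUs idle, possibly a pool disconnect: give it time to recover
        disconnect_count += 1
        if disconnect_count > max_waits:
            return "reboot", disconnect_count
        return "wait", disconnect_count
    # only some GPUs are down: reboot immediately
    return "reboot", disconnect_count
```

The caller would sleep 15 seconds between passes, so 12 waits correspond to roughly 3 minutes before a reboot is triggered.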

@jmverges

jmverges commented Mar 1, 2018

@krtschmr is what @Trigun87 is saying true?

@krtschmr
Author

krtschmr commented Mar 1, 2018

I don't know yet. I've had no time to look into it; I'm still trying to get a new 600 GPU farm stable....

I can fix it later.

@jmverges

jmverges commented Mar 1, 2018

600 gpu? 😮

@Trigun87

Trigun87 commented Mar 3, 2018

OK, I fixed the disconnect check (the waitForReconnect variable was useless since it was always 1).

https://github.com/Trigun87/ethos_monitor

I just forked ^_^ I use a new file for the Telegram warning (disabled by default) and for the number of GPUs on the rig (if it starts with fewer GPUs, it will reboot).

@krtschmr
Author

krtschmr commented Mar 6, 2018

@Trigun87 want to merge it into mine?

@Trigun87

@krtschmr if you like my version ^_^ (by the way, is that something you should do or something I should do? I've never merged anything :-P)

@krtschmr
Author

@jmverges how do I work in this ethOS FRiends group? I can't create repositories or do anything...

@Trigun87 check the gist:

So, my problem is that nicehash sometimes terminates the connections and/or I don't get work. If I reboot, the rigs hash again. Sometimes 3/4 of the farm is dead overnight. The issue is the reboot script: ethos 1.2.7 (all versions <1.2.9) has an issue with claymore, which still reports SOME hashrate even though it's zero. I can't upgrade to 1.3.0 since powerplay messes up and we would use 8% more electricity.

This should fix it. Maybe useful for somebody?
https://gist.github.com/krtschmr/a915ee7fa9c9c42961a2376dfebf208b


4 participants