make_project keeps aborting due to a mysql error #2618

Closed
km-git-acc opened this issue Jul 21, 2018 · 24 comments

@km-git-acc

I am a first-time BOINC user. I have been trying to set up a BOINC project on an Ubuntu server, but it keeps aborting at the make_project stage due to a MySQL incompatibility:
OperationalError: (1071, 'Specified key was too long; max key length is 767 bytes')

The issue has also been discussed in #2140, but there seems to have been no reply there in the past few months.
Older versions of MySQL apparently worked, but installing an older version has so far proved too tricky.
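For context, error 1071 comes from InnoDB's 767-byte limit on index key prefixes, which applies on MySQL/MariaDB configurations without large index prefixes enabled. The arithmetic below is a minimal sketch, assuming a hypothetical indexed VARCHAR(255) column and the worst-case bytes per character of each character set (4 for utf8mb4, 3 for the older utf8/utf8mb3); the function names are illustrative only.

```shell
# Bytes an index on a VARCHAR(chars) column needs in the worst case.
key_bytes() {
  local chars=$1 bytes_per_char=$2
  echo $(( chars * bytes_per_char ))
}

# Longest VARCHAR (in characters) whose index fits under a byte limit
# (default 767 bytes, the InnoDB prefix limit behind error 1071).
max_chars() {
  local bytes_per_char=$1 limit=${2:-767}
  echo $(( limit / bytes_per_char ))
}
```

`key_bytes 255 4` gives 1020 bytes, which is over the limit, while `key_bytes 255 3` gives 765 and fits; `max_chars 4` gives 191, the longest utf8mb4 key that fits in 767 bytes.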

The BOINC installation guide I have been following is https://wiki.debian.org/BOINC/ServerGuide/Initialisation
which mentions this issue at the end, but the recommended solution hasn't worked. This guide also uses php5, so I assume it's a bit old. Is there an updated guide along similar lines that uses php7 and the latest MySQL? I have tried the server maker approach as well, but its documentation doesn't seem to work on the latest Ubuntu either.

Also, is this the preferred way to set up Boinc on a server? If there is an easier way, please share that as well.

@marius311
Member

> Also, is this the preferred way to set up Boinc on a server? If there is an easier way, please share that as well.

You could try https://github.com/marius311/boinc-server-docker/ which might be easier as the whole server is packaged up inside a Docker container.

@km-git-acc
Author

km-git-acc commented Jul 22, 2018

@marius311

Thanks for the reply. I tried out the Docker approach but am still facing some issues. This is the sequence of commands I am currently trying:

#Docker
sudo vim /etc/apt/sources.list
     add deb http://ftp.debian.org/debian stretch-backports main
sudo apt-get remove docker docker-engine docker.io
sudo apt-get update
sudo apt-get install \
     apt-transport-https \
     ca-certificates \
     curl \
     gnupg2 \
     software-properties-common
curl -fsSL https://download.docker.com/linux/debian/gpg | sudo apt-key add -
sudo apt-key fingerprint 0EBFCD88
sudo add-apt-repository \
   "deb [arch=amd64] https://download.docker.com/linux/debian \
   $(lsb_release -cs) \
   stable"
sudo apt-get update
sudo apt-get install docker-ce
sudo docker run hello-world

#Docker-compose
sudo curl -L https://github.com/docker/compose/releases/download/1.22.0/docker-compose-$(uname -s)-$(uname -m) -o /usr/bin/docker-compose
sudo chmod +x /usr/bin/docker-compose

sudo usermod -aG docker $USER
sudo service docker start

#boinc-server-docker
cd boinc-server-docker
sudo vim docker-compose.yml
     edit version from 3 to 2

docker-compose pull
(ERROR: Couldn't connect to Docker daemon at http+docker://localhost - is it running?

If it's at a non-standard location, specify the URL with the DOCKER_HOST environment variable.)

sudo docker-compose up -d
(ERROR: Service 'makeproject' failed to build: The command '/bin/sh -c cd /root/boinc && ./_autosetup && ./configure --disable-client --disable-manager && make' returned a non-zero code: 127)

I changed the yml version from 3 to 2 at an intermediate step to get around this error:

ERROR: Version in "./docker-compose.yml" is unsupported. You might be seeing this error because you're using the wrong Compose file version. Either specify a version of "2" (or "2.0") and place your service definitions under the `services` key, or omit the `version` key and place your service definitions at the root of the file to use version 1.
For more on the Compose file format versions, see https://docs.docker.com/compose/compose-file/

It would be great if you could take a look. The server is a GCP VM with Debian 9.

@marius311
Member

I think you missed a sudo on docker-compose pull, which is why you got that error. Once you do that, nothing will have to be built locally, so the next command probably won't error either.
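A likely reason the earlier `sudo usermod -aG docker $USER` did not avoid that daemon error: adding a user to a group only takes effect in new login sessions. A minimal sketch of a check for the current session (the function name is hypothetical):

```shell
# Succeeds if the *current session* already has the named group.
# `usermod -aG` only updates the group database; shells started earlier keep
# their old group list until you log out and back in (or run `newgrp docker`).
in_group() {
  id -nG | tr ' ' '\n' | grep -qx -- "$1"
}
```

For example, `in_group docker || echo 'log out and back in, or run: newgrp docker'`; once it succeeds, `docker-compose pull` should work without `sudo`.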

Also, changing the version from 3 to 2 is definitely not a workaround; that message is misleading, and you should leave it as it was originally. I'm guessing an older version of docker-compose is somehow being used instead of the one you downloaded. What's the output of docker-compose --version and which docker-compose?
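One way to compare the installed docker-compose against the Compose file version: file format 3.0 was introduced in docker-compose 1.10.0, so an older binary produces the misleading "unsupported version" message. A small sketch, assuming GNU `sort -V` for version ordering; the helper names are hypothetical:

```shell
# Extract the version number from `docker-compose --version` output,
# e.g. "docker-compose version 1.22.0, build f46880fe" -> "1.22.0".
compose_version_from() {
  printf '%s\n' "$1" | sed -E 's/.*version ([0-9][0-9.]*).*/\1/'
}

# Succeeds if version $1 >= version $2 (GNU sort -V orders version strings).
version_ge() {
  [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}
```

Usage: `v=$(compose_version_from "$(docker-compose --version)"); version_ge "$v" 1.10.0 || echo "old docker-compose"`. In bash, `type -a docker-compose` lists every docker-compose on the PATH, which helps spot an older copy shadowing the one you downloaded.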

@km-git-acc
Author

@marius311

Thanks. I have made a lot of progress. Resolving the sudo issue did help complete the installation, and this approach turns out to be much simpler than the others.
I then set a custom URL and opened it in the browser (on a different machine), which also worked. However, the nested links show up with the 127.0.0.1 domain (e.g. http://127.0.0.1/boincserver/apps.php); they return the appropriate page if I use curl on the VM, but don't open in the browser. Another thing I noticed is that if I try to set the project variable and build the project, the build runs but the project variable does not change. I feel I am missing something obvious but haven't been able to figure it out :(

Here is the full sequence of commands I used this time:

sudo apt-get remove docker docker-engine docker.io
sudo apt-get update
sudo apt-get install \
     apt-transport-https \
     ca-certificates \
     curl \
     gnupg2 \
     software-properties-common
curl -fsSL https://download.docker.com/linux/debian/gpg | sudo apt-key add -
sudo apt-key fingerprint 0EBFCD88
sudo add-apt-repository \
   "deb [arch=amd64] https://download.docker.com/linux/debian \
   $(lsb_release -cs) \
   stable"
sudo apt-get update
sudo apt-get install docker-ce
sudo docker run hello-world

#Docker-compose
sudo curl -L https://github.com/docker/compose/releases/download/1.22.0/docker-compose-$(uname -s)-$(uname -m) -o /usr/bin/docker-compose
sudo chmod +x /usr/bin/docker-compose

#Starting the Docker service 
sudo usermod -aG docker $USER
sudo service docker start

#boinc-server-docker
git clone --recursive https://github.com/marius311/boinc-server-docker.git
cd boinc-server-docker

#Example container
sudo docker-compose pull

#Create and start server
URL_BASE=http://a.b.c.d sudo docker-compose up -d


#My Project
cp -r example_project/with_b2d myproj1
sudo docker-compose run makeproject cat /root/project/config.xml > myproj1/images/makeproject/config.xml
sudo vim myproj1/images/makeproject/Dockerfile
     add COPY config.xml /root/project
cd myproj1
URL_BASE=http://a.b.c.d PROJECT=myproj1 sudo docker-compose up -d --build
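The manual `sudo vim` step that adds the COPY line to the Dockerfile can also be scripted. A minimal sketch (the helper name is hypothetical) that appends a line such as `COPY config.xml /root/project` only if it is not already present, so re-running the setup does not duplicate it:

```shell
# Append a line to a file (e.g. a Dockerfile) unless an identical line already exists.
add_line_once() {
  local file=$1 line=$2
  grep -qxF -- "$line" "$file" 2>/dev/null && return 0
  # Make sure the file ends with a newline so the new line isn't glued on.
  if [ -s "$file" ] && [ -n "$(tail -c1 "$file")" ]; then
    printf '\n' >> "$file"
  fi
  printf '%s\n' "$line" >> "$file"
}
```

For example: `add_line_once myproj1/images/makeproject/Dockerfile 'COPY config.xml /root/project'`.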

This was the output of docker-compose --version and which docker-compose:

docker-compose version 1.22.0, build f46880fe

/usr/bin/docker-compose

@marius311
Member

> turns out to be much simpler than other approaches.

Glad to hear!

> The nested links are showing up with the 127.0.0.1 domain (for eg. http://127.0.0.1/boincserver/apps.php) which return the appropriate page if i use curl on the vm server, but can't open on the browser

So from the computer running the server you can access http://127.0.0.1/boincserver from a browser, is that correct? But from another computer you can't access http://a.b.c.d/boincserver, where a.b.c.d is the IP of the first computer, is that right? Can you ping the first computer from the second? Sounds like a connectivity thing.

> Another thing I noticed is that if try to set the project variable and build the project, it runs but the project variable does not change

You don't need to rebuild for the environment variables to take effect; you just have to redo a docker-compose up with them set. However, note that by default sudo does not propagate environment variables: check URL_BASE=something sudo env and you'll see it's not set. Try sudo -E, or alternatively add your user to the docker group so you don't need sudo.
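To make the sudo point concrete: besides `sudo -E`, sudo also accepts `VAR=value` arguments placed before the command, which default sudoers setups with an `ALL` rule (as on Debian/Ubuntu) normally permit. A hypothetical wrapper using that form; setting `DRY_RUN` prints the command instead of running it:

```shell
# Start (or restart) the server so that URL_BASE and PROJECT really reach
# docker-compose. Variables written *in front of* sudo are normally dropped by
# sudo's env_reset; passing them as arguments to sudo avoids that.
compose_up() {
  local url_base=$1 project=$2
  ${DRY_RUN:+echo} sudo URL_BASE="$url_base" PROJECT="$project" docker-compose up -d
}
```

For example, `compose_up http://a.b.c.d myproj1`. If your user is in the docker group (and the current session has picked that up), drop `sudo` and run `URL_BASE=... PROJECT=... docker-compose up -d` directly.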

@km-git-acc
Author

@marius311

Great. I have a newfound appreciation for the intricacies of sudo :)
I was able to start the server, and the domain-related problems disappeared. I was also able to enter the Docker shell, and edits to the web pages were reflected in the browser, although I still have to find out how to make the edits permanent.
Also, nice tip on not having to rebuild every time.

Thanks again. Just to update you on the context, we are trying to set up Boinc for a computational math project, https://github.com/km-git-acc/dbn_upper_bound.

@marius311
Member

Great to hear it's working. For info on how to make changes persistent, this section basically deals with that: https://github.com/marius311/boinc-server-docker/blob/master/docs/cookbook.md#creating-your-own-project

And sounds like a cool project, I'm curious to hear how it goes. I see part of it is written in Julia, of which I'm a huge fan :)

@km-git-acc
Author

km-git-acc commented Jul 23, 2018

@marius311
Thanks. The Julia branch is being managed independently by https://github.com/WilCrofter
Do contact him to learn more.

I have made a lot of progress. I have been able to permanently copy files and entire libraries as needed and start the server (using COPY commands). I was also able to edit files like project.inc and index.php and make those edits permanent, again using the COPY command. I have also checked that the computation works within the Docker shell. It seems starting a real server is now not far away.

Should this issue be closed? Although I would personally recommend the Docker approach, anyone who still tries the original approach may run into the same problem, which ideally should be addressed as well.

@marius311
Member

Glad this approach is working well. Don't hesitate to ask any further questions. I'll close this; let's use #2140 to track the original issue, which from what I can tell is a duplicate.

@km-git-acc
Author

@marius311
I have been making slow but steady progress. I just started some test jobs in the BOINC client but ran into something puzzling. It would be great if you could point me in the right direction.

First, I created a Docker image of the math application and uploaded it to Docker Hub (dbnupperbound/arbcalc:v1). Then I started the BOINC server and, to test it, submitted two jobs:

bin/boinc2docker_create_work.py python:alpine python -c "print('Hello BOINC')"

bin/boinc2docker_create_work.py dbnupperbound/arbcalc:v1 ./abbeff 100000 0.2 0.2 20
(where the second one is one of the commands in the custom repository with sample arguments, and runs within a second on a normal terminal)

After this, I downloaded the BOINC + VirtualBox installer on my local Windows machine and attached the project. The two workunits, with replicas, downloaded and started running. Up to this point everything seems OK.

However, tests that I expected to be very quick (under a second) seem to take a long time. The Hello BOINC command has been running for over 2 hours, and its progress keeps getting slower. The custom command also progresses slowly, but gets stuck partway through (with the advanced view showing "Waiting for memory").
Is the Boinc client running the same command multiple times and is there a way to limit the number of such runs to 1?

@marius311
Copy link
Member

marius311 commented Jul 29, 2018

Some debug ideas:

  • What does the BOINC manager say under "Tasks" for the running job?
  • If you go into the boinc client folder (generally /var/lib/boinc-client/slots/0) and look at the stderr.txt file, it'll show you the output of trying to launch the VM as well as the output of commands being run inside the VM (assuming it made it that far). What does this say?
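For the second bullet, a small sketch that prints the tail of `stderr.txt` from every slot at once. It assumes the Linux default location `/var/lib/boinc-client/slots` (pass another directory as the first argument for a client installed elsewhere), and the function name is hypothetical:

```shell
# Print the last N lines (default 20) of stderr.txt in each BOINC slot directory.
show_slot_logs() {
  local slots_dir=${1:-/var/lib/boinc-client/slots} lines=${2:-20} f
  for f in "$slots_dir"/*/stderr.txt; do
    [ -e "$f" ] || continue   # glob matched nothing
    printf '==> %s <==\n' "$f"
    tail -n "$lines" "$f"
  done
}
```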

Minor note, but you don't need the image on the Docker hub, you only need it locally on the machine running the server. And fyi, a normal overhead for the BOINC client launching the job and the VM / Docker booting up might be ~30 seconds, so definitely the 2hrs you're seeing means something is wrong.

@marius311
Member

> with the advanced view showing waiting for memory

Oh sorry, missed this. This could mean BOINC is suspending your job because it thinks the job needs more memory, so it's not running at all. Can you check your client settings? Also, when you ran boinc2docker_create_work, what did it say it set the memory usage to? You can also try the --rsc_memory_bound option to boinc2docker_create_work to set the memory manually.

@km-git-acc
Author

km-git-acc commented Jul 29, 2018

@marius311
Thanks. On running boinc2docker_create_work, it gives

Automatically setting memory allocation for job to 1670MB.
Automatically setting disk allocation for job to 392MB.

I had uploaded the image to Docker Hub so other collaborators could also use Docker for all the commands during the normal course of their work (since it can be somewhat messy to install the required libraries). But when I ran the create_work script for the first time, I saw that it downloaded the image before creating the workunit. Running it a second time now, it says 'Image already imported into Boinc. Reading existing info'.
Currently the BOINC Manager says 'Running' for the python alpine job (stuck at 99.998%). The custom job restarts from time to time, goes to around 60%, and then shows 'Waiting for memory'. I am running the client on a Windows machine, and there are two files, stderrgui.txt and stderrscr.txt, which are both empty right now. Client settings are 80% of CPU time, 25% of the CPUs, and 15% of memory allowed. The actual usage seems to be far lower than these settings.
I will try setting the --rsc_memory_bound option, as well as trying out the client on an Ubuntu machine.

EDIT: I set the memory parameter to 100 MB and submitted additional jobs. The client downloaded them and ran them for 10 minutes before failing with an error. The key part seems to be that the VirtualBox VM took some time to start, then stopped after another 5 minutes with this error:

BOINC has detected that your computer's processor supports hardware acceleration for
    virtual machines but the hypervisor failed to successfully launch with this feature enabled.
    This means that the hardware acceleration feature has been disabled in the computer's BIOS.
    Please enable this feature in your computer's BIOS.

After enabling virtualization in the BIOS (which I expected to be enabled by default), I reset the server again and submitted some jobs, but now they stay in unsent mode and are not sent to my Windows client.
On the other hand, I tried installing VirtualBox on an Ubuntu machine, but it seems quite tricky. It keeps complaining about kernel mismatches.

@marius311
Member

marius311 commented Aug 1, 2018

Maybe your client is stuck thinking extensions are disabled? Does the solution here help? https://www.cosmologyathome.org/faq.php#i-enabled-vt-xamd-v-but-i-still-dont-receive-jobs

@km-git-acc
Author

@marius311
Thanks. I tried that out. At first the line <p_vm_extensions_disabled>1</p_vm_extensions_disabled> kept reappearing, but then I uninstalled BOINC, removed all its files and reinstalled it, and the line has now changed to <p_vm_extensions_disabled>0</p_vm_extensions_disabled>. I also checked that I can run other Linux distributions in VirtualBox (e.g. Linux Mint, Tiny Core Linux), and they run fine.
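The check above can be scripted. A minimal sketch, assuming the flag lives in the client's `client_state.xml` (an assumption; the thread does not name the file) and using a hypothetical function name. Since the client rewrites that file while running, stop it before making any manual edit.

```shell
# Report the VM-extensions flag recorded by the BOINC client.
# Prints "disabled", "enabled", or "unknown" (flag or file not found).
vm_ext_status() {
  local state_file=$1
  if grep -q '<p_vm_extensions_disabled>1</p_vm_extensions_disabled>' "$state_file" 2>/dev/null; then
    echo disabled
  elif grep -q '<p_vm_extensions_disabled>0</p_vm_extensions_disabled>' "$state_file" 2>/dev/null; then
    echo enabled
  else
    echo unknown
  fi
}
```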

After that, I restarted the server and submitted some jobs, which the client then downloaded and started executing. But the issue remains that job progress gets slower over time, and after about 1.75 hours the job exits with an error, with some output in the corresponding stderr file on the project website.

Attaching the stderr output file
boinc stderr output.txt
The VM now seems to start successfully, and there is a new section in the log output called the Hypervisor System Log, which counts execution time from time 0. It gives some error messages from time to time, but it's not clear which of those are serious and which are part of the normal process.

@marius311
Member

From the log:

2018-08-01 23:37:15 (6372): Setting Memory Size for VM. (100MB)

Can you change the memory setting back to its default? The jobs generally need about 3 to 4 times the size of the unzipped Docker image at the moment.
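Applying that rule of thumb: a minimal sketch that turns an unzipped image size into a memory estimate, assuming a factor of 4 as the conservative end of the 3 to 4 range. The unzipped size can be read with `docker image inspect --format '{{.Size}}' IMAGE`, which reports bytes. The helper names are hypothetical, and the units `--rsc_memory_bound` expects are not stated in this thread, so check the script's `--help` before passing a value.

```shell
# Estimate job memory (MB) from the unzipped Docker image size (MB),
# using the "3 to 4 times the image size" rule of thumb (default factor 4).
estimate_mem_mb() {
  local image_mb=$1 factor=${2:-4}
  echo $(( image_mb * factor ))
}

# Convert a byte count (e.g. from `docker image inspect`) to whole MB, rounding up.
bytes_to_mb() {
  echo $(( ($1 + 1048575) / 1048576 ))
}
```

For example, a 400 MB image gives `estimate_mem_mb 400` = 1600 MB (1200 MB with factor 3), far above the 100 MB seen in the log.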

Another option: can you try reducing the number of CPUs per job? You should just be able to edit the plan_class_spec.xml file on the server and set max_threads to 2 or even 1. (You should do so before creating the jobs; it may be best to wipe the server clean and reconnect the client, then do this for it to take effect.)
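The `max_threads` edit can also be done non-interactively. A minimal sketch, assuming GNU `sed -i` and that `plan_class_spec.xml` already contains a `<max_threads>` element; the function name is hypothetical.

```shell
# Set every <max_threads>N</max_threads> in a plan class spec file to a new value.
set_max_threads() {
  local file=$1 n=$2
  case $n in
    ''|*[!0-9]*|0) echo "max_threads must be a positive integer" >&2; return 1 ;;
  esac
  sed -i -E "s#<max_threads>[0-9]+</max_threads>#<max_threads>${n}</max_threads>#g" "$file"
}
```

For example, `set_max_threads plan_class_spec.xml 1`, run before creating new jobs.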

Sorry you're running into all these problems; this is a good case study of how far we still have to go to make setting up a server truly "easy".

@km-git-acc
Copy link
Author

km-git-acc commented Aug 2, 2018

@marius311
Great. I made the changes and it ran this time. Felt awesome and magical when it finally happened :)
Attaching the output from one of the tasks this time (a single test computation)
boinc_successful_stderr_output.txt

I think the documentation is 95% there, and once a troubleshooting section is added (e.g. about editing the required XML files, taking certain precautions, etc.), it will be much easier to get right the first time. Do let me know if you need any help with that.

@km-git-acc
Author

@marius311
We have started the project at http://anthgrid.com/dbnupperbound
and have already run a few jobs. Many thanks for the help, and do try it out.
While the server I am running it on should be fairly stable (a Google VM), I was also curious how frequently you perform MySQL backups, and whether running bin/db_dump --dump_spec db_dump_spec.xml from within the Docker shell is the best way to do it. Also, the /var/lib/docker/overlay2 folder fills up rapidly. How do you generally go about shrinking it from time to time?

@marius311
Member

> While the server I am running it on should be fairly stable (google vm), was also curious to understand how frequently do you perform mysql backups and is bin/db_dump --dump_spec db_dump_spec.xml from within the docker shell the best way to do it

I don't think I know what the best way is, but what I do for Cosmology@Home is putting the following in a docker-compose-admin.yml:

https://github.com/marius311/cosmohome/blob/2425e2f2b8ca9c2754088f18f91c010f6c778a04/docker-compose-admin.yml#L5-L11

which then allows a backup via docker-compose -f docker-compose.yml -f docker-compose-admin.yml run --rm backup-mysql. There's also a command in there for restoring a backup.

My version of docker doesn't seem to have /var/lib/docker/overlay2, what version are you on / what is it filling up with?

@km-git-acc
Author

km-git-acc commented Aug 11, 2018

@marius311
Thanks. I was able to back up the database.
Also copied the backup to a replica vm and tried restoring it there using

docker-compose down
docker-compose -f docker-compose.yml -f docker-compose-admin.yml run restore-mysql backups/mysql_2018-08-11.tar

which worked as well.
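When restoring, picking the newest backup can be automated. A sketch assuming backups follow the `backups/mysql_YYYY-MM-DD.tar` naming seen above, so that name order matches date order; the helper name is hypothetical.

```shell
# Print the path of the most recent mysql_YYYY-MM-DD.tar backup in a directory.
# Returns non-zero if there is none.
latest_backup() {
  local dir=${1:-backups} f latest=
  for f in "$dir"/mysql_*.tar; do
    [ -e "$f" ] || continue
    latest=$f   # globs expand in sorted order, so the last match is the newest
  done
  [ -n "$latest" ] && printf '%s\n' "$latest"
}
```

For example: `docker-compose -f docker-compose.yml -f docker-compose-admin.yml run --rm restore-mysql "$(latest_backup)"`.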

My Docker version is 18.06.0-ce (which should be close to the latest). The /var/lib/docker/overlay2 folder is on the host. It seems to contain information related to past and current volumes and has a lot of directories with long names. It also tends to fill up more, at least when docker build commands are used.
I googled a bit and found some related forum threads, e.g.
https://forums.docker.com/t/some-way-to-clean-up-identify-contents-of-var-lib-docker-overlay/30604/17
https://stackoverflow.com/questions/46672001/is-it-safe-to-clean-docker-overlay2
I tried docker system prune, which does reclaim some space, although it seems safer to keep the BOINC containers running at that time so that none of their dependencies get affected. Also, if something goes wrong, the MySQL backups are reassuring.

@marius311
Member

Nice, glad the backup/restore worked out. docker system prune sounds like the right thing. Reading the docs, volumes are never deleted with that unless you specify --volumes so there should be no possibility for data loss, even if none of the containers are running.
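A minimal maintenance sketch along those lines: show what Docker is using, then prune without `--volumes`, so volumes are left alone. The function name is hypothetical, and setting `DRY_RUN` prints the commands instead of running them.

```shell
# Report Docker disk usage, then remove stopped containers, unused networks,
# dangling images and build cache. Volumes are kept (no --volumes flag);
# -f skips the confirmation prompt.
docker_cleanup() {
  ${DRY_RUN:+echo} docker system df
  ${DRY_RUN:+echo} docker system prune -f
}
```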

@km-git-acc
Author

@marius311

Thanks. Also, one thing we have noticed is that sometimes when the environment changes, or a user removes and adds the project back again, all the files except the vmcontext iso and vboxwrapper get downloaded (observed on both windows and Mac machines). Those two however remain stuck and the event log shows a message like, Temporarily failed download of vboxwrapper_26200_windows_x86_64.exe: connect() failed
I have tried restarting the server and/or doing a docker-compose down and then up again (with and without build), but sometimes it resumes downloading these two and sometimes it doesn't. Have you faced a similar issue in the past? Given that all the other task-related files get downloaded, it doesn't seem to be a connectivity issue.

@marius311
Member

Can you file an issue for this over at https://github.com/marius311/boinc-server-docker/? Also, when you do, do you have instructions that reproduce it 100% of the time? I just tried and don't see anything like this, e.g. on project detach/reattach.

@km-git-acc
Author

@marius311
I have submitted the issue at marius311/boinc-server-docker#46.
For restarting the project server, I generally use (sometimes without the --build option):

docker-compose down
URL_BASE=http://anthgrid.com PROJECT=dbnupperbound docker-compose up -d --build

while for submitting jobs, I use commands like the below in a loop

bin/boinc2docker_create_work.py  --min_quorum 1 --target_nresults 1 dbnupperbound/arbcalc:v4 bash -c "./cellssum10e19 $i > /root/shared/results/cell_10e19_$i.txt"

In initial tests, we ran about 50 such jobs successfully.
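The loop itself is not shown above; a minimal sketch of what it might look like, reusing the exact submission command from this comment. The index range and the function name are hypothetical, and `DRY_RUN` prints each command (without its original quoting) instead of submitting it. Note that `$i` is expanded by the outer shell before the string is handed to `bash -c`, so each job gets its own argument and output file.

```shell
# Submit one single-replica job per cell index in [first, last].
submit_cells() {
  local first=$1 last=$2 i
  for i in $(seq "$first" "$last"); do
    ${DRY_RUN:+echo} bin/boinc2docker_create_work.py --min_quorum 1 --target_nresults 1 \
      dbnupperbound/arbcalc:v4 \
      bash -c "./cellssum10e19 $i > /root/shared/results/cell_10e19_$i.txt"
  done
}
```

For example, `submit_cells 1 50`, run from the same directory as the single-job command above.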
