[Bug] 0.9.23 - The game engine died unexpectedly! (exit code 217)
I am the current package maintainer for Fedora Linux and I just built 0.9.23. You can actually finish the game but there's always a pop-up window saying:
The game engine died unexpectedly! (exit code 217)
We are very sorry for the inconvenience
If this keeps happening, please click the ‘Feedback’ button in the main menu!
I tried playing a demo to see whether I could reproduce the problem directly with hwengine. None of the demos are new enough, but I did get this:
$ hwengine Download/mission_3_trzecie_podejscie.51.hwd
Attempting to play demo file "Download/mission_3_trzecie_podejscie.51.hwd"
Hedgewars engine 0.9.23-r12836 (8610462e3d33) with protocol #53
Init SDL... ok
Init SDL_ttf... ok
Init SDL_image... ok
Loading /Graphics/hwengine.png [flags: 8] ok (32x32)
Number of game controllers: 0
Not using any game controller
Getting game config...
Loading /Fonts/DejaVuSans-Bold.ttf (12pt)... ok
Loading /Fonts/DejaVuSans-Bold.ttf (24pt)... ok
Loading /Fonts/DejaVuSans-Bold.ttf (10pt)... ok
Loading /Fonts/wqy-zenhei.ttc (12pt)... ok
Loading /Fonts/wqy-zenhei.ttc (24pt)... ok
Loading /Fonts/wqy-zenhei.ttc (10pt)... ok
Loading progress sprite: Loading /Graphics/Progress.png [flags: 6] ok (324x972)
Lua: Missions/Campaign/A_Classic_Fairytale/journey.lua loaded
Lua: /Scripts/Locale.lua loaded
Lua: /Scripts/Animate.lua loaded
Protocol version mismatch: engine is too new (got 51, expecting 53)
Freeing resources...
FIXME FIXME FIXME. App shutdown without full cleanup of texture list; read game0.log and please report this problem
An unhandled exception occurred at $00007F063C3B516C:
EAccessViolation:
$00007F063C3B516C
Not sure if it helps any...
Thanks,
Richard
Could you please run hwengine against a more recent test game? One that ends in .53.hwd. I can send you one if you need one.
--
Oh, what the heck. 1PLXzL1CBUD1kdEWqMrwNUfGrGiirV1WpH <= tip a hedgewars dev
Yes please, I looked through the demo section but did not see one for engine 53...
Thanks,
Richard
Well I tried using gdb to get some additional info but I'm assuming hwengine is in haskell? I installed the appropriate debuginfo package but didn't get any symbols:
Thread 1 (Thread 0x7ffff7fa7300 (LWP 23534)):
#0 0x00007ffff635416c in _dl_catch_error () from /lib64/libc.so.6
No symbol table info available.
#1 0x00007ffff65d56a5 in _dlerror_run () from /lib64/libdl.so.2
No symbol table info available.
#2 0x00007ffff65d501f in dlclose () from /lib64/libdl.so.2
No symbol table info available.
#3 0x00007ffff4027bee in CleanupVendorNameEntry.constprop.4 ()
from /lib64/libGLX.so.0
No symbol table info available.
#4 0x00007ffff402a161 in __glXMappingTeardown () from /lib64/libGLX.so.0
No symbol table info available.
#5 0x00007ffff4022dd7 in __glXFini () from /lib64/libGLX.so.0
No symbol table info available.
#6 0x00007ffff7de74b3 in _dl_fini () from /lib64/ld-linux-x86-64.so.2
No symbol table info available.
#7 0x00007ffff6239fc8 in __run_exit_handlers () from /lib64/libc.so.6
No symbol table info available.
#8 0x00007ffff623a01a in exit () from /lib64/libc.so.6
No symbol table info available.
#9 0x00007ffff621f891 in __libc_start_main () from /lib64/libc.so.6
No symbol table info available.
#10 0x0000000000000000 in ?? ()
No symbol table info available.
Ok, I got the proper debuginfo files installed and got the following
In GDB:
FIXME FIXME FIXME. App shutdown without full cleanup of texture list; read game0.log and please report this problem
Thread 1 "hwengine" received signal SIGSEGV, Segmentation fault.
__GI__dl_catch_error (objname=0x28d2c30, errstring=0x28d2c38, mallocedp=0x28d2c28, operate=0x7ffff65d4ff0 , args=0x2942aa0) at dl-error-skeleton.c:187
187 c.objname = objname;
In the GDB log:
Thread 1 (Thread 0x7ffff7fa7300 (LWP 24531)):
#0 __GI__dl_catch_error (objname=0x28d2c30, errstring=0x28d2c38, mallocedp=0x28d2c28, operate=0x7ffff65d4ff0 , args=0x2942aa0) at dl-error-skeleton.c:187
errcode = 0
c = {objname = 0x7fffb9b86fb8, errstring = 0xb9b86fc0, malloced = 0x28d2c28, errcode = 0x7fffffffd6b4, env = {{__jmpbuf = {281479271743488, 1, 281479271743489, 65537,
281479271743489, 140733193388033, 281479271743489, -6068862207235213056}, __mask_was_saved = 65537, __saved_mask = {__val = {43264400, 0, 140737289327168, 43264408,
45355424, 0, 12377881866474338560, 0, 43263888, 43263888, 140737289327168, 0, 140737488345304, 140737289324280, 140737323254718, 0}}}}}
old = 0x7fffb9b80000
#1 0x00007ffff65d56a5 in _dlerror_run (operate=operate@entry=0x7ffff65d4ff0 , args=0x2942aa0) at dlerror.c:163
result = 0x28d2c20
#2 0x00007ffff65d501f in __dlclose (handle=) at dlclose.c:46
No locals.
#3 0x00007ffff4027bee in CleanupVendorNameEntry (pEntry=pEntry@entry=0x2942790, unused=0x0) at libglxmapping.c:321
vendor = 0x2942790
#4 0x00007ffff402a161 in __glXMappingTeardown (doReset=doReset@entry=0) at libglxmapping.c:1061
cur__GLXvendorNameHash = 0x2942790
tmp__GLXvendorNameHash = 0x0
pEntry =
tmp =
#5 0x00007ffff4022dd7 in __glXFini () at libglx.c:2099
glas =
#6 0x00007ffff7de74b3 in _dl_fini () at dl-fini.c:235
array = 0x7ffff42306f0
i =
l = 0x28d3160
maps = 0x7fffffffd838
i =
l =
nmaps =
nloaded =
ns = 0
do_audit =
__PRETTY_FUNCTION__ = "_dl_fini"
#7 0x00007ffff6239fc8 in __run_exit_handlers (status=0, listp=0x7ffff65ce5b8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at exit.c:83
atfct =
onfct =
cxafct =
f =
#8 0x00007ffff623a01a in __GI_exit (status=) at exit.c:105
No locals.
#9 0x00007ffff621f891 in __libc_start_main (main=0x0, argc=0, argv=0x0, init=, fini=, rtld_fini=, stack_end=0x0)
at ../csu/libc-start.c:329
result =
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {4309931, 0, 4217114, 140737488346240, 4210861, 140737322809482, 140737322809482, 140737488346280}, mask_was_saved = -9080}},
priv = {pad = {0x3f4c86b30, 0x404084, 0x0, 0x598d0d1c8f7b7226}, data = {prev = 0x3f4c86b30, cleanup = 0x404084, canceltype = 0}}}
not_first_call =
#10 0x0000000000000000 in ?? ()
No symbol table info available.
Bottom of game0.log:
106021: [Con] ok
106256: [Cmd] confirm
106256: [Con] Freeing resources...
106256: FreeActionsList called
106256: [Con] FIXME FIXME FIXME. App shutdown without full cleanup of texture list; read game0.log and please report this problem
106256: Texture not freed: width=15 height=17 priority=0
halt at 106256 ticks. TurnTimeLeft = 2762
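A quick way to check whether another game0.log contains the same leftover-texture report is to count the matching lines. This is only a minimal sketch: the helper name `count_unfreed_textures` is made up for illustration, and the only thing taken from the log tail above is the `Texture not freed` wording.

```sh
#!/bin/sh
# Hypothetical helper (not part of Hedgewars): count "Texture not freed"
# lines in an engine log such as Logs/game0.log.
count_unfreed_textures() {
    # grep -c prints the number of matching lines. It exits non-zero when
    # the count is 0, so the status is masked to keep callers simple.
    grep -c 'Texture not freed' "$1" || true
}
```

Calling `count_unfreed_textures Logs/game0.log` prints the number of such lines, which makes it easy to compare a crashing run against a clean one without diffing the whole log.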
https://m8y.org/hw/test.53.hwd a quick demo against 0.9.23 - please run that
Thanks for the demo, but I used a "lastgame" and was able to get all the output above... Do you need something else?
The console output from the demo you provided is:
'Thanks for the demo, but I used a "lastgame" and was able to get all the output above... Do you need something else?'
So yeah, as noted before, your output above was an incomplete game:
Protocol version mismatch: engine is too new (got 51, expecting 53)
Followed by a texture misfree. I'd noticed we were destroying the GL context prior to that free so was hoping it was related.
Perhaps the save game was not actually being generated due to the error.
So. I wanted a game that actually ran correctly to completion.
Which we now have. Means I can compare a correctly functioning version to this one.
And... You forgot to give me game0.log ☹
Example:
/tmp$ DISPLAY=:0 ~/games/bin/hwengine test.53.hwd &> test.log
/tmp$ wc -l test.log Logs/game0.log
295 test.log
700 Logs/game0.log
995 total
That would give me a better idea of where your crash is occurring (which doesn't happen for me BTW).
Two additional questions.
1. If you run hwengine --stats-only does the crash occur?
2. If you build the engine against our physfs does the crash occur? -DPHYSFS_SYSTEM=0
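Both questions come down to comparing exit statuses between runs, so a small wrapper that saves the output and prints the status can help. This is a hedged sketch: `run_and_report` is a hypothetical name, and how hwengine expects its options to be ordered relative to the demo file is not confirmed here.

```sh
#!/bin/sh
# Hypothetical helper: run a command, save stdout and stderr to a log file,
# and print the exit status so that different runs can be compared.
run_and_report() {
    log=$1
    shift
    status=0
    # "|| status=$?" records a failing status without aborting under "set -e".
    "$@" >"$log" 2>&1 || status=$?
    echo "exit code: $status"
}
```

For example, `run_and_report test.log hwengine test.53.hwd` would leave the console output in test.log and print the status; repeating the run with `--stats-only` added to the engine arguments gives the second data point.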
Ok, I see your confusion... Only the first post was with the wrong game engine version. The other detailed posts were with a LastGame of the correct version.
No crash (exit code 0 I checked) when using --stats-only.
test.log:
game0.log:
I'll try rebuilding next.
Using the bundled physfs results are nearly identical:
game0.log looks identical but I didn't save a copy to do a diff...
Thanks,
Richard
The game log you just pasted into the thread does have that odd texture behaviour. Does it crash?
I do wish you were attaching them to an issue at https://issues.hedgewars.org or putting them in a pastebin. The forum is mangling these long pastes as smilies etc which makes diffing against my log difficult.
So at this moment in time I seem to have:
* a complete 0.9.23 protocol 53 game log from you that may or may not crash, and has a last-minute texture free.
* And a diff of a crash (with some copy/paste inconsistency) that at least confirms physfs is not to blame.
One interesting thing is:
0: OpenGL-- Renderer: GeForce GTS 450/PCIe/SSE2
0: |----- Vendor: NVIDIA Corporation
0: |----- Version: 4.6.0 NVIDIA 387.22
Personally I am using:
0: OpenGL-- Renderer: GeForce GTX 760/PCIe/SSE2
0: |----- Vendor: NVIDIA Corporation
0: |----- Version: 4.5.0 NVIDIA 384.90
We had a report of a crash in 0.9.23 from another linux nvidia closed source driver user (I use it myself due to certain games requiring it, but without crash). In his case, the crash seemed to be in a ~/.nv cache that I had no idea the driver maintained. It's curious because we haven't changed anything in our rendering layer. We switched to SDL2 in this release, but SDL2 simply acquires a GL context.
Anyway, obv for him --stats-only did not crash.
It would be interesting to see what gdb --args hwengine test.53.hwd yields as a backtrace.
HOWEVER. The failure to free all textures completely *could* be causing a crash if you are building 0.9.23-release ... maybe... we do something disturbing in the final cleanup where the fallback texture release was occurring after context release, so I switched that in:
https://hg.hedgewars.org/hedgewars/rev/12844
the release for comparison is
https://hg.hedgewars.org/hedgewars/rev/12839
You might want to maybe try hg up 0.9.23 (the branch not the release tag) to see if it helps. I'm going to try and figure out why a texture didn't get released. Not too sure what texture would be 15x17, but maybe something windbar related. I'll add more logging on creation of textures on my end.
So. Summary of things I'd like to know:
1) Crash only with binary blob NVIDIA? (y/n)
2) If yes to 1), does changing to another version (mine?) help?
3) Did --stats-only crash? I would expect the answer to be no if 1) is true.
4) Regardless of the answers to 1) and 3), could I have a complete backtrace against test.53.hwd? Thanks.
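One way to collect the backtrace requested in 4) without an interactive session is gdb's batch mode. The sketch below is an assumption about a convenient workflow, not something prescribed in this thread: `engine_backtrace` is a hypothetical helper name, and the `GDB` override variable exists only to make the wrapper easy to try out.

```sh
#!/bin/sh
# Hypothetical helper: run a program under gdb non-interactively and print
# a full backtrace of every thread once it stops (for example on SIGSEGV).
engine_backtrace() {
    # -batch : exit after the -ex commands have been executed
    # -ex    : gdb command to execute, in order
    # --args : everything after it is the program and its arguments
    "${GDB:-gdb}" -batch -ex run -ex 'thread apply all bt full' --args "$@"
}
```

Something like `engine_backtrace hwengine test.53.hwd > backtrace.txt 2>&1` would then produce a file that can be attached to an issue or put in a pastebin. If the program exits normally, gdb has no stack to print.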
Yes, the game does complete successfully but then I get the crash with exit code 217.
I can see about moving it to the issue tracker...
Yes, I tried it on my wife's Intel Core i5 w/ Intel graphics with the same result.
Ok, I'll see if I can create a patch from those commits (easier for package building than doing a checkout)
1) No
2) Will try a build
3) No
4) Will post to pastebin.
Thanks!
Richard
Ok, I don't seem to be getting a good debug build of hwengine... The standard Fedora compiler flags are more C/C++ centric... What do I need to do for hwengine?
Bah... I set the CMAKE_BUILD_TYPE and theoretically got debuginfo for hwengine but the stack trace still says it's missing symbols...
Well, lack of symbol table stuff would not be surprising with the nvidia blob - rather surprised you were able to reproduce on the intel machine. But if that one is using FOSS, might work better. As for the engine, yes, building with DEBUG is the way to go, that is:
cmake -DCMAKE_BUILD_TYPE="DEBUG" -DNOSERVER=1 -DBUILD_ENGINE_C=0 -DPHYSFS_SYSTEM=0
Pretty sure non-C is default, NOSERVER is just in case you have trouble w/ haskell deps, and no system physfs is to keep that variable out of the mix and to ensure symbols.
I can try that... Not worried about the server, I'm doing package builds which take care of the dependencies.
My desktop w/ nvidia is running Fedora 26 x86_64 and my wife's laptop is running Fedora 27 x86_64 without nvidia.
I did see in the build log that it looked like it was compiling the pascal to C first, so that option makes it build in native pascal?
Oh. wow.
Do not use the C build unless you're willing to put some time into it. It does NOT get enough love. It's 99% there and can in theory play games, and even has more performance, but does not get enough attention to be used reliably.
I would hope what you were seeing in the log was the C++ frontend tho - 99% of build time is the frontend and then the engine in pascal builds in a second or two.
Also physfs is in C
Ok, never mind, I was mistaken, I'm not building the C engine, I just thought it was like f2c (fortran to C) that it somehow converted the pascal to C first, but I can see that's not the case.
I do see this in the log but it's probably not relevant or a problem:
That's what that option does if you enable it, yes. unc0rr wrote a pas2c converter - it was to generate llvm for emscripten/wasm - unfortunately no one has stepped up on that front, because in theory Hedgewars has been playable in your web browser since 2011. Including fetching versioned files on-demand online and caching locally. Even AI. ah well.
(we still have a version up that will run if you download a browser from back then)
Ok, after following up on the Fedora development list, it appears the last line is correct, there are no symbols for address 0 so the output in comment 13 is as good as it gets.
Where do we go from here?
Thanks,
Richard
It was mentioned on the devel list that valgrind may be a better tool for this problem.
Here's the output:
https://pastebin.com/SzQYFvuW
Thanks,
Richard
Ok, here's the output from my wife's laptop which I think is a little more complete because she's using glx from mesa instead of nvidia...
https://pastebin.com/14S8fGJv
Thanks,
Richard
So. Are these builds against the 0.9.23-release tag or latest in the 0.9.23 branch?
Both of these are 0.9.23 plus a patch from the link you supplied earlier.
Didn't see much difference... I did a hg diff from 0.9.23 release to branch and applied it as a patch. The results are pretty much the same.
https://pastebin.com/LCDQpB5d
Can this be related to libglvnd? I don't know what other distros are using it.
Hm. No idea. Don't suppose you can test without it? And, yeah, it's definitely outside of our code. What doesn't make sense is what we could be doing to trigger it. Our GL code has not changed significantly. Hedgewars switched from SDL1.2 to SDL2 this release, but Hedgewars does its own GL calls, it doesn't use SDL for that.
Well, 0.9.22 works, even with libglvnd... I'm not sure how easy it is to remove or if it's even practical. I'll try tonight after $DAYJOB.
Well. If you are on linux, and have the patience to do so, bisecting would be super helpful. It could probably be semi-automated.
If you confirm the libglvnd thing tho, I can try to set up an env to match.
Wiki pages: BuildingOnLinux, EngineTestCases
The advantage of the engine test cases is they are simple Lua driven demo games that can be fired off in ctest without the strict synchronisation checks of a demo file.
So, you could just launch hg bisect, make install, ctest an arbitrary test, and look for crash.
Could mark test pass/fail based on crash even and head off for a coffee ☺
(that's assuming all our commits build ofc...)
Even so, I imagine it would probably be only a dozen builds. And, if you only built the engine (make install in hedgewars/), it builds almost instantly (frontend is Qt and kinda slow).
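The semi-automated bisect described above could look roughly like the sketch below. It relies on the documented exit-status convention of `hg bisect --command` (0 marks a revision good, 125 skips it, 127 aborts, any other non-zero status marks it bad). The function name `bisect_step`, the script name, and the exact build and test commands are assumptions for illustration.

```sh
#!/bin/sh
# Hypothetical per-revision check for "hg bisect --command".
# $1: build command, $2: test command (both run through sh -c).
bisect_step() {
    build_cmd=$1
    test_cmd=$2
    # A revision that does not build says nothing about the crash: skip it.
    if ! sh -c "$build_cmd"; then
        return 125
    fi
    # Test passed: mark the revision good.
    if sh -c "$test_cmd"; then
        return 0
    fi
    # Any test failure or crash marks the revision bad.
    return 1
}
```

Saved in a small script (say `bisect-step.sh`, a made-up name) that ends with a call such as `bisect_step 'make install' 'ctest -R <test name>'`, it could be driven with `hg bisect --reset`, `hg bisect --good 0.9.22-release`, `hg bisect --bad 0.9.23-release` and then `hg bisect --command ./bisect-step.sh`. Whether ctest reports a crash that happens during process exit as a failure would need to be checked first.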
Bah... I wish something would go right
All the tests are failing with the same error (example below)
Ok, I can get all test results to pass on 0.9.22-release:
The 0.9.22 branch (12 commits more) also passed.
For 0.9.23-release one failed:
When trying to run local builds, I keep getting issues with hwengine trying to load graphics files (png): it says "Unsupported image format", but hwengine is clearly linked to libpng.
My full RPM package builds don't run into this problem for some reason.
On a separate note, do you have a handy short version 51 demo?
Could it be because hwengine is linking both to SDL and SDL2?
The nice thing about building a package is that it's done in a chroot with just the dependencies it needs instead of having access to every devel package installed on my system, but it makes the bisecting much more difficult.
Bugzilla ticket submitted...
https://issues.hedgewars.org/show_bug.cgi?id=570
Unsupported image format is probably due to a physfs bug. That package has given us a lot of problems in the past.
The one lua test not passing is not a problem. I'm not sure why a test that always fails was checked in. Need to ask the committer about that. It messed up the Debian builds for a while since they run all tests. Anyway, just need to run one test to check the graphics thing, not all of them.
Did you have any luck with removing libglvnd? You said you were going to test that..
I'll try building 4c4f22cc3fa4 and see if it errors over here. That was built against our physfs and in a clean profile, right?
Could this be relevant?
https://bugzilla.libsdl.org/show_bug.cgi?id=2693
Thanks,
Richard
Maybe! I mean... We don't do the render ops, but we do destroy of course.
That said, it's still only happening on your system... did you have any luck w/ testing without that libglvnd?
Also, yeah, SDL2 was definitely the major change this release and has been all kinds of fun in terms of regressions. If you suspect that's the problem, instead of bisecting, could simply try building 0.9.22 with SDL2 enabled, and seeing what crashes (or doesn't).
I don't know how I would go about testing this particular bug tho. There's no activity, the bug is still open, there's no detail on what triggers it..
Does his test program crash for you?
Well, the original report came from a user who tested my update in Fedora, and I've reproduced it with two computers using nvidia binary drivers and one with intel graphics.
Yeah, I'm thinking it's SDL2 related but it appears both Fedora 26 and 27 are using the current stable release (2.0.7).
What's still weird is that when I build 0.9.22 on my system (with both SDL and SDL2 development packages installed), hwengine somehow gets linked to the SDL2 library. It's not anywhere in link.res, but it's getting picked up somehow.
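To see at a glance which SDL libraries a given binary resolves at run time, the output of `ldd` can be filtered. This is a small sketch under the assumption that the usual `name => path (address)` layout of ldd output applies; `sdl_libs` is a hypothetical helper name.

```sh
#!/bin/sh
# Hypothetical helper: read ldd output on stdin and print the distinct
# shared-library names that start with "libSDL".
sdl_libs() {
    # The first field of each ldd line is the library name.
    awk '$1 ~ /^libSDL/ { print $1 }' | LC_ALL=C sort -u
}
```

Running `ldd bin/hwengine | sdl_libs` on the 0.9.22 build would show whether both SDL 1.2 and SDL2 libraries are listed. Note that ldd also reports indirect dependencies, so an SDL2 entry may come from another library rather than from hwengine's own link line.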
Haven't tried removing the libglvnd packages (there are several of them and I don't know if it will break my system), but I'm attempting to set up a chroot and use Xnest to see if I can simulate it.
I may just have to cheat and force in symlinks to get it to skip libglvnd...
So, in 0.9.22 hedgewars/CMakeLists.txt the engine would run:
https://hg.hedgewars.org/hedgewars/file/0.9.22/cmake_modules/FindSDL1or2.cmake
Could short-circuit that to force 2 and see what happens maybe.
And just to be clear, when I was referring to lack of activity, I meant:
https://bugzilla.libsdl.org/show_bug.cgi?id=2693
The bug you linked me to - no idea where I'd go from there on that apart from usual debug technique we've been trying so far of isolating variables.
Yeah, I updated hedgewars/CMakeLists.txt based on 0.9.23 and got to compiling and then I ran into:
/builddir/build/BUILD/hedgewars-src-0.9.22/hedgewars/CMakeFiles/hwengine.dir/uStore.o: In function `USTORE_$$_CHFULLSCR$SHORTSTRING':
uStore.pas:(.text.n_ustore_$$_chfullscr$shortstring+0x130): undefined reference to `SDL_WM_SetIcon'
An error occurred while linking /builddir/build/BUILD/hedgewars-src-0.9.22/bin/hwengine
make[2]: Leaving directory '/builddir/build/BUILD/hedgewars-src-0.9.22'
make[2]: *** [hedgewars/CMakeFiles/hwengine.dir/build.make:160: bin/hwengine] Error 1
and:
/builddir/build/BUILD/hedgewars-src-0.9.22/QTfrontend/ui/widget/about.cpp: In constructor 'About::About(QWidget*)':
/builddir/build/BUILD/hedgewars-src-0.9.22/QTfrontend/ui/widget/about.cpp:104:34: error: 'SDL_Linked_Version' was not declared in this scope
const SDL_version *sdl_ver = SDL_Linked_Version();
^~~~~~~~~~~~~~~~~~
/builddir/build/BUILD/hedgewars-src-0.9.22/QTfrontend/ui/widget/about.cpp:104:34: note: suggested alternative: 'Mix_Linked_Version'
const SDL_version *sdl_ver = SDL_Linked_Version();
^~~~~~~~~~~~~~~~~~
Mix_Linked_Version
make[2]: *** [QTfrontend/CMakeFiles/hedgewars.dir/build.make:1917: QTfrontend/CMakeFiles/hedgewars.dir/ui/widget/about.cpp.o] Error 1
It doesn't build. I already tried that.
My stuff
Outside of you setting up a test Fedora system, is there anything else I can do on the gdb / stack trace / valgrind front?
I opened an issue with libglvnd just to see what they thought...
https://github.com/NVIDIA/libglvnd/issues/141#issuecomment-351864459
hrm. that seticon thing seems familiar. I think someone else ran into that too. just bitrot in the little-used (at the time) SDL2 code. That line can be removed tho. It's not important.
I added some console output to see if it could tell me anything interesting but I'm not sure if it's doing any good....
Freeing resources...
Closing SDL...
Freeing textures...
Textures freed...
IO freed...
Land freed...
Painted Land freed...
It's saying textures are freed but the error seems to indicate it's missing one, right?
But then you get other errors related to symbols that were removed in SDL2. I tried that. 0.9.22 uses enough SDL1 only stuff to make it very hard to build with SDL2.
The texture free is definitely very weird since I don't experience that myself... I put that list in a while ago when we were doing the iphone port, just to catch cleanup fails, and we haven't had any in a long time.
Looks like a bug in fpc 3.0.2 but I don't know why it only seems to be cropping up in Fedora...
https://bugzilla.redhat.com/show_bug.cgi?id=1526848
Thanks,
Richard
Windows 10
0.9.23 - The game engine died unexpectedly! (exit code 217)
Please help!