Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash on several architectures #313

Closed
df7cb opened this issue May 25, 2022 · 6 comments
Closed

Crash on several architectures #313

df7cb opened this issue May 25, 2022 · 6 comments
Assignees
Labels

Comments

@df7cb
Copy link
Contributor

df7cb commented May 25, 2022

Seen on Debian on ppc64el with pointcloud 1.2.2:

16:03:12   SELECT PC_AsText(PC_Explode(PC_Patch(pt))) FROM pt_test;
16:03:12 ! server closed the connection unexpectedly
16:03:12 ! 	This probably means the server terminated abnormally
16:03:12 ! 	before or while processing the request.
16:03:12 ! connection to server was lost

Build log: https://pgdgbuild.dus.dg-i.net/job/pgpointcloud-binaries/27/architecture=ppc64el,distribution=sid/console
Build status: https://pgdgbuild.dus.dg-i.net/job/pgpointcloud-binaries/

@df7cb df7cb changed the title Crash on ppc64el and some 32-bit intel builds Crash on several architectures Jul 11, 2022
@df7cb
Copy link
Contributor Author

df7cb commented Jul 11, 2022

@pblottiere
Copy link
Member

Hello @df7cb,

First thank you for reporting this issue.

By taking a look with Valgrind on a pc_explode request, I noticed that fishy message:

==154307== 4 errors in context 3 of 4:
==154307== Conditional jump or move depends on uninitialised value(s)
==154307==    at 0x123D6708: pc_schema_from_pcid (pc_pgsql.c:389)
==154307==    by 0x123D4C63: pcpatch_unnest (pc_access.c:584)
==154307==    by 0x3C84E7: ExecMakeFunctionResultSet (execSRF.c:614)
==154307==    by 0x3EA717: ExecProjectSRF (nodeProjectSet.c:175)
==154307==    by 0x3EA804: ExecProjectSet (nodeProjectSet.c:105)
==154307==    by 0x3E42D8: UnknownInlinedFun (executor.h:248)
==154307==    by 0x3E42D8: ExecLimit (nodeLimit.c:96)
==154307==    by 0x3BF8D3: ExecProcNode (executor.h:248)
==154307==    by 0x3BF8D3: ExecutePlan (execMain.c:1632)
==154307==    by 0x3BF8D3: standard_ExecutorRun (execMain.c:350)
==154307==    by 0x5271BA: PortalRunSelect (pquery.c:921)
==154307==    by 0x5285CF: PortalRun (pquery.c:765)
==154307==    by 0x524146: exec_simple_query (postgres.c:1238)
==154307==    by 0x525DE9: PostgresMain (postgres.c:4347)
==154307==    by 0x234E5E: main (main.c:206)
==154307==  Uninitialised value was created by a heap allocation
==154307==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==154307==    by 0x65A78A: AllocSetContextCreateInternal (aset.c:468)
==154307==    by 0x640BE6: init_MultiFuncCall (funcapi.c:85)
==154307==    by 0x123D4C34: pcpatch_unnest (pc_access.c:568)
==154307==    by 0x3C84E7: ExecMakeFunctionResultSet (execSRF.c:614)
==154307==    by 0x3EA717: ExecProjectSRF (nodeProjectSet.c:175)
==154307==    by 0x3EA804: ExecProjectSet (nodeProjectSet.c:105)
==154307==    by 0x3E42D8: UnknownInlinedFun (executor.h:248)
==154307==    by 0x3E42D8: ExecLimit (nodeLimit.c:96)
==154307==    by 0x3BF8D3: ExecProcNode (executor.h:248)
==154307==    by 0x3BF8D3: ExecutePlan (execMain.c:1632)
==154307==    by 0x3BF8D3: standard_ExecutorRun (execMain.c:350)
==154307==    by 0x5271BA: PortalRunSelect (pquery.c:921)
==154307==    by 0x5285CF: PortalRun (pquery.c:765)
==154307==    by 0x524146: exec_simple_query (postgres.c:1238)

Then, when I run that specific request a lot, this leads to a crash sometimes.

Actually, it's due to the fact that we check if fcinfo->flinfo->fn_extra is NULL on the first call (after SRF_IS_FIRSTCALL()) for intializing the internal cache of schemas. However, it's not true due to SRF_FIRSTCALL_INIT() so it leads to some unstable behavior because the cache is not always initialised (thus the Conditional jump or move depends on uninitialised value(s) message returned by Valgrind).

I'm currently working on a PR for fixing that.

@df7cb
Copy link
Contributor Author

df7cb commented Aug 22, 2022

Glad to see progress here, thanks!

@pblottiere
Copy link
Member

Hello @df7cb,

I merged some bugfixes and made a new 1.2.3 release. Let me know if you still have issues.

@df7cb
Copy link
Contributor Author

df7cb commented Sep 13, 2022

Thanks Paul!

The architectures are fine now, modulo some test differences on big-endian machines that I muted in the Debian package, but that isn't a problem.

@pblottiere
Copy link
Member

The architectures are fine now

Good news 🎉

modulo some test differences on big-endian machines

Yes I didn't forget your previous ticket #278. I'm going to take a look asap.

Thanks a lot for your work @df7cb 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants