Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Initial bitfields support #215

Closed
wants to merge 1 commit into from
Closed

Conversation

XVilka
Copy link
Contributor

@XVilka XVilka commented Jan 3, 2019

Trying to address the problem of supporting bitfields (#205), please see the initial draft and provide your opinion/criticism/suggestions.

Please note, for now I implemented bitfields only for an assumption that it is GCC-like behavior, with __attribute__((packed)) or #pragma pack (push,1)

At this point only reading is implemented. Once it is polished and ready - I will do writing too, since more complex.

@XVilka
Copy link
Contributor Author

XVilka commented Jan 3, 2019

Travis error seems unrelated:

opam show /tmp/opam-ci-192 --raw
[WARNING] Failed checks on cstruct package definition from source at file:///tmp/opam-ci-192:
             error 57: Synopsis and description must not be both empty
Errors.
             error 57: Synopsis and description must not be both empty
'unset TESTS; OPAMYES=1 export OPAMYES; set -uex; cp cstruct.opam /tmp/opam-ci-192 && opam show /tmp/opam-ci-192 --raw | opam lint -' exited 1. Terminating with 1
The command "bash -ex ./.travis-docker.sh" exited with 1.

@XVilka
Copy link
Contributor Author

XVilka commented Jan 4, 2019

Please provide your feedback, so I can change the PR accordingly.

@avsm
Copy link
Member

avsm commented Jan 4, 2019

Thanks very much for all the hard work you've put into this! I'm a little torn about the approach, since we are reimplementing some difficult pieces of a C compiler directly within OCaml. In particular, the restriction on attributed-packed is what has me worried.

Before I start reviewing this, I wanted to open one alternate possibility. The ctypes library has comprehensive infrastructure to do compile-time probing of the C compiler in use (including cross-compilation) in order to determine various struct layouts. I've wanted a cstruct->ctypes bridge for some time, so one possibility is to use Ctypes' precise definitions of C layout to implement the Cstruct ppx for bitfields.

Another high level question: what is the usecase you want to solve with bitfields? I just want to make sure that the restrictions placed on the current PR (e.g. packed structs only) are compatible with what you need (e.g. is it a bitfield from a network protocol?)

@XVilka
Copy link
Contributor Author

XVilka commented Jan 5, 2019

@avsm yes, it is required for file formats parsing. Problem, there is no exact definition of even C compiler behavior in case of bitfields (this is why modern formats and protocols prefer to avoid them). Some compilers behave one way, some - another way. Most of the formats and protocols rely on the packed tightly, sequential bitfield layout though.

@XVilka
Copy link
Contributor Author

XVilka commented Jan 7, 2019

I cleaned up the code, solved the problem with length handling and now tests are green.

Strange... seems running dune runtest ppx_test doesn't really fail if I put a wrong test, or even doesn't print anything if I add Printf.printf "blabla\n";

@XVilka
Copy link
Contributor Author

XVilka commented Jan 7, 2019

Added also a fix for hexdump, but don't know how to embed the function properly in the PPX.

@XVilka
Copy link
Contributor Author

XVilka commented Jan 7, 2019

Nevermind, fixed also, now it is ready for the review - if the approach is good, I will add also the setters code.

@XVilka
Copy link
Contributor Author

XVilka commented Jan 7, 2019

Wait, it will not work for big endian machines, since on BE it should read bit-by-bit from the most significant bit, not just swap bytes and read from the least significant bits.

@XVilka
Copy link
Contributor Author

XVilka commented Jan 8, 2019

Ok, pushed also the big endian proper support, tested with various bitfield layouts.

@XVilka
Copy link
Contributor Author

XVilka commented Jan 9, 2019

With big-endian bitfields I met one problem - basically I need to know the size of the structure to count bits from the end, but t.len stores the size of the Cstruct buffer instead. Any advice?

@XVilka
Copy link
Contributor Author

XVilka commented Jan 15, 2019

So? What do you think about this approach?

@XVilka
Copy link
Contributor Author

XVilka commented Jan 21, 2019

Ping?

@avsm
Copy link
Member

avsm commented Jan 21, 2019

thanks for continuing with this -- i'm afraid I won't have time for a detailed review until next week at the earlier, but pinging @mirage/devs if anyone else has time.

To qddress this:

With big-endian bitfields I met one problem - basically I need to know the size of the structure to count bits from the end, but t.len stores the size of the Cstruct buffer instead. Any advice?

t.len is the integer length subset that the cstruct stores the underlying bigarray as well (but you shouldn't overflow that since the cstruct subset is the only thing that is valid to read from the cstruct interface).

It would be really useful to have a concrete network protocol that you want to support with this feature, so that we can verify that the LE/BE parsing is consistent. In particular, it's not clear to me what use non-network-endian parsing is in the case of bitfields since that's what all the internet protocols would use. Is there a consumer for this extra complexity?

@XVilka
Copy link
Contributor Author

XVilka commented Jan 21, 2019

@avsm well, see for example SquashFS 1.x filesystem format: https://github.com/plougher/squashfs-tools/blob/master/squashfs-tools/squashfs_compat.h#L440

Depending on the endianess it can be treated differently, and even C code working with this is a mess. This is why I want to have bitfields in Cstruct out of the box.

@dinosaure dinosaure self-requested a review January 21, 2019 17:48
@avsm
Copy link
Member

avsm commented Jan 22, 2019

@XVilka -- squashfs sounds like an excellent motivator for this feature. Thanks for the clarification.

@XVilka
Copy link
Contributor Author

XVilka commented Feb 13, 2019

Sorry, was busy with other things, will finish the PR in a few days.

@XVilka
Copy link
Contributor Author

XVilka commented Mar 6, 2019

Ok, rebased, updated a bit, now it passes tests also. Let me refactor and cleanup a bit, also I think #231 should be merged first, so I will rebase on top of it.

@XVilka
Copy link
Contributor Author

XVilka commented Mar 13, 2019

@dinosaure @avsm can you please take a look again? I resolved your comment on splitting the function, it is rebased upon latest master and passes dune runtest.

@XVilka
Copy link
Contributor Author

XVilka commented Mar 14, 2019

@emillon you might be interested in reviewing this one too, since you did a lot of changes lately.

Copy link
Contributor

@emillon emillon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this is a reasonable feature to have, however I don't think I'd be comfortable merging this for now: this adds a very large amount of new code (+50% compared to the existing base), with complex branching, and changes important assumptions that this library makes (for example, it's harder to understand what underlying bytes are affected by each load or store).

If this is possible, I suggest building a separate library on top of cstruct that deals with bitfields and unaligned accesses. If this would rely on primitives that don't exist yet in cstruct, we can add these more easily.

What do you think of this plan?

@@ -27,10 +27,11 @@ type buffer = (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Arr
* indeed maintains this invariant.
*)

(* Note that offset is stored in both bits and bytes! *)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to rename the field as off_bits or encapsulate this in an abstract size type - that's better than a comment in my mind

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the end, this update adds lot of headache when {get,set}_uint{8,16,32,64} did not changes and believed on off concerns bytes (and not bits).

For example, memset divides by 8 off, I don't understand why.

And it seems that any get_bits requires an absolute off as an argument (and did not rely on internal off offset).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can change these functions to use t.off if you think it is better, that's not a problem.

@@ -282,13 +283,923 @@ let get_uint64 swap p t i =
let r = ba_get_int64 t.buffer (t.off+i) in
if swap then swap64 r else r [@@inline]

(* o - bitoffset, i - count of bits *)
(* It assumes we use __attribute__((packed)) with alignment = 1 byte *)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this assumption hold in general? What are the alternatives? Should this be configurable? 🙂

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could make it configurable, then it would become not +50%, but +500% of additional code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only viable alternative if we do not assume this is to switch to using ocaml-ctypes (which does handle real C alignments) in order to determine a struct layout (see my earlier comment).

I'm in two minds about how to handle this usecase -- an alternative approach to solve the problem of bitfields is to define a ctypes struct layout, do the stub generation/probing that ctypes does already, and then re-expose those offsets using the Cstruct conventions. That would involve actually removing almost all the code in ppx_cstruct since the C compiler would do the bulk of the work then :-)

get_uint16 swap p t i

let get_uint32_chop_boundary swap p t i =
if i + 4 > t.len then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that these branches could be refactored in to a match i - t.len with ... (or match () with _ when cond1 -> ... | _ when cond2 -> ... which would make all branches at the same level)

else
(* Check if it crosses the boundary *)
if i mod align = 0 then
(* TODO: Consider also the length of the buffer here *)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's asking you to do this :)

[%expr fun v ->
[%e match prim with
|Char |UInt8 ->
[%expr [%e m "get_bits_uint8"] v [%e num f.off] [%e num len]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These bits can be factored out by computing the getter function for each prim type.

@XVilka
Copy link
Contributor Author

XVilka commented Mar 14, 2019

If this is possible, I suggest building a separate library on top of cstruct that deals with bitfields and unaligned accesses. If this would rely on primitives that don't exist yet in cstruct, we can add these more easily.

I will think about it. Might be a good idea, something like cstruct-bitfield, adding the ability to parse [@bits X] attribute and expanding it before passing the rest to Cstruct. I only have to think about on how to separate the code from the Cstruct.

@XVilka
Copy link
Contributor Author

XVilka commented Mar 14, 2019

I skimmed through code and I don't think it is feasible to do it, since it affects other, non-bit fields, so I have no idea how to transform the AST the way that will be still useful for Cstruct and it will use right offsets, affected by alignments and bits count. See https://github.com/mirage/ocaml-cstruct/pull/215/files#diff-5a05ea9a41bc9b18b0a60f2946652f18R130 for example, it changes the offset value, depending if previous field was a bitfield.

Copy link
Member

@dinosaure dinosaure left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a fast pass on your PR and some comments come to my mind again. As @emillon said, your PR will change a lot what cstruct. Semantics are not really coherent (see my first review) and it will be easily a bug niche at the end for others contributors/reviewers.

I will review more deeply next week your PR but I start to believe that it may be more interesting to separate your task in another sub-module which will be more coherent I think.

NOTE: I'm not able to review ppx things.

@@ -27,10 +27,11 @@ type buffer = (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Arr
* indeed maintains this invariant.
*)

(* Note that offset is stored in both bits and bytes! *)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the end, this update adds lot of headache when {get,set}_uint{8,16,32,64} did not changes and believed on off concerns bytes (and not bits).

For example, memset divides by 8 off, I don't understand why.

And it seems that any get_bits requires an absolute off as an argument (and did not rely on internal off offset).

@avsm
Copy link
Member

avsm commented Mar 20, 2019

I am tending to agree with the review comments here; the complexity of bitfields is enormous vs the general trend for cstruct to be more explicit with respect to atomicity of multi-word operations.

Looking at the squashfs example again, it strikes me that this would be a straightforward ctypes binding since the struct/unions are all there already, iff bitfields were added to ctypes. @XVilka, it might be worth giving that a try -- if the non-bitfield pieces of SquashFS are straightforward to bind with ctypes, then the stub generation support will give really good compiler-driven layout information and so work with both packed and non-packed attributes. (cc @yallop)

@XVilka
Copy link
Contributor Author

XVilka commented Mar 20, 2019 via email

@avsm
Copy link
Member

avsm commented Mar 20, 2019

I've been thinking for some years that the Ctypes description should be the 'type' behind the Cstruct definition. Right now the ppx just interprets a syntax directly, but doesn't really have an OCaml data structure that describes it.

The rough idea is to use Ctypes 'behind the scenes' in ppx_cstruct in order to determine the layout/offsets of each value, and then expose the Cstruct offsets as today. The main difference is that Ctypes will give accurate offsets for any C compiler, vs the reimplementation we have to do today in ppx_cstruct.

@avsm
Copy link
Member

avsm commented Mar 20, 2019

Another thought with this patchset is that to reduce the risks to non-bitfield users, we could also copy the code into a ppx_cstruct_bitfield package here, and maintain it as an experimental ppx alongside ppx_cstruct.

If we advertise it as experimental, it's ok for it to not keep up with the featureset in ppx_cstruct (which I doubt will develop very fast anyway), and @XVilka doesn't need to keep a private fork for his usecases.

@XVilka
Copy link
Contributor Author

XVilka commented Mar 21, 2019

Ok, I will think how to proceed.

@XVilka
Copy link
Contributor Author

XVilka commented Dec 9, 2019

Should I close this one? Seems unlikely it will be the part of Cstruct itself. I just wonder if it is possible to do somehow outside of Cstruct without much of the copy-pasting. Any advice would be welcomed.

@avsm
Copy link
Member

avsm commented Dec 9, 2019

I never got a reply to my comment earlier proposing a way forward.

@XVilka
Copy link
Contributor Author

XVilka commented Dec 9, 2019

@avsm oh, sorry. I indeed can make a separate module/package for ppx. I will do this.

@mrvn
Copy link

mrvn commented Nov 2, 2020

Bitfields should not assume or require attribute packed. I'm not even sure packed has the effect you assume. Lets start with how bitfields are layed out by the compiler. Hopefully from the examples you can build some tests to cover all the corner cases:

The layout of a bitfield can be complex. The first problem is that you have to figure out if the bitfield starts counting from the MSB or LSB. It usually matches the endianness but you just have to know what the compiler will do. I don't believe there is any define for this. All the examples below match what gcc does. I believe this is universal but that's what test cases are for.

The next thing is that the compiler will pack members of a bitfield into the specified type one after the other but a member never bridges the border of the specified type. So for example

struct {
    uint8_t a:5;
    uint8_t b:5;
    uint8_t c:5;
};

will be 3 bytes, each holding 5 bit data and 3 bit padding. The bits are not packed into 15 bit data and 1 bit padding as you get with

struct {
    uint16_t a:5;
    uint16_t b:5;
    uint16_t c:5;
};

Alignment and padding rules of the bitfield follow the specified type. This can be tricky when you mix types:

struct {
    uint16_t a : 7;
    uint8_t  b : 5;
};

Here a takes the first 7 bits, obviously. But then there is 1 padding bit, 5 bits for b and 3 padding bits again. b is packed together with a into a single uint16_t but aligned so it doesn't cross the uint8_t border.

struct {
    uint8_t  a : 5;
    uint16_t b : 7;
};

Similar a and b are combined into a single uint16_t. But now there is no padding between them as b does not cross an uint16_t boundary.

Let me describe this in another way. Bitfields will be packed as closely as possible except padding is added so that each member can be accessed (as if) by a single aligned read/write of the specified type for that member. So in the last example b must be readable with an aligned 16 bit access and then masking and shifting. No padding is required for this. In the example before that b must be readable with a single 8 bit access, which requires 1 padding bit.

PS: I just checked. attribute(packed) ignores those rules and packs bits even more closely. BUT
foo.c:8:1: note: offset of packed bit-field ‘b’ has changed in GCC 4.4
That's probably old enough so it can be ignored. I assume prior to 4.4 packed did not pack bitfields. How hard would it be to support packed and not packed?

@XVilka
Copy link
Contributor Author

XVilka commented Nov 3, 2020

@mrvn thanks a lot for the insights. Sadly, after multiple tries I decided the resulting code would be too complex. Thus, I am not planning to pursue this goal anymore - I handled all bitfields manually in the corresponding parsers.

@XVilka
Copy link
Contributor Author

XVilka commented Aug 17, 2021

I am closing this then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants