Bug Fix: SSE Events lines MUST NOT contain \r #5868

Open
wants to merge 6 commits into
base: 3.1

mkarg
Member

@mkarg mkarg commented Feb 15, 2025

According to https://html.spec.whatwg.org/multipage/server-sent-events.html#parsing-an-event-stream, a line within an SSE event MUST NOT contain the characters \n or \r, nor the combination \r\n.
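
To illustrate the rule, here is a minimal sketch (not the code of this PR) that turns a payload into `data:` lines while treating `\r\n`, `\r` and `\n` each as a single line break. The class and method names are hypothetical, and dropping the final line terminator into a single `\n` is an assumption made for the example.

```java
class SseLineSplit {
    // Hypothetical helper: emits one "data: " line per payload line.
    // \r\n, a lone \r and a lone \n each count as exactly ONE line break,
    // so no emitted line ever contains a \r or \n character.
    static String toDataLines(String payload) {
        StringBuilder out = new StringBuilder("data: ");
        for (int i = 0; i < payload.length(); i++) {
            char c = payload.charAt(i);
            if (c == '\r' || c == '\n') {
                if (c == '\r' && i + 1 < payload.length() && payload.charAt(i + 1) == '\n') {
                    i++; // consume the \n half of a \r\n pair
                }
                out.append("\ndata: ");
            } else {
                out.append(c);
            }
        }
        return out.append('\n').toString();
    }

    public static void main(String[] args) {
        // Mixed line endings in one payload
        System.out.print(toDataLines("a\r\nb\rc\nd"));
    }
}
```

With the mixed-ending payload above, every break produces exactly one new `data:` line, and `\r\n` does not produce an empty line in between.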

@mkarg
Member Author

mkarg commented Feb 16, 2025

@jansupol This PR fails because of an incorrect copyright check in Jersey's pom.xml. According to the Eclipse Foundation's rules, all projects MUST accept the short form that contains only the initial publication date (see https://www.eclipse.org/projects/handbook/#ip-copyright-headers). Apparently Jersey's pom.xml expects to find the latest date, which is wrong. How do you want to proceed?

@jansupol
Contributor

@mkarg In the Jersey project, we follow the advice of the Oracle legal department to state the year of the last change in the copyright header. This is enforced by the glassfish copyright plugin created for that purpose. Do you have a hard time increasing the copyright year in the changed files?

@mkarg
Member Author

mkarg commented Feb 19, 2025

> @mkarg In the Jersey project, we follow the advice of the Oracle legal department to state the year of the last change in the copyright header. This is enforced by the glassfish copyright plugin created for that purpose. Do you have a hard time increasing the copyright year in the changed files?

You mean, besides me being an Eclipse Committer Member bound solely to Eclipse Foundation rules, not employed with Oracle, not bound to Oracle-internal rules?

The EF is pretty clear here:

> Do we need to specify a range of years in the copyright statement?
> No. In the past, legal advice was that the year of the initial creation of the content and the year of the last change should be reflected in the copyright header. This is no longer the case. Specify the year that the content was initially created in the copyright statement.

@mkarg
Member Author

mkarg commented Feb 21, 2025

@jansupol FYI: Fixed the copyright headers according to Oracle rules.

@mkarg
Member Author

mkarg commented Feb 23, 2025

Apparently you did not find the time to review / merge this PR, so I used the time to author a commit on top with a unit test for DataLeadStream. Unless I missed something, it should cover all possible edge cases around EOL handling (at the start, at the end, mixing of write(int) and write(char[]), etc.). It may be beneficial for the review; it actually was very beneficial when authoring the performance improvements of SSE (to be found eventually in separate PRs once this one is merged). 😃

@mkarg
Member Author

mkarg commented Feb 24, 2025

@jansupol Anything more needed to review / merge this bug fix? 🤔

@jansupol
Contributor

This is what I did: I created a brief test as follows:

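    // Excerpt: assumes java.io.IOException and java.io.OutputStream are imported, and that
    // DATA_LEAD and EOL are constants of the enclosing class (presumably byte[]; not shown here).
    private static final class DataLeadStream2 extends OutputStream { // Current PR DataLeadStream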
        private final OutputStream entityStream;

        private int lastChar = -1;

        DataLeadStream2(final OutputStream entityStream) {
            this.entityStream = entityStream;
        }

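        @Override
        public void write(final int i) throws IOException {
            // Output is delayed by one character so that \r\n can be detected as a single EOL.
            if (lastChar == -1) {
                // First character of the payload: open the first data line.
                entityStream.write(DATA_LEAD);
            } else if (lastChar != '\n' && lastChar != '\r') {
                // Previous character was regular payload: emit it now.
                entityStream.write(lastChar);
            } else if (lastChar == '\n' || (lastChar == '\r' && i != '\n')) {
                // Previous character ended a line. For \r followed by \n nothing is written yet;
                // the EOL is emitted once the \n itself becomes lastChar.
                entityStream.write(EOL);
                entityStream.write(DATA_LEAD);
            }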

            lastChar = i;
        }

        void finish() throws IOException {
            if (lastChar != -1) {
                write(-1);
            }
        }
    }

    private static final class DataLeadStream1 extends OutputStream { // Previous PR DataLeadStream
        private final OutputStream entityStream;
        private int lastChar = '\n';

        DataLeadStream1(final OutputStream entityStream) {
            this.entityStream = entityStream;
        }

        @Override
        public final void write(final int i) throws IOException {
            if (lastChar == '\n') {
                entityStream.write(DATA_LEAD);
            }
            entityStream.write(i);
            lastChar = i;
        }
    }

    private static final class DataLeadStream0 extends OutputStream { // Original class DataLeadStream
        private final OutputStream entityStream;
        private boolean start = true;

        private DataLeadStream0(OutputStream entityStream) {
            this.entityStream = entityStream;
        }

        @Override
        public void write(final int i) throws IOException {
            if (start) {
                entityStream.write(DATA_LEAD);
                start = false;
            }
            entityStream.write(i);
            if (i == '\n') {
                entityStream.write(DATA_LEAD);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        int SIZE = 100000;
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i != 200 * SIZE; i++) {
            sb.append("0123456789");
        }

        OutputStream voidOS = new OutputStream() {
            @Override
            public void write(int b) throws IOException {
                //Ignore
            }
        };

        long time = System.currentTimeMillis();
        DataLeadStream2 dls = new DataLeadStream2(voidOS); //Try various DataLeadStreams
        for (int i = 0; i != 1000000 / SIZE; i++) {
            dls.write(sb.toString().getBytes());
            dls.finish(); // This slows DataLeadStream2 down by about 50%
        }
        System.out.println(System.currentTimeMillis() - time);
    }

The performance of the streams differs between SIZE = 100 and SIZE = 10000. DataLeadStream0 is better for short messages and DataLeadStream1 for large messages (SIZE > 10000), but DataLeadStream2 is now the slowest.
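
For context on what the benchmark above measures: none of the three classes overrides the bulk write methods, so `dls.write(byte[])` falls back to the default `java.io.OutputStream` implementation, which calls `write(int)` once per byte. The following sketch makes that visible; `WriteCallCounter` and `CountingStream` are hypothetical names used only for this illustration.

```java
import java.io.IOException;
import java.io.OutputStream;

class WriteCallCounter {
    // Overrides only write(int), like the DataLeadStream variants in the benchmark.
    static final class CountingStream extends OutputStream {
        long singleByteCalls;

        @Override
        public void write(int b) {
            singleByteCalls++;
        }
    }

    // Returns how many write(int) calls one bulk write of payloadSize bytes causes.
    static long callsFor(int payloadSize) throws IOException {
        CountingStream cs = new CountingStream();
        cs.write(new byte[payloadSize]); // inherited bulk write, loops over write(int)
        return cs.singleByteCalls;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(callsFor(1000)); // one call per byte
    }
}
```

The per-byte branching inside `write(int)` is therefore paid for every byte of the payload, which is why small differences in that method show up clearly in the timings.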

What exactly is the purpose of this change? The original PR mentioned performance, this mentions the \r data in the message, but the real reason to me was the empty message at the end. Can you provide a use-case which justifies the change in SSE? Thanks.

@mkarg
Member Author

mkarg commented Feb 26, 2025

TL;DR: The purpose of this PR is not performance but solely correctness with respect to what is stated in the PR's description (this PR is just a bug fix). Performance will be recovered by a subsequent PR.

Explanation: The original PR you mention was intended to improve performance, but we both agreed that it fails because it fixes one bug while opening another; in addition, there has always been a bug where \r\n is not detected as one single EOL. Hence, I have separated the work as announced: First, this PR guarantees 100% correct syntax handling of all combinations of \r, \n, \r\n etc. Second, once this PR is merged, there will be another PR on top which improves performance (by implementing write(char[], int, int)). The original PR you mentioned will be superseded by that one. After both new PRs are merged, the status will be that:

  • Handling of `\r\n` is then 100% correct (thanks to this PR).
  • Performance will be much better, as strings are then sent using far fewer OutputStream.write calls than in the status quo (which does one such call per char): in the best case, a single write per String (thanks to the subsequent PR); in the worst case, it performs like the status quo.
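
The idea behind the second bullet can be sketched as follows. This is not the announced follow-up PR, only a hypothetical illustration under stated assumptions: it overrides `write(byte[], int, int)` (the bulk method `java.io.OutputStream` actually offers), forwards every run of bytes between line breaks with one bulk call, and normalizes `\r`, `\n` and `\r\n` to a single `\n`. The names `BulkDataLeadSketch`, `BulkStream` and `encode`, and the choice not to emit an empty trailing `data:` line, are assumptions.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

class BulkDataLeadSketch {
    static final byte[] DATA_LEAD = "data: ".getBytes(StandardCharsets.UTF_8);

    static final class BulkStream extends OutputStream {
        private final OutputStream out;
        private boolean atLineStart = true; // next payload byte needs a "data: " prefix
        private boolean lastWasCr = false;  // previous byte was \r; a following \n belongs to it

        BulkStream(OutputStream out) { this.out = out; }

        @Override
        public void write(int b) throws IOException {
            write(new byte[] {(byte) b}, 0, 1);
        }

        @Override
        public void write(byte[] buf, int off, int len) throws IOException {
            int runStart = off;
            final int end = off + len;
            for (int i = off; i < end; i++) {
                byte c = buf[i];
                if (c != '\r' && c != '\n') {
                    lastWasCr = false;
                    continue; // regular byte: stays in the current run
                }
                flushRun(buf, runStart, i);
                runStart = i + 1;
                if (c == '\n' && lastWasCr) {
                    lastWasCr = false; // second half of \r\n, break already written
                    continue;
                }
                if (atLineStart) out.write(DATA_LEAD); // empty payload line
                out.write('\n');
                atLineStart = true;
                lastWasCr = (c == '\r');
            }
            flushRun(buf, runStart, end);
        }

        // Forwards buf[from, to) with a single bulk call, prefixed by "data: " if needed.
        private void flushRun(byte[] buf, int from, int to) throws IOException {
            if (to > from) {
                if (atLineStart) { out.write(DATA_LEAD); atLineStart = false; }
                out.write(buf, from, to - from);
            }
        }

        // Terminates an unfinished last line.
        void finish() throws IOException {
            if (!atLineStart) { out.write('\n'); atLineStart = true; }
        }
    }

    // Test helper: writes each chunk with one bulk call and returns the encoded text.
    static String encode(String... chunks) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        BulkStream s = new BulkStream(sink);
        for (String chunk : chunks) s.write(chunk.getBytes(StandardCharsets.UTF_8));
        s.finish();
        return sink.toString(StandardCharsets.UTF_8.name());
    }
}
```

For a payload without line breaks this needs one bulk write for the payload plus one for the prefix; a payload consisting only of line breaks degrades to roughly one write per byte, matching the worst case described above. Because the `\r` state is kept in a field, a `\r\n` pair split across two `write` calls is still treated as one break.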

@jansupol
Contributor

While I agree that the current state does not work exactly as the SSE standard describes for the corner case of sending new lines, and that there is an unnecessary extra empty message, I do not see a legitimate reason for making a change that sacrifices performance. I agree that the change might be beneficial, but only if the performance were similar.

> Second, once this PR is merged, there will be another PR on top which improves performance

Sorry, we cannot merge a change that significantly degrades performance in the hope that some future work may fix it, knowing that it may never come. I am sure you understand this.

@mkarg
Member Author

mkarg commented Feb 28, 2025

I do not agree that bug fixes must only be merged if they do not sacrifice performance; in fact, they rather often do.

Nevertheless, I will start benchmarking with my already developed performance improvement, so we have comparable numbers.
