Treat 'PRECONDITION_FAILED' as 'PASS' for interop scoring purposes #178

Open
nt1m wants to merge 2 commits into main

Conversation

@nt1m (Member) commented Jul 8, 2023

'PRECONDITION_FAILED' means that the condition tested by `assert_implements_optional` is `false`.

For instance, a test might exercise the same feature several times with different optional video/audio codecs. If a codec is not supported, 'PRECONDITION_FAILED' is returned as the test's status.

https://web-platform-tests.org/writing-tests/testharness-api.html#optional-features

This is different from the API itself not being supported: in that case, `assert_implements_optional` would fail with an exception (since it couldn't evaluate the condition), and the status would be 'ERROR'.
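For illustration, here is a minimal testharness.js sketch (not from the actual test suite; the codec string and test body are made up) showing how the two statuses arise:

```js
promise_setup(async () => {
  // If the API is missing entirely (e.g. no VideoDecoder), this call throws,
  // and the harness reports ERROR for the whole file.
  const support = await VideoDecoder.isConfigSupported({codec: 'vp8'});

  // If the API exists but this particular codec is an unsupported optional
  // feature, the assertion fails and the harness reports PRECONDITION_FAILED;
  // the subtests below will not run.
  assert_implements_optional(support.supported, 'vp8 decoding supported');
});

promise_test(async t => {
  // ... actual decoding subtests would go here ...
}, 'decode a vp8 chunk');
```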
@nt1m requested review from jgraham and gsnedders, July 8, 2023 06:15
@DanielRyanSmith (Contributor)

This change treats every PRECONDITION_FAILED status encountered as a passing status. Is that how it should always be treated? The documentation linked above mentions that "A failing assert_implements_optional during setup is reported as a status of PRECONDITION_FAILED for the test, and the subtests will not run."

Is there merit in making a fix to aggregate this differently for the WebCodecs category? I'd just like to be certain that there aren't scenarios where PRECONDITION_FAILED could also mean a test should be considered failing (not to mention the confusing wording of "FAILED" here 😅).

@nt1m (Member, Author) commented Jul 11, 2023

Usually we recommend `assert_implements_optional` to test optional things, but we haven't been doing that for most of Interop 2023, and have resorted to splitting tests for the most part.

@jgraham commented Jul 11, 2023

Just treating these like passes makes me nervous; it seems very misleading to claim that not supporting a feature amounts to passing tests for that feature (in particular for something like webcodecs, if you only support, say, one media format out of four, you could get a 75% score without supporting the feature at all).

We could just ignore those subtests with a precondition failed result, which would make the per-browser score work better (you'd score out of the features that you do support rather than from the features you don't). However it would "break" the overall interop score (since that's computed based on the total number of subtests).

For webcodecs in particular, I wonder if the best solution would be to have an "interop" mode for the tests which picks a supported video format and then uses that for all the subtests (this is approximately what authors are expected to do). We could keep the per-format variants in the main wpt suite in case there are format-specific problems.
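To make the trade-off concrete, here is a hypothetical, simplified scoring sketch (not the actual interop scoring code; statuses and numbers are purely illustrative):

```js
// Score a list of subtest statuses under the two options discussed above.
function score(statuses, {preconditionFailedIsPass}) {
  let passed = 0, total = 0;
  for (const status of statuses) {
    if (status === 'PRECONDITION_FAILED' && !preconditionFailedIsPass) {
      continue; // option 2: drop from both numerator and denominator
    }
    total++;
    if (status === 'PASS' ||
        (status === 'PRECONDITION_FAILED' && preconditionFailedIsPass)) {
      passed++;
    }
  }
  return total ? passed / total : 0;
}

// A browser supporting one format out of four, and failing its tests:
const statuses = ['FAIL', 'PRECONDITION_FAILED', 'PRECONDITION_FAILED', 'PRECONDITION_FAILED'];
score(statuses, {preconditionFailedIsPass: true});  // 0.75, even though nothing actually passed
score(statuses, {preconditionFailedIsPass: false}); // 0, scored only against the subtests that ran
```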

@nt1m (Member, Author) commented Jul 11, 2023

> Just treating these like passes makes me nervous; it seems very misleading to claim that not supporting a feature amounts to passing tests for that feature (in particular for something like webcodecs, if you only support, say, one media format out of four, you could get a 75% score without supporting the feature at all).

In the case of WebCodecs, the precondition contains calls to the WebCodecs API, so if the browser does not support WebCodecs, you'd get an error instead of PRECONDITION_FAILED.

> We could just ignore those subtests with a precondition failed result, which would make the per-browser score work better (you'd score out of the features that you do support rather than from the features you don't). However it would "break" the overall interop score (since that's computed based on the total number of subtests).

I'm also fine with this if this is an easy solution (but it sounds like it isn't).

> For webcodecs in particular, I wonder if the best solution would be to have an "interop" mode for the tests which picks a supported video format and then uses that for all the subtests (this is approximately what authors are expected to do). We could keep the per-format variants in the main wpt suite in case there are format-specific problems.

This sounds similar to just excluding from Interop the test variants for codecs that are not widely supported across all browsers (which is somewhat what I'm suggesting in #375). This solution is also fine with me.

@jgraham commented Jul 11, 2023

> In the case of WebCodecs, the precondition contains calls to the WebCodecs API, so if the browser does not support WebCodecs, you'd get an error instead of PRECONDITION_FAILED.

We also don't correctly handle tests that return ERROR :/

> This sounds similar to just excluding from Interop the test variants for codecs that are not widely supported across all browsers (which is somewhat what I'm suggesting in #375). This solution is also fine with me.

If there's a single format we expect all participating engines to support I think only including that format in interop is the right way forward here.

@nt1m (Member, Author) commented Jul 11, 2023

> If there's a single format we expect all participating engines to support I think only including that format in interop is the right way forward here.

I'm also fine with this, though I have a slight preference for keeping all of the formats that are expected to be supported widely, instead of just one. Reason being that different formats sometimes have slightly different decoding/encoding rules that are reflected in the test, so it would be nice to have that test coverage.

I honestly have no strong opinion though; I would also be fine with just keeping one format.

@jgraham commented Jul 11, 2023

Yes, sorry s/one/one or more/. I just mean that if the intersection of formats that all participants will implement is not null we should restrict the Interop-included tests to that intersection rather than trying to figure out how to handle the cases where some implementations intentionally don't implement a format.
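As a sketch of that idea (the support matrix below is invented, purely to illustrate taking the intersection):

```js
// Hypothetical per-engine lists of formats each participant intends to implement.
const intendedSupport = {
  blink:  new Set(['avc1', 'vp8', 'vp9', 'av01']),
  gecko:  new Set(['avc1', 'vp8', 'vp9']),
  webkit: new Set(['avc1', 'vp9']),
};

// Interop would only include the format variants every engine supports.
const interopFormats = Object.values(intendedSupport)
  .reduce((acc, set) => new Set([...acc].filter(f => set.has(f))));
// => Set { 'avc1', 'vp9' }
```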

@gsnedders (Member)

> Just treating these like passes makes me nervous; it seems very misleading to claim that not supporting a feature amounts to passing tests for that feature (in particular for something like webcodecs, if you only support, say, one media format out of four, you could get a 75% score without supporting the feature at all).

How can you get that without supporting the feature at all? If you don't support the feature at all, you'll get 0%, provided the tests fail for non-support of Web Codecs but precondition-fail for non-support of a given codec.

Or are you considering the media format as part of the feature?

> For webcodecs in particular, I wonder if the best solution would be to have an "interop" mode for the tests which picks a supported video format and then uses that for all the subtests (this is approximately what authors are expected to do). We could keep the per-format variants in the main wpt suite in case there are format-specific problems.

I think, yes, that would be ideal—but that also requires more work than altering how the scoring works (and we still have this problem in the codebase—we need to deal with PRECONDITION_FAILED somehow).

@jgraham commented Jul 20, 2023

> How can you get that without supporting the feature at all? If you don't support the feature at all, you'll get 0%, provided the tests fail for non-support of Web Codecs but precondition-fail for non-support of a given codec.

Hypothetically one could just implement the API for checking codec support, always return unsupported, and then it would look like you were passing every test. I think, done deliberately, that would be considered bad-faith behaviour, but one can imagine similar situations arising if we unconditionally treat PRECONDITION_FAILED as PASS.

If there's a set of codecs that all of Blink/Gecko/WebKit support, I think the easiest solution here would just be to remove tests for other codecs from the Interop set, since it's apparent that we're not going to get Interop for codecs that not every browser implements. Unless that leaves us with no tests I think it's likely to be easier than the solution of creating a set of tests that pick a supported codec in each browser and use that.

@foolip (Member) commented Sep 15, 2023

It seems important to resolve this if it's affecting Interop 2023 scores in an unreasonable way, but I don't see a clear conclusion in web-platform-tests/interop#383.

Did y'all agree on something that would work here that we can go and implement?

Of the options I skimmed, I think updating/splitting the tests or filtering out PRECONDITION_FAILED seems the most practical. But that would leave the question about what wpt.fyi should show, and it should be in sync with interop scoring scripts.

@dalecurtis

I felt the conclusion that we parameterize/separate tests with optional features and come to consensus on what's included in Interop seemed reasonable, since Interop is already a project of consensus.

@jgraham commented Sep 18, 2023

Yes. I don't think we can include tests for optional features in Interop without getting additional agreement on what all participants will actually implement, and targeting the tests specifically at that agreement (e.g. we might have agreement that no one will implement the optional feature, or everyone will, or that there will be two behaviours allowed by the tests in Interop).
