Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add support for Ibis 9.x #1034

Merged
merged 17 commits into from
Jun 28, 2024
Merged

Conversation

tokoko
Copy link
Collaborator

@tokoko tokoko commented Jun 14, 2024

  • Adds support for Ibis 9.x and drops all previous versions.
  • Adds test_parity.py checking execution parity between duckdb (with ibis), duckdb (with substrait) and acero.
  • All but one TPCH queries are now green.

@tokoko
Copy link
Collaborator Author

tokoko commented Jun 15, 2024

Worked around arrow nightly failures by first using duckdb backend's to_pandas method and then converting to arrow. Looks like to_pyarrow doesn't work with nightly arrow. it's passing a tuple to rename_columns method instead of a list. Should probably be "fixed" in ibis.

P.S. Still fighting nix about other failures...

@gforsyth
Copy link
Member

Hey @tokoko -- I'm going to take a more thorough look at this soon, but just an FYI that the nix failures are due to an upstream thing that has been fixed but is still propagating down to the various nixos package repos, so we'll just have to wait another day or two for it to resolve itself.

@tokoko
Copy link
Collaborator Author

tokoko commented Jun 17, 2024

@gforsyth sure, thanks for the info. I think that's the reason for 39 and 310 failures. 311 failure is strange though, duckdb is failing to download substrait extension only when in nix environment, works without a problem in poetry. I'll also look into other failures as well. The windows one is not too surprising :) but I'm not sure what semantic-version is complaining about. Any ideas there?

@gforsyth
Copy link
Member

duckdb is failing to download substrait extension only when in nix environment, works without a problem in poetry

Ahh, that's a nix sandboxing thing -- we have test markers in Ibis (upstream) to skip tests that involve extensions when doing a nix build.

I'm not sure what semantic-version is complaining about. Any ideas there?

Might just need to be updated in CI, let me check

gforsyth added 2 commits June 17, 2024 11:08
We no longer require sqlalchemy upstream so we can drop these
dependencies that were used for downstream testing here.
@gforsyth
Copy link
Member

Might just need to be updated in CI, let me check

Ok, a bump to conventional commits sorted it out.

gforsyth added 2 commits June 17, 2024 14:21
Linux+Nix have sandboxing that prevents duckdb extensions from downloading.
@gforsyth
Copy link
Member

Hey @tokoko -- I bumped the nix flake and it pulled in the upstream fix, so the nix builds are passing now (with the added xfails of the tests that require installing the duckdb substrait extension on linux)

@tokoko
Copy link
Collaborator Author

tokoko commented Jun 17, 2024

@gforsyth I saw that, thanks. I'll try to later reorganize tests with fixtures in a way that only duckdb+substrait portion can be skipped instead of entire tests. I think I added a "fix" for windows failure as well.

@tokoko
Copy link
Collaborator Author

tokoko commented Jun 18, 2024

I tried but substrait extension doesn't seem to work on windows, I'm not even able to download the binary from the browser. Acero seems to support more functionality, it wasn't adding much value anyway, so I went ahead and disabled it from tests for now.

@tokoko
Copy link
Collaborator Author

tokoko commented Jun 26, 2024

@gforsyth Hey, have you had an opportunity to dig through the changes?

@gforsyth
Copy link
Member

@gforsyth Hey, have you had an opportunity to dig through the changes?

currently digging! I have a few small changes to add, but nothing big. I think the biggest pieces left are to get in tests for window functions and to figure out if we need to keep _get_child_relation_field_offsets for anything.
That said, neither of those two things should block this PR.

@tokoko
Copy link
Collaborator Author

tokoko commented Jun 26, 2024

@gforsyth Sweet, sounds good.

get in tests for window functions

This is a bit of a pain as substrait consumer engines I used in parity tests didn't support them. Pretty sure datafusion supports window functions though, just couldn't get datafusion tests working yet.

figure out if we need to keep _get_child_relation_field_offsets for anything

I sort of convinced myself that we don't, but more tests covering edge cases would be better.

Copy link
Member

@gforsyth gforsyth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nice work here @tokoko -- thanks for tackling this!

Some cleanup comments, and a few questions for you, and a few questions for (future) me.

ibis_substrait/compiler/translate.py Outdated Show resolved Hide resolved
ibis_substrait/compiler/translate.py Show resolved Hide resolved
ibis_substrait/compiler/translate.py Outdated Show resolved Hide resolved
ibis_substrait/compiler/translate.py Outdated Show resolved Hide resolved
ibis_substrait/compiler/translate.py Outdated Show resolved Hide resolved
ibis_substrait/compiler/translate.py Outdated Show resolved Hide resolved
ibis_substrait/compiler/translate.py Outdated Show resolved Hide resolved
ibis_substrait/compiler/translate.py Outdated Show resolved Hide resolved
ibis_substrait/compiler/translate.py Outdated Show resolved Hide resolved
Comment on lines -32 to -47
"name": "extract:date"
}
},
{
"extensionFunction": {
"extensionUriReference": 3,
"functionAnchor": 3,
"name": "multiply:dec_dec"
}
},
{
"extensionFunction": {
"extensionUriReference": 3,
"functionAnchor": 4,
"name": "subtract:dec_dec"
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these scalar functions disappearing is a bit odd -- I'll have to dig in deeper (in a follow-up) to make sure this is working correctly

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried following these, but it was too much. couldn't really make sense of the existing plans, at least some of the the functions that disappeared didn't look to be used in original ibis code at all.

@tokoko
Copy link
Collaborator Author

tokoko commented Jun 28, 2024

Added handling for countDistinct, all tpch queries should be green now.

Copy link
Member

@gforsyth gforsyth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all of your work on this, @tokoko !

I'm going to merge this once CI passes. I'd like to add a few TPC-H parity tests just to make sure things are working as expected, but everything here looks good!

@gforsyth gforsyth merged commit 0c4aee8 into ibis-project:main Jun 28, 2024
12 checks passed
@tokoko tokoko deleted the bump-ibis-90 branch June 28, 2024 15:17
gforsyth added a commit that referenced this pull request Jul 1, 2024
BREAKING CHANGE: ibis-substrait no longer supports versions of Ibis <
9.x. If you are using an older version of Ibis, you can pin `ibis-substrait<4`

Co-authored-by: Gil Forsyth <[email protected]>
Co-authored-by: Gil Forsyth <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants