feat(bigframes): Support loading avro, orc data #16555

Open
TrevorBergeron wants to merge 4 commits into main from tbergeron_bf_read_orc_avro

@TrevorBergeron
Contributor

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes #<issue_number_goes_here> 🦕

@gemini-code-assist bot left a comment

Code Review

This pull request introduces support for reading ORC and Avro files into BigQuery DataFrames by implementing read_orc and read_avro methods in the Session class and providing corresponding API wrappers. Review feedback identifies a bug in the system tests where to_orc is called on a BigFrames DataFrame instead of a pandas DataFrame. Additionally, several improvements are suggested to maintain alphabetical order in imports and function definitions, along with a minor wording update for an error message to improve clarity.

@TrevorBergeron TrevorBergeron marked this pull request as ready for review April 6, 2026 21:10
@TrevorBergeron TrevorBergeron requested review from a team as code owners April 6, 2026 21:10
@TrevorBergeron TrevorBergeron requested a review from tswast April 6, 2026 21:10
@TrevorBergeron TrevorBergeron requested a review from a team as a code owner April 14, 2026 17:47
The write engine used to persist the data to BigQuery if needed.

Returns:
bigframes.dataframe.DataFrame:
Contributor

Nit: in docs, use bigframes.pandas.DataFrame so that we link here: https://dataframes.bigquery.dev/reference/api/bigframes.pandas.DataFrame.html#bigframes.pandas.DataFrame

This is less of a concern now that we've migrated off of Cloud RAD onto plain Sphinx, which does dedupe aliases, AFAIK, but I'd like to ensure we keep consistency.

Contributor Author

done

The engine used to read the file. Only `bigquery` is supported for Avro.

Returns:
bigframes.dataframe.DataFrame:
Contributor

Same here: bigframes.pandas.DataFrame in docs.

Contributor Author

done
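The docs nit can be illustrated with a hypothetical stub whose `Returns:` entry names `bigframes.pandas.DataFrame`, the public alias documented on the linked reference page, rather than `bigframes.dataframe.DataFrame`. The function name `read_avro_sketch` and its `path` parameter are made up for illustration; only the `engine` wording and the return description come from the diff.

```python
def read_avro_sketch(path: str, *, engine: str = "auto"):
    """Load an Avro file into a DataFrame (hypothetical stub, illustration only).

    Args:
        path (str):
            Path or URI of the Avro file (assumed parameter).
        engine (str, default "auto"):
            The engine used to read the file. Only `bigquery` is supported
            for Avro.

    Returns:
        bigframes.pandas.DataFrame:
            A new DataFrame representing the data from the Avro file.
    """
    # Stub only: the real implementation lives in the PR's Session class.
    raise NotImplementedError("illustration only")
```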

bigframes.dataframe.DataFrame:
A new DataFrame representing the data from the Avro file.
"""
if engine not in ("auto", "bigquery"):
Contributor

No action required, but I did see https://arrow.apache.org/blog/2025/10/23/introducing-arrow-avro/ last year, which should be enough to unlock a potential upstream contribution for a read_avro method in pandas.

Contributor Author

good idea
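The `engine not in ("auto", "bigquery")` check in the diff excerpt above can be sketched as a standalone validator. This is a hypothetical helper, not the PR's code: its name, the error text, and the choice to resolve `"auto"` to `"bigquery"` are assumptions, the last one based on the docstring stating that only `bigquery` is supported for Avro and on pandas having no `read_avro` to fall back to.

```python
_SUPPORTED_AVRO_ENGINES = ("auto", "bigquery")


def validate_avro_engine(engine: str) -> str:
    """Return the engine to use for Avro, rejecting unsupported values.

    Hypothetical helper; the PR's actual error message may differ.
    """
    if engine not in _SUPPORTED_AVRO_ENGINES:
        raise ValueError(
            f"Unsupported engine {engine!r} for reading Avro; "
            f"expected one of {_SUPPORTED_AVRO_ENGINES}."
        )
    # Assumption: "auto" resolves to BigQuery, since no local Avro reader exists.
    return "bigquery"
```

Naming the accepted values in the message is one way to address the review's request for a clearer error.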

@TrevorBergeron TrevorBergeron requested a review from tswast April 14, 2026 23:56