This repository was archived by the owner on May 7, 2026. It is now read-only.

fix: fix pandas.cut errors with empty bins#1499

Merged
chelsea-lin merged 3 commits into main from main_chelsealin_cutbug
Mar 18, 2025

Conversation

@chelsea-lin
Contributor

Fixes internal issue 403638910 🦕

@chelsea-lin chelsea-lin requested review from a team and tswast March 17, 2025 23:36
@product-auto-label product-auto-label Bot added the labels size: m (Pull request size is medium.) and api: bigquery (Issues related to the googleapis/python-bigquery-dataframes API.) Mar 17, 2025
@chelsea-lin chelsea-lin requested a review from sycai March 17, 2025 23:36
Comment thread tests/system/small/test_pandas.py Outdated
Comment thread bigframes/core/reshape/tile.py Outdated
window_spec=window_specs.unbound(),
)
op = agg_ops.CutOp(bins, right=right, labels=labels)
if isinstance(bins, typing.Iterable) and len(as_index) == 0:
Contributor

Could we directly return the [pd.NA] * len(x) result from the elif branch len(list(bins)) == 0?

In that case we wouldn't need the additional if-else branch at the bottom of the function, and the logic would be more straightforward. Plus, we wouldn't need to alter the behavior of CutOp.output_type() in aggregations.py either.

Contributor Author

Any of the elif branches can end up with the [pd.NA] * len(x) result, for example:

  • 1st elif for pd.IntervalIndex.from_tuples([])
  • 2nd elif for []
  • 4th elif for [1]

Because of that, CutOp.output_type() might return a different result for each of the cases above.
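
For readers following the thread, the three inputs listed above can be recognized up front with plain pandas. This is only a rough sketch of the idea under discussion, not the code merged in this PR: `has_no_usable_bins` and `all_na_like` are hypothetical helpers, and the nullable `Int64` dtype is an assumption chosen for illustration.

```python
import pandas as pd


def has_no_usable_bins(bins) -> bool:
    """Return True when `bins` cannot define even one interval (hypothetical helper)."""
    if isinstance(bins, pd.IntervalIndex):
        # e.g. pd.IntervalIndex.from_tuples([])
        return len(bins) == 0
    if isinstance(bins, int):
        # An integer bin count is a different code path, not an "empty bins" case.
        return False
    # e.g. [] or [1]: fewer than two edges cannot form an interval.
    return len(list(bins)) < 2


def all_na_like(x: pd.Series) -> pd.Series:
    """Build the [pd.NA] * len(x) result, aligned to the index of `x`."""
    # Assumption: nullable Int64 is used here only as a stand-in dtype.
    return pd.Series([pd.NA] * len(x), index=x.index, dtype="Int64")


x = pd.Series([1, 2, 3])
for bins in (pd.IntervalIndex.from_tuples([]), [], [1]):
    if has_no_usable_bins(bins):
        result = all_na_like(x)
```

Because the three inputs reach the all-NA result through different branches, a single early check like this is one way to keep the output type uniform; whether that fits the structure of tile.py is a separate question that the thread itself leaves open.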