Fix title() and capitalize()#7717
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yml Review profile: CHILL Plan: Pro Run ID: ⛔ Files ignored due to path filters (2)
📒 Files selected for processing (6)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (5)
📝 WalkthroughWalkthroughReplaces ChangesUnicode casing & string title/capitalize overhaul
Sequence Diagram(s)sequenceDiagram
participant Caller
participant PyStr
participant ICU
participant Props
participant Bytes
Caller->>PyStr: call .capitalize() / .title() / .istitle()
PyStr->>ICU: titlecase_string / titlecase_first (language-aware mapping)
PyStr->>Props: query GeneralCategory / TitlecaseLetter (sigma/context checks)
ICU-->>PyStr: mapped titlecase segments (Writeable output)
PyStr->>PyStr: lowercase_or_sigma / handle_capital_sigma adjustments
PyStr->>Bytes: title_ascii for ASCII-only fast path or bytes chunks
Bytes-->>PyStr: ASCII-chunk results
PyStr-->>Caller: return final cased string (WTF-8: preserve invalid bytes)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
c533354 to
873c42b
Compare
📦 Library DependenciesThe following Lib/ modules were modified. Here are their dependencies: [x] test: cpython/Lib/test/test_str.py (TODO: 8) dependencies: dependent tests: (no tests depend on str) Legend:
|
873c42b to
109a290
Compare
|
@joshuamegnauth54 can you please fix the merge conflicts? |
Head branch was pushed to by a user without write access
109a290 to
cbce58c
Compare
|
Fixed and also I fixed a small clippy lint that I introduced. I'm not sure why it didn't trigger earlier, but one of my helpers should have been |
That's because we have added a new lint for that at #7762 :) |
|
I am sorry, I merged another PR first and it made conflict. Could you resolve conflict? I will merge it immediately |
Head branch was pushed to by a user without write access
cbce58c to
114fc53
Compare
|
Fixed and force pushed. 😁 |
`icu_casemap` is consistently maintained, official, and tracks the latest Unicode versions. RustPython is also using other `icu4x` crates, so using `icu_casemap` is more consistent. As with islower and isupper, tracking the latest Unicode version is important because character definitions shift over time which causes discrepancies between RustPython and CPython. This commit fixes title().
I dropped unicode-casing because it's cleaner to use icu4x for everything. `icu4x` will also stay up to date whereas unicode-casing will need to be periodically updated with new Unicode tables. Dropping unicode-casing also removes some binary bloat due to the tables. `capitalize()` mimics CPython behavior more closely now as well. Notably, I implemented CPython's sigma edge case handler.
114fc53 to
ce75f32
Compare
I fixed
title()andcapitalize()as well as enabled their respective tests. I fixed both by following CPython's logic closely, including implementing a workaround for sigma.Besides that, I removed an unneeded dependency (unicode-casing) by using icu4x instead. This helps with consistency and removes outdated Unicode tables from the final RustPython binary.
Summary by CodeRabbit
Bug Fixes
Chores
Refactor
Tests