Skip to content

Enforce int_max_str_digits on int-to-str conversions#7688

Merged
youknowone merged 4 commits intoRustPython:mainfrom
changjoon-park:enforce-int-max-str-digits-on-conversion
Apr 27, 2026
Merged

Enforce int_max_str_digits on int-to-str conversions#7688
youknowone merged 4 commits intoRustPython:mainfrom
changjoon-park:enforce-int-max-str-digits-on-conversion

Conversation

@changjoon-park
Copy link
Copy Markdown
Contributor

@changjoon-park changjoon-park commented Apr 26, 2026

RustPython enforced sys.get_int_max_str_digits() on the str → int direction (via bytes_to_int) but not the int → str direction. CPython 3.14 enforces both per PEP 644's DoS mitigation. This adds the missing checks.

Entry points covered

Path Site
repr(int) builtins/int.rs::repr_str
str(int) protocol/object.rs::str fast path
f-string, format(int, …) builtins/int.rs::__format__ (decimal/n/empty only)
'%d' % int (str + bytes %) vm/cformat.rs Decimal{D,I,U} branches

Binary bases ('x', 'o', 'b') remain exempt per CPython spec.

Implementation

  • New check_int_to_str_digits helper in builtins::int — bit-count fast path (skip if < 1500 bits ≈ 452 digits), then upper bound bits × log10(2) + 1 vs the configured limit.
  • New FormatSpec::is_decimal_int_format() in common::format for the decimal-vs-binary branch decision in __format__.

Tests unmasked (8)

  • test_int.IntStrDigitLimitsTests::test_max_str_digits
  • test_int.IntStrDigitLimitsTests::test_denial_of_service_prevented_int_to_str
  • test_int.IntStrDigitLimitsTests::test_int_from_other_bases
  • (same 3 mirrored in IntSubclassStrDigitLimitsTests)
  • test_ast.AST_Tests::test_repr_large_input_crash
  • test_reprlib.ReprTests::test_numbers

Verification

  • 8 modules pass (test_int test_ast test_reprlib test_decimal test_fractions test_compile test_sys test_cmd_line), 1,276 tests, no regressions
  • Snippet at extra_tests/snippets/builtin_int.py covers all 9 decimal paths + 3 binary-base exemptions + disabled-limit; passes on RustPython and CPython 3.14.4
  • Boundary cases (4299/4300/4301 digits at limit=4300) byte-identical with CPython 3.14.4

Summary by CodeRabbit

  • New Features
    • Decimal integer-to-string conversions now enforce configured digit limits and raise ValueError when exceeded across str(), repr(), format(), f-strings, and percent formatting; non-decimal (hex/oct/bin) formatting remains unaffected.
  • Tests
    • Added a regression test validating the digit-limit behavior and the ability to disable the limit.

The str-to-int direction already enforced sys.get_int_max_str_digits()
via bytes_to_int; the int-to-str direction did not. CPython 3.14 enforces
both per PEP 644.

Adds check_int_to_str_digits helper in builtins::int (bit-count fast path
+ digit upper-bound from log10(2)), wired into the four Python-level
entry points: repr, the str fast path in protocol::object, int.__format__
(decimal/n/empty spec only — binary bases x/o/b are exempt per CPython),
and the DecimalD/I/U branches of vm::cformat for both str % and bytes %.

Unmasks 8 expectedFailure tests across test_int (max_str_digits, DoS
prevention, int_from_other_bases — each mirrored in IntSubclass),
test_ast (test_repr_large_input_crash) and test_reprlib (test_numbers).
Boundary cases (4299/4300/4301 digits at limit=4300) match CPython 3.14.4.
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 26, 2026

Important

Review skipped

Review was skipped due to path filters

⛔ Files ignored due to path filters (1)
  • Lib/test/test_int.py is excluded by !Lib/**

CodeRabbit blocks several paths by default. You can override this behavior by explicitly including those paths in the path filters. For example, including **/dist/** will override the default block on the dist directory, by removing the pattern from both the lists.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 470b44a1-24c2-4313-9dd2-f41926a09ae1

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds PEP 644 enforcement for decimal integer-to-string conversions by introducing a FormatSpec helper and a VM-level digit check; the check is invoked across decimal formatting paths (str/repr/format/f-strings) and rejects conversions exceeding sys.get_int_max_str_digits().

Changes

Cohort / File(s) Summary
Format classification
crates/common/src/format.rs
Adds pub fn is_decimal_int_format(&self) -> bool on FormatSpec to classify specs that should follow decimal-integer digit limits (covers unset, d, and lower-case n).
Integer validation core
crates/vm/src/builtins/int.rs
Adds pub(crate) fn check_int_to_str_digits(value: &BigInt, vm: &VirtualMachine) -> PyResult<()> and updates PyInt::repr_str signature to accept vm, running the digit check before producing decimal string output.
Formatting integration
crates/vm/src/cformat.rs, crates/vm/src/protocol/object.rs
Integrates check_int_to_str_digits() into decimal formatting paths for bytes and strings, for PyInt, PyFloat (via bigint conversion), and __int__ fallback; PyObject::str now runs the check on exact-PyInt branch.
Tests
extra_tests/snippets/builtin_int.py
Adds regression test exercising sys.set_int_max_str_digits() behavior across str(), repr(), f-strings, %d, bytes formatting, and format(); verifies non-decimal bases are unaffected and restores original limit.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

"I nibble digits, one by one,
Counting till the work is done.
If numbers grow too tall and wide,
I thump my foot — the VM won't glide.
PEP-safe hops, a rabbit's pride." 🐇✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately and concisely summarizes the main change: enforcing int_max_str_digits limits on integer-to-string conversions, which is the core objective of the PR.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 26, 2026

📦 Library Dependencies

The following Lib/ modules were modified. Here are their dependencies:

(module 'ast test_int test_reprlib' not found)

Legend:

  • [+] path exists in CPython
  • [x] up-to-date, [ ] outdated

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
crates/common/src/format.rs (1)

484-489: Optional: narrow Number(_) to Number(Case::Lower).

Number(Case::Upper) ('N') is rejected as UnknownFormatCode('N', "int") by format_int, so it isn’t actually a decimal-int output path. Including it here means a huge int formatted with "N" will raise ValueError("Exceeds the limit …") before the format-code error has a chance to surface, changing the message users see for an invalid spec. Functionally both are ValueError, so this is a low-priority polish; restricting the helper to formats that genuinely produce decimal output keeps the helper’s name honest:

♻️ Proposed narrowing
     pub fn is_decimal_int_format(&self) -> bool {
         matches!(
             self.format_type,
-            None | Some(FormatType::Decimal) | Some(FormatType::Number(_))
+            None | Some(FormatType::Decimal) | Some(FormatType::Number(Case::Lower))
         )
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/common/src/format.rs` around lines 484 - 489, The helper
is_decimal_int_format currently treats FormatType::Number(_) as decimal, but
Number(Case::Upper) ('N') is rejected later by format_int (producing
UnknownFormatCode('N', "int")), causing an order-change in errors; update
is_decimal_int_format to only return true for None, Some(FormatType::Decimal),
or Some(FormatType::Number(Case::Lower)) so it truly represents formats that
produce decimal output, referencing the is_decimal_int_format function,
FormatType::Number, Case::Lower, and format_int/UnknownFormatCode to locate the
relevant logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@crates/common/src/format.rs`:
- Around line 484-489: The helper is_decimal_int_format currently treats
FormatType::Number(_) as decimal, but Number(Case::Upper) ('N') is rejected
later by format_int (producing UnknownFormatCode('N', "int")), causing an
order-change in errors; update is_decimal_int_format to only return true for
None, Some(FormatType::Decimal), or Some(FormatType::Number(Case::Lower)) so it
truly represents formats that produce decimal output, referencing the
is_decimal_int_format function, FormatType::Number, Case::Lower, and
format_int/UnknownFormatCode to locate the relevant logic.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 021928ab-60d9-4eb4-a307-192b88cd47d4

📥 Commits

Reviewing files that changed from the base of the PR and between 625e5bf and 734136a.

⛔ Files ignored due to path filters (3)
  • Lib/test/test_ast/test_ast.py is excluded by !Lib/**
  • Lib/test/test_int.py is excluded by !Lib/**
  • Lib/test/test_reprlib.py is excluded by !Lib/**
📒 Files selected for processing (5)
  • crates/common/src/format.rs
  • crates/vm/src/builtins/int.rs
  • crates/vm/src/cformat.rs
  • crates/vm/src/protocol/object.rs
  • extra_tests/snippets/builtin_int.py

Copy link
Copy Markdown
Contributor

@ShaharNaveh ShaharNaveh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TYSM for all the PRs recently:)

The test_denial_of_service_prevented_int_to_str regression test uses
support.Stopwatch, which calls time.get_clock_info('monotonic'). In
RustPython that function is gated to unix/windows targets only, so on
wasm32-wasip1 it surfaces as AttributeError and breaks the wasm-wasi CI.
Guard the test with skipUnless(hasattr(time, 'get_clock_info'), ...) so
it runs everywhere it can and is skipped on wasm.

Also narrow is_decimal_int_format to Number(Case::Lower): 'N' is rejected
by format_int as UnknownFormatCode, so excluding it preserves that error
path instead of intercepting it with the digit-limit check.
@changjoon-park
Copy link
Copy Markdown
Contributor Author

TYSM for all the PRs recently:)

thanks for checking ! and also, i pushed a follow-up that skips the DoS test on wasm — it depends on time.get_clock_info, which is unix/windows-only in RustPython and surfaces as AttributeError via support.Stopwatch.

thanks for keep watching :)

Copy link
Copy Markdown
Contributor

@ShaharNaveh ShaharNaveh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For custom patches we need to set some type of comment saying that this is our patch so update_lib can migrate the custom patches:)

Comment thread Lib/test/test_int.py Outdated
@unittest.expectedFailure # TODO: RUSTPYTHON
@unittest.skipUnless(
hasattr(time, "get_clock_info"),
"requires time.get_clock_info (not available on wasm)",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"requires time.get_clock_info (not available on wasm)",
"TODO: RUSTPYTHON; requires time.get_clock_info (not available on wasm)",

Comment thread Lib/test/test_int.py Outdated
str(i)

@unittest.expectedFailure # TODO: RUSTPYTHON
@unittest.skipUnless(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I'd set it to unittest.expectedFailireIf(not hasattr(time, "get_clock_info"), "TODO: RUSTPYTHON; ...")

scripts/update_lib uses TODO: RUSTPYTHON markers inside unittest
decorator reason strings to identify and migrate custom RustPython
patches across CPython library updates.
skipUnless silently hides the test forever; expectedFailureIf surfaces
unexpected success once RustPython implements time.get_clock_info on
wasm, prompting marker removal.
@changjoon-park
Copy link
Copy Markdown
Contributor Author

thanks and all is done
the "expectedFailureIf" is the right call i should've checked it thanks for catching :)

Copy link
Copy Markdown
Contributor

@ShaharNaveh ShaharNaveh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ty! looks great:)

Copy link
Copy Markdown
Member

@youknowone youknowone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Comment thread Lib/test/test_int.py
@unittest.expectedFailure # TODO: RUSTPYTHON
@unittest.expectedFailureIf(
not hasattr(time, "get_clock_info"),
"TODO: RUSTPYTHON; time.get_clock_info is not available on wasm",
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how CPython wasm build handle this?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's verified via CPython source

the "time_get_clock_info" in Modules/timemodule.c (v3.14.0 lines 1630, 1946) has no cfg guards, so cpython exposes it unconditionally including wasm builds. The "expectedFailureIf" will surface as unexpected success once we lift RustPython's #[cfg(unix)]/#[cfg(windows)] gating in crates/vm/src/stdlib/time.rs

thanks for catching !

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for checking

@youknowone youknowone merged commit 9794ab7 into RustPython:main Apr 27, 2026
38 of 39 checks passed
@changjoon-park changjoon-park deleted the enforce-int-max-str-digits-on-conversion branch April 27, 2026 13:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants