
gh-145261: multiprocessing.shared_memory: fix ShareableList corruption for multi-byte strings and null bytes#145266

Open
zetzschest wants to merge 6 commits into python:main from zetzschest:fix/multiprocessing_shareable_list_utf8


zetzschest commented Feb 26, 2026

Issue

ShareableList has two issues:

  1. It uses len(item) (character count) for string slot allocation instead of len(item.encode('utf-8')) (byte count). When a string's UTF-8 encoding is longer than its slot, the stored bytes are truncated mid-character, and reading the item back raises UnicodeDecodeError.
  2. It uses rstrip(b'\x00') to recover bytes values, which strips legitimate trailing null bytes.
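To make both failure modes concrete without touching shared memory, the sketch below models the old slot logic with the standard struct module. The helper names `old_slot_size` and `old_roundtrip` are hypothetical, and the rounding rule (character count rounded up past the next multiple of 8) is an assumption modeled on the pre-fix ShareableList code, not a copy of it.

```python
import struct

ALIGNMENT = 8  # assumed slot granularity of the pre-fix allocation


def old_slot_size(item):
    # Assumed pre-fix rule: size str/bytes slots from len(item), which is a
    # character count for str, rounded up past the next multiple of 8.
    return ALIGNMENT * (len(item) // ALIGNMENT + 1)


def old_roundtrip(item):
    # Hypothetical helper: pack into a fixed-size "Ns" field, then read back
    # the way the old code did, with rstrip(b"\x00").
    fmt = f"{old_slot_size(item)}s"
    raw = item.encode("utf-8") if isinstance(item, str) else item
    buf = bytearray(struct.calcsize(fmt))
    struct.pack_into(fmt, buf, 0, raw)  # "s" silently truncates long values
    (stored,) = struct.unpack_from(fmt, buf, 0)
    stored = stored.rstrip(b"\x00")
    return stored.decode("utf-8") if isinstance(item, str) else stored


text = "0\U00010000\U00010000"
print(len(text), len(text.encode("utf-8")), old_slot_size(text))  # 3 9 8

try:
    old_roundtrip(text)  # 9 bytes cut to 8 splits the last code point
except UnicodeDecodeError as exc:
    print("str corrupted:", exc.reason)

print(old_roundtrip(b"\x00"))  # b'' (the real null byte is stripped)
```

Both bugs come from the same place: the slot width and the recovery step guess at the value's real length instead of recording it.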

Reproducer

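```python
from multiprocessing.shared_memory import ShareableList

# String corruption: 3 characters, but 9 bytes in UTF-8
sl = ShareableList(['0\U00010000\U00010000'])
sl[0]  # UnicodeDecodeError

# Bytes corruption: the trailing null byte is stripped on read
# (run separately, since the line above raises)
sl = ShareableList([b'\x00'])
sl[0]  # b'' instead of b'\x00'
```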

Fix

Use len(item.encode('utf-8')) for string slot allocation. For bytes, store the actual byte length in the format metadata so retrieval reads exactly the right number of bytes without needing rstrip(b'\x00').
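The idea behind the fix can be illustrated in isolation with struct: size each str/bytes field from its encoded byte length and record that length in the format, so unpacking returns exactly what was stored. This is a sketch under assumptions: `exact_format` and `exact_roundtrip` are hypothetical names, the exact-length rule is applied to both str and bytes (as the later commits in this PR do), and the sketch ignores how ShareableList actually lays out its format metadata in the shared block.

```python
import struct


def exact_format(item):
    # Hypothetical helper: the field width is the encoded byte length,
    # not the character count and not a padded multiple of 8.
    raw = item.encode("utf-8") if isinstance(item, str) else item
    return f"{len(raw)}s", raw


def exact_roundtrip(item):
    fmt, raw = exact_format(item)
    buf = bytearray(struct.calcsize(fmt))
    struct.pack_into(fmt, buf, 0, raw)
    # The format records the exact length, so no rstrip(b"\x00") is needed
    # and genuine trailing nulls survive.
    (stored,) = struct.unpack_from(fmt, buf, 0)
    return stored.decode("utf-8") if isinstance(item, str) else stored


for value in ["0\U00010000\U00010000", b"\x00", b"ab\x00\x00", "x\x00"]:
    assert exact_roundtrip(value) == value

print(exact_format("0\U00010000\U00010000")[0])  # 9s
```

The design choice is to move the length information into the format string rather than infer it from padding, which is what makes null bytes safe to store.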

This fix attempts to resolve both the new UTF-8 corruption issue and the long-standing trailing null bytes issue reported in #106939.

Test updates

Two assertions in test_shared_memory_ShareableList_basics needed adjustments since they were based on the previous behavior:

  • Format string assertions updated to reflect actual byte lengths stored for bytes values.
  • Removed format comparison between original and copy, as copies may allocate differently with byte-accurate lengths.

…and bytes with trailing nulls

ShareableList had two bugs:
1. Used character count len(item) instead of byte count
   len(item.encode('utf-8')) for string slot allocation, causing
   UnicodeDecodeError with multi-byte UTF-8 characters.
2. Used rstrip(b'\x00') to recover bytes values, which stripped
   legitimate trailing null bytes.

Fix uses UTF-8 byte length for string allocation and stores the actual
byte length in the format metadata for bytes values, so retrieval reads
exactly the right number of bytes without needing rstrip.

bedevere-app bot commented Feb 26, 2026

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

zetzschest marked this pull request as ready for review February 26, 2026 17:41
zetzschest requested a review from gpshead as a code owner February 26, 2026 17:41
The bug where ShareableList stripped trailing null bytes has been fixed
in Python 3.15. Update documentation to:
- Note the fix with versionchanged directive
- Update doctest to show correct behavior (nulls preserved)
- Clarify workaround is only needed for Python 3.14 and earlier
- Reference both original issue python#106939 and fix issue python#145261

Fixes failing doctest in CI where expected output showed old buggy
behavior instead of corrected behavior.
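For Python 3.14 and earlier, where the stripping still happens, one generic way around it is to append a non-null marker before storing and drop it after reading, so there are never trailing nulls for rstrip to remove. This is an assumed approach for illustration, not necessarily the wording of the documented workaround; `pad` and `unpad` are hypothetical helpers, and `old_read` only simulates the pre-fix rstrip step.

```python
MARKER = "\x07"  # hypothetical marker: any non-null character works


def pad(value):
    # Append a non-null marker so the payload never ends in b"\x00".
    return value + (MARKER if isinstance(value, str) else MARKER.encode())


def unpad(stored):
    # Drop the marker again; slicing works for both str and bytes.
    return stored[:-1]


def old_read(value):
    # Simulates the pre-fix read path: encode, strip trailing nulls, decode.
    raw = value.encode("utf-8") if isinstance(value, str) else value
    raw = raw.rstrip(b"\x00")
    return raw.decode("utf-8") if isinstance(value, str) else raw


for payload in [b"\x03\x02\x01\x00\x00\x00", "?\x00"]:
    assert old_read(payload) != payload  # nulls lost without the marker
    assert unpad(old_read(pad(payload))) == payload  # marker preserves them
```

With a real ShareableList this would mean storing `pad(value)` and reading `unpad(sl[i])`; on a runtime with the fix the marker is simply stored and returned like any other byte. The marker does not help with the separate multi-byte UTF-8 truncation bug.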
zetzschest marked this pull request as draft February 27, 2026 13:59
Extended the fix to remove rstrip from strings as well and store actual
byte lengths for both strings and bytes in format metadata.
Updated format string assertions and test data to match the new behavior
where strings are stored with their actual UTF-8 byte length instead of
being padded to 8 bytes minimum.
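The resulting format-string difference can be sketched for a short list of values. `old_format`, `new_format`, and `byte_len` are hypothetical helpers; the old rule (round the character count up past the next multiple of 8) is an assumption modeled on the pre-fix code, the new rule assumes each str/bytes field is exactly its UTF-8 byte length as the commit message describes, and types other than str and bytes are left out.

```python
def byte_len(item):
    # UTF-8 byte count for str; bytes are already a byte count.
    return len(item.encode("utf-8")) if isinstance(item, str) else len(item)


def old_format(items):
    # Assumed pre-fix rule: every str/bytes slot is a multiple of 8 bytes,
    # sized from len(item) (characters for str).
    return "".join(f"{8 * (len(item) // 8 + 1)}s" for item in items)


def new_format(items):
    # Rule described in the commit: each slot is the actual byte length.
    return "".join(f"{byte_len(item)}s" for item in items)


values = ["howdy", b"HoWdY", "\u00e9t\u00e9", b"\x00"]
print(old_format(values))  # 8s8s8s8s
print(new_format(values))  # 5s5s5s1s
```

The shorter formats are why hard-coded format-string assertions had to change: a test expecting `8s` for a five-byte value no longer matches.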
…servation

Added versionchanged directive for Python 3.15 noting that trailing null
bytes are now preserved in both strings and bytes. Updated doctest example
to show correct behavior and clarified workaround is only needed for 3.14
and earlier.
zetzschest force-pushed the fix/multiprocessing_shareable_list_utf8 branch 2 times, most recently from 7dbbe95 to eb4ce8a
zetzschest marked this pull request as ready for review February 27, 2026 14:19
