
classification
Title: [subinterpreter] _PyUnicode_EqualToASCIIId() issue with subinterpreters
Type:
Stage: resolved
Components: C API, Subinterpreters
Versions: Python 3.11, Python 3.10
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: Mark.Shannon, corona10, craigh, diabonas, eric.snow, erlendaasland, hroncok, methane, ndjensen, pablogsal, petr.viktorin, serhiy.storchaka, srittau, vstinner
Priority:
Keywords: 3.10regression, patch

Created on 2021-12-07 17:12 by vstinner, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Files
File name Uploaded Description
reproducer.c vstinner, 2021-12-07 17:12
cmp_interned_strings.patch vstinner, 2021-12-10 12:35
Pull Requests
URL Status Linked
PR 30123 closed vstinner, 2021-12-15 14:44
PR 30131 closed eric.snow, 2021-12-16 01:10
PR 30422 merged vstinner, 2022-01-05 16:27
PR 30425 merged vstinner, 2022-01-06 07:59
PR 30433 closed vstinner, 2022-01-06 14:30
Messages (29)
msg407950 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-12-07 17:12
_PyUnicode_EqualToASCIIId() seems to be incompatible with subinterpreters: it assumes that if the direct pointer comparison fails and the string is interned, the two strings are not equal.

--

super_init_without_args() of Objects/typeobject.c calls _PyUnicode_EqualToASCIIId(name, &PyId___class__) to test if the Unicode string 'name' is equal to "__class__".

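int
_PyUnicode_EqualToASCIIId(PyObject *left, _Py_Identifier *right)
{
    PyObject *right_uni = _PyUnicode_FromId(right);  /* borrowed reference */
    ...
    /* Fast path: both arguments are the same object. */
    if (left == right_uni)
        return 1;
    /* Assumes an interned string equal to right_uni would be the same object. */
    if (PyUnicode_CHECK_INTERNED(left))
        return 0;
    ...
    /* Slow path: compare the string contents. */
    return unicode_compare_eq(left, right_uni);
}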

_PyUnicode_EqualToASCIIId() assumes that left and right are not equal if the left and _PyUnicode_FromId(right) pointers differ and left is an interned string.

In the reproducer, the left object is abc.ABCMeta.__new__.__code__.co_freevars[0].

Depending on how the stdlib abc.py file was loaded (in the main interpreter and in the subinterpreter), __code__.co_freevars[0] may or may not be an interned string.

If __code__.co_freevars[0] is an interned string, _PyUnicode_EqualToASCIIId() wrongly returns 0 in a subinterpreter when the direct pointer comparison fails (that is, when the left and right_uni pointers differ).
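
A minimal sketch of the failure mode described above. It does not use CPython's real structures: Str, Interp, make_interp(), equal_with_interned_shortcut() and equal_by_content() are hypothetical stand-ins, and the only assumption is that each interpreter holds its own "__class__" object. The content-only comparison is shown for contrast and is not meant as the fix adopted for this issue.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for a Unicode object: text plus an interned flag. */
typedef struct {
    const char *text;
    bool interned;
} Str;

/* Hypothetical interpreter state: each interpreter owns its own
   interned "__class__" object. */
typedef struct {
    Str class_id;
} Interp;

Interp
make_interp(void)
{
    Interp interp = { { "__class__", true } };
    return interp;
}

/* Same shape as the excerpt above: pointer check, then the interned shortcut. */
bool
equal_with_interned_shortcut(const Str *left, const Str *right_uni)
{
    assert(left != NULL && right_uni != NULL);
    if (left == right_uni)
        return true;
    if (left->interned)
        return false;   /* wrong when left was interned by another interpreter */
    return strcmp(left->text, right_uni->text) == 0;
}

/* For contrast: pointer check, then always compare the contents. */
bool
equal_by_content(const Str *left, const Str *right_uni)
{
    assert(left != NULL && right_uni != NULL);
    if (left == right_uni)
        return true;
    return strcmp(left->text, right_uni->text) == 0;
}
```

Comparing the class_id of two Interp values created by make_interp() makes equal_with_interned_shortcut() return false even though both hold "__class__", while equal_by_content() returns true.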

--

Reproducer from: https://github.com/ninia/jep/issues/358#issuecomment-988090696

* Build Python 3.10 with "./configure --enable-shared --prefix /opt/py310" and install it.
* Download attached reproducer.c.
* Build the reproducer with: 
  gcc -o reproducer reproducer.c $(/opt/py310/bin/python3.10-config --embed --cflags --ldflags)
* Remove all stdlib .pyc files:
  find /opt/py310 -type d -name __pycache__|xargs rm -rf
* Run the reproducer with:
  LD_LIBRARY_PATH=/opt/py310/lib ./reproducer

Output:
---
Before creating sub interpreter
Traceback (most recent call last):
  File "/opt/py310/lib/python3.10/io.py", line 52, in <module>
  File "/opt/py310/lib/python3.10/abc.py", line 184, in <module>
  File "/opt/py310/lib/python3.10/abc.py", line 106, in __new__
RuntimeError: super(): __class__ cell not found
Fatal Python error: _PyThreadState_Delete: tstate 0x7f9f2001c710 is still current
Python runtime state: initialized

Current thread 0x00007f9f27c99640 (most recent call first):
  <no Python frame>
Abandon (core dumped)
---

py-bt command in gdb:
---
(gdb) py-bt
Traceback (most recent call first):
  File "/opt/py310/lib/python3.10/abc.py", line 106, in __new__
    cls = super().__new__(mcls, name, bases, namespace, **kwargs)
  <built-in method __build_class__ of module object at remote 0x7fffea0b4cc0>
  File "/opt/py310/lib/python3.10/abc.py", line 184, in <module>
    class ABC(metaclass=ABCMeta):
  <built-in method exec of module object at remote 0x7fffea0b4cc0>
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "/opt/py310/lib/python3.10/io.py", line 52, in <module>
    import abc
  <built-in method exec of module object at remote 0x7fffea0b4cc0>
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  <built-in method __import__ of module object at remote 0x7fffea0b4cc0>
---
msg407951 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-12-07 17:19
In Python 3.9, the code works because the _Py_IDENTIFIER() API shares Python Unicode objects between all interpreters.

_PyUnicode_FromId() was modified to be per-interpreter in bpo-39465 by:

New changeset ba3d67c2fb04a7842741b1b6da5d67f22c579f33 by Victor Stinner in branch 'master':
bpo-39465: Fix _PyUnicode_FromId() for subinterpreters (GH-20058)
https://github.com/python/cpython/commit/ba3d67c2fb04a7842741b1b6da5d67f22c579f33
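
A simplified model of what "per-interpreter" means here, assuming the identifier carries a process-wide index and each interpreter keeps its own lazily filled array of objects. Ident, InterpCache, next_index and ident_get() are hypothetical names for illustration, and plain C strings stand in for Unicode objects; the real implementation differs in detail.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical simplified identifier: a C string plus a process-wide index. */
typedef struct {
    const char *string;
    int index;              /* -1 until the identifier is first used */
} Ident;

/* Hypothetical per-interpreter cache: one slot per identifier index. */
typedef struct {
    char **slots;
    int size;
} InterpCache;

static int next_index = 0;  /* shared by all interpreters */

/* Return this interpreter's own copy of the identifier string, creating it
   on first use. Returns NULL on allocation failure. */
const char *
ident_get(InterpCache *cache, Ident *id)
{
    assert(cache != NULL && id != NULL && id->string != NULL);
    if (id->index < 0)
        id->index = next_index++;

    /* Grow this interpreter's array so that the index is valid. */
    if (id->index >= cache->size) {
        int new_size = id->index + 1;
        char **grown = realloc(cache->slots, (size_t)new_size * sizeof(char *));
        if (grown == NULL)
            return NULL;
        for (int i = cache->size; i < new_size; i++)
            grown[i] = NULL;
        cache->slots = grown;
        cache->size = new_size;
    }

    /* Create the object lazily, once per interpreter. */
    if (cache->slots[id->index] == NULL) {
        size_t len = strlen(id->string) + 1;
        char *copy = malloc(len);
        if (copy == NULL)
            return NULL;
        memcpy(copy, id->string, len);
        cache->slots[id->index] = copy;
    }
    return cache->slots[id->index];
}
```

With two InterpCache values and one Ident for "__class__", ident_get() returns a different pointer for each cache but equal contents, which is the situation the pointer comparison in _PyUnicode_EqualToASCIIId() runs into.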
msg407952 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-12-07 17:21
Serhiy: Do you recall the rationale behind the PyUnicode_CHECK_INTERNED() optimization?

The PyUnicode_CHECK_INTERNED() test is as old as the _PyUnicode_EqualToASCIIId() function.

commit f5894dd646f5e39918377b37b8c8694cebdca103
Author: Serhiy Storchaka <storchaka@gmail.com>
Date:   Wed Nov 16 15:40:39 2016 +0200

    Issue #28701: Replace _PyUnicode_CompareWithId with _PyUnicode_EqualToASCIIId.
    
    The latter function is more readable, faster and doesn't raise exceptions.
    
    Based on patch by Xiang Zhang.
msg407953 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-12-07 17:27
> Depending on how the stdlib abc.py file was loaded (in the main interpreter and in the subinterpreter), __code__.co_freevars[0] may or may not be an interned string.

When the bug occurs, I see that the Python stdlib abc.py file is loaded twice: the main interpreter builds a code object, and then the subinterpreter builds its own code object. The two have the same content but are different Python objects (at different memory addresses, so the pointers are not equal!).

I modified reproducer.c to add "Py_VerboseFlag = 1;" before the Py_Initialize() call. Truncated output:
---
...
# code object from /opt/py310/lib/python3.10/abc.py
...
# code object from /opt/py310/lib/python3.10/abc.py
Traceback (most recent call last):
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
...
RuntimeError: super(): __class__ cell not found
...
---
msg407956 -