Description
We are researchers from the University of Athens, working on cross-language analysis of Python packages with C/C++ native extensions.
Problem
We found an issue in the pygit2 package: the RefdbBackend_init() function [1] does not check the return value of its internal memory allocation call (calloc() [2]). Immediately after allocating memory for the backend structure, it calls git_refdb_init_backend() [3] and initializes several fields. Both steps dereference the pointer without first verifying that it is non-NULL.
If the allocation fails, RefdbBackend_init() continues execution with a NULL pointer. This can lead to undefined behavior, including potential segmentation faults under memory-constrained conditions.
Observed Behavior in Python
To explore the bug, we ran the following script, which internally triggers the RefdbBackend_init() function:
import pygit2
import tempfile
import os
ITERS = 100000
MB = 1024 ** 2
ORDER = 512 * MB
y = []
for i in range(ITERS):
    print(i)
    x = b"a" * ORDER
    y.append(x)
    path = tempfile.mkdtemp()
    repo = pygit2.init_repository(path)
    y.append(repo)
In practice, Python often raises MemoryError first when the interpreter itself cannot allocate memory. As a result, the NULL pointer dereference is rarely observed as a segfault in the Python environment, but the underlying bug could trigger undefined behavior under constrained memory conditions.
Potential Fix
- Ensure that the return value of calloc() is checked before using the pointer, so the function does not proceed if allocation fails.
- Ensure the allocated backend structure is freed if any initialization step fails.
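The two points above can be sketched in isolation as follows. This is a minimal illustration, not the pygit2 source: struct demo_backend, demo_init_backend(), demo_backend_create(), and the allocator hook are hypothetical stand-ins (for struct pygit2_refdb_backend, git_refdb_init_backend(), and the allocation part of RefdbBackend_init() respectively), introduced only so the failure path can be exercised without libgit2 or the Python C API.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct pygit2_refdb_backend. */
struct demo_backend {
    unsigned int version;
    void *owner;
};

/* Allocator hook (assumption for illustration) so a failed
 * allocation can be simulated deterministically. */
typedef void *(*calloc_fn)(size_t, size_t);

/* Simulates calloc() failing under memory pressure. */
static void *failing_calloc(size_t n, size_t size)
{
    (void)n;
    (void)size;
    return NULL;
}

/* Hypothetical stand-in for git_refdb_init_backend():
 * returns 0 on success, a negative value on error. */
static int demo_init_backend(struct demo_backend *be, unsigned int version)
{
    if (version != 1)
        return -1;
    be->version = version;
    return 0;
}

/* Returns 0 on success and -1 on failure.
 * *out is written only on success. */
static int demo_backend_create(calloc_fn alloc, unsigned int version,
                               void *owner, struct demo_backend **out)
{
    struct demo_backend *be = alloc(1, sizeof(*be));
    if (be == NULL) {
        /* Fix 1: stop before any dereference. In a CPython extension
         * this is where one would typically call PyErr_NoMemory(). */
        return -1;
    }

    if (demo_init_backend(be, version) < 0) {
        /* Fix 2: release the structure if a later step fails. */
        free(be);
        return -1;
    }

    be->owner = owner;
    *out = be;
    return 0;
}
```

Calling demo_backend_create(failing_calloc, 1, NULL, &be) returns -1 and leaves be untouched, whereas the unchecked pattern would dereference NULL inside the initialization call. In the real function the same early return would presumably go through the extension's existing error-reporting path.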
References
[1] Line 389 in a85f6fb:
RefdbBackend_init(RefdbBackend *self, PyObject *args, PyObject *kwds)
[2] Line 403 in a85f6fb:
struct pygit2_refdb_backend *be = calloc(1, sizeof(struct pygit2_refdb_backend));
[3] Line 404 in a85f6fb:
git_refdb_init_backend(&be->backend, GIT_REFDB_BACKEND_VERSION);