Bug: Releasing physical memory associated with a frame failed with error code -1: Invalid argument #4718

Open
ubmarco opened this issue Jan 15, 2025 · 14 comments

ubmarco commented Jan 15, 2025

Kùzu version

v0.7.1

What operating system are you using?

Raspberry Pi 5 arm64, Debian 12 (bookworm)

What happened?

When executing the Python client example script

import kuzu

db = kuzu.Database("./test")
conn = kuzu.Connection(db)

# Define the schema
conn.execute("CREATE NODE TABLE User(name STRING, age INT64, PRIMARY KEY (name))")
conn.execute("CREATE NODE TABLE City(name STRING, population INT64, PRIMARY KEY (name))")
conn.execute("CREATE REL TABLE Follows(FROM User TO User, since INT64)")
conn.execute("CREATE REL TABLE LivesIn(FROM User TO City)")

# Load some data
conn.execute('COPY User FROM "user.csv"')
conn.execute('COPY City FROM "city.csv"')
conn.execute('COPY Follows FROM "follows.csv"')
conn.execute('COPY LivesIn FROM "lives-in.csv"')

# Query the data
results = conn.execute("MATCH (u:User) RETURN u.name, u.age;")
while results.has_next():
    print(results.get_next())

I get

Traceback (most recent call last):
  File "/home/someuser/Downloads/test.py", line 13, in <module>
    conn.execute('COPY User FROM "user.csv"')
  File "/home/someuser/Downloads/.venv/lib/python3.11/site-packages/kuzu/connection.py", line 130, in execute
    _query_result = self._connection.query(query)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Buffer manager exception: Releasing physical memory associated with a frame failed with error code -1: Invalid argument.

Packages installed

$ pip freeze
kuzu==0.7.1

The error appears for both Python 3.11 and 3.12.

The error also appears with the kuzu CLI when doing COPY User FROM "user.csv".
Creating the tables worked.

Are there known steps to reproduce?

No response

@ubmarco ubmarco added the bug Something isn't working label Jan 15, 2025
Author

ubmarco commented Jan 15, 2025

Here is a smaller example

(.venv) someuser@rasp03:~/Downloads $ rm -rf test
(.venv) someuser@rasp03:~/Downloads $ ./kuzu test
Opening the database at path: test in read-write mode.
Enter ":help" for usage hints.
kuzu> CREATE NODE TABLE User(name STRING, age INT64, PRIMARY KEY (name));
┌──────────────────────────────┐
│ result                       │
│ STRING                       │
├──────────────────────────────┤
│ Table User has been created. │
└──────────────────────────────┘
(1 tuple)
(1 column)
Time: 1.74ms (compiling), 4.91ms (executing)
kuzu> CREATE (u:User {name: 'Alice', age: 35});
(0 tuples)
(0 columns)
Time: 1.46ms (compiling), 2.10ms (executing)
kuzu> MATCH (a) RETURN *;
┌────────────────────────────────────────────────┐
│ a                                              │
│ NODE                                           │
├────────────────────────────────────────────────┤
│ {_ID: 0:0, _LABEL: User, name: Alice, age: 35} │
└────────────────────────────────────────────────┘
(1 tuple)
(1 column)
Time: 9.50ms (compiling), 1.15ms (executing)
kuzu> 
(.venv) someuser@rasp03:~/Downloads $ ./kuzu test
Opening the database at path: test in read-write mode.
Enter ":help" for usage hints.
kuzu> MATCH (a) RETURN *;
┌────────────────────────────────────────────────┐
│ a                                              │
│ NODE                                           │
├────────────────────────────────────────────────┤
│ {_ID: 0:0, _LABEL: User, name: Alice, age: 35} │
└────────────────────────────────────────────────┘
(1 tuple)
(1 column)
Time: 0.23ms (compiling), 1.59ms (executing)
kuzu> 
(.venv) someuser@rasp03:~/Downloads $ ./kuzu test
Opening the database at path: test in read-write mode.
Enter ":help" for usage hints.
kuzu> COPY User FROM "user.csv";
Error: Buffer manager exception: Releasing physical memory associated with a frame failed with error code -1: Invalid argument.
kuzu>

This shows that I can insert data, restart kuzu, and read the data back. Copying from CSV, however, fails with the exception.

Author

ubmarco commented Jan 15, 2025

I tested kuzu 0.7.0 and I get a different exception:

kuzu> COPY User FROM "user.csv";
Segmentation fault

@acquamarin acquamarin self-assigned this Jan 16, 2025
Contributor

ray6080 commented Jan 16, 2025

Hi @ubmarco, thanks for reporting this! I tried to reproduce it on our arm64-based macOS and Debian Linux machines, but could not trigger the exception. It might be something specific to the Raspberry Pi, so we need some time to set up a runtime environment for that. We don't have a Raspberry Pi on hand in the team 😂

Author

ubmarco commented Jan 16, 2025

Hi @ray6080, thanks for the quick feedback. Did you test natively (aarch64-darwin) or inside a Linux Docker container (aarch64-linux)?
I previously had a Raspberry Pi 4 in our CI setup for which I did not see this issue. Here are the differences.
Raspberry Pi 4:

Raspberry Pi 5:

Here's a bigger comparison: https://gadgetversus.com/processor/broadcom-bcm2711-vs-broadcom-bcm2712/

Let me know whether and how I can provide further debug information.
We can also organize a screen sharing session, if that helps.

Author

ubmarco commented Jan 16, 2025

I just set up an Ubuntu 24.04 aarch64 VM on QEMU/KVM with qemu-system-aarch64. My host is an x64 Arch Linux install,
so it's an (awfully slow) aarch64 emulation.
No issues here:

kuzu> COPY User FROM "user.csv";
┌──────────────────────────────────────────────┐
│ result                                       │
│ STRING                                       │
├──────────────────────────────────────────────┤
│ 4 tuples have been copied to the User table. │
└──────────────────────────────────────────────┘
(1 tuple)
(1 column)
Time: 125.85ms (compiling), 1526.05ms (executing)

Member

mewim commented Jan 16, 2025

Hi @ubmarco,

I tested on an Oracle Cloud Ampere A1 instance (quad-core 64-bit ARM, 24 GB RAM). I ran the script both directly on the machine with Oracle Linux 9 and in a Docker container with Debian 12. In both setups, the script runs successfully to the end.

Edit: Sorry I did not mean to close the issue. Clicked on the wrong button 😅.

@mewim mewim closed this as completed Jan 16, 2025
@mewim mewim reopened this Jan 16, 2025
Member

mewim commented Jan 16, 2025

I suppose the issue might have something to do with non-standard parameters in the kernel configuration. Would it be possible for you to test with a different OS image on your Pi 5?

Author

ubmarco commented Jan 16, 2025

Sure, let me test this tomorrow when I'm back at the machines.

Both Raspberry Pis run Raspberry Pi OS, but on the Pi 4 I installed Raspberry Pi OS Lite and then added Xfce manually. On the Pi 5 I installed Raspberry Pi OS with desktop, which runs PIXEL, an adapted LXDE.
The desktop environment should not make a difference, but the kernel or its parameters could differ.

@ray6080 ray6080 mentioned this issue Jan 16, 2025
71 tasks
Author

ubmarco commented Jan 17, 2025

I installed Raspberry Pi OS Lite on the Pi 5 without any desktop environment and see the same error.
Some system details:

$ uname -a
Linux rasp03 6.6.62+rpt-rpi-2712 #1 SMP PREEMPT Debian 1:6.6.62-1+rpt1 (2024-11-25) aarch64 GNU/Linux

$ lsb_release -a
No LSB modules are available.
Distributor ID:	Debian
Description:	Debian GNU/Linux 12 (bookworm)
Release:	12
Codename:	bookworm

$ cat /proc/cmdline
reboot=w coherent_pool=1M 8250.nr_uarts=1 pci=pcie_bus_safe cgroup_disable=memory numa_policy=interleave  smsc95xx.macaddr=2C:CF:67:C6:BC:F9 vc_mem.mem_base=0x3fc00000 vc_mem.mem_size=0x40000000  console=ttyAMA10,115200 console=tty1 root=PARTUUID=c7f07b68-02 rootfstype=ext4 fsck.repair=yes rootwait

Author

ubmarco commented Jan 17, 2025

I have now installed Ubuntu Server 24.04.1 LTS without any desktop environment, and the bug does not appear on the Raspberry Pi 5.
Some system details:

$ uname -a
Linux rasp03 6.8.0-1017-raspi #19-Ubuntu SMP PREEMPT_DYNAMIC Fri Dec  6 20:45:12 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux

$ lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 24.04.1 LTS
Release:	24.04
Codename:	noble

$ cat /proc/cmdline
reboot=w coherent_pool=1M 8250.nr_uarts=1 pci=pcie_bus_safe snd_bcm2835.enable_compat_alsa=0 snd_bcm2835.enable_hdmi=1  smsc95xx.macaddr=2C:CF:67:C6:BC:F9 vc_mem.mem_base=0x3fc00000 vc_mem.mem_size=0x40000000  console=ttyAMA10,115200 multipath=off dwc_otg.lpm_enable=0 console=tty1 root=LABEL=writable rootfstype=ext4 rootwait fixrtc

Kernel version is different:
Raspberry Pi OS Lite: 6.6.62+rpt-rpi-2712
Ubuntu 24.04.1 LTS: 6.8.0-1017-raspi

Sorted comparison of the kernel parameters:
[Image: sorted kernel parameters, Raspberry Pi OS Lite vs. Ubuntu 24.04.1 LTS]

Member

mewim commented Jan 17, 2025

Interesting. We may need to run some experiments to find out which configuration causes this, so that we can reproduce the issue.

Author

ubmarco commented Jan 17, 2025

Another interesting fact: I installed the same Raspberry Pi OS Lite on the Pi 4 as on the Pi 5, without a desktop environment.
The bug does not appear on the Pi 4, while it does appear on the Pi 5. The data for the Pi 4:

$ uname -a
Linux rasp02 6.6.62+rpt-rpi-v8 #1 SMP PREEMPT Debian 1:6.6.62-1+rpt1 (2024-11-25) aarch64 GNU/Linux

$ lsb_release -a
No LSB modules are available.
Distributor ID:	Debian
Description:	Debian GNU/Linux 12 (bookworm)
Release:	12
Codename:	bookworm

$ cat /proc/cmdline
coherent_pool=1M 8250.nr_uarts=0 snd_bcm2835.enable_headphones=0 cgroup_disable=memory numa_policy=interleave snd_bcm2835.enable_headphones=1 snd_bcm2835.enable_hdmi=1 snd_bcm2835.enable_hdmi=0  smsc95xx.macaddr=DC:A6:32:55:27:6A vc_mem.mem_base=0x3ec00000 vc_mem.mem_size=0x40000000  console=ttyS0,115200 console=tty1 root=PARTUUID=ffcdc7da-02 rootfstype=ext4 fsck.repair=yes rootwait

The kernel version is now the same, but the build flavour differs (rpi-v8 vs. rpi-2712):
Raspberry Pi OS Lite Pi 4: 6.6.62+rpt-rpi-v8
Raspberry Pi OS Lite Pi 5: 6.6.62+rpt-rpi-2712

Sorted comparison of the kernel parameters between Pi 4 and Pi 5:

[Image: sorted kernel parameters, Pi 4 vs. Pi 5]
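
Because the comparison above survives only as an image, the same diff can be recomputed from the two /proc/cmdline outputs posted in this thread. This is a minimal sketch: the strings are copied from the outputs above, and the variable names are only illustrative.

```python
# Kernel command lines copied from the /proc/cmdline outputs in this thread
# (both machines running Raspberry Pi OS Lite, kernel 6.6.62).
PI4 = (
    "coherent_pool=1M 8250.nr_uarts=0 snd_bcm2835.enable_headphones=0 "
    "cgroup_disable=memory numa_policy=interleave snd_bcm2835.enable_headphones=1 "
    "snd_bcm2835.enable_hdmi=1 snd_bcm2835.enable_hdmi=0 "
    "smsc95xx.macaddr=DC:A6:32:55:27:6A vc_mem.mem_base=0x3ec00000 "
    "vc_mem.mem_size=0x40000000 console=ttyS0,115200 console=tty1 "
    "root=PARTUUID=ffcdc7da-02 rootfstype=ext4 fsck.repair=yes rootwait"
)
PI5 = (
    "reboot=w coherent_pool=1M 8250.nr_uarts=1 pci=pcie_bus_safe "
    "cgroup_disable=memory numa_policy=interleave "
    "smsc95xx.macaddr=2C:CF:67:C6:BC:F9 vc_mem.mem_base=0x3fc00000 "
    "vc_mem.mem_size=0x40000000 console=ttyAMA10,115200 console=tty1 "
    "root=PARTUUID=c7f07b68-02 rootfstype=ext4 fsck.repair=yes rootwait"
)

# Treat each whitespace-separated token as one parameter.
pi4, pi5 = set(PI4.split()), set(PI5.split())

only_pi4 = sorted(pi4 - pi5)  # parameters present only on the Pi 4
only_pi5 = sorted(pi5 - pi4)  # parameters present only on the Pi 5
common = sorted(pi4 & pi5)    # parameters identical on both

print("only on Pi 4:", *only_pi4, sep="\n  ")
print("only on Pi 5:", *only_pi5, sep="\n  ")
print("common:", *common, sep="\n  ")
```

Apart from the entries that are expected to differ per device (MAC address, vc_mem.mem_base, root partition), the Pi 5 adds reboot=w and pci=pcie_bus_safe and uses a different serial console and UART count, while the Pi 4 line carries the snd_bcm2835 options.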

As I could not reproduce the issue on the Pi 4 and we are dealing with the same kernel version, the difference must come from the ARM Cortex core, the kernel build, or the kernel parameters.

Any other ideas on what to test next? I only have one Pi 5 and would like to install Ubuntu Server on it to actually use it.
That makes further experiments harder for me, as I have to wipe it for the error to appear.

Just for my information, what does the error Invalid argument actually indicate?
I see the error is produced here and reported here. So which of the arguments is invalid?

Member

mewim commented Jan 17, 2025

I think this error is generally caused by the madvise(..., MADV_DONTNEED) system call failing when the buffer manager tries to evict a frame. Since the environment is Linux, the Windows-specific code path does not apply.
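
To illustrate what an "Invalid argument" (EINVAL) result from madvise can look like, here is a minimal sketch using Python's standard mmap module on Linux (Python 3.8+). It is not Kùzu's code. It demonstrates one documented cause of EINVAL, a start address that is not a multiple of the kernel page size. Whether that is what happens on the Pi 5 is an assumption that nothing in this thread confirms. The helper name release is made up for the example.

```python
import errno
import mmap

# Page size reported by the running kernel (e.g. 4096 or 16384 bytes).
PAGE = mmap.PAGESIZE


def release(buf, start, length):
    """Call madvise(MADV_DONTNEED) on buf[start:start+length].

    Returns 0 on success, or the errno of the failed system call.
    """
    try:
        buf.madvise(mmap.MADV_DONTNEED, start, length)
        return 0
    except OSError as exc:
        return exc.errno


# Anonymous private mapping of four pages; its base address is page-aligned.
buf = mmap.mmap(-1, 4 * PAGE, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
buf[:] = b"x" * len(buf)  # touch every page so physical memory is attached

aligned = release(buf, PAGE, PAGE)              # start on a page boundary
unaligned = release(buf, PAGE // 2, PAGE // 2)  # start in the middle of a page

print("kernel page size:", PAGE)
print("aligned start   ->", aligned)
print("unaligned start ->", errno.errorcode.get(unaligned, unaligned))
```

Comparing the printed page size (or the output of getconf PAGESIZE) between the Pi 4, the Pi 5 under Raspberry Pi OS, and the Pi 5 under Ubuntu would show whether the kernels differ in this respect.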

Author

ubmarco commented Jan 17, 2025

Yeah, just saw I was looking at the Windows lines. Will stop debugging the code :)
