TASK ERROR: timeout waiting on systemd, Proxmox Virtual Environment
rauth
Guest
0
21.06.2024 18:59:00
Hello! I recently ran into a serious problem in a Proxmox 8 cluster. It started when a Windows VM would not start and the task log showed this error: TASK ERROR: timeout waiting on systemd. Searching the Proxmox forums, I found that some users reported that rebooting the host fixes the problem, so I followed that advice.
I started migrating VMs to another host, and that set off a cascade of other problems. Some live migrations hung and the VM stopped responding. Others failed with an error, so I could only migrate with the VM powered off. Some even completed, but the "migrate" task never finished: it reached 100% without ever showing "success". Throughout all of this, the error rbd: rbd2: no lock owners detected kept appearing in syslog. Eventually, once I had managed to get all VMs off the host, I rebooted it, and the host's console showed a message about a timeout while terminating RBD processes.
We use Ceph storage, version 17.2, in an 8-node cluster, and the problem occurred specifically with Windows VMs that use a TPM device. It looks as if the TPM process hung on the 2 hosts running Windows VMs. After rebooting those 2 hosts everything returned to normal, but it worries me that I do not know how the problem arose. I only know that it is related to the RBD process for the TPM device of the Windows VMs.
After this happened I upgraded PVE, which now runs 8.2, and the problem has not come back. But I never did understand what caused it. I hope it stays that way.
lknite
Guest
0
24.10.2024 02:49:00
I'm interested in this too. I'm currently seeing 'Error: timeout waiting on systemd' and am looking for possible solutions to understand what is going on.
cfgmgr
Guest
0
20.11.2024 14:11:00
Noticed the same thing when migrating to a new host running 8.2.7. I'm trying to patch the rest. Ceph is still v17.2 and needs to be upgraded.
davispuh
Guest
0
22.04.2025 20:22:00
I have run into this problem too. Searching the forum for "timeout waiting on systemd" turns up many similar reports. It looks like some kind of bug that prevents a VM from being shut down cleanly after a crash, which in turn blocks it from being started again. I use Proxmox VE 8.3.3 with kernel 6.8.12-7-pve and I do not use Ceph or ZFS. In short, the VM crashed somehow, and in dmesg I see:
Code:
[17023.949984] kvm: SMP vm created on host with unstable TSC; guest TSC will not be reliable
[17571.802191] INFO: task pvedaemon worke:1813 blocked for more than 122 seconds.
[17571.802198] Not tainted 6.8.12-7-pve #1
[17571.802200] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[17571.802203] task:pvedaemon worke state:D stack:0 pid:1813 tgid:1813 ppid:1812 flags:0x00000002
[17571.802207] Call Trace:
[17571.802212] <TASK>
[17571.802308] __schedule+0x42b/0x1500
[17571.802467] ? restore_fpregs_from_fpstate+0x3d/0xd0
[17571.802472] schedule+0x33/0x110
[17571.802475] kvm_async_pf_task_wait_schedule+0x171/0x1b0
[17571.802479] __kvm_handle_async_pf+0x5c/0xe0
[17571.802488] exc_page_fault+0xb6/0x1b0
[17571.802491] asm_exc_page_fault+0x27/0x30
[17571.802613] RIP: 0033:0x57453ce9a170
[17571.802725] RSP: 002b:00007ffdccf00030 EFLAGS: 00010202
[17571.802728] RAX: 00007fd0799000b8 RBX: 0000574545e86520 RCX: 00007fd0798e5010
[17571.802730] RDX: 0000000000000007 RSI: 0000574549bdd888 RDI: 00005745446842a0
[17571.802732] RBP: 00000000cc5e3615 R08: 0000000000000000 R09: 00005745495da058
[17571.802733] R10: 000057454cb74e58 R11: 0000574549bdd880 R12: 000057454468c960
[17571.802734] R13: 0000574545e86520 R14: 0000574549bdd888 R15: 00000000cc5e3615
[17571.802808] </TASK>
This log message says that a process named "pvedaemon worke" has been blocked for more than 122 seconds. The name is a Proxmox VE daemon worker; the kernel truncates process names to 15 characters, which is why the final "r" is missing. The hung-task message means the kernel found the task sitting in uninterruptible sleep (`state:D` in the log) for longer than the `hung_task_timeout_secs` threshold without being scheduled. Depending on what else is waiting on the same resource, this can show up as sluggishness or as operations that never finish. The "Call Trace" lists the kernel functions the task was in when it blocked, and here it points to KVM (Kernel-based Virtual Machine) page-fault handling.
Here's a breakdown of what this suggests and how to troubleshoot it:
**Possible Causes:**
* **Virtual Machine Issues:** The most likely culprit is a problem within a virtual machine that this worker thread is managing. This could be:
  * **Buggy Guest OS:** A flawed driver, application, or OS kernel in the VM.
  * **Resource Starvation:** The VM is desperately trying to access a resource (disk I/O, memory, network) that is heavily contended.
  * **Deadlock:** Two or more processes within the VM are waiting for each other, creating a circular dependency.
  * **Infinite Loop:** An application inside the VM has entered an infinite loop.
* **Host Resource Contention:** While less likely than a VM issue, the host system itself might be experiencing resource shortages (CPU, memory, disk I/O) which are impacting the pvedaemon worker.
* **Bug in Proxmox VE:** A bug in the Proxmox VE hypervisor itself could be causing the worker to hang. This is less common but possible.
* **Disk I/O Bottleneck:** The pvedaemon might be waiting on slow disk I/O to complete, especially if it's performing operations for multiple VMs.
**Troubleshooting Steps:**
1. **Identify Affected VM:** This is crucial. The logs usually contain more information to pinpoint which VM is causing the issue. Check Proxmox VE's web interface or command-line tools to see the state of all VMs. Look for any that are unresponsive or experiencing high resource usage.
2. **Check VM Resource Usage:**
   * **Proxmox Web Interface:** Use the Proxmox web interface to monitor the VM's CPU, memory, disk I/O, and network usage. Look for any spikes or consistently high utilization.
   * **Inside the VM:** Connect to the VM (if possible) and use tools like `top`, `htop`, `iostat`, and `netstat` to identify processes consuming excessive resources.
3. **Check Disk I/O:** High disk I/O can stall the whole system. Check whether any disks are saturated.
4. **Attempt to Restart the VM (Carefully):** If you suspect a VM is the problem, try restarting it. *However*, be aware that if the VM is in a critical state (e.g., performing a database transaction), abruptly restarting it can lead to data corruption. Consider stopping the VM gracefully first, if possible.
5. **Check Proxmox VE Logs:** Look at the system journal on the Proxmox VE host (`journalctl`, or `/var/log/syslog` where rsyslog is installed) and at the task logs under `/var/log/pve/tasks/`. These may contain more detailed information about the hang and any errors that occurred.
6. **Update/Upgrade:** Make sure your Proxmox VE host and the guest operating systems are up to date with the latest patches and updates. Bugs are often fixed in newer versions.
7. **Check for Firmware Issues:** Especially if you have a lot of VMs running, disk or network firmware issues can cause problems.
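The hung-task report in this thread shows the worker in `state:D` (uninterruptible sleep), and tasks in that state do not act on signals until the wait ends, which is consistent with `kill -9` appearing to do nothing. As a rough aid for step 1, the following minimal sketch lists processes currently in state D by reading `/proc/<pid>/stat`. The function names `parse_stat_line` and `find_blocked_tasks` are made up for this illustration, and the scan only looks at each process's main thread, not at every thread under `/proc/<pid>/task`.

```python
import os


def parse_stat_line(stat_line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    left = stat_line.index("(")
    right = stat_line.rindex(")")
    pid = int(stat_line[:left].strip())
    comm = stat_line[left + 1:right]
    state = stat_line[right + 1:].split()[0]
    return pid, comm, state


def find_blocked_tasks(proc_root="/proc"):
    """Return a sorted list of (pid, comm) for processes in state D."""
    blocked = []
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return blocked  # no procfs here (for example, not a Linux system)
    for entry in entries:
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_line(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited meanwhile, or the line was unreadable
        if state == "D":
            blocked.append((pid, comm))
    return sorted(blocked)


if __name__ == "__main__":
    for pid, comm in find_blocked_tasks():
        print(f"{pid}\t{comm}")
```

A process that stays in this list across several runs a minute or two apart is a candidate for the task the kernel is complaining about; a process that appears once and disappears is usually just ordinary I/O wait.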
**How to Interpret the Call Trace (Advanced):**
The call trace helps pinpoint where the problem is occurring. Here's a general understanding:
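* `kvm_async_pf_task_wait_schedule` and `__kvm_handle_async_pf`: These point to KVM's asynchronous page fault handling. Both functions belong to the kernel's KVM guest-support code, so they run when the kernel is itself a KVM guest. That suggests this Proxmox VE host may be running as a virtual machine (nested virtualization): the worker touched a page that the outer hypervisor had not yet made available, and it was put to sleep until the page arrives. If the outer host is slow to supply the page, for example because of memory pressure or swapping there, the worker stays blocked.
* `exc_page_fault` and `asm_exc_page_fault`: The page-fault exception entry points. The `RIP: 0033:...` line shows that the fault came from user-space code in the worker, not from inside the kernel.
* `? restore_fpregs_from_fpstate`: A low-level function that restores saved floating-point (FPU) register state. The leading `?` marks a frame the stack unwinder could not verify, so it is probably a stale stack entry and not part of the actual call chain.
* `__schedule` and `schedule`: Core kernel scheduling functions. They appear because the thread went to sleep waiting on something.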
The call trace, combined with resource monitoring and Proxmox VE logs, should provide enough clues to identify the root cause and take corrective action.
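When a log contains many such reports, it can help to reduce each one to the task name, PID, blocked time, and the list of functions in the trace. The sketch below does that for plain `dmesg` text in the format shown in this thread. The function name `parse_hung_tasks` and the dictionary layout are made up for this illustration, and the regular expressions are assumptions based on the lines quoted here, so other kernel versions or log prefixes may need adjustments.

```python
import re

# "[17571.802191] INFO: task pvedaemon worke:1813 blocked for more than 122 seconds."
HUNG_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+INFO: task (?P<comm>.+?):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds\."
)
# "[17571.802467] ? restore_fpregs_from_fpstate+0x3d/0xd0"
FRAME_RE = re.compile(
    r"\[\s*\d+\.\d+\]\s+(?P<unreliable>\? )?"
    r"(?P<func>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def parse_hung_tasks(dmesg_text):
    """Return one dict per hung-task report found in dmesg_text."""
    reports = []
    current = None
    for raw in dmesg_text.splitlines():
        line = raw.strip()
        header = HUNG_RE.match(line)
        if header:
            current = {
                "timestamp": float(header["ts"]),
                "comm": header["comm"],
                "pid": int(header["pid"]),
                "blocked_secs": int(header["secs"]),
                # each frame is (function name, reliable?); a "?" prefix in
                # the log marks a frame the unwinder could not verify
                "frames": [],
            }
            reports.append(current)
            continue
        if current is None:
            continue  # not inside a report yet
        if "</TASK>" in line:
            current = None  # end of this report's call trace
            continue
        frame = FRAME_RE.match(line)
        if frame:
            current["frames"].append((frame["func"], frame["unreliable"] is None))
    return reports
```

Feeding it the text of `dmesg` and grouping the results by PID or by the first reliable frames is one way to see whether every report comes from the same wait, as the repeated identical traces in this thread suggest.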
GTA_doum
Guest
0
28.05.2025 15:37:00
Hi! I have the same problem! A VM hung overnight, I stopped it, and now its slice is stuck among the processes. kill -9 doesn't help. Are there any options other than rebooting the server? If that's the only way out, I'll have to wait until morning...