Hi, we are testing the new 8.3 release on our freshly updated test environment, and we are running into problems during VM migrations. The physical machine runs an up-to-date Proxmox 8.3. Three VMs, also installed with Proxmox 8.3, form a CEPH cluster that provides the shared storage. Three more VMs, also installed with Proxmox 8.3, form a PVE cluster. This PVE cluster uses the CEPH cluster for shared RBD storage. My test was to run 5 Debian 12 VMs in the PVE cluster. I started a CPU and memory load on each VM. I then tried to migrate them to other nodes in the PVE cluster, one at a time, and some of the migrations failed. Here are the PVE logs for a failed migration:
Code:
2024-11-25 15:34:45 starting migration of VM 103 to node 'test-pve-03' (10.0.0.2)
2024-11-25 15:34:45 starting VM 103 on remote node 'test-pve-03'
2024-11-25 15:34:49 start remote tunnel
2024-11-25 15:34:51 ssh tunnel ver 1
2024-11-25 15:34:51 starting online/live migration on unix:/run/qemu-server/103.migrate
2024-11-25 15:34:51 set migration capabilities
2024-11-25 15:34:51 migration downtime limit: 100 ms
2024-11-25 15:34:51 migration cachesize: 512.0 MiB
2024-11-25 15:34:51 set migration parameters
2024-11-25 15:34:51 start migrate command to unix:/run/qemu-server/103.migrate
2024-11-25 15:34:52 migration active, transferred 111.6 MiB of 3.0 GiB VM-state, 154.3 MiB/s
2024-11-25 15:34:53 migration active, transferred 216.4 MiB of 3.0 GiB VM-state, 562.0 MiB/s
2024-11-25 15:34:54 migration active, transferred 342.5 MiB of 3.0 GiB VM-state, 245.3 MiB/s
2024-11-25 15:34:55 migration active, transferred 452.6 MiB of 3.0 GiB VM-state, 350.1 MiB/s
2024-11-25 15:34:56 migration active, transferred 549.6 MiB of 3.0 GiB VM-state, 477.7 MiB/s
2024-11-25 15:34:58 migration active, transferred 694.6 MiB of 3.0 GiB VM-state, 243.2 MiB/s
query migrate failed: VM 103 not running
2024-11-25 15:34:59 query migrate failed: VM 103 not running
query migrate failed: VM 103 not running
2024-11-25 15:35:00 query migrate failed: VM 103 not running
query migrate failed: VM 103 not running
2024-11-25 15:35:01 query migrate failed: VM 103 not running
query migrate failed: VM 103 not running
2024-11-25 15:35:02 query migrate failed: VM 103 not running
query migrate failed: VM 103 not running
2024-11-25 15:35:03 query migrate failed: VM 103 not running
query migrate failed: VM 103 not running
2024-11-25 15:35:04 query migrate failed: VM 103 not running
2024-11-25 15:35:04 ERROR: online migrate failure - too many query migrate failures - aborting
2024-11-25 15:35:04 aborting phase 2 - cleanup resources
2024-11-25 15:35:04 migrate_cancel
2024-11-25 15:35:04 migrate_cancel error: VM 103 not running
2024-11-25 15:35:04 ERROR: query-status error: VM 103 not running
2024-11-25 15:35:08 ERROR: migration finished with problems (duration 00:00:23)
TASK ERROR: migration problems
Here is what I found in the system logs of the source PVE server at the time of the failed migrations:
Code:
2024-11-25T11:45:50.208525+01:00 test-pve-01 kernel: [ 3128.811536] kvm[1729]: segfault at 41b8 ip 00005817d7fcfb00 sp 00007979361fff38 error 4 in qemu-system-x86_64[5b71bd9e8000+6a4000] likely on CPU 1 (core 1, socket 0)
2024-11-25T13:26:18.392760+01:00 test-pve-01 kernel: [ 506.380955] kvm[1829]: segfault at 41b8 ip 00005df40925eb00 sp 000070abe59fff38 error 4 in qemu-system-x86_64[5df408d7b000+6a4000] likely on CPU 2 (core 2, socket 0)
2024-11-25T14:15:13.045343+01:00 test-pve-01 kernel: [ 313.101418] kvm[1829]: segfault at 41b8 ip 000060607b764b00 sp 00007b33f9fe1f38 error 4 in qemu-system-x86_64[60607b281000+6a4000] likely on CPU 2 (core 2, socket 0)
2024-11-25T14:39:18.392760+01:00 test-pve-01 kernel: [ 676.009453] kvm[2609]: segfault at 41b8 ip 00005b71bdecbb00 sp 00007222f02ccf38 error 4 in qemu-system-x86_64[5b71bd9e8000+6a4000] likely on CPU 1 (core 1, socket 0)
2024-11-25T15:34:59.208184+01:00 test-pve-01 kernel: [ 1373.633994] kvm[7867]: segfault at 41b8 ip 00005817d7fcfb00 sp 00007979361fff38 error 4 in qemu-system-x86_64[5b71bd9e8000+6a4000] likely on CPU 0 (core 0, socket 0)
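To make these kernel lines easier to compare, here is a minimal Python sketch that parses one of them and computes where the faulting instruction pointer sits relative to the start of the reported `qemu-system-x86_64` mapping. The function name `parse_segfault` and the regular expression are my own for illustration, not part of any Proxmox or QEMU tool, and the offset is relative to the mapping shown in the log line, which is not necessarily the file offset inside the binary. The `error` field is decoded with the standard x86 page-fault bits (bit 0: page present, bit 1: write, bit 2: user mode).

```python
import re

# Pattern for the x86 "segfault at ..." line the kernel logs for a crashed
# user-space process. Written for the lines in this post; other kernel
# versions may format the line differently.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<error>[0-9a-f]+) "
    r"in (?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_segfault(line):
    """Return a dict describing the segfault line, or None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    size = int(m["size"], 16)
    error = int(m["error"], 16)
    offset = ip - base
    return {
        "pid": int(m["pid"]),
        "object": m["obj"],
        "fault_addr": int(m["addr"], 16),
        # Offset of the instruction pointer from the start of the mapping.
        "offset": offset,
        # False means the ip lies outside the reported mapping, so the
        # offset should not be trusted for that line.
        "in_mapping": 0 <= offset < size,
        # x86 page-fault error code bits.
        "page_present": bool(error & 1),
        "write": bool(error & 2),
        "user_mode": bool(error & 4),
    }


# The 13:26:18 line from the log above, copied verbatim.
sample = (
    "2024-11-25T13:26:18.392760+01:00 test-pve-01 kernel: [ 506.380955] "
    "kvm[1829]: segfault at 41b8 ip 00005df40925eb00 sp 000070abe59fff38 "
    "error 4 in qemu-system-x86_64[5df408d7b000+6a4000] "
    "likely on CPU 2 (core 2, socket 0)"
)
info = parse_segfault(sample)
print(hex(info["offset"]), info["in_mapping"], info["user_mode"], info["write"])
```

If several crashes give the same offset, that would suggest they all hit the same spot in the QEMU code, which may be useful when comparing against a debug build or another bug report.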
Thanks for your help.
Fabien
