ZFS zpool vanishing after upgrading ProxmoxVE 5.4 to 6.2
Posted on Sat 23 May 2020 in Tech
Performing a major version upgrade is never pleasant. I've been using ProxmoxVE for about ten years now though, and it's consistently done a fantastic job. Since it's based on Debian, all the upgrades are done with a simple apt update && apt upgrade, with a variety of steps in the middle to point to new repositories, etc. Nothing out of the ordinary, nothing scary.
So I've got two servers - one is an old one, with only a couple of test VMs residing on it. I went ahead and worked through the 5.x to 6.x upgrade guide, and everything Just Worked. No problem with that one.
When I moved onto the other server I noticed a couple of different things. For example, when trying to upgrade Corosync 2.x to 3.x (a prerequisite for the OS upgrade), I noticed it was trying to uninstall things like corosync-pve, which really didn't seem right.
After comparing the dpkg -l contents between the two servers, I learned that corosync-pve is actually just a transitional package - completely replaced by the new corosync package in version 3. And although the server was trying to uninstall corosync-pve, it was then immediately installing corosync... so no problem.
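For anyone wanting to repeat that comparison, here is a rough sketch of one way to do it. It assumes the dpkg -l output from each server has already been saved to a file; the function name and the file names in the note below are made up for illustration.

```shell
# Hypothetical helper: list packages that are installed according to the
# first saved `dpkg -l` listing but not according to the second.
pkgs_only_in_first() {
    list_a=$(mktemp) || return 1
    list_b=$(mktemp) || { rm -f "$list_a"; return 1; }
    # Keep only fully installed packages (status "ii") and print the name column.
    awk '$1 == "ii" { print $2 }' "$1" | LC_ALL=C sort -u > "$list_a"
    awk '$1 == "ii" { print $2 }' "$2" | LC_ALL=C sort -u > "$list_b"
    # comm -23 prints the lines that appear only in the first sorted file.
    LC_ALL=C comm -23 "$list_a" "$list_b"
    rm -f "$list_a" "$list_b"
}
```

With listings saved as, say, server-a.txt and server-b.txt, running pkgs_only_in_first server-a.txt server-b.txt prints the packages only the first server has installed; swap the arguments to see the other direction.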
Things continued to be mildly alarming during the upgrade itself, but not actually toooo bad, as all the removed packages were being replaced. This one, for example:
    dpkg: pve-libspice-server1: dependency problems, but removing anyway as you requested:
     pve-qemu-kvm depends on pve-libspice-server1 (>= 0.12.5-1); however:
      Package pve-libspice-server1 is to be removed.
     spiceterm depends on libspice-server1 (>= 0.12.2); however:
      Package libspice-server1 is not installed.
      Package pve-libspice-server1 which provides libspice-server1 is to be removed.
    (Reading database ... 65255 files and directories currently installed.)
    Removing pve-libspice-server1 (0.14.1-2) ...
... was replaced with spiceterm. No worries.
However things went sharply downhill following reboot. All of my VM images were missing! I had a ZFS zpool that should have been at /VM_Local_zpool - that folder existed, and contained an ./images folder, but that was completely empty.
I quickly checked the storage information in Proxmox, and the result was... not good:
... Really, not good. :-|
So I SSH'ed into the server and had a quick look at my list of ZFS stores, and discovered that my dataset was entirely absent:
    # zfs list
    NAME               USED  AVAIL  REFER  MOUNTPOINT
    rpool             83.8G  23.8G    96K  /rpool
    rpool/ROOT        3.65G  23.8G    96K  /rpool/ROOT
    rpool/ROOT/pve-1  3.65G  23.8G  3.65G  /
    rpool/data        71.6G  23.8G  71.6G  /rpool/data
    rpool/swap        8.50G  25.4G  6.85G  -
    # zpool list
    NAME    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
    rpool   111G  82.1G  28.9G        -         -    66%    73%  1.00x    ONLINE  -
A bit of Googling led me to the zpool import command, which resulted in the first piece of good news of the night:
    # zpool import
       pool: VM_Local_zpool
         id: 6614373363984244305
      state: ONLINE
     status: Some supported features are not enabled on the pool.
     action: The pool can be imported using its name or numeric identifier, though
             some features will not be available without an explicit 'zpool upgrade'.
     config:

            VM_Local_zpool              ONLINE
              mirror-0                  ONLINE
                wwn-0x50014ee0aeee96ef  ONLINE
                wwn-0x50014ee05998ee25  ONLINE
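If you ever want to script that check, the pool names can be pulled straight out of the zpool import listing. This is only a minimal sketch, assuming the output keeps the "pool: NAME" layout seen here; importable_pools is a hypothetical helper name, not part of ZFS.

```shell
# Hypothetical helper: read `zpool import` output on stdin and print the
# name of every pool that is available to import, one per line.
importable_pools() {
    # Each pool's section starts with a "pool: NAME" line.
    awk '$1 == "pool:" { print $2 }'
}
```

Piping the real command through it (zpool import | importable_pools) should print one line per importable pool, which in my case would have been just VM_Local_zpool.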
It exists! And it's ... online? But where? I don't get it. I went ahead and tried importing it, but got a "directory not empty" error:
    # zpool import VM_Local_zpool
    cannot mount '/VM_Local_zpool': directory is not empty
Now, if I'd been paying attention there, I would have realised that the import itself had worked and only the mount had failed, because zfs list now contained my VM_Local_zpool:
    # zfs list
    NAME               USED  AVAIL  REFER  MOUNTPOINT
    VM_Local_zpool    1.16T   611G  1.16T  /VM_Local_zpool
    rpool             83.8G  23.8G    96K  /rpool
    rpool/ROOT        3.65G  23.8G    96K  /rpool/ROOT
    rpool/ROOT/pve-1  3.65G  23.8G  3.65G  /
    rpool/data        71.6G  23.8G  71.6G  /rpool/data
    rpool/swap        8.50G  25.4G  6.85G  -
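The thing that tripped me up is that importing a pool and mounting its datasets are separate steps, so a pool can be imported while its filesystem is still unmounted. Here's a small sketch of how the two states could be told apart; pool_mount_state is a hypothetical helper name, and it only looks at the pool's top-level dataset.

```shell
# Hypothetical helper: report whether a pool is imported and whether its
# top-level dataset is currently mounted.
pool_mount_state() {
    pool=$1
    # `zpool list` exits non-zero when the named pool is not imported.
    if ! zpool list -H -o name "$pool" >/dev/null 2>&1; then
        echo "not imported"
        return 0
    fi
    # The read-only `mounted` property is "yes" or "no" for a filesystem.
    if [ "$(zfs get -H -o value mounted "$pool")" = "yes" ]; then
        echo "imported and mounted"
    else
        echo "imported but not mounted"
    fi
}
```

Right after the failed mount, pool_mount_state VM_Local_zpool would presumably have told me "imported but not mounted", which is a much less scary message than an empty folder.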
From this point, all I had to do was empty /VM_Local_zpool, and try to remount the zpool. But there's a second trick here.
ProxmoxVE has a pretty clear idea of How Storage Should Look, so every time it scans a storage device, it makes sure there's an ./images folder in any storage enabled for VM Images. It also makes sure there's a ./dump folder on any storage enabled as a backup target, and a ./template folder on any storage enabled for ISO images. (Reference: https://pve.proxmox.com/wiki/Storage:_Directory)
So, when I removed that /VM_Local_zpool/images folder, Proxmox just... went ahead and recreated it almost immediately, so I still couldn't mount the zpool there.
In the end, I managed to get my zpool remounted by pairing the rmdir and zfs mount commands together on the same line, so the mount happened before Proxmox had a chance to recreate the folder:
    # rmdir /VM_Local_zpool/images/
    # zfs mount VM_Local_zpool
    cannot mount '/VM_Local_zpool': directory is not empty
    # rmdir /VM_Local_zpool/images/ && zfs mount VM_Local_zpool
    # cd VM_Local_zpool/
    /VM_Local_zpool# ls
    backup  images
... and there we go! My mount point, and all my VM images, are back unscathed!
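Since this is really a race against Proxmox recreating the folder, the same trick could be wrapped in a retry loop in case the first attempt loses. This is just a sketch of the idea rather than something I needed on the night; clear_and_mount and its arguments are hypothetical names.

```shell
# Hypothetical helper: remove the empty directory that keeps reappearing
# and mount the dataset straight away, retrying if the mount still fails.
clear_and_mount() {
    dataset=$1        # e.g. VM_Local_zpool
    stray_dir=$2      # e.g. /VM_Local_zpool/images
    attempts=${3:-5}  # how many times to try before giving up
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # rmdir only removes empty directories, so real data is never deleted;
        # if the directory has contents we stop rather than touch them.
        if [ -d "$stray_dir" ]; then
            rmdir "$stray_dir" || return 1
        fi
        if zfs mount "$dataset"; then
            return 0
        fi
        i=$((i + 1))
    done
    return 1
}
```

Something like clear_and_mount VM_Local_zpool /VM_Local_zpool/images would then do the same thing as the one-liner, just with a few more chances to win the race.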
And just for future reference - I saw a post on the FreeNAS forums suggesting that I should check the status of the zfs-import-cache service, and enable it if it wasn't started by default... however it was fine:
    # systemctl status zfs-import-cache.service
    ● zfs-import-cache.service - Import ZFS pools by cache file
       Loaded: loaded (/lib/systemd/system/zfs-import-cache.service; enabled; vendor preset: enabled)
       Active: active (exited) since Sat 2020-05-23 20:46:54 NZST; 3min 26s ago
         Docs: man:zpool(8)
      Process: 1819 ExecStart=/sbin/zpool import -c /etc/zfs/zpool.cache -aN (code=exited, status=0/SUCCESS)
     Main PID: 1819 (code=exited, status=0/SUCCESS)

    May 23 20:46:54 ~ systemd: Starting Import ZFS pools by cache file...
    May 23 20:46:54 ~ zpool: no pools available to import
    May 23 20:46:54 ~ systemd: Started Import ZFS pools by cache file.
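That service imports whatever is recorded in /etc/zfs/zpool.cache, so one follow-up check worth sketching (my own assumption, not something from that forum post) is whether the pool is set to be recorded in a cache file at all. The cachefile pool property is real, but pool_uses_cachefile is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if the pool is recorded in a cache file,
# fail if its cachefile property is "none" (never written to a cache).
pool_uses_cachefile() {
    # Prints "-" for the default location, a path, or "none".
    value=$(zpool get -H -o value cachefile "$1") || return 2
    [ "$value" != "none" ]
}
```

If pool_uses_cachefile VM_Local_zpool were to fail, pointing the property back at the default file with zpool set cachefile=/etc/zfs/zpool.cache VM_Local_zpool looks like the usual remedy, though I didn't need to try it here.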