Spring is in the air
It's March already and I am feeling much better about how my health is improving after longer walks and cycle rides (although the medical test results are still concerning, if they are to be believed). I should be able to get a lot of interesting work done this month :fingers-crossed: (and probably get lots more sleep).
Continued searching for a resolution to the degraded RAID array in the QNap NAS device (Network Attached Storage), which was not automatically rebuilding when a new disk was added.
Clojurist Together is accepting proposals for Q2 2025 ($33K, 5-7 projects). The deadline to submit a proposal is March 17th.
Consider writing articles for Dev on Clojure and more general engineering topics (from the Practicalli Engineering Playbook)
There is also a writing challenge about the future of software
Strava login failureλ︎
I wanted to review the details of a ride I did last summer which is much easier to do on the Strava website than on the Strava App.
Visiting the Strava.com login page (using Firefox version 128.7.0esr) and entering my email address returned an unhelpful error message: "An unexpected error occurred. Please try again."
Unsurprisingly, trying again produced the same error, over and over again.
Looking at the Strava Support pages showed nothing of relevance, so with trepidation I tried the chatbot (in summary it failed to help and was annoying to interact with).
The chatbot failed in several ways:
- referred me to an article that was not relevant (specifically about iOS and several years old)
- gave irrelevant advice (suggested resetting password even though it was the email that was failing and not the password)
- suggested I raise a ticket, which is not possible unless you can log in to Strava (a feeling of Catch 22)
Catch 22 definition
A problem for which the only solution is denied by a circumstance inherent in the problem (or by a rule).
In the Strava app on Android I was logged in, so I tried to raise a ticket there. Selecting "Create a ticket", I was redirected by the Strava app to the browser on the mobile phone. This time I was able to log in and create the ticket.
My assumption: the error is related to the new passcode feature during the login process when using an email address. The passcode feature allows login with an email address and a 4-digit passcode, rather than email and password. I declined to set that up as it seemed less secure than using email and a password managed by a password manager (e.g. Nordpass, 1Password, etc.).
I also tried the Chrome browser, which did not have an issue logging in, so I'll have to use that for now.
NAS repairλ︎
I have a RAID array in my QNap NAS device that is not co-operating.
One of the drives failed when rebuilding the RAID after adding two replacement drives. Even though the failed hard drive seems to be okay (I think it overheated during the RAID rebuild), the RAID is not accessible.
Only replace one hard drive at a time for greater safety when rebuilding the RAID
Even with a RAID 6 array, it seems prudent to only replace one drive at a time. RAID 6 tolerates two simultaneous drive failures, so swapping two drives at once uses up all of that margin for the duration of the lengthy rebuild.
[~] # mdadm --detail /dev/md0
/dev/md0:
Version : 01.00.03
Creation Time : Sat Jan 23 19:04:18 2021
Raid Level : raid6
Array Size : 17572185216 (16758.14 GiB 17993.92 GB)
Used Dev Size : 2928697536 (2793.02 GiB 2998.99 GB)
Raid Devices : 8
Total Devices : 7
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Mon Mar 3 14:29:03 2025
State : clean, degraded
Active Devices : 7
Working Devices : 7
Failed Devices : 0
Spare Devices : 0
Chunk Size : 64K
Name : 0
UUID : cf1b4f4e:6d83f848:0afc40fb:20a69932
Events : 13813
Number Major Minor RaidDevice State
0 8 3 0 active sync /dev/sda3
1 0 0 1 removed
2 8 35 2 active sync /dev/sdc3
3 8 51 3 active sync /dev/sdd3
4 8 67 4 active sync /dev/sde3
8 8 83 5 active sync /dev/sdf3
6 8 99 6 active sync /dev/sdg3
7 8 115 7 active sync /dev/sdh3
The failed drive is showing as removed. Unplugging the drive and plugging it in again does not add the drive back to the RAID array (I assume this is because the RAID is degraded).
Checking the file system or trying to stop the RAID array fails, as something seems to be using the RAID array. I assume this is the QNap services.
[~] # e2fsck_64 -fp -C 0 /dev/md0
/dev/md0 is in use.
e2fsck: Cannot continue, aborting.
[~] # mdadm --stop /dev/md0
mdadm: fail to stop array /dev/md0: Device or resource busy
[~] #
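To see what might still be holding the array open, a couple of quick checks can be run from the same shell. This is only a sketch, assuming fuser (and possibly lsof) are available in the QNap firmware:

```shell
# Show the current state of all md arrays
cat /proc/mdstat

# List processes using the md0 device (if fuser is available in the firmware)
fuser -vm /dev/md0

# Alternative, if lsof is installed
lsof /dev/md0
```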
Set global spareλ︎
The QNap FAQ - what to do if rebuilding doesn't start with a new drive in degraded mode - suggests setting the drive as a global spare. The article says this will start rebuilding the RAID.
In the Qnap Admin web UI I opened the Storage Manager. In the RAID tab I clicked on Drive 2 and selected the Action "Set as Global Spare".
Unfortunately this didn't seem to work as nothing changed in the web UI, even after a refresh (maybe a reboot is required?). After a while I removed the global spare via the Storage Manager web UI.
Switching to the command line.
From the Stack Exchange post, I forced the RAID to resync by manually adding the removed disk, and now it seems to be rebuilding (according to mdadm --detail).
[/etc/init.d] # mdadm --readwrite /dev/md0
mdadm: failed to set writable for /dev/md0: Device or resource busy
[/etc/init.d] # mdadm --add /dev/md0 /dev/sdb3
mdadm: added /dev/sdb3
## QNap started beeping for a few seconds (presumably because a new drive was added to the array)
[/etc/init.d] # mdadm --misc --detail /dev/md0
/dev/md0:
Version : 01.00.03
Creation Time : Sat Jan 23 19:04:18 2021
Raid Level : raid6
Array Size : 17572185216 (16758.14 GiB 17993.92 GB)
Used Dev Size : 2928697536 (2793.02 GiB 2998.99 GB)
Raid Devices : 8
Total Devices : 8
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Mon Mar 3 15:27:29 2025
State : clean, degraded, recovering
Active Devices : 7
Working Devices : 8
Failed Devices : 0
Spare Devices : 1
Chunk Size : 64K
Rebuild Status : 0% complete
Name : 0
UUID : cf1b4f4e:6d83f848:0afc40fb:20a69932
Events : 13965
Number Major Minor RaidDevice State
0 8 3 0 active sync /dev/sda3
9 8 19 1 spare rebuilding /dev/sdb3
2 8 35 2 active sync /dev/sdc3
3 8 51 3 active sync /dev/sdd3
4 8 67 4 active sync /dev/sde3
8 8 83 5 active sync /dev/sdf3
6 8 99 6 active sync /dev/sdg3
7 8 115 7 active sync /dev/sdh3
The Storage Manager in the QNap Admin web UI does not show the RAID being rebuilt or the spare. However, there is a log entry in the System Logs of the QNap Admin web UI.
| Type | Date | Time | Users | Source IP | Computer name | Content |
|------|------|------|-------|-----------|---------------|---------|
| Info | 2025/03/03 | 15:27:03 | System | 127.0.0.1 | localhost | [RAID6 Disk Volume: Drive 1 3 4 5 6 7 8] Start rebuilding. |
I left the RAID rebuild running and went for a long walk. Looking at the rebuild progress when I returned from the walk, I estimate it will take at least 20 hours to rebuild. If all goes well it should finish before lunchtime tomorrow.
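The rebuild progress can also be watched from the command line, using the same tools as above:

```shell
# Progress and estimated finish time for all md arrays
cat /proc/mdstat

# Just the rebuild status of the data array
mdadm --misc --detail /dev/md0 | grep -i rebuild
```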
Hopefully the drive that overheated will hold on long enough for the RAID array to become healthy. Then I will swap out the failing drive for a relatively new one.
I will check all the important files have been backed up first :)
There are two other drives that have SMART warnings, so I will follow my own advice and replace them one at a time, letting the RAID array rebuild fully between swaps so there is still redundancy to spare if another drive fails unexpectedly.
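To help decide which drive to swap first, the SMART attributes can be read from the shell. A sketch, assuming smartctl is available on the NAS and the drives are directly attached; /dev/sdc is only an example device name:

```shell
# Full SMART report for one drive (replace /dev/sdc with the drive to inspect)
smartctl -a /dev/sdc

# Attributes worth watching when deciding which drive to replace next
smartctl -A /dev/sdc | grep -E 'Reallocated_Sector|Current_Pending|Offline_Uncorrectable'
```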
Eventually I will replace all the mechanical drives with Solid State Drives (SSDs), as these seem to be very reliable from my experience with recent laptops.
dmesgλ︎
[~] # dmesg
[151084.237046] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[151087.247846] nfsd: last server has exited, flushing export cache
[151087.550782] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
[151087.552884] NFSD: starting 90-second grace period
[151177.783322] ata5: exception Emask 0x50 SAct 0x0 SErr 0x4090800 action 0xe frozen
[151177.785466] ata5: irq_stat 0x00400040, connection status changed
[151177.787670] ata5: SError: { HostInt PHYRdyChg 10B8B DevExch }
[151177.789936] ata5: hard resetting link
[151178.515039] ata5: SATA link down (SStatus 0 SControl 330)
[151183.517041] ata5: hard resetting link
[151183.824035] ata5: SATA link down (SStatus 0 SControl 330)
[151183.826448] ata5: limiting SATA link speed to 1.5 Gbps
[151188.826032] ata5: hard resetting link
[151189.133037] ata5: SATA link down (SStatus 0 SControl 310)
[151189.135341] ata5.00: disabled
[151189.137654] ata5: EH complete
[151189.139027] sd 4:0:0:0: rejecting I/O to offline device
[151189.139035] sd 4:0:0:0: [sdb] killing request
[151189.139069] sd 4:0:0:0: rejecting I/O to offline device
[151189.139079] md: super_written gets error=-5, uptodate=0
[151189.139087] md/raid1:md9: Disk failure on sdb1, disabling device.
[151189.139090] md/raid1:md9: Operation continuing on 7 devices.
[151189.154891] ata5.00: detaching (SCSI 4:0:0:0)
[151189.171768] sd 4:0:0:0: [sdb] Synchronizing SCSI cache
[151189.175107] sd 4:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[151189.178767] sd 4:0:0:0: [sdb] Stopping disk
[151189.181950] sd 4:0:0:0: [sdb] START_STOP FAILED
[151189.184771] sd 4:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[151189.242728] RAID1 conf printout:
[151189.242738] --- wd:7 rd:8
[151189.242746] disk 0, wo:0, o:1, dev:sda1
[151189.242753] disk 1, wo:1, o:0, dev:sdb1
[151189.242760] disk 2, wo:0, o:1, dev:sdf1
[151189.242767] disk 3, wo:0, o:1, dev:sde1
[151189.242774] disk 4, wo:0, o:1, dev:sdd1
[151189.242781] disk 5, wo:0, o:1, dev:sdc1
[151189.242789] disk 6, wo:0, o:1, dev:sdg1
[151189.242796] disk 7, wo:0, o:1, dev:sdh1
[151189.250047] RAID1 conf printout:
[151189.250056] --- wd:7 rd:8
[151189.250063] disk 0, wo:0, o:1, dev:sda1
[151189.250068] disk 2, wo:0, o:1, dev:sdf1
[151189.250073] disk 3, wo:0, o:1, dev:sde1
[151189.250078] disk 4, wo:0, o:1, dev:sdd1
[151189.250083] disk 5, wo:0, o:1, dev:sdc1
[151189.250089] disk 6, wo:0, o:1, dev:sdg1
[151189.250094] disk 7, wo:0, o:1, dev:sdh1
[151190.660690] md/raid1:md8: Disk failure on sdb2, disabling device.
[151190.660694] md/raid1:md8: Operation continuing on 1 devices.
[151190.712117] RAID1 conf printout:
[151190.712126] --- wd:1 rd:2
[151190.712132] disk 0, wo:0, o:1, dev:sda2
[151190.712137] disk 1, wo:1, o:0, dev:sdb2
[151190.717019] RAID1 conf printout:
[151190.717026] --- wd:1 rd:2
[151190.717032] disk 0, wo:0, o:1, dev:sda2
[151190.717048] RAID1 conf printout:
[151190.717052] --- wd:1 rd:2
[151190.717056] disk 0, wo:0, o:1, dev:sda2
[151190.717061] disk 1, wo:1, o:1, dev:sdh2
[151190.717072] RAID1 conf printout:
[151190.717076] --- wd:1 rd:2
[151190.717080] disk 0, wo:0, o:1, dev:sda2
[151190.717085] disk 1, wo:1, o:1, dev:sdh2
[151190.717089] RAID1 conf printout:
[151190.717092] --- wd:1 rd:2
[151190.717097] disk 0, wo:0, o:1, dev:sda2
[151190.717102] disk 1, wo:1, o:1, dev:sdh2
[151190.717106] RAID1 conf printout:
[151190.717109] --- wd:1 rd:2
[151190.717114] disk 0, wo:0, o:1, dev:sda2
[151190.717118] disk 1, wo:1, o:1, dev:sdh2
[151190.717122] RAID1 conf printout:
[151190.717126] --- wd:1 rd:2
[151190.717130] disk 0, wo:0, o:1, dev:sda2
[151190.717135] disk 1, wo:1, o:1, dev:sdh2
[151190.717139] RAID1 conf printout:
[151190.717143] --- wd:1 rd:2
[151190.717147] disk 0, wo:0, o:1, dev:sda2
[151190.717152] disk 1, wo:1, o:1, dev:sdh2
[151190.717235] md: recovery of RAID array md8
[151190.720354] md: minimum _guaranteed_ speed: 5000 KB/sec/disk.
[151190.723657] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[151190.727111] md: using 128k window, over a total of 530128k.
[151192.716122] md: unbind<sdb2>
[151192.724024] md: export_rdev(sdb2)
[151194.704209] md: super_written gets error=-19, uptodate=0
[151194.705145] md/raid1:md13: Disk failure on sdb4, disabling device.
[151194.705145] md/raid1:md13: Operation continuing on 7 devices.
[151194.875102] md: unbind<sdb1>
[151194.886032] md: export_rdev(sdb1)
[151195.028738] RAID1 conf printout:
[151195.028749] --- wd:7 rd:8
[151195.028757] disk 0, wo:1, o:0, dev:sdb4
[151195.028763] disk 1, wo:0, o:1, dev:sdh4
[151195.028768] disk 2, wo:0, o:1, dev:sdg4
[151195.028773] disk 3, wo:0, o:1, dev:sdf4
[151195.028778] disk 4, wo:0, o:1, dev:sde4
[151195.028783] disk 5, wo:0, o:1, dev:sdd4
[151195.028788] disk 6, wo:0, o:1, dev:sdc4
[151195.028793] disk 7, wo:0, o:1, dev:sda4
[151195.033032] RAID1 conf printout:
[151195.033040] --- wd:7 rd:8
[151195.033048] disk 1, wo:0, o:1, dev:sdh4
[151195.033056] disk 2, wo:0, o:1, dev:sdg4
[151195.033063] disk 3, wo:0, o:1, dev:sdf4
[151195.033070] disk 4, wo:0, o:1, dev:sde4
[151195.033077] disk 5, wo:0, o:1, dev:sdd4
[151195.033085] disk 6, wo:0, o:1, dev:sdc4
[151195.033092] disk 7, wo:0, o:1, dev:sda4
[151197.059511] md: unbind<sdb4>
[151197.071027] md: export_rdev(sdb4)
[151201.442510] md: md8: recovery done.
[151201.504318] RAID1 conf printout:
[151201.504328] --- wd:2 rd:2
[151201.504335] disk 0, wo:0, o:1, dev:sda2
[151201.504342] disk 1, wo:0, o:1, dev:sdh2
[151201.548507] RAID1 conf printout:
[151201.548516] --- wd:2 rd:2
[151201.548522] disk 0, wo:0, o:1, dev:sda2
[151201.548527] disk 1, wo:0, o:1, dev:sdh2
[151201.548531] RAID1 conf printout:
[151201.548535] --- wd:2 rd:2
[151201.548540] disk 0, wo:0, o:1, dev:sda2
[151201.548544] disk 1, wo:0, o:1, dev:sdh2
[151201.548548] RAID1 conf printout:
[151201.548552] --- wd:2 rd:2
[151201.548556] disk 0, wo:0, o:1, dev:sda2
[151201.548561] disk 1, wo:0, o:1, dev:sdh2
[151201.548565] RAID1 conf printout:
[151201.548568] --- wd:2 rd:2
[151201.548573] disk 0, wo:0, o:1, dev:sda2
[151201.548578] disk 1, wo:0, o:1, dev:sdh2
[151201.548581] RAID1 conf printout:
[151201.548585] --- wd:2 rd:2
[151201.548589] disk 0, wo:0, o:1, dev:sda2
[151201.548594] disk 1, wo:0, o:1, dev:sdh2
[151247.660022] Retry 20 times(20 seconds),host_no = 4 is not empty to plug in.
[151314.590134] md/raid:md0: Disk failure on sdb3, disabling device.
[151314.590139] md/raid:md0: Operation continuing on 7 devices.
[151314.639505] RAID conf printout:
[151314.639513] --- level:6 rd:8 wd:7
[151314.639519] disk 0, o:1, dev:sda3
[151314.639524] disk 1, o:0, dev:sdb3
[151314.639529] disk 2, o:1, dev:sdc3
[151314.639534] disk 3, o:1, dev:sdd3
[151314.639539] disk 4, o:1, dev:sde3
[151314.639543] disk 5, o:1, dev:sdf3
[151314.639548] disk 6, o:1, dev:sdg3
[151314.639553] disk 7, o:1, dev:sdh3
[151314.645014] RAID conf printout:
[151314.645019] --- level:6 rd:8 wd:7
[151314.645024] disk 0, o:1, dev:sda3
[151314.645029] disk 2, o:1, dev:sdc3
[151314.645034] disk 3, o:1, dev:sdd3
[151314.645038] disk 4, o:1, dev:sde3
[151314.645043] disk 5, o:1, dev:sdf3
[151314.645047] disk 6, o:1, dev:sdg3
[151314.645052] disk 7, o:1, dev:sdh3
[151864.012994] EXT4-fs (dm-0): Mount option "noacl" will be removed by 3.5
[151864.012999] Contact linux-ext4@vger.kernel.org if you think we should keep it.
[151864.013003]
[151864.998652] EXT4-fs (dm-0): ext4_check_descriptors: Checksum for group 129029 failed (3399!=12998)
[151865.002306] EXT4-fs (dm-0): group descriptors corrupted!
[153325.072301] md: md0 still in use.
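Reading through the log: the SATA link on ata5 dropped and could not be re-established, so sdb was removed from the system RAID1 arrays (md9, md8 and md13) as well as from the data array md0, and ext4 then reported corrupted group descriptors on the data volume. A simple filter pulls out just those events on future checks:

```shell
# Filter the kernel log for the failing drive, its SATA port and the affected arrays
dmesg | grep -E 'ata5|sdb|md0|EXT4'
```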
Init.d scriptsλ︎
I started investigating the Qnap init scripts in /etc/init.d/, but they weren't documented, so I would have to read a lot of shell script to understand what they did.
init_disk.sh
"Usage: init_disk {start|stop|restart|mount_flash_config|umount_flash_config}"
Could AI be useful as a software archaeology tool?
If I gave an AI the content of a shell script in a prompt, would it be able to tell me in a meaningful way what the script is doing?
Restore default network sharesλ︎
The QNap logs and notifications show a warning about the network shares not existing.
| Type | Date | Time | Users | Source IP | Computer name | Content |
|------|------|------|-------|-----------|---------------|---------|
| Warning | 2025/03/04 | 18:42:11 | System | 127.0.0.1 | localhost | Folders(Public, Web, Multimedia, Download) for default network shares do not exist. Restore default network shares or format your disk volume. |
Restoring the network shares can be attempted via the QNap Admin web UI, in the Shared Directories section.
Using the "Restore Network Shares" button failed (assumption: the RAID is not mounted, so the RAID filespace cannot be written to).
Unmountedλ︎
The RAID is not mounted on /share/MD0_DATA
Mounting the RAID array fails.
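The mount status can be confirmed from the shell with standard tools (nothing QNap specific):

```shell
# Is the data array mounted anywhere?
mount | grep md0

# Check the expected mount point
df -h /share/MD0_DATA
```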
Check disks for bad blocksλ︎
The QNap Admin Web UI was used to run a bad blocks check on every disk in the array. Storage Manager > Volumes shows the list of disks and each disk has a scan button next to it.
Each scan button was pressed and disk scans started. This is expected to take up to a day to complete.
A day later, all disks had finished the bad block checks successfully. Next is to test whether this check has helped with mounting the RAID array.
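The retry I have in mind reuses the commands from earlier in these notes. A sketch only: the dmesg output mentions dm-0, so the ext4 filesystem may sit on a device-mapper volume on top of md0 rather than on md0 directly.

```shell
# Re-run the filesystem check now the bad block scans have completed
e2fsck_64 -fp -C 0 /dev/md0

# If the check passes, try mounting the data volume again
# (the actual device may be a dm/LVM volume rather than /dev/md0 itself)
mount /dev/md0 /share/MD0_DATA
```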
Tech Eventsλ︎
I have registered as a volunteer for Devoxx UK, a commercial event that is very popular with the Java & JVM community.
I will get to see some of the sessions for free and it is a chance to catch up with friends and perhaps make new ones.
It will be good to know what topics the people attending are interested in, to inspire some more articles for the Practicalli books and blog.
Devoxx UK is held on 7-9 May at the Business Design Centre, 52 Upper Street, London, N1 0QH, United Kingdom.
Healthλ︎
On Monday I completed the longest walk of 2025 so far, at over 10,000 steps (9 km). A good pace was kept during the walk and energy and motivation actually seemed to improve from halfway along.
On Friday I did the longest walk for a very long time. My medical test results from blood tests a few weeks ago were very concerning, but that is helping motivate me to do far more exercise in the coming weeks.
Survived the longest ride of 2025 so far at 61.36 km with 888 meters of elevation. The pace was a relatively slow 19.1 km/hour average compared to the 23-24 km/hour rides from 2024 (and very slow compared to 2019 when I cycled across England, Wales and Scotland). Ascents were climbed without too much pain, although at a very slow and steady rhythm. On the flat and downhill segments the bike was very fast, thanks to a good clean and a new chain.
The increased exercise has naturally led to fatigue, but there is also an element of long covid, which is manifesting as headaches and greatly increased coughing.
I found a recording of a long road trip to Lenham (and back) that I will start editing.
Thank you.