@@ -949,3 +949,99 @@ mmap_lock held. All in-tree users have been audited and do not seem to
949949depend on the mmap_lock being held, but out of tree users should verify
950950for themselves. If they do need it, they can return VM_FAULT_RETRY to
951951be called with the mmap_lock held.
952+
953+ ---
954+
955+ **mandatory**
956+
957+ The order of opening block devices and matching or creating superblocks has
958+ changed.
959+
960+ The old logic opened block devices first and then tried to find a
961+ suitable superblock to reuse based on the block device pointer.
962+
963+ The new logic tries to find a suitable superblock first based on the device
964+ number, and opens the block device afterwards.
965+
966+ Since opening block devices cannot happen under s_umount because of lock
967+ ordering requirements, s_umount is now dropped while opening block devices and
968+ reacquired before calling fill_super().
969+
970+ In the old logic concurrent mounters would find the superblock on the list of
971+ superblocks for the filesystem type. Since the first opener of the block device
972+ would hold s_umount they would wait until the superblock became either born or
973+ was discarded due to initialization failure.
974+
975+ Since the new logic drops s_umount, concurrent mounters could grab s_umount and
976+ would spin. Instead they are now made to wait using an explicit wait-wake
977+ mechanism without having to hold s_umount.
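
The wait-wake idea above can be illustrated with a small userspace analogue. This is only a sketch: it uses a pthread mutex and condition variable in place of the kernel's primitives, and every `toy_` name is hypothetical rather than taken from the VFS. The point it shows is that a concurrent mounter sleeps until the superblock is either born or dying, instead of repeatedly taking and dropping a lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical flags modelling "superblock is ready" and "superblock is going away". */
enum { TOY_BORN = 1 << 0, TOY_DYING = 1 << 1 };

/* Toy superblock: only the state needed for waiting (not the kernel's struct). */
struct toy_super {
	pthread_mutex_t lock;
	pthread_cond_t wq;
	unsigned int flags;
};

static void toy_super_init(struct toy_super *sb)
{
	pthread_mutex_init(&sb->lock, NULL);
	pthread_cond_init(&sb->wq, NULL);
	sb->flags = 0;
}

/* Creator side: publish the outcome and wake every sleeping mounter. */
static void toy_super_wake(struct toy_super *sb, unsigned int flag)
{
	pthread_mutex_lock(&sb->lock);
	sb->flags |= flag;
	pthread_cond_broadcast(&sb->wq);
	pthread_mutex_unlock(&sb->lock);
}

/* Mounter side: sleep until born or dying. Returns true if the superblock can be reused. */
static bool toy_super_wait_born(struct toy_super *sb)
{
	bool born;

	pthread_mutex_lock(&sb->lock);
	while (!(sb->flags & (TOY_BORN | TOY_DYING)))
		pthread_cond_wait(&sb->wq, &sb->lock);
	born = (sb->flags & TOY_BORN) && !(sb->flags & TOY_DYING);
	pthread_mutex_unlock(&sb->lock);
	return born;
}

/* Single-threaded smoke check: a mounter arriving after the wake-up returns at once. */
static void toy_wait_demo(void)
{
	struct toy_super ok, failed;

	toy_super_init(&ok);
	toy_super_wake(&ok, TOY_BORN);
	assert(toy_super_wait_born(&ok));

	toy_super_init(&failed);
	toy_super_wake(&failed, TOY_DYING);
	assert(!toy_super_wait_born(&failed));
}
```

In a real race the waiter would run in another thread and block in `toy_super_wait_born()` until the creator calls `toy_super_wake()`; the demo only exercises the already-woken path to stay deterministic.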
978+
979+ ---
980+
981+ **mandatory**
982+
983+ The holder of a block device is now the superblock.
984+
985+ The holder of a block device used to be the file_system_type which wasn't
986+ particularly useful. It wasn't possible to go from block device to owning
987+ superblock without matching on the device pointer stored in the superblock.
988+ This mechanism would only work for a single device so the block layer couldn't
989+ find the owning superblock of any additional devices.
990+
991+ In the old mechanism reusing or creating a superblock for a racing mount(2) and
992+ umount(2) relied on the file_system_type as the holder. This was severely
993+ underdocumented, however:
994+
995+ (1) Any concurrent mounter that managed to grab an active reference on an
996+ existing superblock was made to wait until the superblock either became
997+ ready or until the superblock was removed from the list of superblocks of
998+ the filesystem type. If the superblock is ready the caller would simply
999+ reuse it.
1000+
1001+ (2) If the mounter came after deactivate_locked_super() but before
1002+ the superblock had been removed from the list of superblocks of the
1003+ filesystem type the mounter would wait until the superblock was shut down,
1004+ reuse the block device and allocate a new superblock.
1005+
1006+ (3) If the mounter came after deactivate_locked_super() and after
1007+ the superblock had been removed from the list of superblocks of the
1008+ filesystem type the mounter would reuse the block device and allocate a new
1009+ superblock (the bd_holder pointer may still be set to the filesystem type).
1010+
1011+ Because the holder of the block device was the file_system_type any concurrent
1012+ mounter could open the block devices of any superblock of the same
1013+ file_system_type without risking seeing EBUSY because the block device was
1014+ still in use by another superblock.
1015+
1016+ Making the superblock the owner of the block device changes this as the holder
1017+ is now a unique superblock and thus block devices associated with it cannot be
1018+ reused by concurrent mounters. So a concurrent mounter in (2) could suddenly
1019+ see EBUSY when trying to open a block device whose holder was a different
1020+ superblock.
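
The effect of changing the holder can be sketched with a toy model of an exclusive claim. This is not the block layer's implementation: the `toy_` types and functions are hypothetical, and the claim rule is simplified to "succeeds if the device is unclaimed or already claimed by the same holder". It shows why two superblocks of the same filesystem type never collided under the old scheme, and why the second one sees EBUSY once each superblock is its own holder.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy stand-ins for the kernel objects; not the real definitions. */
struct toy_fs_type { const char *name; };
struct toy_super { struct toy_fs_type *type; };
struct toy_bdev { const void *holder; }; /* models the bd_holder pointer */

/* Simplified exclusive claim: only the current holder may claim again. */
static int toy_claim(struct toy_bdev *bdev, const void *holder)
{
	if (bdev->holder != NULL && bdev->holder != holder)
		return -EBUSY;
	bdev->holder = holder;
	return 0;
}

/* Give up ownership, as happens once the device is closed. */
static void toy_release(struct toy_bdev *bdev)
{
	bdev->holder = NULL;
}

static void toy_holder_demo(void)
{
	struct toy_fs_type type = { "toyfs" };
	struct toy_super dying_sb = { &type }, new_sb = { &type };
	struct toy_bdev bdev = { NULL };

	/* Old scheme: the holder is the shared filesystem type, so both claims succeed. */
	assert(toy_claim(&bdev, dying_sb.type) == 0);
	assert(toy_claim(&bdev, new_sb.type) == 0);
	toy_release(&bdev);

	/* New scheme: the holder is the superblock, so a second superblock gets EBUSY... */
	assert(toy_claim(&bdev, &dying_sb) == 0);
	assert(toy_claim(&bdev, &new_sb) == -EBUSY);

	/* ...until the first one has released the device. */
	toy_release(&bdev);
	assert(toy_claim(&bdev, &new_sb) == 0);
}
```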
1021+
1022+ The new logic thus waits until the superblock and the devices are shut down in
1023+ ->kill_sb(). Removal of the superblock from the list of superblocks of the
1024+ filesystem type is now moved to a later point when the devices are closed:
1025+
1026+ (1) Any concurrent mounter managing to grab an active reference on an existing
1027+ superblock is made to wait until the superblock is either ready or until
1028+ the superblock and all devices are shut down in ->kill_sb(). If the
1029+ superblock is ready the caller will simply reuse it.
1030+
1031+ (2) If the mounter comes after deactivate_locked_super() but before
1032+ the superblock has been removed from the list of superblocks of the
1033+ filesystem type the mounter is made to wait until the superblock and the
1034+ devices are shut down in ->kill_sb() and the superblock is removed from the
1035+ list of superblocks of the filesystem type. The mounter will allocate a new
1036+ superblock and grab ownership of the block device (the bd_holder pointer of
1037+ the block device will be set to the newly allocated superblock).
1038+
1039+ (3) This case is now collapsed into (2) as the superblock is left on the list
1040+ of superblocks of the filesystem type until all devices are shut down in
1041+ ->kill_sb(). In other words, if the superblock isn't on the list of
1042+ superblocks of the filesystem type anymore then it has given up ownership of
1043+ all associated block devices (the bd_holder pointer is NULL).
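
The three cases above can be condensed into a small decision function. This is a reading aid under simplifying assumptions, not code from the VFS: the `toy_` names are hypothetical, and the superblock is reduced to three booleans describing what a concurrent mounter observes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* What a concurrent mounter does under the new logic (hypothetical names). */
enum toy_action {
	TOY_REUSE,         /* (1) superblock is ready: reuse it */
	TOY_WAIT,          /* (1) not ready yet: sleep until it is born or shut down */
	TOY_WAIT_THEN_NEW, /* (2) dying: wait for ->kill_sb(), then allocate a new one */
	TOY_NEW            /* former (3): nothing on the list, devices already released */
};

/* Simplified view of an existing superblock as seen by the mounter. */
struct toy_sb_view {
	bool on_list;     /* still on the filesystem type's list of superblocks */
	bool born;        /* initialization finished successfully */
	bool deactivated; /* deactivate_locked_super() has already run */
};

static enum toy_action toy_mounter_action(const struct toy_sb_view *sb)
{
	/* Off the list means all block devices were closed, so allocate right away. */
	if (sb == NULL || !sb->on_list)
		return TOY_NEW;
	/* Still on the list but dying: devices may still be held, so wait first. */
	if (sb->deactivated)
		return TOY_WAIT_THEN_NEW;
	return sb->born ? TOY_REUSE : TOY_WAIT;
}

static void toy_cases_demo(void)
{
	struct toy_sb_view ready = { true, true, false };
	struct toy_sb_view dying = { true, true, true };
	struct toy_sb_view gone = { false, true, true };

	assert(toy_mounter_action(&ready) == TOY_REUSE);
	assert(toy_mounter_action(&dying) == TOY_WAIT_THEN_NEW);
	assert(toy_mounter_action(&gone) == TOY_NEW);
}
```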
1044+
1045+ As this is a VFS level change it has no practical consequences for filesystems
1046+ other than that all of them must use one of the provided kill_litter_super(),
1047+ kill_anon_super(), or kill_block_super() helpers.