Commit 3e48a11

Merge tag 'f2fs-for-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
 "In this development cycle, we focused on several key performance
  optimizations:

   - introducing large folio support to enhance read speeds for
     immutable files

   - reducing checkpoint=enable latency by flushing only committed dirty
     pages

   - implementing tracepoints to diagnose and resolve lock priority
     inversion.

  Additionally, we introduced the packed_ssa feature to optimize the SSA
  footprint when utilizing large block sizes.

  Detail summary:

  Enhancements:
   - support large folio for immutable non-compressed case
   - support non-4KB block size without packed_ssa feature
   - optimize f2fs_enable_checkpoint() to avoid long delay
   - optimize f2fs_overwrite_io() for f2fs_iomap_begin
   - optimize NAT block loading during checkpoint write
   - add write latency stats for NAT and SIT blocks in
     f2fs_write_checkpoint
   - pin files do not require sbi->writepages lock for ordering
   - avoid f2fs_map_blocks() for consecutive holes in readpages
   - flush plug periodically during GC to maximize readahead effect
   - add tracepoints to catch lock overheads
   - add several sysfs entries to tune internal lock priorities

  Fixes:
   - fix lock priority inversion issue
   - fix incomplete block usage in compact SSA summaries
   - fix to show simulate_lock_timeout correctly
   - fix to avoid mapping wrong physical block for swapfile
   - fix IS_CHECKPOINTED flag inconsistency issue caused by concurrent
     atomic commit and checkpoint writes
   - fix to avoid UAF in f2fs_write_end_io()"

* tag 'f2fs-for-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (61 commits)
  f2fs: sysfs: introduce critical_task_priority
  f2fs: introduce trace_f2fs_priority_update
  f2fs: fix lock priority inversion issue
  f2fs: optimize f2fs_overwrite_io() for f2fs_iomap_begin
  f2fs: fix incomplete block usage in compact SSA summaries
  f2fs: decrease maximum flush retry count in f2fs_enable_checkpoint()
  f2fs: optimize NAT block loading during checkpoint write
  f2fs: change size parameter of __has_cursum_space() to unsigned int
  f2fs: add write latency stats for NAT and SIT blocks in f2fs_write_checkpoint
  f2fs: pin files do not require sbi->writepages lock for ordering
  f2fs: fix to show simulate_lock_timeout correctly
  f2fs: introduce FAULT_SKIP_WRITE
  f2fs: check skipped write in f2fs_enable_checkpoint()
  Revert "f2fs: add timeout in f2fs_enable_checkpoint()"
  f2fs: fix to unlock folio in f2fs_read_data_large_folio()
  f2fs: fix error path handling in f2fs_read_data_large_folio()
  f2fs: use folio_end_read
  f2fs: fix to avoid mapping wrong physical block for swapfile
  f2fs: avoid f2fs_map_blocks() for consecutive holes in readpages
  f2fs: advance index and offset after zeroing in large folio read
  ...
2 parents 770aaed + 5219093 commit 3e48a11

22 files changed

Lines changed: 1672 additions & 547 deletions

Documentation/ABI/testing/sysfs-fs-f2fs

Lines changed: 59 additions & 3 deletions
@@ -520,7 +520,7 @@ What: /sys/fs/f2fs/<disk>/ckpt_thread_ioprio
 Date: January 2021
 Contact: "Daeho Jeong" <daehojeong@google.com>
 Description: Give a way to change checkpoint merge daemon's io priority.
-Its default value is "be,3", which means "BE" I/O class and
+Its default value is "rt,3", which means "RT" I/O class and
 I/O priority "3". We can select the class between "rt" and "be",
 and set the I/O priority within valid range of it. "," delimiter
 is necessary in between I/O class and priority number.
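
Reviewer note: the hunk above changes the documented default of `ckpt_thread_ioprio` from "be,3" to "rt,3". As a minimal sketch of the "class,priority" format the description requires, the Python helpers below build and parse such a value. The helper names are hypothetical, and the 0..7 level range is an assumption based on the usual Linux I/O priority levels; the hunk itself only says "valid range".

```python
# Sketch only: formats/parses the "<class>,<priority>" string described for
# /sys/fs/f2fs/<disk>/ckpt_thread_ioprio. Helper names are hypothetical.
VALID_CLASSES = ("rt", "be")   # the two classes named in the description
IOPRIO_LEVELS = range(8)       # assumption: levels 0..7, not stated in the hunk


def format_ckpt_ioprio(io_class: str, level: int) -> str:
    """Return a value such as "rt,3", rejecting unknown classes or levels."""
    if io_class not in VALID_CLASSES:
        raise ValueError(f"unknown I/O class: {io_class!r}")
    if level not in IOPRIO_LEVELS:
        raise ValueError(f"I/O priority out of range: {level}")
    return f"{io_class},{level}"


def parse_ckpt_ioprio(text: str) -> tuple[str, int]:
    """Split a value such as "be,3" into (class, level); "," is mandatory."""
    io_class, sep, level = text.strip().partition(",")
    if not sep:
        raise ValueError("missing ',' delimiter between class and priority")
    parsed = int(level)
    format_ckpt_ioprio(io_class, parsed)  # reuse the validation above
    return io_class, parsed
```

Writing the formatted string to the sysfs node is left out on purpose, since it needs a mounted f2fs volume and root privileges.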
@@ -732,7 +732,7 @@ Description: Support configuring fault injection type, should be
 FAULT_TRUNCATE 0x00000400
 FAULT_READ_IO 0x00000800
 FAULT_CHECKPOINT 0x00001000
-FAULT_DISCARD 0x00002000
+FAULT_DISCARD 0x00002000 (obsolete)
 FAULT_WRITE_IO 0x00004000
 FAULT_SLAB_ALLOC 0x00008000
 FAULT_DQUOT_INIT 0x00010000
@@ -741,8 +741,10 @@ Description: Support configuring fault injection type, should be
 FAULT_BLKADDR_CONSISTENCE 0x00080000
 FAULT_NO_SEGMENT 0x00100000
 FAULT_INCONSISTENT_FOOTER 0x00200000
-FAULT_TIMEOUT 0x00400000 (1000ms)
+FAULT_ATOMIC_TIMEOUT 0x00400000 (1000ms)
 FAULT_VMALLOC 0x00800000
+FAULT_LOCK_TIMEOUT 0x01000000 (1000ms)
+FAULT_SKIP_WRITE 0x02000000
 =========================== ==========

 What: /sys/fs/f2fs/<disk>/discard_io_aware_gran
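
Reviewer note: the fault type table is a bitmask, so the two flags added here can be combined with the older ones. The sketch below, assuming only the flag values visible in these hunks, shows how a mask for the new FAULT_LOCK_TIMEOUT and FAULT_SKIP_WRITE bits could be composed and decoded. The `FaultType` class and `decode` helper are illustrative names, not kernel or f2fs-tools APIs, and flags outside the quoted context lines are omitted.

```python
import enum


class FaultType(enum.IntFlag):
    """Subset of f2fs fault types, values copied from the hunks above."""
    FAULT_TRUNCATE = 0x00000400
    FAULT_READ_IO = 0x00000800
    FAULT_CHECKPOINT = 0x00001000
    FAULT_WRITE_IO = 0x00004000
    FAULT_ATOMIC_TIMEOUT = 0x00400000   # renamed from FAULT_TIMEOUT
    FAULT_VMALLOC = 0x00800000
    FAULT_LOCK_TIMEOUT = 0x01000000     # added by this merge
    FAULT_SKIP_WRITE = 0x02000000       # added by this merge


def decode(mask: int) -> list[str]:
    """List the names of the known single-bit flags set in mask."""
    return [flag.name for flag in FaultType if mask & flag.value]


mask = FaultType.FAULT_LOCK_TIMEOUT | FaultType.FAULT_SKIP_WRITE
print(f"{int(mask):#010x}")   # zero-padded hex, same style as the table
print(decode(int(mask)))
```

The zero-padded hex form matches the table layout, which makes a composed value easy to compare against the documentation by eye.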
@@ -939,3 +941,57 @@ Description: Controls write priority in multi-devices setups. A value of 0 means
 allocate_section_policy = 1 Prioritize writing to section before allocate_section_hint
 allocate_section_policy = 2 Prioritize writing to section after allocate_section_hint
 =========================== ==========================================================
+
+What: /sys/fs/f2fs/<disk>/max_lock_elapsed_time
+Date: December 2025
+Contact: "Chao Yu" <chao@kernel.org>
+Description: This is a threshold, once a thread enters critical region that lock covers, total
+elapsed time exceeds this threshold, f2fs will print tracepoint to dump information
+of related context. This sysfs entry can be used to control the value of threshold,
+by default, the value is 500 ms.
+
+What: /sys/fs/f2fs/<disk>/inject_timeout_type
+Date: December 2025
+Contact: "Chao Yu" <chao@kernel.org>
+Description: This sysfs entry can be used to change type of injected timeout:
+========== ===============================
+Flag_Value Flag_Description
+========== ===============================
+0x00000000 No timeout (default)
+0x00000001 Simulate running time
+0x00000002 Simulate IO type sleep time
+0x00000003 Simulate Non-IO type sleep time
+0x00000004 Simulate runnable time
+========== ===============================
+
+What: /sys/fs/f2fs/<disk>/adjust_lock_priority
+Date: January 2026
+Contact: "Chao Yu" <chao@kernel.org>
+Description: This sysfs entry can be used to enable/disable to adjust priority for task
+which is in critical region covered by lock.
+========== ==================
+Flag_Value Flag_Description
+========== ==================
+0x00000000 Disabled (default)
+0x00000001 cp_rwsem
+0x00000002 node_change
+0x00000004 node_write
+0x00000008 gc_lock
+0x00000010 cp_global
+0x00000020 io_rwsem
+========== ==================
+
+What: /sys/fs/f2fs/<disk>/lock_duration_priority
+Date: January 2026
+Contact: "Chao Yu" <chao@kernel.org>
+Description: f2fs can tune priority of thread which has entered into critical region covered by
+f2fs rwsemphore lock. This sysfs entry can be used to control priority value, the
+range is [100,139], by default the value is 120.
+
+What: /sys/fs/f2fs/<disk>/critical_task_priority
+Date: February 2026
+Contact: "Chao Yu" <chao@kernel.org>
+Description: It can be used to tune priority of f2fs critical task, e.g. f2fs_ckpt, f2fs_gc
+threads, limitation as below:
+- it requires user has CAP_SYS_NICE capability.
+- the range is [100, 139], by default the value is 100.
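
Reviewer note: the new lock tuning entries combine a bitmask (`adjust_lock_priority`) with a priority value limited to [100, 139] (`lock_duration_priority`, `critical_task_priority`). The sketch below, using only the flag values and range quoted in the hunk, composes a lock mask and validates a priority. The helper names are hypothetical. The mapping from priority to nice value (`priority - 120`) is an assumption based on the usual kernel convention for non-realtime tasks and is not stated in this diff.

```python
# Flag values copied from the adjust_lock_priority table in the hunk above.
LOCK_BITS = {
    "cp_rwsem": 0x01,
    "node_change": 0x02,
    "node_write": 0x04,
    "gc_lock": 0x08,
    "cp_global": 0x10,
    "io_rwsem": 0x20,
}

PRIO_MIN, PRIO_MAX = 100, 139   # documented range for both priority entries


def lock_mask(*names: str) -> int:
    """OR together the bits for the named locks; no names means disabled (0)."""
    mask = 0
    for name in names:
        mask |= LOCK_BITS[name]   # KeyError flags an unknown lock name
    return mask


def check_priority(value: int) -> int:
    """Return value unchanged if it lies in [100, 139], else raise."""
    if not PRIO_MIN <= value <= PRIO_MAX:
        raise ValueError(f"priority {value} outside [{PRIO_MIN}, {PRIO_MAX}]")
    return value


def priority_to_nice(value: int) -> int:
    """Assumed mapping: kernel priority 120 corresponds to nice 0."""
    return check_priority(value) - 120
```

Under that assumed mapping, the documented defaults would read as nice 0 for `lock_duration_priority` (120) and nice -20 for `critical_task_priority` (100).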

Documentation/filesystems/f2fs.rst

Lines changed: 47 additions & 2 deletions
@@ -206,7 +206,7 @@ fault_type=%d Support configuring fault injection type, should be
 FAULT_TRUNCATE 0x00000400
 FAULT_READ_IO 0x00000800
 FAULT_CHECKPOINT 0x00001000
-FAULT_DISCARD 0x00002000
+FAULT_DISCARD 0x00002000 (obsolete)
 FAULT_WRITE_IO 0x00004000
 FAULT_SLAB_ALLOC 0x00008000
 FAULT_DQUOT_INIT 0x00010000
@@ -215,8 +215,10 @@ fault_type=%d Support configuring fault injection type, should be
 FAULT_BLKADDR_CONSISTENCE 0x00080000
 FAULT_NO_SEGMENT 0x00100000
 FAULT_INCONSISTENT_FOOTER 0x00200000
-FAULT_TIMEOUT 0x00400000 (1000ms)
+FAULT_ATOMIC_TIMEOUT 0x00400000 (1000ms)
 FAULT_VMALLOC 0x00800000
+FAULT_LOCK_TIMEOUT 0x01000000 (1000ms)
+FAULT_SKIP_WRITE 0x02000000
 =========================== ==========
 mode=%s Control block allocation mode which supports "adaptive"
 and "lfs". In "lfs" mode, there should be no random
@@ -1033,3 +1035,46 @@ the reserved space back to F2FS for its own use.
 So, the key idea is, user can do any file operations on /dev/vdc, and
 reclaim the space after the use, while the space is counted as /data.
 That doesn't require modifying partition size and filesystem format.
+
+Per-file Read-Only Large Folio Support
+--------------------------------------
+
+F2FS implements large folio support on the read path to leverage high-order
+page allocation for significant performance gains. To minimize code complexity,
+this support is currently excluded from the write path, which requires handling
+complex optimizations such as compression and block allocation modes.
+
+This optional feature is triggered only when a file's immutable bit is set.
+Consequently, F2FS will return EOPNOTSUPP if a user attempts to open a cached
+file with write permissions, even immediately after clearing the bit. Write
+access is only restored once the cached inode is dropped. The usage flow is
+demonstrated below:
+
+.. code-block::
+
+# f2fs_io setflags immutable /data/testfile_read_seq
+
+/* flush and reload the inode to enable the large folio */
+# sync && echo 3 > /proc/sys/vm/drop_caches
+
+/* mmap(MAP_POPULATE) + mlock() */
+# f2fs_io read 128 0 1024 mmap 1 0 /data/testfile_read_seq
+
+/* mmap() + fadvise(POSIX_FADV_WILLNEED) + mlock() */
+# f2fs_io read 128 0 1024 fadvise 1 0 /data/testfile_read_seq
+
+/* mmap() + mlock2(MLOCK_ONFAULT) + madvise(MADV_POPULATE_READ) */
+# f2fs_io read 128 0 1024 madvise 1 0 /data/testfile_read_seq
+
+# f2fs_io clearflags immutable /data/testfile_read_seq
+
+# f2fs_io write 1 0 1 zero buffered /data/testfile_read_seq
+Failed to open /mnt/test/test: Operation not supported
+
+/* flush and reload the inode to disable the large folio */
+# sync && echo 3 > /proc/sys/vm/drop_caches
+
+# f2fs_io write 1 0 1 zero buffered /data/testfile_read_seq
+Written 4096 bytes with pattern = zero, total_time = 29 us, max_latency = 28 us
+
+# rm /data/testfile_read_seq
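
Reviewer note: the new documentation says a write open fails with EOPNOTSUPP while the inode is still cached with large folios enabled, even after the immutable bit is cleared. A caller could tell that case apart from other open failures roughly as sketched below. The function name and the injectable `opener` parameter are hypothetical and exist only to make the sketch testable without an f2fs mount; reading EOPNOTSUPP as "inode still cached" follows the documentation text above and is not a general rule for other filesystems.

```python
import errno
import os


def can_open_for_write(path, opener=os.open):
    """Return True if path opens for writing, False on EOPNOTSUPP.

    Per the documentation added in this hunk, EOPNOTSUPP means the cached
    inode still has large folio support enabled and must be dropped first.
    Any other error is re-raised unchanged.
    """
    try:
        fd = opener(path, os.O_WRONLY)
    except OSError as exc:
        if exc.errno == errno.EOPNOTSUPP:
            return False
        raise
    os.close(fd)
    return True
```

When this returns False, the flow in the documentation suggests running `sync` and dropping caches before retrying, which needs root privileges and is therefore not automated here.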