
Commit 0ede61d

file: convert to SLAB_TYPESAFE_BY_RCU
In recent discussions around some performance improvements in the file handling area we discussed switching the file cache to rely on SLAB_TYPESAFE_BY_RCU, which allows us to get rid of call_rcu()-based freeing for files completely. This is a pretty sensitive change overall but it might actually be worth doing. The main downside is the subtlety. The other one is that we should really wait for Jann's patch to land that enables KASAN to handle SLAB_TYPESAFE_BY_RCU UAFs. Currently it doesn't, but a patch for this exists.

With SLAB_TYPESAFE_BY_RCU objects may be freed and reused multiple times, which requires a few changes. It isn't sufficient anymore to just acquire a reference to the file in question under rcu using atomic_long_inc_not_zero() since the file might have already been recycled and someone else might have bumped the reference. In other words, callers might see reference count bumps from newer users. For this reason it is necessary to verify that the pointer is the same before and after the reference count increment. This pattern can be seen in get_file_rcu() and __files_get_rcu().

In addition, it isn't possible to access or check fields in struct file without first acquiring a reference on it. Not doing that was always very dodgy and it was only usable for non-pointer data in struct file. With SLAB_TYPESAFE_BY_RCU it is necessary that callers first acquire a reference under rcu or hold the files_lock of the fdtable. Failing to do either one of these is a bug.

Thanks to Jann for pointing out that we need to ensure memory ordering between reallocations and the pointer check by ensuring that all subsequent loads have a dependency on the second load in get_file_rcu(), and for providing a fixup that was folded into this patch.

Cc: Jann Horn <jannh@google.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
1 parent 93faf42 commit 0ede61d
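The acquire-then-recheck idea described in the commit message can be sketched in userspace C. This is a deliberately simplified illustration, not kernel code: the toy `struct file`, `inc_not_zero()`, and `try_get_file()` names are assumptions for this sketch, with C11 atomics standing in for the kernel's atomic_long_inc_not_zero() and rcu_dereference_raw():

```c
#include <stdatomic.h>
#include <stddef.h>

/* Toy stand-in for struct file: only the refcount matters here. */
struct file {
	atomic_long f_count;
};

/* Increment the count only if it is nonzero, mimicking the kernel's
 * atomic_long_inc_not_zero(). */
static int inc_not_zero(atomic_long *v)
{
	long c = atomic_load(v);

	while (c != 0) {
		if (atomic_compare_exchange_weak(v, &c, c + 1))
			return 1;
	}
	return 0;
}

/*
 * Core of the pattern: with SLAB_TYPESAFE_BY_RCU the slot may be
 * recycled at any time, so a successful refcount bump alone proves
 * nothing. Reload the slot afterwards and confirm it still points at
 * the object we pinned; otherwise drop the stray reference.
 */
static struct file *try_get_file(_Atomic(struct file *) *slot)
{
	struct file *file = atomic_load(slot);

	if (!file)
		return NULL;
	if (!inc_not_zero(&file->f_count))
		return NULL;
	if (atomic_load(slot) != file) {
		/* Slot was reused under us: undo the bump. */
		atomic_fetch_sub(&file->f_count, 1);
		return NULL;
	}
	return file;
}
```

A real kernel caller additionally runs under rcu_read_lock() and needs the memory-ordering device (OPTIMIZER_HIDE_VAR()) used in __get_file_rcu(); both are elided in this sketch.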

13 files changed

Lines changed: 191 additions & 103 deletions


Documentation/filesystems/files.rst

Lines changed: 24 additions & 29 deletions
@@ -62,51 +62,30 @@ the fdtable structure -
    be held.

 4. To look up the file structure given an fd, a reader
-   must use either lookup_fd_rcu() or files_lookup_fd_rcu() APIs. These
+   must use either lookup_fdget_rcu() or files_lookup_fdget_rcu() APIs. These
    take care of barrier requirements due to lock-free lookup.

    An example::

      struct file *file;

      rcu_read_lock();
-     file = lookup_fd_rcu(fd);
-     if (file) {
-             ...
-     }
-     ....
+     file = lookup_fdget_rcu(fd);
      rcu_read_unlock();
-
-5. Handling of the file structures is special. Since the look-up
-   of the fd (fget()/fget_light()) are lock-free, it is possible
-   that look-up may race with the last put() operation on the
-   file structure. This is avoided using atomic_long_inc_not_zero()
-   on ->f_count::
-
-     rcu_read_lock();
-     file = files_lookup_fd_rcu(files, fd);
      if (file) {
-             if (atomic_long_inc_not_zero(&file->f_count))
-                     *fput_needed = 1;
-             else
-                     /* Didn't get the reference, someone's freed */
-                     file = NULL;
+             ...
+             fput(file);
      }
-     rcu_read_unlock();
      ....
-     return file;
-
-   atomic_long_inc_not_zero() detects if refcounts is already zero or
-   goes to zero during increment. If it does, we fail
-   fget()/fget_light().

-6. Since both fdtable and file structures can be looked up
+5. Since both fdtable and file structures can be looked up
    lock-free, they must be installed using rcu_assign_pointer()
    API. If they are looked up lock-free, rcu_dereference()
    must be used. However it is advisable to use files_fdtable()
-   and lookup_fd_rcu()/files_lookup_fd_rcu() which take care of these issues.
+   and lookup_fdget_rcu()/files_lookup_fdget_rcu() which take care of these
+   issues.

-7. While updating, the fdtable pointer must be looked up while
+6. While updating, the fdtable pointer must be looked up while
    holding files->file_lock. If ->file_lock is dropped, then
    another thread expand the files thereby creating a new
    fdtable and making the earlier fdtable pointer stale.
@@ -126,3 +105,19 @@ the fdtable structure -
    Since locate_fd() can drop ->file_lock (and reacquire ->file_lock),
    the fdtable pointer (fdt) must be loaded after locate_fd().

+On newer kernels rcu based file lookup has been switched to rely on
+SLAB_TYPESAFE_BY_RCU instead of call_rcu(). It isn't sufficient anymore
+to just acquire a reference to the file in question under rcu using
+atomic_long_inc_not_zero() since the file might have already been
+recycled and someone else might have bumped the reference. In other
+words, callers might see reference count bumps from newer users. For
+this reason it is necessary to verify that the pointer is the same
+before and after the reference count increment. This pattern can be seen
+in get_file_rcu() and __files_get_rcu().
+
+In addition, it isn't possible to access or check fields in struct file
+without first acquiring a reference on it under rcu lookup. Not doing
+that was always very dodgy and it was only usable for non-pointer data
+in struct file. With SLAB_TYPESAFE_BY_RCU it is necessary that callers
+either first acquire a reference or they must hold the files_lock of the
+fdtable.
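The documented calling convention, where the lookup itself hands back a counted reference so the caller only has to fput() when done, can be mimicked in a userspace sketch. This is an illustration, not the kernel API: the toy fd table, MAX_FDS, `toy_lookup_fdget()` and `toy_fput()` are assumed names, and the inc-not-zero/pointer-recheck subtleties of the real lookup_fdget_rcu() are elided:

```c
#include <stdatomic.h>
#include <stddef.h>

struct file {
	atomic_long f_count;
};

#define MAX_FDS 8
static _Atomic(struct file *) fd_table[MAX_FDS];

/*
 * Shaped like lookup_fdget_rcu(): returns either NULL or a file with
 * its refcount already bumped, so callers do no refcounting of their
 * own and just release the reference when finished.
 */
static struct file *toy_lookup_fdget(unsigned int fd)
{
	struct file *file;

	if (fd >= MAX_FDS)
		return NULL;
	file = atomic_load(&fd_table[fd]);
	if (!file)
		return NULL;
	atomic_fetch_add(&file->f_count, 1);	/* reference handed to caller */
	return file;
}

static void toy_fput(struct file *file)
{
	atomic_fetch_sub(&file->f_count, 1);
}
```

Usage mirrors the rst example: look the fd up, use the file if non-NULL, then toy_fput() it.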

arch/powerpc/platforms/cell/spufs/coredump.c

Lines changed: 7 additions & 4 deletions
@@ -66,18 +66,21 @@ static int match_context(const void *v, struct file *file, unsigned fd)
  */
 static struct spu_context *coredump_next_context(int *fd)
 {
-	struct spu_context *ctx;
+	struct spu_context *ctx = NULL;
 	struct file *file;
 	int n = iterate_fd(current->files, *fd, match_context, NULL);
 	if (!n)
 		return NULL;
 	*fd = n - 1;

 	rcu_read_lock();
-	file = lookup_fd_rcu(*fd);
-	ctx = SPUFS_I(file_inode(file))->i_ctx;
-	get_spu_context(ctx);
+	file = lookup_fdget_rcu(*fd);
 	rcu_read_unlock();
+	if (file) {
+		ctx = SPUFS_I(file_inode(file))->i_ctx;
+		get_spu_context(ctx);
+		fput(file);
+	}

 	return ctx;
 }

drivers/gpu/drm/i915/gem/i915_gem_mman.c

Lines changed: 1 addition & 3 deletions
@@ -916,9 +916,7 @@ static struct file *mmap_singleton(struct drm_i915_private *i915)
 	struct file *file;

 	rcu_read_lock();
-	file = READ_ONCE(i915->gem.mmap_singleton);
-	if (file && !get_file_rcu(file))
-		file = NULL;
+	file = get_file_rcu(&i915->gem.mmap_singleton);
 	rcu_read_unlock();
 	if (file)
 		return file;

fs/file.c

Lines changed: 107 additions & 18 deletions
@@ -853,8 +853,79 @@ void do_close_on_exec(struct files_struct *files)
 	spin_unlock(&files->file_lock);
 }

+static struct file *__get_file_rcu(struct file __rcu **f)
+{
+	struct file __rcu *file;
+	struct file __rcu *file_reloaded;
+	struct file __rcu *file_reloaded_cmp;
+
+	file = rcu_dereference_raw(*f);
+	if (!file)
+		return NULL;
+
+	if (unlikely(!atomic_long_inc_not_zero(&file->f_count)))
+		return ERR_PTR(-EAGAIN);
+
+	file_reloaded = rcu_dereference_raw(*f);
+
+	/*
+	 * Ensure that all accesses have a dependency on the load from
+	 * rcu_dereference_raw() above so we get correct ordering
+	 * between reuse/allocation and the pointer check below.
+	 */
+	file_reloaded_cmp = file_reloaded;
+	OPTIMIZER_HIDE_VAR(file_reloaded_cmp);
+
+	/*
+	 * atomic_long_inc_not_zero() above provided a full memory
+	 * barrier when we acquired a reference.
+	 *
+	 * This is paired with the write barrier from assigning to the
+	 * __rcu protected file pointer so that if that pointer still
+	 * matches the current file, we know we have successfully
+	 * acquired a reference to the right file.
+	 *
+	 * If the pointers don't match the file has been reallocated by
+	 * SLAB_TYPESAFE_BY_RCU.
+	 */
+	if (file == file_reloaded_cmp)
+		return file_reloaded;
+
+	fput(file);
+	return ERR_PTR(-EAGAIN);
+}
+
+/**
+ * get_file_rcu - try to get a reference to a file under rcu
+ * @f: the file to get a reference on
+ *
+ * This function tries to get a reference on @f carefully verifying that
+ * @f hasn't been reused.
+ *
+ * This function should rarely have to be used and only by users who
+ * understand the implications of SLAB_TYPESAFE_BY_RCU. Try to avoid it.
+ *
+ * Return: Returns @f with the reference count increased or NULL.
+ */
+struct file *get_file_rcu(struct file __rcu **f)
+{
+	for (;;) {
+		struct file __rcu *file;
+
+		file = __get_file_rcu(f);
+		if (unlikely(!file))
+			return NULL;
+
+		if (unlikely(IS_ERR(file)))
+			continue;
+
+		return file;
+	}
+}
+EXPORT_SYMBOL_GPL(get_file_rcu);
+
 static inline struct file *__fget_files_rcu(struct files_struct *files,
-	unsigned int fd, fmode_t mask)
+		unsigned int fd, fmode_t mask)
 {
 	for (;;) {
 		struct file *file;
@@ -865,12 +936,6 @@ static inline struct file *__fget_files_rcu(struct files_struct *files,
 			return NULL;

 		fdentry = fdt->fd + array_index_nospec(fd, fdt->max_fds);
-		file = rcu_dereference_raw(*fdentry);
-		if (unlikely(!file))
-			return NULL;
-
-		if (unlikely(file->f_mode & mask))
-			return NULL;

 		/*
 		 * Ok, we have a file pointer. However, because we do
@@ -879,10 +944,15 @@ static inline struct file *__fget_files_rcu(struct files_struct *files,
 		 *
 		 * Such a race can take two forms:
 		 *
-		 * (a) the file ref already went down to zero,
-		 *     and get_file_rcu() fails. Just try again:
+		 * (a) the file ref already went down to zero and the
+		 *     file hasn't been reused yet or the file count
+		 *     isn't zero but the file has already been reused.
 		 */
-		if (unlikely(!get_file_rcu(file)))
+		file = __get_file_rcu(fdentry);
+		if (unlikely(!file))
+			return NULL;
+
+		if (unlikely(IS_ERR(file)))
 			continue;

 		/*
@@ -893,12 +963,20 @@ static inline struct file *__fget_files_rcu(struct files_struct *files,
 		 *
 		 * If so, we need to put our ref and try again.
 		 */
-		if (unlikely(rcu_dereference_raw(files->fdt) != fdt) ||
-		    unlikely(rcu_dereference_raw(*fdentry) != file)) {
+		if (unlikely(rcu_dereference_raw(files->fdt) != fdt)) {
 			fput(file);
 			continue;
 		}

+		/*
+		 * This isn't the file we're looking for or we're not
+		 * allowed to get a reference to it.
+		 */
+		if (unlikely(file->f_mode & mask)) {
+			fput(file);
+			return NULL;
+		}
+
 		/*
 		 * Ok, we have a ref to the file, and checked that it
 		 * still exists.
@@ -948,7 +1026,14 @@ struct file *fget_task(struct task_struct *task, unsigned int fd)
 	return file;
 }

-struct file *task_lookup_fd_rcu(struct task_struct *task, unsigned int fd)
+struct file *lookup_fdget_rcu(unsigned int fd)
+{
+	return __fget_files_rcu(current->files, fd, 0);
+
+}
+EXPORT_SYMBOL_GPL(lookup_fdget_rcu);
+
+struct file *task_lookup_fdget_rcu(struct task_struct *task, unsigned int fd)
 {
 	/* Must be called with rcu_read_lock held */
 	struct files_struct *files;
@@ -957,13 +1042,13 @@ struct file *task_lookup_fdget_rcu(struct task_struct *task, unsigned int fd)
 	task_lock(task);
 	files = task->files;
 	if (files)
-		file = files_lookup_fd_rcu(files, fd);
+		file = __fget_files_rcu(files, fd, 0);
 	task_unlock(task);

 	return file;
 }

-struct file *task_lookup_next_fd_rcu(struct task_struct *task, unsigned int *ret_fd)
+struct file *task_lookup_next_fdget_rcu(struct task_struct *task, unsigned int *ret_fd)
 {
 	/* Must be called with rcu_read_lock held */
 	struct files_struct *files;
@@ -974,7 +1059,7 @@ struct file *task_lookup_next_fdget_rcu(struct task_struct *task, unsigned int *ret
 	files = task->files;
 	if (files) {
 		for (; fd < files_fdtable(files)->max_fds; fd++) {
-			file = files_lookup_fd_rcu(files, fd);
+			file = __fget_files_rcu(files, fd, 0);
 			if (file)
 				break;
 		}
@@ -983,7 +1068,7 @@ struct file *task_lookup_next_fdget_rcu(struct task_struct *task, unsigned int *ret
 	*ret_fd = fd;
 	return file;
 }
-EXPORT_SYMBOL(task_lookup_next_fd_rcu);
+EXPORT_SYMBOL(task_lookup_next_fdget_rcu);

 /*
  * Lightweight file lookup - no refcnt increment if fd table isn't shared.
@@ -1272,12 +1357,16 @@ SYSCALL_DEFINE2(dup2, unsigned int, oldfd, unsigned int, newfd)
 {
 	if (unlikely(newfd == oldfd)) { /* corner case */
 		struct files_struct *files = current->files;
+		struct file *f;
 		int retval = oldfd;

 		rcu_read_lock();
-		if (!files_lookup_fd_rcu(files, oldfd))
+		f = __fget_files_rcu(files, oldfd, 0);
+		if (!f)
 			retval = -EBADF;
 		rcu_read_unlock();
+		if (f)
+			fput(f);
 		return retval;
 	}
 	return ksys_dup3(oldfd, newfd, 0);
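The three-way return convention used by __get_file_rcu() (NULL for an empty slot, an error pointer for a transient failure, otherwise a referenced file) and the retry loop in get_file_rcu() can be sketched in userspace. The ERR_PTR()/IS_ERR() helpers are emulated here, and the scripted state sequence ('N'/'E'/'F') is purely an illustrative stand-in for races that would occur nondeterministically in the kernel:

```c
#include <stddef.h>
#include <stdint.h>

#define MY_EAGAIN 11

/* Userspace emulation of the kernel's error-pointer convention:
 * small negative errno values encoded in the top page of the
 * address space. */
static inline void *ERR_PTR(long err) { return (void *)err; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-4095;
}

static int dummy_file;	/* stands in for a successfully pinned file */

/*
 * Scripted stand-in for __get_file_rcu(): each call consumes one
 * character. 'N' = empty slot (give up), 'E' = transient failure such
 * as a reallocation race (retry), anything else = reference acquired.
 */
static const char *script;
static void *scripted_get(void)
{
	char c = *script++;

	if (c == 'N')
		return NULL;
	if (c == 'E')
		return ERR_PTR(-MY_EAGAIN);
	return &dummy_file;
}

/* Retry loop shaped like get_file_rcu(): NULL ends the lookup,
 * an error pointer means the slot was recycled, so try again. */
static void *scripted_get_retry(void)
{
	for (;;) {
		void *file = scripted_get();

		if (!file)
			return NULL;
		if (IS_ERR(file))
			continue;
		return file;
	}
}
```

The distinction matters: an empty fd slot is a definitive answer, while a recycled slot only means this attempt raced and the lookup must be retried against the reloaded pointer.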
