Re: set_page_dirty vs truncate

MW
Matthew Wilcox
Sat, Dec 19, 2020 5:18 AM

On Fri, Dec 18, 2020 at 10:03:16PM +0000, Matthew Wilcox wrote:

On Fri, Dec 18, 2020 at 04:05:31PM +0000, Matthew Wilcox wrote:

A number of implementations of ->set_page_dirty check whether the page
has been truncated (ie page->mapping has become NULL since entering
set_page_dirty()).  Several other implementations assume that they can do
page->mapping->host to get to the inode.  So either some implementations
are doing unnecessary checks or others are vulnerable to a NULL pointer
dereference if truncate() races with set_page_dirty().

I'm touching ->set_page_dirty() anyway as part of the page folio
conversion.  I'm thinking about passing in the mapping so there's no
need to look at page->mapping.

The comments on set_page_dirty() and set_page_dirty_lock() suggests
there's no consistency in whether truncation is blocked or not; we're
only guaranteed that the inode itself won't go away.  But maybe the
comments are stale.

The comments are, I believe, not stale.  Here's some syzbot
reports which indicate that ext4 is seeing races between set_page_dirty()
and truncate():

https://groups.google.com/g/syzkaller-lts-bugs/c/s9fHu162zhQ/m/Phnf6ucaAwAJ

The reproducer includes calls to ftruncate(), so that would suggest
that's what's going on.

Hmmm ... looks like __set_page_dirty_nobuffers() has a similar problem:

{
lock_page_memcg(page);
if (!TestSetPageDirty(page)) {
struct address_space *mapping = page_mapping(page);
unsigned long flags;

            if (!mapping) {
                    unlock_page_memcg(page);
                    return 1;
            }

            xa_lock_irqsave(&mapping->i_pages, flags);
            BUG_ON(page_mapping(page) != mapping);

sure, we check that the page wasn't truncated between set_page_dirty()
and the call to TestSetPageDirty(), but we can truncate dirty pages
with no problem.  So between the call to TestSetPageDirty() and
the call to xa_lock_irqsave(), the page can be truncated, and the
BUG_ON should fire.

I haven't been able to find any examples of this, but maybe it's just a very
narrow race.  Does anyone recognise this signature?  Adding the filesystems
which use __set_page_dirty_nobuffers() directly without extra locking.

$ git grep set_page_dirty.=.__set_page_dirty_nobuffers
fs/9p/vfs_addr.c:      .set_page_dirty = __set_page_dirty_nobuffers,
fs/cifs/file.c: .set_page_dirty = __set_page_dirty_nobuffers,
fs/cifs/file.c: .set_page_dirty = __set_page_dirty_nobuffers,
fs/fuse/file.c: .set_page_dirty = __set_page_dirty_nobuffers,
fs/hostfs/hostfs_kern.c:        .set_page_dirty = __set_page_dirty_nobuffers,
fs/jfs/jfs_metapage.c:  .set_page_dirty = __set_page_dirty_nobuffers,
fs/nfs/file.c:  .set_page_dirty = __set_page_dirty_nobuffers,
fs/ntfs/aops.c: .set_page_dirty = __set_page_dirty_nobuffers,  /* Set the page dirty
fs/orangefs/inode.c:    .set_page_dirty = __set_page_dirty_nobuffers,
fs/vboxsf/file.c:      .set_page_dirty = __set_page_dirty_nobuffers,

On Fri, Dec 18, 2020 at 10:03:16PM +0000, Matthew Wilcox wrote: > On Fri, Dec 18, 2020 at 04:05:31PM +0000, Matthew Wilcox wrote: > > A number of implementations of ->set_page_dirty check whether the page > > has been truncated (ie page->mapping has become NULL since entering > > set_page_dirty()). Several other implementations assume that they can do > > page->mapping->host to get to the inode. So either some implementations > > are doing unnecessary checks or others are vulnerable to a NULL pointer > > dereference if truncate() races with set_page_dirty(). > > > > I'm touching ->set_page_dirty() anyway as part of the page folio > > conversion. I'm thinking about passing in the mapping so there's no > > need to look at page->mapping. > > > > The comments on set_page_dirty() and set_page_dirty_lock() suggests > > there's no consistency in whether truncation is blocked or not; we're > > only guaranteed that the inode itself won't go away. But maybe the > > comments are stale. > > The comments are, I believe, not stale. Here's some syzbot > reports which indicate that ext4 is seeing races between set_page_dirty() > and truncate(): > > https://groups.google.com/g/syzkaller-lts-bugs/c/s9fHu162zhQ/m/Phnf6ucaAwAJ > > The reproducer includes calls to ftruncate(), so that would suggest > that's what's going on. Hmmm ... looks like __set_page_dirty_nobuffers() has a similar problem: { lock_page_memcg(page); if (!TestSetPageDirty(page)) { struct address_space *mapping = page_mapping(page); unsigned long flags; if (!mapping) { unlock_page_memcg(page); return 1; } xa_lock_irqsave(&mapping->i_pages, flags); BUG_ON(page_mapping(page) != mapping); sure, we check that the page wasn't truncated between set_page_dirty() and the call to TestSetPageDirty(), but we can truncate dirty pages with no problem. So between the call to TestSetPageDirty() and the call to xa_lock_irqsave(), the page can be truncated, and the BUG_ON should fire. I haven't been able to find any examples of this, but maybe it's just a very narrow race. Does anyone recognise this signature? Adding the filesystems which use __set_page_dirty_nobuffers() directly without extra locking. $ git grep set_page_dirty.*=.*__set_page_dirty_nobuffers fs/9p/vfs_addr.c: .set_page_dirty = __set_page_dirty_nobuffers, fs/cifs/file.c: .set_page_dirty = __set_page_dirty_nobuffers, fs/cifs/file.c: .set_page_dirty = __set_page_dirty_nobuffers, fs/fuse/file.c: .set_page_dirty = __set_page_dirty_nobuffers, fs/hostfs/hostfs_kern.c: .set_page_dirty = __set_page_dirty_nobuffers, fs/jfs/jfs_metapage.c: .set_page_dirty = __set_page_dirty_nobuffers, fs/nfs/file.c: .set_page_dirty = __set_page_dirty_nobuffers, fs/ntfs/aops.c: .set_page_dirty = __set_page_dirty_nobuffers, /* Set the page dirty fs/orangefs/inode.c: .set_page_dirty = __set_page_dirty_nobuffers, fs/vboxsf/file.c: .set_page_dirty = __set_page_dirty_nobuffers,
JH
John Hubbard
Sat, Dec 19, 2020 6:10 AM

On 12/18/20 9:18 PM, Matthew Wilcox wrote:

On Fri, Dec 18, 2020 at 10:03:16PM +0000, Matthew Wilcox wrote:

On Fri, Dec 18, 2020 at 04:05:31PM +0000, Matthew Wilcox wrote:

A number of implementations of ->set_page_dirty check whether the page
has been truncated (ie page->mapping has become NULL since entering
set_page_dirty()).  Several other implementations assume that they can do
page->mapping->host to get to the inode.  So either some implementations
are doing unnecessary checks or others are vulnerable to a NULL pointer
dereference if truncate() races with set_page_dirty().

I'm touching ->set_page_dirty() anyway as part of the page folio
conversion.  I'm thinking about passing in the mapping so there's no
need to look at page->mapping.

The comments on set_page_dirty() and set_page_dirty_lock() suggests
there's no consistency in whether truncation is blocked or not; we're
only guaranteed that the inode itself won't go away.  But maybe the
comments are stale.

The comments are, I believe, not stale.  Here's some syzbot
reports which indicate that ext4 is seeing races between set_page_dirty()
and truncate():

https://groups.google.com/g/syzkaller-lts-bugs/c/s9fHu162zhQ/m/Phnf6ucaAwAJ

The reproducer includes calls to ftruncate(), so that would suggest
that's what's going on.

Hmmm ... looks like __set_page_dirty_nobuffers() has a similar problem:

{
lock_page_memcg(page);
if (!TestSetPageDirty(page)) {
struct address_space *mapping = page_mapping(page);
unsigned long flags;

              if (!mapping) {
                      unlock_page_memcg(page);
                      return 1;
              }

              xa_lock_irqsave(&mapping->i_pages, flags);
              BUG_ON(page_mapping(page) != mapping);

sure, we check that the page wasn't truncated between set_page_dirty()
and the call to TestSetPageDirty(), but we can truncate dirty pages
with no problem.  So between the call to TestSetPageDirty() and
the call to xa_lock_irqsave(), the page can be truncated, and the
BUG_ON should fire.

I haven't been able to find any examples of this, but maybe it's just a very
narrow race.  Does anyone recognise this signature?  Adding the filesystems
which use __set_page_dirty_nobuffers() directly without extra locking.

That sounds like the same kind of failure that Jan Kara and I were
seeing on live systems[1], that led eventually to the gup-to-pup
conversion exercise.

That crash happened due to calling set_page_dirty() on pages that had no
buffers on them [2]. And that sounds like exactly the same thing as
calling __set_page_dirty_nobuffers() without extra locking. So I'd
expect that it's Just Wrong To Do, for the same reasons as Jan spells
out very clearly in [1].

Hope that helps.

[1] https://www.spinics.net/lists/linux-mm/msg142700.html

[2] which triggered this assertion:

#define page_buffers(page)
({
BUG_ON(!PagePrivate(page));
((struct buffer_head *)page_private(page));
})

$ git grep set_page_dirty.=.__set_page_dirty_nobuffers
fs/9p/vfs_addr.c:      .set_page_dirty = __set_page_dirty_nobuffers,
fs/cifs/file.c: .set_page_dirty = __set_page_dirty_nobuffers,
fs/cifs/file.c: .set_page_dirty = __set_page_dirty_nobuffers,
fs/fuse/file.c: .set_page_dirty = __set_page_dirty_nobuffers,
fs/hostfs/hostfs_kern.c:        .set_page_dirty = __set_page_dirty_nobuffers,
fs/jfs/jfs_metapage.c:  .set_page_dirty = __set_page_dirty_nobuffers,
fs/nfs/file.c:  .set_page_dirty = __set_page_dirty_nobuffers,
fs/ntfs/aops.c: .set_page_dirty = __set_page_dirty_nobuffers,  /* Set the page dirty
fs/orangefs/inode.c:    .set_page_dirty = __set_page_dirty_nobuffers,
fs/vboxsf/file.c:      .set_page_dirty = __set_page_dirty_nobuffers,

...wow, long list of these.

thanks,

John Hubbard
NVIDIA

On 12/18/20 9:18 PM, Matthew Wilcox wrote: > On Fri, Dec 18, 2020 at 10:03:16PM +0000, Matthew Wilcox wrote: >> On Fri, Dec 18, 2020 at 04:05:31PM +0000, Matthew Wilcox wrote: >>> A number of implementations of ->set_page_dirty check whether the page >>> has been truncated (ie page->mapping has become NULL since entering >>> set_page_dirty()). Several other implementations assume that they can do >>> page->mapping->host to get to the inode. So either some implementations >>> are doing unnecessary checks or others are vulnerable to a NULL pointer >>> dereference if truncate() races with set_page_dirty(). >>> >>> I'm touching ->set_page_dirty() anyway as part of the page folio >>> conversion. I'm thinking about passing in the mapping so there's no >>> need to look at page->mapping. >>> >>> The comments on set_page_dirty() and set_page_dirty_lock() suggests >>> there's no consistency in whether truncation is blocked or not; we're >>> only guaranteed that the inode itself won't go away. But maybe the >>> comments are stale. >> >> The comments are, I believe, not stale. Here's some syzbot >> reports which indicate that ext4 is seeing races between set_page_dirty() >> and truncate(): >> >> https://groups.google.com/g/syzkaller-lts-bugs/c/s9fHu162zhQ/m/Phnf6ucaAwAJ >> >> The reproducer includes calls to ftruncate(), so that would suggest >> that's what's going on. > > Hmmm ... looks like __set_page_dirty_nobuffers() has a similar problem: > > { > lock_page_memcg(page); > if (!TestSetPageDirty(page)) { > struct address_space *mapping = page_mapping(page); > unsigned long flags; > > if (!mapping) { > unlock_page_memcg(page); > return 1; > } > > xa_lock_irqsave(&mapping->i_pages, flags); > BUG_ON(page_mapping(page) != mapping); > > sure, we check that the page wasn't truncated between set_page_dirty() > and the call to TestSetPageDirty(), but we can truncate dirty pages > with no problem. So between the call to TestSetPageDirty() and > the call to xa_lock_irqsave(), the page can be truncated, and the > BUG_ON should fire. > > I haven't been able to find any examples of this, but maybe it's just a very > narrow race. Does anyone recognise this signature? Adding the filesystems > which use __set_page_dirty_nobuffers() directly without extra locking. That sounds like the same *kind* of failure that Jan Kara and I were seeing on live systems[1], that led eventually to the gup-to-pup conversion exercise. That crash happened due to calling set_page_dirty() on pages that had no buffers on them [2]. And that sounds like *exactly* the same thing as calling __set_page_dirty_nobuffers() without extra locking. So I'd expect that it's Just Wrong To Do, for the same reasons as Jan spells out very clearly in [1]. Hope that helps. [1] https://www.spinics.net/lists/linux-mm/msg142700.html [2] which triggered this assertion: #define page_buffers(page) \ ({ \ BUG_ON(!PagePrivate(page)); \ ((struct buffer_head *)page_private(page)); \ }) > > $ git grep set_page_dirty.*=.*__set_page_dirty_nobuffers > fs/9p/vfs_addr.c: .set_page_dirty = __set_page_dirty_nobuffers, > fs/cifs/file.c: .set_page_dirty = __set_page_dirty_nobuffers, > fs/cifs/file.c: .set_page_dirty = __set_page_dirty_nobuffers, > fs/fuse/file.c: .set_page_dirty = __set_page_dirty_nobuffers, > fs/hostfs/hostfs_kern.c: .set_page_dirty = __set_page_dirty_nobuffers, > fs/jfs/jfs_metapage.c: .set_page_dirty = __set_page_dirty_nobuffers, > fs/nfs/file.c: .set_page_dirty = __set_page_dirty_nobuffers, > fs/ntfs/aops.c: .set_page_dirty = __set_page_dirty_nobuffers, /* Set the page dirty > fs/orangefs/inode.c: .set_page_dirty = __set_page_dirty_nobuffers, > fs/vboxsf/file.c: .set_page_dirty = __set_page_dirty_nobuffers, > ...wow, long list of these. thanks, -- John Hubbard NVIDIA
MW
Matthew Wilcox
Sat, Dec 19, 2020 6:50 AM

On Fri, Dec 18, 2020 at 10:10:01PM -0800, John Hubbard wrote:

On 12/18/20 9:18 PM, Matthew Wilcox wrote:

On Fri, Dec 18, 2020 at 10:03:16PM +0000, Matthew Wilcox wrote:

On Fri, Dec 18, 2020 at 04:05:31PM +0000, Matthew Wilcox wrote:

A number of implementations of ->set_page_dirty check whether the page
has been truncated (ie page->mapping has become NULL since entering
set_page_dirty()).  Several other implementations assume that they can do
page->mapping->host to get to the inode.  So either some implementations
are doing unnecessary checks or others are vulnerable to a NULL pointer
dereference if truncate() races with set_page_dirty().

I'm touching ->set_page_dirty() anyway as part of the page folio
conversion.  I'm thinking about passing in the mapping so there's no
need to look at page->mapping.

The comments on set_page_dirty() and set_page_dirty_lock() suggests
there's no consistency in whether truncation is blocked or not; we're
only guaranteed that the inode itself won't go away.  But maybe the
comments are stale.

The comments are, I believe, not stale.  Here's some syzbot
reports which indicate that ext4 is seeing races between set_page_dirty()
and truncate():

https://groups.google.com/g/syzkaller-lts-bugs/c/s9fHu162zhQ/m/Phnf6ucaAwAJ

The reproducer includes calls to ftruncate(), so that would suggest
that's what's going on.

Hmmm ... looks like __set_page_dirty_nobuffers() has a similar problem:

{
lock_page_memcg(page);
if (!TestSetPageDirty(page)) {
struct address_space *mapping = page_mapping(page);
unsigned long flags;

              if (!mapping) {
                      unlock_page_memcg(page);
                      return 1;
              }

              xa_lock_irqsave(&mapping->i_pages, flags);
              BUG_ON(page_mapping(page) != mapping);

sure, we check that the page wasn't truncated between set_page_dirty()
and the call to TestSetPageDirty(), but we can truncate dirty pages
with no problem.  So between the call to TestSetPageDirty() and
the call to xa_lock_irqsave(), the page can be truncated, and the
BUG_ON should fire.

I haven't been able to find any examples of this, but maybe it's just a very
narrow race.  Does anyone recognise this signature?  Adding the filesystems
which use __set_page_dirty_nobuffers() directly without extra locking.

That sounds like the same kind of failure that Jan Kara and I were
seeing on live systems[1], that led eventually to the gup-to-pup
conversion exercise.

That crash happened due to calling set_page_dirty() on pages that had no
buffers on them [2]. And that sounds like exactly the same thing as
calling __set_page_dirty_nobuffers() without extra locking. So I'd
expect that it's Just Wrong To Do, for the same reasons as Jan spells
out very clearly in [1].

Interesting.  It's a bit different, but Jan's race might be what's
causing this symptom.  The reason is that the backtrace contains
set_page_dirty_lock() which holds the page lock.  So there can't be
a truncation race because truncate holds the page lock when calling
->invalidatepage.

That said, the syzbot reproducer doesn't have any O_DIRECT in it
either.  So maybe this is some other race?

Hope that helps.

[1] https://www.spinics.net/lists/linux-mm/msg142700.html

[2] which triggered this assertion:

#define page_buffers(page)
({
BUG_ON(!PagePrivate(page));
((struct buffer_head *)page_private(page));
})

$ git grep set_page_dirty.=.__set_page_dirty_nobuffers
fs/9p/vfs_addr.c:      .set_page_dirty = __set_page_dirty_nobuffers,
fs/cifs/file.c: .set_page_dirty = __set_page_dirty_nobuffers,
fs/cifs/file.c: .set_page_dirty = __set_page_dirty_nobuffers,
fs/fuse/file.c: .set_page_dirty = __set_page_dirty_nobuffers,
fs/hostfs/hostfs_kern.c:        .set_page_dirty = __set_page_dirty_nobuffers,
fs/jfs/jfs_metapage.c:  .set_page_dirty = __set_page_dirty_nobuffers,
fs/nfs/file.c:  .set_page_dirty = __set_page_dirty_nobuffers,
fs/ntfs/aops.c: .set_page_dirty = __set_page_dirty_nobuffers,  /* Set the page dirty
fs/orangefs/inode.c:    .set_page_dirty = __set_page_dirty_nobuffers,
fs/vboxsf/file.c:      .set_page_dirty = __set_page_dirty_nobuffers,

...wow, long list of these.

thanks,

John Hubbard
NVIDIA

On Fri, Dec 18, 2020 at 10:10:01PM -0800, John Hubbard wrote: > On 12/18/20 9:18 PM, Matthew Wilcox wrote: > > On Fri, Dec 18, 2020 at 10:03:16PM +0000, Matthew Wilcox wrote: > > > On Fri, Dec 18, 2020 at 04:05:31PM +0000, Matthew Wilcox wrote: > > > > A number of implementations of ->set_page_dirty check whether the page > > > > has been truncated (ie page->mapping has become NULL since entering > > > > set_page_dirty()). Several other implementations assume that they can do > > > > page->mapping->host to get to the inode. So either some implementations > > > > are doing unnecessary checks or others are vulnerable to a NULL pointer > > > > dereference if truncate() races with set_page_dirty(). > > > > > > > > I'm touching ->set_page_dirty() anyway as part of the page folio > > > > conversion. I'm thinking about passing in the mapping so there's no > > > > need to look at page->mapping. > > > > > > > > The comments on set_page_dirty() and set_page_dirty_lock() suggests > > > > there's no consistency in whether truncation is blocked or not; we're > > > > only guaranteed that the inode itself won't go away. But maybe the > > > > comments are stale. > > > > > > The comments are, I believe, not stale. Here's some syzbot > > > reports which indicate that ext4 is seeing races between set_page_dirty() > > > and truncate(): > > > > > > https://groups.google.com/g/syzkaller-lts-bugs/c/s9fHu162zhQ/m/Phnf6ucaAwAJ > > > > > > The reproducer includes calls to ftruncate(), so that would suggest > > > that's what's going on. > > > > Hmmm ... looks like __set_page_dirty_nobuffers() has a similar problem: > > > > { > > lock_page_memcg(page); > > if (!TestSetPageDirty(page)) { > > struct address_space *mapping = page_mapping(page); > > unsigned long flags; > > > > if (!mapping) { > > unlock_page_memcg(page); > > return 1; > > } > > > > xa_lock_irqsave(&mapping->i_pages, flags); > > BUG_ON(page_mapping(page) != mapping); > > > > sure, we check that the page wasn't truncated between set_page_dirty() > > and the call to TestSetPageDirty(), but we can truncate dirty pages > > with no problem. So between the call to TestSetPageDirty() and > > the call to xa_lock_irqsave(), the page can be truncated, and the > > BUG_ON should fire. > > > > I haven't been able to find any examples of this, but maybe it's just a very > > narrow race. Does anyone recognise this signature? Adding the filesystems > > which use __set_page_dirty_nobuffers() directly without extra locking. > > > That sounds like the same *kind* of failure that Jan Kara and I were > seeing on live systems[1], that led eventually to the gup-to-pup > conversion exercise. > > That crash happened due to calling set_page_dirty() on pages that had no > buffers on them [2]. And that sounds like *exactly* the same thing as > calling __set_page_dirty_nobuffers() without extra locking. So I'd > expect that it's Just Wrong To Do, for the same reasons as Jan spells > out very clearly in [1]. Interesting. It's a bit different, *but* Jan's race might be what's causing this symptom. The reason is that the backtrace contains set_page_dirty_lock() which holds the page lock. So there can't be a truncation race because truncate holds the page lock when calling ->invalidatepage. That said, the syzbot reproducer doesn't have any O_DIRECT in it either. So maybe this is some other race? > Hope that helps. > > > [1] https://www.spinics.net/lists/linux-mm/msg142700.html > > [2] which triggered this assertion: > > #define page_buffers(page) \ > ({ \ > BUG_ON(!PagePrivate(page)); \ > ((struct buffer_head *)page_private(page)); \ > }) > > > > > > $ git grep set_page_dirty.*=.*__set_page_dirty_nobuffers > > fs/9p/vfs_addr.c: .set_page_dirty = __set_page_dirty_nobuffers, > > fs/cifs/file.c: .set_page_dirty = __set_page_dirty_nobuffers, > > fs/cifs/file.c: .set_page_dirty = __set_page_dirty_nobuffers, > > fs/fuse/file.c: .set_page_dirty = __set_page_dirty_nobuffers, > > fs/hostfs/hostfs_kern.c: .set_page_dirty = __set_page_dirty_nobuffers, > > fs/jfs/jfs_metapage.c: .set_page_dirty = __set_page_dirty_nobuffers, > > fs/nfs/file.c: .set_page_dirty = __set_page_dirty_nobuffers, > > fs/ntfs/aops.c: .set_page_dirty = __set_page_dirty_nobuffers, /* Set the page dirty > > fs/orangefs/inode.c: .set_page_dirty = __set_page_dirty_nobuffers, > > fs/vboxsf/file.c: .set_page_dirty = __set_page_dirty_nobuffers, > > > > ...wow, long list of these. > > thanks, > -- > John Hubbard > NVIDIA
JH
John Hubbard
Sat, Dec 19, 2020 7:04 AM

On 12/18/20 10:50 PM, Matthew Wilcox wrote:
...

Hmmm ... looks like __set_page_dirty_nobuffers() has a similar problem:

{
lock_page_memcg(page);
if (!TestSetPageDirty(page)) {
struct address_space *mapping = page_mapping(page);
unsigned long flags;

               if (!mapping) {
                       unlock_page_memcg(page);
                       return 1;
               }

               xa_lock_irqsave(&mapping->i_pages, flags);
               BUG_ON(page_mapping(page) != mapping);

sure, we check that the page wasn't truncated between set_page_dirty()
and the call to TestSetPageDirty(), but we can truncate dirty pages
with no problem.  So between the call to TestSetPageDirty() and
the call to xa_lock_irqsave(), the page can be truncated, and the
BUG_ON should fire.

I haven't been able to find any examples of this, but maybe it's just a very
narrow race.  Does anyone recognise this signature?  Adding the filesystems
which use __set_page_dirty_nobuffers() directly without extra locking.

That sounds like the same kind of failure that Jan Kara and I were
seeing on live systems[1], that led eventually to the gup-to-pup
conversion exercise.

That crash happened due to calling set_page_dirty() on pages that had no
buffers on them [2]. And that sounds like exactly the same thing as
calling __set_page_dirty_nobuffers() without extra locking. So I'd
expect that it's Just Wrong To Do, for the same reasons as Jan spells
out very clearly in [1].

Interesting.  It's a bit different, but Jan's race might be what's
causing this symptom.  The reason is that the backtrace contains
set_page_dirty_lock() which holds the page lock.  So there can't be
a truncation race because truncate holds the page lock when calling
->invalidatepage.

That said, the syzbot reproducer doesn't have any O_DIRECT in it
either.  So maybe this is some other race?

Jan's race can be also be reproduced without O_DIRECT. I first saw
it via a program that just did these steps on a normal ext4 filesystem:

a) pin ext4 file-backed pages, via get_user_pages(). Actually the way
it got here was due to using what looked like anonymous RAM to the
program, but was really file-backed RAM, because the admin had it
set up to mount ext4 on /tmp, instead of using tmpfs, to "save RAM",
but I digress. :)

b) wait a while, optionally do some DMA on the pages from a GPU, drink
coffee...

c) call set_pages_dirty()

d) unpin the pages

e) BUG_ON() in page_buffers().

thanks,

John Hubbard
NVIDIA

On 12/18/20 10:50 PM, Matthew Wilcox wrote: ... >>> Hmmm ... looks like __set_page_dirty_nobuffers() has a similar problem: >>> >>> { >>> lock_page_memcg(page); >>> if (!TestSetPageDirty(page)) { >>> struct address_space *mapping = page_mapping(page); >>> unsigned long flags; >>> >>> if (!mapping) { >>> unlock_page_memcg(page); >>> return 1; >>> } >>> >>> xa_lock_irqsave(&mapping->i_pages, flags); >>> BUG_ON(page_mapping(page) != mapping); >>> >>> sure, we check that the page wasn't truncated between set_page_dirty() >>> and the call to TestSetPageDirty(), but we can truncate dirty pages >>> with no problem. So between the call to TestSetPageDirty() and >>> the call to xa_lock_irqsave(), the page can be truncated, and the >>> BUG_ON should fire. >>> >>> I haven't been able to find any examples of this, but maybe it's just a very >>> narrow race. Does anyone recognise this signature? Adding the filesystems >>> which use __set_page_dirty_nobuffers() directly without extra locking. >> >> >> That sounds like the same *kind* of failure that Jan Kara and I were >> seeing on live systems[1], that led eventually to the gup-to-pup >> conversion exercise. >> >> That crash happened due to calling set_page_dirty() on pages that had no >> buffers on them [2]. And that sounds like *exactly* the same thing as >> calling __set_page_dirty_nobuffers() without extra locking. So I'd >> expect that it's Just Wrong To Do, for the same reasons as Jan spells >> out very clearly in [1]. > > Interesting. It's a bit different, *but* Jan's race might be what's > causing this symptom. The reason is that the backtrace contains > set_page_dirty_lock() which holds the page lock. So there can't be > a truncation race because truncate holds the page lock when calling > ->invalidatepage. > > That said, the syzbot reproducer doesn't have any O_DIRECT in it > either. So maybe this is some other race? Jan's race can be also be reproduced *without* O_DIRECT. I first saw it via a program that just did these steps on a normal ext4 filesystem: a) pin ext4 file-backed pages, via get_user_pages(). Actually the way it got here was due to using what *looked* like anonymous RAM to the program, but was really file-backed RAM, because the admin had it set up to mount ext4 on /tmp, instead of using tmpfs, to "save RAM", but I digress. :) b) wait a while, optionally do some DMA on the pages from a GPU, drink coffee... c) call set_pages_dirty() d) unpin the pages e) BUG_ON() in page_buffers(). thanks, -- John Hubbard NVIDIA