In the Linux kernel, the following vulnerability has been resolved:
btrfs: fix EEXIST abort due to non-consecutive gaps in chunk allocation
I have been observing a number of systems aborting at
insert_dev_extents() in btrfs_create_pending_block_groups(). The
following is a sample stack trace of such an abort coming from forced
chunk allocation (typically behind CONFIG_BTRFS_EXPERIMENTAL) but this
can theoretically happen to any DUP chunk allocation.
[81.801] ------------[ cut here ]------------
[81.801] BTRFS: Transaction aborted (error -17)
[81.801] WARNING: fs/btrfs/block-group.c:2876 at btrfs_create_pending_block_groups+0x721/0x770 [btrfs], CPU#1: bash/319
[81.802] Modules linked in: virtio_net btrfs xor zstd_compress raid6_pq null_blk
[81.803] CPU: 1 UID: 0 PID: 319 Comm: bash Kdump: loaded Not tainted 6.19.0-rc6+ #319 NONE
[81.803] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.17.0-2-2 04/01/2014
[81.804] RIP: 0010:btrfs_create_pending_block_groups+0x723/0x770 [btrfs]
[81.806] RSP: 0018:ffffa36241a6bce8 EFLAGS: 00010282
[81.806] RAX: 000000000000000d RBX: ffff8e699921e400 RCX: 0000000000000000
[81.807] RDX: 0000000002040001 RSI: 00000000ffffffef RDI: ffffffffc0608bf0
[81.807] RBP: 00000000ffffffef R08: ffff8e69830f6000 R09: 0000000000000007
[81.808] R10: ffff8e699921e5e8 R11: 0000000000000000 R12: ffff8e6999228000
[81.808] R13: ffff8e6984d82000 R14: ffff8e69966a69c0 R15: ffff8e69aa47b000
[81.809] FS: 00007fec6bdd9740(0000) GS:ffff8e6b1b379000(0000) knlGS:0000000000000000
[81.809] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[81.810] CR2: 00005604833670f0 CR3: 0000000116679000 CR4: 00000000000006f0
[81.810] Call Trace:
[81.810] <TASK>
[81.810] __btrfs_end_transaction+0x3e/0x2b0 [btrfs]
[81.811] btrfs_force_chunk_alloc_store+0xcd/0x140 [btrfs]
[81.811] kernfs_fop_write_iter+0x15f/0x240
[81.812] vfs_write+0x264/0x500
[81.812] ksys_write+0x6c/0xe0
[81.812] do_syscall_64+0x66/0x770
[81.812] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[81.813] RIP: 0033:0x7fec6be66197
[81.814] RSP: 002b:00007fffb159dd30 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
[81.815] RAX: ffffffffffffffda RBX: 00007fec6bdd9740 RCX: 00007fec6be66197
[81.815] RDX: 0000000000000002 RSI: 0000560483374f80 RDI: 0000000000000001
[81.816] RBP: 0000560483374f80 R08: 0000000000000000 R09: 0000000000000000
[81.816] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000002
[81.817] R13: 00007fec6bfb85c0 R14: 00007fec6bfb5ee0 R15: 00005604833729c0
[81.817] </TASK>
[81.817] irq event stamp: 20039
[81.818] hardirqs last enabled at (20047): [<ffffffff99a68302>] __up_console_sem+0x52/0x60
[81.818] hardirqs last disabled at (20056): [<ffffffff99a682e7>] __up_console_sem+0x37/0x60
[81.819] softirqs last enabled at (19470): [<ffffffff999d2b46>] __irq_exit_rcu+0x96/0xc0
[81.819] softirqs last disabled at (19463): [<ffffffff999d2b46>] __irq_exit_rcu+0x96/0xc0
[81.820] ---[ end trace 0000000000000000 ]---
[81.820] BTRFS: error (device dm-7 state A) in btrfs_create_pending_block_groups:2876: errno=-17 Object already exists
Inspecting these aborts with drgn, I observed a pattern of overlapping
chunk_maps. Note how stripe 1 of the first chunk overlaps in physical
address with stripe 0 of the second chunk.
<h2>Physical Start Physical End Length Logical Type Stripe</h2>
0x0000000102500000 0x0000000142500000 1.0G 0x0000000641d00000 META|DUP 0/2
0x0000000142500000 0x0000000182500000 1.0G 0x0000000641d00000 META|DUP 1/2
0x0000000142500000 0x0000000182500000 1.0G 0x0000000601d00000 META|DUP 0/2
0x0000000182500000 0x00000001c2500000 1.0G 0x0000000601d00000 META|DUP 1/2
Now how could this possibly happen? All chunk allocation is
---truncated---