
Storage

Resources

Unlike ZFS, which has a wealth of written and video material for potential users to learn from, BtrFS appears to have comparatively little. BtrFS does have an official wiki, and written articles on FOSS blogs cover operation from the command line, but aside from the glossary they do a poor job of describing the taxonomy of concepts.

Users of ZFS, in contrast, have taken the trouble to create introductory material, including Ars Technica's ZFS 101 article, and many talks by enthusiasts like Philip Paeps.

This may be because btrfs's concepts are less well thought out, or at least more poorly described. For example, btrfs uses the term subvolume, yet the container for subvolumes is not a "volume" but rather the "top-level subvolume".

Jim Salter from Ars Technica (who wrote the ZFS 101 article above) appears to have devoted some effort to fleshing out the topic.

Tasks

Create virtual disks

fallocate -l 100M /tmp/disk0    # Allocate a 100 MB backing file
losetup -f /tmp/disk0           # Attach it to the first free loopback device
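
To verify the result or detach the device afterwards (the loop device name will vary):

losetup -a            # List attached loop devices and their backing files
losetup -d /dev/loop0 # Detach the loop device when finished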

Formatting filesystems

mkfs.ext4 /dev/sda1
mkfs.xfs /dev/sda2

Check filesystems

fsck.ext4 /dev/sda1
xfs_repair /dev/sda2

HDD serial numbers

Produce a CSV of hard disk identifiers and their serial numbers using hdparm, grep, cut, and output redirection.
for l in {a..w} 
do 
    echo -n "/dev/sd$l," >> drives
    hdparm -I /dev/sd$l | 
        grep 'Serial Number' - |
        cut -d : -f 2 | 
        tr -d '[:space:]' >> drives
    echo '' >> drives;
done
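
As a quicker cross-check, lsblk can print serial numbers directly, assuming a util-linux build that exposes the SERIAL column:

lsblk -d -o NAME,SERIAL # One row per whole disk with its serial number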

Samba

Configure Samba

mkdir /samba                   # Create a directory for the share
chmod -R 0777 /samba           # World-writable, for guest access
chown -R nobody:nobody /samba  # Optionally assign ownership to nobody

Open the firewall for Samba (not strictly necessary)

firewall-cmd --permanent --add-service=samba
firewall-cmd --reload
firewall-cmd --list-services # verify

Configure the main Samba config file at /etc/samba/smb.conf. The name in brackets becomes the name of the share.

[samba]
    comment = Samba on Ubuntu
    path = /samba
    read only = no
    browsable = yes

Verify configuration

testparm

Set SELinux context of share directory

semanage fcontext -a -t samba_share_t '/samba(/.*)?'
restorecon -vvFR /samba

<!-- Allow SELinux to work with Samba

setsebool -P samba_export_all_ro on

Set up a Samba account for $USER

smbpasswd -a $USER

Restart Samba service

systemctl restart smbd
-->

Browse all available shares

smbclient -L $HOST

Access samba share at $SHARE at server $HOST using user credential $USER

smbclient //$HOST/$USER -U $USER # (1)

  1. This will display the Samba CLI
    smb: \>
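    For example, files can be listed and transferred from this prompt (the file names here are placeholders):
    smb: \> ls             # List files in the share
    smb: \> get report.pdf # Download a file
    smb: \> put notes.txt  # Upload a file
    smb: \> exit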
    

On TrueNAS, the option to "Allow Guest Access" should be turned on, unless password-based authentication for specific users is desired. Also, the directory must have write permissions enabled to allow uploading.

chmod o+w $SHARE_DIR # $SHARE_DIR is the shared directory
Bizarrely, the ability to navigate into subdirectories appears to depend on the owner execute bit. This may have something to do with anonymous guest access.
chmod u+x $SHARE_DIR

Permanently mounting a Samba share in /etc/fstab

//nas/Videos /home/jasper/Videos cifs guest,uid=1000,iocharset=utf8 0 0
Then mount all entries defined in fstab
mount -a
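
For a share that requires authentication, a credentials file can be referenced instead of guest access (the file path here is arbitrary):

/etc/samba/creds (mode 0600)
username=jasper
password=secret

/etc/fstab
//nas/Videos /home/jasper/Videos cifs credentials=/etc/samba/creds,uid=1000,iocharset=utf8 0 0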

NFS

NFS is a distributed filesystem based on the RPC protocol that provides transparent access to remote disks.

Modern NFS deployments in the wild are usually versions 3 or 4:

  • V4 has superior performance and requires only the additional rpc.mountd service and TCP port 2049 to be open
  • V3 requires additional services (rpcbind, lockd, rpc.statd) and many open firewall ports

NFS shares are enabled using the /etc/exports file.

/etc/exports
/export/web_data1 *(ro,sync)
/export/web_data2 127.0.0.1(rw,sync,no_root_squash)

Once exports are defined, the NFS server can be started

systemctl enable --now nfs-server.service
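
If firewalld is running on the server, the NFS service must also be allowed through. A sketch using firewalld's predefined services (the rpc-bind and mountd entries are only needed for NFSv3 clients):

firewall-cmd --permanent --add-service=nfs
firewall-cmd --permanent --add-service=rpc-bind # NFSv3 only
firewall-cmd --permanent --add-service=mountd   # NFSv3 only
firewall-cmd --reload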

Exports on localhost can be displayed using showmount

showmount -e

Shares can be mounted in /etc/fstab using the following syntax:

127.0.0.1:/export/web_data1 /mnt/nfs_web_data1 nfs defaults,_netdev 0 0
127.0.0.1:/export/web_data2 /mnt/nfs_web_data2 nfs defaults,_netdev 0 0

Better still is using autofs.

autofs

Auto File System (autofs) mounts NFS shares dynamically, only when they are accessed, which can save system resources, especially when many shares are defined.

dnf install -y autofs
systemctl enable --now autofs.service

Mounts are defined in configs called maps. There are three map types:

  • master map is /etc/auto.master by default
  • direct maps point to other files for mount details. They are notable for beginning with /-
  • indirect maps also point to other files for mount details but provide an umbrella mount point which will contain all other mounts within it. Note that other mountpoints at this parent directory cannot coexist with autofs mounts.

Here is an example indirect map that will mount to /data/sales.

/etc/auto.master.d/data.autofs
/data /etc/auto.data
/etc/auto.data
sales -rw,soft 192.168.33.101:/data/sales

Map files also support wildcards.

* 127.0.0.1:/home/&

AutoFS's config is at /etc/autofs.conf. One important directive is master_map_name which defines the master map file.
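
The relevant stanza looks roughly like this (showing what is assumed to be the stock default):

/etc/autofs.conf
[ autofs ]
master_map_name = auto.master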

LVM volume

pvcreate /dev/vd{b,c,d}              # Initialize the disks as physical volumes
vgcreate group /dev/vd{b,c,d}        # Create a volume group named "group"
lvcreate -l 100%FREE -n volume group # Create a logical volume "volume" using all free extents
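
The resulting volume appears as /dev/group/volume (and under /dev/mapper). A sketch of formatting and mounting it, assuming XFS and an arbitrary mount point:

mkfs.xfs /dev/group/volume
mkdir -p /mnt/volume
mount /dev/group/volume /mnt/volume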

VDO

Virtual Data Optimizer (VDO) is a kernel module introduced in RHEL 7.5 that provides data deduplication and compression on block devices.

The physical storage of a VDO volume is divided into a number of slabs, which are contiguous regions of the physical space. All slabs for a given volume have the same size, which can be any power-of-2 multiple of 128 MB up to 32 GB (2 GB by default). Because a volume can have at most 8,192 slabs, the slab size chosen at creation determines the maximum physical size of the volume: 16 TB with the default 2 GB slabs, and 256 TB with 32 GB slabs.

Like LVM volumes, VDO volumes appear under /dev/mapper

VDO appears not to be installed by default, but it is available in the BaseOS repo.

dnf install vdo
systemctl enable --now vdo

Create a VDO volume
vdo create --name=web_storage --device=/dev/xvdb --vdoLogicalSize=10G
vdostats --human-readable           # Display space usage and savings
mkfs.xfs -K /dev/mapper/web_storage # -K skips discards at mkfs time, which is faster on VDO
udevadm settle                      # Wait for udev to finish creating device nodes

The fstab file requires a variety of options

/dev/mapper/web_storage /mnt/web_storage xfs _netdev,x-systemd.device-timeout=0,x-systemd.requires=vdo.service 0 0

Stratis

Stratis is an open-source managed pooled storage solution in the vein of ZFS or btrfs.

Stratis block devices can be disks, partitions, LUKS-encrypted volumes, LVM logical volumes, or DM multipath devices. Stratis pools are mounted under /stratis and, like other pooled storage systems, support multiple filesystems. Stratis file systems are thinly provisioned and formatted with xfs, although vanilla xfs utilities cannot be used on Stratis file systems.

dnf -y install stratisd stratis-cli
systemctl enable --now stratisd
Create a pool
stratis pool create pool /dev/sda /dev/sdb /dev/sdc # (1)

  1. An error about the devices being "owned" can be resolved by wiping the device in question.
    wipefs -a /dev/sda
    
Display block devices managed by Stratis
stratis blockdev # (1)
  1. This command is equivalent to pvs in LVM.
Create filesystem
stratis fs create pool files
Confirm
stratis fs
/etc/fstab
/stratis/pool/files /mnt/stratisfs xfs defaults,x-systemd.requires=stratisd.service 0 0
Expand pool
stratis pool add-data pool /dev/sdd
Save snapshot
stratis fs snapshot pool files files-snapshot
Restore from snapshot
stratis fs rename files files-orig
stratis fs rename files-snapshot files
umount /mnt/stratisfs; mount /mnt/stratisfs

ZFS management

# Create a storage pool, conventionally named "tank" in documentation
zpool create tank raidz /dev/sd{a,b,c}

# By default, disks are identified by UUID
zpool status tank
zpool list

# Display real paths (i.e. block device names)
zpool status tank -L 
zpool list -v

# Destroy pool
zpool destroy tank
Mirrored arrays
zpool add tank mirror sde sdf
zpool detach tank sdf # Detach one disk from the mirror
Hot spares
zpool add tank spare sdg
zpool remove tank sdg # (1)
  1. The zpool remove command is used to remove hot spares, as well as cache and log devices.

A disk in use can be replaced with an unused one using zpool replace, which initiates the resilvering process.

zpool replace tank sdb sdc

Ongoing resilvers can be cancelled using zpool detach:

zpool detach tank sdc

If a disk has gone bad, it must first be taken offline (apparently requiring its UUID) before physically replacing it.

Replace disk
zpool clear tank
zpool offline tank $UUID
zpool replace tank sdb sdc
watch zpool status tank

A dataset in ZFS is equivalent to the btrfs subvolume, defined as an independently mountable POSIX filetree.

Create dataset
zfs create tank/dataset
zfs list
zfs rename tank/dataset tank/dataset2 # Rename a dataset
zfs destroy tank/dataset2             # Remove a dataset
Configure dataset
zfs set compression=on tank/dataset # Enable compression
zfs set sync=disabled tank/dataset  # Disable sync
zfs set acme:disksource=vendorname tank/dataset # Set a custom tag
Snapshot management
zfs snapshot tank@snapshot1
zfs rollback tank@snapshot1
zfs destroy tank@snapshot1

ZFS datasets are automatically mounted when created, but this behavior can be managed and changed.

zfs get mountpoint tank
zfs set mountpoint=/tank tank

Glossary

ARC
The ARC (adaptive replacement cache) serves as ZFS's read cache and avoids the thrashing possible with standard OS page caches by tracking both recently and frequently used blocks.

btrfs

B-Tree Filesystem (pronounced "butter FS") is an open-source CoW filesystem that offers many of the same features as ZFS.

Development was begun by Chris Mason in 2007, and by 2009 btrfs 1.0 had been accepted into the mainline Linux kernel. Btrfs was adopted by SUSE Linux Enterprise, but Red Hat dropped support in 2017.

A B-tree is a self-balancing tree data structure used by btrfs to organize and store metadata. The superblock holds pointers to the roots of the tree of tree roots and of the chunk tree.

block group
In btrfs, the fundamental unit of storage allocation, consisting of one or more chunks (depending on RAID level), each stored on a different device.
boot environment
Allows changes to an OS installation to be reverted
copy-on-write
In a CoW filesystem like ZFS or btrfs, modified data is first copied and then written to a different free location rather than being overwritten in place. Because the original data extent is never modified, the risk of corruption or a partial update due to power failure is eliminated: writes are atomic and the filesystem is always in a consistent state.

dataset

In ZFS, datasets represent mountable filesystems. They improve on the traditional use of partitions in, say, Linux installations, where each mount point is typically a separate partition, and they allow quotas and other rules to be enforced per dataset.
extent
In btrfs, an extent is the fundamental storage unit corresponding to a contiguous sequence of bytes on disk that holds file data. Files can be fragmented into multiple extents, and this fragmentation can be measured using the filefrag CLI utility.
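For example, the extents of a single file can be inspected like so (the path is arbitrary):
filefrag -v /var/log/syslog # -v lists each extent with its offset and length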
Extended File System

Ext was first implemented in 1992 by Remy Card to address limitations in the MINIX filesystem, which was used to develop the first Linux kernel. It could address up to 2 GB of storage, handled 255-character filenames, and had only one timestamp per file.

  • Ext2 was developed by Remy Card only a year after ext's release as a commercial-grade filesystem, influenced by BSD's Berkeley Fast File System. It was prone to corruption if the system crashed or lost power while data was being written, and to performance losses due to fragmentation. Nevertheless, it was quickly and widely adopted, and it is still used as a format for USB drives.
  • Ext3 was adopted by mainline Linux in 2001 and uses journaling, whereby disk writes are stored as transactions in a special allocation, which allows a rebooted system to roll back incomplete transactions.
  • Ext4 was added to mainline Linux in 2008, developed by Theodore Ts'o, and improves upon ext3 but is still reliant on old technology.

inode

An inode (index node) is a data structure that stores all the metadata about a file but not its name or data.
subvolume
A tree of files and directories inside a btrfs that can be mounted as if it were an independent filesystem. Each btrfs filesystem has at least one subvolume that contains everything else in the filesystem, called the top-level subvolume.
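A brief sketch of working with subvolumes on a mounted btrfs filesystem (the devices and paths here are placeholders):
btrfs subvolume create /mnt/data/projects       # Create a subvolume
btrfs subvolume list /mnt/data                  # List subvolumes with their IDs
mount -o subvol=projects /dev/sdb /mnt/projects # Mount a subvolume directly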
thrashing
Most OSes implement the page cache using an LRU algorithm, which maintains a queue of the most recently read blocks. As newer blocks are read, older blocks are evicted from the bottom of the queue even if they are accessed more frequently. The resulting churn, in which frequently used blocks are repeatedly evicted and re-read, is referred to as thrashing.
RAID hole
Condition in which a stripe is only partially written before the system crashes, making the array inconsistent and corrupt after a restart
RAIDz
RAIDz1, RAIDz2, and RAIDz3 are special varieties of parity RAID in ZFS: the number indicates how many parity blocks are allocated to each data stripe.
Resilvering
Process of rebuilding redundant groups after disk replacement
SMB

Server Message Block (SMB) is a client/server protocol developed in the early 1980s by Intel, Microsoft, and IBM that has become the native protocol for file and printer sharing on Windows. It is implemented in the Samba application suite.

CIFS (Common Internet File System, pronounced "sifs") is a dialect and implementation of SMB whose acronym has survived even though the dialect itself has fallen into disuse.

vdev

In ZFS a vdev ("virtual device") is an abstraction of one or more storage devices and therefore equivalent to a volume group in LVM. A collection of vdevs constitutes a zpool.

Vdevs support one of five topologies:

  • Single-device vdevs cannot survive any failure
  • Mirror vdevs duplicate every block on each of their devices
  • RAIDz1 vdevs use one parity block per data stripe
  • RAIDz2 vdevs use two parity blocks per data stripe
  • RAIDz3 vdevs use three parity blocks per data stripe

Special support classes of vdev:

  • CACHE (also L2ARC) extends the ARC read cache onto a fast device; its contents are expendable
  • LOG (also SLOG) gives the pool a separate vdev, usually with faster write performance, in which to store the ZIL
  • SPECIAL stores pool metadata (and optionally small blocks) on a faster device

volume

A ZFS volume is a dataset that represents a block device. They are created with the -V option and can be found under /dev/zvol.
zfs create -V 5gb tank/vol
A volume can also be shared as an iSCSI target by setting the shareiscsi property on the volume.
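A zvol can then be used like any other block device; a sketch, assuming an ext4 filesystem and an arbitrary mount point:
mkfs.ext4 /dev/zvol/tank/vol
mkdir -p /mnt/vol
mount /dev/zvol/tank/vol /mnt/vol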
ZED
Daemon that listens for kernel events related to ZFS and carries out actions defined in zedlets.
zfs-fuse
Userspace (FUSE) implementation of the ZFS filesystem

ZFS

ZFS is a technology that combines the functions of a 128-bit CoW filesystem, a volume manager, and software RAID.

Like RAID, ZFS attempts to achieve data reliability by abstracting volumes over physical devices. But ZFS improves on RAID with error handling: it can use checksum information to correct corrupted files. This is unlike hardware RAID mirrors, where failures occur silently and are typically only detected upon reading a corrupt file.

ZFS writes use CoW, meaning they are atomic and are not affected by issues like the RAID hole.

ZFS can also transparently compress data written to datasets.

ZFS on Linux (ZOL) is considered the ugly stepchild of the ZFS community, despite the fact that the Linux implementation has the most features and the most community support. ZFS is too tightly bound to kernel internals to run entirely in userspace, which is why each operating system needs its own implementation.

ZIL

The ZIL is a special storage area used for synchronous writes.

Most writes are asynchronous, where the filesystem is allowed to aggregate and commit them in batches to reduce fragmentation and increase throughput.

Synchronous writes in ZFS are committed to the ZIL while also being kept in memory. Moments later they are committed from RAM to main storage in normal transaction groups (TXGs) and unlinked from the ZIL.

Normally, the ZIL is written to and never read from again.

The ZIL is only read during pool imports that occur after a crash and restart.

The ZIL can be placed on the LOG vdev to take advantage of higher write performance during sync writes.

The LOG vdev holding the ZIL is typically mirrored, because it is the one place where data can be lost.

zpool

A zpool is the largest structure in the ZFS taxonomy, representing an independent collection of one or more vdevs. Essentially, a zpool is a JBOD with special characteristics.

Writes are distributed across available vdevs in accordance with their available free space, such that they fill more or less evenly.

A utilization-awareness mechanism built into ZFS also accounts for one vdev being significantly busier than another, for example with reads. In such a case, writes to the busy vdev will be deferred in favor of less busy ones.

Zpools are automatically mounted at root upon creation (without the need to edit fstab).

Commands

btrfs

Show storage consumed, including how much is shared by all snapshots

btrfs fi du /home -s # Per-path usage, including how much is shared with snapshots

btrfs fi df /home    # Allocation breakdown by data, metadata, and system chunks

fallocate

Create a file of a given size with the --length/-l option

fallocate -l 1GB $FILENAME # gigabyte
fallocate -l 1G $FILENAME  # gibibyte

hdparm

Display detailed drive identification information, including the serial number

hdparm -I /dev/sda

losetup

Create a loopback device (i.e. a virtual block device)
losetup -f /tmp/file1

lsblk

Display filesystems
lsblk -f # --fs

sfdisk

Script-based partition table editor, similar to fdisk and gdisk, that can also be run interactively. Historically it did not understand the GPT format, nor was it designed for large partitions, although modern versions have addressed this. [ref][11]

List partitions on all devices

sfdisk -l

Display the size of a partition (e.g. /dev/sda1) or a whole device (e.g. /dev/sda) in blocks

sfdisk -s partition
sfdisk -s device
Apply consistency checks to {partition} or {device}
sfdisk -V partition
sfdisk --verify device
Create a partition
sfdisk device
Save the sectors that will be changed, allowing later recovery with the command below
sfdisk /dev/hdd -O hdd-partition-sectors.save
Recovery (the man page indicates this flag is no longer supported and recommends using dd instead)
sfdisk /dev/hdd -I hdd-partition-sectors.save
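A more current way to back up and restore a partition table is to dump it as a script with -d and replay it later (the dump file name is arbitrary):
sfdisk -d /dev/sda > sda-layout.dump # Dump the partition table as a reusable script
sfdisk /dev/sda < sda-layout.dump    # Recreate the partition table from the dump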

shred

Write random data to an unmounted disk for n passes
shred --iterations=n /dev/sdX # /dev/sdX is the target disk

LVM

lvm

lvm version

pvcreate

pvcreate /dev/sd{a,b,c}
pvcreate /dev/loop0

lvresize

Resize the existing logical volume Marketing in volume group vg1, adding 10 GiB of space

lvresize -L +10G /dev/vg1/Marketing

It is possible to use LVM to format the storage media when installing CentOS or RHEL on a virtual machine, even if there is only a single disk. This results in a swap partition being created as a small logical volume, which can be removed:

swapoff /dev/cs/swap
lvremove cs/swap
Then the remaining logical volume mounted at root can be expanded to fill the volume group:
lvresize -r -l 100%VG cs/root # -r grows the filesystem along with the logical volume
