Storage
Resources
Unlike ZFS, which has a lot of material in written and video form for potential users to learn from, btrfs appears not to have much available. It does have an official wiki, and written articles on FOSS blogs cover operation from the command line, but aside from the glossary they do not do a good job of describing the taxonomy of concepts.
Users of ZFS, in contrast, have taken the trouble to create introductory material, including Ars Technica's ZFS 101 article, and many talks by enthusiasts like Philip Paeps.
This might be because btrfs's concepts seem less well thought-out, or at least more poorly described. For example, the term subvolume is used in btrfs but the container for subvolumes is not "volume" but rather "top-level subvolume".
Jim Salter from Ars Technica (who wrote the ZFS 101 article above) appears to have devoted some effort to fleshing out the topic:
- Examining btrfs, Linux's perpetually half-finished filesystem
- Install Samba4 on RHEL 8 for File Sharing on Windows
- FreeNAS 11.3 - How to Set Up Windows SMB Shares
- BtrFS
- Creating and Destroying ZFS Storage Pools
- Managing devices in ZFS storage pools
- Getting started with btrfs for Linux
- Understanding Linux filesystems: ext4 and beyond
Tasks
Create virtual disks
fallocate -l 100M /tmp/disk0 # Create sparse file
losetup -f /tmp/disk0        # Create loopback device
Formatting filesystems
mkfs.ext4 /dev/sda1
mkfs.xfs /dev/sda2
Check filesystems
fsck.ext4 /dev/sda1
xfs_repair /dev/sda2
HDD serial numbers
- Produce a CSV of hard disk identifiers and their serial numbers using hdparm, grep, cut, and output redirection.
for l in {a..w}; do
    echo -n "/dev/sd$l," >> drives
    hdparm -I /dev/sd$l | grep 'Serial Number' | cut -d : -f 2 | tr -d '[:space:]' >> drives
    echo '' >> drives
done
Samba
Configure Samba
mkdir /samba                  # Create a directory for the share
chmod -R 0777 /samba
chown -R nobody:nobody /samba # Remove ownership (not strictly necessary)
Open firewall rule (not strictly necessary)
firewall-cmd --permanent --add-service=samba
firewall-cmd --reload
firewall-cmd --list-services # verify
Configure the main Samba config file at /etc/samba/smb.conf. The name in brackets becomes the name of the share.
[samba]
    comment = Samba on Ubuntu
    path = /samba
    read only = no
    browsable = yes
Verify configuration
testparm
Set SELinux context of share directory
semanage fcontext -a -t samba_share_t '/samba(/.*)?'
restorecon -vvFR /samba
Allow SELinux to work with Samba
setsebool -P samba_export_all_ro on
Set up a Samba account for $USER
smbpasswd -a $USER
Restart Samba service
systemctl restart smbd
Browse all available shares
smbclient -L $HOST
Access Samba share $SHARE on server $HOST using the credentials of user $USER
smbclient //$HOST/$SHARE -U $USER # (1)
- This will display the Samba CLI
smb: \>
On TrueNAS, the option to "Allow Guest Access" should be turned on, unless password-based authentication for specific users is desired. Also, the directory must have write permissions enabled to allow uploading.
Bizarrely, the ability to navigate into subdirectories appears to depend on the owner execute bit. This may have something to do with anonymous guest access.
chmod o+w # allow uploads to the share directory
chmod u+x # allow navigating into subdirectories
Permanently mounting a Samba share in /etc/fstab
//nas/Videos /home/jasper/Videos cifs guest,uid=1000,iocharset=utf8 0 0
Then mount everything in fstab:
mount -a
NFS
NFS is a distributed filesystem based on the RPC protocol that provides transparent access to remote disks.
Modern NFS deployments in the wild are usually versions 3 or 4:
- V4 has superior performance and requires only the additional rpc.mountd service and TCP port 2049 to be open
- V3 requires additional services (rpcbind, lockd, rpc.statd) and many firewall ports
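On the firewall side, the difference looks roughly like this; a sketch assuming firewalld and its predefined service names:
# NFSv4: only the nfs service (TCP 2049) needs to be open
firewall-cmd --permanent --add-service=nfs
# NFSv3 additionally needs rpcbind and mountd
firewall-cmd --permanent --add-service=rpc-bind
firewall-cmd --permanent --add-service=mountd
firewall-cmd --reload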
NFS shares are enabled using the /etc/exports file.
# /etc/exports
/export/web_data1 *(ro,sync)
/export/web_data2 127.0.0.1(rw,sync,no_root_squash)
Once exports are defined, the NFS server can be started
systemctl enable --now nfs-server.service
Exports on localhost can be displayed using showmount
showmount -e
Shares can be mounted in /etc/fstab using the following syntax:
127.0.0.1:/export/web_data1 /mnt/nfs_web_data1 nfs defaults,_netdev 0 0
127.0.0.1:/export/web_data2 /mnt/nfs_web_data2 nfs defaults,_netdev 0 0
Better still is using autofs.
autofs
Auto File System offers an alternative way of mounting NFS shares that can save some system resources, especially when many shares are mounted. Autofs can mount NFS shares dynamically, only when accessed.
dnf install -y autofs
systemctl enable --now autofs.service
Mounts are defined in configs called maps. There are three map types:
- master map is /etc/auto.master by default
- direct maps point to other files for mount details. They are notable for beginning with /- (see the sketch after this list)
- indirect maps also point to other files for mount details but provide an umbrella mount point which will contain all other mounts within it. Note that other mountpoints at this parent directory cannot coexist with autofs mounts.
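For comparison with the indirect map example below, a direct map might look like this (a sketch; the server address and paths are hypothetical):
# /etc/auto.master.d/direct.autofs
/- /etc/auto.direct

# /etc/auto.direct
/mnt/backups -rw,soft 192.168.33.101:/data/backups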
Here is an example indirect map that will mount to /data/sales.
# /etc/auto.master.d/data.autofs
/data /etc/auto.data

# /etc/auto.data
sales -rw,soft 192.168.33.101:/data/sales
Map files also support wildcards.
* 127.0.0.1:/home/&
AutoFS's config is at /etc/autofs.conf. One important directive is master_map_name which defines the master map file.
LVM volume
pvcreate /dev/vd{b,c,d}              # Initialize physical volumes
vgcreate group /dev/vd{b,c,d}        # Create volume group "group"
lvcreate -l 100%FREE -n volume group # Create logical volume "volume" from all free space
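The new logical volume can then be formatted and mounted; a minimal sketch continuing from the names above (XFS chosen arbitrarily):
mkfs.xfs /dev/group/volume
mkdir -p /mnt/volume
mount /dev/group/volume /mnt/volume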
VDO
Virtual Data Optimizer (VDO) is a kernel module introduced in RHEL 7.5 that provides data deduplication and compression on block devices.
The physical storage of a VDO volume is divided into a number of slabs, which are contiguous regions of the physical space. All slabs for a given volume have the same size, which can be any power-of-2 multiple of 128 MB up to 32 GB (2 GB by default). The maximum number of slabs is 8,192, so the slab size chosen at creation determines the maximum physical size of the volume.
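As a quick worked example of these limits (the arithmetic follows directly from the figures above):
# Maximum physical size = slab size x maximum slab count (8,192)
#   default 2 GB slabs : 2 GB  x 8192 = 16 TB
#   largest 32 GB slabs: 32 GB x 8192 = 256 TB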
Like LVM volumes, VDO volumes appear under /dev/mapper
VDO appears not to be installed by default, but it is available in the BaseOS repo.
dnf install vdo
systemctl enable --now vdo
Create a VDO volume
vdo create --name=web_storage --device=/dev/xvdb --vdoLogicalSize=10G
vdostats --human-readable
mkfs.xfs -K /dev/mapper/web_storage
udevadm settle
The fstab file requires a variety of options
/dev/mapper/web_storage /mnt/web_storage xfs _netdev,x-systemd.device-timeout=0,x-systemd.requires=vdo.service 0 0
Stratis
Stratis is an open-source managed pooled storage solution in the vein of ZFS or btrfs.
Stratis block devices can be disks, partitions, LUKS-encrypted volumes, LVM logical volumes, or DM multipath devices. Stratis pools are mounted under /stratis and, like other pooled storage systems, support multiple filesystems. Stratis file systems are thinly provisioned and formatted with xfs, although vanilla xfs utilities cannot be used on Stratis file systems.
dnf -y install stratisd stratis-cli
systemctl enable --now stratisd
Create a pool
stratis pool create pool /dev/sda /dev/sdb /dev/sdc # (1)
- An error about the devices being "owned" can be resolved by wiping them.
wipefs -a /dev/sda
Display block devices managed by Stratis
stratis blockdev # (1)
- This command is equivalent to pvs in LVM.
Create filesystem
stratis fs create pool files
Confirm
stratis fs
# /etc/fstab
/stratis/pool/files /mnt/stratisfs xfs defaults,x-systemd.requires=stratisd.service 0 0
Expand pool with an additional device
stratis pool add-data pool /dev/sdd
Save snapshot
stratis fs snapshot pool files files-snapshot
Restore from snapshot
stratis fs rename pool files files-orig
stratis fs rename pool files-snapshot files
umount /mnt/stratisfs; mount /mnt/stratisfs
ZFS management
# Create a storage pool, conventionally named "tank" in documentation
zpool create tank raidz /dev/sd{a,b,c}
# By default, disks are identified by UUID
zpool status tank
zpool list
# Display real paths (i.e. block device names)
zpool status tank -L
zpool list -v
# Destroy pool
zpool destroy tank
Mirrored arrays
zpool add tank mirror sde sdf
zpool detach tank sdb
Hot spares
zpool add tank spare sdg
zpool remove tank sdg # (1)
- The zpool remove command is used to remove hot spares, as well as cache and log devices.
Replacing a used disk with one that is unused uses zpool replace, initiating the resilvering process.
zpool replace tank sdb sdc
Ongoing resilvers can be cancelled using zpool detach:
zpool detach tank sdc
If a disk has gone bad, it must first be taken offline (apparently requiring its UUID) before physically replacing it.
Replace disk
zpool clear tank
zpool offline tank $UUID
zpool replace tank sdb sdc
watch zpool status tank
A dataset in ZFS is equivalent to the btrfs subvolume, defined as an independently mountable POSIX filetree.
Create dataset
zfs create tank/dataset
zfs list
zpool export tank; zpool import tank zpool # Rename the pool from tank to zpool
zfs destroy zpool/dataset                  # Remove a dataset
Configure dataset
zfs set compression=on tank/dataset             # Enable compression
zfs set sync=disabled tank/dataset              # Disable sync
zfs set acme:disksource=vendorname tank/dataset # Set a custom tag (user property)
Snapshot management
zfs snapshot tank@snapshot1
zfs rollback tank@snapshot1
zfs destroy tank@snapshot1
ZFS datasets are automatically mounted when created, but this behavior can be managed and changed.
zfs get mountpoint tank
zfs set mountpoint=/tank tank
Glossary
- ARC
- ARC (Adaptive Replacement Cache) serves as ZFS's read cache mechanism and avoids the thrashing possible with standard OS page caches by using a more efficient algorithm.
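On ZFS on Linux, ARC behavior can be inspected directly; a sketch assuming the OpenZFS userland tools are installed:
arc_summary                                                          # summarize ARC size, hit rate, etc.
awk '$1 ~ /^(hits|misses|size|c_max)$/' /proc/spl/kstat/zfs/arcstats # raw kernel counters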
btrfs
B-Tree Filesystem (btrfs, pronounced "butter FS") is an open-source CoW filesystem that offers many of the same features as ZFS.
Development was started by Chris Mason in 2007, and by 2009 btrfs 1.0 had been accepted into the mainline Linux kernel. Btrfs was adopted by SUSE Linux Enterprise, but support was dropped by Red Hat in 2017.
A B-tree is a self-balancing tree data structure used by btrfs to organize and store metadata. The superblock holds pointers to the tree roots of the tree of tree roots and the chunk tree.
- block group
- In btrfs, the fundamental unit of storage allocation, consisting of one or more chunks (depending on RAID level), each stored on a different device.
- boot environment
- Allows changes to OS installations to be reverted
- copy-on-write
- In a CoW filesystem like ZFS and btrfs, when data on the filesystem is modified, that data is copied first before being modified and then written back to a different free location. The main advantage of this method is that the original data extent is not modified, so the risk of data corruption or partial update due to power failure is eliminated. This ensures that writes are atomic and the filesystem will always be in a consistent state.
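One userspace consequence of CoW worth noting: on btrfs, reflink copies share extents with the original until either side is modified. A sketch (file names are hypothetical):
cp --reflink=always bigfile.img bigfile-clone.img # instant copy sharing the same extents
btrfs fi du -s bigfile.img bigfile-clone.img      # shows how much data is shared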
dataset
- In ZFS, datasets represent mountable filesystems. This improves on the traditional use of partitions in, say, Linux installations, where mount points are typically separate partitions. Datasets allow quotas and other rules to be enforced.
- extent
- In btrfs, an extent is the fundamental storage unit corresponding to a contiguous sequence of bytes on disk that holds file data.
Files can be fragmented into multiple extents, and this fragmentation can be measured using the filefrag CLI utility.
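For example (the path is arbitrary):
filefrag -v /var/log/syslog # list each extent backing the file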
- Extended File System
- Ext was first implemented in 1992 by Remy Card to address limitations in the MINIX filesystem, which was used to develop the first Linux kernel. It could address up to 2 GB of storage, handled 255-character filenames, and had only one timestamp per file.
- Ext2 was developed by Remy Card only a year after ext's release as a commercial-grade filesystem, influenced by BSD's Berkeley Fast File System. It was prone to corruption if the system crashed or lost power while data was being written, as well as to performance losses due to fragmentation. Nevertheless, it was quickly and widely adopted, and is still used as a format for USB drives.
- Ext3 was adopted by mainline Linux in 2001 and uses journaling, whereby disk writes are stored as transactions in a special allocation, which allows a rebooted system to roll back incomplete transactions.
- Ext4 was added to mainline Linux in 2008, developed by Theodore Ts'o, and improves upon ext3 but is still reliant on old technology.
inode
- An inode (index node) is a data structure that stores all the metadata about a file but not its name or data.
- subvolume
- A tree of files and directories inside a btrfs that can be mounted as if it were an independent filesystem. Each btrfs filesystem has at least one subvolume that contains everything else in the filesystem, called the top-level subvolume.
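A short illustration using the btrfs CLI (mount points are hypothetical):
btrfs subvolume create /mnt/data/projects        # create a subvolume inside a mounted btrfs
btrfs subvolume list /mnt/data                   # list subvolumes (the top-level subvolume has ID 5)
mount -o subvol=projects /dev/sdb1 /mnt/projects # mount the subvolume independently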
- thrashing
- Operating systems typically implement the page cache using an LRU algorithm, which maintains a queue of the most recently read blocks. As more recent blocks are read, older blocks are evicted from the bottom of the queue even if they are more frequently accessed. This churn is referred to as thrashing.
- RAID hole
- Condition in which a stripe is only partially written before the system crashes, making the array inconsistent and corrupt after a restart
- RAIDz
- RAIDz1, RAIDz2, and RAIDz3 are special varieties of parity RAID in ZFS: the number indicates how many parity blocks are allocated to each data stripe.
- Resilvering
- Process of rebuilding redundant groups after disk replacement
- SMB
Server Message Block (SMB) is a client/server protocol developed in the early 1980s by Intel, Microsoft, and IBM that has become the native protocol for file and printer sharing on Windows. It is implemented in the Samba application suite.
CIFS (Common Internet File System, pronounced "sifs") is a dialect and implementation of SMB whose acronym has survived despite the fact the protocol itself has fallen into disuse.
vdev
In ZFS a vdev ("virtual device") is an abstraction of one or more storage devices and therefore equivalent to a volume group in LVM. A collection of vdevs constitutes a zpool.
Vdevs support one of five topologies:
- Single-device vdevs cannot survive any failure
- Mirror vdevs duplicate every block on each of their devices
- RAIDz1
- RAIDz2
- RAIDz3
Special support classes of vdev:
- CACHE, a read-cache device (L2ARC) that extends the ARC onto faster storage
- LOG (also SLOG), because it usually has faster write performance, provides the pool with a separate vdev to store the ZIL in
- SPECIAL, which stores pool metadata (and optionally small blocks) on faster devices
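A sketch of how these classes might be attached to an existing pool (device names are hypothetical):
zpool add tank log mirror nvme0n1 nvme1n1 # mirrored SLOG to hold the ZIL
zpool add tank cache nvme2n1              # L2ARC read cache
zpool add tank special mirror sde sdf     # special allocation class for metadata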
volume
- A ZFS volume is a dataset that represents a block device.
They are created with the -V option and can be found under /dev/zvol.
A volume can also be shared as an iSCSI target by setting the shareiscsi property on the volume.
zfs create -V 5gb tank/vol
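Once created, the zvol behaves like any other block device; a sketch:
zfs create -V 5G tank/vol
mkfs.ext4 /dev/zvol/tank/vol
mkdir -p /mnt/vol
mount /dev/zvol/tank/vol /mnt/vol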
- ZED
- Daemon that listens for kernel events related to ZFS, conducting actions defined in zedlets.
- zfs-fuse
- ZFS filesystem daemon
ZFS
ZFS is a technology that combines the functions of a 128-bit CoW filesystem, a volume manager, and software RAID.
Like RAID, ZFS attempts to achieve data reliability by abstracting volumes over physical devices. But ZFS improves on RAID with error handling: it can use checksum information to correct corrupted files. This is unlike hardware RAID mirrors, where failures occur silently and are typically only detected upon reading a corrupt file.
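This self-healing can be exercised on demand with a scrub, which reads every block in the pool and repairs checksum mismatches from redundant copies:
zpool scrub tank
zpool status tank # shows scrub progress and any repaired or checksum errors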
ZFS writes use CoW meaning they are atomic and aren't affected by issues like RAID holes.
ZFS can also transparently compress data written to datasets.
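For example, compression is enabled per dataset, and its effectiveness can be checked afterwards:
zfs set compression=lz4 tank/dataset
zfs get compressratio tank/dataset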
ZFS on Linux (ZOL) is considered the ugly stepchild of the ZFS community despite the fact that the Linux implementation has the most features and the most community support. ZFS is too tightly bound to the operation of the kernel to run in true userspace, which is why each operating system needs its own implementation.
- ZIL
The ZIL is a special storage area used for synchronous writes.
Most writes are asynchronous, where the filesystem is allowed to aggregate and commit them in batches to reduce fragmentation and increase throughput.
Synchronous writes in ZFS are committed to the ZIL while also being kept in memory. Moments later, the in-memory copies are committed to main storage in normal TXGs and unlinked from the ZIL. Normally, the ZIL is written to and never read from again.
The ZIL is only read during pool imports that occur after a crash and restart.
The ZIL can be placed on the LOG vdev to take advantage of higher write performance during sync writes.
The ZIL is typically mirrored because that is where data can be lost.
zpool
A zpool is the largest structure in the ZFS taxonomy, representing an independent collection of one or more vdevs. Essentially, a zpool is a JBOD with special characteristics.
Writes are distributed across available vdevs in accordance with their available free space, such that they fill more or less evenly.
A utilization-awareness mechanism built into ZFS also accounts for cases where one vdev is significantly busier than another, e.g. servicing reads. In such a case, writes to the busy vdev will be deferred in favor of less busy ones.
Zpools are automatically mounted at root upon creation (without the need to edit fstab).
Commands
btrfs
Show storage consumed, including how much is shared by all snapshots
btrfs fi du /home -s
btrfs fi df /home
fallocate
- Create a file of a given size with the --length/-l option
fallocate -l 1GB $FILENAME # gigabyte
fallocate -l 1G $FILENAME  # gibibyte
hdparm
hdparm -I /dev/sda
losetup
- Create a loopback device (i.e. a virtual block device)
losetup -f /tmp/file1
lsblk
- Display filesystems
lsblk -f # --fs
sfdisk
- Script-based partition table editor, similar to fdisk and gdisk, which can be run interactively. It does not interface with GPT format, nor is it designed for large partitions. [ref][11]
List partitions on all devices
sfdisk -l
Display size of {partition} or {device}
This command produces the size of {partition} (i.e. /dev/sda1) or even {device} (/dev/sda) in blocks
sfdisk -s partition
sfdisk -s device
Apply consistency checks to {partition} or {device}
sfdisk -V partition
sfdisk --verify device
Create a partition
sfdisk device
Save sectors changed
This command will allow recovery using the following command
sfdisk /dev/hdd -O hdd-partition-sectors.save
Recovery
Man page indicates this flag is no longer supported, and recommends use of dd instead.
sfdisk /dev/hdd -I hdd-partition-sectors.save
shred
- Write random data to an unmounted disk for {n} passes
shred --iterations=n {device}
LVM
lvm
lvm version
pvcreate
pvcreate /dev/sd{a,b,c}
lvresize
- Resize existing logical volume Marketing in volume group vg1 to have an additional 10 gigabytes of space
lvresize -L +10G /dev/vg1/Marketing
It is possible to use LVM to format the storage media when installing CentOS or RHEL on a virtual machine, even if there is only a single disk. This will result in a swap partition being created as a small logical volume. This can be removed:
swapoff /dev/cs/swap
lvremove cs/swap
Then the remaining logical volume mounted at root can be expanded:
lvresize -l 100%VG cs/root # add -r to also grow the filesystem
pvcreate
pvcreate /dev/loop0