Rclone enhancements for cloud storage management
The open-source command-line program Rclone enables users to easily sync and manage data on cloud storage and now supports over 40 cloud storage products as backends. IBM Cloud recommends Rclone for accessing IBM Cloud Object Storage (COS). IBM Cloud users can take advantage of Rclone features such as mounting remote storage as a filesystem, file transfer, and large-file chunking. In this post, we describe multiple Rclone issues related to these features that we identified and addressed to make Rclone easier to use.
The first is a problem in Rclone mount cache management that did not allow the application's working set to be bigger than the capacity of the Rclone cache, which could render Rclone unusable in many data-intensive applications. The second is a performance issue when uploading large chunked files with Rclone Chunker to remote backends, such as S3 cloud object storage, that do not support fast renaming. The third is the lack of resumable uploads with Rclone Chunker.
Rclone mount supports several cache modes: off, minimal, writes, and full. The full cache mode provides the best read performance through on-demand partial-file caching and prefetching. In this mode, the cache that Rclone maintains has an item for each open file. Each cache item is a sparse file inside a local directory tree (specified by the --cache-dir option of the VFS mount command) that mirrors the remote storage and contains the ranges of the remote file that are being accessed or overwritten.
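As a concrete illustration (the remote name, mount point, and limits below are placeholders, not recommendations), a mount using the full cache mode with an explicit cache directory might look like this:

```
# Mount a remote with on-demand partial caching and prefetching.
# --cache-dir chooses where the sparse cache files live;
# --vfs-cache-max-size and --vfs-cache-max-age bound the cache.
rclone mount cos:my-bucket /mnt/data \
    --vfs-cache-mode full \
    --cache-dir /var/cache/rclone \
    --vfs-cache-max-size 100G \
    --vfs-cache-max-age 1h
```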
When the full cache mode was developed for Rclone's 1.53 release, the condition of cache-space exhaustion (the ENOSPC error) was not addressed. When the working set of open files consumed more space than was available on the system, Rclone could not continue to serve new IOs, even for read operations. This problem could render Rclone unusable in cases where the working set included large multi-terabyte files or a large number of smaller files.
Aiming to solve this problem, we provided a patch to support synchronous cache-space recovery, allowing read threads to recover from ENOSPC errors when space can be reclaimed from cache items that are not in use or are safe to reset or empty. This patch complements the existing cache-cleaning process in two ways.
First, the existing cache-cleaning process is time-driven and runs periodically. The cache space can run out while the cache cleaner thread is still waiting for its next scheduled run. In this case, IO threads that encounter ENOSPC return an internal error to the application, even when cache space could have been recovered to avoid the error. The patch addresses this by having read threads kick the cache cleaner thread in this condition to recover cache space, preventing ENOSPC errors from reaching applications unnecessarily.
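The pattern is roughly the following; this is a minimal Go sketch of the mechanism under the assumptions described here, not Rclone's actual code:

```go
// Minimal sketch of the synchronous recovery pattern described above (not
// Rclone's actual code): the cleaner goroutine normally runs on a timer, but
// a read thread that hits ENOSPC can wake it immediately and then retry.
package main

import (
	"errors"
	"syscall"
	"time"
)

type cache struct {
	kick chan chan struct{} // a read sends a reply channel to request a clean
}

// cleaner purges the cache periodically, or immediately when kicked.
func (c *cache) cleaner(interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			c.purge() // scheduled clean
		case done := <-c.kick:
			c.purge() // synchronous clean requested after ENOSPC
			close(done)
		}
	}
}

// purge stands in for removing or resetting cache items (see below).
func (c *cache) purge() {}

// readWithRecovery retries a read once after asking the cleaner to free space.
func (c *cache) readWithRecovery(read func() error) error {
	err := read()
	if errors.Is(err, syscall.ENOSPC) {
		done := make(chan struct{})
		c.kick <- done
		<-done // wait until cache space has been recovered
		err = read()
	}
	return err
}

func main() {
	c := &cache{kick: make(chan chan struct{})}
	go c.cleaner(time.Minute)
	_ = c.readWithRecovery(func() error { return nil })
}
```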
Second, the patch enhances the cache cleaner to support cache item reset. Before this patch, the cache purge process only removed cache items that were not in use, which may not be sufficient when the total size of the working set exceeds the capacity of the cache directory. As in the prior code, the patched purge process starts by removing cache files that are not in use: items whose access times are older than vfs-cache-max-age are removed first, and then other not-in-use items are removed in LRU order until usage falls below vfs-cache-max-size. If usage still exceeds vfs-cache-max-size (the quota) at this point, the patch adds a cache reset step that resets (empties) cache files that are still in use but not dirty. This enables application processes to continue without seeing an error even when the working set exceeds the cache space, as long as a large write working set is not hoarding the entire cache.
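The resulting cleaning order can be summarized with the following simplified Go sketch; the item bookkeeping and names are illustrative, not Rclone's internals:

```go
// Simplified sketch of the purge ordering described above (illustrative
// bookkeeping only, not Rclone's internals).
package main

import (
	"sort"
	"time"
)

type cacheItem struct {
	size  int64
	atime time.Time // last access time
	inUse bool      // file is currently open
	dirty bool      // has data not yet written back to the remote
}

// purge frees cache space in three passes and returns the space still in use.
func purge(items []*cacheItem, used, maxSize int64, maxAge time.Duration) int64 {
	now := time.Now()

	// Pass 1: remove not-in-use items older than vfs-cache-max-age.
	for _, it := range items {
		if !it.inUse && now.Sub(it.atime) > maxAge {
			used -= it.size
			it.size = 0 // stands in for deleting the cache file
		}
	}

	// Pass 2: remove remaining not-in-use items in LRU order until usage
	// drops below vfs-cache-max-size.
	sort.Slice(items, func(i, j int) bool { return items[i].atime.Before(items[j].atime) })
	for _, it := range items {
		if used <= maxSize {
			return used
		}
		if !it.inUse {
			used -= it.size
			it.size = 0
		}
	}

	// Pass 3 (added by the patch): still over quota, so reset items that are
	// in use but hold no dirty data; their ranges can be re-fetched on demand.
	for _, it := range items {
		if used <= maxSize {
			return used
		}
		if it.inUse && !it.dirty {
			used -= it.size
			it.size = 0 // stands in for truncating the sparse cache file
		}
	}
	return used
}

func main() {}
```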
By design, this patch does not add ENOSPC error recovery for writes. Rclone does not empty a write cache item until the file data has been written back to the backend on close. Allowing more cache space to be consumed by dirty cache items when space is already running low would increase the risk of exhausting the cache to the point that the VFS mount becomes unreadable.
The Rclone Chunker transparently splits large files into smaller chunks during copies and can act as a wrapper around other storage backends. To allow parallel operations while appearing atomic, it employs temporary and permanent chunks. Temporary chunks represent partial objects, often in the process of being uploaded. Permanent chunks belong to a completed upload and together comprise a full composite object.
Rclone distinguished between permanent and temporary chunks by uploading chunks with a temporary suffix unique to each transaction and, at the end of a successful upload, using server-side move (or copy-and-delete) operations to rename each chunk and remove this suffix. We determined that this method was inefficient for remote backends without instant server-side copy, such as S3 cloud object storage, where a rename typically involves an internal copy operation. The time spent renaming each chunk at the end of an upload could nearly double the total duration of a transaction.
To address this issue, we provided a second way to manage permanent and temporary chunks during uploads. Introduced in Rclone's 1.55 release, this approach is triggered when the "transactions" configuration option is set to "norename". In this case, Rclone Chunker still appends a unique transaction identifier to chunk names during an upload. However, instead of renaming the chunks upon completion, the transaction identifier is added to the metadata of the composite object. Any operation that needs to distinguish between permanent and temporary chunks can obtain the transaction identifier from that metadata and identify permanent chunks as those with the matching suffix.
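For illustration, a Chunker remote layered over an S3-style remote could opt into this behavior with a configuration along these lines (the remote names and chunk size are placeholders):

```
# rclone.conf excerpt (illustrative)
[chunked]
type = chunker
remote = cos:my-bucket
chunk_size = 100M
transactions = norename
```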
Rclone Chunker uploads files in sequential chunks. When an upload is interrupted in the middle of a file, the chunks that were already written can remain on the remote. Previously, when an upload was restarted after an interruption, the already-uploaded chunks were ignored and the entire upload started from scratch. We provided an enhancement that allows chunked uploads to resume where they left off, saving time and bandwidth by not re-uploading data that is already on the remote.
In addition to making resumes possible for Rclone Chunker, there was also an existing goal to introduce resumes across the range of backends where resuming is possible. To accommodate this, the first step was to create a generic resumer interface that any backend capable of resuming could later implement. When a file is transferred to a backend that implements resumes, any data necessary to resume the upload is stored in Rclone's cache. This data varies among backends, but is often some sort of ID unique to the transaction. Along with this data, a fingerprint of the object being transferred is also stored. If an upload fails, a future upload of the same file can check the cache for a fingerprint matching the file it is reattempting to upload. If one is found, the resume data stored in the cache is used to complete the upload from where it left off. When an upload completes successfully, the cache entry for that file is cleared.
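Conceptually, the interface looks something like the sketch below; the type and method names are hypothetical, chosen only to illustrate the shape of the design:

```go
// A hypothetical sketch of a generic resumer interface; the type and method
// names here are illustrative assumptions, not Rclone's actual API.
package main

// ResumeData is what gets written to Rclone's cache while an upload is in
// flight: a fingerprint of the source object plus whatever backend-specific
// state (often an upload or transaction ID) is needed to pick the upload up.
type ResumeData struct {
	Fingerprint string
	BackendData []byte
}

// Resumer would be implemented by any backend capable of resuming uploads.
type Resumer interface {
	// PrepareResume returns the state to cache for a transfer that may need
	// to be resumed later.
	PrepareResume(remote string) (ResumeData, error)
	// Resume continues an interrupted upload using previously cached state;
	// callers first check that the cached fingerprint still matches the file.
	Resume(remote string, data ResumeData) error
}

func main() {}
```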
The final step was to implement the resumer interface for the Rclone Chunker backend. This was accomplished by updating the cache after each successful chunk upload. The cache stores the latest successfully uploaded chunk number along with several user-set options, such as chunk size, as these must be consistent between uploads for resuming to be possible.
Additionally, a partial hash state of the file is stored both in the cache and in a special chunk on the remote. Rclone Chunker hashes files during transit to save time while still verifying the integrity of an upload. If an upload is interrupted, the cache maintains these values. Then, future attempts to upload the file can verify the user-set options and partial hash state are consistent with the original upload. If no discrepancies are found, Rclone can skip all chunks up to and including the cached chunk number and only upload chunks that didn't complete during the interrupted upload. Additionally, the partial hash state can be used to resume hashing the file from the interrupted point.
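The per-file state and the consistency check can be pictured roughly as follows; the field and function names in this Go sketch are assumptions, not Rclone's implementation:

```go
// Illustrative sketch of the per-file resume record kept in the cache and of
// the consistency check before skipping chunks; field and function names are
// assumptions, not Rclone's implementation.
package main

import "bytes"

type chunkerResumeRecord struct {
	Fingerprint string // identifies the source file version
	ChunkSize   int64  // user-set options that must match between attempts
	HashType    string
	LastChunk   int    // number of the last chunk uploaded successfully
	HashState   []byte // serialized partial hash state at that point
}

// canResume reports whether a restarted upload may skip chunks up to and
// including LastChunk and continue hashing from the saved partial state.
func canResume(cached *chunkerResumeRecord, fingerprint string, chunkSize int64,
	hashType string, remoteHashState []byte) bool {
	if cached == nil {
		return false // nothing cached: start from scratch
	}
	return cached.Fingerprint == fingerprint &&
		cached.ChunkSize == chunkSize &&
		cached.HashType == hashType &&
		bytes.Equal(cached.HashState, remoteHashState)
}

func main() {}
```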
Currently, this functionality is available in a beta release for public testing. Users may set the "resume-cutoff" flag to attempt to resume any uploads of files larger than the specified size.
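For example, with a build that includes the beta feature, a copy that should attempt to resume any interrupted upload of files larger than 1 GiB might be invoked as follows (the size and remote names are illustrative):

```
# Attempt to resume interrupted uploads for files larger than 1G.
rclone copy /data/huge-dataset chunked:my-bucket --resume-cutoff 1G
```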
The IBM Research and IBM Cloud development teams contributed three enhancements to the open-source community. The first enables Rclone VFS mount to work in data-intensive applications where the size of the working set exceeds the cache size. The second improves the performance of the Rclone Chunker backend when it is layered on remote backends without efficient renaming, roughly halving the total transaction time. The third adds resume functionality to Rclone copy with the Chunker backend. Because multi-terabyte files can take many hours to copy, the ability to resume uploads can save a significant amount of time when an Rclone copy failure occurs.