SPDK_paper

Ziye Yang, James R. Harris, Benjamin Walker, Daniel Verkamp, Changpeng Liu, Cunyin Chang, Gang Cao, Jonathan Stern, Vishal Verma, Paul E. Luse: SPDK: A Development Kit to Build High Performance Storage Applications. CloudCom 2017: 154-161


Paper: SPDK

  1. provides a set of tools and libraries for writing high-performance, scalable, user-mode storage applications.
  2. achieves high performance by moving the necessary drivers into user space, operating them in polled mode instead of interrupt mode, and using lockless access to resources.

Problem Statement

https://lddpicture.oss-cn-beijing.aliyuncs.com/picture/image-20220427141726878.png

  • strong demand for building high-performance storage services on top of emerging fast storage devices.
  • most existing storage software stacks become the bottleneck when developing high-performance storage applications.
    • in particular the kernel I/O stack, due to context switches, data copies, interrupts, and resource synchronization.

System Design and Implementation

https://lddpicture.oss-cn-beijing.aliyuncs.com/picture/image-20220427152603792.png

https://lddpicture.oss-cn-beijing.aliyuncs.com/picture/image-20220428153019098.png

https://lddpicture.oss-cn-beijing.aliyuncs.com/picture/image-20220428155416570.png

https://lddpicture.oss-cn-beijing.aliyuncs.com/picture/image-20220428155558990.png

  • App scheduling: an event framework for writing asynchronous, polled-mode, shared-nothing server applications.
  • Drivers: a user-space polled-mode NVMe driver, providing zero-copy, highly parallel, direct access to NVMe SSDs.
  • Storage devices: abstracts the devices exported by the drivers and provides a user-space block I/O interface to the storage applications above.
  • Storage protocols: contains accelerated applications built on the SPDK framework to support various storage protocols.

App scheduling

https://lddpicture.oss-cn-beijing.aliyuncs.com/picture/image-20220427143747174.png

1. Events
  • Target: accomplish cross-thread communication while minimizing synchronization overhead.
  • Traditional approach: a thread-per-connection server design that depends on the OS to schedule many threads issuing blocking I/O onto a limited number of cores.
  • SPDK instead runs one event-loop thread (reactor) per CPU core that processes incoming events from a queue; each event consists of a bundled function pointer and its arguments, destined for a particular CPU core.
2. Reactor
  • the loop running on each core checks for incoming events and executes them in first-in, first-out order.
3. Pollers
  • like events, pollers are functions with arguments that can be bundled and sent to a specific core to be executed.
  • unlike events, pollers are executed repeatedly until unregistered (see the sketch after this list).
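
A minimal sketch of how an event and a poller are registered, assuming a recent SPDK release (spdk_event_allocate/spdk_event_call and spdk_poller_register have shifted slightly between versions, so treat the exact signatures as approximate); application setup and error handling are omitted.

    /* Sketch: sending an event to another core and registering a poller. */
    #include <stdio.h>
    #include "spdk/event.h"
    #include "spdk/thread.h"
    #include "spdk/env.h"

    static struct spdk_poller *g_poller;

    /* An event is a function pointer plus two arguments, executed once by the
     * reactor of the core it was sent to. */
    static void
    hello_event(void *arg1, void *arg2)
    {
        printf("event ran on core %u\n", spdk_env_get_current_core());
    }

    /* A poller is called repeatedly by its core's reactor until unregistered. */
    static int
    poll_for_work(void *ctx)
    {
        /* e.g. check an NVMe completion queue here */
        return 0;
    }

    static void
    app_start(void *ctx)
    {
        /* Queue an event on core 1's reactor; it runs there in FIFO order. */
        struct spdk_event *e = spdk_event_allocate(1, hello_event, NULL, NULL);
        spdk_event_call(e);

        /* Register a poller on the current core, invoked on every loop iteration. */
        g_poller = spdk_poller_register(poll_for_work, NULL, 0);
    }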

User space polled drivers

  • moves the device driver implementation into user space instead of kernel space.
  • uses polling instead of interrupts, which lets the application decide how much CPU time to spend on each task instead of leaving it to the kernel scheduler.
1. Asynchronous I/O mode
  • applications call the asynchronous read/write interfaces to submit I/O requests, then use the corresponding completion-check functions to poll for completed I/Os (see the sketch after this list).
2. Lockless architecture
  • adopts a lockless architecture that requires each thread to access only its own resources, e.g., memory, I/O submission queue, and I/O completion queue.
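
A minimal sketch of the submit-then-poll pattern with the user-space NVMe driver, assuming a controller and namespace have already been attached via spdk_nvme_probe(); signatures follow recent SPDK headers and most error handling is omitted.

    #include <stdbool.h>
    #include <stdio.h>
    #include "spdk/nvme.h"
    #include "spdk/env.h"

    static bool g_done;

    /* Completion callback, invoked from spdk_nvme_qpair_process_completions(). */
    static void
    read_done(void *cb_arg, const struct spdk_nvme_cpl *cpl)
    {
        if (spdk_nvme_cpl_is_error(cpl)) {
            fprintf(stderr, "read failed\n");
        }
        g_done = true;
    }

    /* Each thread allocates its own qpair, so no locks are needed on the I/O path. */
    static void
    read_first_block(struct spdk_nvme_ctrlr *ctrlr, struct spdk_nvme_ns *ns)
    {
        struct spdk_nvme_qpair *qpair = spdk_nvme_ctrlr_alloc_io_qpair(ctrlr, NULL, 0);
        void *buf = spdk_dma_zmalloc(spdk_nvme_ns_get_sector_size(ns), 0x1000, NULL);

        /* Asynchronous submission: returns as soon as the command is queued. */
        spdk_nvme_ns_cmd_read(ns, qpair, buf, 0 /* lba */, 1 /* lba count */,
                              read_done, NULL, 0);

        /* Polled completion: no interrupt, the application decides when to check. */
        while (!g_done) {
            spdk_nvme_qpair_process_completions(qpair, 0);
        }

        spdk_dma_free(buf);
        spdk_nvme_ctrlr_free_io_qpair(qpair);
    }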

Storage service

1. Blobstore
  • a persistent, power-fail-safe block allocator designed to be used as the local storage backing a higher-level storage service.
  • designed to allow asynchronous, uncached, parallel reads and writes to groups of blocks on a block device, called 'blobs'.
2. BlobFS
  • filenames are currently stored as xattrs in each blob, so filename lookup is an O(n) operation; a btree index (SPDK btree) could speed this up.
3. BDEV
  • abstracts the devices exposed by the user-space drivers and other third-party libraries and exports a block-service interface to the applications above (see the sketch after this list).
  • provides a driver-module API for implementing bdev drivers, which enumerate and claim SPDK block devices and perform operations (read, write, unmap, etc.) on those devices.
  • bdev drivers exist for NVMe, Linux AIO, Ceph RBD, and blobstore (blobdev).
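
A minimal sketch of consuming the bdev block interface from an SPDK thread, assuming a bdev named "Nvme0n1" is already registered (the name is illustrative) and using spdk_bdev_open_ext as in recent SPDK releases; cleanup is omitted.

    #include <stdio.h>
    #include "spdk/bdev.h"
    #include "spdk/env.h"
    #include "spdk/thread.h"

    /* Completion callback for the asynchronous bdev read. */
    static void
    read_done(struct spdk_bdev_io *bdev_io, bool success, void *cb_arg)
    {
        printf("bdev read %s\n", success ? "succeeded" : "failed");
        spdk_bdev_free_io(bdev_io);
    }

    /* Hot-remove and resize notifications arrive here; ignored in this sketch. */
    static void
    bdev_event_cb(enum spdk_bdev_event_type type, struct spdk_bdev *bdev, void *ctx)
    {
    }

    /* Must be called from an SPDK thread (reactor). */
    static void
    read_from_bdev(void)
    {
        struct spdk_bdev_desc *desc;
        struct spdk_io_channel *ch;
        void *buf = spdk_dma_zmalloc(4096, 0x1000, NULL);

        spdk_bdev_open_ext("Nvme0n1", false /* read-only */, bdev_event_cb, NULL, &desc);
        ch = spdk_bdev_get_io_channel(desc);

        /* Asynchronous 4 KiB read at offset 0; the result arrives via read_done(). */
        spdk_bdev_read(desc, ch, buf, 0 /* offset */, 4096 /* nbytes */,
                       read_done, NULL);
    }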

Kim, H. J., Lee, Y. S., & Kim, J. S. (2016). NVMeDirect: A User-space I/O Framework for Application-specific Optimization on NVMe SSDs. In 8th USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage 16). CCF-A [link] [slide]


Paper: NVMeDirect

Summary

  1. proposes a novel user-level I/O framework called NVMeDirect, which improves performance by allowing user applications to access the storage device directly.
  2. NVMeDirect can coexist with the legacy kernel I/O stack, so existing kernel-based applications can use the same NVMe SSD simultaneously with NVMeDirect-enabled applications on different disk partitions.
  3. provides flexibility in queue management, the I/O completion method, caching, and I/O scheduling: each user application can select its own I/O policies according to its I/O characteristics and requirements.

Problem Statement

https://lddpicture.oss-cn-beijing.aliyuncs.com/picture/image-20220503151514674.png

https://lddpicture.oss-cn-beijing.aliyuncs.com/picture/image-20220503151737965.png

  • as storage devices get faster, the overhead of the legacy kernel I/O stack becomes noticeable, since it has been optimized for slow HDDs.
    • the kernel has to remain general, providing an abstraction layer for applications and managing all the hardware resources.
    • the kernel cannot implement any policy that favors a particular application, because it must provide fairness among applications.
    • frequent kernel updates require constant effort to port such application-specific optimizations.

System Design

https://lddpicture.oss-cn-beijing.aliyuncs.com/picture/image-20220503150031783.png

  • Admin tool: controls the kernel driver with root privilege to manage the access permissions of I/O queues.
  • the kernel checks the permissions, creates the required submission and completion queues, and maps their memory regions and the associated doorbell registers into the user-space memory region of the application.
  • a thread can create one or more I/O handles to access the queues, and each handle can be bound to a dedicated queue or a shared queue. Each handle can be configured to use different features such as caching, I/O scheduling, and I/O completion (see the sketch after this list). (todo: when a handle is bound to a shared queue, how are data races on the queue avoided?)
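
A rough sketch of the user-side flow described above (queue creation, handle binding, asynchronous I/O, completion polling). The nvmed_* names and the header below are paraphrased from the paper's API description rather than copied from the actual NVMeDirect code, so both the names and signatures should be treated as illustrative assumptions.

    /* Illustrative only: the nvmed_* API below approximates the NVMeDirect user
     * library described in the paper; the real header and signatures may differ. */
    #include <stdlib.h>
    #include "lib_nvmed.h"   /* assumed NVMeDirect user-library header */

    int main(void)
    {
        /* 1. Ask the kernel driver (access already granted by the admin tool) to
         *    create an SQ/CQ pair and map the queues and doorbell registers into
         *    this process's address space. */
        NVMED_QUEUE *queue = nvmed_queue_create();

        /* 2. Create an I/O handle bound to that queue; per-handle policies such
         *    as caching, scheduling, or the completion method would be set here. */
        NVMED_HANDLE *handle = nvmed_handle_create(queue);

        /* 3. Submit an asynchronous 4 KiB write through the handle. */
        void *buf = aligned_alloc(4096, 4096);
        nvmed_aio_write(handle, buf, 0 /* offset */, 4096 /* length */);

        /* 4. Detect completion by polling the mapped completion queue directly
         *    from user space: no interrupt, no system call. */
        while (!nvmed_aio_complete(handle)) {
            /* busy-wait, or do other useful work between polls */
        }

        nvmed_handle_destroy(handle);
        nvmed_queue_destroy(queue);
        return 0;
    }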

https://lddpicture.oss-cn-beijing.aliyuncs.com/picture/image-20220503152204762.png

https://lddpicture.oss-cn-beijing.aliyuncs.com/picture/image-20220503152303769.png

Yang Z., Liu C., Zhou Y., et al.: SPDK vhost-NVMe: Accelerating I/Os in Virtual Machines on NVMe SSDs via User Space vhost Target. 2018 IEEE 8th International Symposium on Cloud and Service Computing (SC2). IEEE, 2018: 67-76. [pdf]


Paper: SPDK vhost-NVMe

Summary

Problem Statement
