SPDK_paper
Ziye Yang, James R. Harris, Benjamin Walker, Daniel Verkamp, Changpeng Liu, Cunyin Chang, Gang Cao, Jonathan Stern, Vishal Verma, Luse E. Paul: SPDK: A Development Kit to Build High Performance Storage Applications. CloudCom 2017: 154-161
Paper: SPDK
- Provides a set of tools and libraries for writing high-performance, scalable, user-mode storage applications.
- Achieves high performance by moving the necessary drivers into user space, operating them in polled mode instead of interrupt mode, and using lockless resource access.
Problem Statement
- Strong demand for building high-performance storage services on emerging fast storage devices.
- Most storage software stacks have become the bottleneck for building high-performance storage applications.
- Kernel I/O stacks are costly due to context switches, data copies, interrupts, and resource synchronization.
System Design and Implementation
- App scheduling: event framework for writing asynchronous, polled-mode, shared-nothing server applications
- Drivers: user space polled-mode NVMe driver, providing zero-copy, highly parallel, direct access to NVMe SSDs.
- Storage devices: abstracts the devices exported by drivers and provides the user-space block I/O interface to storage applications above.
- Storage protocols: contains the accelerated applications built on the SPDK framework to support various storage protocols.
App scheduling
1. Events
- Target: accomplish cross-thread communication while minimizing synchronization overhead
- Traditional approach: a thread-per-connection server design that depends on the OS to schedule many threads issuing blocking I/O onto a limited number of cores.
- SPDK instead runs one event-loop thread (reactor) per CPU core to process incoming events from a queue; each event consists of a bundled function pointer and its arguments, destined for a particular CPU core.
2. Reactor
- A loop running on each core checks for incoming events and executes them in first-in, first-out order.
3. Pollers
- Functions with arguments that can be bundled and sent to a specific core to be executed.
- Pollers are executed repeatedly until unregistered (see the sketch below).
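
To make the event/reactor/poller model concrete, below is a minimal sketch using SPDK's public event framework (spdk_app_start, spdk_event_allocate, spdk_event_call, spdk_poller_register). These are real SPDK APIs, but their exact signatures have changed across releases, so treat this as an illustrative sketch rather than version-exact code.

```c
#include "spdk/event.h"   /* app framework: reactors, events, pollers */

/* Poller: runs repeatedly on its reactor until unregistered. */
static int
my_poller_fn(void *ctx)
{
	/* e.g., check an I/O completion queue here */
	return 0; /* recent releases return SPDK_POLLER_BUSY/IDLE */
}

/* Event: a bundled function pointer plus arguments, executed once on the
 * destination reactor in FIFO order. */
static void
my_event_fn(void *arg1, void *arg2)
{
	struct spdk_poller **poller = arg1;

	/* Register a poller on the core this event was sent to. */
	*poller = spdk_poller_register(my_poller_fn, NULL, 0 /* every loop iteration */);
}

static void
app_start_cb(void *arg1)
{
	static struct spdk_poller *poller;

	/* Send an event to core 1; that core's reactor pulls it from its
	 * queue and invokes my_event_fn(arg1, arg2). */
	struct spdk_event *ev = spdk_event_allocate(1, my_event_fn, &poller, NULL);
	spdk_event_call(ev);
}

int
main(int argc, char **argv)
{
	struct spdk_app_opts opts = {};

	spdk_app_opts_init(&opts);        /* newer releases also take sizeof(opts) */
	opts.name = "event_sketch";
	opts.reactor_mask = "0x3";        /* one reactor per core: cores 0 and 1 */

	/* Starts one reactor (event-loop thread) per core in the mask and
	 * runs app_start_cb on the first core. */
	int rc = spdk_app_start(&opts, app_start_cb, NULL);
	spdk_app_fini();
	return rc;
}
```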
User space polled drivers
- Moves the device driver implementation into user space instead of kernel space.
- Uses polling instead of interrupts, which lets users decide how much CPU time to spend on each task instead of letting the kernel scheduler decide.
1. Asynchronous I/O mode
- Applications call the asynchronous read/write interfaces to submit I/O requests, and use the corresponding I/O completion check functions to poll for completed I/Os.
2. Lockless architecture
- Adopts a lockless architecture that requires each thread to access only its own resources, e.g., memory, I/O submission queues, and I/O completion queues (see the sketch below).
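
Below is a hedged sketch of the polled, lockless usage pattern with the user-space NVMe driver: each thread owns its own queue pair, submits reads asynchronously, and reaps completions itself. The functions (spdk_nvme_ctrlr_alloc_io_qpair, spdk_nvme_ns_cmd_read, spdk_nvme_qpair_process_completions, spdk_zmalloc) are SPDK's public API; the controller and namespace are assumed to have been attached already (e.g. via spdk_nvme_probe), and error handling is omitted.

```c
#include "spdk/nvme.h"
#include "spdk/env.h"

static bool io_done;

/* Completion callback, invoked from spdk_nvme_qpair_process_completions(). */
static void
read_complete(void *cb_arg, const struct spdk_nvme_cpl *cpl)
{
	io_done = true;
}

/* Per-thread I/O path: each thread owns its qpair, so neither submission
 * nor completion needs a lock. */
static void
do_read(struct spdk_nvme_ctrlr *ctrlr, struct spdk_nvme_ns *ns)
{
	/* Allocate a submission/completion queue pair owned by this thread. */
	struct spdk_nvme_qpair *qpair = spdk_nvme_ctrlr_alloc_io_qpair(ctrlr, NULL, 0);

	/* Pinned, DMA-able buffer for one 4 KiB block. */
	void *buf = spdk_zmalloc(4096, 4096, NULL, SPDK_ENV_SOCKET_ID_ANY, SPDK_MALLOC_DMA);

	/* Asynchronous submission: returns immediately, completion comes later. */
	spdk_nvme_ns_cmd_read(ns, qpair, buf,
			      0 /* starting LBA */, 1 /* LBA count */,
			      read_complete, NULL, 0 /* io_flags */);

	/* Polled mode: no interrupts; this thread reaps its own completions. */
	while (!io_done) {
		spdk_nvme_qpair_process_completions(qpair, 0 /* no limit */);
	}

	spdk_free(buf);
	spdk_nvme_ctrlr_free_io_qpair(qpair);
}
```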
Storage service
1. Blobstore
- A persistent, power-fail-safe block allocator designed to serve as the local storage system backing a higher-level storage service.
- Designed to allow asynchronous, uncached, parallel reads and writes to groups of blocks on a block device; such a group is called a 'blob' (see the sketch below).
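
The blob API is fully asynchronous and callback-driven. A minimal sketch of initializing a blobstore on an underlying block device and creating a blob is shown below; spdk_bs_init and spdk_bs_create_blob are SPDK's public blob API, but how the spdk_bs_dev wrapper is obtained (and all error handling) is omitted, and exact signatures may differ between releases.

```c
#include "spdk/blob.h"

/* Called when blob creation finishes; blobid identifies the new blob. */
static void
blob_create_complete(void *cb_arg, spdk_blob_id blobid, int bserrno)
{
	/* Next steps (not shown): spdk_bs_open_blob(), resize the blob,
	 * then issue parallel, uncached reads/writes to its blocks. */
}

/* Called once the blobstore has been initialized on the device. */
static void
bs_init_complete(void *cb_arg, struct spdk_blob_store *bs, int bserrno)
{
	if (bserrno == 0) {
		/* Asynchronously allocate a new blob. */
		spdk_bs_create_blob(bs, blob_create_complete, NULL);
	}
}

/* Entry point: bs_dev wraps the underlying block device (e.g. an SPDK
 * bdev); how it is created is omitted in this sketch. */
static void
init_blobstore(struct spdk_bs_dev *bs_dev)
{
	/* Format/initialize a blobstore on the device; completion is async. */
	spdk_bs_init(bs_dev, NULL /* default options */, bs_init_complete, NULL);
}
```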
2. Blobfs
- Filenames are currently stored as xattrs in each blob, so filename lookup is an O(n) operation.
SPDK btree
3. BDEV
- Abstracts the devices exported by user-space drivers and other third-party libraries to provide a block service interface to the applications above.
- Provides a driver module API for implementing bdev drivers, which enumerate and claim SPDK block devices and perform operations (read, write, unmap, etc.) on those devices (see the sketch below).
- Bdev drivers exist for NVMe, Linux AIO, Ceph RBD, and blobdev.
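
To illustrate the block service interface the bdev layer exports, here is a hedged sketch of opening a named bdev and issuing an asynchronous read. spdk_bdev_open_ext, spdk_bdev_get_io_channel, and spdk_bdev_read are SPDK's public bdev API (the open call in particular has been renamed across releases); the code assumes it runs on an SPDK thread inside the app framework, and error handling is omitted.

```c
#include "spdk/bdev.h"
#include "spdk/env.h"

/* Completion callback for the bdev read. */
static void
bdev_read_done(struct spdk_bdev_io *bdev_io, bool success, void *cb_arg)
{
	spdk_bdev_free_io(bdev_io);   /* release the I/O descriptor */
}

/* Bdev events (e.g. hot-remove); ignored in this sketch. */
static void
bdev_event_cb(enum spdk_bdev_event_type type, struct spdk_bdev *bdev, void *ctx)
{
}

/* Read 4 KiB from offset 0 of a named bdev (e.g. an NVMe namespace,
 * a Linux AIO file, a Ceph RBD image, or a blobdev). */
static void
read_from_bdev(const char *bdev_name)
{
	struct spdk_bdev_desc *desc;

	spdk_bdev_open_ext(bdev_name, false /* read-only */, bdev_event_cb, NULL, &desc);

	/* Per-thread I/O channel, consistent with the lockless design. */
	struct spdk_io_channel *ch = spdk_bdev_get_io_channel(desc);
	void *buf = spdk_zmalloc(4096, 4096, NULL, SPDK_ENV_SOCKET_ID_ANY, SPDK_MALLOC_DMA);

	/* Asynchronous block read; bdev_read_done fires on completion. */
	spdk_bdev_read(desc, ch, buf, 0 /* offset */, 4096 /* length */,
		       bdev_read_done, NULL);
}
```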
Kim, H. J., Lee, Y. S., & Kim, J. S. (2016). NVMeDirect: A User-space I/O Framework for Application-specific Optimization on NVMe SSDs. In 8th USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage 16). CCF-A
Paper: NVMeDirect
Summary
- Proposes a novel user-level I/O framework called NVMeDirect, which improves performance by allowing user applications to access the storage device directly.
- NVMeDirect can coexist with the legacy kernel I/O stack, so existing kernel-based applications can use the same NVMe SSD as NVMeDirect-enabled applications simultaneously, on different disk partitions.
- Provides flexibility in queue management, I/O completion methods, caching, and I/O scheduling, so each user application can select its own I/O policies according to its I/O characteristics and requirements.
Problem Statement
- As storage devices get faster, the overhead of the legacy kernel I/O stack becomes noticeable, since it has been optimized for slow HDDs.
- The kernel must stay general: it provides an abstraction layer for applications and manages all the hardware resources.
- The kernel cannot implement a policy that favors a particular application, because it has to provide fairness among applications.
- Frequent kernel updates require constant effort to port any application-specific optimizations.
System Design
- Admin tool: controls the kernel driver with root privilege to manage the access permissions of I/O queues.
- The kernel checks the permissions, then creates the required submission and completion queues and maps their memory regions and the associated doorbell registers into the user-space memory region of the application.
- A thread can create one or more I/O handles to access the queues, and each handle can be bound to a dedicated queue or a shared queue. Each handle can be configured to use different features such as caching, I/O scheduling, and I/O completion (see the sketch below). (TODO: when a handle is bound to a shared queue, how to solve data …)
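
Below is a rough sketch of the per-application I/O path this design enables: create a dedicated queue (the kernel maps its SQ/CQ and doorbells into the process), bind a handle to it, select per-handle policies, and issue I/O directly. The nvmed_* names follow the user-level API listed in the paper, but the argument lists, types, and the feature flag used here are assumptions for illustration only; check the NVMeDirect sources for the real signatures.

```c
/* Names follow the NVMeDirect paper's API table; signatures, types, and
 * the HANDLE_BLOCK_CACHE flag are assumed for illustration. */
#include "lib_nvmed.h"            /* header name assumed */

void
example_io(void)
{
	char buf[4096];           /* real code would use framework-managed DMA memory */

	NVMED        *dev    = nvmed_open("/dev/nvme0n1p1", 0); /* partition set aside for NVMeDirect */
	NVMED_QUEUE  *queue  = nvmed_queue_create(dev, 0);      /* kernel creates SQ/CQ, maps doorbells */
	NVMED_HANDLE *handle = nvmed_handle_create(queue, 0);   /* handle bound to a dedicated queue */

	/* Per-handle I/O policy, e.g. enabling the user-level block cache. */
	nvmed_handle_feature_set(handle, HANDLE_BLOCK_CACHE, true);

	/* I/O goes straight to the mapped NVMe queues, bypassing the kernel. */
	nvmed_write(handle, buf, sizeof(buf));
	nvmed_read(handle, buf, sizeof(buf));

	nvmed_handle_destroy(handle);
	nvmed_queue_destroy(queue);
	nvmed_close(dev);
}
```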
Yang Z., Liu C., Zhou Y., et al. SPDK vhost-NVMe: Accelerating I/Os in virtual machines on NVMe SSDs via user space vhost target. 2018 IEEE 8th International Symposium on Cloud and Service Computing (SC2). IEEE, 2018: 67-76.