What are IO Schedulers in Android

Linux I / O scheduler

There are three different ones in the Linux kernel I / O scheduler available: NOOP, Deadline and CFQ. Each scheduler has its own algorithm for processing requests and transferring them to the device driver for processing. Up to kernel 2.6.33 there were four schedulers, but from this version the AS scheduler is no longer part of the kernel.[1] This article gives one overview via the currently "active" scheduler, whose Algorithms in the Linux kernel as well Graphicsthat visualize the way they work. Part of the article is also how a scheduler can be selected at runtime.

Tasks of an I / O scheduler

Reading and writing data from / to block devices is an expensive matter from a performance point of view. Access to hard disks requires several milliseconds (seek time) due to the positioning of the read / write heads on the magnetic disks. Intelligent processes are therefore required to keep these times as short as possible and to send I / O requests in the best possible order. The I / O scheduler takes on these tasks of sorting and merging several requests in the Linux kernel. With high-performance solid-state drives (SSDs), however, these methods can have performance disadvantages. Therefore, with Linux Kernel 3.13, the new Linux Multi-Queue Block IO Queuing Mechanism (blk-mq) was introduced, which works in parallel with the conventional Linux Block I / O Layer.

Basically, two queues are relevant in connection with I / O scheduling in the conventional Linux block I / O layer:[2]

  1. Request queue: The requests sent by processes are collected in this queue, possibly merged and sorted by the scheduler depending on the scheduling algorithm. The schedulers use different approaches to handle the individual requests and to transfer them to the device driver.[3]
  2. Dispatch Queue: This queue contains the ordered requests that are ready for processing on the device. It is managed by the device driver.

Existing I / O schedulers

The Linux I / O schedulers are sometimes referred to as "Elevator". The dispatch queue is also included under the name Elevator, the name "I / O Scheduler" usually always refers to the schedule-specific part.[4] The two components can also be found in the Linux storage stack diagram.

Graphic representation

  • Way of working NOOP Scheduler

  • Way of working Deadline Scheduler

  • Way of working CFQ Scheduler

NOOP

The NOOP scheduler is a simple scheduler that collects all I / O requests in a FIFO queue. Request merging is carried out in order to enable the request to be sent optimally and to avoid unnecessary seek times. However, the requests are not sorted; the device driver again processes the dispatch queue according to the FIFO principle.[5] The NOOP scheduler has no setting options for optimization ("Tunables").

Deadline

The deadline scheduler tries to prevent the so-called "starvation" of requests. For this purpose, each request is given an expiration time and placed in two different queues (see below). In the worst case, requests should be processed after a defined expiration time, so an attempt is made to guarantee a predictable service start time.[6] Read requests are treated more highly than write requests, so they also have a shorter deadline. This measure is advantageous because read requests are usually sent synchronously (blocking) and write requests asynchronously (non-blocking). The default times for read requests are 0.5 seconds, for write 5 seconds. This parameter can be adjusted using the "read_expire" option.[6] Two queue pairs are used to manage the requests - one queue each for the I / O processes read and write.[7] Each direction has one of the following queue types:

  1. A sorted by sectors (seek ordered) Queue - Sorted queues
  2. A FIFO queue (order due to the deadline) - Deadline queues.

Selection of requests: The individual requests in the queues are always processed in so-called "batches" - several requests are sent together to the dispatch queue of the device driver (standard: 16, adjustable via the "fifo_batch" option)[6]). Neither direction nor queue are changed during a batch. After each batch a new direction is selected - by default "Read", a change is made to "Write" if "Write" has already had to wait for "writes_starved" times (default: 2x).[6] The new batch starts with the requests with the earliest expiry time from the FIFO deadline queue if:[8]

  1. A deadline has already passed.
  2. The data direction was changed.
  3. The last request was already the last in the sorted queue.

Otherwise the next request from the sorted queue will be processed (e.g. if no deadline has expired and the direction has not been changed). This mechanism prevents individual requests from starving to death and ignored for a long time.

CFQ

The "Completely fair queuing" (CFQ) scheduler is the default scheduler of the Linux kernel and has the following goals:[7]

  • Fair distribution of the existing I / O bandwidth to processes with the same priority classes via time slices. This "fairness" relates to the length of the time slots and not to the bandwidth, i.e. a process with sequential writes will achieve a higher bandwidth in the same slot than a process with random writes.
  • The possibility of dividing processes into priority or scheduling classes (e.g. via ionice).
  • Periodic processing of the process queues distributes the latency. The time slots enable a high bandwidth for processes that can send many requests in their slot.

Priority classes: Up to now, CFQ is the only scheduler that enables processes to be divided into classes. The following classes are available (in descending order of priority):[7]

  1. Real-time (RT): RT processes always get access to the device first. RT has 8 priority levels (7 means lowest priority), which determine the priority within the RT processes. Since RT processes could starve others, only root can set this class for a process.
  2. Best effort (BE): RT processes are always handled when there are no RTs. Just like RT, there are 8 priority levels.
  3. Idle: When RT and BE queues are emptied, requests from the idle queue are processed. Under heavy loads, requests can starve, but since kernel 2.6.25 this class has been allowed for normal users (see man ionice (linux.die.net)).

If the class and level are not explicitly set for a process, the "best effort" class and the level are determined using the CPU nice value: io_nice = (cpu_nice + 20) / 5.[9]

Selection of requests: The scheduler manages the incoming requests of a process in several data structures. On the one hand there is an RBTree and on the other hand a FIFO (sorting according to expiration time) is managed.[10] In the RBTree there is no distinction between read and write requests, but in the FIFO queue there is. Here read, write - or synchronous, asynchronous - requests have different expire times: "fifo_expire_async" and "fifo_expire_sync" determine the expiration time.[11] The scheduler always provides the queues of a process with a certain time slot for processing requests. During this time, a maximum number of requests can be dispatched by the process (based on "cfq_quantum"). Particular attention is also paid to the fact when a process no longer has any pending requests. Under these circumstances, a certain amount of time is still to be expected (see Tunables).

Tunables:

  • slice_idle: If a queue of a process has no further requests that could be processed in the current slot, it will still remain in the slot for a certain time (idle time). This has the advantage that unnecessary seek times are avoided with conventional hard disks if there are still new requests in the queue that are adjacent to the hard disk from a spatial point of view.[12] However, idling does not take place if the queue:[13]
  1. Expired is
  2. has the priority "Idle",
  3. it is an async queue,
  4. there is "close cooperator" (working close to the same area of ​​a device).

Selection of an I / O scheduler

Scheduler of selected distributions

Change scheduler

Since Linux Kernel 2.6.10 it is possible to change the current I / O scheduler at runtime.[15] This means, for example, that the default scheduler cannot be changed, but a different one can be selected for a specific device. The following command shows which scheduler is currently active:

: ~ $ cat / sys / block / sda / queue / scheduler noop deadline [cfq]

The active scheduler is the one in brackets. To change the scheduler, a device (in this case sda) only the following can be carried out (root rights are required):

: ~ # echo deadline> / sys / block / sda / queue / scheduler: ~ # cat / sys / block / sda / queue / scheduler noop [deadline] cfq

However, this setting is not persistent and is reset to the default setting when you reboot. In order to permanently change the settings for the I / O scheduler, this must already be done at boot time using a kernel parameter. It must be the entry

elevator =

appended, where is either noop, cf, deadline or as (up to Kernel 2.6.33, see Git Commit Log).

Scheduler for SSDs

The deadline scheduler is well suited for interaction with SSDs, as it does not rely on the excessive merging of requests. In many places, NOOP is touted as the best scheduler for SSDs because it simply forwards the requests to the device. For performance reasons, this is entirely logical, but the fairness of the scheduler times among applications can suffer as a result.

The CFQ scheduler has also received improvements for SSDs:

  • CFQ has some optimizations for SSDs and if it detects a non-rotational media which can support higher queue depth (multiple requests at in flight at a time), then it cuts down on idling of individual queues and all the queues move to sync-noidle tree and only tree idle remains.[12][16]
  • IOPS mode active by default with SSDs (from Kernel 4.2)[17][18][19]

Individual evidence

  1. ↑ Git Commit Message "Remove AS Scheduler" (git.kernel.org)
  2. ↑ An Evaluation of Linux I / O Scheduler Behavior at the Block I / O Layer, Section 2.2, (repository.lib.ncsu.edu)
  3. ↑ A Device Mapper based Encryption Layer for TransCrypt, p. 17 (cse.iitk.ac.in)
  4. ↑ Kernel biodoc.txt (git.kernel.org)
  5. ↑ Red Hat Enterprise Linux 5 IO Tuning Guide (linux.com)
  6. 6,06,16,26,3Kernel documentation deadline-iosched.txt (git.kernel.org)
  7. 7,07,17,2Improving Disk I / O Performance on Linux, Master Thesis, (home.ifi.uio.no)
  8. ↑ Native Command Queuing and Latency, page 21, Master's Thesis (core.dk)
  9. ↑ Kernel documentation ioprio.txt (git.kernel.org)
  10. ↑ Kern Technik - Scheduler (linux-magazin.de)
  11. ↑ Tuning CFQ - What Station is That? (linux.com)
  12. 12,012,1Kernel documentation cfq-iosched.txt (git.kernel.org)
  13. ↑ Source file cfq-iosched.c (git.kernel.org)
  14. ↑ Using the Deadline IO Scheduler (access.redhat.com)
  15. ↑ Kernel documentation switching_sched.txt (git.kernel.org)
  16. ↑ cfq-iosched: Add documentation about idling (git.kernel.org, 08/05/2011)
  17. ↑ Linux 4.2 Will Tweak The CFQ Scheduler For SSDs To Offer Better Performance (www.phoronix.com, 06/06/2015)
  18. ↑ block: Make CFQ default to IOPS mode on SSDs (git.kernel.org, 06.06.2015)
  19. ↑ cfq-iosched: fix the setting of IOPS mode on SSDs (git.kernel.org, 06/10/2015)

Author: Georg Schönberger

Georg Schönberger, DevOps department at XORTEX eBusiness GmbH, graduated from the University of Applied Sciences Upper Austria on the Hagenberg campus with a bachelor's degree in computer and media security and a master's degree in secure information systems. Georg has been with XORTEX since 2015 and works very solution-oriented and is not afraid of difficult tasks. In addition to Linux, his hobbies include tennis, climbing and traveling.