http://www.linuxdevices.com/articles/AT6320079446.html
ELJonline: Real Time and Linux, Part 1
Kevin Dankwardt (January, 2002) [Updated Feb. 4, 2002]

What is real time? This article, the first of a three-part series, introduces the benchmarks we'll run on real-time Linux versions in the next two issues. Linux is well tuned for throughput-limited applications, but it is not well designed for deterministic response, though enhancements to the kernel are available to help with or guarantee determinism. So-called real-time applications require, among other things, deterministic response. In this article I examine the nature of real-time applications and Linux's strengths and weaknesses in supporting such applications. In later articles I will examine various approaches to help real-time applications satisfy hard real-time requirements. Most of these issues concern the Linux kernel, but the GNU C library, for example, also plays a part.

What Is Real Time?

Many definitions are available for the terms relating to real time. In fact, because they have different requirements, different applications demand different definitions. Some applications may be satisfied by some average response time, while others may require that every deadline be met.

The response time of an application is the interval from when it receives a stimulus, usually provided via a hardware interrupt, to when the application has produced a result based on that stimulus. These results may be such things as opening a valve in an industrial control application, drawing a frame of graphics in a visual simulation or processing a packet of data in a data-acquisition application.

Let's consider the valve scenario. Imagine we have a sensor next to a conveyor belt that carries parts to be painted past a paint nozzle. When a part is in just the right position, the sensor alerts our system that the valve on the paint nozzle should open to allow paint to be sprayed onto the part. Do we need to have this valve open at just the right time on average, or every time?
Every time would be nice. We need to open the valve no later than the last moment at which it is still possible to begin painting the part properly. We also need to close the valve no sooner than when we have finished painting the part. It is desirable not to keep the valve open any longer than necessary, since we really don't want to be painting the rest of the conveyor belt.

We say that the latest possible time to open the valve and still accomplish the proper painting is the deadline. In this case, if we miss the deadline, we won't paint the part properly. Let's say that our deadline is 1ms. That is the time from when the sensor alerts us until the time we must have begun painting. To be sure that we are never late, let's say we design our system to begin painting 950µs after we receive the sensor interrupt. In some cases we may begin painting a little before that, and sometimes a little after.

Of course, it will never be the case that we start painting exactly 950µs after the interrupt, with infinite precision. For instance, we may be early by 10µs one time and late by 13µs the next. This variance is termed jitter. We can see from our example that if our system were to exhibit large jitter, we would have to move our target time significantly below 1ms to be assured of satisfying the deadline. This also would mean that we frequently would be opening the valve much sooner than actually required, which would waste paint. Thus, some real-time systems will have requirements in terms of both deadlines and jitter. We assume that our requirements say that any missed deadline is a failure.

The term operating environment means the operating system as well as the collection of processes running, the interrupt activity and the activity of hardware devices (such as disks).
We want an operating environment for our real-time application that is so robust we are free to run any number of any kind of applications concurrently with our real-time application and still have it perform acceptably.

An operating environment in which we can determine the worst-case time for a given response or requirement is deterministic. Operating environments that don't allow us to determine a worst-case time are called nondeterministic. Real-time applications require a deterministic operating environment, and real-time operating systems are capable of providing one.

Nondeterminism is frequently caused by algorithms that do not run in constant time; for example, when an operating system's scheduler must traverse its entire run list to decide which process to run next. This algorithm is linear, sometimes notated as O(n), read "on the order of n". That is, as n (the number of processes on the run list) grows, the time to decide grows proportionally. With an O(n) algorithm there is no upper bound on the time the algorithm will take. If your response time depends upon your sleeping process being awakened and selected to run, and the scheduler is O(n), then you will not be able to determine the worst-case time. This is a property of the Linux scheduler.

This matters in an environment where the system designer cannot control the number of processes that users of the system may create. In an embedded system, where characteristics of the system, such as the user interface, make it impossible for there to be more than a given number of processes, the environment has been constrained sufficiently to bound this kind of scheduling delay. This is an example where determinism may be achievable through some aspect of the configuration of the operating environment.
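The linear scan described above can be modeled with a short C sketch. The structure and the `goodness` field are hypothetical, loosely inspired by the 2.4-era scheduler; this is not the kernel's actual code, only an illustration of why the selection cost grows with the run-list length:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical run-list entry; not the actual Linux structures. */
struct task {
    int goodness;          /* scheduling weight; higher is better */
    struct task *next;
};

/* O(n) selection: walk the whole run list to find the best task.
 * The cost grows linearly with the number of runnable tasks, so no
 * fixed worst-case bound exists unless the list length is bounded. */
struct task *pick_next(struct task *run_list)
{
    struct task *best = run_list;
    for (struct task *t = run_list; t != NULL; t = t->next)
        if (t->goodness > best->goodness)
            best = t;
    return best;
}
```

Bounding the number of processes, as an embedded configuration can, bounds n and therefore bounds the time this loop can take.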
Notice that a priority system may be required as well, among other things, but as far as the scheduling time is concerned, the time is bounded.

A visual simulation may require an average target frame rate, say 60 frames per second. As long as frames are dropped relatively infrequently, and over a suitable period the frame rate averages 60 frames per second, the system may be performing acceptably.

The paint nozzle and the average frame rate are examples of what we call hard real-time and soft real-time constraints, respectively. Hard real-time applications must have their deadlines met; otherwise an unacceptable result occurs. Something blows up, something crashes, some operation fails, someone dies. Soft real-time applications usually must satisfy a deadline, but if a certain number of deadlines are missed by just a little bit, the system may still be considered to be operating acceptably.

Let's consider another example. Imagine we are building a penguin robot to aid scientists in studying animal behavior. Through careful observation we determine that upon the emergence of a seal from a hole in the ice, a penguin has 600ms to move away from the hole to avoid being eaten by the seal. Will our robot penguin survive if it moves back, on average, within 600ms? Perhaps, if the variance in the attack time of the seal varies synchronously with our penguin's response time. Are you going to build your penguin on that assumption? We also realize that there is a certain distance from the hole that the seal can reach. Our penguin must move farther than that distance within the 600ms. Some would call that line the deadline.

For an operating environment to accommodate a hard real-time application, it must be able to ensure that the application's deadlines can always be satisfied. This implies that all actions within the operating system must be deterministic.
If the operating environment accommodates a soft real-time application, this generally means that an occasional delay may occur, but such a delay will not be unduly long.

Requirements for an application may be quantitative or qualitative. A qualitative requirement for a visual simulation would be that the system needs to react quickly enough to seem natural. This reaction time would be quantified to measure compliance. For example, a frame of graphics based upon user input may have a requirement to be rendered within 33.3ms after the user's input. This means that if a pilot moves the control stick in the flight simulator to bank right, the out-of-the-window view should change to reflect the new flight path within 33.3ms. Where did the 33.3ms requirement come from? Human factors: that amount of time is fast enough that humans perceive the visual simulation as sufficiently smooth.

It is not the value of the time requirement but rather the existence of a time requirement that makes this a real-time requirement. If one changed the requirement to have the graphics drawn within 33.3 seconds instead of 33.3ms, it would still be a real-time system. The difference may be in the means of satisfying the requirements. In a Linux system, the 33.3ms deadline may require the use of a special Linux kernel and its functions, whereas the 33.3-second requirement may be achievable by means available within a standard kernel.

This leads us to a tenet: fast does not imply real time, and vice versa. However, fast on a relative scale may imply the need for real-time operating system features. This leads us to the distinction between real-time operating systems and real-time applications. Real-time applications have time-related requirements.
Real-time operating systems can guarantee performance to real-time applications.

In practice, a general-purpose operating system such as Linux provides sufficient means for an application with relatively long deadlines if the operating environment can be controlled suitably. It is because of this property that one frequently hears there is no need for real-time operating systems because processors have become so fast. This is only true for relatively uninteresting projects.

One has to remember, though, that if the operating environment is not constrained suitably, deadlines may be missed even though extensive testing never found a case of a missed deadline. Linear-time algorithms, for example, may be lurking in the code.

Another issue to keep in mind is the audience effect, often stated as "the more important the audience for one's demo, the more likely the demo will fail." While anecdotal evidence abounds for the audience effect, effects such as nonrepeatability are often due to race conditions. A race condition is a situation where the result depends upon the relative speeds of the tasks or of the outside world. All real-time systems, by definition, have race conditions. Well-designed systems have race conditions only around required deadlines. Testing alone cannot prove the absence of race conditions.

Since an operating system is largely responsive rather than proactive, many activities that cause delay can be avoided. In order for a particular application (process) to be able to meet its deadlines, such things as CPU-bound competitors, disk I/O, system calls or interrupts may need to be controlled. These are the kinds of things that constitute a properly constrained operating environment. Characteristics of the operating system and drivers may also be of concern. The operating system may block interrupts or not allow system calls to be preempted.
While these activities may be deterministic, they may cause delays that are longer than acceptable for a given application. A real-time operating system requires simpler efforts to constrain the environment than does a general-purpose operating system.

Is Linux Capable of Real Time?

Unless we state otherwise, assume we are talking about version 2.4.9 of the Linux kernel. Version 2.4.9 was released in August 2001, although our statements, at least for the most part, hold for the last several years of kernel releases.

There are many qualities of an operating system that may be necessary or desirable for it to be appropriate for real-time applications. One list of features is included in the comp.realtime FAQ. That list contains such features as the OS being multithreaded and preemptible, and able to support thread priorities and provide predictable thread-synchronization mechanisms. Linux is certainly multithreaded, supports thread priorities and provides predictable thread-synchronization mechanisms. The Linux kernel, however, is not preemptible.

The FAQ also says that one should know the OS behavior for interrupt latency, time for system calls and the maximum time that interrupts are masked. Further, one should know the system-interrupt levels and device-driver IRQ (interrupt request line) levels, as well as the maximum time they take. We provide some timings for interrupt latency and interrupt-masked ("interrupts blocked") times in the benchmarking section below.

Many developers also are interested in having a deadline scheduler, a kernel that is preemptible in the millisecond range or better, more than 100 priority levels, user-space support for interrupt handlers and DMA, priority inheritance on synchronization mechanisms, microsecond timer resolution, the complete set of POSIX 1003.1b functionality and constant-time algorithms for scheduling, exit(), etc.
None of these capabilities are available in the standard kernel and the GNU C library.

Additionally, in practice, the magnitude of delays becomes important. The Linux kernel, in a relatively easily constrained environment, may be capable of worst-case response times of about 50ms, with the average being just a few milliseconds.

Andrew Morton suggests that one should not scroll the framebuffer, run hdparm, use blkdev_close or switch consoles (see reference). These are examples of constraining the operating environment.

Some applications, however, may require response times on the order of 25µs. Such requirements cannot be satisfied by applications making use of Linux kernel functionality. In such cases, some mechanism outside of the Linux kernel functions must be employed in order to assure such relatively short response-time deadlines.

We see, in practice, that a hard real-time operating system can assure deterministic response and provide response times that are significantly faster than those provided by general-purpose operating systems like Linux.

We see that Linux is not a real-time operating system because it cannot always assure deterministic performance and because its average and worst-case timing behavior is far worse than that required by many real-time applications. Remember that the timing behavior required by these many real-time applications is generally not a hardware limitation. For example, the Linux kernel response time may be on the order of a few milliseconds on a typical x86-based PC, while the same hardware may be capable of better than 20µs response times when running a real-time operating system.

The two reasons the Linux kernel has such relatively poor performance on uniprocessor systems are that the kernel disables interrupts and that the kernel is not suitably preemptible. If interrupts are disabled, the system is not capable of responding to an incoming interrupt.
The longer that interrupts are delayed, the longer the expected delay for an application's response to an interrupt. The lack of kernel preemptibility means that the kernel does not preempt itself, such as when executing a system call on behalf of a lower-priority process, in order to switch to a higher-priority process that has just been awakened. This may cause significant delay. On SMP systems, the Linux kernel also employs locks and semaphores that will cause delays.

Real-Time Application Programming

User-space real-time applications require services of the Linux kernel. Such services, among other things, provide scheduling, interprocess communication and performance improvement. Let's examine a variety of system calls (the kernel's way of providing services to applications) that are of special benefit to real-time application developers. These calls are used to constrain the operating environment.

There are 208 system calls in the Linux kernel. System calls usually are called indirectly through library routines. The library routines usually have the same name as the system call, and sometimes library routines map onto alternative system calls. For example, on Linux, the signal library routine from the GNU C library, version 2.2.3, maps to the sigaction system call.

A real-time application may call nearly any of the system calls. The calls that are most interesting to us are exit(2), fork(2), exec(2), kill(2), pipe(2), brk(2), getrusage(2), mmap(2), setitimer(2), ipc(2) (in the form of semget(), shmget() and msgget()), clone(), mlockall(2) and sched_setscheduler(2). Most of these are described well in either Advanced Programming in the UNIX Environment, by W. Richard Stevens, or in POSIX.4: Programming for the Real World, by Bill O. Gallmeister. The clone() function is Linux-specific. The others, for the most part, are compatible with typical UNIX systems.
However, read the man pages because there are some subtle differences at times.

Real-time applications on Linux also frequently are interested in the POSIX Threads calls, such as pthread_create() and pthread_mutex_lock(). Several implementations of these functions exist for Linux. The most commonly available is provided by the GNU C library. These so-called LinuxThreads are based on the clone() system call and are scheduled by the Linux scheduler. Some POSIX functions are available for POSIX Threads (e.g., sem_wait()) but not for Linux processes.

An application running on Linux ordinarily can be slowed down considerably, relative to its best case, by a number of factors. Essentially these are caused by contention for resources. Such resources include synchronization primitives, main memory, the CPU, a bus, the CPU cache and interrupt handling.

An application can reduce its contention for these resources in a number of ways. For synchronization mechanisms, e.g., mutexes and semaphores, an application can reduce their use, employ priority inheritance versions, employ relatively fast implementations, reduce the time spent in critical sections, etc. Contention for the CPU is affected by priorities. In this view, for example, lack of kernel preemption can be seen as a priority inversion. Contention for a bus is probably not long enough to be of direct concern. However, know your hardware. Do you have a clock that takes 70µs to respond and holds the bus? Contention for a cache is affected by frequent context switches and by large or random data or instruction references.

What to Do?

Consequently, real-time applications usually give themselves a high priority, lock themselves in memory (and don't grow their memory usage), use lock-free communication whenever possible, use cache memory wisely, avoid nondeterministic I/O (e.g., sockets) and execute within a suitably constrained system.
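The lock-free communication mentioned above can be sketched with a minimal single-producer/single-consumer ring buffer. This is an illustrative sketch, not code from the article: with exactly one writer and one reader, the head index is modified only by the producer and the tail only by the consumer, so no mutex is needed (a production version on SMP would also need memory barriers or atomics):

```c
#include <assert.h>

#define RING_SIZE 8            /* must be a power of two */

/* Minimal single-producer/single-consumer ring buffer sketch. */
struct ring {
    int buf[RING_SIZE];
    unsigned head;             /* next slot to write (producer only) */
    unsigned tail;             /* next slot to read (consumer only) */
};

int ring_put(struct ring *r, int v)
{
    if (r->head - r->tail == RING_SIZE)
        return 0;                          /* full */
    r->buf[r->head % RING_SIZE] = v;
    r->head++;
    return 1;
}

int ring_get(struct ring *r, int *v)
{
    if (r->head == r->tail)
        return 0;                          /* empty */
    *v = r->buf[r->tail % RING_SIZE];
    r->tail++;
    return 1;
}
```

Because neither side ever blocks on a lock held by the other, a high-priority reader cannot be delayed by a preempted low-priority writer, which is exactly the priority-inversion hazard the text warns about.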
Suitable constraints include limiting hardware interrupts, limiting the number of processes, curtailing system-call use by other processes and avoiding kernel problem areas, e.g., don't run hdparm.

Some of the system calls that should be made by a real-time application require special privileges. This usually is accomplished by having root be the owner of the process (having a shell owned by root run the program, or having the executable file's SUID bit set). A newer way is to make use of the capability mechanism. There are capabilities for locking down memory, such as CAP_IPC_LOCK (that "IPC" is in the name is just something we need to accept), and for being able to set real-time priorities, which can be done with the capability CAP_SYS_NICE.

A real-time process sets its priority with sched_setscheduler(2). The current implementation provides the standard POSIX policies of SCHED_FIFO and SCHED_RR, along with priorities ranging in value from 1-99. Bigger is better. The POSIX function to check the maximum allowable priority value for a given policy is sched_get_priority_max(2).

A real-time process should lock down its memory and not grow. Locking memory is done in Linux with the POSIX standard function mlockall(2). Usually one uses the flags value of MCL_CURRENT | MCL_FUTURE to lock down current memory and any new memory should the process grow in the future. While growing often is not acceptable, if you get lucky and survive the delay you might as well get the newly allocated memory locked down as well. Be careful to grow your stack and allocate all dynamic memory, and then call mlockall(2), before your process begins its time-critical phase. Note that you can check whether your process had any page faults during a section of code by using getrusage(2). The code fragment below illustrates the use of several of these functions. Note that one should check the return value from each of these calls and read the man pages for more details:
priority = sched_get_priority_max(SCHED_FIFO);
sp.sched_priority = priority;
sched_setscheduler(getpid(), SCHED_FIFO, &sp);
mlockall(MCL_CURRENT | MCL_FUTURE);
getrusage(RUSAGE_SELF, &ru_before);
. . .   /* R E A L - T I M E   S E C T I O N */
getrusage(RUSAGE_SELF, &ru_after);
minorfaults = ru_after.ru_minflt - ru_before.ru_minflt;
majorfaults = ru_after.ru_majflt - ru_before.ru_majflt;
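For reference, a compilable version of this fragment might look like the sketch below. The function names `try_enter_realtime` and `minor_fault_delta` are our own illustrative wrappers, not part of any API; note that setting SCHED_FIFO and calling mlockall(2) will fail without root privileges or the capabilities mentioned above:

```c
#include <assert.h>
#include <sched.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Page-fault delta between two getrusage() snapshots; this is the
 * check the fragment above performs for minor faults. */
long minor_fault_delta(const struct rusage *before,
                       const struct rusage *after)
{
    return after->ru_minflt - before->ru_minflt;
}

/* Attempt to enter the real-time configuration described in the text.
 * Returns 0 on success, -1 otherwise (e.g., when run without root or
 * without CAP_SYS_NICE / CAP_IPC_LOCK). */
int try_enter_realtime(void)
{
    struct sched_param sp;

    sp.sched_priority = sched_get_priority_max(SCHED_FIFO);
    if (sched_setscheduler(getpid(), SCHED_FIFO, &sp) != 0)
        return -1;                 /* need root or CAP_SYS_NICE */
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
        return -1;                 /* need root or CAP_IPC_LOCK */
    return 0;
}
```

A zero minor-fault delta across the time-critical section confirms that the memory locking did its job.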
Benchmarking for Real-Time Applications

There are a number of efforts to benchmark various aspects of Linux. Real-time application developers are most interested in interrupt latency, timer granularity, context-switch time, system-call overhead and kernel preemptibility. Interrupt latency is the time from when a device asserts an interrupt until the appropriate interrupt handler begins executing. This typically is delayed by the handling of other interrupts and by interrupts being disabled. Linux does not implement interrupt priorities. Most interrupts are blocked when Linux is handling an interrupt. That time typically is quite short, however, perhaps a few microseconds.

On the other hand, the kernel may block interrupts for a significantly longer time. The intlat program from Andrew Morton allows one to measure interrupt latencies. Similarly, his schedlat shows scheduling latencies.

Context-switch time is included in the well-known benchmark harness LMbench, as well as in others (reference 1, reference 2). LMbench also provides information about system calls.

In Table 1 we show the results of LMbench. This table shows context-switch times. The benchmark program was run three times, and the lowest value for the context-switch time for each configuration is reported in the table, as per the documentation for LMbench. The highest value, however, was no more than about 10% larger than the minimum. The size of the process is reported in kilobytes; the context-switch time is in microseconds. The context-switch time data indicate that substantial use of data in the cache causes significantly larger context-switch times. The context-switch time includes the time to restore the cache state.
Table 1. Context-Switch Times
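The measurement technique behind numbers like these can be sketched in user space. The following is a rough sketch in the spirit of LMbench's lat_ctx (not LMbench's actual code): two processes ping-pong one byte through a pair of pipes, forcing a context switch per transfer on a uniprocessor. The figure includes pipe and system-call overhead, so it overestimates the pure context-switch cost:

```c
#include <assert.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Average one-way context-switch estimate in microseconds. */
double ctx_switch_us(int rounds)
{
    int p2c[2], c2p[2];
    char b = 'x';
    struct timeval t0, t1;

    if (pipe(p2c) != 0 || pipe(c2p) != 0)
        return -1.0;

    pid_t pid = fork();
    if (pid < 0)
        return -1.0;
    if (pid == 0) {                        /* child: echo each byte back */
        for (int i = 0; i < rounds; i++)
            if (read(p2c[0], &b, 1) != 1 || write(c2p[1], &b, 1) != 1)
                _exit(1);
        _exit(0);
    }

    gettimeofday(&t0, NULL);
    for (int i = 0; i < rounds; i++)
        if (write(p2c[1], &b, 1) != 1 || read(c2p[0], &b, 1) != 1)
            return -1.0;
    gettimeofday(&t1, NULL);
    waitpid(pid, NULL, 0);

    double us = (t1.tv_sec - t0.tv_sec) * 1e6 + (t1.tv_usec - t0.tv_usec);
    return us / (2.0 * rounds);            /* two switches per round trip */
}
```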
As an example of interrupt-off times, one can see some results here. In one experiment with hdparm, the data show that interrupts can be disabled for over 2ms while hdparm runs. Developers can use the intlat mechanism to measure interrupt-off times for the system they are running. It is only under rare conditions that interrupt-off times will exceed 100µs. These conditions should be avoidable for most embedded systems; they are the areas that Morton warns against.

An area of more significant concern to most real-time developers is scheduling latency, that is, the delay in continuing a newly awakened high-priority task. A long delay is possible when the kernel is busy executing a system call, because the Linux kernel will not preempt a lower-priority process in the midst of a system call in order to execute a newly awakened higher-priority process. This is why the Linux kernel is termed nonpreemptible.

The latency test from Benno Senoner shows that a delay of 100ms or more is possible (see reference). We can see that both interrupt blocking and scheduling latencies can be sufficiently long to prevent satisfactory performance for some applications.

Timing resolution also is of importance to many embedded Linux developers. For example, the setitimer(2) function is used to set a timer. This function, like other time functions in Linux, has a resolution of 10ms. Thus, if one sets a timer to expire in 15ms, it actually will expire in about 20ms. In a simple test measuring the time interval between 1,000 successive 15ms timers, we found that the average interval was 19.99ms, the minimum was 19.987ms and the maximum was 20.042ms on a quiescent system.
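A simple way to observe this rounding-up effect yourself is to time a sleep rather than a timer signal. The sketch below (our own illustration, not the article's test program) asks for a 15ms delay and reports how long the delay actually took; on a kernel with a 10ms tick the observed interval lands near the next tick boundary, around 20ms:

```c
#include <assert.h>
#include <sys/time.h>
#include <unistd.h>

/* Request a sleep of request_us microseconds and return the elapsed
 * wall-clock time in microseconds.  The kernel may only wake us on a
 * timer tick, so the result is at least the requested time and is
 * typically rounded up toward the tick granularity. */
long measure_sleep_us(long request_us)
{
    struct timeval t0, t1;

    gettimeofday(&t0, NULL);
    usleep(request_us);
    gettimeofday(&t1, NULL);
    return (t1.tv_sec - t0.tv_sec) * 1000000L + (t1.tv_usec - t0.tv_usec);
}
```

Running this in a loop and recording the minimum, average and maximum reproduces the kind of statistics quoted in the text.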
About the author: Kevin Dankwardt is founder and CEO of K Computing, a training and consulting firm in Silicon Valley. In particular, his organization develops and delivers embedded and real-time Linux training worldwide.
Copyright © 2001 Specialized Systems Consultants, Inc. All rights reserved. Embedded Linux Journal Online is a cooperative project of Embedded Linux Journal and LinuxDevices.com.
Be sure to read the full three-part series on Real-time Linux by Kevin Dankwardt . . .
In the January/February 2002 issue of Embedded Linux Journal, we examined the fundamental issues of real time with Linux. In this article we examine efforts to bring real-time capabilities to applications by making improvements to the Linux kernel. To date, the majority of this work has aimed to make the kernel more responsive, reducing latency by reducing the preemption latency, which can be quite long in Linux.

By improving the kernel, and not changing or adding to the API, applications can run more responsively merely by swapping a standard kernel for the improved one. This is a big benefit. It means that ISVs need not create special versions for different real-time efforts. For example, DVD players may run more reliably on an improved kernel without needing to be aware that the kernel they are running on has been improved.

Background and History

Around the time of Linux kernel release 2.2, the issue of kernel preemptibility began to get quite a lot of attention. Paul Barton-Davis and Benno Senoner, for example, wrote a letter (also signed by many others) to Linus Torvalds, asking that 2.4 please include significantly reduced preemption delays.

Their request was based on their desire to have Linux function well with audio, music and MIDI. Senoner produced some benchmarking software that demonstrated that the 2.2 kernel (and later the 2.4 kernel) had worst-case preemption latencies on the order of 100ms (reference). Latencies of this magnitude are unacceptable for audio applications. Conventional wisdom says that latencies of no more than a few milliseconds are required.

Two efforts emerged that produced patched kernels with quite reasonable preemption latencies. Ingo Molnar (of Red Hat) and Andrew Morton (then of the University of Wollongong) both produced patch sets that provided preemption within particularly long sections of the kernel.
You can find Ingo Molnar's patches here, and you can find Andrew Morton's work here. In addition, Morton provides tools for measuring latencies, such as periods where the kernel ignores reschedule requests. His low-latency patches' web page, cited above, provides information on those as well.

Recently, at least two organizations have produced preemptible kernels that provide a more fundamental, and powerful, solution to the kernel preemptibility problem.

In the first article of this series, in the January/February 2002 issue of ELJ, we listed several other desired features for real-time support in Linux, including an increased number of priority levels, user-space interrupt handling and DMA, priority inheritance on synchronization mechanisms, microsecond time resolution, complete POSIX 1003.1b functionality and a constant-time algorithm for scheduling. We will comment briefly on these as well.

A key point to remember with all of these improvements is that they involve patching the kernel. Any time you patch a kernel, you must assume that you no longer have binary compatibility for other kernel code, such as drivers. For example, the preemptible kernel approaches require modifying the code for spin locks. A binary driver won't employ this modification and thus may not prevent preemption properly. This emphasizes the need to have the source and to recompile all kernel code. The Linux model for drivers is one of source compatibility anyway. Distribution of binary-only drivers is discouraged for compatibility as well as for open-source philosophy reasons.

Improvements

Various efforts to improve the kernel provide essentially transparent benefits.
The efforts to improve the preemptibility of the kernel, be they through a preemptible kernel or through preemption points, result in a kernel that is more responsive to applications without any alterations in these applications.Another aspect of transparency is whether the changes are transparent to the kernel, or in other words, do the approaches automatically track with changes in the kernel. The preemption point approaches of Molnar and Morton require that the scheduling latencies in new kernels be measured and preemption points placed in the proper places.In contrast, the approaches to creating a preemptible kernel piggyback on the SMP locking and thus automatically transfer with new kernel versions. Also, by tying the preemptibility to the SMP-locking mechanism, as kernel developers improve the granularity of the SMP locking, the granularity of the preemption will improve automatically as well. We are likely to see steady improvement in SMP-locking granularity because improvement in this is required for improved SMP scaling.It is because of this co-opting of the SMP locks that the preemptible kernel work depends upon a 2.4 or newer kernel. Prior kernels lacked the required SMP locks.Another important benefit of the preemptible kernel approach to emphasize is that the approach makes code, which is otherwise unaware of it, preemptible. For example, driver writers need do nothing special to have their driver preemptible. Code in the driver will be preempted as required unless the driver holds a lock. Thus, as in other parts of the kernel, well-written drivers that are SMP-safe automatically will benefit from a preemptible kernel. On the other hand, drivers that are not SMP-safe may not function correctly with the preemptible kernels.One should be aware, though, that just because one's driver does not request a lock, kernel code calling it may. 
For example, we found in a simple test with MontaVista's preemptible kernel that the functions read() and write() of a dynamically loaded driver were preempted just fine, while the functions init_module(), open() and close() were not. This means that if a low-priority process does an open() or close(), it may delay its preemption by a newly awoken high-priority process.In practice, developers still should measure the latencies they are seeing. With the preemptible kernel approaches we see that it is still possible that a section of kernel code can hold a lock for a period longer than acceptable for one's application.MontaVista, for example, provides a preemptible kernel, adds a few preemption points in sections where locks are held too long and provides measurement tools so that developers can measure the preemptibility performance with their actual applications and environment.The goal of SMP locks is to ensure safe re-entrance into the kernel. That is, if processes running in parallel require kernel resources, access to these resources is done safely. The smaller the granularity of the locking, the greater the chance that competing processes can continue to execute in parallel. Parallelization is improved as the blocking (because of contention) is reduced.This concept applies to uniprocessors as well, when I/O is considered. If one considers I/O devices as separate processors, then parallelization, or throughput, improves as applications and I/O activities can continue in parallel. Improvements in preemptibility, which imply that high-priority I/O-bound processes wake up more quickly, can thus improve throughput. Thus, somewhat paradoxically, we see that even though we may experience more context swaps and execute more code in critical kernel paths, we may still see greater system throughput.The benefits of a preemptible kernel seem to be so clear that we can expect preemptibility eventually to be a standard feature of the Linux kernel. 
Preemptible kernels have been shown to reduce latencies to just a few milliseconds in some implementations and to as low as tens of microseconds in others.

In a quick survey of embedded Linux vendors: MontaVista and TimeSys provide preemptible kernels, REDSonic has preemption points, LynuxWorks and Red Hat use RTLinux, and Lineo uses RTAI. OnCore provides Linux preemptibility both through a Linux system call-compatible API (as does LynuxWorks with LynxOS) and through running a Linux kernel (which effectively becomes preemptible) on top of their preemptible microkernel.

Preemption Points

Preemption points are essentially calls to the scheduler to check whether a higher-priority task is ready and should be run. Molnar and Morton timed paths in the kernel, found sections that were quite long and inserted the schedule-check calls there. You can readily find these places by examining the patches, or by applying the patches and comparing the before-and-after versions of the affected source files. Preemption points look like if (current->need_resched) schedule();.

To use Andrew Morton's preemption point kernel patch, download the patch from the URL above and download the appropriate Linux kernel version from kernel.org. Apply the patch and rebuild the kernel as usual. More details can be found here, although the notes are for an old 2.4 kernel. Also, take note that you may need to update your development environment. To use Molnar's patches you do the same thing: download the patch and create a new kernel. Morton has patches for many 2.4 kernels; Molnar has patches for some 2.2 kernels and some early 2.4 kernels.

Preemptible Kernels

Preemptible kernels provide for one user process to be preempted in the midst of a system call so that a newly awoken higher-priority process can run. This preemption cannot be done safely at arbitrary places in the kernel code. One section of code where it is not safe is within a critical section.
A critical section is a code sequence that must not be executed by more than one process at the same time. In the Linux kernel these sections are protected by spin locks.

MontaVista and TimeSys have taken similar approaches to creating a preemptible kernel. They cleverly alter the spin-lock calls to additionally prevent preemption, so that preemption is permitted everywhere else. When a higher-priority process awakens, the scheduler will preempt a lower-priority process in a system call unless the system call code has indicated, via the modified spin-lock code, that preemption is not currently possible.

In addition, with a preemptible kernel, breaking locks to allow rescheduling is simpler than with the preemption (low-latency) patches. If the kernel releases a lock and then re-acquires it, preemption will be checked for when the lock is released. There are places in the kernel where a lock is held, say in a loop, where it need not be held the entire time; perhaps for each iteration it can be released and then re-acquired.

MontaVista implements preemption through a counter. When a spin lock is acquired, the counter is incremented. When a high-priority process awakens, the scheduler checks whether the preemption counter indicates, by having the value zero, that preemption is allowed. By employing a counter, the mechanism works when locks are nested. With this mechanism, however, any spin-lock-held critical section prevents preemption, even if the lock is for an unrelated resource.

TimeSys instead employs a priority inheritance mutex. With this mechanism, a high-priority process can preempt a low-priority process that holds a mutex for a different resource. In addition, since the mutexes employ priority inheritance, low-priority processes holding a mutex cannot indefinitely postpone a higher-priority process waiting on that mutex. This solves the so-called priority inversion problem.

One can obtain the preemption patches developed by MontaVista from the SourceForge kpreempt website.
MontaVista is conducting this work in a laudable, open-source manner. They also provide their work on a real-time scheduler and on high-resolution timers on SourceForge, here and here. The SourceForge kpreempt project also gives a link to Robert Love's preemptible kernel work [see the April and May 2002 issues of Linux Journal for more information on Love's kernel work]. These are MontaVista's patches and are now maintained by Love, although MontaVista is still involved. The newest patches are available here. A recent release of Love's work was created to work with a recent constant-time scheduler patch by Ingo Molnar. Molnar's O(1) scheduler is available as a patch for 2.4 and has been merged into 2.5.

TimeSys makes their preemptible kernel available on their website. The preemptible kernel is provided already patched; to obtain the patches, you need to back them out with diff against a 2.4.7 kernel source tree. Their preemptible kernel source is released under the GPL. TimeSys additionally has a number of other valuable capabilities for real-time developers that are not available for free download. These include technology for real-time scheduling and resource allocation. These modules add system calls, for example, to provide conventional access to their enhancements.

For those interested in examining the gory details, we offer a couple of hints on where to look. The key to the spin-lock mechanism is the include file spinlock.h, under include/linux. Both MontaVista and TimeSys modify this file. Interestingly, both seem to rename, and continue to use, the old functions. The original spin-lock functions are still required: it is not acceptable, for example, to preempt the kernel while it is in the scheduler, as infinite recursion would ensue.
MontaVista uses names like _raw_spin_lock and _raw_read_lock; TimeSys uses names like old_spin_lock and old_spin_lock_irq. By examining the file kernel/include/linux/mutex.h in the TimeSys distribution you can see that spin locks have been redefined to use write_lock() and read_lock() functions that implement mutex locks. The file kernel/kernel/mutex.c includes the source to the do_write_lock() function, for example, which implements the mutex-locking functionality.

Other Real-Time Kernel Efforts

Another popular area for improvement is the granularity of timing. TimeSys, MontaVista, REDSonic and others have solutions that greatly improve time resolution. For example, TimeSys queries the Pentium timestamp counter on context switches to ensure quite accurate CPU-time accounting for functions such as getrusage().

In the opinion of many developers, including this author, Linux's lack of the complete set of POSIX 1003.1b functionality is a significant shortcoming. Luckily, there are solutions; in particular, TimeSys has quite a good implementation. In addition to their POSIX contributions, TimeSys has developed some innovative resource-control mechanisms. These allow a real-time application to reserve CPU time or network bandwidth, for example. This, coupled with their interrupt threading model, preemptible kernel and other features, provides two or three orders of magnitude of improvement in latency over a standard Linux kernel.

To date, it appears that little has been done to allow a user-space application to register a function to be called as an interrupt handler. This mechanism is called user-space interrupt handling and is available, for example, in IRIX, SGI's UNIX. Interestingly, SGI does provide, in Linux, user-space access to the interrupts from a real-time clock in their ds1286 real-time clock interface. This can be obtained here. Related to user-level interrupt handling is user-space DMA to and from devices.
There is a patch to provide that functionality, here.

Guarantees

Apparently, no real-time Linux vendor is willing to make a guarantee for latency. A guarantee, if given, might have a form something like this:
With our Linux kernel, and these hardware requirements, and these drivers, etc., we guarantee that your application, if it is locked in memory, has the highest priority...and will be awoken within N microseconds after your real-time device raises a hardware interrupt. If you are not able to achieve the guarantee, then we will treat it as a bug.
Since we see no such guarantees, what can we infer? We can think of several possibilities.

Vendors see no benefit in making the guarantee, and no customers request it. In our opinion, though, many developers do want a guarantee; in fact, hard real time implies one.

Vendors have not sufficiently measured their kernels and environments to be able to give a guarantee. This is a bit tricky. Measuring alone cannot prove a guarantee can be satisfied; one must determine that the code is bounded in every circumstance and that all worst-case paths are measured. From the vendors' announcements, it is apparent that many of them have spent quite a lot of effort both measuring and studying the code. In fact, it is likely that many engineers feel rather confident they could guarantee a certain number given the right environment.

Linux is too diverse to allow for any meaningful guarantees. This is likely the heart of the issue. Developers want to be able to modify their kernels. They want to be able to download drivers and use them. Activities such as these are beyond the control of a vendor. If a vendor were to claim a guarantee publicly, it might have to be for a system so constrained as to be useful in only one or a few select situations.

Perhaps we'll see some kind of compromise guarantee, something like "100ms or less on Pentium-class computers for properly behaving applications", plus the time spent in drivers. The driver caveat is important because, for example, the interrupt handling code is probably in the driver and thus a major part of the latency path.

What's Next?

In the third article of our series we will discuss real-time functionality available through means outside of the Linux kernel. We will consider approaches such as RTLinux and RTAI. We also will return to benchmarking and compare the wide variety of options.
About the author: Kevin Dankwardt is founder and CEO of K Computing, a training and consulting firm in Silicon Valley. In particular, his organization develops and delivers embedded and real-time Linux training worldwide.
Copyright © 2002 Specialized Systems Consultants, Inc. All rights reserved. Embedded Linux Journal Online is a cooperative project of Embedded Linux Journal and LinuxDevices.com.
ELJonline: Real Time and Linux, Part 3: Sub-Kernels and Benchmarks
Kevin Dankwardt

In the first two articles of this series (see "Real Time and Linux, Part 1" and "Real Time and Linux, Part 2: the Preemptible Kernel"), we examined the fundamental concepts of real time and efforts to make the Linux kernel more responsive. In this article we examine two approaches to real time that involve the introduction of a separate, small, real-time kernel between the hardware and Linux. We also return to benchmarking and compare a desktop/server Linux kernel to modified kernels.

We note, but do not discuss further, that LynuxWorks and OnCore Systems provide proprietary kernels with some Linux compatibility. LynuxWorks provides a real-time kernel that implements a Linux-compatible API. OnCore Systems provides a real-time microkernel that provides Linux functionality in a variety of ways; it allows one to run a Linux kernel, with real-time performance for its processes, on top of their microkernel.

In this article we concern ourselves primarily with single-CPU real time. When more than one CPU is used, new solutions to real time are possible. For example, one may avoid system calls on a CPU on which a real-time process is waiting, avoiding the kernel-preemption problem altogether. One may be able to direct interrupts toward one particular CPU and away from another, thus avoiding interrupt-latency issues. All of the Linux real-time solutions, incidentally, are usable on multi-CPU systems, and RTAI, for example, has additional functionality for multiple CPUs. We are focused, however, on the needs of embedded Linux developers, and most embedded Linux devices have a single general-purpose CPU.

What Is a Real-Time Sub-Kernel?

A typical real-time system has a few tasks that must be executed in a deterministic, real-time manner. In addition, it is frequently the case that response to hardware interrupts must be deterministic.
A clever idea is to create a small operating-system kernel that provides these mechanisms and runs a Linux kernel as well, to supply the complete complement of Linux functionality.

Thus, these real-time sub-kernels deliver an API for tasking, interrupt handling and communication with Linux processes. Linux is suspended while the sub-kernel's tasks run or while the sub-kernel is dealing with an interrupt. As a consequence, Linux is not allowed to disable interrupts. Also, these sub-kernels are not complete operating systems: they do not have a full complement of device drivers, and they do not provide extensive libraries. They are an addition to Linux, not a standalone operating system.

There is a natural tendency, however, for these sub-kernels to grow in complexity from release to release as more and more functionality is incorporated. A major aspect of their virtue, though, is that one may still take advantage of all the benefits of Linux in one's application; it is just that the real-time portion of the application is handled separately by the sub-kernel. Some view this situation as Linux being treated as the lowest-priority, or idle, task of the sub-kernel OS. Figure 1 depicts the relationship of the sub-kernel and Linux.
Figure 1. Relationship of the Sub-Kernel and Linux
The sub-kernels are created with Linux by doing three things: 1) patching a Linux kernel to provide hooks for the added functionality, 2) modifying the interrupt handling and 3) creating loadable modules that provide the bulk of the API and functionality.

Sub-kernels provide an API for use by the real-time tasks. The APIs resemble POSIX threads, other POSIX functions and additional unique functions. Using the sub-kernels thus means the real-time tasks use APIs that may be familiar to Linux programmers, but they are separate implementations and sometimes differ.

Interrupt handling is modified by patching the kernel source tree. The patches change, for example, the functions ordinarily used to disable interrupts. Thus, when the kernel and drivers in the Linux sub-tree are recompiled, they are no longer actually able to disable interrupts. This change is important to note because it means, for example, that a driver compiled separately from these modified headers may actually disable interrupts and thwart the real-time technique. Additionally, nonstandard code that, say, simply inlines an interrupt-disabling assembly-language instruction will likewise thwart it. Fortunately, in practice these are not likely situations and certainly can be avoided. They are examples to reinforce the idea that no real-time solution is completely free of caveats.

RTLinux and RTAI

The two most commonly used sub-kernels are RTLinux and RTAI. Both are designed for hard real time, and both are much more (and a little less) than just a preemptible kernel. In practical terms, a real-time operating system provides convenience to developers, and RTLinux and RTAI provide a wealth of additional real-time-related functions.
RTAI, for example, provides rate-monotonic scheduling and earliest-deadline-first scheduling, in addition to conventional priority scheduling.

The sub-kernels provide both POSIX and proprietary functions: functions to create tasks, disable and enable interrupts, and provide synchronization and communication. When using RTLinux or RTAI, a developer uses this new API in addition to the familiar POSIX functions.

Both RTLinux and RTAI furnish some support for working with user-space processes. This is important because a real-time application for Linux naturally will want to make use of the functionality of Linux. RTLinux provides support for invoking a signal handler in a user-space process, in addition to FIFOs and shared memory that can be read and written from both kernel and user space. RTAI provides FIFOs, shared memory and a complete hard real-time mechanism, called LXRT, that can be used in user space.

These mechanisms, though, don't make the Linux kernel itself real time. A user-space process still must avoid system calls because they may block in the kernel. Also, it seems neither RTLinux nor RTAI has been enhanced to work with a preemptible kernel. Since the two approaches are both beneficial, and mostly orthogonal, perhaps they will be combined in the near future. This may be likely since the Love patches are now part of the standard 2.5 kernel tree and perhaps will be part of the stable 2.6 kernel whenever it is released.

Some Thoughts on the Choices

For a developer requiring real-time enhancements and choosing among RTLinux, RTAI, the Love preemptible kernel and the TimeSys preemptible kernel, there are a myriad of issues. Let's highlight a few that many developers value.
Which are maintained in an open-source manner where independent outsiders have contributed? RTAI and Love.
Which have a software patent for their underlying technique? RTLinux.
Which are part of the 2.5 Linux kernel tree? Love.
Which have additional real-time capabilities besides preemptibility? TimeSys, RTAI and RTLinux.
Which are positioned such that one can reasonably assume that the solution will continue to be freely available for free download? RTAI and Love (in my humble opinion).
Which give control over interrupts and are likely to provide near-machine-level resolution responsiveness? RTLinux and RTAI.
Kernel Availability for Different Processors

None of these real-time approaches is available for every CPU on which Linux runs; extra effort is required to adapt a solution to a new processor. But, as the four solutions we examine here have quite active development, it is safe to assume that support for additional CPUs is at least contemplated. As a snapshot, the Love preemptible kernel is currently only available for x86, but with MontaVista's support it is likely to be ported to most, if not all, of the CPUs that MontaVista supports, including PowerPC, ARM, MIPS and SuperH. The TimeSys kernel is currently available for PowerPC, ARM, SuperH and Pentium. RTLinux is available for x86 and PowerPC. RTAI is available for x86 and PowerPC.

Benchmarks

You may download the benchmark programs from the Web (see the Resources sidebar under K Computing Benchmarks). All of our benchmarks were run on a 465MHz Celeron. Other x86 CPUs, however, have produced similar results. We have not benchmarked on other kinds of CPUs.

We benchmarked the Red Hat 7.2 kernel, which is based on Linux kernel 2.4.7; the TimeSys Linux 3.0 kernel, which is based on Linux 2.4.7; and a kernel patched with Robert Love and MontaVista's preemption patch for Linux kernel 2.4.18. We will refer to these kernels as Red Hat, TimeSys and Love, respectively. We separately benchmarked the RTAI and RTLinux kernels.

The benchmark consisted of timing the precision of a call to nanosleep(). Sleeping for a precise amount of time closely relates to the kernel's ability to serve user-space, real-time processes reliably. The Linux nanosleep() function allows one to request sleeps in nanosecond units. Our benchmark requests a 50 millisecond sleep. Interestingly, a request to nanosleep() to sleep N milliseconds reliably sleeps 10 + N milliseconds. Thus, we measure jitter with respect to how close the sleep was to 60 milliseconds.
One should also note that nanosleep() busy-waits in the kernel when the requested sleep is two milliseconds or less; we chose a longer sleep because a busy wait would not simulate interrupt response time as well as a true sleep does.

The benchmark program takes 1,000 samples; the last 998 are used in the graph, and the first two are discarded to avoid slowdowns from a cold cache. The benchmark program was locked into memory via mlockall() and given the highest FIFO priority via sched_setscheduler() and sched_get_priority_max(). The heart of our benchmark is:
t1 = get_cycles();
nanosleep(fifty_ms, NULL);
t2 = get_cycles();
jitter[i] = t2 - t1;
The get_cycles() function is a machine-independent way to read the CPU's cycle counter. On x86 machines it reads the timestamp counter (TSC). The TSC increments at the rate of the CPU; thus, on a 500MHz CPU, the TSC increments 500,000,000 times per second. The frequency of the CPU is determined by examining the CPU speed value listed in /proc/cpuinfo. Reading the TSC takes on the order of ten instruction times and is extremely precise in comparison to the interval we are timing. The difference, in milliseconds, from our expected sleep time of 50 + 10 milliseconds, for a given value of jitter, is calculated as
diff = (jitter/KHz) - 10 - 50;
The five benchmark loads were the stress tests of Benno Senoner, which are part of his latencytest benchmark. These tests stress the system by copying a disk file, reading a disk file, writing a disk file, reading the /proc filesystem and performing the X11perf test. The graphs of the three kernels for these loads are shown in Figures 2-6.
Figure 2. Copying a Disk File
Figure 3. Reading a Disk File
Figure 4. Writing a Disk File
Figure 5. Reading the /proc Filesystem
Figure 6. Performing the X11perf Test
Since the Red Hat kernel is clearly much less responsive than the Love or TimeSys kernels, we separately graph just the Love and TimeSys kernel results. These are depicted in Figures 7-11.
Figure 7. Love and TimeSys Kernels: Copying a Disk File
Figure 8. Love and TimeSys Kernels: Reading a Disk File
Figure 9. Love and TimeSys Kernels: Writing a Disk File
Figure 10. Love and TimeSys Kernels: Reading the /proc Filesystem
Figure 11. Love and TimeSys Kernels: Performing the X11perf Test
It is apparent from the graphs that the preemptible kernels provide a significant improvement in responsiveness. Because they deliver much-improved performance without requiring any change to an application's API, they are clearly attractive choices for embedded Linux developers.

RTLinux and RTAI Benchmarks

One justifiably expects RTAI and RTLinux to provide rock-solid performance even under great loads, and our benchmarks show that they meet this expectation. One must remember, though, that there are still a few caveats that can thwart real-time performance: perform no blocking operations, such as memory allocation; don't use any drivers that haven't been patched to avoid truly disabling interrupts; and avoid costly priority inversions.

To benchmark RTAI and RTLinux we created a periodic task and measured its timing performance against the requested periodic rate. The worst-case performance for both RTLinux and RTAI is on the order of 30 microseconds or less. Our benchmark programs are available for free download (see the Resources sidebar under K Computing Benchmarks).
Resources