
Real Time and Linux

http://blog.csdn.net/cyp207/archive/2007/08/01/1720896.aspx

Kevin Dankwardt (January, 2002)

[Updated Feb. 4, 2002] What is real time? This article, first of a three-part series, introduces the benchmarks we'll run on real-time Linux versions in the next two issues.

Linux is well tuned for throughput-limited applications, but it is not well designed for deterministic response, though enhancements to the kernel are available to help or guarantee determinism. So-called real-time applications require, among other things, deterministic response. In this article I examine the nature of real-time applications and Linux's strengths and weaknesses in supporting such applications. In later articles I will examine various approaches to help real-time applications satisfy hard real-time requirements. Most of these issues are with respect to the Linux kernel, but the GNU C library, for example, plays a part in some of this.

What Is Real Time?

Many definitions are available for the terms relating to real time. In fact, because they have different requirements, different applications demand different definitions. Some applications may be satisfied by some average response time while others may require that every deadline be met.

The response time of an application is that time interval from when it receives a stimulus, usually provided via a hardware interrupt, to when the application has produced a result based on that stimulus. These results may be such things as the opening of a valve in an industrial control application, drawing a frame of graphics in a visual simulation or processing a packet of data in a data-acquisition application.

Let's consider a valve-opening scenario. Imagine a conveyor belt carrying parts that need to be painted as they move past a paint nozzle, with a sensor mounted next to the belt. When a part is in just the right position, the sensor alerts our system that the valve on the paint nozzle should open to allow paint to be sprayed onto the part. Do we need to have this valve open at just the right time on average, or every time? Every time would be nice.

We need to open the valve no later than the last moment at which painting can still begin in time, and we must not close it before the part is fully painted. At the same time, it is desirable not to keep the valve open any longer than necessary, since we really don't want to be painting the rest of the conveyor belt.

We say that the latest possible time to open the valve and still accomplish the proper painting is the deadline. In this case if we miss the deadline, we won't paint the part properly. Let's say that our deadline is 1ms. That is the time from when the sensor alerts us until the time we must have begun painting. To be sure that we are never late, let's say we design our system to begin painting 950µs after we receive the sensor interrupt. In practice we will sometimes begin painting a little before that target and sometimes a little after.

Of course, it will never be the case that we start painting exactly 950µs after the interrupt, with infinite precision. For instance, we may be early by 10µs one time and late by 13µs the next. This variance is termed jitter. We can see from our example that if our system were to provide large jitter, we would have to move our target time significantly below 1ms to be assured of satisfying that deadline. This also would mean that we frequently would be opening the valve much sooner than actually required, which would waste paint. Thus, some real-time systems will have requirements in terms of both deadlines and jitter. We assume that our requirements say that any missed deadlines are failures.

The term operating environment means the operating system as well as the collection of processes running, the interrupt activity and the activity of hardware devices (like disks). We want to have an operating environment for our real-time application that is so robust we are free to run any number of any kind of applications concurrently with our real-time application and still have it perform acceptably.

An operating environment where we can determine the worst-case time for a given response time or requirement is deterministic. Operating environments that don't allow us to determine a worst-case time are called nondeterministic. Real-time applications require a deterministic operating environment, and real-time operating systems are capable of providing a deterministic operating environment.

Nondeterminism is frequently caused by algorithms that do not run in constant time; for example, if an operating system's scheduler must traverse its entire run list to decide which process to run next. This algorithm is linear, notated as O(n) and read as "on the order of n". That is, as n (the number of processes on the run list) grows, the time to decide grows proportionally. With an O(n) algorithm there is no upper bound on the time the algorithm will take that is independent of n. If your response time depends upon your sleeping process being awakened and selected to run, and the scheduler is O(n), then you will not be able to determine the worst-case time. This is a property of the Linux scheduler.
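
To make this concrete, here is a minimal sketch of a linear run-queue scan, in the spirit of the 2.4 scheduler's goodness() loop but with invented names, not actual kernel code:

#include <stddef.h>

struct task {
    struct task *next;
    int goodness;                /* how desirable this task is to run */
};

/* Pick the next task by walking the entire run list: O(n) in the
 * number of runnable tasks, so the decision time has no bound that
 * is independent of n. */
struct task *pick_next(struct task *run_list)
{
    struct task *best = NULL;
    for (struct task *t = run_list; t != NULL; t = t->next)
        if (best == NULL || t->goodness > best->goodness)
            best = t;
    return best;
}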

This matters in an environment where the system designer cannot control the number of processes that users of the system may create. In an embedded system, where characteristics of the system, such as the user interface, make it impossible for there to be more than a given number of processes, the environment has been constrained sufficiently to bound this kind of scheduling delay. This is an example where determinism may be achievable through the configuration of the operating environment. Notice that a priority system, among other things, may still be required; but as far as scheduling time is concerned, the time is bounded.

A visual simulation may require an average target frame rate, say 60 frames per second. As long as frames are dropped relatively infrequently, and over a suitable period the frame rate averages 60 frames per second, the system may be performing acceptably.

The example of the paint nozzle and the average framerate are examples of what we call hard real-time and soft real-time constraints, respectively. Hard real-time applications must have their deadlines met, otherwise an unacceptable result occurs. Something blows up, something crashes, some operation fails, someone dies. Soft real-time applications usually must satisfy a deadline, but if a certain number of deadlines are missed by just a little bit, the system may still be considered to be operating acceptably.

Let's consider another example. Imagine we are building a penguin robot to aid scientists in studying animal behavior. Through careful observation we determine that upon the emergence of a seal from a hole in the ice, a penguin has 600ms to move away from the hole to avoid being eaten by the seal. Will our robot penguin survive if it moves back, on average, within 600ms? Perhaps, but only if the seal's attack time happens to vary in step with our penguin's response time. Are you going to build your penguin on that assumption? We also realize that there is a certain distance from the hole within which the seal can reach. Our penguin must move farther than that distance within the 600ms. Some would call that line the deadline.

For an operating environment to accommodate a hard real-time application, it must be able to ensure that the application's deadlines can always be satisfied. This implies that all actions within the operating system must be deterministic. If the operating environment accommodates a soft real-time application, this generally means that an occasional delay may occur, but such a delay will not be unduly long.

Requirements for an application may be quantitative or qualitative. A qualitative requirement for a visual simulation would be that the system needs to react quickly enough to seem natural. This reaction time would be quantified to measure compliance. For example, a frame of graphics based upon user input may have a requirement to be rendered within 33.3ms after the user's input. This means that if a pilot moves the control stick in the flight simulator to bank right, the out-of-the-window view should change to reflect the new flight path within 33.3ms. Where did the 33.3ms requirement come from? Human factors--that amount of time is fast enough so that humans perceive the visual simulation as sufficiently smooth.

It is not the value of the time requirement but rather that there is a time requirement that makes this a real-time requirement. If one changed the requirement to have the graphics drawn within 33.3 seconds, instead of 33.3ms, it would still be a real-time system. The difference may be in the means to satisfy the requirements. In a Linux system the 33.3ms may require the use of a special Linux kernel and its functions, whereas the 33.3 second requirement may be achievable by means available within a standard kernel.

This leads us to a tenet: fast does not imply real-time and vice versa. However, fast on a relative scale may imply the need for real-time operating system features. This leads us to the distinction between real-time operating systems and real-time applications. Real-time applications have time-related requirements. Real-time operating systems can guarantee performance to real-time applications.

In practice, a general-purpose operating system, such as Linux, provides sufficient means for an application with relatively long deadlines if the operating environment can be controlled suitably. It is because of this property that one frequently hears that there is no need for real-time operating systems because processors have become so fast. This is only true for relatively uninteresting projects.

One has to remember, though, that if the operating environment is not constrained suitably, deadlines may be missed even though extensive testing never found a case of a missed deadline. Linear-time algorithms, for example, may be lurking in the code.

Another issue to keep in mind is audience effect, often stated as "The more important the audience for one's demo, the more likely the demo will fail." While anecdotal evidence abounds for the audience effect, effects such as nonrepeatability are often due to race conditions. A race condition is a situation where the result depends upon the relative speeds of the tasks or the outside world. All real-time systems, by definition, have race conditions. Well-designed systems have race conditions only around required deadlines. Testing alone cannot prove the lack of race conditions.

Since an operating system is largely reactive rather than proactive, many activities that cause delay can be avoided. In order for a particular application (process) to be able to meet its deadlines, such things as CPU-bound competitors, disk I/O, system calls or interrupts may need to be controlled. These are the kinds of things that constitute a properly constrained operating environment. Characteristics of the operating system and drivers also may be of concern. The operating system may block interrupts or not allow system calls to be preempted. While these activities may be deterministic, they may cause delays that are longer than acceptable for a given application.

A real-time operating system requires simpler efforts to constrain the environment than does a general-purpose operating system.

Is Linux Capable of Real Time?

Unless we state otherwise, assume we are talking about the 2.4.9 version of the Linux kernel. Version 2.4.9 was released in August 2001, although our statements, at least for the most part, are true for the last several years of kernel releases.

There are many qualities of an operating system that may be necessary or desirable for it to be appropriate for real-time applications. One list of features is included in the FAQ at Comp.realtime. That list contains such features as the OS being multithreaded and preemptible, and able to support thread priorities and provide predictable thread-synchronization mechanisms. Linux is certainly multithreaded, supports thread priorities and provides predictable thread-synchronization mechanisms. The Linux kernel is not preemptible.

The FAQ also says that one should know the OS behavior for interrupt latency, time for system calls and maximum time that interrupts are masked. Further, one should know the system-interrupt levels and device-driver IRQ (interrupt request line) levels, as well as the maximum time they take. We provide some timings for interrupt latency and interrupt masked ("interrupts blocked") times in the benchmarking section below.

Many developers also are interested in having a deadline scheduler, a kernel that is preemptible in the millisecond range or better, more than 100 priority levels, user-space support for interrupt handlers and DMA, priority inheritance on synchronization mechanisms, microsecond timer resolution, the complete set of POSIX 1003.1b functionality and constant time algorithms for scheduling, exit(), etc. None of these capabilities are available in the standard kernel and the GNU C library.

Additionally, in practice, the magnitude of delays becomes important. The Linux kernel, in a relatively easily constrained environment, may be capable of worst-case response times of about 50ms, with the average being just a few milliseconds.

Andrew Morton suggests that one should not scroll the framebuffer, run hdparm, use blkdev_close or switch consoles (see reference). These are examples of constraining the operating environment.

Some applications, however, may require response times on the order of 25µs. Such requirements are not satisfiable in applications that are making use of Linux kernel functionality. In such cases, some mechanism outside of the Linux kernel functions must be employed in order to assure such relatively short response-time deadlines.

We see, in practice, that a hard real-time operating system can assure deterministic response and provide response times that are significantly faster than those provided by general-purpose operating systems like Linux.

We see that Linux is not a real-time operating system because it cannot always assure deterministic performance and because its average and worst-case timing behavior is far worse than that required by many real-time applications. Remember that the timing behavior required by these many real-time applications generally is not a hardware limitation. For example, the Linux kernel response time may be on the order of a few milliseconds on a typical x86-based PC, while the same hardware may be capable of better than 20µs response times when running a real-time operating system.

The two reasons the Linux kernel has such relatively poor performance on uniprocessor systems are that the kernel disables interrupts and that the kernel is not suitably preemptible. If interrupts are disabled, the system is not capable of responding to an incoming interrupt. The longer interrupts are delayed, the longer the expected delay for an application's response to an interrupt. The lack of kernel preemptibility means that the kernel does not preempt itself, such as when executing a system call on behalf of a lower-priority process, in order to switch to a higher-priority process that has just been awakened. This may cause significant delay. On SMP systems, the Linux kernel also employs locks and semaphores that will cause delays.

Real-Time Application Programming

User-space real-time applications require services of the Linux kernel. Such services, among other things, provide scheduling, interprocess communication and performance improvement. Let's examine a variety of system calls (the kernel's way of providing services to applications) that are of special benefit to real-time application developers. These calls are used to constrain the operating environment.

There are 208 system calls in the Linux kernel. System calls usually are called indirectly through library routines. The library routines usually have the same name as the system call, and sometimes library routines map into alternative system calls. For example, on Linux, the signal library routine from the GNU C library, version 2.2.3, maps to the sigaction system call.

A real-time application may call nearly all of the set of system calls. The calls that are most interesting to us are exit(2), fork(2), exec(2), kill(2), pipe(2), brk(2), getrusage(2), mmap(2), setitimer(2), ipc(2) (in the form of semget(), shmget() and msgget()), clone(), mlockall(2) and sched_setscheduler(2). Most of these are described well in either Advanced Programming in the UNIX Environment, by W. Richard Stevens, or in POSIX.4: Programming for the Real World, by Bill O. Gallmeister. The clone() function is Linux-specific. The others, for the most part, are compatible with typical UNIX systems. However, read the man pages because there are some subtle differences at times.

Real-time applications on Linux also frequently are interested in the POSIX Threads calls, such as pthread_create() and pthread_mutex_lock(). Several implementations of these functions exist for Linux. The most commonly available of these is provided by the GNU C library. These so-called LinuxThreads are based on the clone() system call and are scheduled by the Linux scheduler. Some POSIX functions are available for POSIX Threads (e.g., sem_wait()) but not for Linux processes.
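
As a minimal sketch of these calls working together (the program and names are ours, not from the article; compile with -lpthread):

#include <pthread.h>
#include <semaphore.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static sem_t ready;                  /* sem_wait()/sem_post() are usable here */

static void *worker(void *arg)
{
    (void)arg;
    sem_wait(&ready);                /* block until the main thread signals */
    pthread_mutex_lock(&lock);
    /* ... touch shared state ... */
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void)
{
    pthread_t tid;
    sem_init(&ready, 0, 0);
    pthread_create(&tid, NULL, worker, NULL);  /* under LinuxThreads, a clone()d process */
    sem_post(&ready);
    pthread_join(tid, NULL);
    return 0;
}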

An application running on Linux ordinarily can be slowed down considerably, from its best case, by a number of factors. Essentially these are caused by contention for resources. Such resources include synchronization primitives, main memory, the CPU, a bus, the CPU cache and interrupt handling.

An application can reduce its contention for these resources in a number of ways. For synchronization mechanisms, e.g., mutexes and semaphores, an application can reduce their use, employ priority inheritance versions, employ relatively fast implementations, reduce the time spent in critical sections, etc. Contention for the CPU is affected by priorities. In this view, for example, lack of kernel preemption can be seen as a priority inversion. Contention for a bus probably does not last long enough to be of direct concern. However, know your hardware. Do you have a clock that takes 70µs to respond and holds the bus? Contention for a cache is affected by frequent context switches and by large or random data or instruction references.

What to Do?

Consequently, real-time applications usually give themselves a high priority, lock themselves in memory (and don't grow their memory usage), use lock-free communication whenever possible, use cache memory wisely, avoid nondeterministic I/O (e.g., sockets) and execute within a suitably constrained system. Suitable constraints include limiting hardware interrupts, limiting the number of processes, curtailing system call use by other processes and avoiding kernel problem areas, e.g., don't run hdparm.

Some of the system calls that should be made by a real-time application require special privileges. This usually is accomplished by having root be the owner of the process (having a shell owned by root run the program or having the executable file have the SUID bit set). A newer way is to make use of the capability mechanism. There are capabilities for locking down memory, such as CAP_IPC_LOCK (that "IPC" is in the name is just something we need to accept), and for being able to set real-time priorities, which can be done with the capability CAP_SYS_NICE.

A real-time process sets its priority with sched_setscheduler(2). The current implementation provides the standard POSIX policies of SCHED_FIFO and SCHED_RR, along with priorities ranging in value from 1-99. Bigger is better. The POSIX function to check the maximum allowable priority value for a given policy is sched_get_priority_max(2).

A real-time process should lock down its memory and not grow. Locking memory is done in Linux with the POSIX standard function mlockall(2). Usually one uses the flags value of MCL_CURRENT | MCL_FUTURE to lock down current memory and any new memory if one's process grows in the future. While growing often is not acceptable, if you get lucky and survive the delay you might as well get the newly allocated memory locked down as well. Be careful to grow your stack and allocate all dynamic memory, and then call mlockall(2) before your process begins its time-critical phase. Note that you can check to see if your process had any page faults during a section of code by using getrusage(2). I show a code fragment below to illustrate the use of several functions. Note that one should check the return value from each of these calls and read the man pages for more details:


#include <sched.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

struct sched_param sp;
struct rusage ru_before, ru_after;
long minorfaults, majorfaults;

int priority = sched_get_priority_max(SCHED_FIFO);
sp.sched_priority = priority;
sched_setscheduler(getpid(), SCHED_FIFO, &sp);   /* highest FIFO priority */
mlockall(MCL_CURRENT | MCL_FUTURE);              /* no page faults from now on */
getrusage(RUSAGE_SELF, &ru_before);
/* ... real-time section ... */
getrusage(RUSAGE_SELF, &ru_after);
minorfaults = ru_after.ru_minflt - ru_before.ru_minflt;  /* faults during section */
majorfaults = ru_after.ru_majflt - ru_before.ru_majflt;

Benchmarking for Real-Time Applications

There are a number of efforts to benchmark various aspects of Linux. Real-time application developers are most interested in interrupt latency, timer granularity, context-switch time, system call overhead and kernel preemptibility. Interrupt latency is the time from when a device asserts an interrupt until the time that the appropriate interrupt handler begins executing. This typically is delayed by the handling of other interrupts and by interrupts being disabled. Linux does not implement interrupt priorities. Most interrupts are blocked when Linux is handling an interrupt. This time typically is quite short, however, perhaps a few microseconds.

On the other hand, the kernel may block interrupts for a significantly longer time. The intlat program from Andrew Morton allows one to measure interrupt latencies. Similarly, his schedlat shows scheduling latencies.

Context-switch time is included in the well-known benchmark harness LMbench, as well as in others (reference 1, reference 2). LMbench also provides information about system calls.

In Table 1 we show the results of LMbench. This table shows context-switch times. The benchmark program was run three times, and the lowest value of the context-switch time for each configuration is reported in the table, as per the LMbench documentation. The highest value, however, was no more than about 10% larger than the minimum. Process size is reported in kilobytes; context-switch times are in microseconds. The data indicate that substantial use of data in the cache causes significantly larger context-switch times, because the context-switch time includes the time to restore cache state.

Table 1. Context-Switch Times


As an example of interrupt-off times, one can see some results here. In one experiment with hdparm, the data show that interrupts can be disabled for over 2ms while hdparm runs. Developers can use the intlat mechanism to measure interrupt-off times for the system they are running. It is only under rare conditions that interrupt-off times will exceed 100µs. These conditions should be avoidable for most embedded systems. They are the areas that Morton warns against.

An area of more significant concern to most real-time developers is that of scheduling latency. That is, the delay in continuing a newly awakened high-priority task. A long delay is possible when the kernel is busy executing a system call. This is because the Linux kernel will not preempt a lower priority process in the midst of a system call in order to execute a newly awakened higher priority process. This is why the Linux kernel is termed non-preemptible.

The latency test from Benno Senoner shows that a delay of 100ms or more is possible (see reference). We can see that both interrupt blocking and scheduling latencies can be sufficiently long to prevent satisfactory performance for some applications.

Timing resolution is also of importance to many embedded Linux developers. For example, the setitimer(2) function is used to set a timer. This function, like other time functions in Linux, has a resolution of 10ms. Thus, if one sets a timer to expire in 15ms, it actually will expire in about 20ms. In a simple test measuring the time interval between 1,000 successive 15ms timers, we found that the average time interval was 19.99ms, the minimum time was 19.987ms and the maximum time was 20.042ms on a quiescent system.
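
A small test along those lines might look like the following sketch (our program, not the article's test; on a kernel with 10ms resolution it should report an average near 20ms for a 15ms request):

#include <signal.h>
#include <stdio.h>
#include <sys/time.h>
#include <unistd.h>

static volatile sig_atomic_t ticks;

static void on_alarm(int sig) { (void)sig; ticks++; }

int main(void)
{
    /* 15ms initial expiry, then every 15ms. */
    struct itimerval it = { {0, 15000}, {0, 15000} };
    struct timeval t0, t1;

    signal(SIGALRM, on_alarm);
    setitimer(ITIMER_REAL, &it, NULL);

    gettimeofday(&t0, NULL);
    while (ticks < 1000)
        pause();                     /* wait for each SIGALRM */
    gettimeofday(&t1, NULL);

    /* Total elapsed milliseconds divided by the number of intervals. */
    double ms = ((t1.tv_sec - t0.tv_sec) * 1e6 + (t1.tv_usec - t0.tv_usec)) / 1000.0;
    printf("average interval: %.3f ms\n", ms / 1000);
    return 0;
}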

Real Time and Linux, Part 2: the Preemptible Kernel

In the January/February 2002 issue of Embedded Linux Journal, we examined the fundamental issues of real time with Linux. In this article we examine efforts to bring real-time capabilities to applications by making improvements to the Linux kernel. To date, the majority of this work has been aimed at making the kernel more responsive--at reducing latency by shortening the preemption latency, which can be quite long in Linux.

By improving the kernel, and not changing or adding to the API, applications can run more responsively by merely switching out a standard kernel for the improved one. This is a big benefit. It means that ISVs need not create special versions for different real-time efforts. For example, DVD players may run more reliably on an improved kernel without needing to be aware that the kernel they are running on has been improved.

Background and History

Around Linux kernel release 2.2, the issue of kernel preemptibility began to get quite a lot of attention. Paul Barton-Davis and Benno Senoner, for example, wrote a letter (also signed by many others) to Linus Torvalds, asking that 2.4 please include significantly reduced preemption delays.

Their request was based on their desire to have Linux function well with audio, music and MIDI. Senoner produced some benchmarking software that demonstrated that the 2.2 kernel (and later the 2.4 kernel) had worst-case preemption latencies on the order of 100ms (reference). Latencies of this magnitude are unacceptable for audio applications. Conventional wisdom seems to say that latencies on the order of no more than a few milliseconds are required.

Two efforts emerged that produced patched kernels that provided quite reasonable preemption latencies. Ingo Molnar (of Red Hat) and Andrew Morton (then of The University of Wollongong) both produced patch sets that provided preemption within particularly long sections in the kernel. You can find Ingo Molnar's patches here, and you can find Andrew Morton's work here.

In addition, Morton provides tools for measuring latencies, such as periods where the kernel ignores reschedule requests. His low-latency patches' web page, cited above, provides information on those as well.

Recently, at least two organizations have produced preemptible kernels that provide a more fundamental, and powerful, solution to the kernel preemptibility problem.

In the first article of this series in the January/February 2002 issue of ELJ, we listed several other desired features for real-time support in Linux, including increased number of priority levels, user-space interrupt handling and DMA, priority inheritance on synchronization mechanisms, microsecond time resolution, complete POSIX 1003.1b functionality and a constant time algorithm for scheduling. We will briefly comment on these as well.

A key point to remember with all of these improvements is that they involve patching the kernel. Anytime you patch a kernel you must assume that you no longer have binary compatibility for other kernel code, such as drivers. For example, the preemptible kernel approaches require modifying the code for spin locks. A binary driver won't employ this modification and thus may not prevent preemption properly. This emphasizes the need to have the source and recompile all kernel code. The Linux model for drivers is one of source-compatibility anyway. Distribution of binary-only drivers is discouraged for compatibility as well as for open-source philosophy reasons.

Improvements

Various efforts that improve the kernel provide essentially transparent benefits. The efforts to improve the preemptibility of the kernel, be they through a preemptible kernel or through preemption points, result in a kernel that is more responsive to applications without any alterations in these applications.

Another aspect of transparency is whether the changes are transparent to the kernel, or in other words, do the approaches automatically track with changes in the kernel. The preemption point approaches of Molnar and Morton require that the scheduling latencies in new kernels be measured and preemption points placed in the proper places.

In contrast, the approaches to creating a preemptible kernel piggyback on the SMP locking and thus automatically transfer with new kernel versions. Also, by tying the preemptibility to the SMP-locking mechanism, as kernel developers improve the granularity of the SMP locking, the granularity of the preemption will improve automatically as well. We are likely to see steady improvement in SMP-locking granularity because improvement in this is required for improved SMP scaling.

It is because of this co-opting of the SMP locks that the preemptible kernel work depends upon a 2.4 or newer kernel. Prior kernels lacked the required SMP locks.

Another important benefit of the preemptible kernel approach to emphasize is that the approach makes code, which is otherwise unaware of it, preemptible. For example, driver writers need do nothing special to have their driver preemptible. Code in the driver will be preempted as required unless the driver holds a lock. Thus, as in other parts of the kernel, well-written drivers that are SMP-safe automatically will benefit from a preemptible kernel. On the other hand, drivers that are not SMP-safe may not function correctly with the preemptible kernels.

One should be aware, though, that just because one's driver does not request a lock, kernel code calling it may. For example, we found in a simple test with MontaVista's preemptible kernel that the functions read() and write() of a dynamically loaded driver were preempted just fine, while the functions init_module(), open() and close() were not. This means that if a low-priority process does an open() or close(), it may delay its preemption by a newly awoken high-priority process.

In practice, developers still should measure the latencies they are seeing. With the preemptible kernel approaches we see that it is still possible that a section of kernel code can hold a lock for a period longer than acceptable for one's application.

MontaVista, for example, provides a preemptible kernel, adds a few preemption points in sections where locks are held too long and provides measurement tools so that developers can measure the preemptibility performance with their actual applications and environment.

The goal of SMP locks is to ensure safe re-entrance into the kernel. That is, if processes running in parallel require kernel resources, access to these resources is done safely. The smaller the granularity of the locking, the greater the chance that competing processes can continue to execute in parallel. Parallelization is improved as the blocking (because of contention) is reduced.

This concept applies to uniprocessors as well, when I/O is considered. If one considers I/O devices as separate processors, then parallelization, or throughput, improves as applications and I/O activities can continue in parallel. Improvements in preemptibility, which imply that high-priority I/O-bound processes wake up more quickly, can thus improve throughput. Thus, somewhat paradoxically, we see that even though we may experience more context swaps and execute more code in critical kernel paths, we may still see greater system throughput.

The benefits of a preemptible kernel seem to be so clear that we can expect preemptibility eventually to be a standard feature of the Linux kernel. Preemptible kernels have been shown to reduce latencies to just a few milliseconds for some implementations and to as low as tens of microseconds in others.

In a quick survey of embedded Linux vendors, MontaVista and TimeSys provide preemptible kernels, REDSonic has preemption points, LynuxWorks and Red Hat use RTLinux. Lineo uses RTAI. OnCore provides Linux preemptibility both through a Linux system call-compatible API (as does LynuxWorks with LynxOS) and through running a Linux kernel (which effectively becomes preemptible) on top of their preemptible microkernel.

Preemption Points

Preemption points are essentially calls to the scheduler that check whether a higher-priority task is ready and should be run. Molnar and Morton timed paths in the kernel, found sections that were quite long and inserted the schedule-check calls there. You can readily find these places by examining the patches, or by applying the patches and comparing the before-and-after versions of the affected source files. Preemption points look like if (current->need_resched) schedule();.
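
Schematically, a patched stretch of kernel code looks like the following (a paraphrase of the 2.4-era pattern with invented helper names, not an exact excerpt):

/* A long-running kernel loop with a preemption point inserted. */
for (i = 0; i < nr_items; i++) {
    process_item(i);                 /* hypothetical lengthy operation */

    /* Preemption point: if a higher-priority task has been awakened,
     * voluntarily call the scheduler rather than finishing the loop. */
    if (current->need_resched)
        schedule();
}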

To use Andrew Morton's preemption point kernel patch, download the patch from the URL above and download the appropriate Linux kernel version from kernel.org. Apply the patch and rebuild the kernel as usual. More details can be found here, although the notes are for an old 2.4 kernel. Also, take note that you may need to update your development environment.

To use Molnar's patches you do the same thing. Download the patch and create a new kernel. Morton has patches for many 2.4 kernels. Molnar has patches for some 2.2 kernels and some early 2.4 kernels.

Preemptible Kernels

Preemptible kernels provide for one user process to be preempted in the midst of a system call so that a newly awoken higher-priority process can run. This preemption cannot be done safely at arbitrary places in the kernel code. One section of code where this may not be safe is within a critical section. A critical section is a code sequence that must not be executed by more than one process at the same time. In the Linux kernel these sections are protected by spin locks.

MontaVista and TimeSys have taken similar approaches to creating a preemptible kernel. They cleverly alter the spin-lock calls to additionally prevent preemption. In this way, preemption is permitted in other sections. When a higher-priority process awakens, the scheduler will preempt a lower-priority process in a system call if the system call code has not indicated, via the modified spin-lock code, that preemption is not possible.

In addition, with a preemptible kernel, breaking locks to allow rescheduling is simpler than with the preemption (low-latency) patches. If the kernel releases a lock and then re-acquires it, preemption will be checked for at the release. There are places in the kernel where a lock is held, say in a loop, where it need not be held the entire time; perhaps for each iteration it can be released and then re-acquired.
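
That lock-breaking pattern looks roughly like this (a sketch, not actual kernel source):

spin_lock(&lock);
for (i = 0; i < n; i++) {
    do_unit_of_work(i);              /* hypothetical per-iteration work */

    /* Break the lock: the unlock is a chance for the preemptible
     * kernel to run a newly awakened higher-priority process. */
    spin_unlock(&lock);
    spin_lock(&lock);
}
spin_unlock(&lock);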

MontaVista implements preemption through a counter. When a spin lock is acquired, the counter is incremented. When a high-priority process awakens, the scheduler checks to see whether the preemption counter indicates, by having the value zero, that preemption is allowed. By employing a counter, the mechanism works when locks are nested. With this mechanism, however, any spin-lock-held critical section prevents preemption, even if the lock is for an unrelated resource.
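
In outline, the counter mechanism might look like this (a simplified, single-CPU sketch; the real patches modify spinlock.h and the scheduler):

/* Sketch: preemption is allowed only when no spin lock is held. */
static volatile int preempt_count;   /* per-CPU in the real patches */

#define spin_lock_preempt(l)   do { preempt_count++; _raw_spin_lock(l); } while (0)
#define spin_unlock_preempt(l) do { _raw_spin_unlock(l); preempt_count--; } while (0)

/* Called when a higher-priority process is awakened. */
void maybe_preempt(void)
{
    if (preempt_count == 0)          /* no critical section active */
        schedule();
}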

TimeSys employs a priority inheritance mutex. With this mechanism, a high-priority process can preempt a low-priority process that holds a mutex for a different resource. In addition, since they employ priority inheritance, low-priority processes holding a mutex cannot indefinitely postpone a higher-priority process waiting on the mutex. This solves the so-called Priority Inversion Problem.

One can obtain the preemption patches developed by MontaVista from the SourceForge kpreempt website. MontaVista is conducting this work in a laudable, open-source manner. They also provide their work on a real-time scheduler and high-resolution timers on SourceForge, here and here.

The SourceForge kpreempt Project also gives a link to Robert Love's preemptible kernel work [see the April and May 2002 issues of Linux Journal for more information on Love's kernel work]. These are MontaVista's patches and are now maintained by Love, although MontaVista is still involved. The newest patches are available here.

A recent release of Love's work was created to work with a recent constant time scheduler patch by Ingo Molnar. Molnar's O(1) scheduler is available as a patch for 2.4 and has been merged into 2.5. TimeSys makes their preemptible kernel available on their website. The preemptible kernel is provided already patched. To obtain the patches, you need to back them out with diff from a 2.4.7 kernel source tree. Their preemptible kernel source is released under the GPL.

TimeSys additionally has a number of other valuable capabilities for real-time developers that are not available for free download. These include technology for real-time scheduling and resource allocation. These modules add additional system calls, for example, to provide for conventional access to their enhancements.

For those interested in examining the gory details we provide a couple of hints on where to look. The key to the spin-lock mechanism is the include file, spinlock.h, under include/linux. Both MontaVista and TimeSys modify this file.

Interestingly, both seem to rename, and continue to use, the old functions. The original spin-lock functions still are required. It is not acceptable, for example, to preempt the kernel while it is in the scheduler. Infinite recursion would ensue. MontaVista uses names like _raw_spin_lock and _raw_read_lock; TimeSys uses names like old_spin_lock and old_spin_lock_irq.

By examining the file kernel/include/linux/mutex.h in the TimeSys distribution you can see that spin locks have been defined to use write_lock() and read_lock() functions that implement mutex locks. The file kernel/kernel/mutex.c includes the source to the do_write_lock() function, for example, which implements the mutex locking functionality.

Other Real-Time Kernel Efforts

Another popular area for improvement is in the granularity of timing. TimeSys, MontaVista, REDSonic and others have solutions that greatly improve time resolution. For example, TimeSys queries the Pentium Time Stamp Counter on context switches to ensure quite accurate CPU-time accounting for such functions as getrusage().

In the opinion of many developers, including this author, Linux's lack of the complete set of POSIX 1003.1b functionality is a significant shortcoming. Luckily, there are solutions. In particular, TimeSys has quite a good implementation.

In addition to their POSIX contributions, TimeSys has developed some innovative resource-control mechanisms. These allow a real-time application to reserve CPU time or network bandwidth, for example. This, coupled with their interrupt threading model, preemptible kernel and other features, provides two or three orders of magnitude of improvement in latency over a standard Linux kernel.

To date, it appears that little has been done to allow a user-space application to register a function to be called as an interrupt handler. This mechanism is called user-space interrupt handling and is available, for example, in IRIX, SGI's UNIX.

Interestingly, SGI, in Linux, provides user-space access to the interrupts from a real-time clock in their ds1286 real-time clock interface. This can be obtained here.

Related to user-level interrupt handling is user-space DMA to and from devices. There is a patch to provide that functionality, here.

Guarantees

Apparently, no real-time Linux vendor is willing to make a guarantee for latency. A guarantee, if given, may have the form of something like

With our Linux kernel, and these hardware requirements, and these drivers, etc., we guarantee that your application, if it is locked in memory, has the highest priority...and will be awoken within N microseconds after your real-time device raises a hardware interrupt. If you are not able to achieve the guarantee, then we will treat it as a bug.

Since we see no such guarantees, what can we infer? We can think of several possibilities.

Vendors do not see any benefit in making the guarantee. No customers request it. In our opinion, many developers want a guarantee. In fact, hard real time implies a guarantee.

Vendors have not sufficiently measured their kernels and environment to be able to give a guarantee. This is a bit tricky. Measuring alone can't prove a guarantee can be satisfied. One must determine that the code is bounded in every circumstance and that all worst-case paths are measured. From the vendors' announcements, it is apparent that many of them have spent quite a lot of effort both measuring and studying the code. In fact, it is likely that many engineers feel rather confident that they could guarantee a certain number given the right environment.

Linux is too diverse to allow for any meaningful guarantees. This is likely the heart of the issue. Developers want to be able to modify their kernel. They want to be able to download drivers and use them. Activities such as these are beyond the control of a vendor. If a vendor were to claim a guarantee publicly, it may have to be for a system so constrained as to be useful for only one or a few select situations.

Perhaps we'll see some kind of compromise guarantee, something like "100ms or less on Pentium class computers for properly behaving applications" plus the time spent in drivers. The driver caveat is important, for example, because the interrupt handling code is probably in the driver and thus a major part of the latency path.

What's Next?

In the third article of our series we will discuss real-time functionality available through means outside of a Linux kernel. We will consider approaches such as RTLinux and RTAI. We also will return to benchmarking and comparing the wide variety of options.

Real Time and Linux, Part 3: Sub-Kernels and Benchmarks

In the first two articles of this series (see "Real Time and Linux, Part 1" and "Real Time and Linux, Part 2: the Preemptible Kernel"), we examined the fundamental concepts of real time and efforts to make the Linux kernel more responsive. In this article we examine two approaches to real time that involve the introduction of a separate, small, real-time kernel between the hardware and Linux. We also return to benchmarking and compare a desktop/server Linux kernel to modified kernels.

We note, but will not discuss further, that LynuxWorks and OnCore Systems provide proprietary kernels that offer some Linux compatibility. LynuxWorks provides a real-time kernel that implements a Linux-compatible API. OnCore Systems provides a real-time microkernel that provides Linux functionality in a variety of ways. These allow one to run a Linux kernel, with real-time performance for its processes, on top of their microkernel.

In this article we concern ourselves primarily with single-CPU real time. When more than one CPU is used, new solutions to real time are possible. For example, one may avoid system calls on a CPU on which a real-time process is waiting. This avoids the kernel-preemption problem altogether. One may be able to direct interrupts to a particular CPU and away from another particular CPU, thus avoiding interrupt latency issues. All of the Linux real-time solutions, incidentally, are usable on multi-CPU systems. In addition, RTAI, for example, has extra functionality for multiple CPUs. We are focused, however, on the needs of embedded Linux developers, and most embedded Linux devices have a single general-purpose CPU.

What Is a Real-Time Sub-Kernel?

A typical real-time system has a few tasks that must be executed in a deterministic, real-time manner. In addition, it is frequently the case that response to hardware interrupts must be deterministic. A clever idea is to create a small operating system kernel that provides these mechanisms and provides for running a Linux kernel as well, to supply the complete complement of Linux functionality.

Thus, these real-time sub-kernels deliver an API for tasking, interrupt handling and communication with Linux processes. Linux is suspended while the sub-kernel's tasks run or while the sub-kernel is dealing with an interrupt. As a consequence, for example, Linux is not allowed to disable interrupts. Also, these sub-kernels are not complete operating systems. They do not have a full complement of device drivers. They don't provide extensive libraries. They are an addition to Linux, not a standalone operating system.

There is a natural tendency, however, for these sub-kernels to grow in complexity, from software release to software release, as more and more functionality is incorporated. A major aspect of their virtue, though, is that one may still take advantage of all the benefits of Linux in one's application. It is just that the real-time portion of the application is handled separately by the sub-kernel.

Some view this situation as Linux being treated as the lowest priority, or idle, task of the sub-kernel OS. Figure 1 depicts the relationship of the sub-kernel and Linux.



Figure 1. Relationship of the Sub-Kernel and Linux


The sub-kernels are created with Linux by doing three things: 1) patching a Linux kernel to provide the hooks needed for the sub-kernel's added functionality, 2) modifying the interrupt handling and 3) creating loadable modules that provide the bulk of the API and functionality.

Sub-kernels provide an API for use by the real-time tasks. The APIs they provide resemble POSIX threads, other POSIX functions and additional unique functions. Using the sub-kernels means that the real-time tasks are using APIs that may be familiar to Linux programmers, but they are separate implementations and sometimes differ.

Interrupt handling is modified by patching the kernel source tree. The patches change the functions, for example, that are ordinarily used to disable interrupts. Thus, when the kernel and drivers in the Linux sub-tree are recompiled, they will not actually be able to disable interrupts. It is important to note this change because it means, for example, that a driver compiled separately from these modified headers may actually disable interrupts and thwart the real-time technique. Additionally, nonstandard code that, say, simply inlines an interrupt-disabling assembly language instruction will likewise thwart it. Fortunately, in practice, these are not likely situations and certainly can be avoided. They are examples to reinforce the idea that no real-time solution is completely free from caveats.
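
The flavor of the interrupt-handling change is roughly the following (a schematic of the soft-masking idea with invented names, not actual RTLinux or RTAI source):

/* Linux's interrupt disable/enable become flag operations; hardware
 * interrupts stay enabled and reach the sub-kernel first. */
static volatile int linux_irqs_enabled = 1;
static volatile unsigned long pending_irqs;

static void handle_rt_tasks(int irq)   { (void)irq; /* real-time work */ }
static void dispatch_to_linux(int irq) { (void)irq; /* Linux's handler */ }

#define soft_cli() (linux_irqs_enabled = 0)  /* what Linux's cli() becomes */
#define soft_sti() (linux_irqs_enabled = 1)  /* plus replay of pending_irqs */

static void subkernel_irq(int irq)
{
    handle_rt_tasks(irq);                /* deterministic work runs first */
    if (linux_irqs_enabled)
        dispatch_to_linux(irq);
    else
        pending_irqs |= 1UL << irq;      /* delivered when Linux re-enables */
}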

RTLinux and RTAI

The two most commonly used sub-kernels are RTLinux and RTAI. Both RTLinux and RTAI are designed for hard real time. They are much more (and a little less) than just a preemptible kernel. In practical terms, a real-time operating system provides convenience to developers. RTLinux and RTAI provide a wealth of additional real-time-related functions. RTAI, for example, provides rate-monotonic scheduling and earliest-deadline-first scheduling, in addition to conventional priority scheduling.

The sub-kernels provide both POSIX and proprietary functions, as well as functions to create tasks, disable/enable interrupts and provide synchronization and communication. When using RTLinux or RTAI, a developer uses a new API in addition to their POSIX functions.

Both RTLinux and RTAI furnish some support for working with user-space processes. This is important because a real-time application for Linux naturally will want to make use of the functionality of Linux. RTLinux provides support for invoking a signal handler in a user-space process, in addition to FIFOs and shared memory that can be read and written in both kernel and user space. RTAI provides FIFOs, shared memory and a complete, hard real-time mechanism, called LXRT, that can be used in user space.
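
For example, the Linux side of a FIFO is an ordinary device file; here is a sketch of a user-space reader (our code, using the conventional /dev/rtf0 device name):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char buf[64];
    ssize_t n;
    int fd = open("/dev/rtf0", O_RDONLY);    /* FIFO 0, written by the RT task */
    if (fd < 0) { perror("open /dev/rtf0"); return 1; }
    while ((n = read(fd, buf, sizeof buf)) > 0)
        fwrite(buf, 1, (size_t)n, stdout);   /* hand samples to ordinary Linux code */
    close(fd);
    return 0;
}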

These mechanisms, though, don't make the Linux kernel real time. A user-space process still must avoid system calls because they may block in the kernel. Also, it seems that neither RTLinux nor RTAI has been enhanced to work with a preemptible kernel. Since the two approaches are both beneficial, and mostly orthogonal, perhaps they will be combined in the near future. This may be likely since the Love patches are now part of the standard 2.5 kernel tree and perhaps will be part of the stable 2.6 kernel whenever it is released.

Some Thoughts on the Choices

For a developer requiring real-time enhancements, choosing among RTLinux, RTAI, the Love preemptible kernel and the TimeSys preemptible kernel, there are a myriad of issues. Let's highlight a few that many developers value.
  • Which are maintained in an open-source manner where independent outsiders have contributed? RTAI and Love. 

  • Which have a software patent for their underlying technique? RTLinux. 

  • Which are part of the 2.5 Linux kernel tree? Love. 

  • Which have additional real-time capabilities besides preemptibility? TimeSys, RTAI and RTLinux. 

  • Which are positioned such that one can reasonably assume that the solution will continue to be freely available for free download? RTAI and Love (in my humble opinion). 

  • Which give control over interrupts and are likely to provide near-machine-level resolution responsiveness? RTLinux and RTAI.

Kernel Availability for Different Processors

None of these real-time approaches are available for every CPU on which Linux runs; extra effort is required to adapt the solution to a new processor. But, as the four solutions we examine here have quite active development, it is safe to assume that support for additional CPUs is at least contemplated.

As a snapshot, the Love preemptible kernel is currently only available for x86, but with MontaVista's support it is likely to be ported to most, if not all, of the CPUs that MontaVista supports. That includes PowerPC, ARM, MIPS, SuperH, etc. The TimeSys kernel is currently available for PowerPC, ARM, SuperH and Pentium. RTLinux is available for x86 and PowerPC. RTAI is available for x86 and PowerPC.

Benchmarks

You may download the benchmark programs from the Web (see the Resources Sidebar under K Computing Benchmarks). All of our benchmarks were run on a 465MHz Celeron. Other x86 CPUs, however, have produced similar results. We have not benchmarked on other kinds of CPUs.

We benchmarked the Red Hat 7.2 kernel, which is based on Linux kernel 2.4.7; the TimeSys Linux 3.0 kernel, which is based on Linux 2.4.7; and a kernel patched with Robert Love and MontaVista's preemption patch for Linux kernel 2.4.18. We will refer to these kernels as Red Hat, TimeSys and Love, respectively. We separately benchmarked the RTAI and RTLinux kernels.

The benchmark consisted of timing the precision of a call to nanosleep(). Sleeping for a precise amount of time closely relates to the kernel's ability to serve user-space, real-time processes reliably. The Linux nanosleep() function allows one to request sleeps in nanosecond units. Our benchmark requests a 50 millisecond sleep. Interestingly, a request to nanosleep() to sleep N milliseconds reliably sleeps 10 + N milliseconds. Thus, we measure jitter with respect to how close the sleep was to 60 milliseconds. One should also note that nanosleep() busy-waits in the kernel when the requested sleep is two milliseconds or less; we chose a longer sleep because a busy wait would not simulate interrupt response time as well as a true sleep does.

The benchmark program takes 1,000 samples. The last 998 are used in the graph; the first two are discarded to avoid slowdowns from a cold cache. The benchmark program was locked into memory via mlockall() and given the highest FIFO priority via sched_setscheduler() and sched_get_priority_max().

The heart of our benchmark is:

t1 = get_cycles();               /* cycle count before the sleep */
nanosleep(fifty_ms, NULL);       /* request a 50ms sleep */
t2 = get_cycles();               /* cycle count after waking */
jitter[i] = t2 - t1;             /* elapsed cycles for sample i */

The get_cycles() function is a machine-independent way to read the CPU's cycle counter. On x86 machines it reads the timestamp counter (TSC). The TSC increments at the rate of the CPU. Thus, on a 500MHz CPU, the TSC increments 500,000,000 times per second. The frequency of the CPU is determined by examining the CPU speed value listed in /proc/cpuinfo. The read of the TSC takes on the order of about ten instruction times and is extremely precise in comparison to the interval we are timing.
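
On x86, get_cycles() amounts to a rdtsc instruction; a user-space equivalent, for GCC on x86, is:

/* Read the x86 timestamp counter (TSC) from user space. */
static inline unsigned long long rdtsc(void)
{
    unsigned int lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((unsigned long long)hi << 32) | lo;
}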

The difference, in milliseconds, from our expected sleep time of 50 + 10 milliseconds for a given value of jitter, is calculated as

diff = (jitter / KHz) - 10 - 50;   /* cycles divided by CPU kHz gives milliseconds */

The five benchmarks used the stress tests of Benno Senoner, which are part of his latency test benchmark. These tests stress the system by copying a disk file, reading a disk file, writing a disk file, reading the /proc filesystem and performing the X11perf test. The graphs of the three kernels for these loads are shown in Figures 2-6.


Figure 2. Copying a Disk File



Figure 3. Reading a Disk File



Figure 4. Writing a Disk File



Figure 5. Reading the /proc Filesystem



Figure 6. Performing the X11perf Test


Since the Red Hat kernel is clearly much less responsive than the Love or TimeSys kernels, we separately graph just the Love and TimeSys kernel results. These are depicted in Figures 7-11.


Figure 7. Love and TimeSys Kernels: Copying a Disk File


Figure 8. Love and TimeSys Kernels: Reading a Disk File



Figure 9. Love and TimeSys Kernels: Writing a Disk File



Figure 10. Love and TimeSys Kernels: Reading the /proc Filesystem



Figure 11. Love and TimeSys Kernels: Performing the X11perf Test


It is apparent from the graphs that the preemptible kernels provide significant improvement in responsiveness. Because they represent much improved performance, without a change in the API required to be used by an application, they are clearly attractive choices for embedded Linux developers.

RTLinux and RTAI Benchmarks

One justifiably expects that RTAI and RTLinux will provide rock-solid performance even under great loads. They meet these expectations, as evidenced by our benchmarks. One must remember, though, that there are still a few caveats. Some issues to keep in mind that can thwart real-time performance: perform no blocking operations, such as memory allocation; don't use any drivers that haven't been patched to avoid truly disabling interrupts; and avoid costly priority inversions.

To benchmark RTAI and RTLinux we created a periodic task and measured its timing performance against the requested periodic rate. The worst-case performance for both RTLinux and RTAI is on the order of 30 microseconds or less. Our benchmark programs are available for free download (see the Resources Sidebar under K Computing Benchmarks).
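
The shape of such a periodic task, in RTAI's kernel-module API, is roughly as follows (written from memory of the RTAI 24.x interfaces; consult the RTAI headers for exact signatures):

#include <rtai.h>
#include <rtai_sched.h>

static RT_TASK task;

static void periodic_fn(int arg)
{
    (void)arg;
    while (1) {
        /* ... read the clock, record deviation from the period ... */
        rt_task_wait_period();           /* sleep until the next period */
    }
}

int init_rt_benchmark(void)              /* called from the module's init */
{
    RTIME period = nano2count(1000000);  /* 1ms period */
    rt_set_periodic_mode();
    start_rt_timer(period);
    rt_task_init(&task, periodic_fn, 0, 4096, 0, 0, NULL);
    rt_task_make_periodic(&task, rt_get_time() + period, period);
    return 0;
}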

Resources







About the author: Kevin Dankwardt is founder and CEO of K Computing, a training and consulting firm in Silicon Valley. In particular, his organization develops and delivers embedded and real-time Linux training worldwide. 



Copyright © 2002 Specialized Systems Consultants, Inc. All rights reserved. Embedded Linux Journal Online is a cooperative project of Embedded Linux Journal and LinuxDevices.com. 


实时和Linux(1)
本文作者:
康华:计算机硕士,主要从事Linux操作系统内核、Linux技术标准、计算机安全、软件测试等领域的研究与开发工作,现就职于信息产业部软件与集成电路促进中心所属的MII-HP Linux软件实验室。如果需要可以联系通过kanghua151@msn.com联系他。


The operating environment is the operating system together with the processes running on it, the interrupt activity and the activity of hardware devices such as disks. We would like a robust operating environment for real-time applications, one that lets us run any number and variety of tasks at once while still guaranteeing the performance of the real-time ones.
If, for a given operating environment, we can determine the worst case (the maximum latency) for a given response or requirement, we say the environment is deterministic; if we cannot bound the worst case, it is nondeterministic. Real-time applications require determinism, so a real-time operating system must provide a deterministic operating environment.
Nondeterminism is usually caused by algorithms that do not run in constant time. For example, if an operating system's scheduler must traverse its entire run queue to select the next task to run, the algorithm is linear, O(n): the larger n (the number of tasks on the run queue), the longer the search takes. An O(n) algorithm has no fixed upper bound on execution time, so if response time depends on how long a sleeping process takes to be woken and selected to run, and the scheduler is O(n), the worst case cannot be determined. This happens to be a characteristic of the Linux scheduler.
On a general-purpose system the designer cannot control how many processes users create, but on an embedded system the number of processes can be limited through characteristics of the system, such as its user interface, so the environment can guarantee the scheduler bounded execution time. In other words, we can configure the operating environment to obtain a measure of determinism. A system may also need appropriate priorities and other properties, but as long as scheduling time is bounded, response time is bounded too.
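To make the O(n) point concrete, here is a toy sketch (not the actual 2.4 scheduler code) of a linear run-queue scan; its worst-case running time grows with the number of runnable tasks:

struct task {
    int prio;             /* higher value = more urgent, in this toy model */
    struct task *next;
};

/* O(n): every runnable task must be visited to find the best one,
   so the time bound grows with the length of the run queue. */
struct task *pick_next(struct task *runqueue)
{
    struct task *best = runqueue;
    for (struct task *t = runqueue; t != NULL; t = t->next)
        if (t->prio > best->prio)
            best = t;
    return best;
}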
A visual simulation, by contrast, may only require an average frame rate, say 60 frames/s. As long as frames are not dropped too often and the average holds over some interval, performance is acceptable.
The paint nozzle and the average frame rate are examples of what we call hard real time and soft real time, respectively. Hard real-time applications must have their deadlines met, otherwise something unforgivable happens: something blows up, something crashes, an operation fails, perhaps someone dies. Soft real-time applications usually should meet their deadlines, but the system is still considered acceptable if deadlines are occasionally missed by small amounts.
Consider another example. Say we build a robotic mole to help scientists study animal behavior. Careful observation tells us that from the moment a seal pokes through the ice hole, the mole has 600ms to flee or be eaten. Will our robotic mole survive if it escapes within 600ms on average? Perhaps, but only if the seal's attack speed varies in step with the mole's reaction time. Would you design the mole around that assumption? We know the seal can catch the mole within a certain distance of the hole, so the mole must get beyond that range within 600ms; some would call that the deadline.
An operating environment suitable for hard real-time applications must guarantee that every deadline is strictly met, which means everything in the operating system must be deterministic. An environment intended for soft real-time applications usually tolerates occasional delays, but those delays must still be bounded.
Applications have quantitative as well as qualitative requirements. For a visual simulation, the qualitative requirement is that the system respond fast enough to look natural; the quantitative form of that requirement might be that a frame be drawn within 33.3ms of user input. That is, if a pilot moves the control stick in a flight simulator, the out-the-window view must be updated along the new flight path within 33.3ms. Where does 33.3ms come from? From the human perceptual system: at that rate the simulated scene appears smooth to a person.
Real time does not mean the time requirement is short; it means there is a time requirement. If we changed the drawing requirement from 33.3ms to 33.3 seconds, the system would still be real time. What differs is the means needed to satisfy the requirement: on Linux, 33.3ms calls for a special kernel and related functions, while 33.3 seconds can be met with the standard kernel.
So raw speed alone does not make a system real time, and vice versa, although fast response is a basic trait of real-time operating systems. This helps distinguish a real-time operating system from a real-time application: a real-time application has time-related requirements; a real-time operating system can guarantee the performance of real-time applications.
In practice, a general-purpose operating system such as Linux, given a sensibly configured operating environment, provides enough to meet relatively long deadlines. This is behind the frequent claim that processors are now so fast that real-time operating systems are no longer needed; but that holds only for relatively undemanding projects.
Understand, however, that with an improperly constrained environment, even a system that has passed extensive testing without trouble may still miss deadlines, because hazards such as linear-time algorithms can lurk undiscovered in the code.
Another thing to beware of is the audience effect: as the saying goes, the more important the audience, the more likely the demo will fail. Examples abound. As with unreproducible bugs, the usual culprit is a race condition: a situation whose outcome depends on the relative speeds of tasks or of the outside world. All real-time systems, by their nature, harbor potential race conditions; in a well-designed system they occur only near the deadlines. Testing tasks in isolation cannot prove that race conditions have been avoided.
Because most activity in an operating system is a response to requests rather than spontaneous, many latency-producing activities can be avoided. To let a given application (process) meet its deadlines, the competitors for the CPU, such as disk I/O, system calls and interrupts, should be controlled. Building a suitably constrained operating environment involves many factors; the characteristics of the operating system and its device drivers come first. An operating system may block interrupts or execute system calls that cannot be preempted; such activity can be nondeterministic and thus impose unacceptable delays on applications.
Building a constrained environment on a real-time operating system is much simpler than on a general-purpose one.
Is Linux Real Time?
Unless stated otherwise, this discussion refers to the 2.4.9 kernel, released in August 2001. Most of it also applies to the other kernel releases of the past few years.
Real-time applications require certain features from the operating system; others are merely nice to have. A list appears in the comp.realtime FAQ, which says the OS should be multithreaded and preemptible, threads should have a range of priorities, and the kernel should provide predictable synchronization mechanisms. The Linux kernel is not preemptible (kernels from 2.6 onward are).
The FAQ further says one should know the OS's behavior during interrupts, how long system calls take, and the maximum time for which interrupts are disabled. One should also know the interrupt levels and device-driver IRQs, as well as the maximum time interrupt handlers take. In the benchmarking section below we give some figures for interrupt response and for interrupt-disable ("interrupts blocked") times.
Many developers are also interested in deadline scheduling; kernel preemption with microsecond latencies; hundreds of priority levels; user-space interrupt handling and DMA; priority inheritance in synchronization mechanisms; microsecond timer resolution; the complete set of POSIX 1003.1b functions; and O(1) (constant-time) algorithms for the scheduler, exit() and the like. The standard kernel and the GNU C library alone cannot satisfy these requirements.
Moreover, in practice the magnitude of latency matters. With relatively simple measures to constrain the environment, the Linux kernel can keep worst-case response time under about 50ms, with average response of just a few milliseconds.
Andrew Morton suggests, for example, that one should not scroll the framebuffer, run hdparm (the disk-tuning utility), use blkdev_close or switch virtual consoles (see Resources). These are all examples of constraining the operating environment.
Some applications, however, may require response times under 25ms, which the Linux kernel's facilities cannot deliver; such short deadlines must be guaranteed by mechanisms beyond the standard kernel.
In practice, hard real-time operating systems not only ensure deterministic response; they also tend to respond faster than general-purpose systems such as Linux.
We do not consider Linux a real-time operating system, because it cannot guarantee deterministic performance and because its average and worst-case timings are far slower than most real-time applications demand. Note that the speed of most real-time applications is not limited by hardware: Linux on a standard x86 PC may have response times of several milliseconds, while a real-time OS on the same hardware might respond twenty times faster.
The Linux kernel's comparatively poor performance on a uniprocessor has two main causes: the kernel disables interrupts, and the kernel is not preemptible. While interrupts are disabled, the system does not take note of arriving interrupts; the longer interrupts are delayed, the longer an application must expect to wait for its response. The lack of preemptibility means the kernel cannot preempt itself: if a low-priority process is inside a system call when a high-priority process wakes, the switch must wait until the call completes, so switch latency can be long. On SMP systems, the locks and semaphores the kernel uses add further delays.
Writing Real-Time Applications
Real-time programs in user space need services from the Linux kernel: scheduling, interprocess communication, ways to improve performance and so on. Let's look at the system calls (the services the kernel provides to user programs) of most help to real-time application developers; they can be used to constrain the operating environment.
The Linux kernel has 208 system calls. Programs usually invoke them indirectly through library routines, which by convention bear the same names but occasionally map to different calls; for example, the signal routine in the GNU C library (version 2.2.3) maps to the sigaction system call.
Real-time applications may use nearly all the system calls. The ones of most interest are exit(2), fork(2), exec(2), kill(2), pipe(2), brk(2), getrusage(2), mmap(2), setitimer(2), ipc(2) (in its three forms: semget(), shmget() and msgget()), clone(), mlockall(2) and sched_setscheduler(2). Most of these are described in Advanced Programming in the UNIX Environment by W. Richard Stevens or in POSIX.4: Programming for the Real World by Bill O. Gallmeister. The clone() call is Linux-specific; most of the others are compatible with their UNIX counterparts, but read the man pages carefully for subtle differences.
Real-time applications on Linux also lean on the POSIX threads calls, such as the pthread_create() and pthread_mutex_lock() routines. Implementations exist for Linux; the most commonly used comes with the GNU C library. These LinuxThreads functions are built on the clone() system call, and the threads are scheduled by the Linux scheduler. Some POSIX functions are valid only for POSIX threads (e.g., sem_wait()) and cannot be used with ordinary Linux processes.
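For instance, a real-time thread can be given a FIFO priority at creation time. A minimal sketch using the standard pthreads attribute calls (error checking omitted; the priority value 50 is an arbitrary choice, and setting it requires superuser privilege or CAP_SYS_NICE):

#include <pthread.h>
#include <sched.h>

static void *rt_worker(void *arg)
{
    /* time-critical work here */
    return NULL;
}

int start_rt_thread(pthread_t *tid)
{
    pthread_attr_t attr;
    struct sched_param sp = { .sched_priority = 50 };

    pthread_attr_init(&attr);
    /* Without EXPLICIT_SCHED the new thread inherits the creator's policy. */
    pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
    pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
    pthread_attr_setschedparam(&attr, &sp);
    return pthread_create(tid, &attr, rt_worker, NULL);
}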
An application running on an ordinary Linux system can be surprisingly slow under unfavorable conditions, because it contends for many resources: synchronization primitives, memory, the CPU, the bus, the CPU cache and interrupt controllers.
An application can reduce resource contention in many ways. For synchronization mechanisms such as mutexes and semaphores: use them less; use priority-inheritance versions; use relatively fast implementations; spend less time in critical sections. Contention for the CPU is governed by priority, and a non-preemptible kernel can be seen as the classic subversion of priority. Contention for the bus is generally brief and not worth great concern, but that depends on your hardware: do you have a device that takes, say, 70ms to respond or that holds the bus? Cache contention is affected by how frequently tasks switch and by heavy use of scattered data or instructions.
What Do We Do?
A real-time application, then, typically gives itself a high priority; locks itself into memory (and does not grow its memory usage); uses lock-free communication where possible; uses cache memory wisely; avoids nondeterministic I/O (e.g., sockets); and runs in a suitably constrained environment, meaning one with limited hardware interrupts, a limited number of processes, curtailed use of system calls by other processes, and no use of kernel-stressing facilities (don't run hdparm, for example).
Some of the system calls a real-time application uses require special privileges. Traditionally that means the superuser: the process is run from a root shell, or its executable has the setuid bit set. There is now also a capability mechanism for this. The relevant capabilities include CAP_IPC_LOCK for locking memory (never mind the "IPC" in the name) and CAP_SYS_NICE for setting real-time priorities.
A real-time process sets its priority with sched_setscheduler(2). The current scheduler implements the standard POSIX policies, SCHED_FIFO and SCHED_RR, with priorities 1 to 99; bigger is better. The POSIX function sched_get_priority_max(2) reports the highest priority allowed for a given policy.
A real-time process should also lock its memory and avoid growing its memory use. Memory is locked with the POSIX-standard mlockall(2), usually with both flags set, MCL_CURRENT | MCL_FUTURE, to lock current memory and any memory acquired later. A real-time process usually should not grow at all; if you can tolerate the latency of growth, make sure newly acquired memory gets locked as well. Be careful about growing the process stack and allocating dynamic memory, and call mlockall(2) before the process enters its time-critical phase. You can use getrusage(2) to check whether your process has incurred page faults. The code below sketches the use of these functions; in real code, check each function's return value, and see the man pages for details.
#include <sched.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

struct sched_param sp;
struct rusage ru_before, ru_after;
int priority, minorfaults, majorfaults;

priority = sched_get_priority_max(SCHED_FIFO);
sp.sched_priority = priority;
sched_setscheduler(getpid(), SCHED_FIFO, &sp);
mlockall(MCL_CURRENT | MCL_FUTURE);
getrusage(RUSAGE_SELF, &ru_before);
. . . // time-critical section
getrusage(RUSAGE_SELF, &ru_after);
minorfaults = ru_after.ru_minflt - ru_before.ru_minflt;
majorfaults = ru_after.ru_majflt - ru_before.ru_majflt;
Benchmarking for Real-Time Applications
Benchmarks exist for most aspects of Linux. Real-time application developers care most about interrupt latency, timer granularity, context-switch time, system-call overhead and kernel preemptibility. Interrupt latency is the time from a device raising an interrupt until the corresponding interrupt handler begins to execute. It can be delayed because the system is busy handling other interrupts or because interrupts are disabled. Linux does not implement interrupt priorities, and most interrupts are disabled while Linux services an interrupt. The delay is usually short, but it can sometimes reach several milliseconds.
The kernel, on the other hand, may block interrupts for long stretches. Andrew Morton's intlat program measures interrupt latency; similarly, his schedlat measures scheduling latency.
Tools for measuring context-switch time are included in the well-known LMbench suite, and others are available (see References 1 and 2). LMbench also reports information about system calls.
Table 1 shows LMbench results for context-switch times. The benchmark was run three times, and the fastest context-switch time for each configuration is reported; the slowest results were never more than 10% above the fastest. Process sizes are in kilobytes and context-switch times in microseconds. The variation in switch times shows that heavy use of cached data makes context switches much more expensive: the reported times include the cost of restoring cache state.
Table 1. Context-Switch Times
As an example of interrupt-off time, consider the results obtained here with hdparm: while hdparm runs, interrupts can be disabled for 2ms at a stretch. Developers can use the intlat mechanism to measure interrupt-off times on a running system. Only in rare situations are interrupts disabled for more than about 100µs; those situations should be eliminated from an embedded system, and they are exactly what Morton's advice above warns against.
Scheduling latency matters even more to real-time developers: the delay before a newly woken, high-priority task resumes running is critical. Because the Linux kernel cannot preempt a low-priority process that is inside a system call in order to run a newly woken high-priority one, it is called a non-preemptible kernel.
Benno Senoner's latency tests show that this delay can reach 100ms or more (see Resources). Evidently, interrupt blocking and scheduling latency can each be long enough to defeat the performance requirements of some applications.
Timer resolution is just as important to many embedded Linux developers. The setitimer(2) function, for example, sets a timer; like other Linux timing functions it has a resolution of 10ms. Thus, if you set a timer to expire in 15ms, it actually fires after about 20ms. In a test that requested a 15ms timer 1,000 times on a quiet system, the average interval until expiry was 19.99ms, the minimum 19.987ms and the maximum 20.042ms.
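You can ask the system for its advertised timer resolution with clock_getres(2); a small sketch:

#include <stdio.h>
#include <time.h>

int main(void)
{
    struct timespec res;

    /* On a 2.4-era kernel this typically reports the 10ms tick;
       the actual expiry error, as measured above, can be larger. */
    if (clock_getres(CLOCK_REALTIME, &res) == 0)
        printf("CLOCK_REALTIME resolution: %ld s %ld ns\n",
               res.tv_sec, res.tv_nsec);
    return 0;
}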


Real Time and Linux (2)
Kevin continues his real-time journey, this time concentrating on how the Linux kernel can be modified to bring real-time performance to applications.
In the January/February 2002 issue of Embedded Linux Journal we examined the fundamental issues of real time and Linux. This time I focus on modifying the Linux kernel to provide real-time performance for applications. So far, the effort has centered on improving the kernel's responsiveness, shortening system response time by reducing preemption latency, which we know is lengthy in Linux.

By modifying the kernel, in some cases merely removing standard-kernel behavior, applications gain faster response without any change or addition to the kernel API. The advantage is clear: ISVs (independent software vendors) need not build different versions of their systems for different real-time requirements. A DVD player, for example, may run more reliably on a modified kernel without having to know the kernel is a modified version.
Background and History
Kernel preemption has been a hot topic since the 2.2 kernel. Paul Barton-Davis and Benno Senoner wrote a letter to Linus Torvalds, subsequently signed by many others, asking that preemption latency be significantly reduced in the 2.4 kernel.
They needed Linux to perform well enough for playing audio, music and MIDI. Senoner developed benchmarking software that tested the 2.2 kernel (and later 2.4) and found worst-case preemption latency as high as 100ms (see Resources). Latency that long is plainly unacceptable for audio applications, where experience shows that only response times of a few milliseconds will do.
Two patches provide substantial improvement in preemption response. Ingo Molnar (of Red Hat) and Andrew Morton (of the University of Wollongong) each developed kernel patches that insert preemption points into long code paths in the kernel. You can find Molnar's patch here and Morton's patch here.
In addition, Morton provides tools for measuring responsiveness, such as the length of time the kernel ignores a rescheduling request; details can be found on his low-latency patch site.
Currently, at least two organizations have developed preemptible kernels, providing a more fundamental and more powerful solution to the kernel-preemption problem.
In the first article of this series (ELJ, January/February 2002), I listed a number of real-time capabilities one might wish Linux to have. They include many priority levels; user-space interrupt handling and DMA; priority inheritance in synchronization mechanisms (priority inheritance can be used to solve the priority-inversion problem: when inversion occurs, the lower-priority task holding the resource is temporarily raised in priority so it can finish quickly and release the resource the higher-priority task needs); microsecond clock resolution; full POSIX 1003.1b functionality; and constant-time algorithms for the scheduler. We touch on these briefly here as well.
Understand that all these performance improvements rely on patching the kernel, and a patched kernel is no longer compatible with other, standard kernels. A preemptible kernel, for example, modifies the spinlock code, so a binary-only driver that does not use the modified spinlocks may fail to preempt properly. Kernel source is therefore generally required, and the kernel must be recompiled. Using modular Linux drivers is an effective way to maintain source compatibility. We disapprove of shipping drivers as compiled binaries without source: it offers no compatibility guarantee, and it runs against the spirit of open source.
Improvements
A variety of kernel improvements can benefit applications transparently. Kernel preemptibility, for example, is improved either by making the kernel preemptible or by adding preemption points; either way, applications gain responsiveness without being modified.
Transparency should also be considered with respect to the kernel itself; that is, does the approach track kernel changes automatically? The preemption-point approach of Molnar and Morton requires measuring scheduling latencies in each new kernel and inserting preemption points in the right places on that basis.
By contrast, a preemptible kernel built on the SMP locking mechanism migrates to new kernels automatically, without modification. Moreover, if preemptibility is implemented through SMP locking, then as kernel developers improve the granularity of SMP locking, preemption granularity improves automatically. We can expect SMP lock granularity to improve steadily, since scaling to more processors demands ever finer locking.
It is precisely this SMP locking that makes kernel preemption feasible in 2.4 and later kernels (it became official in 2.6); earlier kernels lacked SMP locks.
Another important advantage of a preemptible kernel is that code can be preempted without being aware of it. Driver writers, for example, need not write anything special to make their drivers preemptible; driver code can be preempted whenever it does not hold a lock. Thus well-written, SMP-safe drivers (drivers in no danger of concurrent access to shared resources from multiple processors), like the rest of the kernel, benefit from a preemptible kernel automatically. Non-SMP-safe drivers, on the other hand, may not work well with preemptible kernels.
It may turn out, though, that kernel code holds no locks when calling a driver on some paths but not others. In simple tests with MontaVista's preemptible kernel, for example, we found that the read() and write() functions of a dynamically loaded driver could be preempted normally, but init_module(), open() and close() could not. This means a low-priority process executing open() or close() on that driver may delay the preemption by a newly woken high-priority process.
In practice, developers should measure response times, because even with a preemptible kernel we may find code segments that hold locks longer than the application can tolerate.
MontaVista, for example, provides a preemptible kernel in which preemption points have been inserted into code segments that hold locks for a long time, along with measurement tools that let developers gauge the preemptibility of their actual applications and environments.
The purpose of SMP locking is to make kernel reentrance safe: if processes running in parallel need kernel resources, access to those resources must be safe and reliable. The finer the locking granularity, the better competing processes can actually run in parallel, since increased parallelism requires reduced blocking from lock contention (fine-grained locks shrink the opportunity for blocking).
The same concepts apply on a uniprocessor. If one thinks of the I/O system as a separate processor, then running applications in parallel with I/O activity clearly improves throughput. Improved preemptibility means high-priority I/O-bound processes are woken more promptly, which raises throughput. We may pay for this with more context switches and more code executed on kernel critical paths (the paths that touch shared resources and so need locking to prevent race conditions), but even so, overall system throughput can increase.
The advantages of kernel preemption are evident, and the standard kernel was bound to adopt it sooner or later (it now has). Some preemptible-kernel implementations can provide response times of a few milliseconds, and more precise implementations bring that down to fractions of a millisecond.
Among embedded Linux vendors, MontaVista and TimeSys provide preemptible kernels; REDSonic uses preemption points; LynuxWorks and Red Hat offer RTLinux; Lineo offers RTAI; and OnCore provides a Linux-system-call-compatible API (as LynuxWorks does with LynxOS) on top of its preemptible microkernel, on which a (preemptible) Linux kernel can be run for real-time purposes.

Preemption Points
The idea behind preemption points is to call the scheduler at specific spots to check whether a higher-priority task has become ready and should run immediately. Molnar and Morton did exactly this: they measured the execution times of kernel code paths, found the paths that ran too long and inserted scheduling checks to break them up. You can find where the preemption points go by reading the preemption patch code or by comparing a patched kernel with the unpatched one. A preemption point looks much like: if (current->need_resched) schedule();
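Schematically, a preemption point inserted into a long kernel loop looks something like the fragment below (a sketch in the style of the 2.4 low-latency patches, not their exact code; process_item() is a hypothetical stand-in for the long-running work):

/* A long-running kernel code path, with an explicit preemption
   point so a newly woken, higher-priority task need not wait for
   the whole loop to finish. */
for (i = 0; i < nr_items; i++) {
    process_item(i);               /* hypothetical long work */
    if (current->need_resched)     /* a higher-priority task is ready */
        schedule();                /* yield; resume here afterward */
}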
To use Andrew Morton's preemption-point kernel patch, download the patch from the URL given above and the matching kernel source from kernel.org, then patch the kernel and rebuild. Detailed directions can be found here, but note they were written for older 2.4 kernels; you may also need to update your development environment.
Using Molnar's patch works the same way: download the patch and build a new kernel. Morton has patches for several 2.4 kernels; Molnar's patch targets some 2.2 kernels and early 2.4 kernels.

Preemptible Kernels
A preemptible kernel allows a user program to be preempted in the middle of a system call so that a newly woken, higher-priority process can run. Preemption cannot be safe at arbitrary points in the kernel, however; in particular it must not occur inside a critical section, a sequence of instructions that no more than one process may execute at a time. In the Linux kernel such sections are protected by spinlocks.
MontaVista and TimeSys take similar approaches to building a preemptible kernel. They cleverly extend the spinlock primitives so that they also forbid preemption (the 2.6 kernel's spinlocks do the same). With this approach, preemption can occur anywhere no spinlock is held: when a higher-priority process wakes, the scheduler can preempt a lower-priority process inside a system call, as long as the code it is executing holds no spinlock, since holding a spinlock implies preemption is forbidden.
Also, with a preemptible kernel, breaking a lock (releasing and reacquiring it) to allow rescheduling is simpler than with the preemption-point (low-latency) patches: if the kernel releases a lock and then reacquires it, a preemption check occurs when the lock is released. Some places in the kernel, a loop for instance, need a lock but need not hold it continuously; the lock can be released and reacquired on each pass through the loop.
MontaVista implements preemption mainly with a counter (in 2.6 it is called the preemption count, and it tracks lock nesting). The counter is incremented whenever a spinlock is taken. When a higher-priority process wakes, the scheduler checks the preemption count; if it is zero, preemption may occur. With this counter, preemption works correctly across nested locks, but no critical section holding any spinlock can be preempted, even when the lock protects unrelated resources.
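A schematic of the counter approach, loosely modeled on what later became preempt_count in 2.6 (this is not MontaVista's actual code):

/* Each CPU keeps a nesting counter; preemption is legal only at zero. */
static int preempt_count = 0;

#define preempt_disable()  do { preempt_count++; } while (0)
#define preempt_enable()                                 \
    do {                                                 \
        preempt_count--;                                 \
        if (preempt_count == 0 && current->need_resched) \
            schedule();       /* safe: no spinlocks held */ \
    } while (0)

/* spin_lock() increments the counter and spin_unlock() decrements it,
   so any held spinlock automatically forbids preemption. */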
TimeSys instead uses a priority-inheritance mutex. With this mechanism, a high-priority process may preempt a low-priority process that holds a mutex on a different resource. And because the mutexes provide priority inheritance, a low-priority process holding a mutex cannot indefinitely postpone a high-priority process waiting on it; this solves the priority-inversion problem.
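Standard POSIX exposes the same idea through mutex protocol attributes. A sketch follows; note that TimeSys implements priority inheritance inside the kernel, and this user-level attribute may not have been available in the glibc of the day:

#include <pthread.h>

pthread_mutex_t lock;

void init_pi_mutex(void)
{
    pthread_mutexattr_t attr;

    pthread_mutexattr_init(&attr);
    /* A low-priority holder is boosted to the priority of the
       highest-priority waiter, bounding priority inversion. */
    pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    pthread_mutex_init(&lock, &attr);
    pthread_mutexattr_destroy(&attr);
}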
MontaVista's preemption patch is available from the kpreempt site on SourceForge. MontaVista's open-source spirit deserves applause: it has also made its real-time scheduler and high-resolution timers available on SourceForge, here and here.
The SourceForge kpreempt project also links to Robert Love's preemptible-kernel work (see the April and May 2002 issues of Linux Magazine for details of his kernel modifications). Love now maintains what was MontaVista's patch (he has since joined MontaVista), with MontaVista participating as well. The latest patch can be downloaded here.
Love's latest patches also work with Ingo Molnar's constant-time, O(1), scheduler patch, which is available for 2.4 kernels and has been merged into the 2.5 kernels. TimeSys has posted its preemptible kernel on its web site as well. The complete patches for these preemptible kernels are available; to inspect them, diff them against the 2.4.7 kernel tree they were generated from. All of the preemptible-kernel work mentioned here is licensed under the GPL.
TimeSys also offers real-time developers additional features that are not free downloads, including real-time scheduling and real-time resource-allocation technology. These modules add system calls that, for example, provide access to the new capabilities.
For those interested in the details, there are plenty of threads to pull. The key spinlock changes live in include/linux/spinlock.h, which both MontaVista and TimeSys modify.
Interestingly, although MontaVista and TimeSys rename the old spinlock functions, both still use them: the old functions remain necessary, for example to prevent preemption during the scheduler itself, which would otherwise recurse endlessly. MontaVista uses names like _raw_spin_lock and _raw_read_lock; TimeSys uses names like old_spin_lock and old_spin_lock_irq.
In the TimeSys source, kernel/include/linux/mutex.h shows spinlocks defined in terms of the write_lock() and read_lock() functions, which implement mutex locking. The implementation, do_write_lock(), can be seen in kernel/kernel/mutex.c.
Other Real-Time Approaches in the Kernel
Another effective improvement for real time is finer clock granularity. TimeSys, MontaVista, REDSonic and others provide higher clock resolution. TimeSys, for example, uses the Pentium's time-stamp counter across context switches to keep an accurate account of the CPU time a process uses, figures that functions such as getrusage() depend on.
Many developers, this author included, consider Linux's lack of complete POSIX 1003 support a significant weakness. Fortunately, solutions are appearing; in particular, TimeSys has a rather good implementation.
Beyond its POSIX contributions, TimeSys has developed new resource-access control mechanisms that let real-time applications reserve CPU time or network bandwidth. Combined with its interrupt-threading model, preemptible kernel and other features, these can deliver response two or three times faster than the standard kernel.
So far, Linux does not appear to allow user-space applications to register their own functions as interrupt handlers. This facility, known as user-space interrupt handling, is available in IRIX, SGI's UNIX, among other systems.
Interestingly, SGI's Linux work allows user space to receive interrupts from a real-time clock through its ds1286 real-time clock interface; information can be found here.
Related to user-level interrupt handling is DMA between devices and user space; a patch providing it is available here.
Guarantees
Evidently, no real-time Linux vendor is willing to guarantee response times. A guarantee would read something like this:
Using our Linux kernel, with the required hardware and devices, we guarantee that your application, if locked in memory and running at the highest priority, will be woken within N microseconds of the interrupt from your real-time hardware. If you do not get this behavior, we will treat it as a bug.
We have never seen such a guarantee. Why not? We see several possibilities.
Perhaps the guarantee is meaningless and no customer wants it. But we think many developers would like one; hard real time, after all, implies a guarantee.
Perhaps vendors have not measured their kernels and environments thoroughly enough to offer one. There is a snare here: isolated measurements cannot yield a satisfying guarantee; the code must be examined and tested across all environments, under worst-case conditions. From vendors' claims it appears they have spent considerable effort measuring and studying the code, but in practice engineers may only feel comfortable quoting numbers for particular configurations.
Perhaps there are simply too many aspects of Linux to cover if everything must be guaranteed. That may be the heart of the problem: developers want to modify their kernels, and they want to download drivers and use them. All of this is beyond the vendor's control, so a public guarantee might hold only for one specific system under selected conditions.
Perhaps we will see compromise guarantees, something like "response within 100µs or less on Pentium-class machines", plus whatever time the drivers consume. Driver time matters a great deal, because interrupt-handling code in drivers often dominates response time.
What's Next?
In the third article we will examine approaches outside the Linux kernel for improving real-time performance, discussing the methods employed by RTLinux and RTAI. We will also make a thorough comparison of the various approaches, with benchmarks.
Real Time and Linux (3)
Kevin Dankwardt
In the first two articles (see Real Time and Linux, Part 1, and Real Time and Linux, Part 2: the Preemptible Kernel), we examined the fundamental concepts of real time and ways to make the kernel more responsive. This time we look at two examples of converting an ordinary kernel into a real-time one by introducing a small, independent real-time kernel between the hardware and ordinary Linux. We also use a benchmark suite to compare a desktop/server Linux kernel with the modified kernels.
We do not dig into the proprietary Linux-compatible kernels from LynuxWorks and OnCore Systems. LynuxWorks provides a real-time kernel compatible with the Linux API, while OnCore Systems provides a real-time microkernel that furnishes Linux functionality in a variety of ways, letting users run a Linux kernel, with real-time behavior, on top of the microkernel.
The approaches discussed here address real time for a single CPU. Systems with multiple CPUs may admit other solutions. For example, one may avoid the need for kernel preemption by not scheduling new processes on a CPU where a real-time process is already waiting. A multi-CPU system may also deliver interrupts to one particular CPU and complete interrupt handling on another, avoiding delayed interrupt response.
None of the Linux real-time measures we discuss is really aimed at multi-CPU systems; RTAI, for instance, needs extra functionality to serve them. But we are writing for embedded-system developers, and most embedded Linux devices have a single general-purpose CPU.
What Is a Real-Time Sub-Kernel?
In a typical real-time system, a number of tasks must behave deterministically, and responses to hardware interrupts must often be deterministic as well. There is a clever way to get both: build a small operating-system kernel that provides these deterministic services and also runs the standard Linux kernel, thereby meeting the real-time requirements while still offering everything standard Linux provides.
These real-time sub-kernels implement several sets of API functions: for tasks, for interrupt handlers, for communicating with Linux processes and so on. While a sub-kernel task runs, or while the sub-kernel handles an interrupt, Linux is held off; note that Linux is no longer allowed to truly disable interrupts (standard Linux ordinarily disables some or all interrupts while handling one). Be clear, too, that these sub-kernels are not complete operating-system kernels: they lack a full complement of device drivers and provide no libraries of their own, so they serve as an adjunct to a Linux system rather than as a freestanding operating system.
These sub-kernels have gained more and stronger features, and grown more complex, with each release. Even so, their greatest charm remains that they leave every feature of standard Linux untouched and handle only the portion of an application with real-time requirements.
In a sense, Linux becomes a task of the sub-kernel OS, its lowest-priority or idle task.
Figure 1 depicts the relationship between a sub-kernel and Linux.

Making a sub-kernel work alongside Linux requires three things:
1) patching the Linux kernel to add hooks, so functionality can be inserted;
2) modifying the interrupt handling; and
3) supplying the API and the required functionality in loadable modules.
The sub-kernel provides a set of APIs for real-time tasks. These resemble POSIX threads and other POSIX functions, plus some new ones. The upshot is that although the sub-kernel APIs are implemented independently and may differ slightly from the traditional Linux APIs, Linux developers can create real-time tasks using APIs much like the ones they already know.
The interrupt handling in the kernel source must be modified; for example, the kernel code that disables interrupts is changed so that, once the kernel and drivers are recompiled, that code no longer truly disables them. Watch this change carefully: if a driver is compiled elsewhere, without the modified header files, it may actually disable interrupts and break the real-time behavior. Nonstandard code, such as a simple inline assembly instruction that disables interrupts, can also break real-time performance. Happily, such situations are uncommon in practice and can be avoided. This illustrates, once again, that there is no absolute, foolproof road to real time.
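The classic sub-kernel trick is to replace the real cli/sti instructions with a software flag. A greatly simplified schematic follows (this is not RTLinux's or RTAI's actual code; dispatch_pending_irq_to_linux(), pass_irq_to_linux() and handle_rt_tasks() are hypothetical helpers):

/* Linux's "disable interrupts" becomes setting a flag; real hardware
   interrupts stay enabled, so the sub-kernel never misses one. */
static volatile int linux_irqs_enabled = 1;
static volatile unsigned long pending;   /* bitmask of held-back IRQs */

void soft_cli(void) { linux_irqs_enabled = 0; }

void soft_sti(void)
{
    linux_irqs_enabled = 1;
    while (pending)
        dispatch_pending_irq_to_linux(&pending); /* hypothetical helper */
}

void hard_irq_entry(int irq)
{
    handle_rt_tasks(irq);           /* hypothetical: sub-kernel runs first */
    if (linux_irqs_enabled)
        pass_irq_to_linux(irq);     /* hypothetical helper */
    else
        pending |= 1UL << irq;      /* deliver later, at soft_sti() */
}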
RTLinux and RTAI
The two best-known sub-kernel systems are RTLinux and RTAI. Both are designed for hard real-time requirements, and the techniques they use go well beyond kernel preemption (though in a few respects they are also weaker than a preemptible kernel). In practice they offer developers considerable convenience: both provide many additional real-time functions. RTAI, for example, offers rate-monotonic scheduling and earliest-deadline-first scheduling alongside conventional priority scheduling.
The sub-kernels provide POSIX-specified functions as well as proprietary ones, with APIs for creating tasks, enabling and disabling interrupts, synchronization and communication. Developers using RTLinux or RTAI may use both the POSIX functions and these additional APIs.
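For example, a minimal RTLinux-style periodic task is written as a kernel module. The sketch below follows the documented pthread_make_periodic_np()/pthread_wait_np() API, though header names and details vary by RTLinux version:

#include <rtl.h>
#include <time.h>
#include <pthread.h>

static pthread_t thread;

static void *periodic_task(void *arg)
{
    struct sched_param p = { .sched_priority = 1 };

    pthread_setschedparam(pthread_self(), SCHED_FIFO, &p);
    /* run every 500µs, starting now */
    pthread_make_periodic_np(pthread_self(), gethrtime(), 500000);
    while (1) {
        pthread_wait_np();      /* sleep until the next period */
        /* hard real-time work here */
    }
    return NULL;
}

int init_module(void)
{
    return pthread_create(&thread, NULL, periodic_task, NULL);
}

void cleanup_module(void)
{
    pthread_cancel(thread);
    pthread_join(thread, NULL);
}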
Both RTLinux and RTAI support cooperation between the real-time side and user-space processes, since a Linux real-time application naturally wants to use Linux's own facilities. RTLinux supports invoking signal handlers in user-space processes and lets both kernel and user space read and write FIFOs and shared memory. RTAI likewise provides FIFOs and shared memory, as well as LXRT, a complete hard-real-time mechanism usable from user space.
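On the user-space side, an RT-FIFO simply looks like a device file. A sketch of a reader, assuming the conventional /dev/rtf0 naming and a real-time task feeding FIFO 0:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char buf[64];
    int fd = open("/dev/rtf0", O_RDONLY);   /* FIFO 0, fed by the RT task */

    if (fd < 0)
        return 1;
    for (;;) {
        ssize_t n = read(fd, buf, sizeof(buf)); /* blocks, like any file */
        if (n <= 0)
            break;
        fwrite(buf, 1, (size_t)n, stdout);      /* ordinary Linux I/O */
    }
    close(fd);
    return 0;
}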
Even with these mechanisms, though, the Linux kernel itself does not become real time: a user-space process must still avoid system calls, which may block in the kernel. Note also that neither RTLinux nor RTAI currently works together with a preemptible kernel. The two techniques are orthogonal, however, so they may merge someday, especially now that Love's patch has entered the 2.5 development kernel and is likely to ship in the stable 2.6 release.
Issues in Choosing
Developers who need real-time capability may choose among RTLinux, RTAI, Love's preemptible kernel and TimeSys's preemptible kernel. The criteria are many; here, briefly, are the questions most developers care about:
- Which are open source, maintained by independent free-software developers? RTAI and Love's.
- Which technology is covered by a patent? RTLinux.
- Which is part of the 2.5 kernel tree? Love's.
- Which offer real-time features beyond a preemptible kernel? TimeSys, RTAI and RTLinux.
- Which remain freely downloadable? RTAI and Love's (in my judgment, based on experience).
- Which can control interrupts and thus offer near-machine-level response? RTLinux and RTAI.
Processor Availability
Note that these real-time approaches are not automatically available on every processor Linux runs on; porting to a new processor takes some additional work. Do not worry too much, though: development on all four of the systems and patches above is quite active, and I would wager that support for new CPUs will not be long in coming.
Love's preemptible kernel currently works only on x86 processors, but with MontaVista's backing it is likely to be ported to most, if not all, of the CPUs MontaVista supports, including PowerPC, ARM, MIPS and SuperH. The TimeSys kernel is currently available for PowerPC, ARM, SuperH and Pentium processors. RTLinux runs on x86 and PowerPC, as does RTAI.
The Benchmarks
You can download our benchmark programs from the Web (see K Computing Benchmarks in the Resources). All the benchmarks reported here were run on a 456MHz Celeron; other x86 CPUs give similar results. We do not yet have comparable benchmarks for other CPU families.
We measured these kernels: the Red Hat 7.2 kernel, which is based on Linux 2.4.7; the TimeSys Linux 3.0 kernel, also based on 2.4.7; and kernels patched with Robert Love's and with MontaVista's preemption patch. We call these the Red Hat, TimeSys and Love kernels. The RTAI and RTLinux kernels are benchmarked separately below.
The benchmarks include a measurement of nanosleep() latency. The accuracy of sleep times is central to whether the kernel can reliably serve real-time processes in user space. The Linux nanosleep() function allows sleep requests with nanosecond arguments. Our benchmark requests a 50ms sleep. Interestingly, a nanosleep() request for N milliseconds actually sleeps about N+10ms, so we measure how close the actual sleep comes to the expected 60ms. Note, too, that for requested sleeps of less than two milliseconds, nanosleep() busy-waits inside the kernel, and busy-waiting does not exercise interrupt response the way a true sleep does.
The benchmark takes 1,000 samples and graphs the last 998; the first two are excluded to avoid cache slowdowns resulting from a cold cache. The benchmark locks itself into memory with mlockall() and gives itself the maximum FIFO priority with sched_setscheduler() and sched_get_priority_max().
The core of the benchmark is:
t1 = get_cycles();
nanosleep(fifty_ms, NULL);
t2 = get_cycles();
jitter[i] = t2 - t1;
The get_cycles() function reads the CPU's cycle counter in a machine-independent way. On x86 it reads the time-stamp counter (TSC), which advances at the CPU's clock frequency: on a 500MHz CPU, the TSC advances 500,000,000 times per second.
The CPU frequency can be found in /proc/cpuinfo. Reading the TSC costs only about ten instruction cycles, so we can compare the requested timing with the actual elapsed time quite precisely.
The deviation from the expected sleep time of 50+10 ms is computed as:
diff = (jitter/KHz) - 10 - 50;
where KHz is the CPU clock rate in kilohertz, which is exactly the number of TSC counts per millisecond.
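Putting the pieces together, here is a self-contained approximation of the benchmark. This is our hypothetical reconstruction, not the downloadable original; in particular, KHZ is hard-coded here, whereas it should really be read from /proc/cpuinfo:

#include <sched.h>
#include <stdio.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

#define SAMPLES 1000
#define KHZ     456000.0    /* 456MHz Celeron; read from /proc/cpuinfo */

static inline unsigned long long get_cycles(void)
{
    unsigned int lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi)); /* x86 TSC */
    return ((unsigned long long)hi << 32) | lo;
}

int main(void)
{
    struct sched_param sp;
    struct timespec fifty_ms = { 0, 50000000L };
    static double diff[SAMPLES];

    sp.sched_priority = sched_get_priority_max(SCHED_FIFO);
    sched_setscheduler(getpid(), SCHED_FIFO, &sp);  /* top RT priority */
    mlockall(MCL_CURRENT | MCL_FUTURE);             /* avoid page faults */

    for (int i = 0; i < SAMPLES; i++) {
        unsigned long long t1 = get_cycles();
        nanosleep(&fifty_ms, NULL);
        unsigned long long t2 = get_cycles();
        /* cycles -> ms, minus the expected 50ms + 10ms timer slack */
        diff[i] = (double)(t2 - t1) / KHZ - 10.0 - 50.0;
    }
    for (int i = 2; i < SAMPLES; i++)   /* skip the two cold-cache runs */
        printf("%f\n", diff[i]);
    return 0;
}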
