Home Linux的UIO驱动机制
Post
Cancel

Linux的UIO驱动机制

Preface

For many types of devices, creating a Linux kernel driver is overkill. All that is really needed is some way to handle an interrupt and provide access to the memory space of the device. The logic of controlling the device does not necessarily have to be within the kernel, as the device does not need to take advantage of any of other resources that the kernel provides. One such common class of devices that are like this are for industrial I/O cards.

To address this situation, the userspace I/O system (UIO) was designed. For typical industrial I/O cards, only a very small kernel module is needed. The main part of the driver will run in user space. This simplifies development and reduces the risk of serious bugs within a kernel module.

对于许多类型的设备,创建 Linux 内核驱动程序是多余的。真正需要的是某种方式来处理中断并提供对设备内存空间的访问。控制设备的逻辑不一定必须在内核中,因为设备不需要利用内核提供的任何其他资源。此类常见的一类设备用于工业 I/O 卡。

为了解决这种情况,设计了用户空间 I/O 系统 (UIO)。对于典型的工业 I/O 卡,只需要一个非常小的内核模块。驱动程序的主要部分将在用户空间中运行。这简化了开发并降低了内核模块中出现严重错误的风险。另外,其实还提供了驱动的兼容性,方便移植,因为运行在用户空间了,不需要频繁跟着kernel版本更新了。

请注意,UIO 不是通用驱动程序接口。其他内核子系统(如网络或串行或 USB)已经很好地处理过设备不适合 UIO 驱动程序。非常适合 UIO 驱动程序的硬件满足以下所有要求:

  • 该设备具有可以映射的内存。可以通过写入该内存来完全控制设备
  • 设备通常会产生中断
  • 该设备不适合标准内核子系统之一。(如PCI子系统-PCI设备,USB子系统-USB设备。串口设备,IIC设备等,内核已经提供框架的)

About UIO

如果使用 UIO 作为卡的驱动程序,可以达到以下效果。

  • 只需编写和维护一个小的内核模块
  • 使用个人习惯的所有工具和库在用户空间开发驱动程序的主要部分(内核空间总会保留一小部分)。
  • 驱动程序中的错误不会使内核崩溃
  • 无需重新编译内核即可更新驱动程序

How UIO works

Each UIO device is accessed through a device file and several sysfs attribute files. The device file will be called /dev/uio0 for the first device, and /dev/uio1, /dev/uio2 and so on for subsequent devices.

/dev/uioX is used to access the address space of the card. Just use mmap() to access registers or RAM locations of your card.

Interrupts are handled by reading from /dev/uioX. A blocking read() from /dev/uioX will return as soon as an interrupt occurs. You can also use select() on /dev/uioX to wait for an interrupt. The integer value read from /dev/uioX represents the total interrupt count. You can use this number to figure out if you missed some interrupts.

每个 UIO 设备都通过一个设备文件和几个 sysfs 属性文件来访问。 对于第一个设备,设备文件将被称为 /dev/uio0,对于后续设备,设备文件将被称为 /dev/uio1、/dev/uio2 等等。

/dev/uioX 用于访问卡的地址空间。 只需使用 mmap() 访问卡的寄存器或 RAM 位置。

通过从 /dev/uioX 读取来处理中断。 一旦发生中断,来自 /dev/uioX 的阻塞 read() 将立即返回。 您还可以在 /dev/uioX 上使用 select() 来等待中断。 从 /dev/uioX 读取的整数值表示总中断计数。 您可以使用此数字来确定您是否错过了一些中断。

这里中断处理是否会丢中断?如果中断频繁,就不停轮询?DPDK,SPDK就这样做的




For some hardware that has more than one interrupt source internally, but not separate IRQ mask and status registers, there might be situations where userspace cannot determine what the interrupt source was if the kernel handler disables them by writing to the chip’s IRQ register. In such a case, the kernel has to disable the IRQ completely to leave the chip’s register untouched. Now the userspace part can determine the cause of the interrupt, but it cannot re-enable interrupts. Another cornercase is chips where re-enabling interrupts is a read-modify-write operation to a combined IRQ status/acknowledge register. This would be racy if a new interrupt occurred simultaneously.

To address these problems, UIO also implements a write() function. It is normally not used and can be ignored for hardware that has only a single interrupt source or has separate IRQ mask and status registers. If you need it, however, a write to /dev/uioX will call the irqcontrol() function implemented by the driver. You have to write a 32-bit value that is usually either 0 or 1 to disable or enable interrupts. If a driver does not implement irqcontrol(), write() will return with -ENOSYS.

对于内部有多个中断源但没有单独的 IRQ 掩码和状态寄存器的某些硬件,如果内核处理程序通过写入芯片的 IRQ 寄存器来禁用它们,用户空间可能无法确定中断源是什么。 在这种情况下,内核必须完全禁用 IRQ 以保持芯片寄存器不变。 现在用户空间部分可以确定中断的原因,但不能重新启用中断。 另一个极端情况是重新启用中断的芯片是对组合 IRQ 状态/确认寄存器的读-修改-写操作。 如果同时发生新的中断,这将是不恰当的。

为了解决这些问题,UIO 还实现了 write() 函数。 它通常不使用,对于只有一个中断源或具有单独 IRQ 掩码和状态寄存器的硬件可以忽略。 但是,如果您需要它,写入 /dev/uioX 将调用驱动程序实现的 irqcontrol() 函数。 您必须写入一个通常为 0 或 1 的 32 位值来禁用或启用中断。 如果驱动程序没有实现 irqcontrol(),write() 将返回 -ENOSYS。




To handle interrupts properly, your custom kernel module can provide its own interrupt handler. It will automatically be called by the built-in handler.

为了正确处理中断,您的自定义内核模块可以提供自己的中断处理程序。 它将由内置处理程序自动调用。 ??就是在小的内核模块中补充irqhandler函数??

For cards that don’t generate interrupts but need to be polled, there is the possibility to set up a timer that triggers the interrupt handler at configurable time intervals. This interrupt simulation is done by calling uio_event_notify() from the timer’s event handler.

对于不产生中断但需要轮询的卡,可以设置一个定时器,以可配置的时间间隔触发中断处理程序。 此中断模拟是通过从定时器的事件处理程序调用 uio_event_notify() 来完成的。

Each driver provides attributes that are used to read or write variables. These attributes are accessible through sysfs files. A custom kernel driver module can add its own attributes to the device owned by the uio driver, but not added to the UIO device itself at this time. This might change in the future if it would be found to be useful.

每个驱动程序都提供用于读取或写入变量的属性。 这些属性可通过 sysfs 文件访问。 自定义内核驱动模块可以将自己的属性添加到 uio 驱动拥有的设备中,但此时不能添加到 UIO 设备本身。 如果发现它有用,将来可能会改变。




The following standard attributes are provided by the UIO framework:

  • name: The name of your device. It is recommended to use the name of your kernel module for this.
  • version: A version string defined by your driver. This allows the user space part of your driver to deal with different versions of the kernel module.
  • event: The total number of interrupts handled by the driver since the last time the device node was read.

UIO 框架提供了以下标准属性:

  • 名称:您的设备的名称。 建议为此使用内核模块的名称。

  • version:驱动程序定义的版本字符串。 这允许驱动程序的用户空间部分处理不同版本的内核模块。

  • event:自上次读取设备节点以来驱动程序处理的中断总数。




These attributes appear under the /sys/class/uio/uioX directory. Please note that this directory might be a symlink, and not a real directory. Any userspace code that accesses it must be able to handle this.

Each UIO device can make one or more memory regions available for memory mapping. This is necessary because some industrial I/O cards require access to more than one PCI memory region in a driver.

Each mapping has its own directory in sysfs, the first mapping appears as /sys/class/uio/uioX/maps/map0/. Subsequent mappings create directories map1/, map2/, and so on. These directories will only appear if the size of the mapping is not 0.

Each mapX/ directory contains four read-only files that show attributes of the memory:

  • name: A string identifier for this mapping. This is optional, the string can be empty. Drivers can set this to make it easier for userspace to find the correct mapping.
  • addr: The address of memory that can be mapped.
  • size: The size, in bytes, of the memory pointed to by addr.
  • offset: The offset, in bytes, that has to be added to the pointer returned by mmap() to get to the actual device memory. This is important if the device’s memory is not page aligned. Remember that pointers returned by mmap() are always page aligned, so it is good style to always add this offset.

这些属性出现在 /sys/class/uio/uioX 目录下。 请注意,此目录可能是符号链接,而不是真实目录。 任何访问它的用户空间代码都必须能够处理这个问题。

每个 UIO 设备可以使一个或多个内存区域可用于内存映射。 这是必要的,因为某些工业 I/O 卡需要访问驱动程序中的多个 PCI 内存区域。

每个映射在 sysfs 中都有自己的目录,第一个映射显示为 /sys/class/uio/uioX/maps/map0/。 后续映射会创建目录 map1/、map2/ 等。 这些目录只有在映射的大小不为 0 时才会出现。

每个 mapX/ 目录包含四个显示内存属性的只读文件:

  • name:此映射的字符串标识符。 这是可选的,字符串可以为空。 驱动程序可以设置它以使用户空间更容易找到正确的映射。
  • addr:可以映射的内存地址。
  • size:addr 指向的内存大小,以字节为单位。
  • offset:偏移量,以字节为单位,必须添加到 mmap() 返回的指针以获取实际设备内存。 如果设备的内存不是页面对齐的,这很重要。 请记住,mmap() 返回的指针始终是页面对齐的,因此始终添加此偏移量是一种不错的方式。

From userspace, the different mappings are distinguished by adjusting the offset parameter of the mmap() call. To map the memory of mapping N, you have to use N times the page size as your offset:

从用户空间,通过调整 mmap() 调用的 offset 参数来区分不同的映射。 要映射映射 N 的内存,您必须使用 N 倍的页面大小作为您的偏移量:

1
offset = N * getpagesize();

Sometimes there is hardware with memory-like regions that can not be mapped with the technique described here, but there are still ways to access them from userspace. The most common example are x86 ioports. On x86 systems, userspace can access these ioports using ioperm(), iopl(), inb(), outb(), and similar functions.

Since these ioport regions can not be mapped, they will not appear under /sys/class/uio/uioX/maps/ like the normal memory described above. Without information about the port regions a hardware has to offer, it becomes difficult for the userspace part of the driver to find out which ports belong to which UIO device.

To address this situation, the new directory /sys/class/uio/uioX/portio/ was added. It only exists if the driver wants to pass information about one or more port regions to userspace. If that is the case, subdirectories named port0, port1, and so on, will appear underneath /sys/class/uio/uioX/portio/.

Each portX/ directory contains four read-only files that show name, start, size, and type of the port region:

  • name: A string identifier for this port region. The string is optional and can be empty. Drivers can set it to make it easier for userspace to find a certain port region.
  • start: The first port of this region.
  • size: The number of ports in this region.
  • porttype: A string describing the type of port.

有时有些硬件的内存区域无法使用此处描述的技术进行映射,但仍有一些方法可以从用户空间访问它们。 最常见的例子是 x86 ioports。 在 x86 系统上,用户空间可以使用 ioperm()、iopl()、inb()、outb() 和类似函数访问这些 ioport。

由于这些 ioport 区域无法映射,因此它们不会像上面描述的普通内存那样出现在 /sys/class/uio/uioX/maps/ 下。 如果没有有关硬件必须提供的端口区域的信息,驱动程序的用户空间部分就很难找出哪些端口属于哪个 UIO 设备。

为了解决这种情况,添加了新目录 /sys/class/uio/uioX/portio/。 仅当驱动程序想要将有关一个或多个端口区域的信息传递给用户空间时才存在。 如果是这种情况,名为 port0、port1 等的子目录将出现在 /sys/class/uio/uioX/portio/ 下。

补充:这部分现在应该用不到了。

Writing your own kernel module

Please have a look at uio_cif.c as an example. The following paragraphs explain the different sections of this file.

struct uio_info

This structure tells the framework the details of your driver, Some of the members are required, others are optional.

  • const char *name: Required. The name of your driver as it will appear in sysfs. I recommend using the name of your module for this.
  • const char *version: Required. This string appears in /sys/class/uio/uioX/version.
  • struct uio_mem mem[ MAX_UIO_MAPS ]: Required if you have memory that can be mapped with mmap(). For each mapping you need to fill one of the uio_mem structures. See the description below for details.
  • struct uio_port port[ MAX_UIO_PORTS_REGIONS ]: Required if you want to pass information about ioports to userspace. For each port region you need to fill one of the uio_port structures. See the description below for details.
  • long irq: Required. If your hardware generates an interrupt, it’s your modules task to determine the irq number during initialization. If you don’t have a hardware generated interrupt but want to trigger the interrupt handler in some other way, set irq to UIO_IRQ_CUSTOM. If you had no interrupt at all, you could set irq to UIO_IRQ_NONE, though this rarely makes sense.
  • unsigned long irq_flags: Required if you’ve set irq to a hardware interrupt number. The flags given here will be used in the call to request_irq().
  • int (*mmap)(struct uio_info *info, struct vm_area_struct *vma): Optional. If you need a special mmap() function, you can set it here. If this pointer is not NULL, your mmap() will be called instead of the built-in one.
  • int (*open)(struct uio_info *info, struct inode *inode): Optional. You might want to have your own open(), e.g. to enable interrupts only when your device is actually used.
  • int (*release)(struct uio_info *info, struct inode *inode): Optional. If you define your own open(), you will probably also want a custom release() function.
  • int (*irqcontrol)(struct uio_info *info, s32 irq_on): Optional. If you need to be able to enable or disable interrupts from userspace by writing to /dev/uioX, you can implement this function. The parameter irq_on will be 0 to disable interrupts and 1 to enable them.

Usually, your device will have one or more memory regions that can be mapped to user space. For each region, you have to set up a struct uio_mem in the mem[] array. Here’s a description of the fields of struct uio_mem:

  • const char *name: Optional. Set this to help identify the memory region, it will show up in the corresponding sysfs node.
  • int memtype: Required if the mapping is used. Set this to UIO_MEM_PHYS if you have physical memory on your card to be mapped. Use UIO_MEM_LOGICAL for logical memory (e.g. allocated with __get_free_pages() but not kmalloc()). There’s also UIO_MEM_VIRTUAL for virtual memory.
  • phys_addr_t addr: Required if the mapping is used. Fill in the address of your memory block. This address is the one that appears in sysfs.
  • resource_size_t size: Fill in the size of the memory block that addr points to. If size is zero, the mapping is considered unused. Note that you must initialize size with zero for all unused mappings.
  • void *internal_addr: If you have to access this memory region from within your kernel module, you will want to map it internally by using something like ioremap(). Addresses returned by this function cannot be mapped to user space, so you must not store it in addr. Use internal_addr instead to remember such an address.

Please do not touch the map element of struct uio_mem! It is used by the UIO framework to set up sysfs files for this mapping. Simply leave it alone.

Sometimes, your device can have one or more port regions which can not be mapped to userspace. But if there are other possibilities for userspace to access these ports, it makes sense to make information about the ports available in sysfs. For each region, you have to set up a struct uio_port in the port[] array. Here’s a description of the fields of struct uio_port:

  • char *porttype: Required. Set this to one of the predefined constants. Use UIO_PORT_X86 for the ioports found in x86 architectures.
  • unsigned long start: Required if the port region is used. Fill in the number of the first port of this region.
  • unsigned long size: Fill in the number of ports in this region. If size is zero, the region is considered unused. Note that you must initialize size with zero for all unused regions.

Please do not touch the portio element of struct uio_port! It is used internally by the UIO framework to set up sysfs files for this region. Simply leave it alone.

Adding an interrupt handler

What you need to do in your interrupt handler depends on your hardware and on how you want to handle it. You should try to keep the amount of code in your kernel interrupt handler low. If your hardware requires no action that you have to perform after each interrupt, then your handler can be empty.

If, on the other hand, your hardware needs some action to be performed after each interrupt, then you must do it in your kernel module. Note that you cannot rely on the userspace part of your driver. Your userspace program can terminate at any time, possibly leaving your hardware in a state where proper interrupt handling is still required.

There might also be applications where you want to read data from your hardware at each interrupt and buffer it in a piece of kernel memory you’ve allocated for that purpose. With this technique you could avoid loss of data if your userspace program misses an interrupt.

您需要在中断处理程序中做什么取决于您的硬件以及您希望如何处理它。 您应该尽量减少内核中断处理程序中的代码量。 如果您的硬件不需要您在每次中断后必须执行的操作,那么您的处理程序可以为空。

另一方面,如果您的硬件需要在每次中断后执行某些操作,那么您必须在内核模块中执行此操作。 请注意,您不能依赖驱动程序的用户空间部分。 您的用户空间程序可以随时终止,可能会使您的硬件处于仍需要适当中断处理的状态。

在某些应用程序中,您可能希望在每次中断时从硬件中读取数据,并将其缓冲在您为此目的分配的一块内核内存中。 使用这种技术,如果您的用户空间程序错过了中断,您可以避免数据丢失。

A note on shared interrupts: Your driver should support interrupt sharing whenever this is possible. It is possible if and only if your driver can detect whether your hardware has triggered the interrupt or not. This is usually done by looking at an interrupt status register. If your driver sees that the IRQ bit is actually set, it will perform its actions, and the handler returns IRQ_HANDLED. If the driver detects that it was not your hardware that caused the interrupt, it will do nothing and return IRQ_NONE, allowing the kernel to call the next possible interrupt handler.

If you decide not to support shared interrupts, your card won’t work in computers with no free interrupts. As this frequently happens on the PC platform, you can save yourself a lot of trouble by supporting interrupt sharing.

关于共享中断的说明:您的驱动程序应尽可能支持中断共享。 当且仅当您的驱动程序可以检测到您的硬件是否触发了中断时才有可能。 这通常通过查看中断状态寄存器来完成。 如果您的驱动程序看到实际设置了 IRQ 位,它将执行其操作,并且处理程序返回 IRQ_HANDLED。 如果驱动程序检测到不是您的硬件引起了中断,它将什么都不做并返回 IRQ_NONE,允许内核调用下一个可能的中断处理程序。

如果您决定不支持共享中断,您的卡将无法在没有空闲中断的计算机上工作。 由于PC平台经常出现这种情况,支持中断共享可以省去很多麻烦。

Using uio_pdrv for platform devices

In many cases, UIO drivers for platform devices can be handled in a generic way. In the same place where you define your struct platform_device, you simply also implement your interrupt handler and fill your struct uio_info. A pointer to this struct uio_info is then used as platform_data for your platform device.

You also need to set up an array of struct resource containing addresses and sizes of your memory mappings. This information is passed to the driver using the .resource and .num_resources elements of struct platform_device.

You now have to set the .name element of struct platform_device to "uio_pdrv" to use the generic UIO platform device driver. This driver will fill the mem[] array according to the resources given, and register the device.

The advantage of this approach is that you only have to edit a file you need to edit anyway. You do not have to create an extra driver.

在许多情况下,平台设备的 UIO 驱动程序可以以通用方式处理。 在定义 struct platform_device 的同一位置,您只需实现中断处理程序并填充 struct uio_info。 然后将指向此 struct uio_info 的指针用作平台设备的 platform_data

您还需要设置一个 struct resource 数组,其中包含内存映射的地址和大小。 此信息使用 struct platform_device.resource.num_resources 元素传递给驱动程序。

您现在必须将 struct platform_device.name 元素设置为 "uio_pdrv" 才能使用通用 UIO 平台设备驱动程序。 该驱动程序将根据给定的资源填充mem[]数组,并注册设备。

这种方法的优点是您只需要编辑一个无论如何都需要编辑的文件。 您不必创建额外的驱动程序。

Using uio_pdrv_genirq for platform devices

Especially in embedded devices, you frequently find chips where the irq pin is tied to its own dedicated interrupt line. In such cases, where you can be really sure the interrupt is not shared, we can take the concept of uio_pdrv one step further and use a generic interrupt handler. That’s what uio_pdrv_genirq does.

The setup for this driver is the same as described above for uio_pdrv, except that you do not implement an interrupt handler. The .handler element of struct uio_info must remain NULL. The .irq_flags element must not contain IRQF_SHARED.

You will set the .name element of struct platform_device to "uio_pdrv_genirq" to use this driver.

The generic interrupt handler of uio_pdrv_genirq will simply disable the interrupt line using disable_irq_nosync(). After doing its work, userspace can reenable the interrupt by writing 0x00000001 to the UIO device file. The driver already implements an irq_control() to make this possible, you must not implement your own.

Using uio_pdrv_genirq not only saves a few lines of interrupt handler code. You also do not need to know anything about the chip’s internal registers to create the kernel part of the driver. All you need to know is the irq number of the pin the chip is connected to.

When used in a device-tree enabled system, the driver needs to be probed with the "of_id" module parameter set to the "compatible" string of the node the driver is supposed to handle. By default, the node’s name (without the unit address) is exposed as name for the UIO device in userspace. To set a custom name, a property named "linux,uio-name" may be specified in the DT node.

尤其是在嵌入式设备中,您经常会发现 irq 引脚与自己的专用中断线相连的芯片。在这种情况下,您可以确定中断不是共享的,我们可以将 uio_pdrv 的概念更进一步,并使用通用中断处理程序。这就是 uio_pdrv_genirq 所做的。

此驱动程序的设置与上述 uio_pdrv 的设置相同,只是您没有实现中断处理程序。 struct uio_info 的 .handler 元素必须保持为 NULL。 .irq_flags 元素不得包含 IRQF_SHARED。

您将 struct platform_device 的 .name 元素设置为“uio_pdrv_genirq”以使用此驱动程序。

uio_pdrv_genirq 的通用中断处理程序将简单地使用 disable_irq_nosync() 禁用中断线。完成工作后,用户空间可以通过将 0x00000001 写入 UIO 设备文件来重新启用中断。驱动程序已经实现了一个 irq_control() 来实现这一点,你不能实现你自己的。

使用 uio_pdrv_genirq 不仅节省了几行中断处理程序代码。您也不需要了解芯片内部寄存器的任何信息来创建驱动程序的内核部分。您只需要知道芯片连接到的引脚的 irq 号。

在启用设备树的系统中使用时,需要使用“of_id”模块参数设置为驱动程序应该处理的节点的“兼容”字符串来探测驱动程序。默认情况下,节点的名称(不包括单元地址)在用户空间中作为 UIO 设备的名称公开。要设置自定义名称,可以在 DT 节点中指定名为“linux,uio-name”的属性。

Using uio_dmem_genirq for platform devices

In addition to statically allocated memory ranges, they may also be a desire to use dynamically allocated regions in a user space driver. In particular, being able to access memory made available through the dma-mapping API, may be particularly useful. The uio_dmem_genirq driver provides a way to accomplish this.

This driver is used in a similar manner to the "uio_pdrv_genirq" driver with respect to interrupt configuration and handling.

Set the .name element of struct platform_device to "uio_dmem_genirq" to use this driver.

When using this driver, fill in the .platform_data element of struct platform_device, which is of type struct uio_dmem_genirq_pdata and which contains the following elements:

  • struct uio_info uioinfo: The same structure used as the uio_pdrv_genirq platform data
  • unsigned int *dynamic_region_sizes: Pointer to list of sizes of dynamic memory regions to be mapped into user space.
  • unsigned int num_dynamic_regions: Number of elements in dynamic_region_sizes array.

The dynamic regions defined in the platform data will be appended to the mem[] array after the platform device resources, which implies that the total number of static and dynamic memory regions cannot exceed MAX_UIO_MAPS.

The dynamic memory regions will be allocated when the UIO device file, /dev/uioX is opened. Similar to static memory resources, the memory region information for dynamic regions is then visible via sysfs at /sys/class/uio/uioX/maps/mapY/*. The dynamic memory regions will be freed when the UIO device file is closed. When no processes are holding the device file open, the address returned to userspace is ~0.

除了静态分配的内存范围之外,它们还可能希望在用户空间驱动程序中使用动态分配的区域。特别是,能够访问通过 dma-mapping API 提供的内存可能特别有用。 uio_dmem_genirq 驱动程序提供了一种方法来实现这一点。

在中断配置和处理方面,该驱动程序的使用方式与“uio_pdrv_genirq”驱动程序类似。

将 struct platform_device 的 .name 元素设置为“uio_dmem_genirq”以使用此驱动程序。

使用该驱动时,填写struct platform_device的.platform_data元素,它的类型为struct uio_dmem_genirq_pdata,包含以下元素:

struct uio_info uioinfo:与 uio_pdrv_genirq 平台数据使用的结构相同

unsigned int *dynamic_region_sizes:指向要映射到用户空间的动态内存区域大小列表的指针。

unsigned int num_dynamic_regions:dynamic_region_sizes 数组中的元素数。

平台数据中定义的动态区域将附加到平台设备资源之后的mem[]数组中,这意味着静态和动态内存区域的总数不能超过MAX_UIO_MAPS。

当打开 UIO 设备文件 /dev/uioX 时,将分配动态内存区域。与静态内存资源类似,动态区域的内存区域信息随后通过位于 /sys/class/uio/uioX/maps/mapY/* 的 sysfs 可见。当 UIO 设备文件关闭时,动态内存区域将被释放。当没有进程保持设备文件打开时,返回给用户空间的地址为 ~0。

Writing a driver in userspace

Once you have a working kernel module for your hardware, you can write the userspace part of your driver. You don’t need any special libraries, your driver can be written in any reasonable language, you can use floating point numbers and so on. In short, you can use all the tools and libraries you’d normally use for writing a userspace application.

一旦你的硬件有了一个可以工作的内核模块,你就可以编写驱动程序的用户空间部分。 你不需要任何特殊的库,你的驱动程序可以用任何合理的语言编写,你可以使用浮点数等等。 简而言之,您可以使用通常用于编写用户空间应用程序的所有工具和库。

Getting information about your UIO device

Information about all UIO devices is available in sysfs. The first thing you should do in your driver is check name and version to make sure you’re talking to the right device and that its kernel driver has the version you expect.

You should also make sure that the memory mapping you need exists and has the size you expect.

There is a tool called lsuio that lists UIO devices and their attributes. It is available here:

http://www.osadl.org/projects/downloads/UIO/user/

With lsuio you can quickly check if your kernel module is loaded and which attributes it exports. Have a look at the manpage for details.

The source code of lsuio can serve as an example for getting information about an UIO device. The file uio_helper.c contains a lot of functions you could use in your userspace driver code.

获取有关您的 UIO 设备的信息¶ sysfs 中提供了有关所有 UIO 设备的信息。 您应该在驱动程序中做的第一件事是检查名称和版本,以确保您正在与正确的设备通信,并且其内核驱动程序具有您期望的版本。

您还应该确保您需要的内存映射存在并且具有您期望的大小。

有一个名为 lsuio 的工具可以列出 UIO 设备及其属性。 可在此处获得:

http://www.osadl.org/projects/downloads/UIO/user/

使用 lsuio,您可以快速检查您的内核模块是否已加载以及它导出的属性。 有关详细信息,请查看手册页。

lsuio 的源代码可以作为获取有关 UIO 设备信息的示例。 文件 uio_helper.c 包含许多可以在用户空间驱动程序代码中使用的函数。

mmap() device memory

After you made sure you’ve got the right device with the memory mappings you need, all you have to do is to call mmap() to map the device’s memory to userspace.

The parameter offset of the mmap() call has a special meaning for UIO devices: It is used to select which mapping of your device you want to map. To map the memory of mapping N, you have to use N times the page size as your offset:

1
offset = N * getpagesize();

N starts from zero, so if you’ve got only one memory range to map, set offset = 0. A drawback of this technique is that memory is always mapped beginning with its start address.

在你确定你有正确的设备和你需要的内存映射之后,你所要做的就是调用 mmap() 来将设备的内存映射到用户空间。

mmap()调用的参数offset对于UIO设备有特殊的意义:用来选择你的设备要映射到哪个映射。 要映射映射 N 的内存,您必须使用 N 倍的页面大小作为您的偏移量:

1
offset = N * getpagesize();

N 从零开始,因此如果您只有一个要映射的内存范围,请设置 offset = 0。这种技术的一个缺点是内存总是从其起始地址开始映射。

Waiting for interrupts

After you successfully mapped your devices memory, you can access it like an ordinary array. Usually, you will perform some initialization. After that, your hardware starts working and will generate an interrupt as soon as it’s finished, has some data available, or needs your attention because an error occurred.

/dev/uioX is a read-only file. A read() will always block until an interrupt occurs. There is only one legal value for the count parameter of read(), and that is the size of a signed 32 bit integer (4). Any other value for count causes read() to fail. The signed 32 bit integer read is the interrupt count of your device. If the value is one more than the value you read the last time, everything is OK. If the difference is greater than one, you missed interrupts.

You can also use select() on /dev/uioX.

成功映射设备内存后,您可以像访问普通数组一样访问它。 通常,您将执行一些初始化。 之后,您的硬件开始工作,并会在完成后立即生成中断,有一些可用数据,或者因为发生错误而需要您的注意。

/dev/uioX 是一个只读文件。 read() 将始终阻塞,直到发生中断。 read() 的 count 参数只有一个合法值,即有符号 32 位整数 (4) 的大小。 count 的任何其他值都会导致 read() 失败。 读取的有符号 32 位整数是设备的中断计数。 如果该值比您上次读取的值大一,则一切正常。 如果差值大于 1,则您错过了中断。

您还可以在 /dev/uioX 上使用 select()。

Generic PCI UIO driver

The generic driver is a kernel module named uio_pci_generic. It can work with any device compliant to PCI 2.3 (circa 2002) and any compliant PCI Express device. Using this, you only need to write the userspace driver, removing the need to write a hardware-specific kernel module.

通用驱动程序是一个名为 uio_pci_generic 的内核模块。 它可以与任何兼容 PCI 2.3(大约 2002 年)的设备和任何兼容的 PCI Express 设备一起使用。 使用它,您只需要编写用户空间驱动程序,无需编写特定于硬件的内核模块。

Making the driver recognize the device

Since the driver does not declare any device ids, it will not get loaded automatically and will not automatically bind to any devices, you must load it and allocate id to the driver yourself. For example:

由于驱动程序没有声明任何设备ID,它不会自动加载,也不会自动绑定到任何设备,您必须自己加载并分配ID给驱动程序。 例如:

1
2
modprobe uio_pci_generic
echo "8086 10f5" > /sys/bus/pci/drivers/uio_pci_generic/new_id

If there already is a hardware specific kernel driver for your device, the generic driver still won’t bind to it, in this case if you want to use the generic driver (why would you?) you’ll have to manually unbind the hardware specific driver and bind the generic driver, like this:

1
2
echo -n 0000:00:19.0 > /sys/bus/pci/drivers/e1000e/unbind
echo -n 0000:00:19.0 > /sys/bus/pci/drivers/uio_pci_generic/bind

You can verify that the device has been bound to the driver by looking for it in sysfs, for example like the following:

1
ls -l /sys/bus/pci/devices/0000:00:19.0/driver

Which if successful should print:

1
.../0000:00:19.0/driver -> ../../../bus/pci/drivers/uio_pci_generic

Note that the generic driver will not bind to old PCI 2.2 devices. If binding the device failed, run the following command:

1
dmesg

and look in the output for failure reasons.

Things to know about uio_pci_generic

Interrupts are handled using the Interrupt Disable bit in the PCI command register and Interrupt Status bit in the PCI status register. All devices compliant to PCI 2.3 (circa 2002) and all compliant PCI Express devices should support these bits. uio_pci_generic detects this support, and won’t bind to devices which do not support the Interrupt Disable Bit in the command register.

On each interrupt, uio_pci_generic sets the Interrupt Disable bit. This prevents the device from generating further interrupts until the bit is cleared. The userspace driver should clear this bit before blocking and waiting for more interrupts.

使用 PCI 命令寄存器中的中断禁用位和 PCI 状态寄存器中的中断状态位来处理中断。 所有符合 PCI 2.3(大约 2002 年)的设备和所有符合 PCI Express 的设备都应该支持这些位。 uio_pci_generic 检测到这种支持,并且不会绑定到不支持命令寄存器中的中断禁用位的设备。

每次中断时,uio_pci_generic 都会设置中断禁止位。 这可以防止设备在该位被清除之前产生进一步的中断。 用户空间驱动程序应在阻塞和等待更多中断之前清除该位。

Writing userspace driver using uio_pci_generic

Userspace driver can use pci sysfs interface, or the libpci library that wraps it, to talk to the device and to re-enable interrupts by writing to the command register.

用户空间驱动程序可以使用 pci sysfs 接口或包装它的 libpci 库与设备通信并通过写入命令寄存器来重新启用中断。

Example code using uio_pci_generic

Here is some sample userspace driver code using uio_pci_generic:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <errno.h>

int main()
{
    int uiofd;
    int configfd;
    int err;
    int i;
    unsigned icount;
    unsigned char command_high;

    uiofd = open("/dev/uio0", O_RDONLY);
    if (uiofd < 0) {
        perror("uio open:");
        return errno;
    }
    configfd = open("/sys/class/uio/uio0/device/config", O_RDWR);
    if (configfd < 0) {
        perror("config open:");
        return errno;
    }

    /* Read and cache command value */
    err = pread(configfd, &command_high, 1, 5);
    if (err != 1) {
        perror("command config read:");
        return errno;
    }
    command_high &= ~0x4;

    for(i = 0;; ++i) {
        /* Print out a message, for debugging. */
        if (i == 0)
            fprintf(stderr, "Started uio test driver.\n");
        else
            fprintf(stderr, "Interrupts: %d\n", icount);

        /****************************************/
        /* Here we got an interrupt from the
           device. Do something to it. */
        /****************************************/

        /* Re-enable interrupts. */
        err = pwrite(configfd, &command_high, 1, 5);
        if (err != 1) {
            perror("config write:");
            break;
        }

        /* Wait for next interrupt. */
        err = read(uiofd, &icount, 4);
        if (err != 4) {
            perror("uio read:");
            break;
        }

    }
    return errno;
}

Generic Hyper-V UIO driver

The generic driver is a kernel module named uio_hv_generic. It supports devices on the Hyper-V VMBus similar to uio_pci_generic on PCI bus.

Making the driver recognize the device

Since the driver does not declare any device GUID’s, it will not get loaded automatically and will not automatically bind to any devices, you must load it and allocate id to the driver yourself. For example, to use the network device class GUID:

1
2
modprobe uio_hv_generic
echo "f8615163-df3e-46c5-913f-f2d2f965ed0e" > /sys/bus/vmbus/drivers/uio_hv_generic/new_id

If there already is a hardware specific kernel driver for the device, the generic driver still won’t bind to it, in this case if you want to use the generic driver for a userspace library you’ll have to manually unbind the hardware specific driver and bind the generic driver, using the device specific GUID like this:

1
2
echo -n ed963694-e847-4b2a-85af-bc9cfc11d6f3 > /sys/bus/vmbus/drivers/hv_netvsc/unbind
echo -n ed963694-e847-4b2a-85af-bc9cfc11d6f3 > /sys/bus/vmbus/drivers/uio_hv_generic/bind

You can verify that the device has been bound to the driver by looking for it in sysfs, for example like the following:

1
ls -l /sys/bus/vmbus/devices/ed963694-e847-4b2a-85af-bc9cfc11d6f3/driver

Which if successful should print:

1
.../ed963694-e847-4b2a-85af-bc9cfc11d6f3/driver -> ../../../bus/vmbus/drivers/uio_hv_generic

Things to know about uio_hv_generic

On each interrupt, uio_hv_generic sets the Interrupt Disable bit. This prevents the device from generating further interrupts until the bit is cleared. The userspace driver should clear this bit before blocking and waiting for more interrupts.

When host rescinds a device, the interrupt file descriptor is marked down and any reads of the interrupt file descriptor will return -EIO. Similar to a closed socket or disconnected serial device.

  • The vmbus device regions are mapped into uio device resources:

    Channel ring buffers: guest to host and host to guestGuest to host interrupt signalling pagesGuest to host monitor pageNetwork receive buffer regionNetwork send buffer region

If a subchannel is created by a request to host, then the uio_hv_generic device driver will create a sysfs binary file for the per-channel ring buffer. For example:

1
/sys/bus/vmbus/devices/3811fe4d-0fa0-4b62-981a-74fc1084c757/channels/21/ring

参考:

kernel source: [kernel source]/drivers/uio/

https://www.kernel.org/doc/html/latest/driver-api/uio-howto.html#generic-pci-uio-driver

https://www.cnblogs.com/allcloud/p/7808776.html

https://www.cnblogs.com/kb342/p/5168197.html

This post is licensed under CC BY 4.0 by the author.

Shell Records - DiskBenchMark

DMA映射