====== I/OAT ======

===== I/OAT =====

**I/OAT (I/O Acceleration Technology)** is the name for a collection of techniques from Intel to improve network throughput. The most significant of these is the DMA engine, which is meant to offload from the CPU the copying of SKB data to user buffers. This is not a zero-copy receive, but it does allow the CPU to do other work while the DMA engine performs the copy operations.

==== Implementation on Linux ====

The I/OAT patch series covers three general areas. First, it adds a **DMA subsystem** to the kernel, which abstracts the DMA engine hardware from its users. Second, it adds the **I/OAT hardware driver**, which plugs into the DMA subsystem and controls the actual hardware. Finally, it makes a series of **modifications to the network stack** to take advantage of asynchronous copy offload.

==== Net stack modifications ====

The net stack modifications, given that they touch very important code, have received the most scrutiny. Significant changes:

  - Data members have been added (most notably to ''struct sk_buff'' and ''struct sock_common'')
  - ''sk_eat_skb()'' has an added parameter
  - ''tcp_recvmsg()'': code added to pin user buffer memory on entry, and to wait for async copies to complete and unpin the memory before exiting
  - ''tcp_rcv_established()'': code added to initiate async copies where possible; ''dma_try_early_copy()'' added to tcp.c
=== Patches ===

**Updated to ioat-1.7 and netdev latest git (20060508)**

  - [[http://kernel.org/pub/linux/kernel/people/grover/ioat/patches/01-dma_memcpy_subsystem|DMA subsystem]]
  - [[http://kernel.org/pub/linux/kernel/people/grover/ioat/patches/02-ioatdma_driver|HW driver]]
  - [[http://kernel.org/pub/linux/kernel/people/grover/ioat/patches/03-ioat_net_dma_client|set up net as DMA client]]
  - [[http://kernel.org/pub/linux/kernel/people/grover/ioat/patches/04-ioat_skb_ucopy|utility functions]]
  - [[http://kernel.org/pub/linux/kernel/people/grover/ioat/patches/05-ioat_net_struct_changes|structure changes]]
  - [[http://kernel.org/pub/linux/kernel/people/grover/ioat/patches/06-tcp_cleanup_rbuf|rename cleanup_rbuf and make non-static]]
  - [[http://kernel.org/pub/linux/kernel/people/grover/ioat/patches/07-sk_eat_skb|modify ''sk_eat_skb'']]
  - [[http://kernel.org/pub/linux/kernel/people/grover/ioat/patches/08-ioat_tcp_copybreak_sysctl|add sysctl for copy size tuning]]
  - [[http://kernel.org/pub/linux/kernel/people/grover/ioat/patches/09-ioat_tcp_offload|modify the stack to do recv copy offload]]

===== Kernel acceptance status =====

Intel presented technical information at OLS 2005, but no code. All code except the HW driver was posted for review in November 2005. An updated patch set including the HW driver was posted March 3 2006, and again on March 29 2006, incorporating dev community feedback. I/OAT has been queued for 2.6.18.

===== Performance data =====

This is the initial data we posted to netdev on March 16 2006.

[[http://kernel.org/pub/linux/kernel/people/grover/ioat/chariot-icb.50-portscaling-notouch.pdf|initial Chariot portscaling without data access]]

This is more Chariot data, but it also includes results with data verification turned on, thus touching the data. The CPU gap is narrower (especially at 8 ports) but still noteworthy.
[[http://kernel.org/pub/linux/kernel/people/grover/ioat/chariot-icb1.5-portscaling-both.pdf|later Chariot portscaling with and without data access]]

This data shows that I/OAT really benefits from larger application buffer sizes. There is a CPU spike at 2K, although throughput also increases there. This could be eliminated by raising the tcp_dma_copybreak sysctl (''echo 4096 > /proc/sys/net/ipv4/tcp_dma_copybreak''), which disables I/OAT at or below that application buffer size.

[[http://kernel.org/pub/linux/kernel/people/grover/ioat/chariot-icb1.5-varbuff-touch.pdf|Chariot using different application buffer sizes]]

This shows netperf performance. Note that we are using fewer clients than in the Chariot tests. There is a slight CPU savings at higher application buffer sizes, but it is less noteworthy than with Chariot.

[[http://kernel.org/pub/linux/kernel/people/grover/ioat/netperf-icb1.3-varbuff-notouch.pdf|netperf using different application buffer sizes]]

This data shows six individual runs of tbench, showing a 7-10% drop in CPU utilization.

[[http://kernel.org/pub/linux/kernel/people/grover/ioat/tbench-icb1.5-allruns.pdf|Tbench showing CPU utilization across six runs]]

Results from SPECweb. Since this is a TX test, I/OAT should not impact performance, and these results indicate it doesn't.

[[http://kernel.org/pub/linux/kernel/people/grover/ioat/SPECweb_Banking.20060330-131442_NO_IOAT.txt|SPECWeb with no I/OAT]]

[[http://kernel.org/pub/linux/kernel/people/grover/ioat/SPECweb_Banking.20060330-153606_IOAT.txt|SPECWeb with I/OAT enabled]]

This data shows results with different numbers of ports. It includes both standard netperf data and results using a new option, present only in netperf's SVN repo, that touches the data after it is received.

[[http://kernel.org/pub/linux/kernel/people/grover/ioat/netperf-icb-1.5-postscaling-both.pdf|netperf showing port scaling with touched data]]
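For reference, the tcp_dma_copybreak tuning described above looks like the following on a kernel carrying these patches. This is a sketch run as root; the sysctl exists only with the I/OAT patches applied, and the right threshold depends on your workload's buffer sizes.

```shell
# Raise the copybreak: receives into application buffers of 4096 bytes
# or less are copied by the CPU, bypassing I/OAT; larger buffers still
# use the DMA engine.
echo 4096 > /proc/sys/net/ipv4/tcp_dma_copybreak

# Equivalent, via sysctl:
sysctl -w net.ipv4.tcp_dma_copybreak=4096

# Check the current value:
cat /proc/sys/net/ipv4/tcp_dma_copybreak
```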