tcp_memory_documentation

intro

This page was created by Ian McDonald while trying to implement memory management for Net:DCCP. Feel free to delete or reorganise it: it is very rough and started as internal notes, but it may be of some use. Much of it comes from using grep to find out what calls what.

Memory in TCP is controlled by three sysctls under net/ipv4 which are tcp_mem, tcp_rmem, tcp_wmem.

Note this documentation covers TCP specific memory tracking and excludes pure socket memory management in other parts of the tree.

The documentation for what the values mean is in Documentation/networking/ip-sysctl.txt. For tcp_rmem and tcp_wmem the three values are min, default and max, in bytes. For tcp_mem the values are low, pressure and high, and they are used to moderate memory consumption. tcp_mem is measured in pages.
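As a small illustration of the three-value layout described above, the following user-space sketch parses a sysctl string into its three fields. The struct and function names are hypothetical and are not kernel identifiers.

```c
#include <stdio.h>

/* Hypothetical holder for a three-valued TCP sysctl.
 * For tcp_rmem/tcp_wmem the fields are min, default, max (bytes);
 * for tcp_mem they are low, pressure, high (pages). */
struct tcp_triple {
    long v[3];
};

/* Parse "a b c" (whitespace separated) into t.
 * Returns 0 on success, -1 if three numbers were not found. */
static int parse_tcp_triple(const char *text, struct tcp_triple *t)
{
    if (sscanf(text, "%ld %ld %ld", &t->v[0], &t->v[1], &t->v[2]) != 3)
        return -1;
    return 0;
}
```

The input would typically be the contents of /proc/sys/net/ipv4/tcp_mem, tcp_rmem or tcp_wmem.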

These are defined as variables in include/net/tcp.h with the same names prefixed by sysctl_ (for example sysctl_tcp_mem).

Also in tcp.h are tcp_memory_allocated and tcp_memory_pressure.

sk→sk_rcvbuf sets the maximum receive buffer size and sk→sk_rmem_alloc tracks the memory actually in use.

rmem_max and wmem_max (the general-purpose limits) are used only in sock_setsockopt in net/core/sock.c, to stop a user taking more buffer than the system allows and to set sk→sk_sndbuf and sk→sk_rcvbuf.
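From user space the effect of this limit can be observed with the standard setsockopt and getsockopt calls. The sketch below is a minimal example; the helper name effective_rcvbuf is made up for illustration. It requests a receive buffer size and reports what the kernel actually stored.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/* Hypothetical helper: ask for `requested` bytes of receive buffer on a
 * fresh TCP socket and return the size the kernel reports afterwards,
 * or -1 on any error. The reported value may differ from the request
 * because the kernel applies its own limits and bookkeeping overhead. */
static int effective_rcvbuf(int requested)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    int actual = -1;
    socklen_t len = sizeof(actual);

    if (fd < 0)
        return -1;
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &requested, sizeof(requested)) < 0 ||
        getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &actual, &len) < 0)
        actual = -1;
    close(fd);
    return actual;
}
```

Calling it with a very large request and comparing the result with net/core/rmem_max is one way to see the limit being applied.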

memory_allocated is in pages as is sysctl_mem/sysctl_tcp_mem.
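Because the global counter is kept in pages while individual requests are in bytes, a byte count has to be rounded up to whole pages before it can be compared with sysctl_mem. A minimal sketch of that rounding, assuming a 4096-byte page (the real page size depends on the architecture) and using hypothetical names:

```c
/* Assumed page size, for illustration only. */
#define SKETCH_PAGE_SIZE 4096

/* Round a byte count up to the number of pages needed to hold it. */
static int bytes_to_pages(int bytes)
{
    return (bytes + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}
```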

net/ipv4/tcp.c

The sysctls are defined here and set up in net/ipv4/sysctl_net_ipv4.c. The constants are defined in include/linux/sysctl.h.

tcp_memory_allocated is defined here and is atomic. tcp_memory_pressure is also defined here. tcp_enter_memory_pressure just sets this flag and updates the management statistics.

do_tcp_sendpages is called by tcp_sendpage which is a callback function.

tcp_push is called by do_tcp_sendpages and tcp_sendmsg.

do_tcp_setsockopt is called by tcp_setsockopt and compat_tcp_setsockopt.

sysctl_mem is used here; it is initialised per protocol, e.g. net/ipv4/tcp_ipv4.c sets it to sysctl_tcp_mem.

tcp_close checks at one point if under memory pressure - if so then it kills socket straight away (I think!!)

In tcp_init the sysctls are set to their default sizes.

tcp_cleanup_rbuf calls __tcp_select_window from tcp_output.c, which adjusts sizes based on memory. tcp_cleanup_rbuf is called in tcp_read_sock, tcp_recvmsg and do_tcp_setsockopt, and in tcp_rcv_established from tcp_input.c.

net/core/stream.c

memory_allocated gets used here by __sk_stream_mem_reclaim and sk_stream_mem_schedule. sk_stream_mem_schedule is used by sk_stream_rmem_schedule and sk_stream_wmem_schedule in include/net/sock.h.

sk_stream_mem_schedule checks whether the memory can be used for a request, and either enters memory pressure if needed or refuses the request. For transmits it calls sk_stream_moderate_sndbuf if needed.
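The decision described above can be modelled roughly as follows. This is a simplified user-space sketch, not the kernel code: struct proto_mem and mem_schedule are hypothetical names, and the three limits stand in for sysctl_mem[0..2]. The real function applies further per-socket checks when usage is between the low and high thresholds; the sketch simply accepts such requests.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-protocol accounting state. */
struct proto_mem {
    long allocated;   /* pages currently charged (memory_allocated) */
    long limit[3];    /* low, pressure, high (sysctl_mem), in pages */
    bool pressure;    /* memory pressure flag */
};

/* Try to charge `pages` to the protocol.
 * Returns true if the request is accepted, false if it is refused. */
static bool mem_schedule(struct proto_mem *pm, long pages)
{
    pm->allocated += pages;

    /* Below the low threshold: accept and leave memory pressure. */
    if (pm->allocated < pm->limit[0]) {
        pm->pressure = false;
        return true;
    }

    /* Above the high threshold: enter pressure, undo the charge, refuse. */
    if (pm->allocated > pm->limit[2]) {
        pm->pressure = true;
        pm->allocated -= pages;
        return false;
    }

    /* Above the pressure threshold: accept, but flag memory pressure. */
    if (pm->allocated > pm->limit[1])
        pm->pressure = true;

    return true;
}
```

Keeping the flag and the counter in one struct is only a convenience for the sketch; in the kernel they are separate per-protocol variables.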

net/ipv4/tcp_input.c

tcp_grow_window can only proceed if not under memory pressure and rcv_ssthresh < tcp_space (from include/net/tcp.h).

__tcp_grow_window alters rcv_ssthresh based on memory and calls tcp_win_from_space. It is called only from tcp_grow_window.

tcp_grow_window is called only from tcp_event_data_recv, provided skb→len >= 128 (why is this I wonder?).

tcp_event_data_recv is called by tcp_data_queue and tcp_rcv_established.

tcp_data_queue is called by tcp_rcv_established and tcp_rcv_state_process.

tcp_rcv_established is called by tcp_v4_do_rcv in net/ipv4/tcp_ipv4.c, which is called from tcp_v4_rcv or directly via a callback.

tcp_rcv_state_process is called by tcp_v4_do_rcv as above and by tcp_child_process in net/ipv4/tcp_minisocks.c. tcp_child_process is called from tcp_v4_do_rcv.

In tcp_clamp_window, sk→sk_rcvbuf gets altered, as does tp→rcv_ssthresh. It is called only by tcp_prune_queue.

tcp_prune_queue also alters the above. Only tcp_data_queue calls it.

tcp_should_expand_sndbuf makes checks on memory. It is called only by tcp_new_space.

tcp_new_space makes checks on memory and is called only by tcp_check_space. tcp_check_space is called only by tcp_data_snd_check. tcp_data_snd_check is called by tcp_rcv_established and tcp_rcv_state_process.

tcp_fixup_rcvbuf and tcp_fixup_sndbuf make checks on memory. They are called by tcp_init_buffer_space.

tcp_init_buffer_space is called by tcp_rcv_synsent_state_process and tcp_rcv_state_process. tcp_rcv_synsent_state_process is called by tcp_rcv_state_process.

In tcp_rcv_space_adjust, sk→sk_rcvbuf gets altered. It calls tcp_win_from_space, defined in include/net/tcp.h, which calculates a window size based on the space passed in.

tcp_rcv_space_adjust gets called by tcp_data_queue, tcp_copy_to_iovec (which gets called by tcp_rcv_established), tcp_dma_try_early_copy (which also gets called by tcp_rcv_established), and by tcp_read_sock (which gets called only by SunRPC) and tcp_recvmsg, both in tcp.c.

__tcp_ack_snd_check calls __tcp_select_window from tcp_output.c and is called from tcp_ack_snd_check and tcp_rcv_established.

tcp_ack_snd_check is called from tcp_rcv_established and tcp_rcv_state_process.

tcp_fin calls sk_stream_mem_reclaim from include/net/sock.h and is called by tcp_ofo_queue and tcp_data_queue. tcp_ofo_queue is called by tcp_data_queue.

tcp_ack calls tcp_ack_update_window. tcp_ack is called by tcp_rcv_established, tcp_rcv_synsent_state_process and tcp_rcv_state_process.

tcp_sacktag_write_queue is used by tcp_ack.

tcp_check_sack_reneging is used by tcp_fastretrans_alert, which is used by tcp_ack.

net/ipv4/tcp_output.c

In __tcp_select_window, tp→rcv_ssthresh is reduced if there is memory pressure. It is called by tcp_select_window, tcp_cleanup_rbuf from tcp.c and __tcp_ack_snd_check from tcp_input.c. This function also uses tcp_space from include/net/tcp.h.

tcp_select_window is called from tcp_transmit_skb, which is called from multiple places in this file.

tcp_select_initial_windows uses memory settings to decide whether to do window scaling. It picks the maximum out of tcp_rmem[2] and rmem_max.

tso_fragment is used by tcp_write_xmit and tcp_push_one. tcp_write_xmit is used by __tcp_push_pending_frames.
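The window-scaling choice made in tcp_select_initial_windows can be sketched as below. This is an illustrative user-space version with hypothetical names. It assumes the usual TCP limits of a 16-bit window field (65535) and a maximum shift of 14, and takes the larger of the two buffer limits as the space to cover.

```c
/* Pick the smallest window-scale shift such that the larger of the two
 * receive-buffer limits fits in a 16-bit window field.
 * rmem_max_tcp stands for tcp_rmem[2], rmem_max_core for rmem_max. */
static int pick_rcv_wscale(unsigned long rmem_max_tcp, unsigned long rmem_max_core)
{
    unsigned long space = rmem_max_tcp > rmem_max_core ? rmem_max_tcp : rmem_max_core;
    int wscale = 0;

    while (space > 65535 && wscale < 14) {
        space >>= 1;
        wscale++;
    }
    return wscale;
}
```

A result of 0 means the largest possible buffer already fits in an unscaled window, so scaling brings no benefit.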

__tcp_push_pending_frames is used by tcp_push_pending_frames from include/net/tcp.h, by tcp_push, do_tcp_sendpages and tcp_sendmsg, all from net/ipv4/tcp.c, and by tcp_send_fin from this file.

tcp_send_fin is used by tcp_shutdown and tcp_close from net/ipv4/tcp.c.

tcp_push_one is used by tcp_sendmsg and do_tcp_sendpages, both from net/ipv4/tcp.c.

tcp_fragment is used by tcp_sacktag_write_queue from net/ipv4/tcp_input.c and by tso_fragment, tcp_retransmit_skb and tcp_write_wakeup, all from this file.

tcp_retransmit_skb is used by tcp_check_sack_reneging from net/ipv4/tcp_input.c, tcp_retransmit_timer from net/ipv4/tcp_timer.c and tcp_xmit_retransmit_queue from this file.

tcp_write_wakeup is used by tcp_keepalive_timer from net/ipv4/tcp_timer.c and tcp_send_probe0 from this file.

tcp_xmit_retransmit_queue is used by tcp_fastretrans_alert from net/ipv4/tcp_input.c and tcp_simple_retransmit from this file.

tcp_send_probe0 is used by tcp_probe_timer from net/ipv4/tcp_timer.c.

tcp_simple_retransmit is used by tcp_fastretrans_alert from net/ipv4/tcp_input.c and do_pmtu_discovery from net/ipv4/tcp_ipv4.c.

tcp_mtu_probe is called by tcp_write_xmit.

net/ipv4/tcp_timer.c

tcp_out_of_resources uses memory checks. It is called by tcp_write_timeout and tcp_probe_timer.

tcp_write_timeout is called by tcp_retransmit_timer, which is called by tcp_write_timer, which is part of a callback in tcp_init_xmit_timers. tcp_probe_timer is called by tcp_write_timer.

In tcp_delack_timer, sk_stream_mem_reclaim from include/net/sock.h is called if there is memory pressure. tcp_delack_timer is part of a callback in tcp_init_xmit_timers.

tcp_keepalive_timer is called by tcp_init_xmit_timers.

net/ipv4/tcp_ipv4.c

Default values get copied over to a new socket.

do_pmtu_discovery is called by tcp_v4_err, which is a callback function.

include/net/tcp.h

In tcp_fast_path_check, tcp_fast_path_on is only called if the receive buffers aren't all used. This is used in multiple functions: tcp_recvmsg from net/ipv4/tcp.c, and tcp_ack_update_window and tcp_data_queue from net/ipv4/tcp_input.c.

tcp_space returns the result of tcp_win_from_space applied to the free receive space. It is called from tcp_grow_window in net/ipv4/tcp_input.c and __tcp_select_window in net/ipv4/tcp_output.c.
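As a rough illustration of how free receive space becomes a window, here is a user-space sketch. The function names are hypothetical, and adv_win_scale is an assumed tuning parameter modelled on the tcp_adv_win_scale sysctl, which reserves part of the buffer for overhead. The exact kernel formula may differ between versions.

```c
/* Convert buffer space to usable window (assumes space >= 0).
 * With a positive scale, a 1/2^scale share of the space is held back as
 * overhead; with a zero or negative scale, a 1/2^(-scale) share of the
 * space is offered as window. */
static int win_from_space(int space, int adv_win_scale)
{
    return adv_win_scale <= 0 ? space >> (-adv_win_scale)
                              : space - (space >> adv_win_scale);
}

/* Window available from the free part of the receive buffer:
 * rcvbuf is the buffer limit, rmem_alloc the memory already in use. */
static int space_sketch(int rcvbuf, int rmem_alloc, int adv_win_scale)
{
    return win_from_space(rcvbuf - rmem_alloc, adv_win_scale);
}

/* Guard for growing the window as described for tcp_grow_window: only
 * when not under memory pressure and rcv_ssthresh is below the space. */
static int may_grow_window(int rcv_ssthresh, int space, int memory_pressure)
{
    return !memory_pressure && rcv_ssthresh < space;
}
```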

tcp_push_pending_frames is used by do_tcp_setsockopt from net/ipv4/tcp.c and tcp_data_snd_check from net/ipv4/tcp_input.c

include/net/sock.h

sk_stream_mem_reclaim is in include/net/sock.h and checks whether sk→sk_forward_alloc is over a threshold (a page); if so, it calls __sk_stream_mem_reclaim. That function is in net/core/stream.c; it seems to claim back memory and also removes memory pressure if needed. Reclaim gets called in multiple places. The check in sk_stream_mem_reclaim is repeated in __sk_stream_mem_reclaim; the second one seems unnecessary and has been removed in the latest tree.
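The reclaim path described above can be modelled like this. It is a simplified user-space sketch with hypothetical names and an assumed 4096-byte page. forward_alloc stands for sk→sk_forward_alloc and is in bytes, while the global counter is in pages.

```c
#include <stdbool.h>

#define SKETCH_QUANTUM 4096  /* assumed page size, for illustration */

struct reclaim_state {
    int forward_alloc;   /* bytes pre-charged to this socket */
    long allocated;      /* pages charged globally */
    long low;            /* low threshold (sysctl_mem[0]), in pages */
    bool pressure;       /* memory pressure flag */
};

/* Give whole pages held in forward_alloc back to the global counter,
 * keep the sub-page remainder, and clear memory pressure once global
 * usage has dropped below the low threshold. */
static void mem_reclaim(struct reclaim_state *s)
{
    if (s->forward_alloc < SKETCH_QUANTUM)
        return;   /* less than a page held: nothing to return */

    s->allocated -= s->forward_alloc / SKETCH_QUANTUM;
    s->forward_alloc %= SKETCH_QUANTUM;

    if (s->pressure && s->allocated < s->low)
        s->pressure = false;
}
```

The sketch performs the threshold check only once, which matches the observation that a second copy of the check is redundant.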

sk_stream_mem_reclaim gets called from tcp_close in net/ipv4/tcp.c and tcp_event_data_recv, tcp_fin, tcp_prune_queue from net/ipv4/tcp_input.c

sk_stream_rmem_schedule uses sk_stream_mem_schedule from net/core/stream.c. It is used by tcp_data_queue from net/ipv4/tcp_input.c

sk_stream_wmem_schedule uses sk_stream_mem_schedule from net/core/stream.c. It is used by sk_stream_alloc_pskb in this file and do_tcp_sendpages, tcp_sendmsg from net/ipv4/tcp.c

sk_stream_alloc_pskb is called by sk_stream_alloc_skb in this file, by do_tcp_sendpages and tcp_sendmsg from net/ipv4/tcp.c, and by tso_fragment from net/ipv4/tcp_output.c

sk_stream_alloc_skb is called by tcp_fragment and tcp_mtu_probe from net/ipv4/tcp_output.c