人工智能 requests连接池即长连接分析

aotee · December 14, 2019 · 79 hits

requests 长连接

遗留问题

  • 为什么 requests 短连接时,虽然 http 头中携带了Connection: keep-alive头,服务端却回复Connection: close头?client 和 server 是如何得知此次连接是短连接的?

requests使用Session()即长连接时,请求头会携带字段Connection: keep-alive,服务端回复相同的 http 头

如下 5 个 http 连接公用一个 tcp 连接

  with requests.Session() as s:
       s.get("http://www.baidu.com")
       s.get("http://www.baidu.com")
       s.get("http://www.baidu.com")
       s.get("http://www.baidu.com")
       s.get("http://www.baidu.com")

而下面的请求各自使用一个 tcp 连接,用完即销毁,请求头里面也有Connection: keep-alive头,但是服务端回复Connection: close

requests.get(“http://www.baidu.com")

requests.get(“http://www.baidu.com")

requests.get(“http://www.baidu.com")

requests.get(“http://www.baidu.com")

使用

基本使用方法:申请一个 tcp 连接,进行多个 http 请求后,关闭连接

Basic Usage::

  import requests
  s = requests.Session()
  进行多个连接
  s.get('https://httpbin.org/get')      
  s.get('https://httpbin.org/get')
  关闭连接  
  s.close()

Or as a context manager::

  with requests.Session() as s:
      进行多个连接  
      s.get('https://httpbin.org/get')      
      s.get('https://httpbin.org/get')          

"""

原理

撸下代码

调用顺序:

requests.Session().get() --> requests.Session().request() --> requests.Session().send()

查看requests包Session()send()方法的第 645 行和 646 行代码,表示调用使用某个适配器的send()

# Send the request
r = adapter.send(request, **kwargs)

如果使用的默认的HTTPAdapter(),查看其send()方法,下面的conn最终获得一个HTTPConnectionPool对象

try:
    conn = self.get_connection(request.url, proxies)
except LocationValueError as e:
    raise InvalidURL(e, request=request)

最终调用到了urllib3.connectionpool.HTTPConnectionPool()._get_conn()

def _get_conn(self, timeout=None):
    """
    Get a connection. Will return a pooled connection if one is available.

    If no connections are available and :prop:`.block` is ``False``, then a
    fresh connection is returned.

    :param timeout:
        Seconds to wait before giving up and raising
        :class:`urllib3.exceptions.EmptyPoolError` if the pool is empty and
        :prop:`.block` is ``True``.
    """
    conn = None
    try:
        conn = self.pool.get(block=self.block, timeout=timeout)

    except AttributeError:  # self.pool is None
        raise ClosedPoolError(self, "Pool is closed.")

    except queue.Empty:
        if self.block:
            raise EmptyPoolError(self,
                                 "Pool reached maximum size and no more "
                                 "connections are allowed.")
        pass  # Oh well, we'll create a new connection then

    # If this is a persistent connection, check if it got disconnected
    if conn and is_connection_dropped(conn):
        log.debug("Resetting dropped connection: %s", self.host)
        这里关闭了已经无用的连接,比如服务端主动关闭,而requests所在的客户端处于CLOSE_WAIT状态tcp连接
        conn.close()
        if getattr(conn, 'auto_open', 1) == 0:
            # This is a proxied connection that has been mutated by
            # httplib._tunnel() and cannot be reused (since it would
            # attempt to bypass the proxy)
            conn = None

    return conn or self._new_conn()            

里面调用了urllib3.util.connection.is_connection_dropped()

def is_connection_dropped(conn):  # Platform-specific
    """
    Returns True if the connection is dropped and should be closed.

    :param conn:
        :class:`httplib.HTTPConnection` object.

    Note: For platforms like AppEngine, this will always return ``False`` to
    let the platform handle connection recycling transparently for us.
    """
    sock = getattr(conn, 'sock', False)
    if sock is False:  # Platform-specific: AppEngine
        return False
    if sock is None:  # Connection already closed (such as by httplib).
        return True
    try:
        # Returns True if readable, which here means it's been dropped
        return wait_for_read(sock, timeout=0.0)
    except NoWayToWaitForSocketError:  # Platform-specific: AppEngine
        return False

继而调用了urllib3.util.wait.wait_for_read()

def wait_for_read(sock, timeout=None):
    """ Waits for reading to be available on a given socket.
    Returns True if the socket is readable, or False if the timeout expired.
    """
    return wait_for_socket(sock, read=True, timeout=timeout)

我们再看下处于 CLOSE_WAIT 状态的 tcp 连接,查看 Recv-Q(数据已经在本地接收缓冲区,但是还没有 recv()) 的值不为零,是不是刚好呼应了上面的 wait_for_read

Proto Recv-Q Send-Q Local Address         Foreign Address         State           PID/Program name
tcp        1          0       127.0.0.1:8071          127.0.0.1:46423        CLOSE_WAIT      5782/java
tcp        1          0       127.0.0.1:8071          127.0.0.1:41958         CLOSE_WAIT      5782/java
tcp        1          0       127.0.0.1:8071          127.0.0.1:42004        CLOSE_WAIT      5782/java
tcp        1          0       127.0.0.1:8071          127.0.0.1:52313         CLOSE_WAIT      5782/java                     

结论

一般情况下,服务端会维持一段时间的长连接,但是一旦超过闲置时间没有连接发生,则主动断开:如下表示闲置时间为 5s

--httpKeepAliveTimeout   = how long idle HTTP keep-alive connections are kept around (in ms; default 5000)
  1. 客户端 requests 长连接时:各个请求共用同一个 tcp 长连接,但是超过一定时间不使用,则服务端主动断开连接,服务端处于 time_wait 状态,直至超时后释放资源; 而客户端会长时间处于 close_wait 状态 (因为 requests 长连接时,不会主动自动发送 close 报文及 FIN 报文),那么这个 close_wait 到什么时候才关闭呢? 如以上代码走读分析,当新的 http 请求产生的时候,发现旧的 tcp 连接不可用 (因为已经处于 close_wait 状态了嘛),则关闭这个连接,并开启一个新的 tcp 长连接

    requests对于每个新的请求,都去池子里请求一个连接,在请求的过程中,判断连接是否可用,不可用,关闭,然后创建一个新的连接
    
  2. 客户端 requests 短连接时:每个请求开启新的 tcp 连接,完成后,主动断开,这时候,客户端存在大量的 time_wait 状态,服务端没有问题,会迅速关闭 tcp 连接

No Reply at the moment.
You need to Sign in before reply, if you don't have an account, please Sign up first.