Monday 29 April 2024

SSH connection via SOCKS proxy, and FTP connection via SOCKS proxy

 

Since a SOCKS proxy operates at Layer 5 of the OSI model (the session layer), you can use it with many applications that work at layers above Layer 5, such as FTP, Telnet, HTTP, and SSH.

SSH via SOCKS proxy

For example, if you want to SSH to a Far_Away_Host via the SOCKS proxy we just created, you can do:

$ ssh -o ProxyCommand='nc -x localhost:12345 %h %p' username@Far_Away_Host

After logging into the Far_Away_Host, you can check that you appear to be connected from SSH_remote_host_IP instead of from your local machine!

username@Far_Away_Host$ who  
username pts/3 2021-03-29 14:08 (SSH_remote_host_IP)
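
If you use this proxy regularly, the same ProxyCommand can live in your SSH client configuration instead of being typed each time (a minimal sketch; the host alias faraway and port 12345 are placeholders for your own setup):

# ~/.ssh/config
Host faraway
    HostName Far_Away_Host
    User username
    ProxyCommand nc -x localhost:12345 %h %p

After that, a plain ssh faraway goes through the SOCKS proxy automatically.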

FTP via SOCKS proxy

Another example is the SOCKS proxy setting in FileZilla, an FTP client:

SOCKS proxy setting in FileZilla

Further Reading

There is a convenient tool, sshuttle, suggested by smw on Hacker News. It works as a poor man's VPN over ssh and does not require admin rights on the remote machine. The manual is here. It can be easily installed on, e.g., macOS and Ubuntu via:

# on Ubuntu
$ sudo apt install sshuttle
# on macOS, installed via MacPorts
$ sudo port install sshuttle

The simplest way to use it is:

$ sshuttle -r username@SSH_remote_host_IP 0.0.0.0/0

This forwards all traffic from your local machine through the remote host!
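
If you prefer to tunnel only some destinations, you can list specific subnets instead of 0.0.0.0/0 and add --dns to forward DNS lookups as well (a sketch; the subnets below are just examples):

$ sshuttle --dns -r username@SSH_remote_host_IP 10.0.0.0/8 192.168.50.0/24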

Spring-AI

 An Application Framework for AI Engineering.

https://docs.spring.io/spring-ai/reference/1.0-SNAPSHOT/index.html


Welcome to the Spring AI project!

The Spring AI project provides a Spring-friendly API and abstractions for developing AI applications.

Let's make your @Beans intelligent!

For further information go to our Spring AI reference documentation.

Project Links

Educational Resources

Some selected videos. Search YouTube! for more.

  • Spring Tips: Spring AI
    Watch Spring Tips video
  • Overview of Spring AI @ Devoxx 2023
    Watch the Devoxx 2023 video
  • Introducing Spring AI - Add Generative AI to your Spring Applications
    Watch the video

Getting Started

Please refer to the Getting Started Guide for instructions on adding your dependencies.

Note, the new Spring CLI project lets you get up and running in two simple steps, described in detail here.

  1. Install Spring CLI
  2. Type spring boot new --from ai --name myai in your terminal

Adding Dependencies manually

Note that there are three main steps.

  1. Add the Spring Milestone and Snapshot repositories to your build system.
  2. Add the Spring AI BOM
  3. Add dependencies for the specific AI model, Vector Database or other component dependencies you require.

Overview

Despite the extensive history of AI, Java's role in this domain has been relatively minor. This is mainly due to the historical reliance on efficient algorithms developed in languages such as C/C++, with Python serving as a bridge to access these libraries. The majority of ML/AI tools were built around the Python ecosystem. However, recent progress in Generative AI, spurred by innovations like OpenAI's ChatGPT, has popularized the interaction with pre-trained models via HTTP. This eliminates much of the dependency on C/C++/Python libraries and opens the door to the use of programming languages such as Java.

The Python libraries LangChain and LlamaIndex have become popular for implementing Generative AI solutions, and their core ideas can be carried over to other programming languages. These Python libraries share foundational themes with Spring projects, such as:

  • Portable Service Abstractions
  • Modularity
  • Extensibility
  • Reduction of boilerplate code
  • Integration with diverse data sources
  • Prebuilt solutions for common use cases

Taking inspiration from these libraries, the Spring AI project aims to provide a similar experience for Spring developers in the AI domain.

Note that the Spring AI API is not a direct port of either LangChain or LlamaIndex. You will see significant differences in the API if you are familiar with those two projects, though the concepts and ideas are fairly portable.

Feature Overview

This is a high level feature overview. The features that are implemented lay the foundation, with subsequent more complex features building upon them.

You can find more details in the Reference Documentation.

Interacting with AI Models

ChatClient: A foundational feature of Spring AI is a portable client API for interacting with generative AI models. With this portable API, you can initially target one AI chat model, for example OpenAI, and then easily swap out the implementation for another AI chat model, for example Amazon Bedrock's Anthropic model. When necessary, you can also drop down to use non-portable model options.

Spring AI supports many AI models. For an overview see here. Specific models currently supported are

  • OpenAI
  • Azure OpenAI
  • Amazon Bedrock (Anthropic, Llama, Cohere, Titan, Jurassic2)
  • HuggingFace
  • Google VertexAI (PaLM2, Gemini)
  • Mistral AI
  • Stability AI
  • Ollama
  • PostgresML
  • Transformers (ONNX)
  • Anthropic Claude3

Prompts: Central to AI model interaction is the Prompt, which provides specific instructions for the AI to act upon. Crafting an effective Prompt is both an art and science, giving rise to the discipline of "Prompt Engineering". These prompts often leverage a templating engine for easy data substitution within predefined text using placeholders.

Explore more on Prompts in our concept guide. To learn about the Prompt class, refer to the Prompt API guide.

Prompt Templates: Prompt Templates support the creation of prompts, particularly when a Template Engine is employed.

Delve into PromptTemplates in our concept guide. For a hands-on guide to PromptTemplate, see the PromptTemplate API guide.

Output Parsers: AI model outputs often come as raw java.lang.String values. Output Parsers restructure these raw strings into more programmer-friendly formats, such as CSV or JSON.

Get insights on Output Parsers in our concept guide. For implementation details, visit the OutputParser API guide.

Incorporating your data

Incorporating proprietary data into Generative AI without retraining the model has been a breakthrough. Retraining models, especially those with billions of parameters, is challenging due to the specialized hardware required. The 'In-context' learning technique provides a simpler method to infuse your pre-trained model with data, whether from text files, HTML, or database results. The right techniques are critical for developing successful solutions.

Retrieval Augmented Generation

Retrieval Augmented Generation, or RAG for short, is a pattern that enables you to bring your data to pre-trained models. RAG excels in the 'query over your docs' use-case.

Learn more about Retrieval Augmented Generation.

Bringing your data to the model follows an Extract, Transform, and Load (ETL) pattern. The subsequent classes and interfaces support RAG's data preparation.

Documents:

The Document class encapsulates your data, including text and metadata, for the AI model. While a Document can represent extensive content, such as an entire file, the RAG approach segments content into smaller pieces for inclusion in the prompt. The ETL process uses the interfaces DocumentReader, DocumentTransformer, and DocumentWriter, ending with data storage in a Vector Database. This database later discerns the pieces of data that are pertinent to a user's query.

Document Readers:

Document Readers produce a List<Document> from diverse sources like PDFs, Markdown files, and Word documents. Given that many sources are unstructured, Document Readers often segment based on content semantics, avoiding splits within tables or code sections. After the initial creation of the List<Document>, the data flows through transformers for further refinement.

Document Transformers:

Transformers further modify the List<Document> by eliminating superfluous data, like PDF margins, or appending metadata (e.g., primary keywords or summaries). Another critical transformation is subdividing documents to fit within the AI model's token constraints. Each model has a context-window indicating its input and output data limits. Typically, one token equates to about 0.75 words. For instance, in model names like gpt-4-32k, "32K" signifies the token count.

Document Writers:

The final ETL step within RAG involves committing the data segments to a Vector Database. Though the DocumentWriter interface isn't exclusively for Vector Database writing, it is the most common type of implementation.

Vector Stores: Vector Databases are instrumental in incorporating your data with AI models. They ascertain which document sections the AI should use for generating responses. Examples of Vector Databases include Chroma, Postgres, Pinecone, Qdrant, Weaviate, Mongo Atlas, and Redis. Spring AI's VectorStore abstraction permits effortless transitions between database implementations.

Cloning the repo

This repository contains large model files. To clone it you have to either:

  • Ignore the large files (won't affect the spring-ai behaviour) : GIT_LFS_SKIP_SMUDGE=1 git clone git@github.com:spring-projects/spring-ai.git.
  • Or install the Git Large File Storage before cloning the repo.

Building

To build with running unit tests

./mvnw clean package

To build including integration tests, set API key environment variables for OpenAI and Azure OpenAI before running:

./mvnw clean verify -Pintegration-tests

To run a specific integration test, allowing up to two attempts to succeed (useful when a hosted service is unreliable or times out):

./mvnw -pl vector-stores/spring-ai-pgvector-store -Pintegration-tests -Dfailsafe.rerunFailingTestsCount=2 -Dit.test=PgVectorStoreIT verify

To build the docs

./mvnw -pl spring-ai-docs antora

The docs are then available at spring-ai-docs/target/antora/site/index.html

To reformat using the java-format plugin

./mvnw spring-javaformat:apply

To update the year on license headers using the license-maven-plugin

./mvnw license:update-file-header -Plicense

To check javadocs using javadoc:javadoc

./mvnw javadoc:javadoc -Pjavadoc
from https://github.com/spring-projects/spring-ai 
-------------------------------------------------------

Getting Started

This section offers jumping off points for how to get started using Spring AI.

You should follow the steps in each of the following sections according to your needs.

Spring CLI

The Spring CLI simplifies creating new applications directly from your terminal. Like the 'create-react-app' command for those familiar with the JavaScript ecosystem, Spring CLI provides a spring boot new command to create Spring-based projects. Spring CLI also offers features to integrate external code bases into your current project, and many other productivity features.


It is important to understand that the "Spring CLI" is a distinct project from the "Spring Boot CLI", each with its own set of functionalities.

To begin creating a Spring AI application, follow these steps:

  1. Download the latest Spring CLI Release and follow the installation instructions.

  2. To create a simple OpenAI-based application, use the command:

    spring boot new --from ai --name myai
  3. Consult the generated README.md file for guidance on obtaining an OpenAI API Key and running your first AI application.

To add the same simple AI application to an existing project, execute:

spring boot add ai

Spring CLI allows users to define their own project catalogs that define which projects you can create or add to your existing code base.

Spring Initializr

Head on over to start.spring.io and select the AI Models and Vector Stores that you want to use in your new applications.

Add Milestone and Snapshot Repositories

If you prefer to add the dependency snippets by hand, follow the directions in the following sections.

To use Milestone and Snapshot versions, you need to add references to the Spring Milestone and/or Snapshot repositories in your build file.

For Maven, add the following repository definitions as needed:

  <repositories>
    <repository>
      <id>spring-milestones</id>
      <name>Spring Milestones</name>
      <url>https://repo.spring.io/milestone</url>
      <snapshots>
        <enabled>false</enabled>
      </snapshots>
    </repository>
    <repository>
      <id>spring-snapshots</id>
      <name>Spring Snapshots</name>
      <url>https://repo.spring.io/snapshot</url>
      <releases>
        <enabled>false</enabled>
      </releases>
    </repository>
  </repositories>

For Gradle, add the following repository definitions as needed:

repositories {
  mavenCentral()
  maven { url 'https://repo.spring.io/milestone' }
  maven { url 'https://repo.spring.io/snapshot' }
}

Dependency Management

The Spring AI Bill of Materials (BOM) declares the recommended versions of all the dependencies used by a given release of Spring AI. Using the BOM from your application’s build script avoids the need for you to specify and maintain the dependency versions yourself. Instead, the version of the BOM you’re using determines the utilized dependency versions. It also ensures that you’re using supported and tested versions of the dependencies by default, unless you choose to override them.

If you’re a Maven user, you can use the BOM by adding the following to your pom.xml file -

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-bom</artifactId>
            <version>0.8.1-SNAPSHOT</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

Gradle users can also use the Spring AI BOM by leveraging Gradle (5.0+) native support for declaring dependency constraints using a Maven BOM. This is implemented by adding a 'platform' dependency handler method to the dependencies section of your Gradle build script. As shown in the snippet below, this can then be followed by version-less declarations of the Starter Dependencies for the one or more spring-ai modules you wish to use, e.g. spring-ai-openai.

dependencies {
  implementation platform("org.springframework.ai:spring-ai-bom:0.8.1-SNAPSHOT")
  // Replace the following with the starter dependencies of specific modules you wish to use
  implementation 'org.springframework.ai:spring-ai-openai'
}

Add dependencies for specific components

Each of the following sections in the documentation shows which dependencies you need to add to your project build system.

Chat Models

 

 

Sunday 28 April 2024

Living underground: a first-hand look at the semi-basement flats of "Parasite"

 

Hailed as a "United Nations of flavors"! Thailand is highly receptive to foreign cultures, and Bangkok has become a must-win market in the eyes of Taiwanese businesspeople. A Taiwanese owner mixes cuisines to create new business, winning over the locals with Japanese-style barbecue paired with Taiwanese braised snacks.

 

- I would really like to take a trip to Taiwan, Singapore/Malaysia/Thailand, and Japan.

Java diagnostic tool: greys-anatomy


Why does the production system keep failing? Why does the database keep getting hit? Why do business calls fail again and again? Which call is behind the chain of exception stack traces? What hides behind the sudden avalanche of hundreds of servers? Is it the warping of software or the decay of hardware? Step inside and get to know Greys, a diagnostic tool for Java problems in production.

Related documentation

Installation

  • Remote installation

    curl -sLk http://ompc.oss.aliyuncs.com/greys/install.sh|sh
  • Remote installation (short link)

    curl -sLk http://t.cn/R2QbHFc|sh

    Latest version

    VERSION : 1.7.6.6

    1. Supports JDK 9
    2. The greys.sh script supports tar as the decompression mode (some machines have no unzip); unzip remains the default
    3. Fixes issue #219

    Version numbering

    main.major.minor.bugfix

    • Main version

      Changes when the program's architecture undergoes a fundamental upgrade. For example, the move from 0.1 to 1.0 took the software from a single-machine design to a socket-based multi-machine design, and settled what Greys is: a Java counterpart of HouseMD, but more powerful than its predecessors.

    • Major version

      The architecture is significantly reworked, but the user-facing positioning of the software stays the same.

    • Minor version

      New commands and features are added.

    • Bugfix

      Fixes and hardening for the current version.

      • No backward compatibility is promised across main/major versions, i.e. a 0.1 client is not guaranteed to work against a 1.0 server.

      • Minor versions that break compatibility are called out in the upgrade notes.

      • Bugfix releases are guaranteed to be backward compatible.

    Maintainers

    Building from source

  • Open a terminal

    git clone git@github.com:oldmanpushcart/greys-anatomy.git
    cd greys-anatomy/bin
    ./greys-packages.sh
    • Build output

      A release archive for the current version is produced under target/; for example, if the current version is 1.7.0.4, the file target/greys-1.7.0.4-bin.zip is generated.

      Building locally also installs the freshly built version on the local machine, so once the build finishes, the local installation is done as well.

    Reflections

    I have been writing and maintaining this software for 5 years, during which Greys has been rewritten all the way from version 0.1 to today's 1.7. Along the way I received help and advice from many people, and by the end of the year I plan to release version 2.0, which will open up Greys' underlying communication protocol and support websocket access.

    I have not shared much of my years of troubleshooting experience, nor the private frustrations of a Java programmer; all of it has gone into this tool's commands. I hope what has accumulated here helps you avoid some detours, and I look forward to your feedback, which would make me genuinely happy and proud.

    Help us

    Greys needs everyone's help to grow.

    • Share your experience with Greys

      I would love to receive feedback and shared experience from users. If you have an article, please strip any sensitive information and email it to me at oldmanpushcart@gmail.com, and I will share it with more colleagues.

    • Help improve the code or documentation

      However good a piece of software is, it needs detailed documentation; however polished it is, there are always holes to fill. My time is very limited these days, and I hope we can work on this together.

       from  https://github.com/oldmanpushcart/greys-anatomy

    Routing and DNS on macOS


    On a Mac, especially when two network interfaces are used at the same time (one NIC on the internal network, one on the external network), you often run into unreachable IPs (a routing problem, relatively easy to fix) or names that will not resolve. Or, after connecting to the company network over VPN, some internal routes get inserted and parts of the network become unreachable.

    Even if you know DNS resolution on Linux inside out, it still takes a while to sort things out on a Mac. If you cannot get routing and DNS configured you do not deserve to use a Mac, so I am writing it down here.

    route

    If an IP is unreachable, look at the routing table and add or delete routes according to your internal and external IP ranges. Common commands:

    sudo route -n add 10.176/16 192.168.3.1
    sudo route -n add -net 10.176.0.0/16 192.168.3.1 //add a route: traffic to 10.176.0.0/16 goes via 192.168.3.1
    sudo route -n delete -net 10.176.0.0/16 192.168.3.1
    sudo route -n delete 0.0.0.0 192.168.184.1
    sudo route -n add 0.0.0.0 192.168.184.1 //add a default route for external access
    sudo route -n delete 0.0.0.0 192.168.3.1
    sudo route -n add 10.176/16 192.168.3.1
    sudo route -n delete 0.0.0.0 192.168.184.1 -ifscope en0
    sudo route -n add 0.0.0.0 192.168.184.1
    sudo networksetup -setdnsservers 'Apple USB Ethernet Adapter' 202.106.196.115 202.106.0.20 114.114.114.114
    sudo networksetup -setdnsservers 'USB 10/100/1000 LAN' 223.5.5.5 30.30.30.30 114.114.114.114
    ip route get 8.8.8.8 //linux
    route get 8.8.8.8 //macos
    netstat -rn //show the routing table
    netstat -nr -f inet //show only IPv4 routes

    If an IP was reachable but stops working once the VPN is up, the VPN must have injected more specific routes that override the original ones. The simple fix is to stop the VPN, or to add an even more precise route of your own, or to delete the offending route that the VPN added.
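
    For example, assuming the VPN injected a route for 10.0.0.0/8 but you still want 10.200.1.5 to go through your own gateway, a more specific route wins (the addresses here are placeholders):

    sudo route -n add -net 10.200.1.5/32 192.168.3.1 //a /32 route overrides the VPN's /8
    route -n get 10.200.1.5 //verify which route will actually be used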

    DNS resolution

    DNS problems on the Mac are fiddly to debug and not well documented. After the steps above, if the IP is reachable but names still will not resolve, the culprit is usually DNS resolution itself.

    On macOS, /etc/resolv.conf is no longer used for hostname resolution; only nslookup still reads resolv.conf.

    cat /etc/resolv.conf
    #
    # macOS Notice
    #
    # This file is not consulted for DNS hostname resolution, address
    # resolution, or the DNS query routing mechanism used by most
    # processes on this system.
    #
    # To view the DNS configuration used by this system, use:
    # scutil --dns
    scutil --dns //show the DNS resolvers
    scutil --nwi //show network state
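
    To see how the system resolver (as opposed to nslookup) actually resolves a name, you can query it directly with dscacheutil (the hostname is just an example):

    dscacheutil -q host -a name www.apple.com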

    When resolution fails, check the nameservers first.

    scutil --dns usually shows a long list of resolvers, and each resolver can have several nameservers.

    A scoped DNS query can use only specified network interfaces (e.g. Ethernet or WiFi), while non-scoped can use any available interface.

    More verbosely, an application that wants to resolve a name, sends a request (either scoped or non-scoped) to a resolver (usually a DNS client application), if the resolver does not have the answer cached, it sends a DNS query to a particular nameserver (and this goes through one interface, so it is always “scoped”).

    In your example resolver #1 “for scoped queries” can use only en0 interface (Ethernet).

    Changing the nameservers

    The first resolver is used by default; if it has no nameserver, names cannot be resolved. You can change a resolver's nameservers:

    $networksetup -listallnetworkservices //list the network services, e.g. Wi-Fi; below is the output on my macOS
    An asterisk (*) denotes that a network service is disabled.
    USB 10/100/1000 LAN
    Apple USB Ethernet Adapter
    Wi-Fi
    Bluetooth PAN
    Thunderbolt Bridge
    $sudo networksetup -setdnsservers 'Wi-Fi' 202.106.196.115 202.106.0.20 114.114.114.114 //set the nameservers
    $networksetup -getdnsservers Wi-Fi //show the configured nameservers, similar to scutil --dns

    As shown above, as long as your nameserver is working, DNS queries will get answered.
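
    To confirm that a particular nameserver is answering at all, query it directly, bypassing the system configuration (a sketch; substitute your own nameserver and domain):

    dig @223.5.5.5 example.com +short
    nslookup example.com 223.5.5.5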

    To remove all DNS nameservers:

    One note to anyone wanting to remove the DNS, just write “empty” (without the quotes) instead of the DNS: sudo networksetup -setdnsservers <networkservice> empty

    networksetup usage

    Viewing devices and configuration

    $networksetup -listallnetworkservices
    An asterisk (*) denotes that a network service is disabled.
    USB 10/100/1000 LAN
    Apple USB Ethernet Adapter
    Wi-Fi
    Bluetooth PAN
    Thunderbolt Bridge
    Thunderbolt Bridge 2
    #show the configuration of a network service
    $networksetup -getinfo "USB 10/100/1000 LAN"
    DHCP Configuration
    IP address: 30.25.25.195
    Subnet mask: 255.255.255.128
    Router: 30.25.25.254
    Client ID:
    IPv6 IP address: none
    IPv6 Router: none
    Ethernet Address: 44:67:52:02:16:d4
    $networksetup -listallhardwareports
    Hardware Port: USB 10/100/1000 LAN
    Device: en7
    Ethernet Address: 44:67:52:02:16:d4
    Hardware Port: Wi-Fi
    Device: en0
    Ethernet Address: 88:66:5a:10:e4:2b
    Hardware Port: Thunderbolt Bridge
    Device: bridge0
    Ethernet Address: 82:0a:d5:01:b4:00
    VLAN Configurations
    ===================
    $networksetup -getinfo "Thunderbolt Bridge"
    DHCP Configuration
    Client ID:
    IPv6: Automatic
    IPv6 IP address: none
    IPv6 Router: none
    //show preferred Wi-Fi networks and the currently joined network
    networksetup -listpreferredwirelessnetworks en0
    networksetup -getairportnetwork "en0"

    DHCP, route, and domain configuration

    [-setmanual networkservice ip subnet router]
    [-setdhcp networkservice [clientid]]
    [-setbootp networkservice]
    [-setmanualwithdhcprouter networkservice ip]
    [-getadditionalroutes networkservice]
    [-setadditionalroutes networkservice [dest1 mask1 gate1] [dest2 mask2 gate2] ..
    . [destN maskN gateN]]
    #configure the IP address and gateway for a network service
    $ networksetup -getinfo "Apple USB Ethernet Adapter"
    DHCP Configuration
    Client ID:
    IPv6: Automatic
    IPv6 IP address: none
    IPv6 Router: none
    Ethernet Address: (null)
    $networksetup -setmanual "Apple USB Ethernet Adapter" 192.168.100.100 255.255.255.0 192.168.100.1
    $networksetup -getinfo "Apple USB Ethernet Adapter"
    Manual Configuration
    IP address: 192.168.100.100
    Subnet mask: 255.255.255.0
    Router: 192.168.100.1
    IPv6: Automatic
    IPv6 IP address: none
    IPv6 Router: none
    Ethernet Address: (null)

    Proxy configuration

    //ftp
    [-getftpproxy networkservice]
    [-setftpproxy networkservice domain portnumber authenticated username password]
    [-setftpproxystate networkservice on | off]

    Web proxy

    [-getwebproxy networkservice]
    [-setwebproxy networkservice domain portnumber authenticated username password]
    [-setwebproxystate networkservice on | off]
    $networksetup -setwebproxy "Built-in Ethernet" proxy.company.com 80
    $networksetup -setwebproxy "Built-In Ethernet" proxy.company.com 80 On authusername authpassword

    SOCKS5 proxy

    $networksetup -setsocksfirewallproxy "USB 10/100/1000 LAN" 127.0.0.1 13659
    $networksetup -getsocksfirewallproxy "USB 10/100/1000 LAN"
    Enabled: Yes
    Server: 127.0.0.1
    Port: 13659
    Authenticated Proxy Enabled: 0

    Summary

    When a Mac is connected to Wi-Fi (external network or VPN) and wired Ethernet (internal network) at the same time: if the internal network interferes with reaching external IPs, check the routing table and adjust the route order; if it interferes with DNS, inspect the resolver order with scutil --dns and remove the unnecessary resolvers in the system network settings.

    References

    Managing the network with the macOS networksetup command

    Reloading the proxy auto-config (PAC) script on a Mac

    ----------------------------------------------------------

    Reloading the proxy auto-config (PAC) script on a Mac

    On a Mac, pointing a network interface at a proxy auto-config script lets you use a proxy transparently to get through the firewall (it can be combined with an ssh tunnel or tor). But I never knew how to make the system reload the PAC file from a script (a reload is needed after updating the PAC rules). Yesterday a reader named Dylan left a comment explaining how, so I am writing it down here. On the command line:

    networksetup -listallnetworkservices

    This returns a list of network services:
    An asterisk (*) denotes that a network service is disabled.
    Bluetooth DUN
    ADSL
    Ethernet
    FireWire
    AirPort
    Bluetooth PAN

    The services I usually need a PAC file on are Ethernet and AirPort, so the corresponding reload commands are:


    sudo networksetup -setautoproxystate 'AirPort' off
    sudo networksetup -setautoproxyurl 'AirPort' 'file://localhost/Users/tin/pac/tin.pac'
    sudo networksetup -setautoproxystate 'AirPort' on
    sudo networksetup -setautoproxystate 'Ethernet' off
    sudo networksetup -setautoproxyurl 'Ethernet' 'file://localhost/Users/tin/pac/tin.pac'
    sudo networksetup -setautoproxystate 'Ethernet' on

    After that, the PAC file has been reloaded.

    By the way, here are my bash aliases:

    alias px='ssh -qTfnNC -D 7777 someuser@server.com'
    alias rpx="sudo networksetup -setautoproxystate 'AirPort' off;sudo networksetup -setautoproxyurl 'AirPort' 'file://localhost/Users/tin/pac/tin.pac';sudo networksetup -setautoproxystate 'AirPort' on;sudo networksetup -setautoproxystate 'Ethernet' off;sudo networksetup -setautoproxyurl 'Ethernet' 'file://localhost/Users/tin/pac/tin.pac';sudo networksetup -setautoproxystate 'Ethernet' on"

    (With privoxy converting the SOCKS proxy into an HTTP proxy, Safari can then also do DNS lookups through the proxy.)

     

     

    Capturing and analyzing Unix domain socket traffic with tcpdump


    Background

    Most of the time we can troubleshoot by using tcpdump to capture and analyze request and response traffic on the network. But if a program talks over a Unix domain socket, tcpdump cannot see what actually flows through it. This article looks for a way to capture the concrete requests and responses on a Unix domain socket, just as we would on a network interface.

    The socat tool

    socat is like nc, but a greatly enhanced nc; it mainly serves as a relay (or proxy) for bidirectional data transfer between two independent data channels.

    The basic idea: use socat to set up a proxy between the Unix socket and a TCP/UDP port, and then capture packets on the proxy's port.

    The example below works through the approach by capturing traffic on docker.sock. Normally we can send HTTP POST requests with curl to the port the docker daemon listens on, and those requests and responses can be captured and analyzed with tcpdump. But when we run docker ps / docker run against the local docker daemon, the commands are translated into HTTP requests sent to docker.sock, and then tcpdump can no longer be used to analyze the HTTP content.

    Proxying the Unix domain socket through a TCP port with socat

    Start local port 8080 and map docker.sock to it: everything received on 8080 is forwarded to docker.sock, and everything docker.sock receives can be seen by capturing on 8080. The catch is that the application must talk to 8080 instead of docker.sock.

    socat -d -d TCP-LISTEN:8080,fork,bind=127.0.0.1 UNIX:/var/run/docker.sock
    

    Drawback: the client has to change how it connects.

    sudo curl --unix-socket /var/run/docker.sock http://localhost/images/json
    

    Capturing on 8080 still shows nothing for the command above, because it bypasses our proxy.

    You have to go through port 8080 as shown below; the request is then forwarded by the socat proxy to docker.sock, and the end result is the same as using --unix-socket. Now a capture on port 8080 shows the traffic that flows over the Unix socket.

    sudo curl http://localhost:8080/images/json
    

    Starting another Unix domain socket proxy with socat (without tcpdump)

    sudo mv /var/run/docker.sock /var/run/docker.sock.original
    sudo socat -t100 -d -x -v UNIX-LISTEN:/var/run/docker.sock,mode=777,reuseaddr,fork UNIX-CONNECT:/var/run/docker.sock.original
    

    Advantage: the client does not have to change anything; it still talks to the Unix socket directly.
    Drawback: the output is less convenient than tcpdump's, so it cannot be analyzed with wireshark either.

    This is still essentially a socat proxy; it just uses one Unix socket to proxy another instead of a TCP port, and prints all data sent and received directly on the proxy.

    The best of both worlds: the client keeps its access method and tcpdump can still capture the data

    sudo mv /var/run/docker.sock /var/run/docker.sock.original
    sudo socat TCP-LISTEN:8089,reuseaddr,fork UNIX-CONNECT:/var/run/docker.sock.original
    sudo socat UNIX-LISTEN:/var/run/docker.sock,fork TCP-CONNECT:127.0.0.1:8089
    

    The client still talks to the Unix socket directly:
    sudo curl --unix-socket /var/run/docker.sock http://localhost/images/json

    Now tcpdump on port 8089 can capture the data:

    sudo tcpdump -i lo -netvv port 8089
    

    This actually combines the two previous approaches with two layers of proxying: the socket is first mapped to port 8089, then port 8089 is mapped to a new socket, and finally the client talks to the new socket.

    The flow is: client -> new socket -> 8089 -> original socket. At this point you can capture on 8089 however you like.
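
    Since the traffic on 8089 is plain HTTP, you can also point tshark at it and have the HTTP layer dissected (a sketch; the port mapping follows the setup above):

    sudo tshark -i lo -d tcp.port==8089,http -T fields -e http.request.method -e http.request.uri -e http.response.code 'port 8089'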

    Source: https://mivehind.net/2018/04/20/sniffing-unix-domain-sockets/

    Some other socat tricks

    Map the mysql Unix domain socket listening on the remote host 12.34.56.78 to the local /var/run/mysqld.temp.sock, so that you can access the remote mysql service with mysql -S /var/run/mysqld.temp.sock.

    socat "UNIX-LISTEN:/var/run/mysqld.temp.sock,reuseaddr,fork" EXEC:"ssh root@12.34.56.78 socat STDIO UNIX-CONNECT:/var/run/mysqld/mysqld.sock"
    

    You can also use the command below to map mysql on 12.34.56.78 to local port 5500 and then access it with mysql -P 5500.

    socat TCP-LISTEN:5500 EXEC:'ssh root@12.34.56.78 "socat STDIO UNIX-CONNECT:/var/run/mysqld/mysqld.sock"'
    

    Map UDP port 161 on 12.34.56.78 to local port 1611:

    socat udp-listen:1611 system:'ssh root@12.34.56.78 "socat stdio udp-connect:remotetarget:161"'    
    

    Starting a server with socat, with all kinds of options; more flexible than nc

    Server: socat -dd tcp-listen:2000,keepalive,keepidle=10,keepcnt=2,reuseaddr,keepintvl=1 -
    Client: socat -dd - tcp:localhost:2000,keepalive,keepidle=10,keepcnt=2,keepintvl=1
    
    Drop Connection (Unplug Cable, Shut down Link(WiFi/Interface)): sudo iptables -A INPUT -p tcp --dport 2000 -j DROP
    

    Start local port 8080 and map docker.sock to it (everything docker.sock receives can be seen by capturing on 8080); everything received on 8080 is forwarded to docker.sock.

    socat -d -d TCP-LISTEN:8080,fork,bind=99.13.252.208 UNIX:/var/run/docker.sock
    

    Mapping a remote Unix domain socket with socat

    Besides exposing local services to others through port mapping, port forwarding lets us do fancier things. For example, the command below maps the mysql Unix domain socket listening on the remote host 12.34.56.78 to the local /var/run/mysqld.temp.sock, so Xiao Ming can access the remote mysql service with mysql -S /var/run/mysqld.temp.sock.

    socat "UNIX-LISTEN:/var/run/mysqld.temp.sock,reuseaddr,fork" EXEC:"ssh root@12.34.56.78 socat STDIO UNIX-CONNECT\:/var/run/mysqld/mysqld.sock"
    

    Of course, if Xiao Ming does not like local Unix domain sockets, he can instead use the command below to map mysql on 12.34.56.78 to local port 5500 and access it with mysql -P 5500.

    socat TCP-LISTEN:5500 EXEC:'ssh root@12.34.56.78 "socat STDIO UNIX-CONNECT:/var/run/mysqld/mysqld.sock"'
    
    # Map the mysql Unix domain socket on remote host 12.34.56.78 to the local /var/run/mysqld.temp.sock,
    # then access the remote mysql service with mysql -S /var/run/mysqld.temp.sock.
    socat "UNIX-LISTEN:/var/run/mysqld.temp.sock,reuseaddr,fork" EXEC:"ssh root@12.34.56.78 socat STDIO UNIX-CONNECT:/var/run/mysqld/mysqld.sock"
    # Or map mysql on 12.34.56.78 to local port 5500 with the command below
    # and access it with mysql -P 5500.
    socat TCP-LISTEN:5500 EXEC:'ssh root@12.34.56.78 "socat STDIO UNIX-CONNECT:/var/run/mysqld/mysqld.sock"'
    # Map UDP port 161 on 12.34.56.78 to local port 1611:
    socat udp-listen:1611 system:'ssh root@12.34.56.78 "socat stdio udp-connect:remotetarget:161"'
    

    Starting a network service with socat

    In one window, start socat as the server, listening on port 1000:

    # start a TCP listener at port 1000, and echo back the received data
    $ sudo socat TCP4-LISTEN:1000,fork exec:cat

    In another window, use nc as the client to connect to the server and establish the socket:

    # connect to the local TCP listener at port 1000
    $ nc localhost 1000

    curl 7.57 can talk to Unix sockets directly with --unix-socket

    Only curl versions from 7.57 onwards support --unix-socket, which makes this kind of testing much easier.

    //Leave: test detaching an endpoint from a network
    curl -H "Content-Type: application/json" -X POST -d '{"NetworkID":"47866b0071e3df7e8053b9c8e499986dfe5c9c4947012db2d963c66ca971ed4b","EndpointID":"3d716436e629701d3ce8650e7a85c133b0ff536aed173c624e4f62a381656862"}' --unix-socket /run/docker/plugins/vlan.sock http://localhost/NetworkDriver.Leave
    
    //get the image list
    sudo curl --unix-socket /var/run/docker.sock http://localhost/images/json
    
    curl 11.239.155.97:2376/debug/pprof/goroutine?debug=2
    echo -e "GET /debug/pprof/goroutine?debug=2 HTTP/1.1\r\n" | sudo nc -U /run/docker/plugins/vlan.sock
    echo -e "GET /debug/pprof/goroutine?debug=2 HTTP/1.1\r\n" | sudo nc -U /var/run/docker.sock
    //after upgrading curl to 7.57 or later, --unix-socket is supported
    sudo curl --unix-socket /var/run/docker.sock http://localhost/images/json
    sudo curl --unix-socket /run/docker/plugins/vlan.sock http://localhost/NetworkDriver.GetCapabilities
    //Leave
    curl -H "Content-Type: application/json" -X POST -d '{"NetworkID":"47866b0071e3df7e8053b9c8e499986dfe5c9c4947012db2d963c66ca971ed4b","EndpointID":"3d716436e629701d3ce8650e7a85c133b0ff536aed173c624e4f62a381656862"}' --unix-socket /run/docker/plugins/vlan.sock http://localhost/NetworkDriver.Leave
    
    sudo curl --no-buffer -XGET --unix-socket /var/run/docker.sock http://localhost/events
    

    How Unix domain sockets work

    When a connect request comes in, a new socket is allocated for the server side to use later; once the connection relationship with the client's socket is set up, it is placed on the accept queue of the socket the server is listening on. The server then picks up this new, already-paired socket via accept.

    The main connection work happens in that one function. Compared with the TCP connection setup we normally see, this process is almost trivially simple: no three-way handshake, no accept queue or SYN queue, and no retransmission timeouts.
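
    As a minimal sketch of how lightweight this is, you can stand up a Unix domain socket echo server and client with socat (the socket path is just an example):

    # terminal 1: listen on a Unix domain socket and echo back whatever arrives
    socat UNIX-LISTEN:/tmp/echo.sock,fork EXEC:cat
    # terminal 2: connect and type a line; it comes straight back, no TCP handshake involved
    socat - UNIX-CONNECT:/tmp/echo.sock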

    Compared with talking over 127.0.0.1, a Unix domain socket gives more than double the performance for packets up to about 1 KB; as packets get larger, the relative gain shrinks.

    tcpdump captures packets via libpcap. Roughly: on send and receive, if a packet matches the BPF filter tcpdump installed, it is copied into tcpdump's kernel buffer; that memory is then mapped into tcpdump's user space with PACKET_MMAP, and the parsed content is printed.

    As the figure above also shows, on the receive side, a packet already dropped by the NIC never reaches tcpdump; on the send side, a packet dropped inside the protocol stack, for example because the send buffer is full, is equally invisible to tcpdump. In short, tcpdump covers problems from the NIC inward; for problems beyond the NIC (including on the NIC itself) tcpdump falls short, and you need to capture on the other end as well.

    tcpdump tips

    tcpdump -B/--buffer-size=buffer_size: set the operating system capture buffer size to buffer_size, in units of KiB (1024 bytes). When tcpdump drops packets, it is usually because libpcap captured them but the tcpdump layer above did not drain them in time, so the libpcap buffer overflowed and unprocessed packets were overwritten. This is reported as "dropped by kernel"; note that "kernel" here does not mean the Linux kernel dropped them, but tcpdump's kernel-side buffer, i.e. libpcap.
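
    So when the capture summary reports packets "dropped by kernel", a simple mitigation is to enlarge the capture buffer and cut per-packet work (a sketch; the sizes and filter are examples):

    # 4 MiB capture buffer (-B is in KiB), no name resolution, write straight to a file
    sudo tcpdump -i eth0 -B 4096 -n -s 0 -w dump.pcap tcp port 3306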

    Listing capture interfaces

    tcpdump -D lists the interface devices available for capture. With this list you can decide on which interface to capture traffic.

    #tcpdump -D
    1.eth0
    2.bond0
    3.docker0
    4.nflog (Linux netfilter log (NFLOG) interface)
    5.nfqueue (Linux netfilter queue (NFQUEUE) interface)
    6.eth1
    7.usbmon1 (USB bus number 1)
    8.usbmon2 (USB bus number 2)
    9.veth6f2ee76
    10.veth8cb61c2
    11.veth9d9d363
    12.veth16c25ac
    13.veth190f0fc
    14.veth07103d7
    15.veth09119c0
    16.veth9770e1a
    17.any (Pseudo-device that captures on all interfaces)
    18.lo [Loopback]
    # tcpdump -X //print packet contents

    A lightweight way to analyze tricky TCP problems: TCP tracepoints

    Tracepoints are one of the techniques I use most when analyzing problems. When I hit a hard problem, I usually enable the relevant tracepoints, save their output, and analyze it offline, typically with some Python scripts, since Python is handy for data analysis.

    For TCP-related problems I likewise habitually use the TCP tracepoints. To use them, your kernel needs to be 4.16 or newer. The commonly used TCP tracepoints live under /sys/kernel/debug/tracing/events/tcp/ and /sys/kernel/debug/tracing/events/sock/, and their purposes are listed in the table below:



    References:

    https://mivehind.net/2018/04/20/sniffing-unix-domain-sockets/

    https://superuser.com/questions/484671/can-i-monitor-a-local-unix-domain-socket-like-tcpdump

    https://payloads.online/tools/socat

    Computer Networking: A Top-Down Approach

     

     

    Mastering TShark (the command-line version of Wireshark)


    While admiring how powerful the Wireshark GUI is, I sometimes complain that it is a bit slow and wish there were a command-line version. TShark is exactly that: the command-line version of Wireshark, with essentially all of Wireshark's functionality, and it can be combined with grep/awk and friends to process and analyze capture files programmatically.

    Let's walk through TShark's common features with a few examples. All the .cap/.pcap files used here were captured with tcpdump. Bookmark this page, and the next time you run into a similar problem, just run the commands from this article.

    Wireshark issues

    Protocol contents no longer shown

    For example, the info column no longer shows mysql requests and responses, but the byte view at the bottom still shows the select statements. This usually means the mysql protocol has been disabled in the configuration file.

    Configuration file: C:\Users\xijun.rxj\AppData\Roaming\Wireshark\disabled_protos

    If a large part of the traffic is missing from the capture (for example, inbound and outbound traffic take two different NICs but only one was captured), the protocol will not be dissected and displayed correctly either.

    IO graph shows no data

    Usually some data is simply missing; close the IO graph and reopen it, and pay attention to what the chart title shows.

    tcp segment of a reassembled pdu

    This message means Wireshark needs to reassemble several TCP segments into one higher-level protocol message (e.g. MySQL, HTTP), but the reassembly failed because packets are missing (or each packet was truncated by the snap length). Wireshark has in fact detected the protocol; the missing data just prevents it from dissecting the message cleanly.

    In that case you can try turning off reassembly for the protocol in question.
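
    A sketch of doing this from the tshark command line: tcp.desegment_tcp_streams is the global TCP reassembly preference, and individual protocols have their own desegment preferences (you can list the available preference names with tshark -G currentprefs):

    tshark -r dump.pcap -o tcp.desegment_tcp_streams:false -d tcp.port==3306,mysql -T fields -e mysql.query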

    PDU:Protocol Data Unit

    If the reassembly is successful, the TCP segment containing the last part of the packet will show the packet.
    The reassembly might fail if some TCP segments are missing.

    TCP segment of a reassembled PDU means that:

    1. Wireshark/TShark thinks it knows what protocol is running atop TCP in that TCP segment;
    2. that TCP segment doesn’t contain all of a “protocol data unit” (PDU) for that higher-level protocol, i.e. a packet or protocol message for that higher-level protocol, and doesn’t contain the last part of that PDU, so it’s trying to reassemble the multiple TCP segments containing that higher-level PDU.

    Common commands

    #parse 8507/4444 as mysql protocol, default only parse 3306 as mysql.
    sudo tshark -i eth0 -d tcp.port==8507,mysql -T fields -e mysql.query 'port 8507'
    sudo tshark -i any -c 50 -d tcp.port==4444,mysql -Y " ((tcp.port eq 4444 ) )" -o tcp.calculate_timestamps:true -T fields -e frame.number -e frame.time_epoch -e frame.time_delta_displayed -e ip.src -e tcp.srcport -e tcp.dstport -e ip.dst -e tcp.time_delta -e tcp.stream -e tcp.len -e mysql.query
    #query time
    sudo tshark -i eth0 -Y " ((tcp.port eq 3306 ) and tcp.len>0 )" -o tcp.calculate_timestamps:true -T fields -e frame.number -e frame.time_epoch -e frame.time_delta_displayed -e ip.src -e tcp.srcport -e tcp.dstport -e ip.dst -e tcp.time_delta -e tcp.stream -e tcp.len -e mysql.query
    #rotate to a new file every 3 seconds and stop after 5 files (15 seconds); file names carry the timestamp
    sudo tcpdump -t -s 0 tcp port 3306 -w 'dump_%Y-%m-%d_%H:%M:%S.pcap' -G 3 -W 5 -Z root
    #rotate every 30 minutes and compress
    nohup sudo tcpdump -i eth0 -t -s 0 tcp and port 3306 -w 'dump_%Y-%m-%d_%H:%M:%S.pcap' -G 1800 -W 48 -Z root -z gzip &
    #file size 1000M
    nohup sudo tcpdump -i eth0 -t -s 0 tcp and port 3306 -w 'dump_' -C 1000 -W 300 -Z root -z gzip &
    #capture the full SQL statements, to quickly confirm exactly what SQL the client sends:
    sudo tshark -i any -f 'port 8527' -s 0 -l -w - |strings
    sudo tshark -i eth0 -d tcp.port==3306,mysql -T fields -e mysql.query 'port 3306'
    sudo tshark -i eth0 -R "ip.addr==11.163.182.137" -d tcp.port==3306,mysql -T fields -e mysql.query 'port 3306'
    sudo tshark -i eth0 -R "tcp.srcport==62877" -d tcp.port==3001,mysql -T fields -e tcp.srcport -e mysql.query 'port 3001'

    Analyzing the response time of each MySQL query

    The application's logs say the DB is slow, the DB's own monitoring says it is fast, and the two sides keep passing the buck. If you capture directly on the application host's NIC and work out the response time of each SQL statement, both the DB and the network can be cleared (or blamed), since the time the application reports sometimes includes its own processing time, time spent getting a connection, and so on.

    tshark -r 213_php.cap -Y "mysql.query or (  tcp.srcport==3306)" -o tcp.calculate_timestamps:true -T fields -e frame.number -e frame.time_epoch  -e frame.time_delta_displayed  -e ip.src -e tcp.srcport -e tcp.dstport -e ip.dst -e tcp.time_delta -e tcp.stream -e tcp.len -e mysql.query |sort -nk9 -nk1
    
    34143 1475902394.645073000 0.000342000 10.100.53.17 3306 40383 10.100.10.213 0.000153000 2273 0
    34145 1475902394.645333000 0.000260000 10.100.53.17 3306 40383 10.100.10.213 0.000253000 2273 77
    34150 1475902394.645537000 0.000204000 10.100.53.17 3306 40383 10.100.10.213 0.000146000 2273 0
    34151 1475902394.645706000 0.000169000 10.100.53.17 3306 40383 10.100.10.213 0.000169000 2273 11
    34153 1475902394.645737000 0.000031000 10.100.10.213 40383 3306 10.100.53.17 0.000031000 2273 21 SET NAMES 'utf8'
    34161 1475902394.646390000 0.000158000 10.100.53.17 3306 40383 10.100.10.213 0.000653000 2273 11
    34162 1475902394.646418000 0.000028000 10.100.10.213 40383 3306 10.100.53.17 0.000028000 2273 22 START TRANSACTION
    34164 1475902394.646713000 0.000295000 10.100.53.17 3306 40383 10.100.10.213 0.000295000 2273 11
    34166 1475902394.646776000 0.000063000 10.100.10.213 40383 3306 10.100.53.17 0.000063000 2273 46 select AUTO_SEQ_t_order.nextval from dual
    34194 1475902394.651468000 0.000909000 10.100.53.17 3306 40383 10.100.10.213 0.004692000 2273 100
    34195 1475902394.651782000 0.000314000 10.100.10.213 40383 3306 10.100.53.17 0.000314000 2273 576 insert into t_order (`out_order_no`,`pk_order`,`uid`,`ytid`,`platform`,`origin_price`,`price`,`partner_id`,`ip`,`sources`,`pay_state`,`type`,`product_type`,`device`,`extension`,`spm`,`ext2`,`createtime`,`pay_channel`,`use_ytid`,`updatetime`) values ('2016100822003361672230261573284','261573284','336167223','336167223','1','500','500','100000','42.49.141.142','2','1','1','2','3','{\"showid\":\"286083\",\"play_url\":\"http:\\/\\/v.youku.com\\/v_show\\/id_XMTczOTM5NjU1Mg==.html\",\"permit_duration\":172800}','','','2016-10-08 12:53:14','201','0','2016-10-08 12:53:14')
    34196 1475902394.653275000 0.001493000 10.100.53.17 3306 40383 10.100.10.213 0.001493000 2273 19
    34197 1475902394.653410000 0.000135000 10.100.10.213 40383 3306 10.100.53.17 0.000135000 2273 370 insert into t_order_product (`fk_order`,`product_id`,`origin_price`,`price`,`discount`,`deliver_state`,`product_url`,`product_name`,`amount`,`ytid`,`sub_product_id`,`createtime`) values ('2016100822003361672230261573284','4000010000','500','500','0','1','http://vip.youku.com','���������������������2:���������������','1','336167223','286083','2016-10-08 12:53:14')
    34198 1475902394.658326000 0.004916000 10.100.53.17 3306 40383 10.100.10.213 0.004916000 2273 19
    34199 1475902394.658407000 0.000081000 10.100.10.213 40383 3306 10.100.53.17 0.000081000 2273 11 commit
    34200 1475902394.659626000 0.001219000 10.100.53.17 3306 40383 10.100.10.213 0.001219000 2273 11
    34201 1475902394.659811000 0.000185000 10.100.10.213 40383 3306 10.100.53.17 0.000185000 2273 22 START TRANSACTION
    34202 1475902394.660054000 0.000243000 10.100.53.17 3306 40383 10.100.10.213 0.000243000 2273 11
    34203 1475902394.660126000 0.000072000 10.100.10.213 40383 3306 10.100.53.17 0.000072000 2273 125 SELECT * FROM t_order where ( out_order_no = '2016100822003361672230261573284' ) AND ( ytid = '336167223' ) FOR UPDATE
    34209 1475902394.661970000 0.001844000 10.100.53.17 3306 40383 10.100.10.213 0.001844000 2273 2214
    34211 1475902394.662069000 0.000099000 10.100.10.213 40383 3306 10.100.53.17 0.000089000 2273 122 update t_order set `pay_state`='2',`updatetime`='2016-10-08 12:53:14' where pk_order='261573284' and ytid='336167223'
    34213 1475902394.662917000 0.000848000 10.100.53.17 3306 40383 10.100.10.213 0.000848000 2273 19
    34216 1475902394.663049000 0.000088000 10.100.10.213 40383 3306 10.100.53.17 0.000132000 2273 11 commit
    34225 1475902394.664204000 0.000264000 10.100.53.17 3306 40383 10.100.10.213 0.001155000 2273 11
    34226 1475902394.664269000 0.000065000 10.100.10.213 40383 3306 10.100.53.17 0.000065000 2273 115 SELECT * FROM t_order where ( out_order_no = '2016100822003361672230261573284' ) AND ( ytid = '336167223' )
    34235 1475902394.665694000 0.000061000 10.100.53.17 3306 40383 10.100.10.213 0.001425000 2273 2214
    34354 1475902394.681464000 0.000157000 10.100.53.17 3306 40383 10.100.10.213 0.000187000 2273 0
    34174 1475902394.648046000 0.001123000 10.100.53.19 3306 33471 10.100.10.213 0.000151000 2275 0
    34176 1475902394.648331000 0.000285000 10.100.53.19 3306 33471 10.100.10.213 0.000278000 2275 77
    34179 1475902394.648482000 0.000151000 10.100.53.19 3306 33471 10.100.10.213 0.000127000 2275 0
    34180 1475902394.648598000 0.000116000 10.100.53.19 3306 33471 10.100.10.213 0.000116000 2275 11
    34181 1475902394.648606000 0.000008000 10.100.10.213 33471 3306 10.100.53.19 0.000008000 2275 21 SET NAMES 'utf8'
    34182 1475902394.648846000 0.000240000 10.100.53.19 3306 33471 10.100.10.213 0.000240000 2275 11
    34183 1475902394.648885000 0.000039000 10.100.10.213 33471 3306 10.100.53.19 0.000039000 2275 380 select pk_auto_renew_account as account_id,fk_user as uid,platform,ytid,fk_member_conf_id as member_id,fk_product_id as product_id,price,fk_pay_channel as pay_channel,renew_type,fk_order,fk_auto_renew_subscribe_log as fk_subscribe_log,state,memo,nexttime,createtime,updatetime from t_auto_renew_account where ( ytid = '354295193' ) AND ( platform = '1' ) AND ( state <> '3' )
    34184 1475902394.650040000 0.001155000 10.100.53.19 3306 33471 10.100.10.213 0.001155000 2275 1727
    34189 1475902394.650559000 0.000519000 10.100.53.19 3306 33471 10.100.10.213 0.000198000 2275 0

    Or:
    tshark -r gege_drds.pcap -Y " ((tcp.srcport eq 3306 ) and tcp.len>0 )" -o tcp.calculate_timestamps:true -T fields -e frame.number -e frame.time_epoch -e frame.time_delta_displayed -e ip.src -e tcp.srcport -e tcp.dstport -e ip.dst -e tcp.time_delta -e tcp.stream -e tcp.len -e tcp.analysis.ack_rtt

    In the output of this command, the fourth column from the end is essentially the RT.

    967     1548148159.346612000    0.000442000     192.168.4.18    3306    44026   192.168.100.30  0.005255000     17      1576    0.005255000
    969     1548148159.346826000    0.000214000     192.168.4.18    3306    44090   192.168.100.30  0.005425000     15      1576    0.005425000
    973     1548148159.347428000    0.000602000     192.168.4.18    3306    44070   192.168.100.30  0.005517000     8       2500    0.005517000
    979     1548148159.348640000    0.001212000     192.168.4.18    3306    44048   192.168.100.30  0.005517000     22      2462    0.005517000
    981     1548148159.348751000    0.000111000     192.168.4.18    3306    44066   192.168.100.30  0.005855000     21      2692    0.005855000
    983     1548148159.348844000    0.000093000     192.168.4.18    3306    44046   192.168.100.30  0.004589000     3       2692    0.004589000
    985     1548148159.348981000    0.000137000     192.168.4.18    3306    44012   192.168.100.30  0.004885000     19      2443    0.004885000
    990     1548148159.349293000    0.000312000     192.168.4.18    3306    44074   192.168.100.30  0.005923000     5       2692    0.005923000
    994     1548148159.349671000    0.000378000     192.168.4.18    3306    44080   192.168.100.30  0.004889000     4       2730    0.004889000
    1009    1548148159.350591000    0.000920000     192.168.4.18    3306    44022   192.168.100.30  0.004187000     14      1448    0.004187000
    1010    1548148159.350592000    0.000001000     192.168.4.18    3306    44022   192.168.100.30  0.000001000     14      1052    
    1013    1548148159.350790000    0.000198000     192.168.4.18    3306    44002   192.168.100.30  0.005998000     0       1576    0.005998000
    1026    1548148159.352207000    0.001417000     192.168.4.18    3306    44026   192.168.100.30  0.005348000     17      1448    0.005348000
    1027    1548148159.352217000    0.000010000     192.168.4.18    3306    44026   192.168.100.30  0.000010000     17      1052    
    1036    1548148159.352973000    0.000756000     192.168.4.18    3306    44090   192.168.100.30  0.005940000     15      2500    0.005940000
    1041    1548148159.353683000    0.000710000     192.168.4.18    3306    44070   192.168.100.30  0.005190000     8       2692    0.005190000
    1043    1548148159.353737000    0.000054000     192.168.4.18    3306    44066   192.168.100.30  0.004635000     21      1448    0.004635000
    1044    1548148159.353749000    0.000012000     192.168.4.18    3306    44066   192.168.100.30  0.000012000     21      128     
    1051    1548148159.354289000    0.000540000     192.168.4.18    3306    44046   192.168.100.30  0.004911000     3       1576    0.004911000
    1054    1548148159.354511000    0.000222000     192.168.4.18    3306    44080   192.168.100.30  0.004515000     4       1576    0.004515000
    1055    1548148159.354530000    0.000019000     192.168.4.18    3306    44074   192.168.100.30  0.004909000     5       1576    0.004909000
    1065    1548148159.355412000    0.000882000     192.168.4.18    3306    44012   192.168.100.30  0.005217000     19      2692    0.005217000
    1067    1548148159.355496000    0.000084000     192.168.4.18    3306    44048   192.168.100.30  0.005231000     22      2610    0.005231000
    1072    1548148159.356111000    0.000615000     192.168.4.18    3306    44052   192.168.100.30  0.005830000     24      2730    0.005830000
    1076    1548148159.356545000    0.000434000     192.168.4.18    3306    44022   192.168.100.30  0.005615000     14      2692    0.005615000
    1079    1548148159.357012000    0.000467000     192.168.4.18    3306    44002   192.168.100.30  0.005966000     0       2462    0.005966000
    1082    1548148159.357235000    0.000223000     192.168.4.18    3306    44072   192.168.100.30  0.004817000     23      2692    0.004817000
    1093    1548148159.359244000    0.002009000     192.168.4.18    3306    44070   192.168.100.30  0.005188000     8       1576    0.005188000
    

    MySQL response-time histogram (the 8th column is: Time since previous frame in this TCP stream, in seconds)

    tshark -r gege_drds.pcap -Y "mysql.query or (tcp.srcport==3306  and tcp.len>60)" -o tcp.calculate_timestamps:true -T fields -e frame.number -e frame.time_epoch  -e frame.time_delta_displayed  -e ip.src -e tcp.srcport -e tcp.dstport -e ip.dst -e tcp.time_delta -e tcp.stream -e tcp.len | awk 'BEGIN {sum0=0;sum3=0;sum10=0;sum30=0;sum50=0;sum100=0;sum300=0;sum500=0;sum1000=0;sumo=0;count=0;sum=0} {rt=$8; if(rt>=0.000) sum=sum+rt; count=count+1; if(rt<=0.000) sum0=sum0+1; else if(rt<0.003) sum3=sum3+1 ; else if(rt<0.01) sum10=sum10+1; else if(rt<0.03) sum30=sum30+1; else if(rt<0.05) sum50=sum50+1; else if(rt < 0.1) sum100=sum100+1; else if(rt < 0.3) sum300=sum300+1; else if(rt < 0.5) sum500=sum500+1; else if(rt < 1) sum1000=sum1000+1; else sum=sum+1 ;} END{printf "-------------\n3ms:\t%s \n10ms:\t%s \n30ms:\t%s \n50ms:\t%s \n100ms:\t%s \n300ms:\t%s \n500ms:\t%s \n1000ms:\t%s \n>1s:\t %s\n-------------\navg: %.6f \n" , sum3,sum10,sum30,sum50,sum100,sum300,sum500,sum1000,sumo,sum/count;}'
    
     -------------
    3ms:    145037 
    10ms:    78811 
    30ms:    7032 
    50ms:    2172 
    100ms:    1219 
    300ms:    856 
    500ms:    449 
    1000ms:118
    >1s:    0
    -------------
    avg: 0.005937 
    

    When analyzing RT, watch out for the case of one query with multiple responses (a large result set split across packets); in that case only look at the first response after the query and ignore the subsequent consecutive responses.

    Sometimes the application team insists that all the inventory-update code is wrapped in transactions, yet the inventory in the database does not add up, and the blame weighs heavily. Capture some packets and see what SQL the application actually sends.

    In a dev/test environment, the following command lets you use tshark directly to capture and inspect the SQL statements:

    sudo tshark -i eth0 -d tcp.port==3306,mysql -T fields -e mysql.query 'port 3306'
    

    This shows directly whether the SQL is sent with autocommit=1.

    Analyzing HTTP response times

    //aggregate http response latency per second
    tshark -r dump.pcap -Y 'http.time>0 ' -T fields -e frame.number -e frame.time_epoch -e frame.time_delta_displayed -e ip.src -e ip.dst -e tcp.stream -e http.request.full_uri -e http.response.code -e http.time | awk '{ print int($2), $8 }' | awk '{ sum[$1]+=$2; count[$1]+=1 ;} END { for (key in count) { printf "time= %s \t count=%s \t avg=%.6f \n", key, count[key], sum[key]/count[key] } }' | sort -k2n | awk '{ print strftime("%c",$2), $0 }'
    //on macOS
    tshark -r dump.pcap -Y 'http.response_for.uri contains "health" ' -T fields -e frame.number -e frame.time_epoch -e frame.time_delta_displayed -e ip.src -e ip.dst -e tcp.stream -e http.request.full_uri -e http.response_for.uri -e http.time | awk '{ print int($2/10), $8 }' | awk '{ sum[$1]+=$2; count[$1]+=1 ;} END { for (key in count) { printf "time= %s \t count=%s \t avg=%.6f \n", key, count[key], sum[key]/count[key] } }' | sort -k2n | gawk '{ print strftime("%c",$2), $0 }'

    Response time per HTTP response

    The third column is the RT, and the second-to-last column is the stream (one stream per connection). The RT on the lines with HTTP response 200 is the latency from request to response.

    # tshark -nr 10.cap -o tcp.calculate_timestamps:true -Y "http.request or http.response" -T fields -e frame.number -e frame.time_epoch -e tcp.time_delta -e ip.src -e ip.dst -e tcp.stream -e http.request.full_uri -e http.response.code -e http.response.phrase | sort -nk6 -nk1
    82579 1631791992.105383000 0.000113000 172.26.2.13 172.26.13.107 1198 http://plantegg/ajax.sword
    83167 1631791992.261663000 0.156042000 172.26.13.107 172.26.2.13 1198 200
    84917 1631791992.775011000 0.513106000 172.26.2.13 172.26.13.107 1198 http://plantegg/ajax.sword
    86388 1631791993.188458000 0.413018000 172.26.13.107 172.26.2.13 1198 200
    87391 1631791993.465156000 0.276608000 172.26.2.13 172.26.13.107 1198 http://plantegg/ajax.sword
    88067 1631791993.645780000 0.179832000 172.26.13.107 172.26.2.13 1198 200
    89364 1631791993.994322000 0.348324000 172.26.2.13 172.26.13.107 1198 http://plantegg/ajax.sword
    89843 1631791994.140131000 0.145169000 172.26.13.107 172.26.2.13 1198 200
    91387 1631791994.605527000 0.465245000 172.26.2.13 172.26.13.107 1198 http://plantegg/ajax.sword
    92271 1631791994.920607000 0.314639000 172.26.13.107 172.26.2.13 1198 200
    93491 1631791995.323424000 0.402724000 172.26.2.13 172.26.13.107 1198 http://plantegg/ajax.sword
    93860 1631791995.403614000 0.079834000 172.26.13.107 172.26.2.13 1198 200
    97221 1631791996.347307000 0.943423000 172.26.2.13 172.26.13.107 1198 http://plantegg/ajax.sword
    97862 1631791996.544563000 0.196448000 172.26.13.107 172.26.2.13 1198 200
    99613 1631791997.065735000 0.521095000 172.26.2.13 172.26.13.107 1198 http://plantegg/ajax.sword
    82714 1631791992.141943000 0.000122000 172.26.2.13 172.26.12.147 1199 http://plantegg/ajax.sword
    83055 1631791992.235637000 0.093471000 172.26.12.147 172.26.2.13 1199 200
    84789 1631791992.739133000 0.503423000 172.26.2.13 172.26.12.147 1199 http://plantegg/ajax.sword
    85525 1631791992.946220000 0.206860000 172.26.12.147 172.26.2.13 1199 200
    88208 1631791993.677995000 0.731490000 172.26.2.13 172.26.12.147 1199 http://plantegg/ajax.sword
    88638 1631791993.800956000 0.122637000 172.26.12.147 172.26.2.13 1199 200
    91010 1631791994.476918000 0.675911000 172.26.2.13 172.26.12.147 1199 http://plantegg/ajax.sword
    92079 1631791994.874566000 0.397357000 172.26.12.147 172.26.2.13 1199 200
    94480 1631791995.581990000 0.707200000 172.26.2.13 172.26.12.147 1199 http://plantegg/ajax.sword
    94764 1631791995.665365000 0.082906000 172.26.12.147 172.26.2.13 1199 200
    96241 1631791996.090803000 0.425378000 172.26.2.13 172.26.12.147 1199 http://plantegg/ajax.sword
    96731 1631791996.215406000 0.124276000 172.26.12.147 172.26.2.13 1199 200
    98832 1631791996.818172000 0.602695000 172.26.2.13 172.26.12.147 1199 http://plantegg/ajax.sword
    99735 1631791997.105453000 0.286845000 172.26.12.147 172.26.2.13 1199 200
    83462 1631791992.351494000 0.000042000 172.26.2.13 172.26.9.77 1200 http://plantegg/ajax.sword
    84309 1631791992.558541000 0.206305000 172.26.9.77 172.26.2.13 1200 200
    86253 1631791993.152426000 0.593767000 172.26.2.13 172.26.9.77 1200 http://plantegg/ajax.sword
    86740 1631791993.270402000 0.117311000 172.26.9.77 172.26.2.13 1200 200
    89775 1631791994.112908000 0.842414000 172.26.2.13 172.26.9.77 1200 http://plantegg/ajax.sword
    90429 1631791994.312254000 0.199015000 172.26.9.77 172.26.2.13 1200 200
    92840 1631791995.086191000 0.773857000 172.26.2.13 172.26.9.77 1200 http://plantegg/ajax.sword
    93262 1631791995.257123000 0.170488000 172.26.9.77 172.26.2.13 1200 200

    An improved version: aggregate HTTP response latency every 10 seconds and sort the output by time:

    tshark -r 0623.pcap -Y 'http.time>0 ' -T fields -e frame.number -e frame.time_epoch -e frame.time_delta_displayed -e ip.src -e ip.dst -e tcp.stream -e http.request.full_uri -e http.response_for.uri -e http.time | awk '{ print int($2/10), $8 }' | awk '{ sum[$1]+=$2; count[$1]+=1 ;} END { for (key in count) { printf "time= %s \t count=%s \t avg=%.6f \n", key, count[key], sum[key]/count[key] } }' | sort -k2n | gawk '{ print strftime("%c",$2*10), $0 }'
    四 6/23 14:17:30 2022 time= 165596505 count=15289 avg=0.012168
    四 6/23 14:17:40 2022 time= 165596506 count=38725 avg=0.013669
    四 6/23 14:17:50 2022 time= 165596507 count=42545 avg=0.014140
    四 6/23 14:18:00 2022 time= 165596508 count=45613 avg=0.016915
    四 6/23 14:18:10 2022 time= 165596509 count=49033 avg=0.018768
    四 6/23 14:18:20 2022 time= 165596510 count=49797 avg=0.025015
    四 6/23 14:18:30 2022 time= 165596511 count=49670 avg=0.034057
    四 6/23 14:18:40 2022 time= 165596512 count=49524 avg=0.040647
    四 6/23 14:18:50 2022 time= 165596513 count=49204 avg=0.034251
    四 6/23 14:19:00 2022 time= 165596514 count=48024 avg=0.037120
    四 6/23 14:19:10 2022 time= 165596515 count=49301 avg=0.041453
    四 6/23 14:19:20 2022 time= 165596516 count=42174 avg=0.049191
    四 6/23 14:19:30 2022 time= 165596517 count=49437 avg=0.050924
    四 6/23 14:19:40 2022 time= 165596518 count=49563 avg=0.050709
    四 6/23 14:19:50 2022 time= 165596519 count=49517 avg=0.047916
    四 6/23 14:20:00 2022 time= 165596520 count=48256 avg=0.057453
    四 6/23 14:20:10 2022 time= 165596521 count=49412 avg=0.053587
    四 6/23 14:20:20 2022 time= 165596522 count=51361 avg=0.053422
    四 6/23 14:20:30 2022 time= 165596523 count=45610 avg=0.067171
    四 6/23 14:20:40 2022 time= 165596524 count=54 avg=2.886536

    Overview of a capture file:

    $ capinfos rsb2.cap
    File name: rsb2.cap
    File type: Wireshark/tcpdump/... - pcap
    File encapsulation: Ethernet
    Packet size limit: file hdr: 65535 bytes
    Number of packets: 510 k
    File size: 143 MB
    Data size: 135 MB
    Capture duration: 34 seconds
    Start time: Tue Jun 7 11:15:31 2016
    End time: Tue Jun 7 11:16:05 2016
    Data byte rate: 3997 kBps
    Data bit rate: 31 Mbps
    Average packet size: 265.62 bytes
    Average packet rate: 15 kpackets/sec
    SHA1: a8367d0d291eab6ba78732d092ae72a5305756a2
    RIPEMD160: ec991772819f316d2f629745d4b58fb861e41fc6
    MD5: 53975139fa49581eacdb42bd967cbd58
    Strict time order: False

    Traffic between each pair of IPs:

    $ tshark -r retrans.cap -q -z 'conv,ip'
    ================================================================================
    IPv4 Conversations
    Filter:<No Filter>
    | <- | | -> | | Total | Relative | Duration |
    | Frames Bytes | | Frames Bytes | | Frames Bytes | Start | |
    100.98.50.214 <-> 10.117.41.213 425 60647 544 350182 969 410829 0.856983000 88.7073
    10.252.138.13 <-> 10.117.41.213 381 131639 451 45706 832 177345 3.649894000 79.5370
    10.168.127.178 <-> 10.117.41.213 335 118164 390 39069 725 157233 3.456698000 81.2639
    10.168.246.105 <-> 10.117.41.213 435 23490 271 14634 706 38124 0.000000000 89.7614
    10.117.49.244 <-> 10.117.41.213 452 24408 221 11934 673 36342 0.289990000 89.6024
    100.97.197.0 <-> 10.117.41.213 45 4226 107 7310 152 11536 0.538867000 88.0736
    100.97.196.0 <-> 10.117.41.213 48 4576 102 6960 150 11536 0.524268000 89.0840
    100.97.196.128 <-> 10.117.41.213 39 3462 90 6116 129 9578 0.573839000 88.0728
    100.97.197.128 <-> 10.117.41.213 27 1998 81 5562 108 7560 1.071232000 87.0382
    100.98.148.129 <-> 10.117.41.213 55 3630 37 2442 92 6072 0.571963000 86.7362
    ================================================================================

    Traffic of each TCP conversation:

    $ tshark -r retrans.cap -q -z 'conv,tcp'
    ================================================================================
    TCP Conversations
    Filter:<No Filter>
    | <- | | -> | | Total | Relative | Duration |
    | Frames Bytes | | Frames Bytes | | Frames Bytes | Start | |
    10.117.41.213:33362 <-> 100.98.50.214:3306 143 107183 108 17345 251 124528 9.556973000 79.9993
    10.117.41.213:32695 <-> 100.98.50.214:3306 131 95816 118 17843 249 113659 3.464596000 54.7814
    10.117.41.213:33737 <-> 100.98.50.214:3306 107 67199 82 11842 189 79041 69.539519000 13.0781
    10.117.41.213:33736 <-> 100.98.50.214:3306 58 37851 31 4895 89 42746 69.539133000 8.2015
    10.117.41.213:33735 <-> 100.98.50.214:3306 51 37654 27 3338 78 40992 69.538573000 20.0257
    10.117.41.213:33681 <-> 100.98.50.214:3306 22 2367 15 2480 37 4847 58.237482000 0.0082
    10.252.138.13:17926 <-> 10.117.41.213:3306 13 3454 17 1917 30 5371 77.462089000 0.2816
    10.168.127.178:21250 <-> 10.117.41.213:3306 13 4926 17 2267 30 7193 77.442197000 0.6282
    10.252.138.13:17682 <-> 10.117.41.213:3306 13 5421 17 2267 30 7688 34.945805000 0.7274
    10.168.127.178:21001 <-> 10.117.41.213:3306 18 9872 11 1627 29 11499 21.220800000 35.0242
    10.252.138.13:17843 <-> 10.117.41.213:3306 13 4453 15 1510 28 5963 59.176447000 10.8169
    10.168.127.178:20927 <-> 10.117.41.213:3306 12 4414 15 1510 27 5924 13.686763000 0.1860
    10.252.138.13:17481 <-> 10.117.41.213:3306 11 4360 16 1564 27 5924 3.649894000 0.1810
    10.252.138.13:17928 <-> 10.117.41.213:3306 11 3077 15 1461 26 4538 77.467248000 0.6720
    10.168.127.178:21241 <-> 10.117.41.213:3306 11 3077 15 1461 26 4538 77.376858000 0.4669
    10.168.127.178:21201 <-> 10.117.41.213:3306 12 3971 14 2571 26 6542 64.890147000 5.4010
    10.168.127.178:21184 <-> 10.117.41.213:3306 12 6775 14 1794 26 8569 64.073021000 5.6804
    10.252.138.13:17545 <-> 10.117.41.213:3306 11 4379 15 1510 26 5889 13.940379000 0.1845
    10.168.127.178:20815 <-> 10.117.41.213:3306 11 4360 15 1510 26 5870 3.456698000 0.1901
    10.252.138.13:17864 <-> 10.117.41.213:3306 12 2985 12 1129 24 4114 59.855131000 9.7005
    10.252.138.13:17820 <-> 10.117.41.213:3306 11 5529 13 1740 24 7269 49.537379000 0.1669
    10.252.138.13:17757 <-> 10.117.41.213:3306 11 6006 13 1740 24 7746 45.507148000 0.7587
    10.252.138.13:17677 <-> 10.117.41.213:3306 11 5529 13 1740 24 7269 34.806484000 0.5017
    10.168.127.178:21063 <-> 10.117.41.213:3306 11 3848 13 1390 24 5238 29.902032000 0.0133
    10.252.138.13:17516 <-> 10.117.41.213:3306 11 5985 13 1740 24 7725 11.505585000 0.1494
    10.252.138.13:17507 <-> 10.117.41.213:3306 11 3570 13 1424 24 4994 9.652955000 0.0151
    10.252.138.13:17490 <-> 10.117.41.213:3306 11 5985 13 1740 24 7725 4.865639000 0.1275

    Per-packet response time:

    $ tshark -r rsb2.cap -o tcp.calculate_timestamps:true -T fields -e frame.number -e frame.time_epoch -e ip.src -e ip.dst -e tcp.stream -e tcp.len -e tcp.analysis.initial_rtt -e tcp.time_delta

    1481 1465269331.308138000 100.98.199.36 10.25.92.13 302 0 0.002276000
    1482 1465269331.308186000 10.25.92.13 100.98.199.36 361 11 0.000063000
    1483 1465269331.308209000 100.98.199.36 10.25.92.13 496 0 0.004950000
    1484 1465269331.308223000 100.98.199.36 10.25.92.13 513 0 0.000000000
    1485 1465269331.308238000 100.98.199.36 10.25.92.13 326 0 0.055424000
    1486 1465269331.308246000 100.98.199.36 10.25.92.13 514 0 0.000000000
    1487 1465269331.308261000 10.25.92.71 10.25.92.13 48 0 0.000229000
    1488 1465269331.308277000 100.98.199.36 10.25.92.13 254 0 0.055514000
    1489 1465269331.308307000 100.98.199.36 10.25.92.13 292 0 0.002096000
    1490 1465269331.308383000 100.98.199.36 10.25.92.13 308 0 0.055406000
    1491 1465269331.308403000 100.98.199.36 10.25.92.13 75 0 0.041664000
    1492 1465269331.308421000 100.98.199.36 10.25.92.13 291 0 0.001973000
    1493 1465269331.308532000 100.98.199.36 10.25.92.13 509 0 0.002100000
    1494 1465269331.308567000 100.98.199.36 10.25.92.13 123 0 0.041560000
    1495 1465269331.308576000 100.98.199.36 10.25.92.13 232 11 0.063317000
    1496 1465269331.308584000 100.98.199.36 10.25.92.13 465 655 0.018121000
    1497 1465269331.308626000 100.98.199.36 10.25.92.13 61 655 0.042409000
    1498 1465269331.308637000 100.98.199.36 10.25.92.13 146 0 0.001520000
    1499 1465269331.308639000 100.98.199.36 10.25.92.13 510 0 0.001460000
    1500 1465269331.308645000 100.98.199.36 10.25.92.13 237 11 0.063273000

    Overview of problematic packets (expert info):

    $ tshark -r retrans.cap -q -z 'expert,note'
    Errors (22)
    =============
    Frequency Group Protocol Summary
    22 Malformed MySQL Malformed Packet (Exception occurred)
    Warns (749)
    =============
    Frequency Group Protocol Summary
    538 Sequence TCP ACKed segment that wasn't captured (common at capture start)
    192 Sequence TCP Connection reset (RST)
    19 Sequence TCP Previous segment not captured (common at capture start)
    Notes (1162)
    =============
    Frequency Group Protocol Summary
    84 Sequence TCP TCP keep-alive segment
    274 Sequence TCP Duplicate ACK (#1)
    37 Sequence TCP ACK to a TCP keep-alive segment
    23 Sequence TCP This frame is a (suspected) retransmission
    262 Sequence TCP Duplicate ACK (#2)
    259 Sequence TCP Duplicate ACK (#3)
    141 Sequence TCP Duplicate ACK (#4)
    69 Sequence TCP Duplicate ACK (#5)
    7 Sequence TCP Duplicate ACK (#6)
    5 Sequence TCP This frame is a (suspected) spurious retransmission
    1 Sequence TCP Duplicate ACK (#7)

    RTT, packet loss, duplicate ACKs, etc.:

    $ tshark -r retrans.cap -q -z io,stat,1,"AVG(tcp.analysis.ack_rtt)tcp.analysis.ack_rtt","COUNT(tcp.analysis.retransmission) tcp.analysis.retransmission","COUNT(tcp.analysis.fast_retransmission) tcp.analysis.fast_retransmission","COUNT(tcp.analysis.duplicate_ack) tcp.analysis.duplicate_ack","COUNT(tcp.analysis.lost_segment) tcp.analysis.lost_segment","MIN(tcp.window_size)tcp.window_size"

    ===================================================================================
    | IO Statistics |
    | |
    | Duration: 89.892365 secs |
    | Interval: 2 secs |
    | |
    | Col 1: AVG(tcp.analysis.ack_rtt)tcp.analysis.ack_rtt |
    | 2: COUNT(tcp.analysis.retransmission) tcp.analysis.retransmission |
    | 3: COUNT(tcp.analysis.fast_retransmission) tcp.analysis.fast_retransmission |
    | 4: COUNT(tcp.analysis.duplicate_ack) tcp.analysis.duplicate_ack |
    | 5: COUNT(tcp.analysis.lost_segment) tcp.analysis.lost_segment |
    | 6: AVG(tcp.window_size)tcp.window_size |
    |---------------------------------------------------------------------------------|
    | |1 |2 |3 |4 |5 |6 | |
    | Interval | AVG | COUNT | COUNT | COUNT | COUNT | AVG | |
    |-------------------------------------------------------------| |
    | 0 <> 2 | 0.001152 | 0 | 0 | 0 | 0 | 4206 | |
    | 2 <> 4 | 0.002088 | 0 | 0 | 0 | 1 | 6931 | |
    | 4 <> 6 | 0.001512 | 0 | 0 | 0 | 0 | 7099 | |
    | 6 <> 8 | 0.002859 | 0 | 0 | 0 | 0 | 7171 | |
    | 8 <> 10 | 0.001716 | 0 | 0 | 0 | 0 | 6472 | |
    | 10 <> 12 | 0.000319 | 0 | 0 | 0 | 2 | 5575 | |
    | 12 <> 14 | 0.002030 | 0 | 0 | 0 | 0 | 6922 | |
    | 14 <> 16 | 0.003371 | 0 | 0 | 0 | 2 | 5884 | |
    | 16 <> 18 | 0.000138 | 0 | 0 | 0 | 1 | 3480 | |
    | 18 <> 20 | 0.000999 | 0 | 0 | 0 | 4 | 6665 | |
    | 20 <> 22 | 0.000682 | 0 | 0 | 41 | 2 | 5484 | |
    | 22 <> 24 | 0.002302 | 2 | 0 | 19 | 0 | 7127 | |
    | 24 <> 26 | 0.000156 | 1 | 0 | 22 | 0 | 3042 | |
    | 26 <> 28 | 0.000000 | 1 | 0 | 19 | 1 | 152 | |
    | 28 <> 30 | 0.001498 | 1 | 0 | 24 | 0 | 5615 | |
    | 30 <> 32 | 0.000235 | 0 | 0 | 44 | 0 | 1880 | |

    Packet loss and duplicate ACKs:

    $ tshark -r retrans.cap -q -z io,stat,5,"COUNT(tcp.analysis.retransmission) tcp.analysis.retransmission","COUNT(tcp.analysis.fast_retransmission) tcp.analysis.fast_retransmission","COUNT(tcp.analysis.duplicate_ack) tcp.analysis.duplicate_ack","COUNT(tcp.analysis.lost_segment) tcp.analysis.lost_segment"

    ===================================================================================
    | IO Statistics |
    | |
    | Duration: 89.892365 secs |
    | Interval: 5 secs |
    | |
    | Col 1: COUNT(tcp.analysis.retransmission) tcp.analysis.retransmission |
    | 2: COUNT(tcp.analysis.fast_retransmission) tcp.analysis.fast_retransmission |
    | 3: COUNT(tcp.analysis.duplicate_ack) tcp.analysis.duplicate_ack |
    | 4: COUNT(tcp.analysis.lost_segment) tcp.analysis.lost_segment |
    |---------------------------------------------------------------------------------|
    | |1 |2 |3 |4 | |
    | Interval | COUNT | COUNT | COUNT | COUNT | |
    |------------------------------------------| |
    | 0 <> 5 | 0 | 0 | 0 | 1 | |
    | 5 <> 10 | 0 | 0 | 0 | 0 | |
    | 10 <> 15 | 0 | 0 | 0 | 4 | |
    | 15 <> 20 | 0 | 0 | 0 | 5 | |
    | 20 <> 25 | 3 | 0 | 67 | 2 | |
    | 25 <> 30 | 2 | 0 | 58 | 1 | |
    | 30 <> 35 | 0 | 0 | 112 | 0 | |
    | 35 <> 40 | 1 | 0 | 156 | 0 | |
    | 40 <> 45 | 0 | 0 | 127 | 2 | |
    | 45 <> 50 | 1 | 0 | 91 | 0 | |
    | 50 <> 55 | 0 | 0 | 63 | 0 | |
    | 55 <> 60 | 0 | 0 | 65 | 2 | |
    | 60 <> 65 | 2 | 0 | 41 | 0 | |
    | 65 <> 70 | 3 | 0 | 34 | 2 | |
    | 70 <> 75 | 7 | 0 | 55 | 0 | |
    | 75 <> 80 | 3 | 0 | 68 | 0 | |
    | 80 <> 85 | 1 | 0 | 46 | 0 | |
    | 85 <> Dur| 0 | 0 | 30 | 0 | |
    ===================================================================================

    RTT timing:

    $ tshark -r ~/ali/metrics/tcpdump/rsb2.cap -q -z io,stat,1,"MIN(tcp.analysis.ack_rtt)tcp.analysis.ack_rtt","MAX(tcp.analysis.ack_rtt)tcp.analysis.ack_rtt","AVG(tcp.analysis.ack_rtt)tcp.analysis.ack_rtt"

    ========================================================
    | IO Statistics |
    | |
    | Duration: 33.914454 secs |
    | Interval: 1 secs |
    | |
    | Col 1: MIN(tcp.analysis.ack_rtt)tcp.analysis.ack_rtt |
    | 2: MAX(tcp.analysis.ack_rtt)tcp.analysis.ack_rtt |
    | 3: AVG(tcp.analysis.ack_rtt)tcp.analysis.ack_rtt |
    |------------------------------------------------------|
    | |1 |2 |3 | |
    | Interval | MIN | MAX | AVG | |
    |-------------------------------------------| |
    | 0 <> 1 | 0.000005 | 0.248840 | 0.009615 | |
    | 1 <> 2 | 0.000004 | 0.458952 | 0.009601 | |
    | 2 <> 3 | 0.000002 | 0.251274 | 0.009340 | |
    | 3 <> 4 | 0.000006 | 0.290993 | 0.010843 | |
    | 4 <> 5 | 0.000004 | 0.390800 | 0.008995 | |
    | 5 <> 6 | 0.000008 | 0.407525 | 0.011133 | |
    | 6 <> 7 | 0.000004 | 0.239225 | 0.008763 | |
    | 7 <> 8 | 0.000003 | 0.177203 | 0.009211 | |
    | 8 <> 9 | 0.000007 | 0.265505 | 0.010294 | |
    | 9 <> 10 | 0.000007 | 0.354278 | 0.008475 | |
    | 10 <> 11 | 0.000005 | 5.337388 | 0.011211 | |
    | 11 <> 12 | 0.000004 | 0.320651 | 0.008231 | |
    | 12 <> 13 | 0.000008 | 0.272029 | 0.008526 | |
    | 13 <> 14 | 0.000005 | 0.663421 | 0.014589 | |
    | 14 <> 15 | 0.000005 | 0.277754 | 0.009128 | |
    | 15 <> 16 | 0.000002 | 0.260320 | 0.010388 | |
    | 16 <> 17 | 0.000006 | 0.429298 | 0.009155 | |
    | 17 <> 18 | 0.000005 | 0.668089 | 0.010008 | |
    | 18 <> 19 | 0.000005 | 0.452897 | 0.009574 | |
    | 19 <> 20 | 0.000006 | 0.850698 | 0.010345 | |
    | 20 <> 21 | 0.000007 | 0.270671 | 0.012368 | |
    | 21 <> 22 | 0.000005 | 0.295439 | 0.008660 | |
    | 22 <> 23 | 0.000008 | 0.710938 | 0.010321 | |
    | 23 <> 24 | 0.000003 | 0.269014 | 0.010238 | |
    | 24 <> 25 | 0.000005 | 0.287966 | 0.009604 | |
    | 25 <> 26 | 0.000009 | 0.661160 | 0.010807 | |
    | 26 <> 27 | 0.000006 | 0.310515 | 0.009439 | |
    | 27 <> 28 | 0.000003 | 0.346298 | 0.011302 | |
    | 28 <> 29 | 0.000004 | 0.375117 | 0.008333 | |
    | 29 <> 30 | 0.000006 | 1.323647 | 0.008799 | |
    | 30 <> 31 | 0.000006 | 0.283616 | 0.010187 | |
    | 31 <> 32 | 0.000007 | 0.649273 | 0.008613 | |
    | 32 <> 33 | 0.000004 | 0.440265 | 0.010663 | |
    | 33 <> Dur| 0.000004 | 0.337023 | 0.011477 | |
    ========================================================

    Window size statistics:

    $ tshark -r rsb-single2.cap -q -z io,stat,5,"COUNT(tcp.analysis.retransmission) tcp.analysis.retransmission","AVG(tcp.window_size) tcp.window_size","MAX(tcp.window_size) tcp.window_size","MIN(tcp.window_size) tcp.window_size"

    =========================================================================
    | IO Statistics |
    | |
    | Duration: 30.776061 secs |
    | Interval: 5 secs |
    | |
    | Col 1: COUNT(tcp.analysis.retransmission) tcp.analysis.retransmission |
    | 2: AVG(tcp.window_size) tcp.window_size |
    | 3: MAX(tcp.window_size) tcp.window_size |
    | 4: MIN(tcp.window_size) tcp.window_size |
    |-----------------------------------------------------------------------|
    | |1 |2 |3 |4 | |
    | Interval | COUNT | AVG | MAX | MIN | |
    |------------------------------------------| |
    | 0 <> 5 | 0 | 4753 | 15744 | 96 | |
    | 5 <> 10 | 0 | 8067 | 431616 | 96 | |
    | 10 <> 15 | 0 | 5144 | 18688 | 96 | |
    | 15 <> 20 | 0 | 11225 | 611072 | 81 | |
    | 20 <> 25 | 0 | 5104 | 24448 | 96 | |
    | 25 <> 30 | 0 | 10103 | 506880 | 96 | |
    | 30 <> Dur| 0 | 5716 | 12423 | 96 | |
    =========================================================================

    Useful commands (all of these come bundled with the Wireshark installation):

    capinfos rsb2.cap

    tshark -q -n -r rsb2.cap -z "conv,ip"  # overall traffic summary

    tshark -q -n -r rsb2.cap -z "conv,tcp"  # per-connection traffic, rtt, response time, packet loss, retransmission rate, etc.

    editcap -c 100000 ./rsb2.cap rsb00.cap  # split the large file rsb2.cap into smaller files of 100000 packets each

    Common troubleshooting display filters:

    The following display filters are very useful when troubleshooting network latency / application problems (see the example after this list):

    • tcp.analysis.lost_segment: indicates that a gap in the sequence numbers was seen in the capture. Packet loss causes duplicate ACKs, which in turn trigger retransmissions.
    • tcp.analysis.duplicate_ack: shows segments that were acknowledged more than once. A large number of duplicate ACKs is a sign of high latency between the TCP endpoints.
    • tcp.analysis.retransmission: shows every retransmission in the capture. A small number of retransmissions is normal; too many usually means slow application performance and/or packet loss for users.
    • tcp.analysis.window_update: graphs the TCP window size over the course of the transfer. If the window size drops to zero, the sender has backed off and is waiting for the receiver to acknowledge all of the data already sent, which may indicate that the receiving end is overwhelmed.
    • tcp.analysis.bytes_in_flight: the number of unacknowledged bytes on the wire at a given moment. This cannot exceed the TCP window size (negotiated in the initial three-way handshake), and to maximize throughput you want it as close to the window size as possible. If it stays consistently below the window size, packet loss or something else on the path may be limiting throughput.
    • tcp.analysis.ack_rtt: measures the time between a captured TCP segment and the corresponding ACK. A long interval here may indicate some kind of network delay (packet loss, congestion, etc.).
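
    A minimal sketch of applying these filters with tshark, reusing the capture file from the examples above (the fields after -e are only one possible selection):

    # list retransmissions and duplicate ACKs together with the measured ack_rtt
    $ tshark -r retrans.cap -Y "tcp.analysis.retransmission || tcp.analysis.duplicate_ack" -T fields -e frame.number -e ip.src -e ip.dst -e tcp.stream -e tcp.analysis.ack_rtt
    # or count lost segments per 5-second interval, as in the io,stat examples above
    $ tshark -r retrans.cap -q -z io,stat,5,"COUNT(tcp.analysis.lost_segment) tcp.analysis.lost_segment"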

    Common capture commands:

    # parse the MySQL protocol with tshark
    tshark -r ./mysql-compress.cap -o tcp.calculate_timestamps:true -T fields -e mysql.caps.cp -e frame.number -e frame.time_epoch -e frame.time_delta_displayed -e ip.src -e tcp.srcport -e tcp.dstport -e ip.dst -e tcp.time_delta -e frame.time_delta_displayed -e tcp.stream -e tcp.len -e mysql.query
    # capture and save packets with tcpdump:
    sudo tcpdump -i eth0 port 3306 -w drds.cap
    # rotate to a new file every 3 seconds and stop after 5 files (i.e. after 15 seconds); file names are timestamped automatically
    sudo tcpdump -t -s 0 tcp port 3306 -w 'dump_%Y-%m-%d_%H:%M:%S.pcap' -G 3 -W 5 -Z root
    # rotate to a new file every 30 minutes and compress each finished file
    nohup sudo tcpdump -i eth0 -t -s 0 tcp and port 3306 -w 'dump_%Y-%m-%d_%H:%M:%S.pcap' -G 1800 -W 48 -Zroot -z gzip &
    # rotate by file size: 1000 MB per file
    nohup sudo tcpdump -i eth0 -t -s 0 tcp and port 3306 -w 'dump_' -C 1000 -W 300 -Z root -z gzip &
    # capture the actual SQL statements, to quickly confirm exactly what the client is sending:
    sudo tshark -i any -f 'port 8527' -s 0 -l -w - |strings
    sudo tshark -i eth0 -d tcp.port==3306,mysql -T fields -e mysql.query 'port 3306'
    sudo tshark -i eth0 -R "ip.addr==11.163.182.137" -d tcp.port==3306,mysql -T fields -e mysql.query 'port 3306'
    sudo tshark -i eth0 -R "tcp.srcport==62877" -d tcp.port==3001,mysql -T fields -e tcp.srcport -e mysql.query 'port 3001'
    # if SSL is enabled on RDS, tshark/wireshark cannot decode the MySQL payload from the capture; you can force it off by adding useSSL=false to connectionProperties
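    # e.g. (illustrative only; the host and database names below are made up):
    #   jdbc:mysql://rds-host:3306/mydb?useSSL=false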
    tshark -r ./manager.cap -o tcp.calculate_timestamps:true -Y " tcp.analysis.retransmission " -T fields -e tcp.stream -e frame.number -e frame.time -e ip.src -e tcp.srcport -e tcp.dstport -e ip.dst | sort
    # analyze MySQL rt; the fourth column from the end is essentially the rt
    tshark -r gege_drds.pcap -Y " ((tcp.srcport eq 3306 ) and tcp.len>0 )" -o tcp.calculate_timestamps:true -T fields -e frame.number -e frame.time_epoch -e frame.time_delta_displayed -e ip.src -e tcp.srcport -e tcp.dstport -e ip.dst -e tcp.time_delta -e tcp.stream -e tcp.len -e tcp.analysis.ack_rtt
    # or sort the output
    tshark -r 213_php.cap -Y "mysql.query or ( tcp.srcport==3306)" -o tcp.calculate_timestamps:true -T fields -e frame.number -e frame.time_epoch -e frame.time_delta_displayed -e ip.src -e tcp.srcport -e tcp.dstport -e ip.dst -e tcp.time_delta -e tcp.stream -e tcp.len -e mysql.query |sort -nk9 -nk1

    Wireshark plugin installation:

    The plugin is written in Lua and is fairly simple to install. Taking OS X as an example:
    1. Copy the protocol dissector script to /Applications/Wireshark.app/Contents/Resources/share/wireshark/
    2. Edit init.lua and set disable_lua = false, to make sure Lua support is enabled
    3. Add the following at the end of init.lua:
    dofile("hsf2.lua")
    Restart Wireshark and traffic on port 12200 will be dissected by the script; the HSF protocol is now recognized.
    On Windows the steps are similar: copy hsf2.lua to the Wireshark root directory, e.g. c:\Program Files\Wireshark\, where init.lua is also located, then follow steps 2 and 3 above.
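
    For step 3, a one-line sketch that appends the dofile() call on the OS X install above (path assumed from step 1):

    $ echo 'dofile("hsf2.lua")' | sudo tee -a /Applications/Wireshark.app/Contents/Resources/share/wireshark/init.lua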

    A case study

    Problem: at the customer site, no matter how many application servers were added, TPS would not increase. Moreover, after adding machines, the CPUs of the new machines could still be fully used, yet TPS did not change (which was the strange part). Overall service calls were slow, the database showed no slow queries, and it was unclear where the time was actually being spent. More servers (or bigger ones) had been tried at every tier, but the problem was never resolved.

    Analyzing the capture file with tshark showed that a NIC interrupt bottleneck on the database server drove the RTT extremely high, which in turn made the response time of every query extremely high (in the figure, the left side shows response times while the problem was occurring and the right side after it was resolved).

    The two figures below show the tshark output loaded into a database so that it can be analyzed further with SQL (a sketch of one way to do this follows the case study).

    After the fix, the average response time per database query dropped from 47 ms to 4.5 ms.

    The abnormal RTTs can also be seen in Wireshark (many exceed 150 ms).

    Likewise, Wireshark shows the normal RTTs after the fix (99% are within 10 ms).
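
    For reference, a sketch of loading tshark field output into MySQL for that kind of SQL analysis (the table, column and database names are made up, and LOCAL INFILE must be enabled on both client and server):

    # export per-packet fields to CSV
    $ tshark -r rsb2.cap -o tcp.calculate_timestamps:true -T fields -E separator=, -E quote=d -e frame.number -e frame.time_epoch -e ip.src -e ip.dst -e tcp.stream -e tcp.len -e tcp.time_delta > rt.csv
    # create a table, load the CSV, then aggregate response time per source IP
    $ mysql --local-infile=1 -e "CREATE TABLE IF NOT EXISTS pkt (frame_no INT, ts DOUBLE, ip_src VARCHAR(64), ip_dst VARCHAR(64), stream INT, len INT, time_delta DOUBLE)" test
    $ mysql --local-infile=1 -e "LOAD DATA LOCAL INFILE 'rt.csv' INTO TABLE pkt FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'" test
    $ mysql --local-infile=1 -e "SELECT ip_src, COUNT(*) AS cnt, AVG(time_delta) AS avg_rt FROM pkt WHERE len > 0 GROUP BY ip_src ORDER BY avg_rt DESC" test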

    tcprstat

    tcprstat is a handy tool for quickly computing response-time statistics. In practice it loses a large amount of request data under heavy load when the CPU is saturated, but for computing averages this is not a big problem. It supports HTTP, MySQL and other protocols. In an actual test at 20,000 SQL statements per second, a 4-core machine could only capture about 70% of the requests.

    Alternatively, see this improved version, which supports setting an RT threshold for its statistics.
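
    A typical live-capture invocation is a one-liner (a sketch; it assumes the MySQL server listens on port 3306, -t is the reporting interval in seconds, and -n 0 keeps it running until interrupted):

    $ sudo tcprstat -p 3306 -t 10 -n 0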

    tcprstat statistics on an offline capture file

    # -l 166.100.128.148  the address to treat as local, i.e. the side serving the requests on port 80
    # -p 80  the port that the responses are sent from
    # -t 10  print aggregated statistics every 10 seconds
    # -f     the percentiles in the format string can be set freely (90%, 95%, 99%, etc.)
    tcprstat -r pts.pcap -p 80 -l 166.100.128.148 -t 10 -f "%T\t%n\t%M\t%m\t%a\t%h\t%S\t%95M\t%90a\t%95S\t%99M\t%99a\t%90S\n"

    Other tools

    https://github.com/google/packetdrill

    https://mp.weixin.qq.com/s/CcM3rINPn54Oean144kvMw

    http://beta.computer-networking.info/syllabus/default/exercises/tcp-2.html

    https://segmentfault.com/a/1190000019193928