My-Tiny.Net :: Networking with Virtual Machines
Packet Sniffing: tcpdump, ngrep, and tcpflow
We have three basic tools available for watching and capturing network traffic:
- tcpdump allows us to watch traffic and save captured packets for future analysis
- ngrep allows us to specify a regular expression to match against data payloads of packets
- tcpflow reconstructs the data streams and shows or stores each flow separately
In addition to the common functionality shown in the table, all three use the same powerful libpcap filtering language to select traffic. If no filter expression is given, all packets on the interface are shown or saved; otherwise only packets for which the expression is true are used. The pcap-filter man page has a complete explanation of the syntax, and there is a somewhat simpler explanation below.
Function                            tcpdump   ngrep   tcpflow
Print usage information and exit    -h        -h      -h
Listen to interface [1]             -i        -d      -i
Not promiscuous mode [2]            -p        -p      -p
Exit after receiving [3]            -c        -n      -b
Packet data max bytes [4]           -s        -s
Read from a pcap file               -r        -I      -r
Write to a pcap file                -w        -O
Quick (quiet) output                -q        -q      -d 0
Verbose output                      -vv               -v
Ctrl c to stop capturing

[1] ngrep and tcpflow: any can be used to capture packets from all interfaces
[2] no effect if the interface has been set to promiscuous mode elsewhere
[3] tcpdump, ngrep: number of packets // tcpflow: max bytes per flow
[4] tcpdump default: 262144 // ngrep default: 65536
The pcap ("Packet CAPture") file format is a standard that can be used with lots of packet analysis tools, including Wireshark. The filename extensions .pcap and .cap are commonly used, although these tools do not check the extension when reading capture files and don't add an extension when writing them (they use magic numbers in the file header instead). The pcap-savefile man page has a description of the file format for those who are interested. Note that display options do not affect the content: a timestamp and the raw packet data, including the link-layer header, are always saved.
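Because the tools rely on the magic number rather than the extension, looking at the first four bytes of a file is enough to tell whether it is a capture file. The sketch below is only an illustration: the function name is_pcap is made up for this example, and it assumes the classic pcap magic numbers (a1b2c3d4 for microsecond timestamps, a1b23c4d for nanosecond timestamps, in either byte order). It does not try to recognise the newer pcapng format.

```shell
# is_pcap FILE: succeed if FILE starts with a classic pcap magic number
is_pcap() {
    # first four bytes as hex digits, e.g. "d4c3b2a1"
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    case "$magic" in
        a1b2c3d4|d4c3b2a1) return 0 ;;   # microsecond timestamps, either byte order
        a1b23c4d|4d3cb2a1) return 0 ;;   # nanosecond timestamps, either byte order
        *) return 1 ;;
    esac
}
```

A possible use is to guard a read, for example: is_pcap web.cap && tcpdump -n -r web.cap (web.cap is a hypothetical file name).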
Under Linux, reading packets from a network interface requires that you have root privileges; reading a saved packet file does not require special privileges.
tcpdump is a good general tool for protocol analysis and network forensics. The tcpdump man page has a lot of options for viewing timestamps and "deltas" (timing differences, think SpongeBob), and some notable options for formatting the output when reading pcap files or viewing the live capture:
-n   don't convert addresses (i.e., host addresses, port numbers, etc.) to names
-e   print the link-level header (to see MAC addresses for Ethernet and IEEE 802.11 for example)
-A   print each packet (minus its link-level header) in ASCII
-x   print the data of each packet in hex after the packet header
-xx  print the data of each packet in hex after the packet header and its link-level header
-X   print the data of each packet in hex and ASCII after the packet header
-XX  print the data of each packet in hex and ASCII after the packet header and its link-level header
One quite useful general option to use by itself is
-D   list the network interfaces available on the system
ngrep allows us to specify a regular expression to match against data payloads of packets, in addition to the libpcap filters that select the packets we want to look at. This means we can capture only packets that contain some word or string in the data, which is something tcpdump cannot do. There are links to tutorials on regular expressions in the Utilities::Multitail page on the menu. Some notable options from the ngrep man page are:
-t   print a timestamp every time a packet is matched
-W byline   wraps text only when a linefeed is encountered
-W none     doesn't wrap, the entire payload is displayed on one line
-W single   puts everything including IP and source/destination header information on one line
-x   print packet contents in hex and ASCII (cannot use with -W)
------ These options determine what is saved to a pcap file ------
-A   print num packets of trailing context after matching a packet
-e   show empty packets (normally empty packets are discarded because they have no payload to search)
------ regular expression (regex) options: these also determine what is saved to a pcap file ------
The expression to match follows the rules for extended regular expressions in the GNU regex library
-v   invert the match; only display packets that don't match
-i   ignore case
-w   match the expression as a word
-X   treat the match expression as a hexadecimal string, e.g., 'DEADBEEF' or '0xDEADBEEF'
tcpflow puts all the TCP packets into order and stores each flow in a separate plaintext (not pcap) file for later analysis. Where tcpdump shows individual packets (complete with options and flags), tcpflow understands TCP sequence numbers and reconstructs the data streams regardless of retransmissions or out-of-order delivery. You can use standard text processing tools to work with the tcpflow files rather than using libpcap tools, which is often easier depending on what you want to do.
tcpflow only creates files in the current working directory. Filenames are the source-destination IP addresses and port numbers, and there is no good way to shorten them.
Some other notable options are:
-s   (strip) convert all non-printable characters to .
-B   force binary output, even when printing to the console
-c   (console) print the contents of packets to stdout, without storing any captured data to files
-C   print the contents of packets to stdout without the source and destination details
-ce, -Ce   show each flow in color - blue: client to server; red: server to client; green: unknown
     (changed to -cJ or -CJ in versions above 1.3)
libpcap Filters
When it comes to syntax, flexibility means complexity. A summary of the essentials of the libpcap filter language has to include:
- Parentheses and the logical operators work as expected: 'not' has the highest precedence; 'and' and 'or' have equal precedence and associate left to right.
  Quotes can be used to hide parentheses and special characters from the shell (more convenient than escaping each one) but are not required.
- The type qualifiers host, net, port say what kind of thing the name or number refers to. If there is no type qualifier, host is assumed.
- The direction qualifiers src, dst specify the source/destination. If there is no direction qualifier, (src or dst) is assumed.
- The protocol qualifiers are ip, tcp, udp, icmp, arp, rarp, but note that ngrep and tcpflow may not understand all of these protocols. If there is no protocol qualifier, all protocols consistent with the type are assumed: for example, 'port domain' means 'ip and ((tcp or udp) port 53)'. The protocol should be specified first on the list and does not need 'and' after.
- Identical qualifiers can be omitted, to save typing:
  'tcp dst port smtp or smtps or submission' is exactly the same as
  'tcp dst port smtp or port smtps or port submission'
  however
  'not host mon and fri' is short for 'host fri and not host mon' while
  'not (host mon or fri)' is short for 'not host mon and not host fri'
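The difference between the last two filters is easy to get wrong, so here is a small truth table worked out with shell arithmetic. This is only a sketch: the variables mon and fri stand for "the packet involves host mon" and "the packet involves host fri", and the function name is made up for the example. Shell arithmetic gives ! the highest precedence, just as the filter language does for 'not'.

```shell
# Truth table for the two filters: 1 = packet selected, 0 = packet ignored
filter_truth_table() {
    for mon in 0 1; do
        for fri in 0 1; do
            first=$(( ! mon && fri ))        # not host mon and fri
            second=$(( ! (mon || fri) ))     # not (host mon or fri)
            echo "mon=$mon fri=$fri first=$first second=$second"
        done
    done
}
filter_truth_table
```

The first filter only selects packets that involve fri but not mon, while the second only selects packets that involve neither host.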
Some Examples
With no specific flags these tools will choose a default interface and put it into promiscuous mode. The localhost interface may not work exactly the same as "real" interfaces, especially with applications like postfix that use "unix domain sockets" for interprocess communication (unix domain sockets are conceptually similar to named pipes, but they must be created and used via C library system calls).
tcpdump arp or rarp
- select only arp and rarp traffic (a single packet cannot be both, so 'arp and rarp' would match nothing)
tcpdump 'not (arp or rarp)'
- select traffic other than arp and rarp - use quotes to hide parentheses from the shell
ngrep "(GET |POST )\/.+( HTTP\/)" src host mon and dst host fri and port 80
- select packets with simple HTTP GET or POST requests going from mon to fri
ngrep -i "pass|USER" port 80
- select packets with pass or USER in the data, with -i to ignore case
ngrep -v "\.tinynet\.edu" udp port domain
- select udp DNS traffic, but ignore anything with .tinynet.edu in the data
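A pattern can be tried out on sample text before it is used on live traffic. The sketch below uses grep -E as a stand-in for ngrep, on the assumption that its extended regular expressions are close enough to ngrep's for simple patterns like this one. The sample lines and the function name match_requests are invented for the example.

```shell
# The GET/POST pattern from above, without the backslashes:
# a slash is not a special character in an extended regular expression
pattern='(GET |POST )/.+( HTTP/)'

# match_requests: read payload lines on stdin, print the ones that match
match_requests() {
    grep -E "$pattern"
}

printf '%s\n' 'GET /index.html HTTP/1.1' \
              'HTTP/1.1 200 OK' \
              'POST /login HTTP/1.1' | match_requests
```

Only the two request lines are printed; the response line does not contain "GET " or "POST " and is dropped.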
The output of tcpdump is protocol dependent. The tcpdump man page has a complete explanation, including a section on capturing TCP packets with particular control flag combinations, with a detailed explanation of how to use bit flags to create a libpcap filter to, for example:
Select packets that have SYN set, whatever other TCP control bits are set:
tcpdump 'tcp[13] & 2 == 2'
Select only packets to and from port 80 that contain data but not, for example, SYN and FIN packets and ACK-only packets:
tcpdump 'tcp port 80 and (((ip[2:2] - ((ip[0]&0xf)<<2)) - ((tcp[12]&0xf0)>>2)) != 0)'
Select the start and end packets (SYN and FIN) of each TCP conversation that involves a non-local host:
tcpdump 'tcp[tcpflags] & (tcp-syn|tcp-fin) != 0 and not src and dst net localnet'
Select all ICMP packets that are not ping packets (echo request/reply):
tcpdump 'icmp[icmptype] != icmp-echo and icmp[icmptype] != icmp-echoreply'
The important point is that these filters work the same for ngrep and tcpflow too.
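The arithmetic inside the first two filters can be checked by hand with shell arithmetic. The sketch below is an illustration with made-up header values, not output from a real capture; the function names are hypothetical. Byte 13 of the TCP header holds the control flags (SYN is bit value 2, PSH is 8, ACK is 16), the low four bits of ip[0] give the IP header length in 32-bit words, and the high four bits of tcp[12] give the TCP header length in 32-bit words.

```shell
SYN=2; PSH=8; ACK=16      # TCP flag bits in byte 13 of the TCP header

# has_syn FLAGS: the test behind 'tcp[13] & 2 == 2'
# (the shell needs the extra parentheses because its == binds tighter than &)
has_syn() {
    [ $(( ($1 & SYN) == SYN )) -eq 1 ]
}

# tcp_payload_len IP_TOTAL_LEN IP_BYTE0 TCP_BYTE12
# computes ip[2:2] - ((ip[0]&0xf)<<2) - ((tcp[12]&0xf0)>>2)
tcp_payload_len() {
    echo $(( $1 - (($2 & 0xf) << 2) - (($3 & 0xf0) >> 2) ))
}

has_syn $(( SYN | ACK )) && echo "SYN+ACK: selected"
has_syn $(( ACK | PSH )) || echo "ACK+PSH: not selected"
tcp_payload_len 60 0x45 0xa0      # SYN with TCP options: 60 - 20 - 40 = 0
tcp_payload_len 1500 0x45 0x50    # full-size data packet: 1500 - 20 - 20 = 1460
```

A payload length of 0 is what the port 80 filter rejects, which is how it skips SYN, FIN and ACK-only packets.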
tcpflow Examples
Here is an easy example to try out, using your Windows browser to retrieve the Monkey webpage.
- Retrieve the Monkey homepage in your browser
- Start tcpflow on the webserver with -v to see a little extra detail on the activity and -s (strip) to substitute a dot for binary characters.
Select all packets with a source or destination IP address that belongs to the webserver (replace xxx in the command below with the IP address from the Gateway DHCP).
tcpflow -v -s -i any host '(192.168.56.252 or 192.168.56.xxx)'
- A hard refresh clears your browser cache for a specific page and forces it to reload all of the page elements (images, styles, scripts, etc.). Clicking the "Refresh" arrow on your browser address bar is not a hard refresh. Here's how:
- Chrome, Firefox, or Edge for Windows: hold Ctrl and press F5
(If that doesn't work, try Shift + F5 or Ctrl + Shift + R)
- Chrome or Firefox for Mac: Shift + Command + R
- Safari: Option + Command + E
With a hard refresh, tcpflow will capture all of the requests and replies.
- Stop tcpflow on the webserver with Ctrl c, and check the list of files created with
ls -1
("ls minus one" not "minus Lowercase L"). Then use mc or multitail to see the contents.
Note that it is common for clients to choose a random high-numbered port as the source port for their requests - the server simply swaps source and destination for the response. Also, browsers may open multiple TCP connections to get page elements concurrently, which speeds up the page load process.
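To see which side of a flow used the random high-numbered port, the file name can be split into its four parts. This is a minimal sketch that assumes the file name layout used in this lab (zero-padded address octets, a five-digit port, and a dash between source and destination); the function name and the sample name are made up for the example.

```shell
# parse_flow_name NAME: split a tcpflow file name into addresses and ports
parse_flow_name() {
    src=${1%%-*}      # everything before the dash
    dst=${1#*-}       # everything after the dash
    # the port is the part after the last dot, the address is the rest
    echo "src ${src%.*} port ${src##*.} -> dst ${dst%.*} port ${dst##*.}"
}

parse_flow_name 192.168.056.010.51234-192.168.056.252.00080
```

For this sample name the client used port 51234 and the server port 00080 (http); the file for the reply has the two halves swapped.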
If we change the command slightly we can get the images in their proper binary format, so that with a little bit of editing we could open them in an image viewer/editor:
tcpflow -v -B -i any host '(192.168.56.252 or 192.168.56.xxx)'
Be sure to do a hard refresh before getting the web page, and check the files with mc or multitail to see the difference between this and the previous example.
tcpflow filenames are the source-destination IP addresses and port numbers, and the man page has no good way to shorten them. This script helps:
/usr/local/bin/tcpflow-names.sh
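#!/bin/bash
# Shorten tcpflow filenames in the directory given as $1 (default: current directory)
if [ -n "$1" ]; then cd "$1" || exit 1; fi
echo "Working in $(pwd)"
# tcpflow names start with an IP address; the ones used here (127 and 192) begin with 1
for fn in 1*; do
    [ -e "$fn" ] || continue    # the pattern matched nothing
    # dots are escaped so that they match a literal dot, not any character
    nfn=$(echo "$fn" | sed -e 's:127\.000\.000\.001:localhost:g' \
        -e 's:192\.168\.056:netA:g' \
        -e 's:192\.168\.066:netB:g' \
        -e 's:192\.168\.076:netC:g' \
        -e 's:00080:http:' -e 's:00443:https:' \
        -e 's:00143:imap:' -e 's:00993:imaps:' \
        -e 's:00389:ldap:' -e 's:00636:ldaps:' \
        -e 's:00025:smtp:' -e 's:00587:submit:')
    [ "$fn" = "$nfn" ] && continue    # nothing to shorten
    echo "move $fn $nfn"
    mv -- "$fn" "$nfn"
done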
If you capture flows when stunnel is running and (for example) start squirrelmail over an https:// connection, there will be a mix of 127 and 192 addresses. All of the files with 127 addresses are in plaintext, and all of the files with 192 addresses are encrypted. It is interesting to watch encrypted sessions on the screen in color, but not too useful to save them to files.
Another useful strategy is to sort the files by service. Make a new directory called to-143 (for example) and move the files that end with .00143 into this new directory, then move the ones with .00143- in the middle of the filename into a new directory called from-143. This makes it a little bit easier to analyse the flows by matching up the address, port number, and timestamp to see the client request and the server response.
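The sorting step can be scripted. The following is a sketch under the assumption that the files still have their original tcpflow names (five-digit, zero-padded ports); sort_by_port is a made-up name, and the function should be run in the directory that holds the flow files.

```shell
# sort_by_port PORT: move flows sent to PORT into to-PORT/ and
# flows sent from PORT into from-PORT/
sort_by_port() {
    padded=$(printf '%05d' "$1")          # tcpflow pads ports to five digits
    mkdir -p "to-$1" "from-$1"
    for f in *."$padded"; do              # destination port: name ends with .00143
        [ -f "$f" ] || continue           # skip when nothing matched
        mv -- "$f" "to-$1/"
    done
    for f in *."$padded"-*; do            # source port: .00143- in the middle
        [ -f "$f" ] || continue
        mv -- "$f" "from-$1/"
    done
}
```

For example, sort_by_port 143 creates to-143 and from-143 and leaves files for other services (such as port 993) where they are.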
Here is an example that solves the problem of things on the screen scrolling by too fast to read when we want to see the interleaved flows in color.
- Start tcpflow on the mailserver, select all traffic with the imap or imaps ports as source or destination, and send the output to a file named tcpflo in the current directory
tcpflow -ce host '192.168.76.xxx and port (143 or 993)' >tcpflo
- Log in to squirrelmail on your webserver
- Stop tcpflow with Ctrl c
- Read the contents of the file - with color! The end of this command resets the screen colors to the default when we touch q to quit.
less -r tcpflo; echo -e "\e[00m"
Note that while less allows us to scroll up, the colors disappear because the ANSI Escape Codes are only interpreted the first time the file is loaded. Also note that with -ce tcpflow automatically strips out the binary characters.
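If the saved file is going to be searched with text tools rather than read on screen, the color codes get in the way. The sketch below removes them, assuming the colors are written as ordinary ANSI sequences of the form ESC [ ... m (the same kind of sequence as the \e[00m reset used above). The function name strip_colors is made up for the example.

```shell
# strip_colors: remove ANSI color sequences (ESC [ digits ; digits m) from stdin
strip_colors() {
    esc=$(printf '\033')                  # a literal escape character
    sed "s/${esc}\[[0-9;]*m//g"
}
```

One way to use it would be strip_colors <tcpflo >tcpflo.txt, which keeps the colored file for viewing and makes a plain copy for grep.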
So, why not just use Wireshark? The best answer is that we can use these tools to create scripts to capture traffic, and then use other interactive GUI tools (like Wireshark) for more detailed analysis. That's really what pcap files are for, but that topic is left for your own research.
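As a starting point for such a script, here is a minimal sketch of a capture helper that writes each capture to a timestamped pcap file. The function names, the file name pattern, and the interface name eth0 in the usage note are assumptions for illustration; only the tcpdump options -i, -c and -w from the table above are used, and running the capture itself needs root privileges.

```shell
# capture_name PREFIX: timestamped file name such as web-20240131-235959.pcap
capture_name() {
    printf '%s-%s.pcap\n' "$1" "$(date +%Y%m%d-%H%M%S)"
}

# capture IFACE COUNT FILTER...: save COUNT packets that match FILTER (needs root)
capture() {
    iface=$1
    count=$2
    shift 2
    tcpdump -i "$iface" -c "$count" -w "$(capture_name capture)" "$@"
}
```

For example, capture eth0 100 'tcp port 80' stops after 100 packets, and the resulting file can be read back with tcpdump -n -r or opened in Wireshark.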
protocol analysis & signatures
http://cnds.eecs.jacobs-university.de/archive/bsc-2010-vperelman.pdf
(interactive regex tester)
https://regex101.com/
try it with this regex - RFC 2616 section 5 explains when HTTP is used
GET (http(s?):\/)?\/.+\?(.+=.+)+( HTTP\/)
and this test string
GET /index.html?x=1&y=2 HTTP/1.1
nice video using wireshark:
https://www.youtube.com/watch?v=Ohirzp33QAs