Sending and processing ARP requests/responses using BPF


By Marco W. Soijer

June 2020

ARP BPF OpenBSD Networking

The Address Resolution Protocol (ARP) is normally handled by the operating system. The command-line tool arp lets you view the cache and delete entries, but it cannot trigger a request to refresh an entry or check its validity, and it cannot resolve an IPv4 address without sending something on the Internet layer (3) or above; even a ping involves ICMP.

So you may want to send ARP queries from userland yourself and process the incoming responses. On BSD systems, the required interface to the link layer is the Berkeley Packet Filter (BPF). It appears as the device /dev/bpf and is driven through normal read and write operations, plus a few ioctl requests. This post shows how, by building an ARP scanner that queries all addresses in the local network. You can find the full C code at the end of the page.

The Berkeley Packet Filter (BPF)

The Berkeley Packet Filter is a pseudo device on all BSD systems, including BSD-based macOS. Linux has adopted the BPF filter language as well, but attaches filters to sockets instead of exposing a device. BPF provides a raw interface to link-layer network functions and is typically used to build sniffers and similar tools. How you interact with BPF differs between systems, however; you can find details in the documentation for OpenBSD, FreeBSD, NetBSD, the Linux kernel, or macOS, although the latter also allows more high-level ARP control.

The BSD variants all seem to share the same /dev/bpf access, so what we do here on OpenBSD can probably be applied to the other BSDs without change.

The main process goes as follows: open the BPF device for reading and writing, attach it to the network interface on which you want to ARP scan, write the Ethernet frame with the ARP query, read the ARP responses that come back, clean up and close the device. There are two things that require a bit of additional effort: finding your own protocol (IP) and hardware (MAC) address for the chosen interface — so you can fill the Ethernet frame header correctly — and activating a packet filter, in order to receive only the ARP responses you are interested in and not everything that is on the network. Now you know where the name BPF comes from.

Opening the device

In our arpscan.c, the first command line argument argv[1] passes the name of the interface we want to scan to the main function. So the outermost structure looks like this:

Excerpt from arpscan.c
//...
int main(int argc, char *argv[]) {

    int fd;
//...
    if (argc == 2) {
        if ((fd = open("/dev/bpf", O_RDWR)) >= 0) {
//...
        }
        else
            fprintf(stderr, "%s (open)\n", strerror(errno));
    }
    else
        fprintf(stdout, "Usage:\tarpscan if\n");

    return -1;
}

With the file descriptor, BPF can be bound to the interface passed as argv[1]; the ioctl request to do so is BIOCSETIF, which takes a pointer to a struct ifreq. Furthermore, we need a buffer to receive the incoming frames, so we ask for BPF's buffer length with BIOCGBLEN. We also want BPF to return any ARP frame it receives immediately, which is set with BIOCIMMEDIATE; for its argument we abuse the int buflen, which subsequently receives the buffer size. If binding the interface is successful, it is good to check whether the interface is indeed an Ethernet one (BIOCGDLT). So inside main, we add the following:

Excerpt from arpscan.c
    int buflen;
    int dlt;
    struct ifreq ir;
//...
            // strlcpy() always NUL-terminates the name, unlike strncpy()
            strlcpy(ir.ifr_name, argv[1], sizeof(ir.ifr_name));
            buflen = 1;
            if (ioctl(fd, BIOCSETIF, &ir) != -1
                && ioctl(fd, BIOCIMMEDIATE, &buflen) != -1
                && ioctl(fd, BIOCGBLEN, &buflen) != -1) {
                if (ioctl(fd, BIOCGDLT, &dlt) != -1
                    && dlt == DLT_EN10MB) {
//...
                }
                else
                    fprintf(stderr, "Link type unknown or not Ethernet\n");
            }
            else
                fprintf(stderr, "%s (ioctl)\n", strerror(errno));

Setting the packet filter

Now for the most interesting part: setting the filter. The ioctl request for this is BIOCSETF, which takes a pointer to a struct bpf_program. That structure holds an integer with the number of instructions in the programme, followed by a pointer to the array of instructions (struct bpf_insn *).

The filter language is described with some examples on the OpenBSD BPF manual page. Think machine code, with simple instructions like loading a value (from the frame) into the accumulator, comparing, jumping (forward only) and returning. Working at the link layer, we need to take care of the whole Ethernet frame. Two parts are added for us on sending, though: the padding, which is needed here because ARP messages fall short of the minimum frame length of 64 octets, and the CRC32-based frame check sequence. So we are left with the 14 octets of the frame header, which contain the destination and source as six-octet hardware addresses plus the two-octet length or type word, and the 28 octets of the ARP message.

To have BPF filter out ARP responses, we look at two words:

  • First, within the 14-octet frame header, bytes 12 and 13 (length or type) must equal 0x0806, that is ETHERTYPE_ARP, to check that the payload is an ARP message; and
  • second, within the 28-octet ARP message, bytes 6 and 7 (opcode), which sit at offset 20 of the frame, must equal 2, that is ARPOP_REPLY, to ensure that the message is an ARP response.

Thus we add to our main function:

Excerpt from arpscan.c
    struct bpf_insn insns[] = {
        // Load half-word (16 bits) at octet 12: the length or type word
        BPF_STMT(BPF_LD | BPF_H | BPF_ABS, 12),
        // If not ETHERTYPE_ARP, skip next 3 (and return nothing)
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, ETHERTYPE_ARP, 0, 3),
        // Load half-word at octet 20: the ARP opcode
        BPF_STMT(BPF_LD | BPF_H | BPF_ABS, 20),
        // If not ARPOP_REPLY, skip next 1 (and return nothing)
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, ARPOP_REPLY, 0, 1),
        // Valid ARP reply received, return message
        BPF_STMT(BPF_RET | BPF_K, sizeof(struct ether_arp) + sizeof(struct ether_header)),
        // Return nothing
        BPF_STMT(BPF_RET | BPF_K, 0),
    };
    struct bpf_program filter = {
        sizeof insns / sizeof(insns[0]),
        insns
    };
//...
                    if (ioctl(fd, BIOCSETF, &filter) != -1) {
//...
                    }
                    else
                        fprintf(stderr, "Cannot set BPF rule\n");

Retrieving the interface addresses

Having to create an Ethernet frame ourselves, we need to know the hardware (MAC) and protocol (IP) addresses of the chosen interface. While we are at it, we also collect the address mask for the network, so we can determine what range of addresses to scan. The library function getifaddrs() provides a linked list of struct ifaddrs with everything we need; we only need to look for the right chunks.

Excerpt from arpscan.c
// Find own protocol (IP) and hardware (MAC) addresses
// Returns true iff both were found

bool findownaddresses(char *interface, struct ether_addr *ownmac,
    struct sockaddr_in *saip, struct sockaddr_in *samask) {

    struct ifaddrs *ifap, *ifa;
    struct sockaddr_dl *sdl;
    unsigned int success = 0;

    if (!getifaddrs(&ifap)) {

        printf("Self\n");

        for (ifa = ifap; ifa; ifa = ifa->ifa_next) {

            // ifa_addr may be NULL for entries without an address
            if (ifa->ifa_addr && !strcmp(ifa->ifa_name, interface)) {

                sdl = (struct sockaddr_dl *)ifa->ifa_addr;

                if (sdl->sdl_family == AF_LINK
                    && sdl->sdl_type == IFT_ETHER
                    && sdl->sdl_alen == ETHER_ADDR_LEN) {
                    memcpy((u_int8_t *)ownmac, (u_int8_t *)LLADDR(sdl), sizeof(struct ether_addr));
                    printf("MAC: %02x:%02x:%02x:%02x:%02x:%02x\n",
                        ownmac->ether_addr_octet[0],
                        ownmac->ether_addr_octet[1],
                        ownmac->ether_addr_octet[2],
                        ownmac->ether_addr_octet[3],
                        ownmac->ether_addr_octet[4],
                        ownmac->ether_addr_octet[5]);
                    success |= 0x01;
                }
                else if (sdl->sdl_family == AF_INET && ifa->ifa_netmask) {
                    saip->sin_addr.s_addr = ((struct sockaddr_in *)ifa->ifa_addr)->sin_addr.s_addr;
                    samask->sin_addr.s_addr = ((struct sockaddr_in *)ifa->ifa_netmask)->sin_addr.s_addr;
                    printf("%s, ", inet_ntoa(saip->sin_addr));
                    printf("netmask %s\n", inet_ntoa(samask->sin_addr));
                    success |= 0x02;
                }
            }
        }
        freeifaddrs(ifap);
    }
    else
        fprintf(stderr, "%s (getifaddrs)\n", strerror(errno));

    return (success == 0x03);
}
//...
int main(int argc, char *argv[]) {
//...
    struct ether_addr ownmac;
    struct sockaddr_in saip;
    struct sockaddr_in samask;
//...
                        if (findownaddresses(argv[1], &ownmac, &saip, &samask)) {
//...
                        }
                        else
                            fprintf(stderr, "Missing address for interface\n");
//...
}

Writing the queries — or: flood them!

Filling the ARP request frame is straightforward. The broadcast destination address ff:ff:ff:ff:ff:ff, our own IP and MAC addresses, and the opcode and length indicators are the same for all queries:

Excerpt from arpscan.c
// Construct ethernet frame header and ARP request

void prepareframe(struct ether_addr *ownmac, struct sockaddr_in *saip,
    struct ether_header *ethhdr, struct ether_arp *etharp) {

    memset((unsigned char *)&ethhdr->ether_dhost, 0xff, ETHER_ADDR_LEN);
    memcpy((unsigned char *)&ethhdr->ether_shost, (unsigned char *)ownmac, ETHER_ADDR_LEN);
    ethhdr->ether_type = htons(ETHERTYPE_ARP);

    etharp->arp_hrd = htons(ARPHRD_ETHER);
    etharp->arp_pro = htons(ETHERTYPE_IP);
    etharp->arp_hln = ETHER_ADDR_LEN;
    etharp->arp_pln = 4;
    etharp->arp_op = htons(ARPOP_REQUEST);
    memcpy((u_int8_t *)etharp->arp_sha, (u_int8_t *)ownmac, sizeof(struct ether_addr));
    memcpy((u_int8_t *)etharp->arp_spa, (u_int8_t *)&(saip->sin_addr.s_addr), 4*sizeof(u_int8_t));
    memset((u_int8_t *)etharp->arp_tha, 0, ETHER_ADDR_LEN);

    return;
}

The structures ether_header and ether_arp are defined in the calling function, which also loops through the protocol addresses to query for. The range covers everything between the network address (the network-mask part of our own address) and the local broadcast address, excluding both of these as well as our own address. For a /24 network, this leaves 253 addresses to query for. You may not want to apply this arpscan as it is for larger networks…

Excerpt from arpscan.c
// Write ARP request to all unicast addresses in the network
// (all less network, broadcast, and self)

void writequeries(int fd, struct ether_addr *ownmac, struct sockaddr_in *saip, struct sockaddr_in *samask) {

    unsigned char msg[sizeof(struct ether_header) + sizeof(struct ether_arp)];
    struct ether_header *ethhdr;
    struct ether_arp *etharp;
    struct sockaddr_in sat;
    uint32_t addr, addrnw, addrbc, addrown;
    int len, addlen;

    ethhdr = (struct ether_header *)msg;
    etharp = (struct ether_arp *)(ethhdr + 1);

    prepareframe(ownmac, saip, ethhdr, etharp);

    // Work in host byte order so the addresses can be counted through
    addrown = ntohl(saip->sin_addr.s_addr);
    addrnw = ntohl(saip->sin_addr.s_addr & samask->sin_addr.s_addr);
    addrbc = addrnw + (0xffffffff - ntohl(samask->sin_addr.s_addr));

    printf("\nWho has? for %u IP addresses\n", addrbc - addrnw - 2);

    for (addr = addrnw + 1; addr < addrbc; addr++) {
        if (addr != addrown) {
            sat.sin_addr.s_addr = htonl(addr);
            memcpy((u_int8_t *)etharp->arp_tpa, (u_int8_t *)&(sat.sin_addr.s_addr), 4*sizeof(u_int8_t));
            len = 0;
            // Stop on error (-1) or if nothing was written (0)
            while (len < (int)(sizeof(struct ether_header) + sizeof(struct ether_arp))
                && (addlen = write(fd, (char *)ethhdr + len,
                    sizeof(struct ether_header) + sizeof(struct ether_arp) - len)) > 0)
                len += addlen;
        }
    }

    return;
}

Did I mention that you may not want to apply this on any network that is not yours? Harmless as such scans may be, people may not like them.

Receiving the responses — or: reel them in!

All we need to do now is listen to BPF and print out the hardware and protocol addresses that come in. ARP is a simple, connectionless protocol that only works on the local network, so answers arrive quickly. What is not there within a second (actually a lot less) will not arrive at all. So we stop polling and receiving when no further reply has come in for 1000 milliseconds:

Excerpt from arpscan.c
// Collect filter outputs as long as data is received within a time-out of 1 sec

void collectresponses(int fd, int buflen) {

    unsigned char *buffer, *p, *pkt;
    struct bpf_hdr *bpf;
    struct pollfd fds;
    struct in_addr ip;
    int len;

    if ((buffer = (unsigned char *)malloc(buflen))) {

        fds.fd = fd;
        fds.events = POLLIN;
        fds.revents = 0;

        // poll() returns 0 on time-out and -1 on error; stop in both cases
        while (poll(&fds, 1, 1000) > 0) {
            if ((len = read(fd, buffer, buflen)) <= 0)
                break;
            // One read may return several frames, each preceded by a
            // struct bpf_hdr and padded to a word boundary (see bpf(4))
            for (p = buffer; p + sizeof(struct bpf_hdr) <= buffer + len;
                p += BPF_WORDALIGN(bpf->bh_hdrlen + bpf->bh_caplen)) {
                bpf = (struct bpf_hdr *)p;
                pkt = p + bpf->bh_hdrlen;
                // Need all 42 (0x2a) octets, with address lengths 6 and 4
                if (bpf->bh_caplen >= 0x2a
                    && pkt[0x12] == 0x06
                    && pkt[0x13] == 0x04) {
                    // Copy the sender IP, as it is not aligned in the buffer
                    memcpy(&ip, pkt + 0x1c, sizeof(ip));
                    printf("%s is at %02x:%02x:%02x:%02x:%02x:%02x\n",
                        inet_ntoa(ip),
                        pkt[0x16],
                        pkt[0x17],
                        pkt[0x18],
                        pkt[0x19],
                        pkt[0x1a],
                        pkt[0x1b]);
                }
            }
        }

        free(buffer);
    }

    return;
}

You can now almost piece together all of arpscan.c. The only thing that is missing, apart from the includes, is the core of our main:

Excerpt from arpscan.c
int main(int argc, char *argv[]) {
//...
                            writequeries(fd, &ownmac, &saip, &samask);
                            collectresponses(fd, buflen);
                            close(fd);

                            return 0;
//...
}

There you go. Address resolution under full control from userland.

Full code

The following C99 source was developed on OpenBSD 6.6 (patched through June 2020), compiled with clang, and run on an amd64 system with Intel NICs.

arpscan.c
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <stdint.h>
#include <unistd.h>
#include <string.h>
#include <errno.h>

#include <sys/types.h>
#include <sys/socket.h>
#include <sys/ioctl.h>
#include <fcntl.h>
#include <poll.h>
#include <ifaddrs.h>
#include <arpa/inet.h>
#include <net/bpf.h>
#include <net/if.h>
#include <net/if_dl.h>
#include <net/if_types.h>
#include <netinet/in.h>
#include <netinet/if_ether.h>


// Find own protocol (IP) and hardware (MAC) addresses
// Returns true iff both were found

bool findownaddresses(char *interface, struct ether_addr *ownmac,
    struct sockaddr_in *saip, struct sockaddr_in *samask) {

    struct ifaddrs *ifap, *ifa;
    struct sockaddr_dl *sdl;
    unsigned int success = 0;

    if (!getifaddrs(&ifap)) {

        printf("Self\n");

        for (ifa = ifap; ifa; ifa = ifa->ifa_next) {

            // ifa_addr may be NULL for entries without an address
            if (ifa->ifa_addr && !strcmp(ifa->ifa_name, interface)) {

                sdl = (struct sockaddr_dl *)ifa->ifa_addr;

                if (sdl->sdl_family == AF_LINK
                    && sdl->sdl_type == IFT_ETHER
                    && sdl->sdl_alen == ETHER_ADDR_LEN) {
                    memcpy((u_int8_t *)ownmac, (u_int8_t *)LLADDR(sdl), sizeof(struct ether_addr));
                    printf("MAC: %02x:%02x:%02x:%02x:%02x:%02x\n",
                        ownmac->ether_addr_octet[0],
                        ownmac->ether_addr_octet[1],
                        ownmac->ether_addr_octet[2],
                        ownmac->ether_addr_octet[3],
                        ownmac->ether_addr_octet[4],
                        ownmac->ether_addr_octet[5]);
                    success |= 0x01;
                }
                else if (sdl->sdl_family == AF_INET && ifa->ifa_netmask) {
                    saip->sin_addr.s_addr = ((struct sockaddr_in *)ifa->ifa_addr)->sin_addr.s_addr;
                    samask->sin_addr.s_addr = ((struct sockaddr_in *)ifa->ifa_netmask)->sin_addr.s_addr;
                    printf("%s, ", inet_ntoa(saip->sin_addr));
                    printf("netmask %s\n", inet_ntoa(samask->sin_addr));
                    success |= 0x02;
                }
            }
        }
        freeifaddrs(ifap);
    }
    else
        fprintf(stderr, "%s (getifaddrs)\n", strerror(errno));

    return (success == 0x03);
}


// Construct ethernet frame header and ARP request

void prepareframe(struct ether_addr *ownmac, struct sockaddr_in *saip,
    struct ether_header *ethhdr, struct ether_arp *etharp) {

    memset((unsigned char *)&ethhdr->ether_dhost, 0xff, ETHER_ADDR_LEN);
    memcpy((unsigned char *)&ethhdr->ether_shost, (unsigned char *)ownmac, ETHER_ADDR_LEN);
    ethhdr->ether_type = htons(ETHERTYPE_ARP);

    etharp->arp_hrd = htons(ARPHRD_ETHER);
    etharp->arp_pro = htons(ETHERTYPE_IP);
    etharp->arp_hln = ETHER_ADDR_LEN;
    etharp->arp_pln = 4;
    etharp->arp_op = htons(ARPOP_REQUEST);
    memcpy((u_int8_t *)etharp->arp_sha, (u_int8_t *)ownmac, sizeof(struct ether_addr));
    memcpy((u_int8_t *)etharp->arp_spa, (u_int8_t *)&(saip->sin_addr.s_addr), 4*sizeof(u_int8_t));
    memset((u_int8_t *)etharp->arp_tha, 0, ETHER_ADDR_LEN);

    return;
}


// Write ARP request to all unicast addresses in the network
// (all less network, broadcast, and self)

void writequeries(int fd, struct ether_addr *ownmac, struct sockaddr_in *saip, struct sockaddr_in *samask) {

    unsigned char msg[sizeof(struct ether_header) + sizeof(struct ether_arp)];
    struct ether_header *ethhdr;
    struct ether_arp *etharp;
    struct sockaddr_in sat;
    uint32_t addr, addrnw, addrbc, addrown;
    int len, addlen;

    ethhdr = (struct ether_header *)msg;
    etharp = (struct ether_arp *)(ethhdr + 1);

    prepareframe(ownmac, saip, ethhdr, etharp);

    // Work in host byte order so the addresses can be counted through
    addrown = ntohl(saip->sin_addr.s_addr);
    addrnw = ntohl(saip->sin_addr.s_addr & samask->sin_addr.s_addr);
    addrbc = addrnw + (0xffffffff - ntohl(samask->sin_addr.s_addr));

    printf("\nWho has? for %u IP addresses\n", addrbc - addrnw - 2);

    for (addr = addrnw + 1; addr < addrbc; addr++) {
        if (addr != addrown) {
            sat.sin_addr.s_addr = htonl(addr);
            memcpy((u_int8_t *)etharp->arp_tpa, (u_int8_t *)&(sat.sin_addr.s_addr), 4*sizeof(u_int8_t));
            len = 0;
            // Stop on error (-1) or if nothing was written (0)
            while (len < (int)(sizeof(struct ether_header) + sizeof(struct ether_arp))
                && (addlen = write(fd, (char *)ethhdr + len,
                    sizeof(struct ether_header) + sizeof(struct ether_arp) - len)) > 0)
                len += addlen;
        }
    }

    return;
}


// Collect filter outputs as long as data is received within a time-out of 1 sec

void collectresponses(int fd, int buflen) {

    unsigned char *buffer, *p, *pkt;
    struct bpf_hdr *bpf;
    struct pollfd fds;
    struct in_addr ip;
    int len;

    if ((buffer = (unsigned char *)malloc(buflen))) {

        fds.fd = fd;
        fds.events = POLLIN;
        fds.revents = 0;

        // poll() returns 0 on time-out and -1 on error; stop in both cases
        while (poll(&fds, 1, 1000) > 0) {
            if ((len = read(fd, buffer, buflen)) <= 0)
                break;
            // One read may return several frames, each preceded by a
            // struct bpf_hdr and padded to a word boundary (see bpf(4))
            for (p = buffer; p + sizeof(struct bpf_hdr) <= buffer + len;
                p += BPF_WORDALIGN(bpf->bh_hdrlen + bpf->bh_caplen)) {
                bpf = (struct bpf_hdr *)p;
                pkt = p + bpf->bh_hdrlen;
                // Need all 42 (0x2a) octets, with address lengths 6 and 4
                if (bpf->bh_caplen >= 0x2a
                    && pkt[0x12] == 0x06
                    && pkt[0x13] == 0x04) {
                    // Copy the sender IP, as it is not aligned in the buffer
                    memcpy(&ip, pkt + 0x1c, sizeof(ip));
                    printf("%s is at %02x:%02x:%02x:%02x:%02x:%02x\n",
                        inet_ntoa(ip),
                        pkt[0x16],
                        pkt[0x17],
                        pkt[0x18],
                        pkt[0x19],
                        pkt[0x1a],
                        pkt[0x1b]);
                }
            }
        }

        free(buffer);
    }

    return;
}


int main(int argc, char *argv[]) {

    int fd;
    int buflen;
    int dlt;
    struct ifreq ir;
    struct ether_addr ownmac;
    struct sockaddr_in saip;
    struct sockaddr_in samask;

    // BPF rule
    struct bpf_insn insns[] = {
        // Load half-word (16 bits) at octet 12: the length or type word
        BPF_STMT(BPF_LD | BPF_H | BPF_ABS, 12),
        // If not ETHERTYPE_ARP, skip next 3 (and return nothing)
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, ETHERTYPE_ARP, 0, 3),
        // Load half-word at octet 20: the ARP opcode
        BPF_STMT(BPF_LD | BPF_H | BPF_ABS, 20),
        // If not ARPOP_REPLY, skip next 1 (and return nothing)
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, ARPOP_REPLY, 0, 1),
        // Valid ARP reply received, return message
        BPF_STMT(BPF_RET | BPF_K, sizeof(struct ether_arp) + sizeof(struct ether_header)),
        // Return nothing
        BPF_STMT(BPF_RET | BPF_K, 0),
    };
    struct bpf_program filter = {
        sizeof insns / sizeof(insns[0]),
        insns
    };

    if (argc == 2) {
        if ((fd = open("/dev/bpf", O_RDWR)) >= 0) {
            // strlcpy() always NUL-terminates the name, unlike strncpy()
            strlcpy(ir.ifr_name, argv[1], sizeof(ir.ifr_name));
            buflen = 1;
            if (ioctl(fd, BIOCSETIF, &ir) != -1
                && ioctl(fd, BIOCIMMEDIATE, &buflen) != -1
                && ioctl(fd, BIOCGBLEN, &buflen) != -1) {
                if (ioctl(fd, BIOCGDLT, &dlt) != -1
                    && dlt == DLT_EN10MB) {
                    if (ioctl(fd, BIOCSETF, &filter) != -1) {

                        if (findownaddresses(argv[1], &ownmac, &saip, &samask)) {
                            writequeries(fd, &ownmac, &saip, &samask);
                            collectresponses(fd, buflen);
                            close(fd);

                            return 0;
                        }
                        else
                            fprintf(stderr, "Missing address for interface\n");
                    }
                    else
                        fprintf(stderr, "Cannot set BPF rule\n");
                }
                else
                    fprintf(stderr, "Link type unknown or not Ethernet\n");
            }
            else
                fprintf(stderr, "%s (ioctl)\n", strerror(errno));
        }
        else
            fprintf(stderr, "%s (open)\n", strerror(errno));
    }
    else
        fprintf(stdout, "Usage:\tarpscan if\n");

    return -1;
}