Chapter 5 Link Layer

The network layer is concerned with arranging for the end-to-end delivery of data. We have seen that this is basically a matter of routing packets along a path from router to router until you reach the final destination. The link layer is concerned with how the packet is moved across one of the links in such a path. The link level packets are usually referred to as frames.

  1. The network layer packet is typically encapsulated in a link level frame by adding a header and sometimes also a trailer. The header typically contains addressing information, where the link level addresses are referred to as physical or MAC addresses. For example, we have already seen that ethernet uses 6 byte addresses whereas the network level IP addresses are 4 bytes long.
  2. The link level may provide for reliable delivery and for flow control, but only across the single link, which is distinct from the problem of providing the same service on an end-to-end (i.e. network level) basis.
  3. The link level protocol may provide for error detection and/or correction.
  4. Link level protocols may be either half or full duplex, meaning that data may be able to move only in one direction or in both directions at the same time.

I. Error Detection and Correction

We have already seen the Internet checksum used as a means for detecting errors in IP headers and in transport level data packets. In addition, you have no doubt already run into parity checking as an error detection mechanism. In both cases, it is rather easy to find errors which are NOT detected.

The idea of any error detection mechanism is to add redundant data to the transmission. Basically, if you wish to send data D, then you compute a functional value f(D) and send the pair (D, f(D)). The receiver gets a pair (E, F) where hopefully E = D and F = f(D). The receiver calculates f(E) and concludes that an error has occurred if f(E) is not equal to F. Note that either D or f(D) can be corrupted; so the test can declare some correctly received data to be bad and some bad data to be good.

One good property of both the internet checksum and parity checks is that f(D) is small, and so there is not too much overhead in sending the pair. (Another method that might detect more errors would be to send multiple copies of D; this would not be as economical.)

One bad property of both the internet checksum and parity checks is that it is fairly easy to find errors which would not be detected. For example, we saw that both miss a number of two bit errors. Note that the goal is not to find an algorithm that makes no errors; the goal is to find one which detects most errors that are likely to occur often.

One simple way to improve parity checking is to calculate parity in multiple ways. A typical example: for each byte, add one parity bit so that all 9 bits sum to 0 modulo 2. Then, for each block of bytes, calculate a parity bit for each of the 9 bit positions. This is referred to as a two-dimensional parity scheme -- it is the method that was used on 9 track magnetic tape for years.

Note that this method allows you to detect and correct all 1 bit errors in a block of data (and parity bits). The idea is that you can use the per-byte parity bits to determine which byte has the error, and then use the bit position parity bits to determine in which bit position the error is located. You can then flip the single bit and your error is corrected. (Note that this even corrects the error when it occurs in a parity bit -- but it only works if the error is actually a one bit error.)
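To make the correction procedure concrete, here is a minimal C sketch of the idea; it stores the per-byte parity bits separately rather than as a 9th bit on tape, and the data and the injected error are of course just illustrative.

#include <stdio.h>
#include <stdint.h>

/* Parity of one byte: returns 1 if the number of 1 bits is odd. */
static int parity8(uint8_t b) {
    b ^= b >> 4; b ^= b >> 2; b ^= b >> 1;
    return b & 1;
}

int main(void) {
    uint8_t block[4] = { 'D', 'A', 'T', 'A' };
    uint8_t row[4];                 /* one parity bit per byte               */
    uint8_t col = 0;                /* one parity bit per bit position       */
    for (int i = 0; i < 4; i++) { row[i] = parity8(block[i]); col ^= block[i]; }

    block[2] ^= 0x08;               /* inject a single bit error             */

    /* The row whose parity no longer matches identifies the bad byte; the
       column parity mismatch identifies the bad bit position.               */
    uint8_t colcheck = 0;
    int badrow = -1;
    for (int i = 0; i < 4; i++) {
        if (parity8(block[i]) != row[i]) badrow = i;
        colcheck ^= block[i];
    }
    uint8_t badbits = colcheck ^ col;      /* exactly one bit set            */
    if (badrow >= 0 && badbits != 0)
        block[badrow] ^= badbits;          /* flip it back                   */
    printf("recovered: %.4s\n", (char *)block);
    return 0;
}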

Cyclic Redundancy Check (CRC)

Link level protocols typically use more robust error detection mechanisms than those described so far. The redundant bits are typically calculated with the aid of special hardware. One common method is the cyclic redundancy check (CRC); it is the method used, e.g., by ethernet. The CRC adds a small number of redundant bits (e.g. 32 bits per frame for ethernet) but detects far more errors than the earlier schemes. For example, we will see that the ethernet CRC detects all single burst errors of length at most 32 bits and all 3 bit errors in a packet. (The full analysis of this is quite complex.)

How can you accomplish this? The basic idea is to consider the message as a sequence of bits; since these bits are either 0 or 1, they can be regarded as the coefficients of a polynomial whose coefficients are integers modulo 2. For example, the letter A has an ASCII value of 65, which in base 2 is 1000001. This is the polynomial x^6 + 1.

Now the CRC algorithm requires another polynomial, which in the case of ethernet is

P(x) = x^32 + x^26 + x^23 + x^22 + x^16 + x^12 + x^11 + x^10 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1
If the message is the polynomial G(x), then calculate the remainder R(x) when G(x)x^32 is divided by P(x), i.e.
   G(x)x^32 = Q(x)P(x) + R(x), degree(R) < 32
(Of course, all of this is done modulo 2.) The idea here is that polynomials with coefficients which are integers modulo 2 are well behaved algebraically, e.g. every polynomial factors uniquely, etc. In our earlier notation, G(x) is our message D and R(x) is the f(D).

So, now we have an algorithm. If this is all somewhat intimidating, then there is good news and bad news. The good news is that we can actually understand a bit of it, if not all the details, and implementing it can be done efficiently in hardware. The bad news is that the implementation in software is both inefficient and totally opaque. If you are curious, have a look at a short, if incomprehensible, implementation as a C program.
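In the meantime, here is a minimal bit-at-a-time sketch in C of the division described above. It computes the remainder R(x) directly; note that the real ethernet FCS additionally specifies bit reversal and initial/final complementing steps which are omitted here, so the output will not match an off-the-shelf CRC-32 routine.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* The ethernet polynomial P(x) with the x^32 term left implicit. */
#define POLY 0x04C11DB7u

/* Long division modulo 2: shift the message, followed by 32 zero bits,
   through a 32 bit register; whenever a 1 bit falls off the top, subtract
   (i.e. XOR) P(x).  What is left at the end is the remainder R(x).         */
uint32_t crc_remainder(const uint8_t *msg, size_t len) {
    uint32_t rem = 0;
    for (size_t i = 0; i < len + 4; i++) {       /* + 4 bytes: the x^32 shift */
        uint8_t byte = (i < len) ? msg[i] : 0;
        for (int bit = 7; bit >= 0; bit--) {
            int top = (rem >> 31) & 1;
            rem = (rem << 1) | ((byte >> bit) & 1);
            if (top) rem ^= POLY;
        }
    }
    return rem;                                   /* degree(R) < 32           */
}

int main(void) {
    const char *s = "A";                          /* the x^6 + 1 example      */
    printf("R(x) = 0x%08x\n", crc_remainder((const uint8_t *)s, strlen(s)));
    return 0;
}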

To try and understand why the CRC check is so effective:

  1. What errors will go undetected? If we transmit (D, f(D)) and receive (E, F), think of these as polynomials. On receipt, we divide E(x)x^32 + F(x) by P(x) and declare an error if the remainder is non-zero. So the error goes undetected exactly when the received polynomial is a multiple of P(x).
  2. We can already understand why any single bit error will be detected. A single bit error means that the received polynomial is the transmitted polynomial D(x)x^32 + R(x) plus x^k for some k. Since the first two terms form a multiple of P(x), the error would go undetected only if x^k were also a multiple of P(x). But this is never true -- if M(x) is a polynomial of degree d whose lowest degree monomial has degree e, then M(x)P(x) has a term of degree d + 32 and also a term of degree precisely e (obtained by multiplying the lowest degree monomials of the two factors together), so any non-zero multiple of P(x) has at least two terms and cannot equal the single term x^k.

    The general idea is that algebraic properties of the particular polynomial P(x) determine that particular types of errors cannot go undetected. As an exercise, check that you understand that all single bit errors would be detected by any CRC algorithm defined by a polynomial P(x) with at least two non-zero terms.

  3. Although the ethernet P(x) is not divisible by x + 1, many of the standard CRC polynomials are chosen to be multiples of x + 1. This is because any such P(x) detects any error involving an odd number of error bits, e.g. all single bit errors, all 3 bit errors, all five bit errors, etc. (To understand why, choose any polynomial with an odd number of terms and try dividing it by x + 1.)
  4. A burst error of length k is an error in which all the erroneous bits lie in a contiguous sequence of bits of length k. These are important, because typical communication errors are burst errors spanning one or more bytes. You should be able to explain why a polynomial P(x) of degree d should detect all single burst errors of length no larger than d.

II. Ethernet

Ethernet has become the standard multiple access link layer protocol for IP based networks. During the late 1960's, Abramson at the University of Hawaii developed Aloha, a multiple access satellite based network. The idea was that ground based systems would broadcast to the satellite, which would relay the signal back to the ground based systems. Each sender could transmit at will; if two senders sent packets at almost the same time, the packets could interfere with each other, and this interference would be seen as a garbled radio signal. When collisions were detected, the packet would be resent. Analysis of the algorithm showed that the system would saturate at about an 18 percent utilization level. An alternative, in which transmissions were confined to fixed time slots, allowed this maximum utilization level to be doubled.

Metcalfe and Boggs at Xerox PARC modified this algorithm for use as a LAN protocol. The basic algorithm is often described as CSMA/CD.

  1. MA simply indicates that the protocol is multiple access, like the Aloha protocols.
  2. CS stands for carrier sense. The sender first checks the cable to see if another node is sending, i.e. whether the ether cable is active at the point where the sender is connected to it. The sender can transmit only if the cable has been idle for a minimum of 96 bit times.
  3. CD stands for collision detect. While sending, the cable is monitored for indications of a collision with a packet sent by another sender. A collision is detected because the voltage levels on the channel are approximately doubled while it is occurring. If a collision is detected, the sender stops sending and retries the transmission at a later time.
  4. The resend time is an integral number of 512 bit time slots later, where the number is randomly chosen in a range which doubles with each resend attempt on the same packet.
The original ethernet in 1972 was 2.94 megabits per second and was called the Alto Aloha Network (since it connected Alto computers). A year later, Metcalfe changed the name to ethernet. Ethernet became an open standard, the DIX Ethernet (DEC-Intel-Xerox); in 1985, it was also standardized as the IEEE 802.3 CSMA/CD standard. The two standards are still both in common use today; the basic protocols are essentially identical, but the header formats differ slightly in a manner which allows both types of frames to be used on the same physical ethernet. The headers you have been using this semester are DIX headers.

A. Ethernet Frames

The basic DIX frame looks like:
  1. Preamble: 8 bytes long, the first 7 bytes are 0xaa and the 8th is 0xab.
  2. Destination Address: 48 bits long. The address is written as hyphen separated byte values in hexadecimal, e.g. 00-b0-d0-15-d0-ca. When the address is transmitted, it is transmitted in this same byte order, i.e. leftmost byte first. But the individual bits of each byte are sent in little endian order, i.e. least significant bit first. (Token Ring (802.5) transmits most significant bit first....)

    The first bit of the address sent (i.e. the least significant bit of the first byte) is 0 for unicast and 1 for multi-cast.

  3. Source Address: This has the same format as the Destination address.
  4. Type: This is the 2 byte protocol type, and it allows one to distinguish various higher level protocols. For example, IP uses 0x800, and ARP uses 0x806. All protocol types are larger than 1500 decimal so that you can distinguish DIX packets from 802.3 packets, which use the same field as a length field.
  5. Data: Up to 1500 bytes
  6. FCS: Frame Check Sequence is a 32 bit CRC check value.

After the FCS, the channel must remain idle for 96 bit times (12 byte times). It is permissible for dribble bits to occur in the byte after the FCS. These bits are hardware anomalies and are just discarded. Note that it is important that the carrier be gone before the first byte after the FCS is finished; otherwise, you will misjudge the length of the data, as the header does not have a length field.

The IEEE 802.3 frame is similar, but it has a more complicated way of specifying the Type and, as a result, only allows for up to 1497 bytes of data. In addition, it allows for locally allocated addresses -- the next to least significant bit of the first byte of the address is 1; for DIX, this bit is always 0, i.e. all addresses are globally assigned (except for the broadcast address, which is all 48 bits set to 1).
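For concreteness, here is a sketch of how the DIX header might be declared in C. The field names are illustrative (not taken from any particular driver), and the preamble is omitted since the hardware strips it before the frame reaches software.

#include <stdint.h>

#pragma pack(push, 1)
struct dix_header {
    uint8_t  dst[6];     /* destination MAC address                          */
    uint8_t  src[6];     /* source MAC address                               */
    uint16_t type;       /* protocol type in network byte order:
                            0x0800 = IP, 0x0806 = ARP                        */
};
#pragma pack(pop)

/* A unicast address has the least significant bit of the first byte equal
   to 0; a multicast (or the broadcast) address has that bit set to 1.       */
static inline int dix_is_multicast(const uint8_t mac[6]) { return mac[0] & 1; }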

B. Media Access Control

When can a node send data? Here are the rules:
  1. Before sending data, the node must have seen no carrier (i.e. the channel idle) for at least 96 bit times (the interframe gap (IFG)). Note that this is 9.6 microseconds for 10 MB ether and 960 ns for 100 MB ether. Before attempting to send the frame for the first time, the collision count is set to zero.
  2. If the node sees carrier, then it must wait, i.e. defer to the current sender.
  3. If the node detects a collision while sending, it must send a jam signal for 32 bit times. More precisely, the sender must finish the preamble if it is not yet done, and then send the jam signal, which is simply 32 bit times of data, after which it must stop transmitting. The purpose of the jam signal is to make sure that enough data has been transmitted so that other nodes are sure to see that the packet transmission was attempted. After the collision is detected, the collision counter is incremented.
  4. If the first 512 bits have been transmitted without detecting a collision, then the channel is said to have been acquired. There should be no collisions after this time; if there is, then it is called a late collision which is a serious error; the sending of the frame is aborted and no retries are made. If the entire frame is sent without encountering a collision, then the collision count is set back to zero. The 512 bit time unit is referred to as a slot time.
  5. If a collision is detected before the channel is acquired, then up to 16 retries are attempted. If the channel cannot be acquired in 16 retries, the send is aborted.
  6. Suppose that the collision count is c and a retry is to be scheduled. No retry is attempted until after an integral number of 512 bit times later. This number is randomly chosen between 0 and 2^min(c,10) - 1. After this wait, the retry can be attempted (but only after the channel has been idle for at least the IFG).
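As a small sketch of this scheduling rule (assuming a generic rand() source of randomness; a real controller does this in hardware):

#include <stdlib.h>

#define SLOT_BIT_TIMES 512
#define MAX_ATTEMPTS   16

/* Returns how many bit times to wait before the next retry, or -1 if the
   frame should be abandoned.  c is the current collision count.            */
long backoff_bit_times(int c) {
    if (c >= MAX_ATTEMPTS)
        return -1;                       /* give up after 16 attempts        */
    int k = (c < 10) ? c : 10;           /* the exponent is capped at 10     */
    long slots = rand() % (1L << k);     /* uniform in 0 .. 2^k - 1          */
    return slots * SLOT_BIT_TIMES;       /* an integral number of slot times */
}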

The retry time has the following properties:

  1. It is randomized so that the various colliding senders are not likely to retry at the same time.
  2. The exponential back off takes into account the possibility of many potential senders. Basically, if there are many contending for the channel, you can expect a second collision because there are only 2 possible times when the first retry would occur. But the exponential back off gives more and more possible retry times as the collision count increases.
  3. The retries occur at discrete times, spaced at slot time intervals, arranged so that if there is a single sender with the minimal retry time, then it should acquire the channel. Compare this to what would happen if one simply chose random retry delays without any consideration of the slot time. (Of course, non-retry transmissions CAN occur at other times.)

C. The Ethernet Slot Time

Clearly the 512 bit time figure is important to the media access control system. Note that this is also the same as the amount of time it takes to send a minimally sized frame: 14 bytes of header + 46 bytes of data + 4 bytes of checksum amount to 64 bytes, or 512 bits.

This magic number comes about because one needs to be able to detect a collision before the frame is completely sent. Let's see how this works. In the worst case, the collision would be between senders at opposite ends of the ethernet. Sender A would start sending; the signal would propagate down the wire and arrive just at the instant that Sender B starts sending; the collision signal (i.e. the high voltage level on the wire) would then propagate back to Sender A. So, in the worst case, the signal has to propagate the entire length of the ethernet and back, and all this before Sender A is done sending the frame.

The minimum frame is 512 bits (including 46 data bytes). One bit time is 100 ns for 10 MB ether or 10 ns for 100 MB ether. So 512 bit times, assuming a propagation speed of 0.6 times the speed of light in a vacuum, would allow the signal to move approximately 9 km on a 10 MB ethernet. Since the signal has to travel twice the length of the ethernet, this would mean that the ethernet could not be longer than about 4500 meters. Actually, life is not this simple; the standard limits old thick ethernet to no more than 500 meters in any segment of wire, and these segments must be connected with repeaters -- it allows for 5 segments, which is 2500 meters. (This does not take into account the delays at the repeaters, but neither does it take advantage of the additional 64 bit times of the preamble.) So, there are additional complications, and the rules for how you can connect ethernet segments together are quite complex; hopefully, however, this conveys the general idea of why the slot time is chosen as it is.

D. Channel Capture

This is a subtle bit of unfairness in the ethernet algorithm. One would hope that the algorithm is fair in the sense that a single node should not be able to monopolize more than its fair share of the bandwidth. However, if there is a sender who always has data to send, then the following anomaly (called channel capture) is possible.

  1. Suppose both A and B wish to send. They both do, and a collision occurs. Suppose A's retry time is 0 and B's retry time turns out to be 1 slot time. Then A will retry, acquire the channel, and complete sending the packet; its collision count is therefore set back to 0, while B's is still at 1. So far, things appear fair.
  2. Next, A tries to send its second packet and B retries its first packet, since its slot time has expired. Again, a collision occurs. Now A must wait either 0 or 1 slot time to retry. But B now has collision count 2 and so it must wait either 0, 1, 2, or 3 slot times.
  3. It is more likely that A will have to wait less time than B. (The probability that A will retry first is 5/8, the probability that B will retry first is 1/8, and they will tie with probability 1/4 -- see the small enumeration after this list.)
  4. If A retries first, it will acquire the channel and complete transmission of its second packet; again its collision count is set to 0. Unless A's packet is small, it is likely that A and B will collide again, and the probability of B getting the channel goes down even further, because B's collision count would then be 3 and A's only 1.
  5. If A and B tie, then a collision occurs and B's collision count would be 3 whereas A's would be 2. So the probability of A getting to retry first on the next round is still better than B's.
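The probabilities quoted in item 3 can be checked by enumerating the second contention round; here is a small sketch that does so.

#include <stdio.h>

/* A (collision count 1) draws its retry slot uniformly from {0,1}; B
   (collision count 2) draws uniformly from {0,1,2,3}.  Count who goes first. */
int main(void) {
    int a_first = 0, b_first = 0, tie = 0;
    for (int a = 0; a < 2; a++)
        for (int b = 0; b < 4; b++) {
            if (a < b)      a_first++;
            else if (b < a) b_first++;
            else            tie++;
        }
    printf("A first: %d/8, B first: %d/8, tie: %d/8\n", a_first, b_first, tie);
    return 0;
}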

E. Full Duplex

What we have described so far is the standard CSMA/CD behavior of half duplex ethernet. Nowadays, however, a common setup is not this at all. For example, in the INSLAB, one has a 100 MB ethernet switch. The switch is actually a bridge. The effect is that each machine plugged into the switch actually has its own private ethernet, with just the switch and the machine on it. Furthermore, the connection is by a patch cable which has separate channels for sending and receiving data. This means that you can send and receive data at the same time, i.e. you have a full duplex connection. Under these circumstances, ethernet does not use the media access control protocol we described above. Rather, each sender simply sends whenever it wants to -- except that it still needs to wait for the IFG between frames. Effectively, you have two 100 MB contention free channels, a point-to-point connection, with just the IFG as the only additional overhead.

When the traffic is between two of the machines connected to the same switch, the switch acts in store-and-forward mode, like a router. So, the frame is received and stored in buffers until it can be sent out another of its ports. In principle, it would be possible to overflow the buffers in the switch. However, the switch is constructed with enough buffer space that it can handle full speed traffic at 100 MB/sec in each direction on about 20 of its ports simultaneously without overflowing its buffers. (Of course, this assumes that the traffic is spread across the ports -- that you are not asking for more than 100 MB/sec out of any one port. If you direct all the traffic to a single port, its buffers will overflow quite quickly.)

The switch also allows for cut-through mode. In this mode, the packet is forwarded immediately (when the destination port is free) -- so as soon as the destination is known (note that the destination address comes first in the frame), the bytes are routed to the output port instead of waiting until the frame is fully received. This optimization does mean that you cannot change your mind if you should notice at the end of the frame that the FCS does not match; so you could be forwarding incorrect data.

F. Gigabit Ethernet

The basic algorithm for CSMA/CD has remained almost identical to the original ethernet specification, now almost three decades old. During this time, we have moved from 10 MB to 100 MB and now to 1000 MB in speed. Remember that the main point was that we had to make sure that packets were sized appropriately so that collisions could be detected before the sender was done sending a packet. So, at 10 MB, the frame was required to hold at least 46 bytes of data (i.e. 512 bits in all); this allowed for ethernets which were approximately 2.5 km in size. In moving to 100 MB, it was decided to keep the old minimum frame size. This meant that the maximum sized ethernet would have to shrink by about a factor of 10; so 100 MB ethernets cannot be larger than about 210 meters. With gigabit ether, the speed again increased by a factor of 10. If one kept the same minimum frame size, one would be down to about 20 meters for an ethernet. Although this is adequate for most connections within a machine room, it would mean that the IEEE standard structured wiring for commercial buildings would be too long -- that standard allows for 100 meters from a hub to an office workspace, and so can require 200 meters between the two farthest workstations.

On the other hand, changing the minimum frame size is not attractive either; it means that you cannot easily bridge from 100 MB to gigabit ethernet. (You would have to extend all the short packets to the gigabit minimum.) So, the problem was how do you keep the same frame structure and yet allow for gigabit speeds?

Recall that, with 10 MB ether, the minimum ethernet frame size has to be accommodated by the higher level protocols. The approach with IP is to pad the actual IP datagram with additional bytes, extending it out to 46 bytes of payload in the ether frame. This means that IP needs a length field in its header so that it can determine where the packet ends. (Although in principle possible, one does not attempt to pack several small IP packets within a maximal sized ethernet frame; instead the use of longer packets is encouraged.) This is the basic idea of what is done with gigabit ether, but it is done outside the frame. You add padding after the end of the packet by keeping the carrier on -- out to a minimum of 512 bytes (4096 bit times) for a minimum sized packet. This means that a minimum sized frame uses an additional 3584 bit times of channel time. These additional bits are non-data bits and are referred to as extension bits. The good news is that you maintain the original frame structure even though your speed is increased by a factor of 10.

But this is a bit illusory -- basically, in the case of minimum sized frames, we increase the bit rate by a factor of 10, but increase the overhead so that the extension bits are 7 times as long as the entire original frame. So, for small frames, we have less than doubled our effective speed. For frames longer than 4096 bits, there is no additional overhead; also, for full duplex operation, there is no additional overhead, as the extension bits are NOT required there.
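To make "less than doubled" concrete, here is the arithmetic for the channel time occupied by a minimum sized frame (ignoring the preamble and the IFG):

    at 100 MB:   512 bits / 100 megabits per second           = 5.12 microseconds
    at 1000 MB:  (512 + 3584) bits / 1000 megabits per second = 4.096 microseconds

so minimum sized frames move only about 1.25 times faster, despite the tenfold increase in raw bit rate.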

To address this poor performance with small packets, gigabit ethernet also allows frames to be packed together. Recall that after each frame, the 10 MB (and 100 MB) specification requires that one turn off the carrier and only start the next frame after having sensed no carrier for at least 96 bit times (the interframe gap, IFG). The reason for this is to oblige hosts to share the ethernet channel; but notice that if fair sharing means the ability to pass a fair share of the total number of data bytes, then this is unfair to users of small frames. The gigabit standard addresses this with frame bursts.

With gigabit ethernet, you are allowed to send a frame and, at the end of the frame, keep the carrier on and go ahead and send additional frames separated by an IFG amount of idle time (but with the carrier left on to prevent others from contending for the channel). So, the idea is:

  1. If carrier has not been present for at least 96 bit times, transmit the first frame.
  2. If the frame is less than 4096 bits long, leave the carrier on after the frame has been transmitted, padding out to 4096 bits. (If a collision occurs, then the frame needs to be resent, even though its data bits were already out before you detected the collision.)
  3. After 4096 bit times, the channel is acquired, and so any further collisions are errors.
  4. At the end of the first frame, you can either release the channel by turning off the carrier or you can continue with the frame burst. To continue, you leave the carrier on for 96 bit times (the IFG).
  5. You then send additional frames separated by IFG with the carrier left on. The maximum length of the additional frames (those beyond the first) is 12,000 bit times. Because the channel is acquired, no collisions should occur for any of the additional packets.

Notice that the protocol requires you to use extension bits for the first frame even though you COULD have been sending additional frames. This is done so that one always knows that a legal collision always involves the first frame of a burst -- so there is no question as to which of the frames needs to be resent.

III. Address Resolution Protocol (ARP)

In order to actually send an IP packet across an ethernet, it needs to be encapsulated in an ethernet frame. To do so, one needs to know the ethernet destination address. This is referred to as the Neighbor Discovery problem. Finding out the ethernet address of hosts on a directly connected ethernet is the purpose of Address Resolution Protocol (ARP). For the original description, see RFC 826.

A. The Basic ARP Algorithm

It is reasonable for each host to remember certain ethernet addresses, e.g. its own ethernet address is queried from its interface card and remembered. Also, finding the ethernet address associated with a neighbor host needs to be done for each outgoing IP packet; so it needs to be efficient. As a result, recently used mappings are kept in a cache on each host. One can obtain a listing of the current contents of that cache with the arp -a command (for Unix or Windows). Here is typical output:

Interface: on Interface 0x1000003
  Internet Address      Physical Address      Type
                        00-e0-fe-14-68-c0     dynamic
                        00-50-da-64-0e-ff     dynamic
                        00-60-08-31-f4-14     dynamic
                        00-01-02-64-7c-dd     dynamic
                        00-50-04-aa-2d-10     dynamic
                        00-50-da-64-0e-ff     dynamic
                        00-50-04-9a-0d-6b     dynamic
                        00-02-b3-25-0a-bb     dynamic
The command also allows you to add and delete entries by hand from the cache. All the entries here are marked dynamic which means that they will time out and be deleted from the cache if they are unused for a while. The exact timing is controlled by registry entries on Windows systems, but the default behavior is:
  1. Entries which are not referenced for 2 minutes are removed from the cache.
  2. Entries that are older than 10 minutes are removed from the cache whether or not they have been referenced.
The entries that are added with the arp command are static entries; they never time out.

Here is the basic ARP algorithm:

  1. To map a given IP address to its corresponding ethernet address for a particular interface, first check to see if it is in the ARP cache. If so, use the entry.
  2. If not, send a broadcast ethernet ARP request packet out the interface with the local IP address, local ethernet address for the same interface, and the IP address for which the mapping is desired.
  3. A host on the local ethernet (the one using the IP address for which the mapping is desired) responds with a unicast ARP response, which includes the same information as the request together with the requested ethernet address.
  4. Upon receipt, the information is placed in the ARP cache of the requestor. The received address is used to address the ethernet frame.

The reasons that the requestor sends its own addressing information are:

  1. This allows the responder to unicast its response, rather than using another broadcast message.
  2. The responder puts the information about the requestor into its own ARP cache. The idea is that it will likely need that address soon, in order to reply to the IP traffic that prompted the request.
  3. In fact, ANY host receiving the request which has an entry for the address in its cache will use the information from the request to update its cache.

The ARP protocol uses an ethernet frame; i.e. it is NOT an IP packet. It uses ethernet type 0x806 in the DIX header. The ethernet broadcast address is ff-ff-ff-ff-ff-ff. Here is the format of the body of the frame:

2 bytes   Hardware Address Space (ethernet = 1)
2 bytes   Protocol Address Space (IP = 0x800)
1 byte    Hardware Address Length in bytes
1 byte    Protocol Address Length in bytes
2 bytes   Opcode (request = 1, response = 2)
n bytes   Sender's Hardware Address
m bytes   Sender's Protocol Address
n bytes   Receiver's Hardware Address
m bytes   Receiver's Protocol Address

The receiver's hardware address is typically not filled in for the request packet, as its value is unknown. The RFC suggests that one might use the broadcast address there if convenient.
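Assuming the ethernet/IP case (n = 6, m = 4), the body of the frame can be written as a fixed C struct; the field names below are illustrative rather than taken from any particular implementation, and all multi-byte fields are in network byte order.

#include <stdint.h>

#pragma pack(push, 1)
struct arp_ip_over_ether {
    uint16_t hw_space;        /* hardware address space, 1 = ethernet        */
    uint16_t proto_space;     /* protocol address space, 0x800 = IP          */
    uint8_t  hw_len;          /* 6                                           */
    uint8_t  proto_len;       /* 4                                           */
    uint16_t opcode;          /* 1 = request, 2 = response                   */
    uint8_t  sender_hw[6];
    uint8_t  sender_proto[4];
    uint8_t  target_hw[6];    /* typically not filled in for a request       */
    uint8_t  target_proto[4];
};
#pragma pack(pop)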

B. Gratuitous ARP

DHCP is used to distribute IP numbers to machines on the network. When a host receives an offer from a DHCP server, it checks whether the offered address already appears to be in use on the network by sending out a so-called gratuitous ARP. This is a request in which the target and the sender IP address are the same. The idea is that any system already using the IP address (referred to as the defender) will send a reply -- the reply informs the requesting host that the IP number is actually in use. In this case, the client sends a DECLINE message to the DHCP server, and the server takes the number out of service.

Exercise: Actually, there is one more step. The defender will itself send a gratuitous ARP. This is called Address Conflict Exchange. Explain why this additional step is needed. What would be the consequence if this ARP were not heard?

C. Proxy ARP

Normally, the host whose IP address is being resolved should be the one which answers the ARP request. However, there are situations in which it is convenient to let other hosts act as a proxy answering the ARP request on behalf of another host; in this case we refer to the protocol as Proxy ARP.

As an example of a situation in which proxy ARP is useful, consider a host A which is connected to a LAN, but which also has a modem through which machines connect using PPP. If B is connected through PPP, then A can be a proxy for B -- the end result is that packets destined for B will be encapsulated in frames addressed to A. In this case, B does not see the ARP request packets because it is not on the ethernet; in fact, it does not even have an ethernet address.

Proxy ARP can be used by network gateways as well, in order to make it appear as if machines are working in an unsubnetted environment. This is what is done here at U.K. Notice that the routing table on the machines in INSLAB shows all of the campus network as directly connected to the local ethernet. What is actually happening is that the router is acting as a proxy for all the machines which are not on the local ethernet. To see that this is indeed happening, analyze the following output of the arp command, taken shortly after I had sent a ping to the two main campus nameservers as well as to one other host outside the local ethernet.

Interface: on Interface 0x1000003
  Internet Address      Physical Address      Type
                        00-e0-fe-14-68-c0     dynamic
                        00-e0-fe-14-68-c0     dynamic
                        00-e0-fe-14-68-c0     dynamic
                        00-50-da-64-0e-ff     dynamic
                        00-60-08-31-f4-14     dynamic
                        00-01-02-64-7c-dd     dynamic
                        00-50-04-aa-2d-10     dynamic
                        00-50-da-64-0e-ff     dynamic
                        00-50-04-9a-0d-6b     dynamic
                        00-02-b3-25-0a-bb     dynamic
For more details, see RFC 1027.

D. Implementation Notes

The purpose of this section is to give the flavor of how one might implement an ARP module in the operating system. Our implementation will omit important details needed for an actual implementation, but hopefully will be detailed enough to show the basic lay of the land. Here are some facets of the environment:

  1. An operating system is an interrupt driven runtime environment. Some of these interrupts include:
    1. Device interrupts: the network interface cards request interrupts after an input or output operation has completed. The interrupt service routine checks for errors, schedules the higher level handling of the I/O completion, and starts the next I/O operation for the device.
    2. Timer interrupts: an interval timer requests an interrupt at regular intervals, for example every 10 milliseconds. The interrupt service routine updates clocks, starts scheduled jobs, and updates the structures used for scheduling threads.
  2. The operating system is a multithreaded environment -- access to various shared resources such as operating system tables is controlled by locking protocols in order to preserve data integrity.
  3. The operating system is a virtual memory environment, maintained by the operating system itself. This means that one needs to be aware that certain required structures may or may not be accessible at any given instant. Although in an application environment you can be assured that referencing a memory location will cause a page fault that brings the page of memory into place, in the operating system the page fault mechanism may be disabled as a consequence of the locking mechanisms mentioned above. Similarly, care must be taken because expected niceties like error trapping, thread management, etc. are implemented within the operating system itself, and so may not be available as services at the time the code in the module is executing.

One of the problems which every operating system has to address is how to handle a wide variety of I/O devices. Typically, the system must accommodate even hardware that had not been invented when the operating system version was written. This is typically done by virtualizing the devices: you extract a certain number of standard operations like read and write. One maintains a table of devices which includes, for each device, a pointer to the code that implements each virtual operation for that device; operating system code references the device through this table. The code that implements the virtual operations is called the device driver; new devices can be added to the system because the operating system can typically link a new device driver into itself dynamically, by loading the code into its address space and updating the table. At the same time, various higher level structures may also be created.

In our particular case, one of the higher level structures is the network interface. One normally expects that the operating system would contain a network interface table, which would normally be a linked list in order to allow it to be dynamic. The table would have a row for each interface, one for each network card or modem as well as one for the LOOPBACK interface. You will get an idea of some of the publicly viewable parts of the network interface when you use GetIfTable for your fourth programming assignment. Each row of the table might be described as a netif structure. Here are typical fields you might expect:

  1. name: Usable by system administrators and other humans.
  2. state: For example, is the interface UP or DOWN.
  3. Various information about the interface, e.g. IP numbers, net mask, broadcast address, MTU, MAC address, etc.
  4. Various statistics like: type of interface, speed, number of frames received, number of frames sent, various error counts, etc.
  5. Input and output queue(s) together with any information needed to manage these, such as maximum sizes. The idea is that the output routines will simply add the packet to the output queue; the interrupt service routine will schedule the actual I/O by working through the output queues. Similarly, an incoming packet is queued to the input queue by the interrupt service routine which also schedules the task of handling the input packet -- it will be executed later when all higher priority tasks are idle.

Suppose an ethernet packet is input by the network interface card. It requests an interrupt, the interrupt service routine checks for errors, allocates memory for a structure it can use to add the packet to the appropriate interface's input queue, makes the entry, and schedules the remaining input handling. When additional interrupts are handled and all higher priority tasks are finished, the interface input handler is called. This routine's job is to demultiplex the ether stream: In our case, it will examine the ethernet Type field in the frame header and call the appropriate handler -- e.g. a separate routine for arp packets, one for IP packets, etc. If there is no handler for the particular type, the memory used for the frame needs to be deallocated; in all cases, statistics in the netif structure need to be updated.

Now we are ready for the actual ARP implementation. The ARP cache can be implemented in a number of ways:

  1. Although it could be implemented simply as a fixed length table, one would normally want the cache to be dynamic; this allows it to typically contain very few elements, while letting it grow to hundreds of entries when, e.g., you are running your 4th programming assignment. Because the ARP cache is referenced at a very low level in the operating system, it may be kept in non-paged memory, and, if so, memory resources are particularly tight.
  2. Because the cache is accessed very often, one would want a reasonably quick access mechanism. If the cache is almost always small, like less than 10 entries, linear search might be adequate; however, if it is often hundreds of entries long, you may want to set up a hashed lookup scheme.
  3. One needs to decide if there should be a single cache for the entire machine or one cache for each interface. If you opt for a global cache, then there are concerns about particular interfaces getting more than a fair share of the resources and starving other interfaces.

What is stored in each ARP cache entry is defined by an arpEntry struct. Here are some fields you might expect (given that you know about the MIB_IPNETROW typedef):

  1. state: Is the entry STATIC, DYNAMIC, or INVALID. We would want the third possibility so that we can have placeholders in the cache created when you make an ARP request, but whose MAC address is not used until the ARP response is received. If the cache were a static table, you would also need a FREE state, indicating that the row is not in use.
  2. Descriptive fields telling such things as hardware type, protocol, address lengths, as well as the hardware and protocol addresses. Also a pointer back to the relevant interface structure.
  3. Output queue for frames to be output as soon as the hardware address is known. This might include a length -- you probably don't want to allow arbitrarily long queues.
  4. Various counters including
    1. Number of ARP Requests which have been sent. This would allow you to make multiple attempts to determine the MAC address before giving up and discarding the frames waiting to be output.
    2. Resend Timer: How many ticks to wait before doing the next resend of the ARP Request. For example, you may schedule resends every second or you might try an exponential backoff scheme.
    3. Cache Timeout Timers: how many ticks until the entry is discarded if it is not referenced, and another timer indicating how many ticks to retain the entry whether or not it is referenced.
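Pulling these fields together, an entry might look something like the sketch below; the field names are illustrative and are not Windows' actual MIB_IPNETROW layout.

#include <stdint.h>

struct netif;                              /* defined with the interface table */
struct frame;                              /* a queued link level frame        */

typedef enum { ARP_FREE, ARP_INVALID, ARP_DYNAMIC, ARP_STATIC } ArpState;

typedef struct arpEntry {
    ArpState      state;
    struct netif *pNetif;                  /* back pointer to the interface    */
    uint16_t      hwType, protoType;
    uint8_t       hwLen, protoLen;
    uint8_t       hwAddr[6];               /* MAC address, once known          */
    uint8_t       protoAddr[4];            /* IP address                       */
    struct frame *outputQueue;             /* frames awaiting the MAC address  */
    int           queueLen;                /* bounded, so it cannot grow freely*/
    int           resendCnt;               /* ARP requests sent so far         */
    int           resendTimer;             /* ticks until the next resend      */
    int           idleTTL;                 /* ticks since the last reference   */
    int           absoluteTTL;             /* ticks until unconditional removal*/
    struct arpEntry *next;                 /* the cache as a linked list       */
} ARPENTRY;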

You will need several utility subroutines:

  1. ARPLookup(protocolAddr, protocolType, netif *): returns a pointer to the ARPENTRY corresponding to the protocol address, or NULL if there is none. ARPLookup should reset the timer in the returned cache entry to show that it has been accessed.
  2. SendARP(ARPENTRY *): sends an ARP request for the address specified by its parameter.
  3. AddARP(NETIF *netif, ARP *pArp): adds an ARPENTRY to the ARP cache, initializing it with the information from the source addresses of the ARP request.

We are now ready for the generic output routine, which would be called by e.g. the IP implementation when it wants to send a packet:

int netOutput(NETIF *netif, IPPACKET *pIP, int length) {
    if (netif->state != UP) {
        return ERROR;
    }
    if (netif specifies LOCALHOST) {
        locally deliver the packet;
        return SUCCESS;
    } else if (pIP specifies a broadcast packet) {
        Add the frame to the output queue (don't forget to lock) of netif,
        using the ethernet broadcast address ff-ff-ff-ff-ff-ff;
        return SUCCESS;
    } else {
        ARPENTRY *pEntry = ARPLookup(pIP->nextHopAddr, pIP->type, netif);
        if (pEntry == NULL) {
            Allocate an ARPENTRY, add it to the cache, and initialize it
            with this frame in its output queue;
            Call SendARP to send the request;
        } else if (pEntry->state == DYNAMIC || pEntry->state == STATIC) {
            Add the frame to the output queue of netif,
            using the MAC address in pEntry;
        } else {                /* pEntry->state == INVALID: request pending */
            if (pEntry's output queue is FULL) {
                discard the frame (and update the error statistics);
            } else {
                Add the frame to pEntry's output queue;
            }
        }
        return SUCCESS;
    }
}
Here is the routine for handling ARP packets; it is called from the ethernet demultiplexer routine.

int ARPInput(NETIF *netif, ETHERFRAME *pEther) {
    if (Packet does not specify ethernet or does not specify IP) {
        return SUCCESS;
    }
    if (Packet is a Request packet) {
        if (Packet is requesting a local address) {
            if (sender and target IP are identical) {
                /* a gratuitous ARP: someone is probing our address */
                Send ARP Response frame to defend the IP number;
                Send our own gratuitous ARP Request frame;
            } else {
                Call AddARP to remember the source address;
                Send ARP Response packet;
            }
        } else {
            Call ARPLookup to see if the source address is in the cache;
            If so, update the entry with the source MAC address;
        }
    }
    if (Packet is a Response packet) {
        Call ARPLookup;
        If found, update the entry and re-queue all frames in the entry's
            output queue to the netif output queue;
    }
    return SUCCESS;
}

In addition, one also needs a timer routine which is called to update the ARP cache. It decrements various time counters in the entries, discarding the entries when appropriate. It should be run once every second.

void ARPTimer(int ticks) {
    for (Every ARP cache entry) {
        entry.TTL -= ticks;
        if (entry.TTL < 0) {
            clean up and freeMemory(&entry);
        } else if (entry.state == INVALID) {
            entry.resendTimer -= ticks;
            if (entry.resendTimer <= 0) {
                if (entry.resendCnt < MAX_RESENDS) {
                    SendARP(&entry);
                    entry.resendCnt++;
                    entry.resendTimer = RETRY_TIME;
                } else {
                    clean up and freeMemory(&entry);
                }
            }
        }
    }
}

IV. Bridges

Suppose we have multiple LAN's, say all running ethernet. Working at the network level, we have connected them using routers, which are also referred to as layer 3 switches. The router basically copies an IP packet from one network to the other. On the other hand, we might connect them with repeaters (a.k.a. layer 1 switches). These are devices which simply copy bits from one LAN to the other. This means that if two nodes send frames at the same time, they will collide, even if the nodes are on different LAN's. We say that the two LAN's form a single collision domain. Multiport repeaters are often referred to as hubs. The third possibility is to connect the LAN's with a bridge (a.k.a. a layer 2 switch). In this case, we are copying link layer frames (i.e. ethernet frames) from one LAN to the other. A layer 2 switch reads in an ethernet frame, buffering it until it can be transmitted onto the other LAN. Because of this store-and-forward model, the two LAN's form separate collision domains. Multiport layer 2 switches are often referred to as simply switches.

Bridges can help alleviate a number of limitations of the CSMA/CD LAN:

  1. Physical Size Limitation: 802.3 limits the physical size of the LAN because two nodes in a single collision domain must be close enough that a signal can make a round trip between the nodes within the transmission delay time of a minimum sized packet.
  2. Bandwidth Limitation: All traffic within a single collision domain shares the same bandwidth. Frames which do not leave their respective LAN's can be transmitted at the same time, thus increasing the total bandwidth of the system.
  3. Number of nodes Limitation: In a token ring, there is a limitation on the number of nodes that can participate in a single token ring caused by time drift in long rings.

The rest of this section will describe bridges. It is convenient to break the functionality into three levels: basic bridges, learning bridges, and transparent bridges. The functionality of all three is standardized in 802.1 and implemented in all modern bridges.

A. Basic Bridges

Most nodes on an ethernet are configured to listen for frames with a small number of destination addresses -- e.g. the MAC address of the interface card, the broadcast address, and a small number of multicast ranges. For diagnostic purposes, however, one sometimes listens for and handles all the frames that pass the node; this is called operating in promiscuous mode. The basic bridge simply listens promiscuously on each interface, storing each frame until it can be transmitted on its other interfaces (in accordance with the media access protocol). The description holds for 2 port as well as multiport bridges.

The basic bridge addresses the physical size limitation and (for token ring) the limitation on the number of nodes. However, it does not help with the bandwidth limitation because every frame is transferred to every port. (The buffering capability of the bridge can help slightly with bandwidth in the case of simultaneous data bursts.)

B. Learning Bridges

A frame need only be transferred from the LAN of the source node to the LAN of the destination node. In particular, if the source and destination are on the same LAN, the frame need not be handled by the bridge at all. This could be handled by statically configuring the bridges so that they know precisely where each MAC address is located. However, it is often possible with a simple algorithm for the bridge to learn the location of the MAC addresses.

The learning bridge operates as follows:

  1. The bridge listens promiscuously for all traffic.
  2. For each frame, the bridge caches the source address of the packet together with the port number of the interface on which the packet was seen.
  3. For each frame received, the bridge looks up the destination address in its cache.
    1. If the destination address is in its cache and is associated with a port other than the one on which the frame was seen, the frame is forwarded to the port associated with the destination address.
    2. If the destination address is in its cache and is associated with the same port as the one on which the frame was seen, the frame is discarded.
    3. If the destination address is not in its cache, the frame is forwarded to all ports other than the one on which the frame was seen.
  4. Entries in the cache are timed out and discarded unless they are seen as source addresses on new packets. (Should the address be seen as a source on a port other than that associated with the address, the bridge assumes the node has moved and discards the original entry in preference for the new one.)
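A sketch of this forwarding decision in C, using a simple linear table; a real bridge would use a hashed lookup and would also age the entries as described in item 4.

#include <string.h>
#include <stdint.h>

#define FLOOD -1               /* send out every port except in_port         */
#define DROP  -2               /* source and destination on the same LAN     */
#define MAX_ENTRIES 1024

struct fwd_entry { uint8_t mac[6]; int port; };
static struct fwd_entry table[MAX_ENTRIES];
static int n_entries;

static struct fwd_entry *lookup(const uint8_t mac[6]) {
    for (int i = 0; i < n_entries; i++)
        if (memcmp(table[i].mac, mac, 6) == 0) return &table[i];
    return NULL;
}

/* Returns the port to forward to, or FLOOD, or DROP. */
int bridge_decide(const uint8_t src[6], const uint8_t dst[6], int in_port) {
    /* Learning: remember (or move) the source address on the incoming port. */
    struct fwd_entry *e = lookup(src);
    if (e == NULL && n_entries < MAX_ENTRIES) e = &table[n_entries++];
    if (e != NULL) { memcpy(e->mac, src, 6); e->port = in_port; }

    /* Forwarding: look up the destination address.                          */
    e = lookup(dst);
    if (e == NULL)          return FLOOD;
    if (e->port == in_port) return DROP;
    return e->port;
}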

The learning bridge works well if the LAN's are connected with bridges in a loop-free topology -- i.e. in a tree. However, it fails spectacularly if there are loops in the network. Here is an example:

  1. Suppose two LAN's A1 and A2 are connected by two bridges B1 and B2. Assume that a frame F1 is sent from a node S on A1 to a node D on A2.
  2. If B1 sees it first, then it will learn that S is on A1 and will copy it as F2 onto A2 (it doesn't know yet where D is located.)
  3. It may happen that B2 sees the original frame F1 first and behaves just as B1 did. But it may also happen that B2 sees the copy F2 on A2 first. It would then learn that S is on A2 and so would copy it as F3 onto A1.
  4. B1 now sees F3 and will copy it as F4 onto A2... Notice that our single original frame is now being cloned over and over again onto the two networks.
Note that the bridges have no means of knowing that the duplicates are not new frames. Furthermore, the situation is much worse than what we saw with routing. At least with routing loops, the same packet did not multiply itself -- note that here BOTH bridges move each copy to the other network, so the amount of traffic keeps increasing. Also, in the case of routing, the TTL field put a lifetime on the packets. In this case, the copies will keep reproducing themselves forever.

C. Spanning Tree Algorithm

One of the problems with simple learning bridges is that they must be connected in a loop free topology. In particular, this means that one cannot add redundant bridges to improve reliability. The way to resolve the problem is with the so-called transparent bridge -- a bridge that implements the learning behavior as well as the so-called spanning tree algorithm. A collection of such bridges autoconfigures itself in a plug-and-play manner so that there are no loops; if redundant connections are present, certain of the bridges will become dormant in order for the others to form a minimal cost spanning tree. More specifically:

  1. The bridges will elect a single bridge to be the Root bridge.
  2. The bridges will find the shortest path (using hop counts) from themselves to the root bridge.
  3. Amongst all the bridges attached to a given LAN, the one of minimal distance to the root bridge will be elected as the Designated Bridge for that LAN. The designated bridge on a LAN is the bridge which will carry all traffic from the LAN toward the root.
  4. Each bridge will determine its root port, one of the ports closest to the root bridge, which will carry all traffic toward the root bridge.
  5. The ports to be included in the spanning tree are:
    1. The root port of each bridge.
    2. Each port of every bridge which is connected to a LAN on which that bridge is the designated bridge.
Frames are routed through the ports that are part of the spanning tree. No Frames (except for Spanning Tree Algorithm frames) are routed through ports not part of the spanning tree.

To accomplish all this, the bridges send special configuration frames out each of their ports. These configuration frames are multicast out the ports -- the address is one such that the frames will be received by all bridges on the same LAN as the port, and such that no bridge will forward the frame off that LAN. Here is the format of the body of the frame:

2 bytes   Protocol Id (0)
1 byte    Version (0)
1 byte    Message Type (0)
8 bytes   Proposed root bridge Id
4 bytes   Cost (to proposed root bridge)
8 bytes   Sending bridge Id
2 bytes   Sending port Id
2 bytes   Message age
2 bytes   Maximum age
2 bytes   Hello time
2 bytes   Forward delay

The basic ingredients are:

  1. The Proposed root bridge, cost, sending bridge Id and port Id are the basic information required to construct the minimal cost spanning tree.
    1. Each bridge chooses a best message from amongst those received at a given port -- best is the lexicographically smallest value based on these four fields, from Proposed root bridge Id down to port Id (see the comparison sketch after this list).
    2. The Proposed root Id and bridge Id are typically 6 byte addresses, but there is space for a two byte bridge priority -- which can be administratively set to influence the likelihood that a particular bridge would be chosen as the root.
    3. The Port Id is 2 bytes long in order to allow for priority ordering amongst ports as well.
  2. The last four time fields are measured in 1/256 second units.
    1. Message Age: estimated time since the proposed root originally transmitted the configuration message from which this one was derived as it was passed along by the protocol.
    2. Maximum Age: time after which the message should simply be deleted. Default: 20 sec.
    3. Hello Time: time between messages sent by the proposed root. Default: 2 sec.
    4. Forward Delay: Instead of immediately using a particular port as a link for forwarding packets, the bridge continues to listen for additional configuration frames for this amount of time. When it ends, the bridge examines frames to learn addresses for the same amount of time; then it starts routing frames through this port. Default: 15 sec.
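The "lexicographically smallest" rule above can be made precise with a comparison function over the four fields; here is one sketch, assuming the Ids are kept in their network (big endian) byte order so that memcmp gives the right ordering.

#include <stdint.h>
#include <string.h>

struct config_msg {
    uint8_t  root_id[8];       /* 2 byte priority + 6 byte MAC address        */
    uint32_t cost;             /* cost to the proposed root                   */
    uint8_t  bridge_id[8];     /* sending bridge Id                           */
    uint16_t port_id;          /* sending port Id                             */
};

/* Returns negative if a is better (smaller), positive if b is better, and 0
   if they are equal; the fields are compared in priority order.              */
int config_cmp(const struct config_msg *a, const struct config_msg *b) {
    int r = memcmp(a->root_id, b->root_id, 8);
    if (r != 0) return r;
    if (a->cost != b->cost) return (a->cost < b->cost) ? -1 : 1;
    r = memcmp(a->bridge_id, b->bridge_id, 8);
    if (r != 0) return r;
    return (int)a->port_id - (int)b->port_id;
}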

Here is the basic algorithm:

  1. A bridge starts off by sending a configuration message nominating itself as the proposed root bridge, with a cost of 0. The message is sent out all ports.
  2. If a port sees a smaller (in the lexicographic order) message than the one which it would transmit onto the particular LAN where the message is seen, the bridge stops sending configuration messages through that port.
  3. When a bridge receives a configuration message with a lower proposed root bridge ID, then it starts using that bridge in its own configuration messages -- the cost is set to be one larger than the lowest cost seen to the new proposed root bridge.
  4. The root port for the bridge is the port with the smallest port Id through which one has seen a configuration message at minimal distance to the current proposed root bridge.
  5. The bridge now no longer sends configuration frames through the ports on which it has seen least cost paths to the current proposed root bridge. It does send configuration frames through all the remaining ports; the bridge is the designated bridge for the LANs attached to the ports through which it continues to send the messages.
  6. The ports to go in the spanning tree are:
    1. The bridge's root port.
    2. All ports for which a bridge is the designated bridge for the attached LAN.
Such ports in the spanning tree are said to be in the forwarding state; the others are in the blocked state. Ports headed for the forwarding state are not actually used for forwarding until the bridge has left the listening state (after Forward Delay seconds have passed) and then the learning state (after another Forward Delay seconds have passed).

Configuration Updates are sent out regularly:

  1. The root bridge sends out configuration messages with message age 0 every Hello time seconds. Various designated bridges forward the message keeping the message age of 0.
  2. Each bridge keeps the current configuration message associated with each port. Each time it receives a message, it compares it with the current one. If the message is smaller (better) in the lexicographic order or if it is the same but with a smaller age, it replaces the current configuration message and the bridge recalculates the root bridge, cost to the root bridge, and root port.
  3. With each timer tick (256 Hz clock), the age of the current configuration message for each port is incremented. If the age reaches the maximum age, then the bridge discards the current configuration message and recalculates the root bridge, cost to the root bridge, and root port.
  4. If a new bridge comes up and sends a configuration frame to a given bridge, then that bridge sends out its current configuration message on that port, but with the age set to the actual age of that message. This allows the new bridge to receive a configuration frame in response earlier than would otherwise be the case, without giving the other bridges on the same LAN the impression that a configuration message has just been received from the root bridge.

Finally, we should consider the interaction between the spanning tree algorithm and the learning bridge function. The learning bridge's cached entries do need to time out -- but they only become invalid because of a topology change. So typical timeout values are 15 minutes -- about how long it takes to do a physical bridge move. However, the topology can also change simply because of a change in the spanning tree, and for this type of change 15 minutes is far too long to keep bad cache entries. So one needs a way to tell bridges to time out cache entries faster when the spanning tree changes. This is accomplished by Topology Change Notification frames:

2 bytes   Protocol Identifier (0)
1 byte    Version (0)
1 byte    Message Type (128)
A bridge that notices a change in the spanning tree sends these messages out its root port, toward the root -- it repeats the message every Hello Time seconds until it receives, from the designated bridge on that LAN, a configuration update with the Topology Change Acknowledgement flag bit set. In this fashion the message moves hop by hop toward the root. Upon its arrival, the root sends its configuration messages with the Topology Change flag set. It continues to send its configuration messages with this bit set for a time equal to the sum of the Forward Delay and the maximum age. Any bridge which receives such configuration messages uses the Forward Delay time to time out cached addresses rather than the longer 15 minute timer. It stops doing so when it receives a configuration frame from the root with the flag no longer set.

V. Serial Line Protocols (SLIP and PPP)

We will restrict ourselves to asynchronous byte oriented protocols, specifically SLIP and PPP -- the first because it is so simple, and the second because it is such a common protocol.

A. Serial Line Internet Protocol (SLIP)

Details are available in RFC 1055. This is a very simple protocol that provides nothing more than frame delimitation. This is done by adding an END character (ASCII 0xC0) right before and right after the IP packet. Since the packet might also contain END characters, one has to escape them with a character stuffing scheme:

  1. Every occurrence of END in the IP datagram is replaced with two characters ESC (ASCII 0xDB) followed by the character 0xDC.
  2. Every occurrence of the ESC character in the IP datagram is replaced with the pair 0xDB (ESC) followed by 0xDD.
Upon receipt, you check for END characters to delimit the packet and then scan the datagram to map the stuffed sequences back to the original characters.
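A minimal sketch of the sending side in C (the constant names are illustrative; RFC 1055 contains similar sample code):

#include <stdint.h>
#include <stddef.h>

#define SLIP_END      0xC0
#define SLIP_ESC      0xDB
#define SLIP_ESC_END  0xDC
#define SLIP_ESC_ESC  0xDD

/* Copies an IP datagram into out[], adding the END delimiters and the
   character stuffing described above.  Returns the encoded length; in the
   worst case out must be able to hold 2*len + 2 bytes.                      */
size_t slip_encode(const uint8_t *pkt, size_t len, uint8_t *out) {
    size_t n = 0;
    out[n++] = SLIP_END;                     /* leading END flushes line noise */
    for (size_t i = 0; i < len; i++) {
        if (pkt[i] == SLIP_END)      { out[n++] = SLIP_ESC; out[n++] = SLIP_ESC_END; }
        else if (pkt[i] == SLIP_ESC) { out[n++] = SLIP_ESC; out[n++] = SLIP_ESC_ESC; }
        else                           out[n++] = pkt[i];
    }
    out[n++] = SLIP_END;
    return n;
}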

RFC 1055 describes an MTU of 1006 characters, which is typical for SLIP implementations. Some implementations, like the one in Windows, allow for 1500 characters, so that IP packets coming from ethernets do not need to be fragmented in order to pass over the serial line.

In addition, there is a protocol for compressing TCP/IP headers. For details, see RFC 1144.

B. Point-to-Point Protocol (PPP)

This is a rather complex protocol described in RFC 1661. It includes:

  1. A data link layer encapsulation that supports multiple protocols on the same link.
  2. Link Control Protocol (LCP) is a protocol for negotiating data link layer options.
  3. Internet Protocol Control Protocol (IPCP) is used to negotiate network layer properties, such as nameservers and header compression. See RFC 1332 and RFC 1877 for details. There are analogous control protocols for each of the protocol families that use PPP.

PPP encapsulation is based on HDLC (High-level Data Link Control, used originally for synchronous serial connections). This explains some of the extraneous fields which can occur in the frame -- beyond this, one has packet delimiting done with byte stuffing as in SLIP and a Protocol field to allow for multiple protocol families on the same connection. Here is the frame:

1 byte     Flag 0x7E
1 byte     Address 0xFF
1 byte     Control 0x03
2 bytes    Protocol
variable   IP Datagram
2 bytes    Frame Check Sequence
1 byte     Flag 0x7E
  1. The Flag bytes are used to delimit the frames. Again byte stuffing is used:
    1. FLAG (0x7E) characters delimit the frame.
    2. Any FLAG characters in the frame are replaced by the two bytes 0x7D 0x5E.
    3. Any ESC (0x7D) characters in the frame are replaced by the two bytes 0x7D 0x5D.
    4. Any control character ch in the frame is replaced by 0x7D followed by ch XOR 0x20 -- i.e. a character with ASCII value less than 32 decimal is changed by flipping the bit with value 0x20. (This last part is done only if negotiated via the LCP; it avoids problems because control characters may be used for serial line flow control.)
  2. HDLC allows for multiple addresses. In our case, the link is being used for point-to-point; so use 0xff, the broadcast address.
  3. HDLC allows for sequencing and acknowledgements to provide a reliable data link service. In our case, we are just using it for datagram traffic; the correct value is 0x03, for unnumbered information (UI) frames.
  4. The protocol field allows for, amongst others:
    1. IP 0x0021
    2. AppleTalk 0x29
    3. IPX 0x2b
  5. The frame check sequence is calculated using CRC-16: x^16 + x^12 + x^5 + 1. For details and example code, see RFC 1662.

Because the standard frame given above includes a certain amount of redundant data, one can negotiate with LCP to avoid sending it. A typical frame would then look like:

1 byte     Flag 0x7E
1 byte     Protocol
variable   IP Datagram
2 bytes    Frame Check Sequence
1 byte     Flag 0x7E

The MTU is by default 1500 bytes.

As an interesting aside, it is possible to combine multiple PPP links so they are treated as a single IP connection. So, for example, one could combine two analogue phone lines to provide a service at twice the effective bandwidth. This is called PPP Multilink Protocol and is documented in RFC 1990.