[rrg] ETR Address Forwarding (EAF) for IPv4 - Bill's critique
Robin Whittle
rw at firstpr.com.au
Sun Dec 21 17:53:32 PST 2008
Hi Bill,
Thanks for your 24 November critique of my proposal for forwarding
IPv4 packets based on 30 bits in the existing header:
Summary of architectural solution space - Ivip still isn't properly
covered
http://www.irtf.org/pipermail/rrg/2008-November/000261.html
>> ETR Address Forwarding (EAF) - for IPv4
>> http://tools.ietf.org/html/draft-whittle-ivip4-etr-addr-forw
>
> That's a bit broken. You can't discard the fragment offset and MF
> bit.
Since we are redesigning the Internet for the long-term, I think we
can do whatever we like.
> I'm pretty sure hosts are allowed to pre-fragment the packets
> and then set the DF bit on each fragment.
Can anyone discuss how common this is?
So, rather than having the application send smaller packets, the
sending host's stack (or perhaps the application?!) burdens the
receiving host with the task of reassembly and makes the
communication session more vulnerable to packet loss, while not
expecting routers to fragment the packets any further.
Is this allowed? RFC 791 states (p7):
   The originating protocol module of a complete datagram sets the
   more-fragments flag to zero and the fragment offset to zero.
This may have been superseded, but I think it means the sending host
is supposed to send a single packet, not to send fragments.
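To make the RFC 791 rule concrete, here is a minimal sketch of how a receiving box (an ITR, say) could tell a complete datagram from a fragment. It reads the 16-bit flags/fragment-offset word at bytes 6-7 of the IPv4 header; the function names are illustrative, not from any existing implementation.

```python
import struct

# Flags/fragment-offset word of the IPv4 header (bytes 6-7), per RFC 791:
# bit 15 reserved, bit 14 DF, bit 13 MF, bits 12-0 fragment offset
# (in units of 8 bytes).
DF_MASK = 0x4000
MF_MASK = 0x2000
OFFSET_MASK = 0x1FFF


def frag_fields(header: bytes) -> tuple[bool, bool, int]:
    """Return (DF, MF, fragment offset in 8-byte units) from an IPv4 header."""
    (word,) = struct.unpack_from("!H", header, 6)
    return bool(word & DF_MASK), bool(word & MF_MASK), word & OFFSET_MASK


def is_fragment(header: bytes) -> bool:
    """A complete datagram has MF=0 and offset=0; anything else is a fragment."""
    _df, mf, offset = frag_fields(header)
    return mf or offset != 0
```

A host that pre-fragments and then sets DF=1 on each piece still produces headers for which `is_fragment()` is true, so DF alone does not identify a complete datagram.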
The same situation could occur at the ITR if the sending host sent a
long, single, DF=0 packet, and a router between it and the ITR
fragmented it, setting DF=1.
Can anyone suggest how common this is?
The text following that quoted above from RFC 791 doesn't say
anything about a router setting fragments to DF=1.
> Hosts routinely fragment UDP and ICMP packets that are too large
> for the wire, even if they don't set the DF bit. These packets need
> to successfully cross the core.
Can anyone discuss how common this is? What protocols and
applications do this?
I think the concept of fragmented packets, either fragmented in the
sending host, or by the network, is wrong. It places too much
storage and computational burden on the receiving host and makes the
whole system unreasonably sensitive to the loss of a single packet.
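To put a rough number on the sensitivity to single-packet loss: if a datagram is split into n fragments, losing any one fragment loses the whole datagram. The sketch below assumes each fragment is lost independently with probability p, which is a simplification (real losses are often bursty).

```python
def datagram_delivery_probability(loss_rate: float, fragments: int) -> float:
    """Probability that every fragment of a datagram arrives.

    Assumes independent per-fragment loss with probability loss_rate,
    so the datagram survives with probability (1 - p) ** n.
    """
    return (1.0 - loss_rate) ** fragments
```

With 1% packet loss, a datagram sent as six fragments arrives only about 94% of the time, and the receiver holds the surviving fragments in its reassembly buffer until its reassembly timer expires.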
In particular, I think it is completely unreasonable for a host to
send a packet which may be too long for the PMTU, expecting the
network to fragment it and the receiving host to reassemble it, while
refusing to accept any message from the network that the packet was
too big.
The IPv6 designers evidently held the same views.
RFC 1191 was developed in 1990 to provide a much better alternative
to hosts expecting routers to fragment their too-long packets. By
the time we implement a scalable routing solution, it will be over
two decades since RFC 1191, which works fine (except when networks
unreasonably filter ICMP Packet Too Big messages).
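For reference, the host side of RFC 1191 is simple. Below is a minimal sketch of how a sending host could update its PMTU estimate on receiving a Datagram Too Big message. The plateau table and the two rules (never go below 68 bytes, never raise the estimate because of a PTB) are from RFC 1191; the function itself is illustrative.

```python
# MTU plateaus from RFC 1191, used when an old-style router sends a
# Datagram Too Big message with a zero Next-Hop MTU field.
PLATEAUS = (65535, 32000, 17914, 8166, 4352, 2002, 1492, 1006, 508, 296, 68)
MIN_IPV4_MTU = 68


def update_pmtu(current_pmtu: int, next_hop_mtu: int, sent_length: int) -> int:
    """Return the new PMTU estimate after a Datagram Too Big message."""
    if next_hop_mtu != 0:
        candidate = next_hop_mtu
    else:
        # Old-style router: pick the largest plateau below the length
        # of the packet that triggered the message.
        candidate = next((p for p in PLATEAUS if p < sent_length), MIN_IPV4_MTU)
    # A PTB never raises the estimate, and it never drops below 68 bytes.
    return max(MIN_IPV4_MTU, min(current_pmtu, candidate))
```

For example, `update_pmtu(1500, 0, 1500)` falls back to the 1492 plateau, while a modern router reporting 1400 gives 1400.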
IPv6 (1996) doesn't let routers fragment packets, so hosts can't send
packets to the network expecting the network to fragment them. Only
the sending host may fragment, using the Fragment extension header.
EAF has major advantages over encapsulation, and I can't see a way of
implementing EAF while accepting fragmented packets, or DF=0 packets
above some agreed length, such as 1470 bytes. The "1470" constant
would be chosen so that all ITRs and ETRs could send packets of that
length without any PMTU problems. Google servers regularly send DF=0
packets of up to 1470 bytes today:
http://www.firstpr.com.au/ip/ivip/ipv4-bits/actual-packets.html
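The EAF ITR policy described here reduces to a few checks. This is a minimal sketch, assuming the ITR has already parsed DF, MF, the fragment offset and the Total Length from the header; `MAX_DF0_LENGTH` uses the example value of 1470 bytes from the text, and the names are hypothetical.

```python
from enum import Enum

# Example constant from the text: chosen so that every ITR-to-ETR path
# can carry packets of this length without PMTU problems.
MAX_DF0_LENGTH = 1470


class Action(Enum):
    FORWARD = "forward"
    DROP = "drop"


def eaf_itr_action(df: bool, mf: bool, frag_offset: int, total_length: int) -> Action:
    """Decide what an EAF ITR does with a packet (hypothetical policy sketch)."""
    if mf or frag_offset != 0:
        return Action.DROP      # fragments are not supported
    if df:
        return Action.FORWARD   # RFC 1191 PMTUD copes with any length
    if total_length <= MAX_DF0_LENGTH:
        return Action.FORWARD   # short DF=0 packet; the ETR restores DF=0
    return Action.DROP          # long DF=0 packet
```

Fragment checks come first because a pre-fragmented packet may carry DF=1 and would otherwise be forwarded.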
Nor can I see a way of supporting DF=0 packets longer than about 1450
or 1470 bytes with encapsulation, without a lot of extra trouble in
my IPTM approach to handling the PMTUD problems of map-encap:
http://www.firstpr.com.au/ip/ivip/pmtud-frag/
The ITR could use synthetic probe packets to determine the PMTU to
the ETR, and then fragment the too-long DF=0 packet itself, before
encapsulating the fragments and tunneling them to the ETR. The ETR
would decapsulate the fragments and the receiving host would need to
reassemble them. However, this is costly and unreliable.
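Part of that cost is visible in the fragment arithmetic alone. The sketch below plans how an ITR might split the payload of a too-long DF=0 packet so that each encapsulated fragment fits the tunnel PMTU it has probed. It assumes plain IP-in-IP with 20-byte outer and inner headers (no options); every fragment except the last must carry a multiple of 8 data bytes, since the offset field counts 8-byte units.

```python
OUTER_HEADER = 20  # assumed IP-in-IP outer header, no options
INNER_HEADER = 20  # assumed inner IPv4 header, no options


def plan_fragments(payload_len: int, tunnel_pmtu: int) -> list[tuple[int, int, bool]]:
    """Return (offset in 8-byte units, data length, MF flag) per fragment."""
    max_inner = tunnel_pmtu - OUTER_HEADER
    per_frag = (max_inner - INNER_HEADER) // 8 * 8
    if per_frag <= 0:
        raise ValueError("tunnel PMTU too small")
    plan = []
    start = 0
    while start < payload_len:
        length = min(per_frag, payload_len - start)
        more = start + length < payload_len  # MF=1 on all but the last
        plan.append((start // 8, length, more))
        start += length
    return plan
```

For a 1980-byte payload over a 1500-byte tunnel PMTU this yields two fragments (1456 and 524 data bytes), each paying for its own inner and outer headers, and the receiving host still has to reassemble them.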
So I can't see how an IPv4 core-edge separation solution - using
either encapsulation or EAF - could support long fragmentable packets
as the Net has done to date.
These are the restrictions on Ivip for IPv4:
  Encapsulation, with IPTM:
    DF=1   Efficiently handles packets of any size, with any PMTU
           from ~1500 to ~9000 bytes and beyond. No data loss:
           either the packet is delivered to the ETR, and the ITR
           raises the lower bound of its zone of uncertainty about
           the PMTU, or the packet hits a PMTU limit, in which case
           the ITR gets the PTB message, learns something about the
           PMTU (lowering the upper bound of its zone of
           uncertainty) and generates a PTB to the sending host.
    DF=0   Ideally, for simplicity, the ITR should drop packets
           longer than some constant, such as ~1200 bytes.
           (Maybe more like ~1470 bytes?)
           Longer DF=0 packets could be fragmented by the ITR, or
           encapsulated and sent to the ETR (using synthetic probe
           packets to determine the PMTU beforehand), but this is
           costly, less reliable and allows delinquent applications
           to continue their late-80s-style, me-generation,
           antisocial behavior.
    Fragmented packets:
           I don't think I have fully considered the ITR receiving
           these, but at present I think it would be undesirable to
           add any complexity to IPTM to handle such packets when
           they are long enough to potentially exceed PMTU limits
           once encapsulated. I think this sort of host behavior
           should not be supported.
  EAF - ETR Address Forwarding:
    DF=1   Should work fine for any packet length, since RFC 1191
           PMTUD should work with the routers en route to the ETR
           and with the sending host's standard RFC 1191
           implementation.
    DF=0   The ITR does not attempt to send DF=0 packets longer than
           some constant, such as 1470 bytes; such packets are
           dropped. Shorter packets are sent, and the ETR
           reconstructs them as DF=0 packets, so if there is a PMTU
           limit of less than 1470 bytes between the ETR and the
           destination host, the packet will be fragmented by the
           router at that point.
    Fragments:
           ITRs will drop them. Sending hosts should not send
           fragments, or DF=0 packets long enough to be fragmented
           en route to the ITR.
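The IPTM "zone of uncertainty" for DF=1 packets can be pictured as a pair of bounds per ETR. This is a minimal sketch of that bookkeeping only; the class, its default bounds and its method names are hypothetical, and how the ITR learns that a packet was delivered (and how it probes) is not shown.

```python
from dataclasses import dataclass


@dataclass
class PmtuZone:
    """Hypothetical per-ETR PMTU bounds kept by an ITR.

    lower: largest packet length known to have reached the ETR.
    upper: smallest packet length known to be too big.
    Lengths strictly between the two are in the zone of uncertainty.
    """
    lower: int = 68
    upper: int = 65536

    def delivered(self, length: int) -> None:
        # A delivered packet raises the lower bound.
        self.lower = max(self.lower, length)

    def too_big(self, length: int, next_hop_mtu: int = 0) -> None:
        # A PTB lowers the upper bound; a Next-Hop MTU of M means
        # anything longer than M is too big.
        limit = length if next_hop_mtu == 0 else min(length, next_hop_mtu + 1)
        self.upper = min(self.upper, limit)

    def status(self, length: int) -> str:
        if length <= self.lower:
            return "fits"
        if length >= self.upper:
            return "too big"
        return "uncertain"
```

After one 1400-byte delivery and a PTB reporting 1480, packets up to 1400 bytes are known to fit, 1481 and above are known to be too big, and the ITR is uncertain only in between.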
These restrictions are less onerous than the only alternatives I can see:
1 - Change host stacks, apps and Internet service from IPv4 to IPv6.
    (IPv6 doesn't support fragmentable packets either: its routers
    never fragment.)
2 - Greatly complicate IPTM or EAF to handle the few apps which send
    fragments or too-long fragmentable packets. This would involve
    undesirable complexity and still could not deliver the packets
    with acceptable reliability and cost.
> There's also no point carrying the DF bit if you're not going to
> carry the fragment offset. By definition those packets are always
> DF=1.
In EAF, the DF flag of the original packet is carried so that DF=0
packets shorter than some constant, such as 1470 bytes, can be
reconstructed by the ETR with DF=0. This enables them to be
fragmented, if necessary, between the ETR and the destination host.
> You could rely on the L2 checksum to maintain packet integrity.
> Many if not most core links are a form of ethernet today anyway,
> and the rest could theoretically be upgraded. But you're going to
> need to find some more bits somewhere; 16 isn't enough. I suppose
> you could steal 4 bits from the header length since there's not a
> lot of point in passing anything in the core with IP options
> attached anyway. Just send an "administratively prohibited"
> message if someone tries to send a packet with options. But that
> still only buys you 20 bits.
The IPv6 header doesn't have a checksum; IPv6 relies on link-layer
and transport-layer checksums instead. So EAF should be just as
robust as IPv6.
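The bit accounting in this exchange can be checked with a sketch. Assuming, as Bill's comments imply, that the 30 bits are the 13-bit fragment offset, the MF bit and the 16-bit header checksum (13 + 1 + 16 = 30), a 32-bit ETR address fits only if 2 of its bits are fixed. The layout below is hypothetical, for illustration only: it assumes ETR addresses end in binary 00 and leaves the original DF bit in place.

```python
import ipaddress
import struct

# Hypothetical layout (an assumption, not the draft's definition):
# top 13 of the 30 bits -> fragment offset, next bit -> MF,
# low 16 bits -> header checksum field. DF is left untouched.


def eaf_encode(header: bytes, etr: str) -> bytes:
    addr = int(ipaddress.IPv4Address(etr))
    if addr & 0b11:
        raise ValueError("this sketch assumes ETR addresses end in binary 00")
    bits30 = addr >> 2
    h = bytearray(header)
    (word,) = struct.unpack_from("!H", h, 6)
    word = (word & 0xC000) | (((bits30 >> 16) & 1) << 13) | (bits30 >> 17)
    struct.pack_into("!H", h, 6, word)               # reserved, DF kept
    struct.pack_into("!H", h, 10, bits30 & 0xFFFF)   # checksum field
    return bytes(h)


def eaf_decode(header: bytes) -> str:
    (word,) = struct.unpack_from("!H", header, 6)
    (lo16,) = struct.unpack_from("!H", header, 10)
    bits30 = ((word & 0x1FFF) << 17) | (((word >> 13) & 1) << 16) | lo16
    return str(ipaddress.IPv4Address(bits30 << 2))
```

Stealing 4 more bits from the header length field, as Bill suggests, would not change this arithmetic; it matters only if the MF bit and fragment offset must be kept.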
> ICMP errors are gonna be a bit hairy.
The upgraded routers in the DFZ also need upgraded firmware to
reconstruct the original packet format, just as an ETR does, when the
packet hits a PMTU limit. This is necessary to generate the correct
PTB to the sending host. That function is surely defined in
firmware, not fixed in hardware, and since it doesn't happen too
often, the extra steps in reconstructing the packet should not be too
much of a burden on the router.
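The extra steps are modest. This is a minimal sketch, assuming the EAF bits occupy only the MF bit, fragment offset and header checksum while the original DF bit is left in place: restore MF=0 and offset=0, recompute the RFC 1071 header checksum, and quote the restored header in an ICMP Destination Unreachable (type 3, code 4) message carrying the RFC 1191 Next-Hop MTU. Building the outer IP header back to the sending host is omitted, and the function names are hypothetical.

```python
import struct


def ones_complement_checksum(data: bytes) -> int:
    """Internet checksum (RFC 1071), padding odd-length data with a zero byte."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def restore_header(header: bytes) -> bytes:
    """Clear MF and fragment offset (keeping DF), then recompute the checksum."""
    ihl = (header[0] & 0x0F) * 4
    h = bytearray(header[:ihl])
    (word,) = struct.unpack_from("!H", h, 6)
    struct.pack_into("!H", h, 6, word & 0xC000)  # keep reserved and DF bits
    struct.pack_into("!H", h, 10, 0)
    struct.pack_into("!H", h, 10, ones_complement_checksum(bytes(h)))
    return bytes(h)


def build_ptb(original_packet: bytes, next_hop_mtu: int) -> bytes:
    """ICMP type 3 code 4 quoting the restored header plus 8 payload bytes."""
    ihl = (original_packet[0] & 0x0F) * 4
    quoted = restore_header(original_packet[:ihl]) + original_packet[ihl:ihl + 8]
    msg = bytearray(struct.pack("!BBHHH", 3, 4, 0, 0, next_hop_mtu) + quoted)
    struct.pack_into("!H", msg, 2, ones_complement_checksum(bytes(msg)))
    return bytes(msg)
```

Because the sending host matches the quoted header against what it sent, restoring the original MF, offset and checksum values before quoting is what makes the PTB usable.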
As far as I know, there shouldn't be any ICMP problems. If you can
point out potential problems in greater detail, that would be great.
This PMTUD stuff is a real headache. I am keen to hear of any
critiques, suggestions etc.
Regards
- Robin