[rrg] Why a host-based solution does not necessarily add signalling load

Shane Amante shane at castlepoint.net
Wed Jan 21 21:28:22 PST 2009


Robin,

I find this whole discussion of a separate reachability probing system
to be fraught with a number of economic, security and, more
importantly, operational problems.

On Jan 21, 2009, at 8:21 PM, Robin Whittle wrote:
> A further reduction in probing, and greater flexibility in deciding
> which ETR the traffic packets will be tunneled to can be achieved by
> the Ivip approach of using a completely separate reachability probing
> system - separate from ITRs and from sending hosts.  The end-user
> would either do this, or have some company do it for them.  The
> probing could be done from multiple points all over the Net, to the
> ETRs or more likely through the ETRs to routers and/or hosts in the
> end-user network.

Really?  There is no way companies are going to allow one of these
3rd-party "probing/reachability companies", even if they have a paid
relationship with them, to probe through their ETR's to their internal
network.  Who's to say that the probing company (or companies) won't
get hacked and then be used as a launching point to attack the inside
network of a company using their services?



> Most likely, that company's specialised probing system would be
> configured to make the decision on how best to map the destination
> network's micronets - and would change the mapping within seconds,
> according to whatever criteria were specified by the administrators
> of the destination network.  Within a few seconds, all ITRs in the
> world handling packets addressed to these micronets would be
> tunneling these packets to the ETR chosen in the new mapping decision.
>
> This is a generalised DFZ->destination-network probing approach,
> since the probing servers would, broadly speaking, be in the DFZ.
> This is the only scalable approach, since the same level of probing
> would still occur if 100,000 ITRs were sending packets to the
> destination network as if one, none or a few were sending packets.
>
> The key to using a separate, dedicated, reachability probing system
> (quite outside the Ivip system itself, and so which can be made to
> work on any principles, any protocols etc. which suit the destination
> network being probed) is Ivip's real-time mapping distribution
> system.  This tells all the ITRs which need to know which ETR to
> tunnel the packets to.  This greatly simplifies the ITR and ETR
> design and separates out reachability testing and the resulting
> decision-making from the core-edge-separation system itself. (LISP,
> APT, TRRP and Six/One Router monolithically integrate them.)

This won't fly operationally.  First, I can't envision an economic
model that would lead to one or, better yet, multiple
probing/reachability companies being launched as you envision.
Ultimately, you're talking about not only a lot of fixed costs for
servers and such, but more importantly the OpEx for colocation and
bandwidth that such a company would be burning to send out millions,
billions or trillions of "probes" to everyone's ETR on the planet.

Second, these probes from a 3rd-party probing/reachability company  
will not reliably tell end-user networks that there is, in fact, known  
"good" connectivity to an ETR, because of ECMP and/or LAG being used  
along certain paths within SP's networks and not others.  More  
specifically, ECMP and/or LAG are load-balancing mechanisms that SP's  
widely use to scale the physical BW between nodes in their network,  
(e.g.: to scale BW between a city-pair to 100G and [much] larger by  
logically bonding together multiple OC-192's, etc. into a single
"bundle").  The key part of those technologies is that core routers
use IP header information, (L3 addresses and/or L4 port information),
as input keys to their load-hashing algorithms to determine the
particular output link in a LAG or ECMP "bundle" that a particular
flow goes on.  The problem observed in operational networks today is
that "soft-failures" cause a particular link in a bundle to stop
forwarding traffic, which goes unnoticed by IGP's (OSPF or IS-IS) or
BGP and, unfortunately, results in blackholing of customer traffic
until the problem is isolated and the "bad" link in the bundle is
taken out of service.  Unfortunately, there aren't tools to diagnose
this problem today so it's very difficult and time-consuming to
troubleshoot & resolve these problems.  More to the point, here is why
your proposal of 3rd-party probing companies will not work:
- When these 3rd-party probing companies are transmitting probes
toward an ETR they're going to be unable to construct *identical* EID
within RLOC packets that would cause them to be hashed and pushed onto
the same links as the real traffic bound for that customer's ETR.
IOW, the 3rd-party probe company *will* get false results that will
either: a) leave the ETR's in service when there is a bad link
blackholing traffic on a parallel path unseen by the 3rd-party
company; or, b) falsely take an ETR out-of-service because the
3rd-party probing company saw traffic being blackholed on a link that
may, in fact, carry very little or no end-user traffic toward that ETR.
- Economically, it's infeasible for this imaginary probing company, or
set of probing companies, to be able to deploy servers to every POP in
every ISP to ensure full coverage of these LAG & ECMP paths ... or,
for that matter, to a decent enough percentage of each operator's
network that it can reasonably approximate reachability across all
paths in every ISP's network.
- As an operator, I don't have much (any?) faith in trusting others to
determine connectivity (or lack thereof) to/from my ETR's.  I trust my
network and the configuration of my ITR's/ETR's to determine that.
Furthermore, when something goes wrong (e.g.: probes aren't returned) I
can easily log in to those devices (since I own them) and quickly
troubleshoot the problem and restore service how *I* deem appropriate
for my network.
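
To make the hashing point concrete, here is a minimal Python sketch.
It is a toy model under stated assumptions, not any vendor's
algorithm: the SHA-256-based pick_link(), the 5-tuple layout, the
documentation-range addresses and the port numbers are all made up for
illustration.  It only shows that a probe's header fields, not the
health of the bundle as a whole, decide which member link the probe
exercises.

```python
import hashlib

def pick_link(flow, n_links):
    """Toy load-balancing hash: map a flow's header fields to one member link.

    Real routers use their own (often proprietary) hash; SHA-256 is only a
    stand-in that spreads flows deterministically across the bundle.
    """
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % n_links

def probe_verdict(probe_flow, customer_flows, n_links, failed_link):
    """Return (probe_got_through, customer_flows_that_are_blackholed)."""
    probe_ok = pick_link(probe_flow, n_links) != failed_link
    blackholed = [f for f in customer_flows
                  if pick_link(f, n_links) == failed_link]
    return probe_ok, blackholed

if __name__ == "__main__":
    n_links = 8
    etr = "203.0.113.10"                         # hypothetical ETR address
    probe = ("192.0.2.1", etr, 17, 40000, 4341)  # hypothetical 3rd-party probe
    customers = [("198.51.100.%d" % i, etr, 17, 50000 + i, 4341)
                 for i in range(1, 41)]          # hypothetical ITR-to-ETR flows
    for failed in range(n_links):
        ok, lost = probe_verdict(probe, customers, n_links, failed)
        state = "gets through" if ok else "is lost"
        print(f"link {failed} soft-failed: probe {state}, "
              f"{len(lost)} customer flows blackholed")
```

For every failed link except the one the probe happens to hash onto,
the probe reports the ETR as reachable no matter how many customer
flows are being dropped (case a above).  For the one link it does hash
onto, it reports a failure even if few or no customer flows share that
link (case b).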

So, to summarize, *if* we have to do probing, the only operationally  
and economically feasible method is either to/from the ITR's/ETR's or  
the end-systems themselves.  (More on this just below).


> So having something other than the ITRs doing the probing and making
> the decisions does involve some extra complexity - a real-time
> mapping system.  I believe this is a small price to pay for the
> greater flexibility, more robust probing (all the way to the
> destination network, not just to the ETRs), greater simplicity in
> ITRs and ETRs, reduction in probing traffic etc.  Also, it enables
> real-time control of ETR address for incoming TE.

This discussion does raise an interesting architectural point with  
respect to all map-and-encap solutions, which depend on reachability  
probing between ITR's and ETR's.  Specifically, do these reachability  
probes faithfully represent end-user (host-machine originated and  
terminated) flows so they are travelling along the same path as host- 
to-host data flows?  I would argue that they do not, (without  
substantially increasing the probing traffic load/bandwidth to  
"search" through all possible paths used by end-user traffic).

OTOH, solutions that either:
a) do proactive probing originating from hosts at, or just below, the
TCP/UDP layer; or, better yet,
b) piggyback ***reachability*** detection along with active TCP/UDP
traffic flow (cf: TCP ACK's) between hosts
... would faithfully represent that a given path really is working.
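
As a rough illustration of option b), the following Python sketch
shows one way a host could infer reachability from traffic it is
already exchanging.  Everything in it is an assumption for
illustration: the PiggybackMonitor class, its method names and the
timeout value are hypothetical, and treating any ACK as clearing all
outstanding data is a deliberate simplification of what a real
transport would track.

```python
class PiggybackMonitor:
    """Infer path health from a flow's own data/ACK exchange (no extra probes)."""

    def __init__(self, timeout):
        self.timeout = timeout        # seconds to wait for an ACK before worrying
        self.oldest_unacked = None    # send time of oldest data still unacknowledged

    def on_data_sent(self, now):
        # Only the first unacknowledged send starts the clock.
        if self.oldest_unacked is None:
            self.oldest_unacked = now

    def on_ack_received(self, now):
        # Simplification: any ACK is taken as proof the path works right now.
        self.oldest_unacked = None

    def path_suspect(self, now):
        # An idle flow is never suspect; only unanswered data counts.
        return (self.oldest_unacked is not None
                and now - self.oldest_unacked > self.timeout)

if __name__ == "__main__":
    mon = PiggybackMonitor(timeout=3.0)
    mon.on_data_sent(now=0.0)
    print(mon.path_suspect(now=2.0))   # still within the timeout
    print(mon.path_suspect(now=4.0))   # data unanswered for longer than timeout
    mon.on_ack_received(now=4.5)
    print(mon.path_suspect(now=10.0))  # idle after an ACK
```

Because the evidence comes from the flow's own packets, it reflects
exactly the links that flow is hashed onto, which is the property a
separate probe cannot guarantee.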

FWIW, I don't think there's a straightforward fix for map-and-encap
solutions to the aforementioned problem of probing over LAG/ECMP
paths.  If there is, I'm all ears.  OTOH, if there's not a solution,
I do not think we should gloss over it.  Instead, I would strongly
recommend we point this out very clearly within an "Operational
Considerations" section of any/all map-and-encap protocol specs, or in
an overall/all-encompassing companion "Operational Considerations"
document noting this and other 'challenges' that map-and-encap can't
feasibly address.  Said document(s) may prove helpful in evaluating
the final set of candidate solutions that the RRG will offer its
recommendation on.

-shane

