September 2008 Archives

Izya's had it

At the beginning of last year I wrote several posts about PBT, expressing doubts that this construct would survive. In principle, the fate of this marvel of hostile technology was already clear in April of this year, when BT, the main customer for this misery, changed its mind. So today's news that Nortel, the main producer of this misery, is rushing to get rid of the corresponding division did not come as a shock.

I think we can put a full stop here. Of course, the standardizers will eventually finish standardizing their PBB-TE, as they call it: standardization committees generally don't pay much attention to reality. But I don't think anyone in their right mind will deploy it. My condolences to those who fell for Nortel's propaganda and have already built on it.

Disqus

On a tip from the esteemed maxss, I've integrated Disqus into the blog for comments. maxss also wrote an excellent guide to registering with Disqus. I hope commenting will now be more convenient. Please test it.

A reminder: in accordance with the Plan, after a while I will stop cross-posting to LJ altogether. So I urge all my readers to subscribe to the feed or, if you prefer reading in your LJ friends feed, to subscribe to the LJ syndication.

Overcoming peer fraud

Just for a change I'm going to try to blog in [admittedly broken] English. Hey, Petr, I'm jealous of your postings on the internetworkexpert.com blog and the feedback you're getting, and I'm going to catch up. Just kidding. ;)

Next, I'm going to use Junos examples to illustrate this post instead of IOS. I've finally got my olives running nicely in qemu on my home PC, and I'm eager to put them to use.

The problem

A couple of weeks ago my LJ friend visir asked an interesting question: "Assume we have a client which also peers at the same IX as we do. Then the client can defraud us. They can announce a /22 on their direct link to us and one or two /23s through the IX route server. Now traffic from the Internet to that prefix flows to us and exits at the IX to the client. Our client enjoys a free ride, stealing bandwidth from us. How do you deal with this problem?"
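The trick relies on nothing more than longest-prefix match inside our network. A minimal Python sketch, assuming made-up 10.1.0.0/22 and /23 prefixes in place of the client's real ones, shows that once the /23s are in our table, every destination inside the /22 resolves to the IX exit.

```python
import ipaddress

# Hypothetical prefixes standing in for the client's space: a /22 learned on
# the direct customer link and two /23s learned via the IX route server.
ROUTES = {
    ipaddress.ip_network("10.1.0.0/22"): "customer link",
    ipaddress.ip_network("10.1.0.0/23"): "IX peering",
    ipaddress.ip_network("10.1.2.0/23"): "IX peering",
}


def lookup(routes, address):
    """Longest-prefix match: the most specific covering route wins."""
    addr = ipaddress.ip_address(address)
    matches = [net for net in routes if addr in net]
    if not matches:
        return None
    return routes[max(matches, key=lambda net: net.prefixlen)]


print(lookup(ROUTES, "10.1.1.7"))  # IX peering
print(lookup(ROUTES, "10.1.3.7"))  # IX peering
```

The /22 is what attracts the traffic from the Internet to us; the /23s, being more specific, then decide where it leaves.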

If you think about it, it may sound somewhat weird. We're basically peering with our own customer. Well, that can and does happen in the real world. So we have to come up with a solution.

In fact, this problem with peering isn't new. It has been mentioned a few times on various mailing lists, and there was a presentation at NANOG 38 in 2006 describing a whole set of tricks a peer can play on you. I suggest reading the presentation before continuing with this post. Please note that the problem is not limited to peering; even your upstream can do nasty things to you.

Speaking of possible solutions, I'd like to point out that a written and signed agreement is an absolute must when dealing with service theft. Yes, this applies to peering as well, not only to Internet transit. Do have a peering agreement that clearly states what a peer is allowed to do. No agreement, no grounds for complaints. This leads us to observation 0: being promiscuous in your relationships is asking for trouble (you probably knew that already). The observation basically rules out peering via route servers at IXs in favor of direct peering. Sure, direct peering is not always possible, so weigh your risks carefully when engaging in IX peering.

That was a short comment on the administrative side of the issue. Now to the technical side.

Luckily, there are technical ways to mitigate the threat, so you don't have to rely on administrative measures alone.

The technical solution boils down to enforcing certain routing policies. The rules are:
1. Traffic ingressing from an upstream must flow only to a customer.
2. Traffic ingressing from a peer must flow only to a customer.
3. Traffic ingressing from a customer must flow to a customer if a customer route exists, otherwise it can flow to upstream or peer.
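The three rules can be written down as a small decision function. This is only a Python sketch of the policy itself, not of any router mechanism; the `Rel` enum and `allowed_egress` are hypothetical names, and local prefixes are left out for brevity.

```python
from enum import Enum


class Rel(Enum):
    """Business relationship of the neighbor a packet enters or leaves through."""
    CUSTOMER = "customer"
    PEER = "peer"
    UPSTREAM = "upstream"


def allowed_egress(ingress, customer_route_exists):
    """Neighbor classes a packet may exit to, per rules 1-3.

    If the returned set cannot be satisfied (e.g. upstream traffic with no
    customer route), the packet has no legitimate exit.
    """
    if ingress in (Rel.UPSTREAM, Rel.PEER):
        # Rules 1 and 2: traffic from upstreams and peers goes to customers only.
        return {Rel.CUSTOMER}
    if customer_route_exists:
        # Rule 3, first part: a customer route, when present, must be used.
        return {Rel.CUSTOMER}
    # Rule 3, fallback: no customer route, so an upstream or a peer may be used.
    return {Rel.UPSTREAM, Rel.PEER}


# The fraud case: upstream traffic to the customer's space must not exit via a peer.
print(Rel.PEER in allowed_egress(Rel.UPSTREAM, customer_route_exists=True))  # False
```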

How can we enforce the policy? Clearly, a control-plane-only solution is not sufficient (you've read the NANOG presentation above, haven't you?). We have to somehow couple forwarding-plane operations to the control-plane policy.

Other solutions

There are quite a few known solutions to this issue.

One is from Juniper-land: DCU/SCU (destination class usage and source class usage) and their use in firewall filters. Basically, DCU/SCU is a method of grouping traffic by destination or source address. We could group all traffic into three classes (customer, upstream and peer) and then employ class-based filtering to enforce our rules.

Another one is from Cisco-land: QPPB. There's a rather cool method described in detail in another NANOG presentation: http://www.nanog.org/meetings/nanog42/presentations/DavidSmith-PeeringPolicyEnforcement.pdf.

These solutions can deal with the problem, but there's a catch: they are designed to drop traffic when a peer tries to defraud us. This is a problem, because even an unintentional misconfiguration (say, a sloppy attempt at traffic engineering) will result in service disruption. It would be nice if we could handle this better and somehow ignore fraudulent or erroneous routes.
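A rough Python model of class-based filtering (DCU/SCU or QPPB style) illustrates the catch. The FIB, the class labels and the `forward` function are hypothetical; the point is that the fraudulent /23 wins longest-prefix match, is tagged as a peer destination, and the filter can only drop the packet instead of sending it over the customer link.

```python
import ipaddress

# Hypothetical FIB with destination classes: prefix -> (exit, class of the
# neighbor the route was learned from). Prefixes are made up.
FIB = {
    ipaddress.ip_network("10.1.0.0/22"): ("customer link", "customer"),
    ipaddress.ip_network("10.1.0.0/23"): ("IX peering", "peer"),
}


def resolve(address):
    """Longest-prefix match returning (exit, destination class) or None."""
    addr = ipaddress.ip_address(address)
    best = max((n for n in FIB if addr in n), key=lambda n: n.prefixlen, default=None)
    return None if best is None else FIB[best]


def forward(ingress_class, address):
    """Class-based filtering: from peers and upstreams, only customer classes pass."""
    entry = resolve(address)
    if entry is None:
        return "drop (no route)"
    exit_name, dest_class = entry
    if ingress_class in ("peer", "upstream") and dest_class != "customer":
        # The policy is enforced, but the packet is lost instead of rerouted.
        return "drop"
    return exit_name


print(forward("upstream", "10.1.1.7"))  # drop
print(forward("upstream", "10.1.2.7"))  # customer link
```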

You've probably noticed references in the above-mentioned presentations to separate routers carrying partial routes. Yes, this can solve the problem, but it is uneconomical.

One can employ virtualization, i.e. logical routers, VRFs, etc. But it is still uneconomical. You need a separate VRF per peer if you don't want your peers to talk to each other through you. And a separate VRF per upstream if you want to enforce routing policy on traffic coming from upstream providers too (remember, upstreams can also play games with you). Just imagine the amount of route replication between VRFs. It not only taxes router memory and CPU, it also wastes precious FIB space: keeping several full BGP views, each in a separate VRF, is very, very expensive.

Still, virtualization (or compartmentalization, if you will) is a very powerful method to deal with our problem. Let's try to design a practical method to use it.

The problem demonstrated

This is our playground, where we're going to test our findings.

[Figure: lab topology (peer-fraud.png)]

Nothing fancy in the setup. IS-IS, LDP, iBGP full mesh between border routers, and eBGP to external autonomous systems. eBGP export policies are configured as usual: customer prefixes are announced to upstream ISP, local prefixes are announced everywhere, upstream and peer prefixes are announced only to customers.

AS 4 is a customer, AS 3 is a (well-behaved) peer, and AS 100 is an upstream ISP. AS 2 is a customer that is trying to defraud us. AS 2 has a customer link to router BORDER-3 and a peering link to BORDER-2. AS 2 announces 2.0.0.0/8 over the customer link and 2.0.0.0/9 over the peering link.
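To make the export rules concrete, here is a small Python model of them (not Junos policy syntax). It assumes customer routes are also sent to peers, which "as usual" suggests but the text does not state; the dictionary and function names are hypothetical. It shows why the upstream in the output below sees only AS 2's 2.0.0.0/8: the /9 is learned from a peer and is never re-exported upstream.

```python
# Hypothetical model of the eBGP export rules described above (not Junos syntax).
# Key: class of the neighbor we export to; value: route origins we may send it.
# Local routes go everywhere; peer and upstream routes go to customers only.
# Sending customer routes to peers is an assumption ("as usual").
EXPORT_POLICY = {
    "customer": {"local", "customer", "peer", "upstream"},
    "peer": {"local", "customer"},
    "upstream": {"local", "customer"},
}

# Routes learned from AS 2 in the lab, tagged with the link they came in on.
LEARNED = [
    ("2.0.0.0/8", "customer"),  # via the customer link to BORDER-3
    ("2.0.0.0/9", "peer"),      # via the peering link to BORDER-2
]


def exported_to(neighbor_class):
    """Return the prefixes our export policy lets us announce to a neighbor class."""
    allowed = EXPORT_POLICY[neighbor_class]
    return [prefix for prefix, origin in LEARNED if origin in allowed]


print(exported_to("upstream"))  # ['2.0.0.0/8']
print(exported_to("customer"))  # ['2.0.0.0/8', '2.0.0.0/9']
```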

Let's see the issue in action.

dg@UPSTREAM> show route 2.0.0.1 terse

inet.0: 10 destinations, 10 routes (10 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

A Destination        P Prf   Metric 1   Metric 2  Next hop        AS path
* 2.0.0.0/8          B 170        100            >100.0.1.1       1 2 I

dg@UPSTREAM> traceroute 2.0.0.1    
traceroute to 2.0.0.1 (2.0.0.1), 30 hops max, 40 byte packets
 1  100.0.1.1 (100.0.1.1)  0.761 ms  96.959 ms  100.964 ms
 2  1.0.255.0 (1.0.255.0)  158.851 ms  104.946 ms  106.054 ms
     MPLS Label=100016 CoS=0 TTL=1 S=1
 3  1.0.255.3 (1.0.255.3)  201.972 ms  104.960 ms  105.964 ms
 4  2.0.1.0 (2.0.1.0)  105.393 ms  52.344 ms  154.138 ms
 5  2.0.1.0 (2.0.1.0)  149.144 ms !H  105.044 ms !H  106.142 ms !H

You can see that the traffic exits on the peering link to AS 2. Bad, bad, bad. The same happens to traffic from the customer (AS 4) and even from the peer (AS 3).

Design and implementation

If you look at our rules again, you can see that a packet must be treated differently depending on where it comes from. That's observation 1.

We must make routing decisions (i.e. select egress point) right on ingress point, and once we've selected the egress point, we must not change it while the packet travels through our network. That's observation 2.

How do we implement this?

We want to make different routing decisions based on what the ingress link is: a customer, an upstream, or a peer. This implies multiple routing tables, right? But haven't we concluded that keeping separate routing tables for each upstream or peer is out of the question? Not exactly. Keeping lots of copies of lots of routes is wrong; having multiple routing tables is OK if we can populate them efficiently.

Let's see what additional routing/forwarding tables we need and what routes they need to carry. Take a routing table for forwarding traffic coming from an upstream ISP. It needs to carry only customer routes and locally originated routes, not peer or upstream routes. The same goes for a routing table for a peer. Do we need separate routing tables for each upstream or peer? No. If a routing table has only customer and local routes, then a single instance can forward traffic from all upstreams and peers connected to a router.

How can this be achieved? The idea is to let the external neighbor run BGP with our master routing instance (the global routing table in Cisco speak), populate a restricted routing instance with the needed routes from the master instance, and then forward the neighbor's traffic within that restricted instance. We can do that with so-called FBF (Filter Based Forwarding).

interfaces {
    fxp2 {          
        unit 0 {    
            family inet {
                filter {
                    input restricted;
                }   
            }       
        }
    }
}

routing-options {
    auto-export;
}

policy-options {
    policy-statement client-and-local-only {
        term local {
            from community local;
            then accept;
        }
        term client {
            from community client;
            then accept;
        }
        then reject;
    }
    policy-statement restricted-import {
        term needed-routes {
            from {  
                instance master;
                policy client-and-local-only;
            }       
            then accept;
        }           
        then reject;
    }               
}

firewall {
    filter restricted {
        term bgp-passive {
            from {
                destination-port bgp;
            }
            then accept;
        }
        term bgp-active {
            from {
                source-port bgp;
            }       
            then accept;
        }           
        term other {
            then routing-instance restricted;
        }           
    }               
}

routing-instances {
    restricted {    
        instance-type forwarding;
        routing-options {
            static {
                route 0.0.0.0/0 discard;
            }       
            instance-import restricted-import;
        }           
    }            
}
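In plain terms, the filter and the instance-import policy above do two things. The following Python model is a simplification under stated assumptions: BGP packets (matched on port 179 in either direction, as in the filter) are left to the master instance, and everything else is looked up in a table built only from routes tagged `client` or `local`, plus a default discard route. The routes and next-hop strings are illustrative.

```python
import ipaddress

# Illustrative master table: prefix -> (next hop, community). The community
# names "client" and "local" match the policy above; routes are made up.
MASTER = {
    ipaddress.ip_network("2.0.0.0/8"): ("AS 2 customer link", "client"),
    ipaddress.ip_network("2.0.0.0/9"): ("AS 2 peering link", "peer"),
    ipaddress.ip_network("3.0.0.0/8"): ("AS 3 peering link", "peer"),
}
BGP_PORT = 179


def build_restricted(master):
    """Mimic restricted-import: keep only client/local routes, add default discard."""
    table = {net: hop for net, (hop, comm) in master.items() if comm in ("client", "local")}
    table[ipaddress.ip_network("0.0.0.0/0")] = "discard"
    return table


def lpm(table, address):
    """Longest-prefix match over a {network: next_hop} dict."""
    addr = ipaddress.ip_address(address)
    best = max((n for n in table if addr in n), key=lambda n: n.prefixlen, default=None)
    return None if best is None else table[best]


def fbf(packet, restricted):
    """Mimic filter 'restricted' on the external interface."""
    if BGP_PORT in (packet.get("sport"), packet.get("dport")):
        # Terms bgp-passive / bgp-active: accept, i.e. normal master-instance handling.
        return "master instance"
    # Term other: forward within routing-instance restricted.
    return lpm(restricted, packet["dst"])


RESTRICTED = build_restricted(MASTER)
print(fbf({"dst": "2.0.0.1", "dport": 80}, RESTRICTED))   # AS 2 customer link
print(fbf({"dst": "3.0.0.1", "dport": 80}, RESTRICTED))   # discard
print(fbf({"dst": "1.0.0.1", "dport": 179}, RESTRICTED))  # master instance
```

Note that the peering /9 never makes it into the restricted table, so it cannot win longest-prefix match there.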

Traffic coming from a customer is slightly different. Remember rule 3: ... must flow to a customer if a customer route exists, otherwise it can flow to upstream or peer.

What we need to do in this case is look up a customer route first and, if there's no such route, fall back to looking up a peer or upstream route. So we make another routing instance, which again carries customer and local prefixes, and add a fallback route to it. Again, a single such routing instance is sufficient to forward traffic from all customers.

The configuration is very similar to the one shown previously, but instead of the default discard route we configure a lookup in the master routing table.

routing-instances {
    restricted-fallback {
        instance-type forwarding;
        routing-options {
            static {
                route 0.0.0.0/0 next-table inet.0;
            }       
            instance-import restricted-import;
        }           
    }               
}
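The two-step lookup for customer traffic can be sketched the same way. In this Python model (illustrative routes; `FALLBACK` stands in for `restricted-fallback`), the customer route 2.0.0.0/8 wins for AS 2's space even though the master table holds a more specific /9 from the peering link, while destinations without a customer route fall through to inet.0.

```python
import ipaddress

NEXT_TABLE = "next-table inet.0"

# Master table (inet.0): prefix -> (next hop, community). Illustrative values.
MASTER = {
    ipaddress.ip_network("2.0.0.0/8"): ("AS 2 customer link", "client"),
    ipaddress.ip_network("2.0.0.0/9"): ("AS 2 peering link", "peer"),
    ipaddress.ip_network("3.0.0.0/8"): ("AS 3 peering link", "peer"),
}


def lpm(table, address):
    """Longest-prefix match over a {network: next_hop} dict."""
    addr = ipaddress.ip_address(address)
    best = max((n for n in table if addr in n), key=lambda n: n.prefixlen, default=None)
    return None if best is None else table[best]


# restricted-fallback: client/local routes plus a default pointing back to inet.0.
FALLBACK = {n: hop for n, (hop, comm) in MASTER.items() if comm in ("client", "local")}
FALLBACK[ipaddress.ip_network("0.0.0.0/0")] = NEXT_TABLE


def customer_ingress_lookup(address):
    """Rule 3: customer route first, otherwise whatever inet.0 says."""
    hop = lpm(FALLBACK, address)
    if hop == NEXT_TABLE:
        # No customer or local route matched: second lookup in the full master table.
        hop = lpm({n: h for n, (h, _) in MASTER.items()}, address)
    return hop


print(customer_ingress_lookup("2.0.0.1"))  # AS 2 customer link
print(customer_ingress_lookup("3.0.0.1"))  # AS 3 peering link
```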

This is a pretty economical way to use multiple routing tables. We only have to replicate customer and local routes, and only into two extra tables, regardless of the number of external connections on the router.
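A back-of-the-envelope comparison makes the point. The Python functions below are hypothetical models: the two-table scheme copies customer and local routes twice, while a per-neighbor scheme (one table per peer or upstream, each with its own copy of the same routes) scales with the number of sessions. The per-neighbor model is deliberately optimistic, since it ignores the full-view copies mentioned earlier, and the numbers in the usage lines are made up.

```python
def two_table_extra_routes(customer_routes, local_routes):
    """Routes replicated by the restricted + restricted-fallback scheme.

    Each of the two extra tables holds one copy of the customer and local
    routes; the number of external sessions does not enter the formula.
    """
    return 2 * (customer_routes + local_routes)


def per_neighbor_extra_routes(customer_routes, local_routes, external_sessions):
    """Hypothetical per-peer/per-upstream tables, each with its own copy of the
    customer and local routes. Optimistic: full-view copies are ignored."""
    return external_sessions * (customer_routes + local_routes)


# Made-up sizes: 5,000 customer routes, 100 local routes, 20 peers/upstreams.
print(two_table_extra_routes(5000, 100))         # 10200
print(per_neighbor_extra_routes(5000, 100, 20))  # 102000
```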

Let's now tackle observation 2. Let me show why it is important. This is a traceroute after we implemented the restricted routing tables.

dg@UPSTREAM> traceroute 2.0.0.1 source 100.0.0.0    
traceroute to 2.0.0.1 (2.0.0.1) from 100.0.0.0, 30 hops max, 40 byte packets
 1  100.0.1.1 (100.0.1.1)  0.893 ms  99.335 ms  96.004 ms
 2  1.0.255.0 (1.0.255.0)  96.225 ms  240.356 ms  95.819 ms
     MPLS Label=100000 CoS=0 TTL=1 S=1
 3  1.0.255.5 (1.0.255.5)  95.861 ms  95.646 ms  96.030 ms
 4  1.0.255.4 (1.0.255.4)  192.635 ms  95.277 ms  144.631 ms
     MPLS Label=100016 CoS=0 TTL=1 S=1
 5  1.0.255.3 (1.0.255.3)  144.381 ms  148.101 ms  144.356 ms
 6  2.0.1.0 (2.0.1.0)  144.571 ms  100.139 ms  96.159 ms
 7  2.0.1.0 (2.0.1.0)  139.571 ms !H  196.818 ms !H  95.966 ms !H

See, a packet from the upstream comes in to BORDER-1, BORDER-1 forwards it to BORDER-3 over an LDP-signaled LSP, then BORDER-3 makes its own routing decision and reroutes the packet to BORDER-2, and the packet exits via the peering link to AS 2. We haven't achieved anything!

The problem is penultimate hop popping (PHP). With PHP, a packet arrives at the egress router stripped of its MPLS label, and the egress router makes a routing decision based on the destination IP address. This is not what we want: chances are the egress router will make a different routing decision and forward the packet to a different exit point. How can this be avoided? The answer is simple: use one more label. PHP will pop the top label, and the egress router will forward the packet based on the remaining label without looking at the IP header. This is easily achieved with the labeled IPv4 unicast address family on our iBGP sessions.
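A toy Python model of the egress router shows the difference. It assumes the ingress router has already chosen the customer exit, models PHP as removing the top label, and reuses label values from the traceroutes purely as illustration; the tables and function names are hypothetical. With only an LDP transport label, nothing is left after PHP and the egress router's own longest-prefix match picks the peering /9; with an extra BGP label, the egress router forwards on that label and never consults the IP header.

```python
import ipaddress

# Hypothetical master table on the egress router (BORDER-3 in the text above):
# it still knows AS 2's /9 via the peering link on BORDER-2.
EGRESS_MASTER = {
    ipaddress.ip_network("2.0.0.0/8"): "AS 2 customer link",
    ipaddress.ip_network("2.0.0.0/9"): "BORDER-2, AS 2 peering link",
}

# Label the egress router advertised via BGP labeled-unicast for its customer
# route. The value is borrowed from the traceroutes for illustration only.
EGRESS_BGP_LABELS = {100064: "AS 2 customer link"}


def lpm(table, address):
    """Longest-prefix match over a {network: next_hop} dict."""
    addr = ipaddress.ip_address(address)
    best = max((n for n in table if addr in n), key=lambda n: n.prefixlen, default=None)
    return None if best is None else table[best]


def php(label_stack):
    """Penultimate hop popping: the last transit router removes the top label."""
    return label_stack[1:]


def egress_forward(dst, label_stack):
    """Forwarding decision at the egress router for a packet after PHP."""
    if label_stack:
        # A BGP label is still present: forward on it, ignore the IP header.
        return EGRESS_BGP_LABELS[label_stack[0]]
    # Bare IP packet: the egress router makes its own routing decision.
    return lpm(EGRESS_MASTER, dst)


# In both cases the ingress router wanted the customer exit.
print(egress_forward("2.0.0.1", php([100016])))          # BORDER-2, AS 2 peering link
print(egress_forward("2.0.0.1", php([100000, 100064])))  # AS 2 customer link
```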

protocols {
    bgp {               
        group ibgp {
            type internal;
            local-address 1.0.0.1;
            family inet {
                labeled-unicast;
            }
            neighbor 1.0.0.2;
            neighbor 1.0.0.3;
        }
    }
}

There's a small catch, though. We must enable family mpls on our external interfaces, otherwise this setup doesn't work (is this an olive artifact? I don't have enough real equipment to verify it). This opens our network to label spoofing attacks, so we must attach a filter to family mpls on our external interfaces.

interfaces {
    fxp1 {          
        unit 0 {
            family mpls {
                filter {
                    input-list drop-mpls;
                }
            }
        }
    }
}

firewall {
    family mpls {
        filter drop-mpls {
            term all {
                then discard;
            }
        }
    }
}

Let's see how it all works together.

dg@UPSTREAM> traceroute 2.0.0.1
traceroute to 2.0.0.1 (2.0.0.1) from 100.0.0.0, 30 hops max, 40 byte packets
 1  100.0.1.1 (100.0.1.1)  1.103 ms  98.689 ms  100.819 ms
 2  1.0.255.0 (1.0.255.0)  100.305 ms  240.509 ms  100.759 ms
     MPLS Label=100000 CoS=0 TTL=1 S=0
     MPLS Label=100064 CoS=0 TTL=1 S=1
 3  1.0.255.5 (1.0.255.5)  51.932 ms  192.196 ms  105.548 ms
     MPLS Label=100064 CoS=0 TTL=1 S=1
 4  1.0.2.1 (1.0.2.1)  148.946 ms  100.330 ms  100.693 ms
 5  1.0.2.1 (1.0.2.1)  96.588 ms !H  153.207 ms !H  101.018 ms !H

dg@CUSTOMER> traceroute 2.0.0.1    
traceroute to 2.0.0.1 (2.0.0.1) from 4.0.0.0, 30 hops max, 40 byte packets
 1  1.0.4.0 (1.0.4.0)  0.744 ms  100.479 ms  102.213 ms
 2  1.0.255.0 (1.0.255.0)  102.067 ms  152.851 ms  101.834 ms
     MPLS Label=100000 CoS=0 TTL=1 S=0
     MPLS Label=100064 CoS=0 TTL=1 S=1
 3  1.0.255.5 (1.0.255.5)  101.457 ms  152.631 ms  153.512 ms
     MPLS Label=100064 CoS=0 TTL=1 S=1
 4  1.0.2.1 (1.0.2.1)  101.715 ms  50.052 ms  153.585 ms
 5  1.0.2.1 (1.0.2.1)  102.267 ms !H  100.998 ms !H  153.793 ms !H

dg@PEER> traceroute 2.0.0.1
traceroute to 2.0.0.1 (2.0.0.1) from 3.0.0.0, 30 hops max, 40 byte packets
 1  1.0.3.0 (1.0.3.0)  0.662 ms  90.742 ms  87.542 ms
 2  1.0.255.2 (1.0.255.2)  91.884 ms  215.552 ms  91.635 ms
     MPLS Label=100000 CoS=0 TTL=1 S=0
     MPLS Label=100064 CoS=0 TTL=1 S=1
 3  1.0.255.5 (1.0.255.5)  45.099 ms  157.433 ms  125.233 ms
     MPLS Label=100064 CoS=0 TTL=1 S=1
 4  1.0.2.1 (1.0.2.1)  132.577 ms  132.641 ms  91.394 ms
 5  1.0.2.1 (1.0.2.1)  103.895 ms !H  145.040 ms !H  133.256 ms !H

Now traffic coming in from customers, upstreams or peers takes the correct route. The fraudulent routes from AS2 are simply ignored.

Acknowledgements

Many thanks to Andrew Lomaka, who pointed out this problem to me a few years ago, shared lots of operational wisdom on the issue, answered many of my silly questions, and also helped debug the configuration described above.

