Checking the health of your Internet Router
A BGP speaker on one router sets up a session with a BGP speaker on another router to exchange internet routing information. The prefixes are exchanged, imported, analysed and, if necessary, adjusted to provide connectivity with a good balance of cost and quality. Sometimes this process goes wrong, and customers experience BGP session ‘flapping’ or instability. We receive questions about this every now and then, so we thought we would share a recent case, along with our advice.
The Case of the Flapping BGP Session
The following example, based on a recent customer case, gives an insight into what can go wrong:
Let’s analyse what this means in terms of the number of prefixes that the network’s router has to process:
Based on these figures we can calculate the total number of prefixes:
We looked at the router and found the problem. Many popular enterprise and service-provider routers can hold only 4 million entries in the RIB (routing information base) and 1 million entries in the FIB (forwarding information base). The customer’s most recently added BGP session pushed the total past the 4 million RIB limit, so the router’s RIB was being overloaded.
Due to the RIB overload the BGP session was continuously resetting while trying to reload all the prefixes, resulting in an unstable ‘flapping’ BGP session.
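The capacity check behind this diagnosis can be sketched in a few lines of Python. The session names and per-session prefix counts below are hypothetical and are not the figures from the customer case; only the 4 million RIB limit comes from the text above. The sketch also assumes, as a simplification, that every prefix received on a session occupies one RIB entry.

```python
# RIB limit quoted above for many enterprise and service-provider routers.
RIB_CAPACITY = 4_000_000


def total_prefixes(sessions):
    """Sum the prefixes received across all BGP sessions."""
    return sum(sessions.values())


def rib_headroom(sessions, capacity=RIB_CAPACITY):
    """Return the number of free RIB entries; a negative value means overload."""
    return capacity - total_prefixes(sessions)


# Hypothetical prefix counts per session (illustration only).
sessions = {
    "transit_a": 950_000,
    "transit_b": 950_000,
    "transit_c": 950_000,
    "peering_ix": 300_000,
    "transit_d": 950_000,  # the most recently added session
}

headroom = rib_headroom(sessions)
if headroom < 0:
    print(f"RIB overloaded by {-headroom} entries")
else:
    print(f"{headroom} RIB entries still free")
```

With these made-up numbers the first four sessions fit within the limit, and it is the fifth session that takes the total over 4 million, which mirrors the pattern seen in the case.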
As an interim fix, we limited the number of prefixes so that the total stayed below 4 million. For Join Transit we were able to change the session to a default-only route. This gave the customer time to upgrade, in the medium term, to a more modern router platform with higher capacity.
Global Prefix Growth
This issue is becoming more common because the number of prefixes needed for global connectivity has grown sharply in recent years. The main cause is the shortage of IPv4 address space: more and smaller address blocks are being announced with longer prefixes, so the IP space is becoming more and more fragmented. At the same time, the number of IPv6 prefixes is growing as IPv6 is deployed more widely.
Other Reasons for Flapping
There are many reasons for flapping, including
To diagnose, we start with troubleshooting based on logging/counters from both sides. This often helps reveal the root of the problem.
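As an illustration of counter-based troubleshooting, the Python sketch below counts how often a session dropped out of the Established state within a recent time window. The event format, the function names and the threshold are assumptions made for this example; real routers log state changes in vendor-specific formats that would need to be parsed first.

```python
def count_flaps(events, window_seconds, now):
    """Count transitions out of Established within the last window_seconds.

    events: list of (timestamp_in_seconds, bgp_state) tuples sorted by time.
    """
    flaps = 0
    previous_state = None
    for timestamp, state in events:
        left_established = (
            previous_state == "Established" and state != "Established"
        )
        if left_established and now - timestamp <= window_seconds:
            flaps += 1
        previous_state = state
    return flaps


def is_flapping(events, now, window_seconds=3600, threshold=3):
    """Treat a session as flapping if it dropped at least threshold times."""
    return count_flaps(events, window_seconds, now) >= threshold


# Hypothetical state changes for one session (illustration only).
events = [
    (0, "Established"),
    (600, "Idle"),
    (660, "Established"),
    (1200, "Idle"),
    (1260, "Established"),
    (1800, "Idle"),
    (1860, "Established"),
]

print(count_flaps(events, window_seconds=3600, now=2000))
print(is_flapping(events, now=2000))
```

Comparing a count like this from both sides of the session, together with the logged reason for each reset, is usually a good starting point for narrowing down the cause.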
Working around the problem
Setting up new BGP sessions that are certain to push a router past its maximum RIB capacity can have unpredictable results. RIB and FIB limits are generally set out in the product and service specifications. If they are not, you can raise a trouble ticket with your supplier or vendor to obtain them.
If you are unsure about this, we suggest you get in touch with us to ask what we can do to limit the size of your RIB and so avoid flapping BGP sessions. There are things we can do to mitigate the problem while you make the case for, or wait for, the CAPEX needed for new router platforms.