<?xml version="1.0" encoding="US-ASCII"?>
<!-- This template is for creating an Internet Draft using xml2rfc,
    which is available here: http://xml.resource.org. -->

<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!ENTITY RFC2119 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml">
<!ENTITY RFC2629 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2629.xml">
<!ENTITY RFC3277 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3277.xml">
<!ENTITY RFC3719 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3719.xml">
<!ENTITY RFC4271 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4271.xml">
<!ENTITY RFC5120 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5120.xml">
<!ENTITY RFC5301 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5301.xml">
<!ENTITY RFC5303 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5303.xml">
<!ENTITY RFC5304 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5304.xml">
<!ENTITY RFC5305 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5305.xml">
<!ENTITY RFC5308 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5308.xml">
<!ENTITY RFC5309 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5309.xml">
<!ENTITY RFC5311 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5311.xml">
<!ENTITY RFC5316 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5316.xml">
<!ENTITY RFC5440 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5440.xml">
<!ENTITY RFC5449 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5449.xml">
<!ENTITY RFC5614 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5614.xml">
<!ENTITY RFC6232 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.6232.xml">
<!ENTITY RFC7182 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.7182.xml">
<!ENTITY RFC7356 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.7356.xml">
<!ENTITY RFC7921 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.7921.xml">
<!ENTITY RFC7981 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.7981.xml">
]>

<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<?rfc strict="yes" ?>
<?rfc toc="yes"?>
<?rfc tocdepth="4"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes" ?>
<?rfc compact="yes" ?>
<?rfc subcompact="no" ?>
<rfc category="info" docName="draft-white-openfabric-04" ipr="trust200902">

<!-- ***** FRONT MATTER ***** -->

<front>

<title>IS-IS Support for Openfabric</title>

<author initials='R.' surname='White' fullname='Russ White' role='editor'>
<organization>LinkedIn</organization>
<address>
<email>russ@riw.us</email>
</address>
</author>

<author initials='S.' surname='Zandi' fullname='Shawn Zandi' role='editor'>
<organization>LinkedIn</organization>
<address>
<email>szandi@linkedin.com</email>
</address>
</author>

<date/>

<abstract>
<t>Spine and leaf topologies are widely used in hyperscale and cloud scale networks. In most of these networks, configuration is automated, but difficult, and topology information is extracted through broad based connections. Policy is often integrated into the control plane, as well, making configuration, management, and troubleshooting difficult. Openfabric is an adaptation of an existing, widely deployed link state protocol, Intermediate System to Intermediate System (IS-IS), that is designed to:</t>

<t>
<list style="symbols">
<t>Provide a full view of the topology from a single point in the network to simplify operations</t>
<t>Minimize configuration of each Intermediate System (IS) (also called a router or switch) in the network</t>
<t>Optimize the operation of IS-IS within a spine and leaf fabric to enable scaling</t>
</list>
</t>

<t>This document begins with an overview of openfabric, including a description of what may be removed from IS-IS to enable scaling. The document then describes an optimized adjacency formation process; an optimized flooding scheme; some thoughts on the operation of openfabric, metrics, and aggregation; and finally a description of the changes to the IS-IS protocol required for openfabric.</t>

</abstract>

</front>

<middle>

<!-- 1 -->
<section title="Introduction" toc="default">

<!-- 2 -->
<section title="Goals" toc="default">

<t>Spine and leaf fabrics are often used in large scale data centers; in this application, they are commonly called a fabric because of their regular structure and predictable forwarding and convergence properties. This document describes modifications to the IS-IS protocol to enable it to run efficiently on a large scale spine and leaf fabric, openfabric. The goals of this control plane are:</t>

<t>
<list style="symbols">
<t>Provide a full view of the topology from a single point in the network to simplify operations</t>
<t>Minimize configuration of each IS in the network</t>
<t>Optimize the operation of IS-IS within a spine and leaf fabric to enable scaling</t>
</list>
</t>

</section> <!-- end of goals -->

<!-- 2 -->
<section title="Contributors" toc="default">

<t>The following people have contributed to this draft: Nikos Triantafillis (reflected flooding optimization), Ivan Pepelnjak (three stage fabric modifications), Hannes Gredler (do not reflood optimizations), Les Ginsberg (capabilities encoding, circuit local reflooding), Naiming Shen (capabilities encoding, circuit local reflooding), Uma Chunduri (failure mode suggestions, flooding), Nick Russo, and Rodny Molina.</t>

<t>See <xref target="RFC5449" />, <xref target="RFC5614" />, and <xref target="RFC7182" /> for similar solutions in the Mobile Ad Hoc Networking (MANET) solution space.</t>

</section> <!-- end of contributors -->

<!-- 2 -->
<section title="Simplification" toc="default">

<t>In building any scalable system, it is often best to begin by removing what is not needed. In this spirit, openfabric implementations MAY remove the following from IS-IS:</t>

<t>
<list style="symbols">
<t>External metrics. There is no need for external metrics in large scale spine and leaf fabrics; it is assumed that metrics will be properly configured by the operator to account for the correct order of route preference at any route redistribution point.</t>
<t>Tags and traffic engineering processing. Openfabric is only designed to provide topology and reachability information. It is not designed to provide for traffic engineering, route preference through tags, or other policy mechanisms. It is assumed that all routing policy will be provided through an overlay system which communicates directly with each IS in the fabric, such as <xref target="RFC5440">PCEP</xref> or <xref target="RFC7921">I2RS</xref>. Traffic engineering is assumed to be provided through <xref target="I-D.ietf-spring-segment-routing">Segment Routing (SR)</xref>.</t>
</list>
</t>

</section> <!-- end of simplification -->

<!-- 2 -->
<section title="Additions and Requirements" toc="default">

<t>To create a scalable link state fabric, openfabric includes the following:</t>

<t>
<list style="symbols">
<t>A slightly modified adjacency formation process.</t>
<t>Mechanisms for determining the tier within a spine and leaf fabric in which the IS is located.</t>
<t>A mechanism that reduces flooding to the minimum possible, while still ensuring complete database synchronization among the intermediate systems within the fabric.</t>
</list>
</t>

<t>Three general requirements are placed here; more specific requirements are considered in the following sections. Openfabric implementations:</t>

<t>
<list style="symbols">
<t>MUST support <xref target="RFC5301" /> and enable hostname advertisement by default if a hostname is configured on the intermediate system.</t>
<t>SHOULD support <xref target="RFC6232" />, purge originator identification for IS-IS.</t>
<t>MUST NOT be mixed with standard IS-IS implementations in operational deployments. Openfabric and standard IS-IS implementations SHOULD be treated as two separate protocols.</t>
</list>
</t>

</section> <!-- end of additions -->

<!-- 2 -->
<section title="Sample Network" toc="default">

<t>The following spine and leaf fabric will be used to describe these modifications.</t>

<figure align="center" anchor="is-model">
<artwork align="left"><![CDATA[
+----+ +----+ +----+ +----+ +----+ +----+
| 1A | | 1B | | 1C | | 1D | | 1E | | 1F | (T0)
+----+ +----+ +----+ +----+ +----+ +----+

+----+ +----+ +----+ +----+ +----+ +----+
| 2A | | 2B | | 2C | | 2D | | 2E | | 2F | (T1)
+----+ +----+ +----+ +----+ +----+ +----+

+----+ +----+ +----+ +----+ +----+ +----+
| 3A | | 3B | | 3C | | 3D | | 3E | | 3F | (T2)
+----+ +----+ +----+ +----+ +----+ +----+

+----+ +----+ +----+ +----+ +----+ +----+
| 4A | | 4B | | 4C | | 4D | | 4E | | 4F | (T1)
+----+ +----+ +----+ +----+ +----+ +----+

+----+ +----+ +----+ +----+ +----+ +----+
| 5A | | 5B | | 5C | | 5D | | 5E | | 5F | (T0)
+----+ +----+ +----+ +----+ +----+ +----+
]]></artwork>
</figure>

<t>To reduce confusion (spine and leaf fabrics are difficult to draw in plain text art), this diagram does not contain the connections between devices. The reader should assume that each device in a given layer is connected to every device in the layer above it. For instance:</t>

<t>
<list style="symbols">
<t>5A is connected to 4A, 4B, 4C, 4D, 4E, and 4F</t>
<t>5B is connected to 4A, 4B, 4C, 4D, 4E, and 4F</t>
<t>4A is connected to 3A, 3B, 3C, 3D, 3E, 3F, 5A, 5B, 5C, 5D, 5E, and 5F</t>
<t>4B is connected to 3A, 3B, 3C, 3D, 3E, 3F, 5A, 5B, 5C, 5D, 5E, and 5F</t>
<t>etc.</t>
</list>
</t>

<t>The tiers or stages of the fabric are also marked for easier reference. T0 intermediate systems are assumed to be connected to application servers; that is, they are Top of Rack (ToR) intermediate systems. The remaining tiers, T1 and T2, are connected only to the fabric itself. Note there are no "cross links," or "east west" links in the illustrated fabric. The fabric locality detection mechanism described here will not work if there are cross links running east/west through the fabric. Locality detection may be possible in such a fabric; this is an area for further study.</t>

</section> <!-- end of sample network -->

</section> <!-- End of the introduction section -->

<!-- 1 -->
<section title="Modified Adjacency Formation" toc="default">

<t>Because Openfabric operates in a tightly controlled data center environment, various modifications can be made to the IS-IS neighbor formation process to increase efficiency and simplify the protocol. Specifically, Openfabric implementations SHOULD support <xref target="RFC3719" />, section 4, hello padding for IS-IS. Variable hello padding SHOULD NOT be used, as data center fabrics are built using high speed links on which padded hellos will have little performance impact. Further modifications to the neighbor formation process are considered in the following sections.</t>

<!-- 2 -->
<section title="Level 2 Adjacencies Only" toc="default">

<t>Openfabric is designed to work in a single flooding domain over a single data center fabric at the scale of thousands of routers with hundreds of thousands of routes (so a moderate scale in router and route count terms). Because of the way Openfabric optimizes operation in this environment, it is neither necessary nor desirable to build multiple flooding domains. For instance, the flooding optimizations described later in this document require a full view of the topology, as does any proposed overlay to inject policy into the forwarding plane. In light of this, the following changes SHOULD be made to IS-IS implementations to support Openfabric:</t>

<t>
<list style="symbols">
<t>IIH PDU 16 (level 2 broadcast circuit hello) should be the only IIH PDU type transmitted (see section 9.6 of [ISO10589] and section 4.1 of <xref target="RFC5309" />)</t>
<t>In IIH PDU 16 (level 2 broadcast circuit hello), the Circuit Type field should be set to 2 (see section 9.6 of [ISO10589])</t>
<t>Support for IIH PDU 15 (level 1 broadcast hello) should be removed (see section 9.5 of [ISO10589])</t>
<t>Support for IIH PDU 17 (point-to-point hello) should be removed (see section 9.7 of [ISO10589])</t>
</list>
</t>

</section> <!-- end of level 2 adjacencies -->

<!-- 2 -->
<section title="Point-to-point Adjacencies" toc="default">

<t>Data center network fabrics only contain point-to-point links; because of this, there is no reason to support any broadcast link types, nor to support the Designated Intermediate System processing, including pseudonode creation. In light of this, processing related to sections 7.2.3 (broadcast networks), 7.3.8 (generation of level 1 pseudonode LSPs), 7.3.10 (generation of level 2 pseudonode LSPs), and section 8.4.5 (LAN designated intermediate systems) in [ISO10589] SHOULD be removed.</t>

</section> <!-- end of point-to-point adjacencies -->

<!-- 2 -->
<section title="Three Way Handshake Support" toc="default">

<t>It is important that two way connectivity be established before synchronizing the link state database, or routing through a link in a data center fabric. To reject optical failures that cause a one way connection between two routers, Openfabric implementations must support the three way handshake mechanism described in <xref target="RFC5303" />.</t>

</section> <!-- end of three way handshake support -->

<!-- 2 -->
<section title="Adjacency Formation Optimization" toc="default">

<t>While adjacency formation is not considered particularly burdensome in IS-IS, it is still useful to reduce the amount of state transferred across the network when connecting a new IS to the fabric. Any such optimization is bound to present a tradeoff between several factors; the mechanism described here increases the amount of time required to form adjacencies slightly in order to reduce the total state carried across the network. The process is:</t>

<t>
<list style="symbols">
<t>An IS connected to the fabric will send hellos on all links.</t>
<t>The IS will only complete the three-way handshake with one newly discovered neighbor; this would normally be the first neighbor which sends the newly connected intermediate system's ID back in the three-way handshake process.</t>
<t>The IS will complete its database exchange with this one newly adjacent neighbor.</t>
<t>Once this process is completed, the IS will continue processing the remaining neighbors as normal.</t>
</list>
</t>

<t>This process allows each IS newly added to the fabric to exchange a full table once; a very minimal amount of information will be transferred with the remaining neighbors to reach full synchronization.</t>

</section> <!-- end of adjacency formation optimization -->

</section> <!-- end of modified adjacency formation -->

<!-- 1 -->
<section title="Advertisement of Reachability Information" toc="default">

<t>IS-IS describes the topology in two different sets of TLVs; the first describes the set of neighbors connected to an IS, the second describes the set of reachable destinations connected to an IS. There are two different forms of both of these descriptions, one of which carries what are widely called narrow metrics, the other of which carries what are widely called wide metrics. In a tightly controlled data center fabric implementation, such as the ones Openfabric is designed to support, no IS that supports narrow metrics will ever be deployed or supported; hence there is no reason to support any metric type other than wide metrics.</t>

<t>
<list style="symbols">
<t>The Level 2 Link State PDU (type 20 in section 9.9 of [ISO10589]) and the scoped flooding PDU (type 10 in section 3.1 of <xref target="RFC7356" />) SHOULD be the only PDU types used to carry link state information in an Openfabric implementation</t>
<t>Processing related to the Level 1 Link State PDU (type 18) MAY be removed from Openfabric implementations (see section 9.8 of [ISO10589])</t>
<t>Neighbor reachability MUST be carried in TLV type 22 (see section 3 of <xref target="RFC5305" />)</t>
<t>IPv4 reachability SHOULD be carried in TLV type 135 (see section 4 of <xref target="RFC5305" />), or TLV type 235 for multitopology implementations (see <xref target="RFC5120" />)</t>
<t>IPv6 reachability SHOULD be carried in TLV type 236 (see <xref target="RFC5308" />), or TLV type 237 for multitopology implementations (see <xref target="RFC5120" />)</t>
<t>Processing related to the neighbor reachability TLV (type 2, see sections 9.8 and 9.9 of [ISO10589]) SHOULD be removed</t>
<t>Processing related to the narrow metric IP reachability TLV (types 128 and 130) SHOULD be removed</t>
</list>
</t>
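
<t>As an illustration only, the TLV rules above could be captured in a small lookup table. The sketch below is a minimal example in Python; the names ACCEPTED_TLVS, REMOVED_TLVS, and classify_tlv are hypothetical, and treating a removed TLV as "ignore" is an assumption about how an implementation might behave rather than a requirement of this document.</t>

<figure>
<artwork align="left"><![CDATA[
```python
# TLV type codes taken from the list above; the dictionary and function
# names are hypothetical and exist only for this illustration.
ACCEPTED_TLVS = {
    22: "neighbor reachability, wide metrics (RFC 5305)",
    135: "IPv4 reachability, wide metrics (RFC 5305)",
    235: "multitopology IPv4 reachability (RFC 5120)",
    236: "IPv6 reachability (RFC 5308)",
    237: "multitopology IPv6 reachability (RFC 5120)",
}
REMOVED_TLVS = {
    2: "neighbor reachability, narrow metrics",
    128: "IP reachability, narrow metrics",
    130: "IP reachability, narrow metrics",
}


def classify_tlv(tlv_type: int) -> str:
    """Return 'process', 'ignore', or 'other' for a TLV type code."""
    if tlv_type in ACCEPTED_TLVS:
        return "process"
    if tlv_type in REMOVED_TLVS:
        # Assumption: a TLV whose processing was removed is skipped.
        return "ignore"
    return "other"  # TLV types this section does not discuss
```
]]></artwork>
</figure>

<t>A table of this kind keeps the wide metric TLVs and the removed narrow metric TLVs in one place, so the two sets cannot drift apart in separate code paths.</t>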

<t>In order to support segment routing, Openfabric needs to be able to support the advertisement of a Prefix-SID tied to a local loopback address assigned to the IS. The label to advertise MAY be manually configured for the moment, or determined through autoconfiguration. A Prefix-SID SHOULD be advertised if a local label is configured using the Prefix Segment Identifier sub-TLV (see section 2.1 of <xref target="I-D.ietf-isis-segment-routing-extensions" />).</t>

</section> <!-- end of advertisement of reachability information -->

<!-- 1 -->
<section title="Determining and Advertising Location on the Fabric" toc="default">

<t>The tier to which an IS is connected is useful to enable autoconfiguration of intermediate systems connected to the fabric and to reduce flooding. Once the tier of an intermediate system within the fabric has been determined, it MUST be advertised using the 4 bit Tier field described in section 3.3 of <xref target="I-D.shen-isis-spine-leaf-ext" />. This section describes two mechanisms, each consisting of several steps, for determining the tier at which an IS is connected in the fabric.</t>

<!-- 2 -->
<section title="Calculating Tier Number with a Fixed T0" toc="default">

<t>The first method begins with one of the T0 intermediate systems advertising its location in the fabric. This information can be obtained in one of two ways:</t>

<t>
<list style="symbols">
<t>A single T0 intermediate system is manually configured to advertise 0x00 in its IS reachability tier sub-TLV, indicating it is at the edge of the fabric (a ToR IS).</t>
<t>The T0 intermediate systems detect they are T0 through the presence of connected hosts (for example, through a request for address assignment or some other means). If such detection is used, and the IS determines it is located at T0, it should advertise 0x00 in its IS reachability tier sub-TLV.</t>
</list>
</t>

<t>The second method above SHOULD be used with care, as it may not be secure, and it may not work in all data center environments. For instance, if a host is mistakenly (or intentionally, as a form of attack) attached to a spine IS, or a request for address assignment is transmitted to a spine IS during the bootup phase of the device or fabric, it is possible to cause a spine IS to advertise itself as a T0. Unless the autodetection of the T0 devices is secured, the manual mechanism SHOULD be used (configuring at least one T0 device manually).</t>

<t>Given at least one T0 device is advertising its tier number, the remaining intermediate systems calculate their tier number as follows:</t>

<t>
<list style="symbols">
<t>The local IS calculates an SPT (using SPF) setting the cost of every link to 1; this effectively calculates a topology only view of the network, without considering any configured link costs</t>
<t>Find the closest IS advertising a tier number of 0 in the Spine Leaf extension sub-TLV; call this node A, and set FD to this cost</t>
<t>Calculate an SPT (using SPF) from the perspective of A (above), setting the cost of every link to 1; the maximum cost to any node should be 2 for a 3 stage fabric, 4 for a 5 stage fabric, etc.</t>
<t>Choose any node that is at the maximum metric from A (above); call this IS B</t>
<t>Find the cost to B on the locally calculated SPT from the first step; call this TD</t>
<t>Calculate the tier number of the local node by subtracting FD from TD</t>
</list>
</t>

<t>In the example network, assume 5A is manually configured as a T0, and is advertising its tier number. From here:</t>

<t>
<list style="symbols">
<t>From 1A the path to 5A is 4 hops; this is FD</t>
<t>Run SPF from the perspective of 5A with all link metrics set to 1</t>
<t>The maximum path length is 4; 1F is one such node; set this node to B, and set TD to 4</t>
<t>TD - FD is 0 at 1A, so 1A is T0, or a ToR</t>
</list>
</t>
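
<t>The numbers in this example can be reproduced with a short sketch. The Python below is illustrative only: build_fabric and hop_counts are hypothetical helper names, the fabric is the sample network with every device connected to every device in the adjacent rows, and, following the worked example, TD is taken as the longest path found from 5A.</t>

<figure>
<artwork align="left"><![CDATA[
```python
from collections import deque

ROWS = ["1", "2", "3", "4", "5"]  # rows of the sample network
COLS = "ABCDEF"


def build_fabric():
    """Adjacency sets for the sample network (no cross links)."""
    adj = {r + c: set() for r in ROWS for c in COLS}
    for upper, lower in zip(ROWS, ROWS[1:]):
        for a in COLS:
            for b in COLS:
                adj[upper + a].add(lower + b)
                adj[lower + b].add(upper + a)
    return adj


def hop_counts(adj, root):
    """SPF with every link cost set to 1 (a breadth-first search)."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


fabric = build_fabric()
fd = hop_counts(fabric, "1A")["5A"]   # FD: hops from 1A to 5A
from_a = hop_counts(fabric, "5A")     # SPT from the perspective of 5A
td = max(from_a.values())             # TD: longest path from 5A
farthest = sorted(n for n, d in from_a.items() if d == td)
tier = td - fd                        # tier of 1A in this example
```
]]></artwork>
</figure>

<t>With all link costs set to 1, SPF reduces to a breadth-first search, which is why no priority queue is needed in the sketch.</t>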

<t>This process will work for any spine and leaf fabric without "cross links."</t>

</section> <!-- end of defined T0 -->

<!-- 2 -->
<section title="Calculating the Tier Number in a Five Stage Spine and Leaf" toc="default">

<t>In some fabrics, it is possible to calculate which intermediate systems are at T0 using a modified Shortest Path First (SPF) calculation. Specifically, if the fabric is configured in five stages, as shown in the example network, and is not some form of butterfly, Benes, or a three stage fabric, it is possible to calculate if an IS is at T0 using the following process:</t>

<t>
<list style="symbols">
<t>Calculate a Shortest Path Tree (SPT) for the entire network with all link metrics set to 1; this has the effect of calculating a tree based only on hop count</t>
<t>Find one node that is the farthest from the local node in the resulting tree; call this node F, and the distance to this node FD</t>
<t>Calculate an SPT for the entire network with all link metrics set to 1 from the perspective of F; call this TD</t>
</list>
</t>

<t>If FD == TD, and TD >= 4, this is a greater than three stage fabric; the local device SHOULD advertise 0x00 in its IS reachability tier sub-TLV. For instance, in the diagram above, 1A would:</t>

<t>
<list style="symbols">
<t>Calculate an SPT with all link metrics set to 1; on this SPT, 5A through 5F would all have a distance of 4</t>
<t>Select one of these nodes as F; assume 5F is chosen as F</t>
<t>Set FD to 4, the distance to 5F</t>
<t>Run SPF from the perspective of 5F with all link metrics set to 1</t>
<t>Set TD to 4, the cost from 5F to 1A</t>
<t>TD - FD == 0, so 1A is at T0, and is a ToR</t>
</list>
</t>

<t>For the remaining intermediate systems to determine which tier they are situated on, they perform the following calculation:</t>

<t>
<list style="symbols">
<t>Calculate a Shortest Path Tree (SPT) for the entire network with all link metrics set to 1; this has the effect of calculating a tree based only on hop count</t>
<t>Find one node that is the farthest from the local node in the resulting tree; call this node F, and the distance to this node FD</t>
<t>Calculate an SPT for the entire network with all link metrics set to 1 from the perspective of F; call this TD</t>
</list>
</t>

<t>The IS SHOULD advertise (TD - FD) in its IS reachability tier sub-TLV.</t>

<t>For example, in the above five stage fabric, 3B would:</t>

<t>
<list style="symbols">
<t>Calculate an SPT with all link metrics set to 1; on this SPT, 5A through 5F and 1A through 1F would all have a cost of 2</t>
<t>Select one of these nodes as F; assume 5F is chosen as F</t>
<t>Set FD to 2, the distance to 5F</t>
<t>Run SPF from the perspective of 5F with all link metrics set to 1</t>
<t>Set TD to 4, the cost from 5F to 1A</t>
<t>TD - FD == 2, so 3B is at T2, and is a spine switch</t>
</list>
</t>
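
<t>The two worked examples in this section can be checked with the following sketch. It is a minimal Python illustration, not a definitive implementation: the helper names are hypothetical, the node F is passed in explicitly (the examples assume 5F is chosen), and TD is interpreted as the longest path found on the SPT calculated from F.</t>

<figure>
<artwork align="left"><![CDATA[
```python
from collections import deque

ROWS = ["1", "2", "3", "4", "5"]  # rows of the sample network
COLS = "ABCDEF"


def build_fabric():
    """Adjacency sets for the sample network (no cross links)."""
    adj = {r + c: set() for r in ROWS for c in COLS}
    for upper, lower in zip(ROWS, ROWS[1:]):
        for a in COLS:
            for b in COLS:
                adj[upper + a].add(lower + b)
                adj[lower + b].add(upper + a)
    return adj


def hop_counts(adj, root):
    """SPF with every link cost set to 1 (a breadth-first search)."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def tier_from_far_node(adj, local, far):
    """Return (FD, TD, TD - FD) for the local IS and a chosen node F."""
    from_local = hop_counts(adj, local)
    fd = from_local[far]
    if fd != max(from_local.values()):
        raise ValueError("F must be among the nodes farthest from local")
    # Assumption: TD is the longest path on the SPT rooted at F.
    td = max(hop_counts(adj, far).values())
    return fd, td, td - fd


fabric = build_fabric()
result_1a = tier_from_far_node(fabric, "1A", "5F")
result_3b = tier_from_far_node(fabric, "3B", "5F")
```
]]></artwork>
</figure>

<t>For 1A the sketch yields FD 4, TD 4, and a tier of 0; for 3B it yields FD 2, TD 4, and a tier of 2, matching the examples above.</t>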

</section> <!-- end of calcuated T0 -->
</section> <!-- end of fabric location calculation -->

<!-- 1 -->
<section title="Flooding Optimization" toc="default">

<t>Flooding is perhaps the most challenging scaling issue for a link state protocol running on a dense, large scale fabric. To reduce the flooding of link state information in the form of Link State Protocol Data Units (LSPs), Openfabric takes advantage of information already available in the link state protocol, the list of the local intermediate system's neighbor's neighbors, and the fabric locality computed above. The following tables are required to compute a set of reflooders:</t>

<t>
<list style="symbols">
<t>Neighbor List (NL): The set of neighbors</t>
<t>Neighbor's Neighbors (NN) list: The set of neighbor's neighbors; this can be calculated by running SPF truncated to two hops</t>
<t>Do Not Reflood (DNR) list: The set of neighbors that should receive LSPs (or fragments), but should not reflood them</t>
<t>Reflood (RF) list: The set of neighbors who should flood LSPs (or fragments) to their adjacent neighbors to ensure synchronization</t>
</list>
</t>

<t>NL is set to contain all neighbors, and sorted deterministically (for instance, from the highest IS identifier to the lowest). All intermediate systems within a single fabric SHOULD use the same mechanism for sorting the NL list. NN is set to contain all neighbor's neighbors, or all intermediate systems that are two hops away, as determined by performing a truncated SPF. The DNR and RF tables are initially empty. To begin, the following steps are taken to reduce the size of NN and NL:</t>

<t>
<list style="symbols">
<t>Move any IS in NL with its tier (or fabric location) set to T0 to DNR</t>
<t>Remove all intermediate systems from NL and NN that are on the shortest path to the IS that originated the LSP</t>
</list>
</t>

<t>Then, for every IS in NL:</t>

<t>
<list style="symbols">
<t>If the current entry in NL is connected to any entries in NN:
<list style="symbols">
<t>Move the IS to RF</t>
<t>Remove the intermediate systems connected to the IS from NN</t>
</list></t>
<t>Else move the IS to DNR</t>
</list>
</t>

<t>When flooding, LSPs transmitted to adjacent neighbors on the RF list will be transmitted normally. Adjacent intermediate systems on this list will reflood received LSPs into the next stage of the topology, ensuring database synchronization. LSPs transmitted to adjacent neighbors on the DNR list, however, MUST be transmitted using a circuit scope PDU as described in <xref target="RFC7356" />.</t>
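
<t>The list processing described in this section can be sketched as follows. This Python is a minimal illustration on the sample network, with several assumptions: the helper names and the TIER table are hypothetical, neighbors are sorted from highest to lowest identifier, and "on the shortest path to the originator" is interpreted as lying on some shortest path between the local IS and the originating IS.</t>

<figure>
<artwork align="left"><![CDATA[
```python
from collections import deque

ROWS = ["1", "2", "3", "4", "5"]  # rows of the sample network
COLS = "ABCDEF"
TIER = {"1": 0, "2": 1, "3": 2, "4": 1, "5": 0}  # tier of each row


def build_fabric():
    """Adjacency sets for the sample network (no cross links)."""
    adj = {r + c: set() for r in ROWS for c in COLS}
    for upper, lower in zip(ROWS, ROWS[1:]):
        for a in COLS:
            for b in COLS:
                adj[upper + a].add(lower + b)
                adj[lower + b].add(upper + a)
    return adj


def hop_counts(adj, root):
    """SPF with every link cost set to 1 (a breadth-first search)."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def reflood_lists(adj, local, originator):
    """Return (RF, DNR) at `local` for an LSP from `originator`."""
    d_local = hop_counts(adj, local)
    d_orig = hop_counts(adj, originator)
    nl = sorted(adj[local], reverse=True)  # highest identifier first
    nn = {n for n, d in d_local.items() if d == 2}  # two hops away
    rf = []
    # Step 1: T0 neighbors move straight to DNR.
    dnr = [n for n in nl if TIER[n[0]] == 0]
    nl = [n for n in nl if TIER[n[0]] != 0]

    # Step 2 (assumed reading): drop nodes on a shortest path
    # between the local IS and the originator.
    def on_path(n):
        return d_orig[n] + d_local[n] == d_orig[local]

    nl = [n for n in nl if not on_path(n)]
    nn = {n for n in nn if not on_path(n)}
    # Step 3: pick reflooders until NN is covered.
    for n in nl:
        covered = adj[n] & nn
        if covered:
            rf.append(n)
            nn -= covered
        else:
            dnr.append(n)
    return rf, dnr


fabric = build_fabric()
rf_4a, dnr_4a = reflood_lists(fabric, "4A", "5A")
rf_3f, dnr_3f = reflood_lists(fabric, "3F", "5A")
```
]]></artwork>
</figure>

<t>Under these assumptions, 4A flooding an LSP originated by 5A selects only 3F as a reflooder, and 3F in turn selects only 2F; all other neighbors land on the DNR list and would receive the LSP in a circuit scope PDU.</t>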

<!-- 2 -->
<section title="Flooding Failures" toc="default">

<t>It is possible in some failure modes for flooding to be incomplete because of the flooding optimizations outlined. Specifically, if a reflooder fails, or is somehow disconnected from all the links across which it should be reflooding, it is possible an LSP is only partially flooded through the fabric. To prevent such situations, any IS receiving an LSP transmitted using DNR SHOULD:</t>

<t>
<list style="symbols">
<t>Set a short timer; the default should be less than one second</t>
<t>When the timer expires, send a Complete Sequence Number Packet (CSNP) to all neighbors</t>
<t>Process any Partial Sequence Number Packets (PSNPs) as required to resynchronize</t>
<t>If a resynchronization is required, notify the network operator through a network management system</t>
</list>
</t>

</section> <!-- end of flooding failures -->

</section> <!-- end of flooding optimization -->

<!-- 1 -->
<section title="Other Optimizations" toc="default">

<!-- 2 -->
<section title="Transit Link Reachability" toc="default">

<t>In order to reduce the amount of control plane state carried on large scale spine and leaf fabrics, openfabric implementations SHOULD NOT advertise reachability for transit links. These links MAY remain unnumbered, as IS-IS does not require layer 3 IP addresses to operate. Each IS SHOULD be configured with a single loopback address, which is assigned an IPv6 address, to provide reachability to intermediate systems which make up the fabric.</t>

</section> <!-- end of transit link reachability section -->

<!-- 2 -->
<section title="Transiting T0 Intermediate Systems" toc="default">

<t>In data center fabrics, ToR intermediate systems SHOULD NOT be used to transit between two T1 (or above) spine intermediate systems. The simplest way to prevent this is to set the <xref target="RFC3277">overload bit</xref> for all the LSPs originated from T0 intermediate systems. However, this solution would have the unfortunate side effect of causing all reachability beyond any T0 IS to have the same metric, and many implementations treat a set overload bit as a metric of 0xFFFF in calculating the Shortest Path Tree (SPT). This document proposes an alternate solution which preserves the leaf node metric, while still avoiding transiting T0 intermediate systems.</t>

<t>Specifically, all T0 intermediate systems SHOULD advertise their metric to reach any T1 adjacent neighbor with a cost of 0xFFE. T1 intermediate systems, on the other hand, will advertise T0 intermediate systems with the actual interface cost used to reach the T0 IS. Hence, links connecting T0 and T1 intermediate systems will be advertised with an asymmetric cost that discourages transiting T0 intermediate systems, while leaving reachability to the destinations attached to T0 devices the same.</t>
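
<t>The effect of the asymmetric cost can be seen in a small worked example. The Python below is a sketch under stated assumptions: the four node slice of the sample network, the interface cost of 10, and the names shortest_costs and T0_TO_T1_COST are all hypothetical and chosen only for illustration.</t>

<figure>
<artwork align="left"><![CDATA[
```python
import heapq

T0_TO_T1_COST = 0xFFE  # cost a T0 IS advertises toward T1 neighbors


def shortest_costs(links, root):
    """Dijkstra over directed costs; links[a][b] is the a-to-b cost."""
    cost = {root: 0}
    heap = [(0, root)]
    while heap:
        c, node = heapq.heappop(heap)
        if c > cost.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, w in links.get(node, {}).items():
            if c + w < cost.get(nbr, float("inf")):
                cost[nbr] = c + w
                heapq.heappush(heap, (c + w, nbr))
    return cost


# Hypothetical slice of the sample network, interface cost 10.
links = {
    "3A": {"4A": 10, "4B": 10},
    "4A": {"3A": 10, "5A": 10},
    "4B": {"3A": 10, "5A": 10},
    "5A": {"4A": T0_TO_T1_COST, "4B": T0_TO_T1_COST},
}
normal = shortest_costs(links, "4A")

# Remove the T2 IS: the only 4A-to-4B path now transits the T0 IS.
no_spine = {
    a: {b: w for b, w in nbrs.items() if b != "3A"}
    for a, nbrs in links.items()
    if a != "3A"
}
degraded = shortest_costs(no_spine, "4A")
```
]]></artwork>
</figure>

<t>In this sketch the cost from 4A to the T0 IS 5A stays at the interface cost of 10, the path from 4A to 4B through the T2 IS costs 20, and the path through 5A costs 10 + 0xFFE, so it is used only when no path through the spine remains.</t>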

</section> <!-- end of metric modifications for T0 -->
</section> <!-- end of section on other optimizations -->

<!-- 1 -->
<section title="Openfabric and Route Aggregation" toc="default">

<t>While schemes may be designed so reachability information can be aggregated in Openfabric deployments, this is not a recommended configuration.</t>

</section> <!-- end of aggregation section -->

<!-- 1 -->
<section title="Security Considerations" toc="default">

<t>This document outlines modifications to the IS-IS protocol for operation on large scale data center fabrics. While it does add new TLVs, and some local processing changes, it does not add any new security vulnerabilities to the operation of IS-IS. However, openfabric implementations SHOULD implement IS-IS cryptographic authentication, as described in <xref target="RFC5304" />, and should enable other security measures in accordance with best common practices for the IS-IS protocol.</t>

<t>If T0 intermediate systems are auto-detected using information outside Openfabric, it is possible to attack the calculations used for flooding reduction and auto-configuration of intermediate systems. For instance, if a request for an address pool is used as an indicator of an attached host, and hence receiving such a request causes an intermediate system to advertise itself as T0, it is possible for an attacker (or a simple mistake) to cause auto-configuration to fail. Any such auto-detection mechanisms SHOULD be secured using appropriate techniques, as described by any protocols or mechanisms used.</t>

</section> <!-- end of security considerations -->

</middle>

<back>

<references title="Normative References">

&RFC2119;
&RFC2629;
&RFC5120;
&RFC5301;
&RFC5303;
&RFC5305;
&RFC5308;
&RFC5309;
&RFC5311;
&RFC5316;
&RFC7356;
&RFC7981;

<?rfc include="reference.I-D.shen-isis-spine-leaf-ext.xml"?>

<reference anchor="ISO10589">
  <front>
    <title>Intermediate system to Intermediate system intra-domain
           routeing information exchange protocol for use in conjunction with
           the protocol for providing the connectionless-mode Network Service
           (ISO 8473)</title>

    <author>
      <organization abbrev="ISO">International Organization for Standardization</organization>
    </author>

    <date month="Nov" year="2002"/>
  </front>

  <seriesInfo name="ISO/IEC" value="10589:2002, Second Edition"/>
</reference>

</references> <!-- end of normative references -->

<references title="Informative References">

&RFC3277;
&RFC3719;
&RFC4271;
&RFC5304;
&RFC5440;
&RFC5449;
&RFC5614;
&RFC6232;
&RFC7182;
&RFC7921;

<?rfc include="reference.I-D.ietf-spring-segment-routing.xml"?>
<?rfc include="reference.I-D.ietf-isis-segment-routing-extensions.xml"?>

</references> <!-- end of informative references -->

</back>

</rfc>
