<?xml version="1.0" encoding="US-ASCII"?>
<?xml-stylesheet type='text/xsl'
href='http://xml.resource.org/authoring/rfc2629.xslt' ?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!ENTITY rfc1981 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.1981.xml">
<!ENTITY rfc2119 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml">
<!ENTITY rfc2131 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2131.xml">
<!ENTITY rfc2132 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2132.xml">
<!ENTITY rfc3775 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3775.xml">
<!ENTITY DSMIP PUBLIC "" "http://xml.resource.org/public/rfc/bibxml3/reference.I-D.ietf-mext-nemo-v4traversal.xml">
<!ENTITY HIOPT PUBLIC "" "http://xml.resource.org/public/rfc/bibxml3/reference.I-D.ietf-mip6-hiopt.xml">
]> 	
     
<?rfc iprnotified="yes" ?>
<?rfc toc="yes" ?>
<?rfc symrefs="yes" ?>
<rfc category="bcp" ipr="trust200902" docName='draft-ietf-nvo3-vmm-00.txt'> 


<front> 

    
    <title abbrev="VM Mobility Solution"> 
    Virtual Machine Mobility Protocol for L2 and L3 Overlay Networks
    </title>
    

   
    <author initials="B.S." surname="Sarikaya" fullname="Behcet Sarikaya">
    <organization>Huawei USA</organization>
    <address>
    <postal>
    <street>5340 Legacy Dr. Building 3</street>
    <street></street>
    <city>Plano</city> <region>TX</region> <code>75024</code>
    </postal>
    
    <email>sarikaya@ieee.org</email>
    </address>
    </author>   
    
        <author initials="L.D." surname="Dunbar" fullname="Linda Dunbar">
    <organization>Huawei USA</organization>
    <address>
    <postal>
    <street>5340 Legacy Dr. Building 3</street>
    <street></street>
    <city>Plano</city> <region>TX</region> <code>75024</code>
    </postal>
    
    <email>linda.dunbar@huawei.com</email>
    </address>
    </author>   

	     <author initials="B.K." surname="Khasnabish" fullname="Bhumip Khasnabish">
    <organization>ZTE (TX) Inc.</organization>
    <address>
    <postal>
    <street>55 Madison Avenue, Suite 160</street>
    <street></street>
    <city>Morristown</city> <region>NJ</region> <code>07960</code>
    </postal>
    
    <email>vumip1@gmail.com, bhumip.khasnabish@ztetx.com</email>
    </address>
    </author>   

       
		    <author fullname="Tom Herbert" initials="T.H." surname="Herbert">
      <organization>Quantonium</organization>

      <address>
        <postal>
          <street></street>

          <street></street>

          <city></city>

          <region></region>

          <code></code>
        </postal>

        <email>tom@herbertland.com</email>
      </address>
    </author>
    
          
		    <author fullname="Saumya Dikshit" initials="S.D." surname="Dikshit">
      <organization>Cisco Systems</organization>

      <address>
        <postal>
          <street>Cessna Business Park</street>

          <street></street>

          <city>Bangalore, Karnataka</city>

          <region>India</region>

          <code>560 087</code>
        </postal>

        <email>sadikshi@cisco.com</email>
      </address>
    </author>
    

   <date  year="2017"/>

   <workgroup></workgroup>

   <abstract>
   <t> 
	This document describes a virtual machine mobility protocol commonly used in data centers built with an overlay-based network virtualization approach. For Layer 2, it is based on using a Network Virtualization Authority (NVA)-Network Virtualization Edge (NVE) protocol to update Address Resolution Protocol (ARP) table or neighbor cache entries at the NVA, and on the source NVE tunneling in-flight packets to the destination NVE after the virtual machine moves from the source NVE to the destination NVE. For Layer 3, it is based on address and connection migration after the move.
	
   </t>
   </abstract>

</front>

<middle>

  
  <?rfc compact="yes" ?>  
  
   <section title='Introduction'>
   
      
   	<t>
    	Data center networks are increasingly used by telecom operators as well as by
    	enterprises. This document is concerned with overlay-based data center networks that support multitenancy. These networks are organized as one large Layer 2 network that is geographically distributed across several buildings.

    	In some cases, the geographical distribution spans Layer 2 boundaries. Connectivity across those boundaries can then be provided by the network virtualization edge (NVE) functioning as a Layer 3 gateway that routes across bridging domains, as in Warehouse Scale Computers (WSC).



   	</t>
   	<t>
   	  	Virtualization, which is used in almost all of today's data centers, enables many virtual machines (VMs) to run on a single physical computer or compute server. VMs need a hypervisor running on the physical compute server to provide them with shared processor, memory, and storage resources. Network connectivity is provided by the network virtualization edge (NVE) <xref target="I-D.ietf-nvo3-arch"/>, <xref target="I-D.ietf-nvo3-nve-nva-cp-req"/>. Being able to move VMs dynamically from one server to another, known as live migration, allows for dynamic load balancing or work distribution and is thus a highly desirable feature <xref target="RFC7364"/>.
   	  	</t>
   	  	
   	  	<t>
   	  	 There are many challenges and requirements related to the
   migration, mobility, and interconnection of Virtual Machines (VMs) and
   Virtual Network Elements (VNEs). Retaining IP addresses after a move is a key requirement <xref target="RFC7364"/>, because it is needed in order to maintain existing transport connections.
   	  	</t>
		<t>
		In L3 based data center networks, retaining IP addresses after a move is simply not possible. This complicates IP address management and, as a result, transport connections need to be reestablished.
		</t>
   <t>
  	In view of the many virtual machine mobility schemes that exist today, there is a desire to define a standard control plane protocol for virtual machine mobility. The protocol should be based on IPv4 or IPv6. This document specifies such a protocol for Layer 2 and Layer 3 data center networks.
    
   </t>

   	
   	
  </section> 
	<?rfc compact="yes" ?>
   <section title='Conventions and Terminology'>
     
      <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
      "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
      document are to be interpreted as described in <xref
      target="RFC2119">RFC 2119</xref> and <xref target="I-D.ietf-nvo3-arch"/>.</t>
   
   <t>This document uses the terminology defined in   <xref target="RFC7364"/>. In addition we make the following definitions: 
   </t>
   <t><list style="hanging">
   <t hangText="Task:">
   A task is a generalization of a virtual machine. A task in a container that can be migrated corresponds to a virtual machine that can be migrated. The terms task and virtual machine are used interchangeably in this document.
   </t>
   <t hangText="Hot VM Mobility:">
   A given VM can be moved from one server to another while in the running state.
   </t>
   <t hangText="Warm VM Mobility:">
   In warm VM mobility, the VM state is mirrored to the secondary server (or domain) at predefined (configurable) regular intervals. This reduces overhead and complexity, but it may lead to a situation in which the two servers do not contain exactly the same data (state information).
   </t>
   <t hangText="Cold VM Mobility:">
   A given VM can be moved from one server to another while in the stopped or
   suspended state.
   </t>
   <t hangText="Source NVE:">
   The old NVE, to which packets were forwarded before migration.
   </t>
   <t hangText="Destination NVE:">
   The new NVE after migration.
   </t>
   </list></t>
   

    
   </section>
 
 	 
   
   
  <?rfc compact="yes" ?>    
 
    <section anchor="reqs" title="Requirements">
 		 <t>
  		This section states requirements on data center network virtual machine mobility.
 		 </t>
 		 
  		 <t>
  		 The data center network SHOULD support virtual machine mobility in IPv6.
 		 </t>
 		 <t>
 		 IPv4 SHOULD also be supported in virtual machine mobility.
 		 </t>
 		 
 		 		
 		  <t>
  		The virtual machine mobility protocol MAY support host routes to accomplish virtualization.
  		</t>
  		<t>The virtual machine mobility protocol SHOULD NOT support triangular routing except for handling packets in flight.
  		</t>
  		<t>
  		The virtual machine mobility protocol SHOULD NOT need to use tunneling except for handling packets in flight.
 		 </t>
  
  
  </section>
   <?rfc compact="yes" ?>  
   <section title='Overview of the protocol'>
   		<t>
   		Layer 2 and Layer 3 protocols are described next. In the following sections, we examine more advanced features.
  		 </t>
   <section title="VM Migration">
   <t>
    				Being able to move virtual machines dynamically from one server to another allows for dynamic load balancing or work distribution and is thus a highly desirable feature. In a Layer 2 based data center approach, a virtual machine that moves to another server does not change its IP address. Because of this, an IP-based virtual machine mobility protocol is not needed. However, when a virtual machine moves, NVEs need to update the cache entries that associate the VM's Layer 2 or Medium Access Control (MAC) address with the IP address of the NVE serving it. Such an update enables an NVE to send outgoing MAC frames addressed to the virtual machine. VM movement across
 Layer 3 boundaries is not typical, but the same solution applies if the VM moves within the same link, as in WSCs.

    				</t>
    				
   <t> 
   A virtual machine moves from its source NVE to a new, destination NVE. The move is initiated by the source NVE and takes place within the same L2 link. The virtual machine's IP address(es) do not change, but the virtual machine is now under a new NVE, and previously communicating NVEs will continue to send their packets to the source NVE. The Address Resolution Protocol (ARP) cache in IPv4 <xref target="RFC0826"/> or the neighbor cache in IPv6 <xref target="RFC4861"/> in these NVEs needs to be updated.
     </t>
     		<t>
     		 It takes a few seconds for a VM to move from its source
   NVE to the new destination one.  During this period, a tunnel is needed
   so that source NVE forwards packets to the destination NVE.
     		</t>
  	
  	<t>
  	In IPv4, immediately after the move, the virtual machine sends a gratuitous ARP request message containing its IPv4 and Layer 2 or MAC address at its new NVE, the destination NVE. The destination address of this message is the broadcast address.
  	
  	The destination NVE receives this message and should update the VM's ARP entry in the central directory at the NVA. The NVE asks the NVA to update its mappings to record the IPv4 address of the VM
   along with the MAC address of the VM and the IPv4 address of the NVE.
  	An NVE-to-NVA protocol is used for this purpose <xref target="I-D.ietf-nvo3-arch"/>.
  	</t> 
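  	<t>
  	The mapping update described above can be sketched as follows. This is only an illustration under stated assumptions: the names Mapping, Nva, update, lookup, and on_gratuitous_arp are hypothetical, the NVA directory is modeled as an in-memory table keyed by VM IP address, and the NVE-to-NVA protocol exchange is reduced to a function call. The addresses used with it are documentation examples.
  	</t>
  	<figure><artwork><![CDATA[
```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Mapping:
    vm_mac: str  # Layer 2 (MAC) address of the VM
    nve_ip: str  # IP address of the NVE currently serving the VM


class Nva:
    """Hypothetical central directory, keyed by VM IP address."""

    def __init__(self) -> None:
        self._by_vm_ip: Dict[str, Mapping] = {}

    def update(self, vm_ip: str, vm_mac: str,
               nve_ip: str) -> Optional[Mapping]:
        """Record the new mapping and return the one it replaced."""
        previous = self._by_vm_ip.get(vm_ip)
        self._by_vm_ip[vm_ip] = Mapping(vm_mac, nve_ip)
        return previous

    def lookup(self, vm_ip: str) -> Optional[Mapping]:
        return self._by_vm_ip.get(vm_ip)


def on_gratuitous_arp(nva: Nva, nve_ip: str, sender_ip: str,
                      sender_mac: str) -> Optional[Mapping]:
    """Destination NVE reports the VM's gratuitous ARP to the NVA."""
    return nva.update(sender_ip, sender_mac, nve_ip)
```
]]></artwork></figure>
  	<t>
  	In this sketch, the value returned by on_gratuitous_arp identifies the previous (source) NVE, which is the NVE that would need to tunnel in-flight packets.
  	</t>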
  	<t>
  	Reverse ARP (RARP), which enables a host to discover its IPv4 address from a local server when it boots <xref target="RFC0903"/>, is not used by VMs because the VM already knows its IPv4 address. An IPv4/v6 address is assigned to a newly created VM, possibly using the Dynamic Host Configuration Protocol (DHCP).
  	There are some vendor deployments (diskless systems or systems without configuration
 files) in which VM users, i.e., end-user clients, ask for the same MAC address upon
 migration. This can be achieved by the client sending a RARP "request reverse" message that carries
  the old MAC address and asks for
 an IP address allocation. The server, in this case the new NVE, needs to communicate with the NVA, just as in the gratuitous ARP case, to ensure that the same IPv4 address is assigned to the VM. The NVA uses the MAC address as the key to search the ARP cache for the IP address and passes it to the new NVE, which in turn sends a RARP "reply reverse" message. This completes IP address assignment to the migrating VM.
  	</t>
  	<t>
  	
  	 All NVEs communicating with this virtual machine still use the old ARP entry. If any VM behind those NVEs needs to talk to the VM that is now at the destination NVE, it uses the old ARP entry.
  	 Thus the packets are delivered to the source NVE. The source NVE MUST tunnel these in-flight packets to the destination NVE.
  	  
  	</t>
  	<t>
  	When an ARP entry in those VMs times out, their corresponding NVEs should consult the NVA for an update.
  	</t>
  	
		<t>
		IPv6 operation is slightly different:
		</t>
  		<t>
  		In IPv6, immediately after the move, the virtual machine sends an unsolicited neighbor advertisement message containing its IPv6 address and Layer 2 MAC address at its new NVE, the destination NVE. This message is sent to the IPv6 Solicited Node Multicast Address corresponding to the target address, which is the VM's IPv6 address.
  		
  		The destination NVE receives this message and should update the VM's neighbor cache entry in the central directory at the NVA.
  		
  		The IPv6 address of the VM, the MAC address of the VM, and the IPv6 address of the NVE are recorded in the entry.
  		An NVE-to-NVA protocol is used for this purpose <xref target="I-D.ietf-nvo3-arch"/>.
  		
  		
  		</t>
  		<t>
  		All NVEs communicating with this virtual machine still use the old neighbor cache entry. If any VM behind those NVEs needs to talk to the VM that is now at the destination NVE, it uses the old neighbor cache entry.
  	 Thus the packets are delivered to the source NVE. The source NVE MUST tunnel these in-flight packets to the destination NVE.
  	  
  	</t>
  	<t>
  	When a neighbor cache entry in those VMs times out, their corresponding NVEs should consult the NVA for an update.
  	</t>
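  	<t>
  	The timeout behavior above, for both ARP and neighbor cache entries, can be illustrated with a small sketch. It assumes a hypothetical per-NVE cache (NveCache) that maps a VM IP address to the IP address of the serving NVE, an entry lifetime chosen by the operator, and a callable (query_nva) standing in for the NVE-to-NVA lookup. None of these names come from the NVO3 documents.
  	</t>
  	<figure><artwork><![CDATA[
```python
from typing import Callable, Dict, Optional, Tuple


class NveCache:
    """Hypothetical cache: VM IP -> (serving NVE IP, expiry time)."""

    def __init__(self, query_nva: Callable[[str], Optional[str]],
                 lifetime: float) -> None:
        self._query_nva = query_nva
        self._lifetime = lifetime
        self._entries: Dict[str, Tuple[str, float]] = {}

    def resolve(self, vm_ip: str, now: float) -> Optional[str]:
        entry = self._entries.get(vm_ip)
        if entry is not None and now < entry[1]:
            # Entry is still valid; it may be stale right after a
            # move, in which case the source NVE tunnels the packet.
            return entry[0]
        # Entry is missing or timed out: ask the NVA for an update.
        nve_ip = self._query_nva(vm_ip)
        if nve_ip is None:
            self._entries.pop(vm_ip, None)
            return None
        self._entries[vm_ip] = (nve_ip, now + self._lifetime)
        return nve_ip
```
]]></artwork></figure>
  	<t>
  	Until the entry expires, resolve keeps returning the source NVE, which is why the source NVE has to tunnel in-flight packets during that window.
  	</t>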
  	</section>
  	<section title="Task Migration">
  		<t>
  		Virtualization in L2 based data center networks quickly becomes prohibitive because ARP/neighbor
 caches do not scale. Scaling can be accomplished seamlessly in L3 data center networks by giving each virtual network an IP
 subnet and a default route that points to the NVE. This avoids an explosion
 of ARP/neighbor cache entries in guests (only one ARP/neighbor cache entry is needed, for the default route), and the
 Ethernet header is not needed in the encapsulation <xref target="RFC7348"/>, which saves at least 16
 bytes.
   </t>		
   		<t>
   		In L3 based data center networks, since the IP address of the task has to change after a move, an IP-based task migration protocol is needed. The protocol most commonly used is identifier-locator addressing (ILA) <xref target="I-D.herbert-nvo3-ila"/>. Address and connection migration introduce complications in the task migration protocol, as discussed below. In particular, informing the communicating hosts of the migration becomes a major issue.
   		Also, because broadcasting is not available in L3 based networks,
  		the multicast of neighbor solicitations in IPv6 would
 need to be emulated.
   		</t>
   		<t>
   		Task migration involves the following steps:
   		</t>
   		<t><list style="numbers">
   		<t>
   		Stop running the task.
   		</t>
   		<t>
   		Package the runtime state of the task.
   		</t>
   		<t>
   		Send the runtime state of the task to the destination NVE where the
   		task is to run.
   		</t>
   		<t>
   		Instantiate the task's state on the new machine.
   		</t>
   		<t>
   		Start the task, continuing from the point at which
   		it was stopped.
   		</t>
   		</list></t>
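   		<t>
   		The task migration steps above can be sketched as follows. This is a toy model under stated assumptions: the task's runtime state is a single counter, the state is packaged as JSON, and the transfer to the destination is a callable named send. The names Task, migrate_out, and migrate_in are hypothetical.
   		</t>
   		<figure><artwork><![CDATA[
```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class Task:
    counter: int = 0       # stand-in for the task's runtime state
    running: bool = False

    def step(self) -> None:
        if not self.running:
            raise RuntimeError("task is stopped")
        self.counter += 1


def migrate_out(task: Task, send: Callable[[bytes], None]) -> None:
    task.running = False                       # 1. stop the task
    state = {"counter": task.counter}
    blob = json.dumps(state).encode("utf-8")   # 2. package the state
    send(blob)                                 # 3. send it


def migrate_in(blob: bytes) -> Task:
    state = json.loads(blob.decode("utf-8"))
    task = Task(counter=state["counter"])      # 4. instantiate state
    task.running = True                        # 5. resume the task
    return task
```
]]></artwork></figure>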
   		<t>
   		Address migration and connection migration in moving tasks are addressed next. 
   		</t>
   		<section anchor="amcm" title="Address and Connection Migration in Task Migration">
    				<t>
		  Address migration is achieved as follows:
		</t>
		<t><list style="numbers">
		<t>
		Configure the IPv4/v6 address on the target host.
		</t>
		<t>
		Suspend use of the address on the old host. This includes
		handling established connections. State
		may be established to drop packets or to send an ICMPv4 or ICMPv6 destination unreachable message when packets to the migrated address are received.
		</t>
		<t>
		Push the
		new mapping to hosts. Communicating hosts will learn of the new mapping via a control
		plane, either by participating in a protocol for mapping
		propagation or by getting the new mapping from a central database such as the Domain Name System (DNS).
		</t>
		</list></t>
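		<t>
		A minimal sketch of the address migration steps follows. It assumes hypothetical names (Host, Action, migrate_address), models the mapping database as a plain dictionary from address to host name, and makes the choice between dropping and answering with a destination unreachable message a per-host setting. It is not taken from ILA or any other specification.
		</t>
		<figure><artwork><![CDATA[
```python
from enum import Enum
from typing import Dict, Set


class Action(Enum):
    DELIVER = "deliver"
    DROP = "drop"
    UNREACHABLE = "icmp-destination-unreachable"


class Host:
    def __init__(self, name: str, send_unreachable: bool) -> None:
        self.name = name
        self.send_unreachable = send_unreachable
        self.addresses: Set[str] = set()   # addresses in use
        self.suspended: Set[str] = set()   # addresses migrated away

    def on_packet(self, dst: str) -> Action:
        if dst in self.addresses:
            return Action.DELIVER
        if dst in self.suspended and self.send_unreachable:
            return Action.UNREACHABLE
        return Action.DROP


def migrate_address(addr: str, old: Host, new: Host,
                    mappings: Dict[str, str]) -> None:
    new.addresses.add(addr)       # configure on the target host
    old.addresses.discard(addr)   # suspend use on the old host
    old.suspended.add(addr)
    mappings[addr] = new.name     # push the new mapping
```
]]></artwork></figure>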
				<t>
			Connection migration involves reestablishing existing TCP connections of the task in the new place. 
		</t>
			<t>
		The simplest course of action is to drop TCP connections across a
   migration. Since migrations should be relatively rare events, it is
   conceivable that TCP connections could be automatically closed in the
   network stack during a migration event. If the applications running
   are known to handle this gracefully (i.e. reopen dropped connections)
   then this may be viable.
		</t>
		<t>
		A more involved approach to connection migration entails pausing the connection, packaging
   the connection state and sending it to the target, instantiating the connection
   state in the peer stack, and restarting the connection. From the time
   the connection is paused to the time it is running again in the new
   stack, packets received for the connection should be silently
   dropped. For some period of time, the old stack will need to keep a
   record of the migrated connection. If it receives a packet, it should
   either silently drop the packet or forward it to the new location, similar to the handling described in <xref target="flight"/>.
		</t>
    		</section>
   	</section>
   
   </section>
 


	<section anchor="flight" title="Handling Packets in Flight">
   				  		<t>
   		The source hypervisor may receive packets from the virtual machine's ongoing communications. These packets should not be lost; they should be sent to the destination hypervisor to be delivered to the virtual machine. The steps involved in handling packets in flight are as follows:
   		</t>
   		
   		 <t><list style="hanging">
   		 <t hangText="Preparation Step">
   		
   		  It takes some time, possibly a few seconds for a VM to move from its source
   hypervisor to a new destination one.  During this period, a tunnel needs to be established
   so that the source NVE forwards packets to the destination NVE.
   
			
			</t>

  		<t hangText="Tunnel Establishment - IPv6">
        	
        	In-flight packets are tunneled to the destination NVE using an encapsulation protocol such as VXLAN in IPv6. The source NVE gets the destination NVE address from the NVA in the request to move the virtual machine.
   
   			 </t>
        			<t hangText="Tunnel Establishment - IPv4">
        	
        		In-flight packets are tunneled to the destination NVE using an encapsulation protocol such as VXLAN in IPv4. The source NVE gets the destination NVE address from the NVA when the NVA requests the NVE to move the virtual machine.
   
   				 </t>
        					<t hangText="Tunneling Packets - IPv6">
        	
        	IPv6 packets received for the migrating virtual machine are encapsulated in an IPv6 header at the source NVE. The destination NVE decapsulates each packet and sends the IPv6 packet to the migrating VM.
        	 </t>
        	
        	<t hangText="Tunneling Packets - IPv4">
        	 
   		  		
   				IPv4 packets received for the migrating virtual machine are encapsulated in an IPv4 header at the source NVE. The destination NVE decapsulates each packet and sends the IPv4 packet to the migrating VM.
   				        	  </t>
   				        	<t hangText="Stop Tunneling Packets">
        	
        	Tunneling stops when the source NVE stops receiving packets destined to the virtual machine that has just moved to the destination NVE.
   					 </t>
   </list></t>
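   		<t>
   		The steps above can be sketched from the point of view of the source NVE. The sketch makes several assumptions that are not in this document: encapsulation is modeled as a wrapper object rather than a real VXLAN header, "stops receiving packets" is approximated by an idle timeout, and all names (Packet, Encapsulated, SourceNve, decapsulate) are hypothetical.
   		</t>
   		<figure><artwork><![CDATA[
```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Packet:
    dst_ip: str
    payload: bytes


@dataclass(frozen=True)
class Encapsulated:
    outer_src: str   # source NVE address
    outer_dst: str   # destination NVE address
    inner: Packet


class SourceNve:
    def __init__(self, my_ip: str, idle_timeout: float) -> None:
        self.my_ip = my_ip
        self.idle_timeout = idle_timeout
        self._tunnels: Dict[str, str] = {}  # moved VM IP -> dest NVE
        self._last_seen: Dict[str, float] = {}

    def vm_moved(self, vm_ip: str, dest_nve_ip: str,
                 now: float) -> None:
        """Tunnel establishment: NVA supplied the destination NVE."""
        self._tunnels[vm_ip] = dest_nve_ip
        self._last_seen[vm_ip] = now

    def forward(self, pkt: Packet,
                now: float) -> Optional[Encapsulated]:
        """Encapsulate a packet in flight; None if no tunnel exists."""
        dest = self._tunnels.get(pkt.dst_ip)
        if dest is None:
            return None
        self._last_seen[pkt.dst_ip] = now
        return Encapsulated(self.my_ip, dest, pkt)

    def expire(self, now: float) -> None:
        """Stop tunneling for VMs with no traffic for idle_timeout."""
        idle = [ip for ip, seen in self._last_seen.items()
                if now - seen >= self.idle_timeout]
        for vm_ip in idle:
            del self._tunnels[vm_ip]
            del self._last_seen[vm_ip]


def decapsulate(frame: Encapsulated) -> Packet:
    """Destination NVE strips the outer header."""
    return frame.inner
```
]]></artwork></figure>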
   		
   		
   		</section>

	
		
    		 <?rfc compact="yes" ?>
    		<section anchor="tunnel" title="Moving Local State of VM">
				<t>
    	  	After the VM mobility related signaling (VM Mobility Registration Request/Reply), the virtual machine state needs to be transferred to the destination hypervisor. The state includes the VM's memory and file system. The source NVE opens a TCP connection with the destination NVE, over which the VM's memory state is transferred.
    		</t>
    	
    			<t>
			The file system or local storage is more complicated to transfer. The transfer should ensure consistency, i.e., the VM at the destination should find the same file system it had at the source. Precopying is a commonly used technique for transferring the file system. First, the whole disk image is transferred while the VM continues to run. After the VM is moved, any changes to the file system are packaged together and sent to the destination hypervisor, which applies these changes to the file system locally at the destination.
			</t>
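			<t>
			The precopy technique can be illustrated with a small sketch. It assumes the disk image is a mapping from block number to block contents and that the source tracks which blocks were written after the bulk copy began. The names Disk, bulk_copy, final_delta, and apply_delta are hypothetical, and real hypervisors typically iterate the copy phase several times.
			</t>
			<figure><artwork><![CDATA[
```python
from typing import Dict, Set


class Disk:
    def __init__(self, blocks: Dict[int, bytes]) -> None:
        self.blocks = dict(blocks)
        self.dirty: Set[int] = set()  # blocks written since bulk copy

    def write(self, index: int, data: bytes) -> None:
        self.blocks[index] = data
        self.dirty.add(index)


def bulk_copy(src: Disk) -> Dict[int, bytes]:
    """Phase 1: copy the whole image while the VM keeps running."""
    src.dirty.clear()  # track only writes made from now on
    return dict(src.blocks)


def final_delta(src: Disk) -> Dict[int, bytes]:
    """Phase 2: after the move, package the changed blocks."""
    return {i: src.blocks[i] for i in sorted(src.dirty)}


def apply_delta(dst_blocks: Dict[int, bytes],
                delta: Dict[int, bytes]) -> None:
    """Destination hypervisor applies the changes locally."""
    dst_blocks.update(delta)
```
]]></artwork></figure>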

		
    		</section>
    		
    		 <?rfc compact="yes" ?>
    		<section anchor="hotcold" title="Handling of Hot, Warm and Cold Virtual Machine Mobility">
    		
    				<t>
    				Cold virtual machine mobility is facilitated by the VM initially sending an ARP or Neighbor Discovery message at the destination NVE, while the source NVE does not receive any in-flight packets.
		  Cold VM mobility also allows all previous source NVEs and all communicating NVEs to time out the ARP/neighbor cache entries of the VM, after which either the NVA pushes the updated ARP/neighbor cache entry to the NVEs or the NVEs pull it from the NVA.
		  </t>
		  <t>
		  The VMs that are used for cold standby receive scheduled backup information, but less frequently than they would under the warm standby option. Therefore, the cold mobility option can be used for non-critical applications and services.
			</t>
			<t>
In the case of the warm standby option, the backup VMs receive backup information at regular intervals. The duration of the interval determines the warmth of the standby option: the longer the interval, the less warm (and hence the colder) the standby option becomes.
			</t>
			<t>
In the case of the hot standby option, the VMs in both the primary and secondary domains have identical information and can provide services simultaneously, as in a load-sharing mode of operation. If the VMs in the primary domain fail, there is no need to actively move the VMs to the secondary domain because the VMs in the secondary domain already contain identical information. The hot standby option is the most costly mechanism for providing redundancy, and hence it is used only for mission-critical applications and services.

		</t>
		
    		</section>
    		
    		
    		 <?rfc compact="yes" ?>
    		<section anchor="VMoper" title="Virtual Machine Operation">
    		<t>
    		Virtual machines are not involved in any mobility signaling. Once a VM moves to the destination NVE, its IP address does not change, and the VM should be able to continue to receive packets sent to its address(es). This happens in hot VM mobility scenarios.
    		</t>
    			<t>
    		The virtual machine sends a gratuitous Address Resolution Protocol or unsolicited Neighbor Advertisement message upstream after each move.
    		</t>
    		
    		<section anchor="bumip" title="Virtual Machine Lifecycle Management">
    		<t>
    		Managing the lifecycle of a VM includes creating the VM with all of the required resources and managing them seamlessly as the VM migrates from one service to another during its lifetime. The on-boarding process includes the following steps:
    		</t>
    		   <t><list style="numbers">
      		<t>
      		Sending an allowed (authorized/authenticated) request to Network Virtualization
   Authority (NVA) in an acceptable format with mandatory/optional virtualized resources {cpu, memory, storage, process/thread support, etc.} and interface information
      		
      		</t>
      		
      		<t>
      		
      		Receiving an acknowledgement from the NVA regarding availability and usability of virtualized resources and interface package 
      		</t> 
      		
      		<t>
      		Sending a confirmation message to the  NVA with request for approval to adapt/adjust/modify the virtualized resources and interface package  for utilization in a service.
      		</t>
      
    		
    		   </list> </t>
    		</section>
    		
    		</section>
    		
		
		

  <?rfc compact="yes" ?>
  
  <section title='Security Considerations'>
  <t>  Security threats for the data and control plane are discussed in <xref target="I-D.ietf-nvo3-arch"/>. Several issues
 create problems in a multi-tenant environment. In L2 based data center networks, VXLAN lacks security, and corruption of the VXLAN Network Identifier (VNI) can lead to
 delivery to the wrong tenant. Also, ARP in IPv4 and ND in IPv6 are not secure, especially
 if gratuitous versions are accepted. When these are carried over a UDP
 encapsulation such as VXLAN, the problem is worse, since it is trivial
 for an untrusted application to spoof UDP packets.
 </t>
 
 		<t>
 		In L3 based data center networks, the problem of address spoofing may arise. As a result, the destinations may contain untrusted hosts. This usually happens in cases such as virtual machines running third-party applications, and it requires the use of stronger security mechanisms.
 		
 		</t>
    </section> 
   
    	<?rfc compact="yes" ?>
	<section anchor='iana' title="IANA Considerations">
	<t>
	This document makes no request to IANA.
   </t>
     
	</section>

  <?rfc compact="yes" ?>  

  <section title='Acknowledgements'>
  <t>
    The authors are grateful to Qiang Zu and Andrew Malis for their helpful comments.
  </t>
  </section>

			<section anchor="log" title="Change Log">
			<t>
			<list style="symbols">
			<t>
			submitted version -00 as a working group draft after adoption
			</t>
			
			</list>
			</t>
		
		</section>
 </middle>


 <back>
 

 <references title='Normative References'> 
 			<?rfc include='reference.RFC.0826'?>
 			<?rfc include='reference.RFC.0903'?>
 		    <?rfc include='reference.RFC.2119'?>  
  		         <?rfc include='reference.RFC.2629'?>
  		          <?rfc include='reference.RFC.4861'?>  
  		            
 				
 				
 				
 				
 				
 				

 				<?rfc include='reference.I-D.ietf-nvo3-arch'?>
 				<?rfc include='reference.RFC.7348'?>
 				<?rfc include='reference.RFC.7364'?>
 				
 </references>


 <references title='Informative references'>
 
  	
  		
    	
    	   	<?rfc include='reference.I-D.ietf-nvo3-nve-nva-cp-req'?>
    	  	
   	   	
   	   	
   	   	<?rfc include='reference.I-D.herbert-nvo3-ila'?>
   	   	
   		
  </references>
  
  <?rfc compact="no" ?>
 
 </back> 
 
 </rfc>
