Network Working Group                                          D. Bryant
Request for Comments: 2166                                     3Com Corp
Category: Informational                                      P. Brittain
                                                    Data Connection Ltd.
                                                               June 1997

                      APPN Implementer's Workshop
                         Closed Pages Document

                         DLSw v2.0 Enhancements

Status of this Memo

This memo provides information for the Internet community. This memo does not specify an Internet standard of any kind. Distribution of this memo is unlimited.

Abstract

This document specifies

- a set of extensions to RFC 1795 designed to improve the scalability of DLSw

- clarifications to RFC 1795 in the light of the implementation experience to date.

It is assumed that the reader is familiar with DLSw and RFC 1795. No effort has been made to explain these existing protocols or associated terminology.

This document was developed in the DLSw Related Interest Group (RIG) of the APPN Implementers Workshop (AIW). If you would like to participate in future DLSw discussions, please subscribe to the DLSw RIG mailing lists by sending a mail to majordomo@raleigh.ibm.com specifying 'subscribe aiw-dlsw' as the body of the message.

Table of Contents

   1. INTRODUCTION
   2. HALT REASON CODES
   3. SCOPE OF SCALABILITY ENHANCEMENTS
   4. OVERVIEW OF SCALABILITY ENHANCEMENTS
   5. MULTICAST GROUPS AND ADDRESSING
      5.1 USING MULTICAST GROUPS
      5.2 DLSW MULTICAST ADDRESSES
   6. DLSW MESSAGE TRANSPORTS
      6.1 TCP/IP CONNECTIONS ON DEMAND
         6.1.1 TCP CONNECTIONS ON DEMAND RACE CONDITIONS

Bryant & Brittain            Informational                      [Page 1]
RFC 2166              APPN Implementer's Workshop              June 1997

      6.2 SINGLE SESSION TCP/IP CONNECTIONS
         6.2.1 EXPEDITED SINGLE SESSION TCP/IP CONNECTIONS
            6.2.1.1 TCP PORT NUMBERS
            6.2.1.2 TCP CONNECTION SETUP
            6.2.1.3 SINGLE SESSION SETUP RACE CONDITIONS
            6.2.1.4 TCP CONNECTIONS WITH NON-MULTICAST CAPABLE DLSW PEERS
      6.3 UDP DATAGRAMS
         6.3.1 VENDOR SPECIFIC FUNCTIONS OVER UDP
         6.3.2 UNICAST UDP DATAGRAMS
         6.3.3 MULTICAST UDP DATAGRAMS
      6.4 UNICAST UDP DATAGRAMS IN LIEU OF IP MULTICAST
      6.5 TCP TRANSPORT
   7. MIGRATION SUPPORT
      7.1 CAPABILITIES EXCHANGE
      7.2 CONNECTING TO NON-MULTICAST CAPABLE NODES
      7.3 COMMUNICATING WITH MULTICAST CAPABLE NODES
   8. SNA SUPPORT
      8.1 ADDRESS RESOLUTION
      8.2 EXPLORER FRAMES
      8.3 CIRCUIT SETUP
      8.4 EXAMPLE SNA SSP MESSAGE SEQUENCE
      8.5 UDP RELIABILITY
         8.5.1 RETRIES
   9. NETBIOS
      9.1 ADDRESS RESOLUTION
      9.2 EXPLORER FRAMES
      9.3 CIRCUIT SETUP
      9.4 EXAMPLE NETBIOS SSP MESSAGE SEQUENCE
      9.5 MULTICAST RELIABILITY AND RETRIES
   10. SEQUENCING
   11. FRAME FORMATS
      11.1 MULTICAST CAPABILITIES CONTROL VECTOR
         11.1.1 DLSW CAPABILITIES NEGATIVE RESPONSE
      11.2 UDP PACKETS
      11.3 VENDOR SPECIFIC UDP PACKETS
   12. COMPLIANCE STATEMENT
   13. SECURITY CONSIDERATIONS
   14. ACKNOWLEDGEMENTS
   15. AUTHORS' ADDRESSES
   16. APPENDIX - CLARIFICATIONS TO RFC 1795

1. Introduction

This document defines v2.0 of Data Link Switching (DLSw) in the form of a set of enhancements to RFC 1795. These enhancements are designed to be fully backward compatible with existing RFC 1795 implementations. As a compatible set of enhancements to RFC 1795, this document does not replace or supersede RFC 1795.

The bulk of these enhancements address scalability issues in DLSw v1.0. Reason codes have also been added to the HALT_DL and HALT_DL_NOACK SSP messages in order to improve the diagnostic information available.

Finally, the appendix to this document lists a number of clarifications to RFC 1795 where the implementation experience to date has shown that the original RFC was ambiguous or unclear. These clarifications should be read alongside RFC 1795 to obtain a full specification of the base v1.0 DLSw standard.

2. HALT Reason codes

RFC 1795 provides no mechanism for a DLSw to communicate to its peer the reason for dropping a circuit. DLSw v2.0 adds reason code fields to the HALT_DL and HALT_DL_NOACK SSP messages to carry this information.

The reason code is carried as 6 bytes of data after the existing SSP header. The format of these bytes is as shown below.

   Byte    Description
   0-1     Generic HALT reason code in byte normal format
   2-5     Vendor-specific detailed reason code

The generic HALT reason code takes one of the following decimal values (which are chosen to match the disconnect reason codes specified in the DLSw MIB).

   1 - Unknown error
   2 - Received DISC from end-station
   3 - Detected DLC error with end-station
   4 - Circuit-level protocol error (e.g., pacing)
   5 - Operator-initiated (mgt station or local console)

The vendor-specific detailed reason code may take any value.

All V2.0 DLSws must include this information on all HALT_DL and HALT_DL_NOACK messages sent to v2.0 DLSw peers. For backwards compatibility with RFC 1795, DLSw V2.0 implementations must also accept a HALT_DL or HALT_DL_NOACK message received from a DLSw peer that does not carry this information (i.e. RFC 1795 format for these SSP messages).

3. Scope of Scalability Enhancements

The DLSw Scalability group of the AIW identified a number of scalability issues associated with existing DLSw protocols as defined in RFC 1795:

- Administration

RFC 1795 implies the need to define the transport address of all DLSw peers at each DLSw. In highly meshed situations (such as those often found in NetBIOS networks), the resultant administrative burden is undesirable.

- Address Resolution

RFC 1795 defines point to point TCP (or other reliable transport protocol) connections between DLSw peers. When attempting to discover the location of an unknown resource, a DLSw sends an address resolution packet to each DLSw peer over these connections. In highly meshed configurations, this can result in a very large number of packets in the transport network. Although each packet is sent individually to each DLSw peer, they are each identical in nature. Thus the transport network is burdened with excessive numbers of identical packets.
Since the transport network is most commonly a wide area network, where bandwidth is considered a precious resource, this packet duplication is undesirable.

- Broadcast Packets

In addition to the address resolution packets described above, RFC 1795 also propagates NetBIOS broadcast packets into the transport network. The UI frames of NetBIOS are sent as LAN broadcast packets. RFC 1795 propagates these packets over the point to point transport connections to each DLSw peer. In the same manner as above, this creates a large number of identical packets in the transport network, and hence is undesirable.

Since NetBIOS UI frames can be sent by applications, it is difficult to predict or control the rate and quantity of such traffic. This compounds the undesirability of the existing RFC 1795 propagation method for these packets.

- TCP (transport connection) Overhead

As defined in RFC 1795, each DLSw maintains a transport connection to its DLSw peers. Each transport connection guarantees in-order packet delivery. This is accomplished using acknowledgment and sequencing algorithms which require both CPU and memory at the DLSw endpoints in direct proportion to the number of transport connections. The DLSw Scalability group has identified two scenarios where the number of transport connections can become significant, resulting in excessive overhead and corresponding equipment costs (memory and CPU).

The first scenario is found in highly meshed DLSw configurations where the number of transport connections approximates n^2 (where n is the number of DLSw peers). This is typically found in DLSw networks supporting NetBIOS.

The second scenario is found in networks where many remote locations communicate to few central sites. In this case, the central sites must support n transport connections (where n is the number of remote sites).
In both scenarios the resultant transport connection overhead is considered undesirable depending upon the value of n.

- LLC2 overhead

RFC 1795 specifies that each DLSw provides local termination for the LLC2 (SDLC or other SNA reliable data link protocol) sessions traversing the SSP. Because these reliable data links provide guaranteed in-order packet delivery, the memory and CPU overhead of maintaining these connections can also become significant. This is particularly undesirable in the second scenario described above, because the number of reliable connections maintained at the central site is the aggregate of the connections maintained at each remote site.

It is not the intent of this document to address all the undesirable scalability issues associated with RFC 1795. This paper identifies protocol enhancements to RFC 1795 using the inherent multicast capabilities of the underlying transport network to improve the scalability of RFC 1795. It is believed that the enhancements defined herein address many of the issues identified above, such as administration, address resolution, broadcast packets, and, to a lesser extent, transport overhead. This paper does not address LLC2 overhead. Subsequent efforts by the AIW and/or DLSw Scalability group may address the unresolved scalability issues.

While it is the intent of this paper to accommodate all transport protocols as best as is possible, it is recognized that the multicast capabilities of many protocols are not yet well defined, understood, or implemented. Since TCP is the most prevalent DLSw transport protocol in use today, the DLSw Scalability group has chosen to focus its definition around IP based multicast services. This document only addresses the implementation detail of IP based multicast services.
This proposal does not consider the impacts of IPv6 as this was considered too far from widespread use at the time of writing.

4. Overview of Scalability Enhancements

This paper describes the use of multicast services within the transport network to improve the scalability of DLSw based networking. There are only a few main components of this proposal:

- Single session TCP connections

RFC 1795 defines a negotiation protocol for DLSw peers to choose either two unidirectional or one bi-directional TCP connection. DLSws implementing the enhancements described in this document must support and use (whenever required and possible) a single bi-directional TCP connection between DLSw peers. That is to say that the single tunnel negotiation support of RFC 1795 is a prerequisite function to this set of enhancements. Use of two unidirectional TCP connections is only allowed (and required) for migration purposes when communicating with DLSw peers that do not implement these enhancements.

This document also specifies a faster method for bringing up a single TCP connection between two DLSw peers than the negotiation used in RFC 1795. This faster method, detailed in section 6.2.1, must be used where both peers are known to support DLSw v2.0.

- TCP connections on demand

Two DLSw peers using these enhancements will only establish a TCP connection when necessary. SSP connections to DLSw peers which do not implement these enhancements are assumed to be established by the means defined in RFC 1795.

DLSws implementing v2.0 utilize UDP based transport services to send address resolution packets (CANUREACH_ex, NETBIOS_NQ_ex, etc.). If a positive response is received, then a TCP connection is only established to the associated DLSw peer if one does not already exist.

Correspondingly, TCP connections are brought down when there are no circuits to a DLSw peer for an implementation defined period of time.

- Address resolution through UDP

The main thrust of this paper is to utilize non-reliable transport and the inherent efficiencies of multicast protocols whenever possible and applicable to reduce network overhead. Accordingly, the address resolution protocols of SNA and NetBIOS are sent over the non-reliable transport of IP, namely UDP.

In addition, IP multicast/unicast services are used whenever address resolution packets must be sent to multiple destinations. This avoids the need to maintain TCP SSP connections between two DLSw peers when no circuits are active. CANUREACH_ex and ICANREACH_ex packets can be sent to all the appropriate DLSw peers without the need for pre-configured peers or pre-established TCP/IP connections. In addition, most multicast services (including IP's MOSPF, DVMRP, MIP, etc.) replicate and propagate messages only as necessary to deliver to all multicast members. This avoids duplication and excessive bandwidth consumption in the transport network.

To further optimize the use of WAN resources, address resolution responses are sent in a directed fashion (i.e., unicast) via UDP transport whenever possible. This avoids the need to set up or maintain TCP connections when they are not required. It also avoids the bandwidth costs associated with broadcasting.

Note: It is also permitted to send some address resolution traffic over existing TCP connections. The conditions under which this is permitted are detailed in section 7.

- NetBIOS broadcasts over UDP

In the same manner as above, NetBIOS broadcast packets are sent via UDP (unicast and multicast) whenever possible and appropriate. This avoids the need to establish TCP connections between DLSw peers when there are no circuits required. In addition, bandwidth in the transport network is conserved by utilizing the efficiencies inherent to multicast service implementation.
Details covering identification of these packets and proper propagation methods are described in section 10.

5. Multicast Groups and Addressing

IP multicast services provide an unreliable datagram oriented delivery service to multiple parties. Communication is accomplished by sending and/or listening to specific 'multicast' addresses. When a given node sends a packet to a specific address (defined to be within the multicast address range), the IP network (unreliably) delivers the packet to every node listening on that address.

Thus, DLSws can make use of this service by simply sending and receiving (i.e., listening for) packets on the appropriate multicast addresses. With careful planning and implementation, networks can be effectively partitioned and network overhead controlled by sending and listening on different address groups. It is not the intent of this paper to define or describe the techniques by which this can be accomplished. It is expected that the networking industry (vendors and end users alike) will determine the most appropriate ways to make use of the functions provided by use of DLSw multicast transport services.

5.1 Using Multicast Groups

The multicast addressing as described above can be effectively used to limit the amount of broadcast/multicast traffic in the network. It is not the intent of this document to describe how individual DLSw/SSP implementations would assign or choose group addresses. The specifics of how this is done and exposed to the end user are an issue for the specific implementor. In order to provide for multivendor interoperability and simplicity of configuration, however, this paper defines a single IP multicast address, 224.0.10.000, to be used as a default DLSw multicast address. If a given implementation chooses to provide a default multicast address, it is recommended this address be used.
In addition, this address should be used for both transmitting and receiving of multicast SSP messages. Implementation of a default multicast address is not, however, required.

5.2 DLSw Multicast Addresses

For the purpose of long term interoperability, the AIW has secured a block of IP multicast addresses to be used with DLSw. These addresses are listed below:

   Address Range       Purpose
   -----------------------------------------------------------------
   224.0.10.000        Default multicast address
   224.0.10.001-191    User defined DLSw multicast groups
   224.0.10.192-255    Reserved for future use by the DLSw RIG in
                       DLSw enhancements

6. DLSw Message Transports

With the introduction of DLSw Multicast Protocols, SSP messages are now sent over two distinct transport mechanisms: TCP/IP connections and UDP services. Furthermore, the UDP datagrams can be sent to two different kinds of IP addresses: unique IP addresses (generally associated with a specific DLSw), and multicast IP addresses (generally associated with a group of DLSw peers).

6.1 TCP/IP Connections on Demand

As is the case in RFC 1795, TCP/IP connections are established between DLSw peers. Unlike RFC 1795, however, TCP/IP connections are only established to carry reliable circuit data (i.e., LLC2 based circuits). Accordingly, a TCP/IP connection is only established to a given DLSw peer when the first circuit to that DLSw is required (i.e., the origin DLSw must send a CANUREACH_CS to a target DLSw peer and there is no existing TCP connection between the two). In addition, the TCP/IP connection is brought down an implementation defined amount of time after the last active (not pending) circuit has terminated. In this way, the overhead associated with maintaining TCP connections is minimized.
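
As an illustration only, the connection on demand behavior described above can be sketched as follows. This Python fragment is not part of the DLSw specification. The class name PeerConnection, its method names, and the 30 second idle period are assumptions made for the example; the specification leaves the idle period implementation defined. For brevity the sketch counts a circuit from the moment it is requested, so it does not distinguish pending circuits from active ones.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PeerConnection:
    """Hypothetical per-peer bookkeeping for a TCP connection on demand."""

    idle_timeout: float                 # implementation defined, in seconds
    connected: bool = False
    active_circuits: int = 0
    idle_since: Optional[float] = None  # time the last circuit terminated

    def circuit_needed(self, now: float) -> bool:
        """A circuit to this peer is required (a CANUREACH_CS must be sent).

        Returns True when a new TCP connection has to be opened first.
        """
        must_open = not self.connected
        self.connected = True
        self.active_circuits += 1
        self.idle_since = None          # the connection is in use again
        return must_open

    def circuit_ended(self, now: float) -> None:
        """A circuit to this peer has terminated."""
        if self.active_circuits == 0:
            raise ValueError("no circuit to terminate")
        self.active_circuits -= 1
        if self.active_circuits == 0:
            self.idle_since = now       # start the idle period

    def tick(self, now: float) -> bool:
        """Periodic check; returns True when the TCP connection is brought down."""
        if (
            self.connected
            and self.idle_since is not None
            and now - self.idle_since >= self.idle_timeout
        ):
            self.connected = False
            self.idle_since = None
            return True
        return False


peer = PeerConnection(idle_timeout=30.0)
opened_first = peer.circuit_needed(now=0.0)   # first circuit: open TCP
opened_second = peer.circuit_needed(now=1.0)  # second circuit: reuse it
peer.circuit_ended(now=5.0)
peer.circuit_ended(now=10.0)                  # last circuit ends, idle period starts
closed_early = peer.tick(now=39.0)            # idle period not yet elapsed
closed_late = peer.tick(now=40.0)             # idle period elapsed, connection closed
```

Keeping the idle timestamp separate from the circuit count means that a new circuit arriving during the idle period simply cancels the pending teardown, which matches the intent of avoiding needless connection churn.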

With the advent of TCP connections on demand, the activation and deactivation of TCP connections becomes a normal occurrence as opposed to the exception event it constitutes in RFC 1795. For this reason, it is recommended that implementations carefully consider the value of SNMP traps for this condition.

6.1.1 TCP Connections on Demand Race Conditions

Non-circuit based SSP packets (e.g., CANUREACH_ex, etc.) may still be sent/received over TCP connections after all circuits have been terminated. Taking this into account, implementations should still gracefully terminate these TCP connections once the connection is no longer supporting circuits. This may require an implementation to retransmit request frames over UDP when no response to a TCP based unicast request is received and the TCP connection is brought down. This is not required in the case of multicast requests as these are received over the multicast transport mechanism.

6.2 Single Session TCP/IP Connections

RFC 1795 defines the use of two unidirectional TCP/IP sessions between any pair of DLSw peers using read port number 2065 and write port number 2067. Additionally, RFC 1795 allows for implementations to optionally use only one bi-directional TCP/IP session. Using one TCP/IP session between DLSw peers is believed to significantly improve the performance and scalability of DLSw protocols. Performance is improved because TCP/IP acknowledgments are much more likely to be piggy-backed on real data when TCP/IP sessions are used bi-directionally. Scalability is improved because fewer TCP control blocks, state machines, and associated message buffers are required. For these reasons, the DLSw enhancements defined in this paper REQUIRE the use of single session TCP/IP sessions.

Accordingly, DLSws implementing these enhancements must carry the TCP Connections Control Vector in their Capabilities Exchange.
In addition, the TCP Connections Control Vector must indicate support for 1 connection.

6.2.1 Expedited Single Session TCP/IP Connections

In RFC 1795, single session TCP/IP connections are accomplished by first establishing two uni-directional TCP connections, exchanging capabilities, and then bringing down one of the connections. In order to avoid the unnecessary flows and time delays associated with this process, a new single session bi-directional TCP/IP connection establishment algorithm is defined.

6.2.1.1 TCP Port Numbers

DLSws implementing these enhancements will use a TCP destination port of 2067 (as opposed to RFC 1795 which uses 2065) for single session TCP connections. The source port will be a random port number using the established TCP norms which exclude the possibility of either 2065 or 2067.

6.2.1.2 TCP Connection Setup

DLSw peers implementing these enhancements will establish a single session TCP connection whenever the associated peer is known to support this capability. To do this, the initiating DLSw simply sends a TCP setup request to destination port 2067. The receiving DLSw responds accordingly and the TCP three-way handshake ensues. Once this handshake has completed, each DLSw is notified and the DLSw capabilities exchange ensues. As in RFC 1795, no flows may take place until the capabilities exchange completes.

6.2.1.3 Single Session Setup Race Conditions

The new expedited single session setup procedure described above opens up the possibility of a race condition that occurs when two DLSw peers attempt to set up single session TCP connections to each other at the same time.
To avoid the establishment of two TCP connections, the following rules are applied when establishing expedited single session TCP connections:

1. If an inbound TCP connect indication is received on port 2067 while an outbound TCP connect request (on port 2067) to the same DLSw (IP address) is in process or outstanding, the DLSw with the higher IP address will close or reject the connection from the DLSw with the lower IP address.

2. To further expedite the process, the DLSw with the lower IP address may choose (implementation option) to close its connection request to the DLSw with the higher address when this condition is detected.

3. If the DLSw with the lower IP address has already sent its capabilities exchange request on its connection to the DLSw with the higher IP address, it must resend its capabilities exchange request over the remaining TCP connection from its DLSw peer (with the higher IP address).

4. The DLSw with the higher IP address must ignore any capabilities exchange request received over the TCP connection to be terminated (the one from the DLSw with the lower IP address).

6.2.1.4 TCP Connections with Non-Multicast Capable DLSw peers

During periods of migration, it is possible that TCP connections between multicast capable and non-multicast capable DLSw peers will occur. It is also possible that multicast capable DLSws may attempt to establish TCP connections with partners of unknown capabilities (e.g., statically defined peers). To handle these conditions the following additional rules apply to expedited single session TCP connection setup:

1. If the capability of a DLSw peer is not known, an implementation may choose to send the initial TCP connect request to either port 2067 (expedited single session setup) or port 2065 (standard RFC 1795 TCP setup).

2. If a multicast capable DLSw receives an inbound TCP connect request on port 2065 while processing an outbound request on 2067 to the same DLSw, the sending DLSw will terminate its 2067 request and respond as defined in RFC 1795 with an outbound 2065 request (standard RFC 1795 TCP setup).

3. If a multicast capable DLSw receives an indication that the DLSw peer is not multicast capable (the port 2067 setup request times out or a port not recognized rejection is received), it will send another connection request using port 2065 and the standard RFC 1795 session setup protocol.

6.3 UDP Datagrams

As mentioned above, UDP datagrams can be sent two different ways: unicast (i.e., sent to a single unique IP address) or multicast (i.e., sent to an IP multicast address). Throughout this document, the term UDP datagram will be used to refer to SSP messages sent over UDP, while unicast and multicast SSP messages will refer to the specific type/method of UDP packet transport.

In either case, standard UDP services are used to transport these packets. In order to properly parse the inbound UDP packets and deliver them to the SSP state machines, all DLSw UDP packets will use the destination port of 2067.

In addition, the checksum function of UDP remains optional for DLSw SSP messages. It is believed that the inherent CRC capabilities of all data link transports will adequately protect SSP packets during transmission, and that the incremental exposure to intermediate nodal data corruption is negligible. For further information on UDP packet