Negotiating Simultaneous Modalities in Real-Time Communications

In certain applications it is of interest to indicate a need for, or the availability of, a transformed version of the contents of a media stream in another medium, while still also providing the original. The indication may, for example, be used for rapid subtitling of speech, performed either manually or automatically. It may also be used for sign language interpretation of speech, or spoken interpretation of sign language, when both the original and the interpretation are delivered to the user. A mechanism for language negotiation in real-time communications is introduced in . This specification extends that mechanism with an indication that a transformation of language content in one modality is desired in a different modality. Using this indication in the context of language negotiation in real-time communications allows transformations of the language content to be negotiated simultaneously with the original content. The same indication is used to indicate preparedness to send language content in one modality simultaneously with the same content in a different modality. The indication is based on the "t" extension of BCP 47, specified in RFC 6497.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in .

The mechanism specified here extends the language negotiation mechanism specified in with a way to indicate a request for, or the availability of, both original language content and a transformed form of that content in another modality, in the same transmission direction. The indication should be provided in language tags of the 'hlang-send' or 'hlang-recv' SDP attribute values specified in . When the transformed language is to be requested or provided simultaneously with the original, this should be indicated by using the "t" extension to BCP 47 as specified in RFC 6497, that is, by attaching a "t" extension to the language tag for the language that is expected to be provided in the transformed modality. Briefly, the "t" extension consists of the singleton "t" (written "-t-" within the tag) followed by the source language tag. Example: "en-t-en" denotes English transformed from an English source (in another modality).

When an offer contains, in a 'hlang-recv' attribute, a language tag with the "t" extension, and also a 'hlang-recv' attribute for another modality with the same language without the "t" extension, the answering party should interpret this as a request to send both the original and the transformed content. When an offer contains, in a 'hlang-send' attribute, a language tag with the "t" extension, and also a 'hlang-send' attribute for another modality with the same language without the "t" extension, the answering party should interpret this as the offering party's preparedness to send both the original and the transformed content simultaneously. The choice of which medium carries the "t" extension should be interpreted only as an expectation of how the transformation will be made. Conditions in the actually established session MAY cause the original and the transformation to swap roles from what the subtags indicated.
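The matching rule above can be sketched in Python. This is a minimal, non-normative illustration under simplifying assumptions: the helper names `parse_tag` and `find_simultaneous` are hypothetical, a trailing "*" on a value is stripped, each media type is assumed to appear in only one m= line, and "the same language" is taken to mean that the untransformed tag in the other medium equals the source language that follows "-t-". RFC 6497 fields (such as "m0") are not interpreted.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class LangTag:
    base: str              # part before the "t" singleton, e.g. "en"
    source: Optional[str]  # source language after "-t-", or None if absent


def parse_tag(value: str) -> LangTag:
    """Split a tag such as "en-t-en" into its base and its "t" source language.

    Simplified: ignores a "t" inside private-use subtags (after "x") and stops
    the source language at the next singleton or at a field key such as "m0".
    """
    subtags = value.rstrip("*").lower().split("-")
    limit = subtags.index("x") if "x" in subtags else len(subtags)
    if "t" not in subtags[:limit]:
        return LangTag("-".join(subtags), None)
    i = subtags.index("t")
    source: List[str] = []
    for s in subtags[i + 1:]:
        if len(s) == 1 or (len(s) == 2 and s[1].isdigit()):
            break
        source.append(s)
    return LangTag("-".join(subtags[:i]), "-".join(source) or None)


def find_simultaneous(tags_by_media: Dict[str, List[str]]) -> List[Tuple[str, str, str]]:
    """Return (original_media, transformed_media, language) triples.

    tags_by_media maps a media type (assumed unique per offer) to the tags of
    ONE attribute kind, either all 'hlang-recv' or all 'hlang-send' values.
    """
    found: List[Tuple[str, str, str]] = []
    for t_media, values in tags_by_media.items():
        for value in values:
            transformed = parse_tag(value)
            if transformed.source is None:
                continue  # not a transformed tag
            for o_media, o_values in tags_by_media.items():
                if o_media == t_media:
                    continue  # the original must be in another modality
                for o_value in o_values:
                    original = parse_tag(o_value)
                    if original.source is None and original.base == transformed.source:
                        found.append((o_media, t_media, transformed.source))
    return found
```

Applied to the 'hlang-recv' tags of an offer, a non-empty result would be read as a request for simultaneous modalities; applied to the 'hlang-send' tags, as the offering party's preparedness to send them. Since the roles of original and transformation may swap in the established session, the returned media assignment is only an expectation.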

An indication in an offer, by the "t" extension in 'hlang-recv' attributes, of a request for reception of multiple simultaneous modalities should be interpreted as a request to receive these modalities simultaneously. The answering party MAY satisfy this request by providing the requested simultaneous modalities; this should be indicated in the answer by the "t" extension in the 'hlang-send' SDP attributes. If the answering party cannot provide the simultaneous modalities, then no "t" extension should be indicated in the 'hlang-send' attribute values for the same original language.

Availability to provide simultaneous modalities of an original language should be indicated by the "t" extension in the 'hlang-send' attribute in the offer. The answering party SHOULD indicate its interest in receiving the offered simultaneous modalities by including the "t" extension in 'hlang-recv' attributes of the appropriate media descriptions in the answer. If the answering party is interested in receiving only one of the offered modalities, then the language tag should be provided only in the corresponding 'hlang-recv' attribute in the answer. If an answering party prefers to receive simultaneous modalities of original language content that was not offered in the 'hlang-send' attribute in the offer, then the answering party MAY nevertheless include the preferred language and modality with the "t" extension in the answer. The answering user can then observe, from the language exchange at the beginning of the session, whether the request for simultaneous modalities could be satisfied. When a more formal indication that the request has been satisfied is needed, the answering party SHOULD request an update of the session and include the request for reception of multiple simultaneous modalities in the 'hlang-recv' attributes.
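On the answering side, the rule that "t" tags appear in the answer's 'hlang-send' only when the transformation can actually be provided can be sketched as a filter over the offered 'hlang-recv' tags. The function name `build_answer_send` and the `capabilities` set of (media, tag) pairs are hypothetical; a real implementation would also apply the language-selection rules of the base negotiation mechanism.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical capability model: (media type, lower-case language tag) pairs
# the answering party can send, e.g. ("text", "en-t-en") for live subtitling.
Capability = Tuple[str, str]


def build_answer_send(offered_recv: Dict[str, List[str]],
                      capabilities: Set[Capability]) -> Dict[str, List[str]]:
    """Choose 'hlang-send' tags for the answer from the offer's 'hlang-recv' tags.

    A tag carrying the "t" extension is kept only if the transformation can
    really be provided; otherwise it is dropped, so the answer then carries
    no "t" tag for that original language.
    """
    answer: Dict[str, List[str]] = {}
    for media, tags in offered_recv.items():
        selected: List[str] = []
        for raw in tags:
            tag = raw.rstrip("*").lower()  # ignore a trailing '*' from the offer
            if (media, tag) in capabilities:
                selected.append(tag)
        answer[media] = selected  # an empty list means: omit 'hlang-send'
    return answer
```

With an offer of {"audio": ["en"], "text": ["en-t-en"]}, a party able to subtitle returns the "t" tag for the text medium, while a party with only the capability ("audio", "en") returns an empty list for text and would therefore send no 'hlang-send' attribute in that media description. The same filter, applied to offered 'hlang-send' tags and the set of modalities the answering party wishes to receive, would yield its 'hlang-recv' values.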
The indications of multiple simultaneous modalities MAY be combined with other preference indications defined for the application of the 'hlang-' attributes.

The "t" extension cannot be used to offer a language in a different modality as an alternative to be selected instead of the original language that is also included in a 'hlang-' attribute. Implementations SHOULD always interpret such indications as indications of simultaneous modalities. If interpretation as alternative languages to select from is desired, the "t" extension SHOULD be omitted.

Example: a caller requests written English subtitles in the text stream, transformed from the spoken English source in the audio stream. The caller also indicates a preference to speak English:

   m=audio 49250 RTP/AVP 20
   a=hlang-recv:en
   a=hlang-send:en*
   m=text 45020 RTP/AVP 103 104
   a=hlang-recv:en-t-en

An answer acknowledging the request:

   m=audio 49250 RTP/AVP 20
   a=hlang-send:en
   a=hlang-recv:en
   m=text 45020 RTP/AVP 103 104
   a=hlang-send:en-t-en

In the session, the caller will receive both spoken and written English, and will send English speech.

An alternative answer from a party that cannot satisfy the request and can provide only spoken English:

   m=audio 49250 RTP/AVP 20
   a=hlang-send:en
   a=hlang-recv:en
   m=text 45020 RTP/AVP 103 104
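To check answers like those in the example mechanically, the SDP fragments can be scanned for 'hlang-' attributes per m= line. The helpers `hlang_by_media` and `simultaneous_granted` are hypothetical and are not a complete SDP parser: they assume one m= line per media type, treat attribute values as whitespace-separated tags, and consider a request granted when the answer sends any tag containing "-t-" in the given medium.

```python
from typing import Dict, List, Optional


def hlang_by_media(sdp: str) -> Dict[str, Dict[str, List[str]]]:
    """Map each m= line's media type to its 'hlang-send'/'hlang-recv' tags."""
    result: Dict[str, Dict[str, List[str]]] = {}
    current: Optional[str] = None
    for line in sdp.strip().splitlines():
        line = line.strip()
        if line.startswith("m="):
            current = line[2:].split()[0]  # e.g. "audio" or "text"
            result[current] = {"hlang-send": [], "hlang-recv": []}
        elif current is not None and line.startswith("a=hlang-"):
            name, _, value = line[2:].partition(":")
            if name in result[current]:
                result[current][name].extend(value.split())
    return result


def simultaneous_granted(answer_sdp: str, media: str) -> bool:
    """True if the answer sends a tag with the "t" extension in `media`."""
    sends = hlang_by_media(answer_sdp).get(media, {}).get("hlang-send", [])
    return any("-t-" in tag for tag in sends)
```

For the acknowledging answer, `simultaneous_granted(answer, "text")` is true; for the alternative answer, whose text media description carries no 'hlang-send' attribute, it is false.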

This document requires no IANA actions. It reuses only already registered entities.

Some users may regard details of their language and modality preferences as sensitive, requiring privacy and security protection. This should be considered when implementing the mechanism specified in this document. The security considerations are otherwise the same as those of .