    <xmp id="w8yyk">
    <xmp id="w8yyk"><nav id="w8yyk"></nav>

    RTP Payload Format for H.264 Video

    2016-09-28 00:00:00  廣州睿豐德信息科技有限公司 (Guangzhou Ruifengde Information Technology Co., Ltd.)

    Status of This Memo


       This document specifies an Internet standards track protocol for the
       Internet community, and requests discussion and suggestions for
       improvements.  Please refer to the current edition of the "Internet
       Official Protocol Standards" (STD 1) for the standardization state
       and status of this protocol.  Distribution of this memo is unlimited.


    Copyright Notice


       Copyright (C) The Internet Society (2005).


    Abstract


       This memo describes an RTP Payload format for the ITU-T
       Recommendation H.264 video codec and the technically identical
       ISO/IEC International Standard 14496-10 video codec.  The RTP payload
       format allows for packetization of one or more Network Abstraction
       Layer Units (NALUs), produced by an H.264 video encoder, in each RTP
       payload.  The payload format has wide applicability, as it supports
       applications from simple low bit-rate conversational usage, to
       Internet video streaming with interleaved transmission, to high bit-
       rate video-on-demand.


    Table of Contents


       1.  Introduction
           1.1.  H.264 Codec
           1.2.  Parameter Set Concept
           1.3.  Network Abstraction Layer Unit Types
       2.  Conventions
       3.  Scope
       4.  Definitions and Abbreviations
           4.1.  Definitions
       5.  RTP Payload Format
           5.1.  RTP Header Usage
           5.2.  Common Structure of the RTP Payload Format
           5.3.  NAL Unit Octet Usage
           5.4.  Packetization Modes
           5.5.  Decoding Order Number (DON)
           5.6.  Single NAL Unit Packet
           5.7.  Aggregation Packets
           5.8.  Fragmentation Units (FUs)
       6.  Packetization Rules
           6.1.  Common Packetization Rules
           6.2.  Single NAL Unit Mode
           6.3.  Non-Interleaved Mode
           6.4.  Interleaved Mode
       7.  De-Packetization Process (Informative)
           7.1.  Single NAL Unit and Non-Interleaved Mode
           7.2.  Interleaved Mode
           7.3.  Additional De-Packetization Guidelines
       8.  Payload Format Parameters
           8.1.  MIME Registration
           8.2.  SDP Parameters
           8.3.  Examples
           8.4.  Parameter Set Considerations
       9.  Security Considerations
       10. Congestion Control
       11. IANA Considerations
       12. Informative Appendix: Application Examples
           12.1. Video Telephony according to ITU-T H.241 Annex A
           12.2. Video Telephony, No Slice Data Partitioning, No NAL Unit
                 Aggregation
           12.3. Video Telephony, Interleaved Packetization Using NAL Unit
                 Aggregation
           12.4. Video Telephony with Data Partitioning
           12.5. Video Telephony or Streaming with FUs and Forward Error
                 Correction
           12.6. Low Bit-Rate Streaming
           12.7. Robust Packet Scheduling in Video Streaming
       13. Informative Appendix: Rationale for Decoding Order Number
           13.1. Introduction
           13.2. Example of Multi-Picture Slice Interleaving
           13.3. Example of Robust Packet Scheduling
           13.4. Robust Transmission Scheduling of Redundant Coded Slices
           13.5. Remarks on Other Design Possibilities
       14. Acknowledgements
       15. References
           15.1. Normative References
           15.2. Informative References
       Authors' Addresses
       Full Copyright Statement


    1.  Introduction


    1.1.  H.264 Codec


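       This memo specifies an RTP payload specification for the video
       coding standard known as ITU-T Recommendation H.264 and ISO/IEC
       International Standard 14496 Part 10 [2] (both also known as
       Advanced Video Coding, or AVC).  Recommendation H.264 was approved
       by ITU-T in May 2003, and the approved draft specification is
       available for public review [8].  In this memo, the H.264 acronym
       is used for the codec and the standard, but the memo is equally
       applicable to the ISO/IEC counterpart of the coding standard.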


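       The H.264 video codec has a very broad application range that
       covers all forms of digital compressed video, from low bit-rate
       Internet streaming applications to HDTV broadcast and Digital
       Cinema applications.  Compared to the current state of technology,
       the overall performance of H.264 is such that bit-rate savings of
       50% are reported.  Digital Satellite TV quality, for example, was
       reported to be achievable at 1.5 Mbit/s, compared to the current
       operation point of MPEG-2 video at around 3.5 Mbit/s [9].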


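       The codec specification [1] itself distinguishes conceptually
       between a video coding layer (VCL) and a network abstraction layer
       (NAL).  The VCL contains the signal processing functionality of
       the codec: mechanisms such as transform, quantization, and
       motion-compensated prediction, as well as a loop filter.  It
       follows the general concept of most of today's video codecs: a
       macroblock-based coder that uses inter-picture prediction with
       motion compensation and transform coding of the residual signal.
       The VCL encoder outputs slices: a bit string that contains the
       macroblock data of an integer number of macroblocks, and the
       information of the slice header (containing the spatial address of
       the first macroblock in the slice, the initial quantization
       parameter, and similar information).  Macroblocks in slices are
       arranged in scan order unless a different macroblock allocation is
       specified by using the so-called Flexible Macroblock Ordering
       syntax.  In-picture prediction is used only within a slice.  More
       information is provided in [9].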


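       The Network Abstraction Layer (NAL) encoder encapsulates the slice
       output of the VCL encoder into Network Abstraction Layer Units (NAL
       units), which are suitable for transmission over packet networks or
       for use in packet-oriented multiplex environments.  Annex B of
       H.264 defines an encapsulation process to transmit such NAL units
       over byte-stream oriented networks.  In the scope of this memo,
       Annex B is not relevant.

       Internally, the NAL uses NAL units.  A NAL unit consists of a
       one-byte header and the payload byte string.  The header indicates
       the type of the NAL unit, the (potential) presence of bit errors or
       syntax violations in the NAL unit payload, and information
       regarding the relative importance of the NAL unit for the decoding
       process.  This RTP payload specification is designed to be unaware
       of the bit string in the NAL unit payload.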


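       One of the main properties of H.264 is the complete decoupling of
       the transmission time, the decoding time, and the sampling or
       presentation time of slices and pictures.  The decoding process
       specified in H.264 is unaware of time, and the H.264 syntax does
       not carry information such as the number of skipped frames (as is
       common in the form of the Temporal Reference in earlier video
       compression standards).  Also, there are NAL units that affect many
       pictures and that are, therefore, inherently timeless.  For this
       reason, the handling of the RTP timestamp requires some special
       considerations for NAL units for which the sampling or presentation
       time is not defined or, at transmission time, unknown.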


    1.2.  Parameter Set Concept


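       One very fundamental design concept of H.264 is to generate
       self-contained packets, to make mechanisms such as the header
       duplication of RFC 2429 or MPEG-4's Header Extension Code (HEC)
       [11] unnecessary.  This was achieved by decoupling information
       relevant to more than one slice from the media stream.  This
       higher-layer meta information should be sent reliably,
       asynchronously, and in advance of the RTP packet stream that
       contains the slice packets.  (Provisions for sending this
       information in-band are also available for applications that do
       not have an out-of-band transport channel appropriate for the
       purpose.)  The combination of the higher-level parameters is called
       a parameter set.  The H.264 specification includes two types of
       parameter sets: sequence parameter set and picture parameter set.
       An active sequence parameter set remains unchanged throughout a
       coded video sequence, and an active picture parameter set remains
       unchanged within a coded picture.  The sequence and picture
       parameter set structures contain information such as picture size,
       optional coding modes employed, and macroblock to slice group map.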


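       To be able to change picture parameters (such as the picture size)
       without having to transmit parameter set updates synchronously to
       the slice packet stream, the encoder and decoder can maintain a
       list of more than one sequence and picture parameter set.  Each
       slice header contains a codeword that indicates the sequence and
       picture parameter set to be used.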


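       This mechanism allows the decoupling of the transmission of
       parameter sets from the packet stream, and their transmission by
       external means (e.g., as a side effect of the capability exchange)
       or through a (reliable or unreliable) control protocol.  It may
       even be possible that they are never transmitted but are fixed by
       an application design specification.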
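
       As an illustration of this indirection, the Python sketch below
       keeps several parameter sets in a store keyed by identifier and
       resolves the pair a slice refers to.  It assumes, as in H.264,
       that a slice header starts with the Exp-Golomb coded fields
       first_mb_in_slice, slice_type and pic_parameter_set_id, that each
       picture parameter set names its sequence parameter set, and that
       emulation prevention bytes were already removed from the slice
       data.  ParameterSetStore, read_ue and the dictionary contents are
       hypothetical.

```python
from dataclasses import dataclass, field


def read_ue(data: bytes, bitpos: int) -> tuple[int, int]:
    """Decode one unsigned Exp-Golomb codeword ue(v) starting at bit
    offset bitpos (MSB first); return (value, new_bitpos)."""
    def bit(i: int) -> int:
        return (data[i // 8] >> (7 - i % 8)) & 1

    leading_zeros = 0
    while bit(bitpos) == 0:
        leading_zeros += 1
        bitpos += 1
    bitpos += 1  # skip the terminating '1' bit
    suffix = 0
    for _ in range(leading_zeros):
        suffix = (suffix << 1) | bit(bitpos)
        bitpos += 1
    return (1 << leading_zeros) - 1 + suffix, bitpos


@dataclass
class ParameterSetStore:
    """Hypothetical decoder-side store holding several SPS/PPS at once."""
    sps: dict[int, dict] = field(default_factory=dict)
    pps: dict[int, tuple[int, dict]] = field(default_factory=dict)  # pps_id -> (sps_id, fields)

    def add_sps(self, sps_id: int, fields: dict) -> None:
        self.sps[sps_id] = fields

    def add_pps(self, pps_id: int, sps_id: int, fields: dict) -> None:
        self.pps[pps_id] = (sps_id, fields)

    def activate_for_slice(self, slice_rbsp: bytes) -> tuple[dict, dict]:
        """Read the first three slice header fields and return the
        (SPS, PPS) pair selected by pic_parameter_set_id."""
        _, pos = read_ue(slice_rbsp, 0)    # first_mb_in_slice
        _, pos = read_ue(slice_rbsp, pos)  # slice_type
        pps_id, _ = read_ue(slice_rbsp, pos)
        sps_id, pps_fields = self.pps[pps_id]
        return self.sps[sps_id], pps_fields


store = ParameterSetStore()
store.add_sps(0, {"width": 1280, "height": 720})
store.add_sps(1, {"width": 640, "height": 360})
store.add_pps(0, 0, {})
store.add_pps(1, 1, {})
# Bits 1 | 0001000 | 010: first_mb_in_slice=0, slice_type=7, pps_id=1
print(store.activate_for_slice(bytes([0x88, 0x40]))[0])
```

       Because both parameter sets stay stored, switching picture size
       between slices only requires referencing another picture parameter
       set; nothing has to be resent in the slice packet stream.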


    1.3.  Network Abstraction Layer Unit Types


       Tutorial information on the NAL design can be found in [12], [13],
       and [14].


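       All NAL units consist of a single NAL unit type octet, which also
       acts as the payload header of this RTP payload format.  The payload
       of a NAL unit follows immediately.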


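       The syntax and semantics of the NAL unit type octet are specified
       in [1], but the essential properties of the NAL unit type octet are
       summarized below.  The NAL unit type octet has the following
       format: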
       
          +---------------+
          |0|1|2|3|4|5|6|7|
          +-+-+-+-+-+-+-+-+
          |F|NRI|  Type   |
          +---------------+


       The semantics of the components of the NAL unit type octet, as
       specified in the H.264 specification, are described briefly below.


       F: 1 bit
          forbidden_zero_bit.  The H.264 specification declares a value of
          1 as a syntax violation.


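       NRI: 2 bits
          nal_ref_idc.  A value of 00 indicates that the content of the NAL
          unit is not used to reconstruct reference pictures for
          inter-picture prediction.  Such NAL units can be discarded
          without risking the integrity of the reference pictures.  Values
          greater than 00 indicate that the decoding of the NAL unit is
          required to maintain the integrity of the reference pictures.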


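       Type: 5 bits
          nal_unit_type.  This component specifies the NAL unit payload
          type as defined in Table 7-1 of [1], and later within this memo.
          For a reference of all currently defined NAL unit types and their
          semantics, please refer to section 7.4.1 in [1].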


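       This memo introduces new NAL unit types, which are presented in
       section 5.2.  The NAL unit types defined in this memo are marked as
       unspecified in [1].  Moreover, this specification extends the
       semantics of F and NRI as described in section 5.3.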
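
       The octet layout above can be read with shifts and masks; bit 0 in
       the diagram is the most significant bit.  The following Python
       sketch splits a NAL unit type octet into F, NRI and Type and packs
       them back.  The names NalHeader, parse_nal_header and
       build_nal_header are illustrative and do not come from [1].

```python
from typing import NamedTuple


class NalHeader(NamedTuple):
    f: int     # forbidden_zero_bit, 1 bit
    nri: int   # nal_ref_idc, 2 bits
    type: int  # nal_unit_type, 5 bits


def parse_nal_header(octet: int) -> NalHeader:
    """Split a NAL unit type octet into its three fields."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("the NAL unit header is a single octet")
    return NalHeader(f=(octet >> 7) & 0x01,
                     nri=(octet >> 5) & 0x03,
                     type=octet & 0x1F)


def build_nal_header(f: int, nri: int, nal_type: int) -> int:
    """Pack F, NRI and Type back into one octet."""
    if f not in (0, 1) or not 0 <= nri <= 3 or not 0 <= nal_type <= 31:
        raise ValueError("field out of range")
    return (f << 7) | (nri << 5) | nal_type


# 0x67 = 0b0_11_00111: F=0, NRI=11 (binary), Type=7 (sequence parameter set)
print(parse_nal_header(0x67))  # NalHeader(f=0, nri=3, type=7)
```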


    2.  Conventions


       The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
       "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
       document are to be interpreted as described in BCP 14, RFC 2119 [3].


       This specification uses the notion of setting and clearing a bit when
       bit fields are handled.  Setting a bit is the same as assigning that
       bit the value of 1 (On).  Clearing a bit is the same as assigning
       that bit the value of 0 (Off).


    3.  Scope


       This payload specification can only be used to carry the "naked"
       H.264 NAL unit stream over RTP, and not the bitstream format
       discussed in Annex B of H.264.  Likely, the first applications of
       this specification will be in the conversational multimedia field,
       video telephony or video conferencing, but the payload format also
       covers other applications, such as Internet streaming and TV over IP.


    4.  Definitions and Abbreviations


    4.1.  Definitions


       This document uses the definitions of [1].  The following terms,
       defined in [1], are summed up for convenience:
       
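          access unit: A set of NAL units always containing a primary
          coded picture.  In addition to the primary coded picture, an
          access unit may also contain one or more redundant coded
          pictures or other NAL units not containing slices or slice data
          partitions of a coded picture.  The decoding of an access unit
          always results in a decoded picture.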


          coded video sequence: A sequence of access units that consists, in
          decoding order, of an instantaneous decoding refresh (IDR) access
          unit followed by zero or more non-IDR access units including all
          subsequent access units up to but not including any subsequent IDR
          access unit.


          IDR access unit: An access unit in which the primary coded picture
          is an IDR picture.


          IDR picture: A coded picture containing only slices with I or SI
          slice types that causes a "reset" in the decoding process.  After
          the decoding of an IDR picture, all following coded pictures in
          decoding order can be decoded without inter prediction from any
          picture decoded prior to the IDR picture.


          primary coded picture: The coded representation of a picture to be
          used by the decoding process for a bitstream conforming to H.264.
          The primary coded picture contains all macroblocks of the picture.


          redundant coded picture: A coded representation of a picture or a
          part of a picture.  The content of a redundant coded picture shall
          not be used by the decoding process for a bitstream conforming to
          H.264.  The content of a redundant coded picture may be used by
          the decoding process for a bitstream that contains errors or
          losses.


          VCL NAL unit: A collective term used to refer to coded slice and
          coded data partition NAL units.


       In addition, the following definitions apply:


          decoding order number (DON): A field in the payload structure, or
          a derived variable indicating NAL unit decoding order.  Values of
          DON are in the range of 0 to 65535, inclusive.  After reaching the
          maximum value, the value of DON wraps around to 0.


          NAL unit decoding order: A NAL unit order that conforms to the
          constraints on NAL unit order given in section 7.4.1.2 in [1].


          transmission order: The order of packets in ascending RTP sequence
          number order (in modulo arithmetic).  Within an aggregation
          packet, the NAL unit transmission order is the same as the order
          of appearance of NAL units in the packet.


          media aware network element (MANE): A network element, such as a
          middlebox or application layer gateway that is capable of parsing
          certain aspects of the RTP payload headers or the RTP payload and
          reacting to the contents.


             Informative note: The concept of a MANE goes beyond normal
             routers or gateways in that a MANE has to be aware of the
             signaling (e.g., to learn about the payload type mappings of
             the media streams), and in that it has to be trusted when
             working with SRTP.  The advantage of using MANEs is that they
             allow packets to be dropped according to the needs of the media
             coding.  For example, if a MANE has to drop packets due to
             congestion on a certain link, it can identify those packets
             whose dropping has the smallest negative impact on the user
             experience and remove them in order to remove the congestion
             and/or keep the delay low.


       Abbreviations


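          DON:        Decoding Order Number
          DONB:       Decoding Order Number Base
          DOND:       Decoding Order Number Difference
          FEC:        Forward Error Correction
          FU:         Fragmentation Unit
          IDR:        Instantaneous Decoding Refresh
          IEC:        International Electrotechnical Commission
          ISO:        International Organization for Standardization
          ITU-T:      International Telecommunication Union,
                      Telecommunication Standardization Sector
          MANE:       Media-Aware Network Element
          MTAP:       Multi-Time Aggregation Packet
          MTAP16:     MTAP with 16-bit timestamp offset
          MTAP24:     MTAP with 24-bit timestamp offset
          NAL:        Network Abstraction Layer
          NALU:       NAL Unit
          SEI:        Supplemental Enhancement Information
          STAP:       Single-Time Aggregation Packet
          STAP-A:     STAP type A
          STAP-B:     STAP type B
          TS:         Timestamp
          VCL:        Video Coding Layer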


    5.  RTP Payload Format


    5.1.  RTP Header Usage


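       The format of the RTP header is specified in RFC 3550 [4] and
       reprinted in Figure 1 for convenience.  This payload format uses
       the fields of the header in a manner consistent with that
       specification.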


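       When one NAL unit is encapsulated per RTP packet, the RECOMMENDED
       RTP payload format is specified in section 5.6.  The RTP payload
       (and the settings for some RTP header bits) for aggregation packets
       and fragmentation units are specified in sections 5.7 and 5.8,
       respectively.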


           0                   1                   2                   3
           0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
          |V=2|P|X|  CC   |M|     PT      |       sequence number         |
          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
          |                           timestamp                           |
          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
          |           synchronization source (SSRC) identifier            |
          +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
          |            contributing source (CSRC) identifiers             |
          |                             ....                              |
          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


                           Figure 1.  RTP header
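
       To make the layout of Figure 1 concrete, here is a minimal Python
       parser for a complete RTP packet held in a bytes object.  It
       follows RFC 3550 for the CSRC list, the header extension (present
       when X is set) and the padding count carried in the last octet
       (present when P is set).  RtpPacket and parse_rtp are hypothetical
       names; what remains in payload is the H.264 RTP payload whose first
       octet is described in section 5.2.

```python
import struct
from typing import NamedTuple


class RtpPacket(NamedTuple):
    version: int
    marker: int
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    csrcs: tuple
    payload: bytes


def parse_rtp(packet: bytes) -> RtpPacket:
    """Parse the RTP header of Figure 1 and return it with the payload."""
    if len(packet) < 12:
        raise ValueError("shorter than the 12-byte fixed RTP header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    padding = (b0 >> 5) & 1
    extension = (b0 >> 4) & 1
    cc = b0 & 0x0F                      # number of CSRC identifiers
    offset = 12
    csrcs = struct.unpack(f"!{cc}I", packet[offset:offset + 4 * cc])
    offset += 4 * cc
    if extension:
        # 16-bit profile-defined field, then length in 32-bit words
        _, ext_words = struct.unpack("!HH", packet[offset:offset + 4])
        offset += 4 + 4 * ext_words
    end = len(packet)
    if padding:
        end -= packet[-1]  # last octet counts padding octets, itself included
    return RtpPacket(b0 >> 6, b1 >> 7, b1 & 0x7F, seq, ts, ssrc, csrcs,
                     packet[offset:end])
```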


       The RTP header information to be set according to this RTP payload
       format is set as follows:
       
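       Marker bit (M): 1 bit
          Set for the very last packet of the access unit indicated by the
          RTP timestamp, in line with the normal use of the M bit in video
          formats, to allow an efficient playout buffer handling.  For
          aggregation packets (STAP and MTAP), the marker bit in the RTP
          header MUST be set to the value that the marker bit of the last
          NAL unit of the aggregation packet would have if it were
          transported in its own RTP packet.  Decoders MAY use this bit as
          an early indication of the last packet of an access unit, but
          MUST NOT rely on this property.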


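           Informative note: Only one M bit is associated with an
           aggregation packet carrying multiple NAL units.  Thus, if a
           gateway has re-packetized an aggregation packet into several
           packets, it cannot reliably set the M bit of those packets.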


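       Payload type (PT): 7 bits
          The assignment of an RTP payload type for this new packet format
          is outside the scope of this document and will not be specified
          here.  The assignment of a payload type has to be performed
          either through the profile used or in a dynamic way.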


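       Sequence number (SN): 16 bits
          Set and used in accordance with RFC 3550.  For the single NALU
          and non-interleaved packetization modes, the sequence number is
          used to determine decoding order for the NALU.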


       Timestamp: 32 bits
          The RTP timestamp is set to the sampling timestamp of the
          content.  A 90 kHz clock rate MUST be used.
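
          With a 90 kHz clock, a constant frame rate gives a fixed
          timestamp increment per frame: 90000 / 25 = 3600 ticks at 25
          frames/s, and 90000 x 1001 / 30000 = 3003 ticks at 30000/1001
          (about 29.97) frames/s.  The Python sketch below assumes a
          constant frame rate and a random initial offset chosen by the
          sender, as RFC 3550 recommends; rtp_timestamp and its parameters
          are illustrative names.

```python
from fractions import Fraction

RTP_CLOCK_HZ = 90_000  # mandated by this payload format


def rtp_timestamp(frame_index: int, fps: Fraction, initial_ts: int) -> int:
    """Timestamp of the frame sampled at frame_index / fps seconds,
    counted on the 90 kHz clock and wrapped to 32 bits.
    Assumes a constant frame rate (an illustration, not a requirement)."""
    ticks = frame_index * RTP_CLOCK_HZ / fps
    return (initial_ts + round(ticks)) % 2**32
```

          Using Fraction keeps rates such as 30000/1001 exact, so no
          rounding drift accumulates over long sequences.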


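          If a NAL unit has no timing properties of its own (e.g.,
          parameter set and SEI NAL units), the RTP timestamp is set to the
          RTP timestamp of the primary coded picture of the access unit in
          which the NAL unit is included, according to section 7.4.1.2 of
          [1].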


          The setting of the RTP timestamp for MTAPs is defined in section
          5.7.2.


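          Receivers SHOULD ignore any picture timing SEI messages included
          in access units that have only one display timestamp.  Instead,
          receivers SHOULD use the RTP timestamp for synchronizing the
          display process.

          RTP senders SHOULD NOT transmit picture timing SEI messages for
          pictures that are not supposed to be displayed as multiple
          fields.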


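          If one access unit has more than one display timestamp carried in
          a picture timing SEI message, then the information in the SEI
          message SHOULD be treated as relative to the RTP timestamp, with
          the earliest event occurring at the time given by the RTP
          timestamp and subsequent events later, as given by the
          differences in SEI message picture timing values.  Let tSEI1,
          tSEI2, ..., tSEIn be the display timestamps carried in the SEI
          message of an access unit, where tSEI1 is the earliest of all
          such timestamps.  Let tmadjst() be a function that adjusts the
          SEI message time scale to a 90-kHz time scale.  Let TS be the RTP
          timestamp.  Then, the display time for the event associated with
          tSEI1 is TS.  The display time for the event associated with
          tSEIx, where x is in [2..n], is TS + tmadjst(tSEIx - tSEI1).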


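             Informative note: Displaying coded frames as fields is needed
             commonly in an operation known as 3:2 pulldown, in which film
             content that consists of coded frames is displayed on a
             display using interlaced scanning.  The picture timing SEI
             message enables carriage of multiple timestamps for the same
             coded picture, and therefore the 3:2 pulldown process is
             perfectly controlled.  The picture timing SEI message
             mechanism is necessary because only one timestamp per coded
             frame can be conveyed in the RTP timestamp.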
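
          The display-time rule for tSEIx above can be sketched in Python
          as follows.  The real conversion depends on bitstream timing
          information that this memo does not describe, so the sketch
          assumes the SEI display timestamps were already converted to
          seconds (as Fraction values); the hypothetical tmadjst() then
          only rescales seconds to 90 kHz ticks.

```python
from fractions import Fraction


def sei_display_times(ts: int, sei_times: list[Fraction]) -> list[int]:
    """Map the display timestamps of one picture timing SEI message
    (assumed to be in seconds) to RTP-clock display times, in input order."""
    def tmadjst(seconds: Fraction) -> int:
        # Hypothetical tmadjst(): rescale seconds to the 90 kHz clock
        return round(seconds * 90_000)

    t_sei1 = min(sei_times)  # tSEI1, the earliest timestamp, maps to TS
    return [(ts + tmadjst(t - t_sei1)) % 2**32 for t in sei_times]


# One frame shown as three fields 1/60 s apart, as in 3:2 pulldown
print(sei_display_times(9000, [Fraction(0), Fraction(1, 60), Fraction(2, 60)]))
# [9000, 10500, 12000]
```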


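             Informative note: Because H.264 allows the decoding order to
             be different from the display order, values of RTP timestamps
             may not be monotonically non-decreasing as a function of RTP
             sequence numbers.  Furthermore, the value for interarrival
             jitter reported in the RTCP reports may not be a trustworthy
             indication of the network performance, as the calculation rules
             for interarrival jitter (section 6.4.1 of RFC 3550) assume that
             the RTP timestamp of a packet is directly proportional to its
             transmission time.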


    5.2.  Common Structure of the RTP Payload Format


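       The payload format defines three different basic payload
       structures.  A receiver can identify the payload structure by the
       first byte of the RTP payload, which co-serves as the RTP payload
       header and, in some cases, as the first byte of the payload.  This
       byte is always structured as a NAL unit header.  The NAL unit type
       field indicates which structure is present.  The possible
       structures are as follows:

       Single NAL Unit Packet: Contains only a single NAL unit in the
       payload.  The NAL header type field will be equal to the original
       NAL unit type, i.e., in the range of 1 to 23, inclusive.  Specified
       in section 5.6.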


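       Aggregation packet: Packet type used to aggregate multiple NAL
       units into a single RTP payload.  This packet exists in four
       versions: the Single-Time Aggregation Packet type A (STAP-A), the
       Single-Time Aggregation Packet type B (STAP-B), the Multi-Time
       Aggregation Packet (MTAP) with 16-bit offset (MTAP16), and the
       Multi-Time Aggregation Packet (MTAP) with 24-bit offset (MTAP24).
       The NAL unit type numbers assigned for STAP-A, STAP-B, MTAP16, and
       MTAP24 are 24, 25, 26, and 27, respectively.  Specified in section
       5.7.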


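       Fragmentation unit: Used to fragment a single NAL unit over
       multiple RTP packets.  Exists with two versions, FU-A and FU-B,
       identified with the NAL unit type numbers 28 and 29, respectively.
       Specified in section 5.8.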


       Table 1.  Summary of NAL unit types and their payload structures


          Type   Packet    Type name                        Section
          ---------------------------------------------------------
          0      undefined                                    -
          1-23   NAL unit  Single NAL unit packet per H.264   5.6
          24     STAP-A    Single-time aggregation packet     5.7.1
          25     STAP-B    Single-time aggregation packet     5.7.1
          26     MTAP16    Multi-time aggregation packet      5.7.2
          27     MTAP24    Multi-time aggregation packet      5.7.2
          28     FU-A      Fragmentation unit                 5.8
          29     FU-B      Fragmentation unit                 5.8
          30-31  undefined                                    -


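          Informative note: This specification does not limit the size of
          NAL units encapsulated in single NAL unit packets and
          fragmentation units.  The maximum size of a NAL unit
          encapsulated in any aggregation packet is 65535 bytes.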
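
       Because the Type field alone selects the structure, a receiver can
       dispatch on the first payload octet.  A minimal Python sketch of
       Table 1 follows; payload_structure is an illustrative name, and a
       real receiver would pass each case on to the matching
       de-packetizer.

```python
def payload_structure(first_octet: int) -> str:
    """Classify an RTP payload by the Type field of its first octet (Table 1)."""
    nal_type = first_octet & 0x1F
    if 1 <= nal_type <= 23:
        return "single NAL unit packet"
    return {
        24: "STAP-A", 25: "STAP-B",
        26: "MTAP16", 27: "MTAP24",
        28: "FU-A", 29: "FU-B",
    }.get(nal_type, "undefined")  # types 0, 30 and 31
```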


    5.3.  NAL Unit Octet Usage


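       The structure and semantics of the NAL unit octet were introduced
       in section 1.3.  For convenience, the format of the NAL unit type
       octet is reprinted below: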


          +---------------+
          |0|1|2|3|4|5|6|7|
          +-+-+-+-+-+-+-+-+
          |F|NRI|  Type   |
          +---------------+


       This section specifies the semantics of F and NRI according to this
       specification.


       F: 1 bit
          forbidden_zero_bit.  A value of 0 indicates that the NAL unit type
          octet and payload should not contain bit errors or other syntax
          violations.  A value of 1 indicates that the NAL unit type octet
          and payload may contain bit errors or other syntax violations.


          MANEs SHOULD set the F bit to indicate detected bit errors in the
          NAL unit.  The H.264 specification requires that the F bit is
          equal to 0.  When the F bit is set, the decoder is advised that
          bit errors or any other syntax violations may be present in the
          payload or in the NAL unit type octet.  The simplest decoder
          reaction to a NAL unit in which the F bit is equal to 1 is to
          discard such a NAL unit and to conceal the lost data in the
          discarded NAL unit.


       NRI: 2 bits
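          nal_ref_idc.  The semantics of value 00 and a non-zero value
          remain unchanged from the H.264 specification.  In other words,
          a value of 00 indicates that the content of the NAL unit is not
          used to reconstruct reference pictures for inter-picture
          prediction.  Such NAL units can be discarded without risking the
          integrity of the reference pictures.  Values greater than 00
          indicate that the decoding of the NAL unit is required to
          maintain the integrity of the reference pictures.

          In addition to the specification above, according to this RTP
          payload specification, values of NRI greater than 00 indicate the
          relative transport priority, as determined by the encoder.  MANEs
          can use this information to protect more important NAL units
          better than they do less important NAL units.  The highest
          transport priority is 11, followed by 10, and then by 01;
          finally, 00 is the lowest.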


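             Informative note: Any non-zero value of NRI is handled
             identically in H.264 decoders.  Therefore, receivers need not
             manipulate the value of NRI when passing NAL units to the
             decoder.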
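
          One way a MANE might act on this priority is sketched below in
          Python.  The policy (greedily keep NAL units in decreasing NRI
          order while they fit into a hypothetical per-interval byte
          budget, then forward the survivors in their original order) is
          an assumption for illustration only: this memo prescribes no
          dropping algorithm, and a practical MANE would also weigh
          inter-picture dependencies.

```python
def thin_by_priority(nal_units: list[bytes], byte_budget: int) -> list[bytes]:
    """Hypothetical MANE policy: keep NAL units in decreasing NRI order
    (11, 10, 01, 00) while they fit into byte_budget; return the kept
    units in their original order."""
    def nri(unit: bytes) -> int:
        return (unit[0] >> 5) & 0x03

    # sorted() is stable, so among equal NRI values earlier units win.
    ranked = sorted(range(len(nal_units)), key=lambda i: -nri(nal_units[i]))
    kept: set[int] = set()
    used = 0
    for i in ranked:
        size = len(nal_units[i])
        if used + size <= byte_budget:
            kept.add(i)
            used += size
    return [unit for i, unit in enumerate(nal_units) if i in kept]
```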


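          An H.264 encoder MUST set the value of NRI according to the H.264
          specification (subclause 7.4.1) when the value of nal_unit_type
          is in the range of 1 to 12, inclusive.  In particular, the H.264
          specification requires that the value of NRI SHALL be equal to 0
          for all NAL units having nal_unit_type equal to 6, 9, 10, 11, or
          12.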


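          For NAL units having nal_unit_type equal to 7 or 8 (indicating a
          sequence parameter set or a picture parameter set, respectively),
          an H.264 encoder SHOULD set the value of NRI to 11 (in binary
          format).  For coded slice NAL units of a primary coded picture
          having nal_unit_type equal to 5 (indicating a coded slice
          belonging to an IDR picture), an H.264 encoder SHOULD set the
          value of NRI to 11 (in binary format).

          For a mapping of the remaining nal_unit_types to NRI values, the
          following example MAY be used and has been shown to be efficient
          in a certain environment [13].  Other mappings MAY also be
          desirable, depending on the application and the H.264/AVC
          Annex A profile in use.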


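             Informative note: Data partitioning is not available in
             certain profiles, e.g., in the Main or Baseline profiles.
             Consequently, NAL unit types 2, 3, and 4 can occur only if the
             video bitstream conforms to a profile in which data
             partitioning is allowed, and not in streams that conform to
             the Main or Baseline profiles.

          Table 2.  Example of NRI values for coded slices and coded slice
                    data partitions of primary coded reference pictures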


          NAL Unit Type     Content of NAL unit              NRI (binary)
          ----------------------------------------------------------------
           1              non-IDR coded slice                         10
           2              Coded slice data partition A                10
           3              Coded slice data partition B                01
           4              Coded slice data partition C                01


             Informative note: As mentioned before, the NRI value of
             non-reference pictures is 00.


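          An H.264 encoder SHOULD set the NRI value of coded slice and
          coded slice data partition NAL units of redundant coded reference
          pictures equal to 01 (in binary format).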


          Definitions of the values for NRI for NAL unit types 24 to 29,
          inclusive, are given in sections 5.7 and 5.8 of this memo.


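          No recommendation for the value of NRI is given for NAL units
          having nal_unit_type in the range of 13 to 23, inclusive, because
          these values are reserved for ITU-T and ISO/IEC.  No
          recommendation for the value of NRI is given for NAL units having
          nal_unit_type equal to 0 or in the range of 30 to 31, inclusive,
          as the semantics of these values are not specified in this memo.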
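
       The encoder-side rules of this section, together with the example
       mapping of Table 2, can be collected into one lookup.  The Python
       sketch below is illustrative: example_nri and its reference and
       redundant flags are hypothetical, the Table 2 values are only the
       example mapping that MAY be used, and None is returned where this
       section gives no recommendation (types 0 and 13 to 31; types 24 to
       29 are covered in sections 5.7 and 5.8).

```python
from typing import Optional


def example_nri(nal_type: int, reference: bool,
                redundant: bool = False) -> Optional[int]:
    """Suggested NRI (as an integer 0..3) for a VCL or parameter set NAL unit."""
    if nal_type in (6, 9, 10, 11, 12):
        return 0b00                      # H.264 requires NRI = 0 here
    if nal_type in (7, 8):
        return 0b11                      # sequence / picture parameter sets
    if 1 <= nal_type <= 5:
        reference = reference or nal_type == 5  # IDR slices are reference slices
        if not reference:
            return 0b00                  # non-reference pictures
        if redundant:
            return 0b01                  # redundant coded reference pictures
        # Table 2 example values; type 5 (IDR, primary picture) gets 11
        return {1: 0b10, 2: 0b10, 3: 0b01, 4: 0b01, 5: 0b11}[nal_type]
    return None                          # no recommendation in this section
```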


    5.4.  Packetization Modes


       This memo specifies three cases of packetization modes:


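          o  Single NAL unit mode
          o  Non-interleaved mode
          o  Interleaved mode

       The single NAL unit mode is targeted for conversational systems
       that comply with ITU-T Recommendation H.241 [15] (see section
       12.1).  The non-interleaved mode is targeted for conversational
       systems that may not comply with ITU-T Recommendation H.241.  In
       the non-interleaved mode, NAL units are transmitted in NAL unit
       decoding order.  The interleaved mode is targeted for systems that
       do not require very low end-to-end latency.  The interleaved mode
       allows transmission of NAL units out of NAL unit decoding order.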


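       The packetization mode in use MAY be signaled by the value of the
       OPTIONAL packetization-mode MIME parameter or by external means.
       The packetization mode in use governs which NAL unit types are
       allowed in RTP payloads.  Table 3 summarizes the allowed NAL unit
       types for each packetization mode.  Some NAL unit type values
       (indicated as undefined in Table 3) are reserved for future
       extensions.  NAL units of those types SHOULD NOT be sent by a
       sender and MUST be ignored by a receiver.  For example, types 1-23,
       with the associated packet type "NAL unit", are allowed in "Single
       NAL Unit Mode" and in "Non-Interleaved Mode", but disallowed in
       "Interleaved Mode".  Packetization modes are explained in more
       detail in section 6.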


       Table 3.  Summary of allowed NAL unit types for each packetization
                 mode (yes = allowed, no = disallowed, ig = ignore)


          Type   Packet    Single NAL    Non-Interleaved    Interleaved
                           Unit Mode           Mode             Mode
          -------------------------------------------------------------


          0      undefined     ig               ig               ig
          1-23   NAL unit     yes              yes               no
          24     STAP-A        no              yes               no
          25     STAP-B        no               no              yes
          26     MTAP16        no               no              yes
          27     MTAP24        no               no              yes
          28     FU-A          no              yes              yes
          29     FU-B          no               no              yes
          30-31  undefined     ig               ig               ig
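
       Table 3 can be transcribed directly into a lookup that a receiver
       might use to validate incoming payloads.  In the Python sketch
       below, the mode strings and the function name check_packet_type are
       hypothetical; how the receiver learns the mode in use (the
       packetization-mode MIME parameter or external means) is outside the
       sketch.

```python
# Allowed NAL unit types per packetization mode, transcribed from Table 3.
_MODE_TYPES = {
    "single_nal_unit": set(range(1, 24)),
    "non_interleaved": set(range(1, 24)) | {24, 28},
    "interleaved": {25, 26, 27, 28, 29},
}


def check_packet_type(mode: str, first_octet: int) -> str:
    """Return 'yes', 'no' or 'ig' for a payload under the given mode."""
    nal_type = first_octet & 0x1F
    if nal_type in (0, 30, 31):
        return "ig"  # undefined types: receivers MUST ignore them
    return "yes" if nal_type in _MODE_TYPES[mode] else "no"
```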


    5.5.  Decoding Order Number (DON)


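       In the interleaved packetization mode, the transmission order of
       NAL units is allowed to differ from the decoding order of the NAL
       units.  Decoding order number (DON) is a field in the payload
       structure or a derived variable that indicates the NAL unit
       decoding order.  Rationale and examples of use cases for
       transmission out of decoding order and for the use of DON are given
       in section 13.

       The coupling of transmission and decoding order is controlled by
       the OPTIONAL sprop-interleaving-depth MIME parameter as follows.
       When the value of the OPTIONAL sprop-interleaving-depth MIME
       parameter is equal to 0 (explicitly or per default), or
       transmission of NAL units out of their decoding order is disallowed
       by external means, the transmission order of NAL units MUST conform
       to the NAL unit decoding order.  When the value of the OPTIONAL
       sprop-interleaving-depth MIME parameter is greater than 0, or
       transmission of NAL units out of their decoding order is allowed by
       external means,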


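       o  the order of NAL units in an MTAP16 and an MTAP24 is NOT
          REQUIRED to be the NAL unit decoding order, and
       o  the order of NAL units generated by decapsulating STAP-Bs,
          MTAPs, and FUs in two consecutive packets is NOT REQUIRED to be
          the NAL unit decoding order.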


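       The RTP payload structures for a single NAL unit packet, an STAP-A,
       and an FU-A do not include DON.  STAP-B and FU-B structures include
       DON, and the structure of MTAPs enables derivation of DON as
       specified in section 5.7.2.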


          Informative note: When an FU-A occurs in interleaved mode, it
          always follows an FU-B, which sets its DON.


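          Informative note: If a transmitter wants to encapsulate a single
          NAL unit per packet and transmit packets out of their decoding
          order, the STAP-B packet type can be used.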


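       In the single NAL unit packetization mode, the transmission order
       of NAL units, determined by the RTP sequence number, MUST be the
       same as their NAL unit decoding order.  In the non-interleaved
       packetization mode, the transmission order of NAL units in single
       NAL unit packets, STAP-As, and FU-As MUST be the same as their NAL
       unit decoding order.  The NAL units within an STAP MUST appear in
       the NAL unit decoding order.  Thus, the decoding order is first
       provided through the implicit order within an STAP, and second
       provided through the RTP sequence number for the order between
       STAPs, FUs, and single NAL unit packets.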


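       Signaling of the value of DON for NAL units carried in STAP-B,
       MTAP, and a series of fragmentation units starting with an FU-B is
       specified in sections 5.7.1, 5.7.2, and 5.8, respectively.  The DON
       value of the first NAL unit in transmission order MAY be set to any
       value.  Values of DON are in the range of 0 to 65535, inclusive.
       After reaching the maximum value, the value of DON wraps around to
       0.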


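       The decoding order of two NAL units contained in any STAP-B, MTAP,
       or a series of fragmentation units starting with an FU-B is
       determined as follows.  Let DON(i) be the decoding order number of
       the NAL unit having index i in the transmission order.  Function
       don_diff(m,n) is specified as follows: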
       
          If DON(m) == DON(n), don_diff(m,n) = 0


          If (DON(m) < DON(n) and DON(n) - DON(m) < 32768),
          don_diff(m,n) = DON(n) - DON(m)


          If (DON(m) > DON(n) and DON(m) - DON(n) >= 32768),
          don_diff(m,n) = 65536 - DON(m) + DON(n)


          If (DON(m) < DON(n) and DON(n) - DON(m) >= 32768),
          don_diff(m,n) = - (DON(m) + 65536 - DON(n))


          If (DON(m) > DON(n) and DON(m) - DON(n) < 32768),
          don_diff(m,n) = - (DON(m) - DON(n))


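       A positive value of don_diff(m,n) indicates that the NAL unit
       having transmission order index n follows, in decoding order, the
       NAL unit having transmission order index m.  When don_diff(m,n) is
       equal to 0, the NAL unit decoding order of the two NAL units can be
       in either order.  A negative value of don_diff(m,n) indicates that
       the NAL unit having transmission order index n precedes, in
       decoding order, the NAL unit having transmission order index m.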
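
       The five cases of don_diff() transcribe directly into code.  The
       Python sketch below takes the two DON values rather than
       transmission indices, and adds a hypothetical helper,
       decoding_order, that sorts buffered NAL units by their distance from
       the first unit's DON.  That helper assumes every DON in the buffer
       lies within 32767 of the first one; it is only one possible way to
       apply the comparison, not the de-packetization procedure of
       section 7.

```python
def don_diff(don_m: int, don_n: int) -> int:
    """don_diff(m,n) of section 5.5, given DON(m) and DON(n) directly."""
    if don_m == don_n:
        return 0
    if don_m < don_n:
        if don_n - don_m < 32768:
            return don_n - don_m
        return -(don_m + 65536 - don_n)
    if don_m - don_n >= 32768:
        return 65536 - don_m + don_n
    return -(don_m - don_n)


def decoding_order(units: list[tuple[int, bytes]]) -> list[bytes]:
    """Hypothetical helper: reorder (DON, NAL unit) pairs received in
    transmission order into decoding order.  Units with equal DON keep
    their transmission order because sorted() is stable."""
    if not units:
        return []
    reference = units[0][0]
    ordered = sorted(units, key=lambda unit: don_diff(reference, unit[0]))
    return [nal for _, nal in ordered]


# DON wraps from 65535 to 0 in the middle of this buffer
print(decoding_order([(65534, b"a"), (1, b"d"), (65535, b"b"), (0, b"c")]))
# [b'a', b'b', b'c', b'd']
```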
       
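       Values of DON-related fields (DON, DONB, and DOND; see section 5.7)
       MUST be such that the decoding order determined by the values of
       DON, as specified above, conforms to the NAL unit decoding order.
       If the order of two NAL units in NAL unit decoding order is
       switched and the new order does not conform to the NAL unit
       decoding order, the NAL units MUST NOT have the same value of DON.
       If the order of two consecutive NAL units in the NAL unit stream is
       switched and the new order still conforms to the NAL unit decoding
       order, the NAL units MAY have the same value of DON.  For example,
       when arbitrary slice order is allowed by the video coding profile
       in use, all the coded slice NAL units of a coded picture are
       allowed to have the same value of DON.  Consequently, NAL units
       having the same value of DON can be decoded in any order, and two
       NAL units having a different value of DON should be passed to the
       decoder in the order specified above.  When two consecutive NAL
       units in the NAL unit decoding order have a different value of DON,
       the value of DON for the second NAL unit in decoding order SHOULD
       be the value of DON for the first, incremented by one.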
       
       An example of the de-packetization process to recover the NAL unit
       decoding order is given in section 7.


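          Informative note: Receivers should not expect that the absolute
          difference of values of DON for two consecutive NAL units in the
          NAL unit decoding order will be equal to one, even in error-free
          transmission.  An increment by one is not required, as at the
          time of associating values of DON to NAL units, it may not be
          known whether all NAL units are delivered to the receiver.  For
          example, a gateway may not forward coded slice NAL units of
          non-reference pictures or SEI NAL units when there is a shortage
          of bit rate in the network to which the packets are forwarded.
          In another example, a live broadcast is interrupted by
          pre-encoded content, such as commercials, from time to time.  The
          first intra picture of a pre-encoded clip is transmitted in
          advance to ensure that it is readily available in the receiver.
          When transmitting the first intra picture, the originator does
          not exactly know how many NAL units will be encoded before the
          first intra picture of the pre-encoded clip follows in decoding
          order.  Thus, the values of DON for the NAL units of the first
          intra picture of the pre-encoded clip have to be estimated when
          they are transmitted, and gaps in values of DON may occur.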


    5.6.  Single NAL Unit Packet


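       The single NAL unit packet defined here MUST contain only one NAL
       unit, of the types defined in [1].  This means that neither an
       aggregation packet nor a fragmentation unit can be used within a
       single NAL unit packet.  A NAL unit stream composed by
       encapsulating NAL units in single NAL unit packets MUST have a NAL
       unit decoding order that conforms to the RTP sequence number order.
       The structure of the single NAL unit packet is shown in Figure 2.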


          Informative note: The first byte of a NAL unit co-serves as the
          RTP payload header.


           0                   1                   2                   3
           0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
          |F|NRI|  type   |                                               |
          +-+-+-+-+-+-+-+-+                                               |
          |                                                               |
          |               Bytes 2..n of a Single NAL unit                 |
          |                                                               |
          |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
          |                               :...OPTIONAL RTP padding        |
          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


          Figure 2.  RTP payload format for single NAL unit packet
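
       Since the NAL unit header co-serves as the payload header, building
       a single NAL unit packet amounts to prepending the 12-byte RTP
       header of Figure 1.  A minimal Python sketch follows, with a
       hypothetical function name; payload type 96 is only a placeholder
       for a dynamically assigned value, and the caller is assumed to
       supply sequence number, timestamp, SSRC, and marker according to
       section 5.1.

```python
import struct


def single_nal_unit_packet(nal_unit: bytes, seq: int, timestamp: int,
                           ssrc: int, payload_type: int = 96,
                           marker: bool = False) -> bytes:
    """RTP packet carrying exactly one NAL unit (Figure 2).  The NAL unit,
    including its header octet, is copied unchanged after the RTP header."""
    nal_type = nal_unit[0] & 0x1F
    if not 1 <= nal_type <= 23:
        raise ValueError("single NAL unit packets carry types 1-23 only")
    header = struct.pack("!BBHII",
                         0x80,                    # V=2, P=0, X=0, CC=0
                         (int(marker) << 7) | (payload_type & 0x7F),
                         seq & 0xFFFF,            # 16-bit sequence number
                         timestamp & 0xFFFFFFFF,  # 32-bit, 90 kHz clock
                         ssrc)
    return header + nal_unit
```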


    5.7.  Aggregation Packets


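       Aggregation packets are the NAL unit aggregation scheme of this
       payload specification.  The scheme is introduced to reflect the
       dramatically different MTU sizes of two key target networks:
       wireline IP networks (with an MTU size that is often limited by the
       Ethernet MTU size, roughly 1500 bytes), and IP or non-IP (e.g.,
       ITU-T H.324/M) based wireless communication systems with preferred
       transmission unit sizes of 254 bytes or less.  To prevent media
       transcoding between the two worlds, and to avoid undesirable
       packetization overhead, a NAL unit aggregation scheme is
       introduced.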


       Two types of aggregation packets are defined by this specification:


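       o  Single-time aggregation packet (STAP): aggregates NAL units with
          identical NALU-time.  Two types of STAPs are defined, one
          without DON (STAP-A) and another including DON (STAP-B).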


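       o  Multi-time aggregation packet (MTAP): aggregates NAL units with
          potentially differing NALU-time.  Two different MTAPs are
          defined, differing in the length of the NAL unit timestamp
          offset.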


       The term NALU-time is defined as the value that the RTP timestamp
       would have if that NAL unit were transported in its own RTP packet.


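       Each NAL unit to be carried in an aggregation packet is
       encapsulated in an aggregation unit.  Please see below for the four
       different aggregation units and their characteristics.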


       The structure of the RTP payload format for aggregation packets is
       presented in Figure 3.


           0                   1                   2                   3
           0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
          |F|NRI|  type   |                                               |
          +-+-+-+-+-+-+-+-+                                               |
          |                                                               |
          |             one or more aggregation units                     |
          |                                                               |
          |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
          |                               :...OPTIONAL RTP padding        |
          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


          Figure 3.  RTP payload format for aggregation packets


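       MTAPs and STAPs share the following packetization rules.  The RTP
       timestamp MUST be set to the earliest of the NALU-times of all the
       NAL units to be aggregated.  The type field of the NAL unit type
       octet MUST be set to the appropriate value, as indicated in Table
       4.  The F bit MUST be cleared if all F bits of the aggregated NAL
       units are zero; otherwise, it MUST be set.  The value of NRI MUST
       be the maximum of all the NAL units carried in the aggregation
       packet.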


          Table 4.  Type field for STAPs and MTAPs


          Type   Packet    Timestamp offset      DON-related fields
                           field length          (DON, DONB, DOND)
                           (in bits)             present
          --------------------------------------------------------
          24     STAP-A       0                 no
          25     STAP-B       0                 yes
          26     MTAP16      16                 yes
          27     MTAP24      24                 yes


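       The marker bit in the RTP header is set to the value that the
       marker bit of the last NAL unit of the aggregated packet would have
       if it were transported in its own RTP packet.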


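       The payload of an aggregation packet consists of one or more
       aggregation units.  See sections 5.7.1 and 5.7.2 for the four
       different types of aggregation units.  An aggregation packet can
       carry as many aggregation units as necessary; however, the total
       amount of data in an aggregation packet obviously MUST fit into an
       IP packet, and the size SHOULD be chosen so that the resulting IP
       packet is smaller than the MTU size.  An aggregation packet MUST
       NOT contain fragmentation units specified in section 5.8.
       Aggregation packets MUST NOT be nested; i.e., an aggregation packet
       MUST NOT contain another aggregation packet.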
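
       The shared rules above (timestamp, F, NRI, Type, no nesting, no
       fragmentation units, size below the MTU) can be checked when an
       aggregation packet is assembled.  The Python sketch below takes the
       NAL units with their NALU-times and returns the payload header
       octet and RTP timestamp.  The function name, the tuple layout, and
       the default overhead of 40 bytes (IPv4, UDP, and the fixed RTP
       header without options) are assumptions; the per-type aggregation
       units of sections 5.7.1 and 5.7.2 are left aside.

```python
AGGREGATION_TYPES = {"STAP-A": 24, "STAP-B": 25, "MTAP16": 26, "MTAP24": 27}


def aggregation_header(units: list[tuple[int, bytes]], packet_type: str,
                       payload_len: int, mtu: int = 1500,
                       overhead: int = 40) -> tuple[int, int]:
    """Return (payload header octet, RTP timestamp) for an STAP or MTAP.

    units:       (NALU-time, NAL unit) pairs to be aggregated
    payload_len: size of the complete aggregation payload in bytes
    overhead:    assumed IPv4 (20) + UDP (8) + fixed RTP header (12) bytes
    """
    for _, nal in units:
        if 24 <= (nal[0] & 0x1F) <= 29:
            raise ValueError("no nested aggregation packets or FUs allowed")
        if len(nal) > 0xFFFF:
            raise ValueError("aggregated NAL units are limited to 65535 bytes")
    if payload_len + overhead >= mtu:
        raise ValueError("IP packet would not be smaller than the MTU")
    f = max(nal[0] >> 7 for _, nal in units)             # 1 if any F bit set
    nri = max((nal[0] >> 5) & 0x03 for _, nal in units)  # highest NRI
    # Earliest NALU-time; this simple min() ignores 32-bit wrap-around.
    timestamp = min(t for t, _ in units)
    return (f << 7) | (nri << 5) | AGGREGATION_TYPES[packet_type], timestamp
```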
       


    5.7.1.  Single-Time Aggregation Packet


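       A single-time aggregation packet (STAP) SHOULD be used whenever
       NAL units that share the same NALU-time are aggregated.  The
       payload of an STAP-A does not include DON and consists of at least
       one single-time aggregation unit, as presented in Figure 4.  The
       payload of an STAP-B consists of a 16-bit unsigned decoding order
       number (DON) (in network byte order) followed by at least one
       single-time aggregation unit, as presented in Figure 5.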


           0                   1                   2                   3
           0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                          :                                               |
          +-+-+-+-+-+-+-+-+                                               |
          |                                                               |
          |                single-time aggregation units                  |
          |                                                               |
          |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
          |                               :
          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


                         Figure 4.  STAP-A payload format


           0                   1                   2                   3
           0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                          :  decoding order number (DON)  |               |
          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
          |                                                               |
          |                single-time aggregation units                  |
          |                                                               |
          |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
          |                               :
          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


                     Figure 5.  STAP-B payload format


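       The DON field specifies the value of DON for the first NAL unit in
       an STAP-B in transmission order.  For each successive NAL unit in
       appearance order in an STAP-B, the value of DON is equal to (the
       value of DON of the previous NAL unit in the STAP-B + 1) % 65536,
       in which '%' stands for the modulo operation.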


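       A single-time aggregation unit consists of 16-bit unsigned size
       information (in network byte order) that indicates the size of the
       following NAL unit in bytes (excluding these two octets, but
       including the NAL unit type octet of the NAL unit), followed by the
       NAL unit itself, including its NAL unit type byte.  A single-time
       aggregation unit is byte aligned within the RTP payload, but it may
       not be aligned on a 32-bit word boundary.  Figure 6 presents the
       structure of the single-time aggregation unit.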


           0                   1                   2                   3
           0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                          :        NAL unit size          |               |
          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
          |                                                               |
          |                           NAL unit                            |
          |                                                               |
          |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
          |                               :
          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


                  Figure 6.  Structure of a single-time aggregation unit
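
       A single-time aggregation unit is simply a 16-bit size in network
       byte order followed by the NAL unit, so an STAP-A payload can be
       built and split as sketched below in Python.  The function names
       are hypothetical, RTP padding is assumed to have been stripped
       before parsing, and the sketch rejects zero-length units rather
       than trying to recover from them.

```python
import struct


def build_stap_a(nal_units: list[bytes]) -> bytes:
    """STAP-A payload: header octet (Type 24, F = OR of F bits, NRI = max),
    then one single-time aggregation unit per NAL unit (Figures 4 and 6)."""
    f = max(u[0] >> 7 for u in nal_units)
    nri = max((u[0] >> 5) & 0x03 for u in nal_units)
    out = bytearray([(f << 7) | (nri << 5) | 24])
    for unit in nal_units:
        if not 0 < len(unit) <= 0xFFFF:
            raise ValueError("aggregated NAL units must be 1..65535 bytes")
        out += struct.pack("!H", len(unit))  # NAL unit size, network byte order
        out += unit                          # NAL unit, including its header
    return bytes(out)


def parse_stap_a(payload: bytes) -> list[bytes]:
    """Split an STAP-A payload (RTP padding already removed) into NAL units."""
    if (payload[0] & 0x1F) != 24:
        raise ValueError("not an STAP-A")
    units, pos = [], 1
    while pos < len(payload):
        if pos + 2 > len(payload):
            raise ValueError("truncated NAL unit size field")
        (size,) = struct.unpack_from("!H", payload, pos)
        pos += 2
        if size == 0 or pos + size > len(payload):
            raise ValueError("invalid NAL unit size")
        units.append(payload[pos:pos + size])
        pos += size
    return units
```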




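       Figure 7 presents an example of an RTP packet that contains an
       STAP-A.  The STAP contains two single-time aggregation units,
       labeled as 1 and 2 in the figure.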


           0                   1                   2                   3
           0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
          |                          RTP Header                           |
          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
          |STAP-A NAL HDR |         NALU 1 Size           | NALU 1 HDR    |
          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
          |                         NALU 1 Data                           |
          :                                                               :
          +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
          |               | NALU 2 Size                   | NALU 2 HDR    |
          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
          |                         NALU 2 Data                           |
          :                                                               :
          |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
          |                               :...OPTIONAL RTP padding        |
          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


          Figure 7.  An example of an RTP packet including an STAP-A
                     containing two single-time aggregation units


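       Figure 8 presents an example of an RTP packet that contains an
       STAP-B.  The STAP contains two single-time aggregation units,
       labeled as 1 and 2 in the figure.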


           0                   1                   2                   3
           0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
          |                          RTP Header                           |
          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
          |STAP-B NAL HDR | DON                           | NALU 1 Size   |
          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
          | NALU 1 Size   | NALU 1 HDR    | NALU 1 Data                   |
          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
          :                                                               :
          +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
          |               | NALU 2 Size                   | NALU 2 HDR    |
          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
          |                       NALU 2 Data                             |
          :                                                               :
          |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
          |                               :...OPTIONAL RTP padding        |
          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


          Figure 8.  Example of an RTP packet including an STAP-B and two
                     single-time aggregation units


    5.7.2.  Multi-Time Aggregation Packets (MTAPs)


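       The NAL unit payload of MTAPs consists of a 16-bit unsigned decoding
       order number base (DONB) (in network byte order) and one or more
       multi-time aggregation units, as presented in Figure 9.  DONB MUST
       contain the value of DON for the first NAL unit in the NAL unit
       decoding order among the NAL units of the MTAP.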


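          Informative note: The first NAL unit in the NAL unit decoding
          order is not necessarily the first NAL unit in the order in which
          the NAL units are encapsulated in an MTAP.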


           0                   1                   2                   3
           0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                          :  decoding order number base   |               |
          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
          |                                                               |
          |                 multi-time aggregation units                  |
          |                                                               |
          |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
          |                               :
          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


               Figure 9.  NAL unit payload format for MTAPs


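       Two different multi-time aggregation units are defined in this
       specification.  Both of them consist of 16 bits of unsigned size
       information for the following NAL unit (in network byte order), an
       8-bit unsigned decoding order number difference (DOND), and n bits
       (in network byte order) of timestamp offset (TS offset) for this NAL
       unit, whereby n can be 16 or 24.  The choice between the different
       MTAP types (MTAP16 and MTAP24) is application dependent: the larger
       the timestamp offset is, the higher the flexibility of the MTAP, but
       the overhead is also higher.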


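       The structures of the multi-time aggregation units for MTAP16 and
       MTAP24 are presented in Figures 10 and 11, respectively.  The
       starting or ending position of an aggregation unit within a packet
       is NOT REQUIRED to be on a 32-bit word boundary.  The DON of the
       following NAL unit is equal to (DONB + DOND) % 65536, in which %
       denotes the modulo operation.  This memo does not specify how the
       NAL units within an MTAP are ordered, but, in most cases, NAL unit
       decoding order SHOULD be used.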
       
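       The timestamp offset field MUST be set to a value equal to the value
       of the following formula: If the NALU-time is larger than or equal
       to the RTP timestamp of the packet, then the timestamp offset equals
       (the NALU-time of the NAL unit - the RTP timestamp of the packet).
       If the NALU-time is smaller than the RTP timestamp of the packet,
       then the timestamp offset is equal to the NALU-time + (2^32 - the
       RTP timestamp of the packet).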
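       The DON rule and the two-case timestamp offset formula above can be
       sketched as follows (Python, illustrative only; mtap_don and
       ts_offset are hypothetical names).  Both branches of the formula are
       equivalent to (NALU-time - RTP timestamp) modulo 2^32; the sketch
       keeps the two cases to mirror the text and additionally rejects
       offsets that do not fit the 16- or 24-bit TS offset field.

```python
def mtap_don(donb: int, dond: int) -> int:
    """DON of the NAL unit that follows a multi-time aggregation unit header."""
    return (donb + dond) % 65536


def ts_offset(nalu_time: int, rtp_timestamp: int, bits: int) -> int:
    """TS offset field value for one aggregation unit.

    bits is 16 for MTAP16 or 24 for MTAP24.
    """
    if nalu_time >= rtp_timestamp:
        offset = nalu_time - rtp_timestamp
    else:
        # NALU-time has wrapped around 2^32 relative to the packet timestamp
        offset = nalu_time + (2**32 - rtp_timestamp)
    if offset >= 1 << bits:
        raise ValueError(f"offset {offset} does not fit in {bits} bits")
    return offset
```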


           0                   1                   2                   3
           0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
          :        NAL unit size          |      DOND     |  TS offset    |
          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
          |  TS offset    |                                               |
          +-+-+-+-+-+-+-+-+              NAL unit                         |
          |                                                               |
          |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
          |                               :
          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


                      Figure 10.  Multi-time aggregation unit for MTAP16


           0                   1                   2                   3
           0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
          :        NAL unit size          |      DOND     |  TS offset    |
          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
          |         TS offset             |                               |
          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
          |                              NAL unit                         |
          |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
          |                               :
          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


                     Figure 11.  Multi-time aggregation unit for MTAP24


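       For the "earliest" multi-time aggregation unit in an MTAP, the
       timestamp offset MUST be zero.  Hence, the RTP timestamp of the MTAP
       itself is identical to the earliest NALU-time.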


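          Informative note: The "earliest" multi-time aggregation unit is
          the one that would have the smallest extended RTP timestamp among
          all the aggregation units of an MTAP if the aggregation units
          were encapsulated in single NAL unit packets.  An extended
          timestamp is a timestamp that has more than 32 bits and is capable
          of counting the wraparound of the timestamp field, thus enabling
          one to determine the smallest value if the timestamp wraps.  Such
          an "earliest" aggregation unit may not be the first one in the
          order in which the aggregation units are encapsulated in an MTAP.
          The "earliest" NAL unit need not be the same as the first NAL
          unit in the NAL unit decoding order, either.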


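       Figure 12 presents an example of an RTP packet that contains a
       multi-time aggregation packet of type MTAP16 that contains two
       multi-time aggregation units, labeled as 1 and 2 in the figure.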


           0                   1                   2                   3
           0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
          |                          RTP Header                           |
          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
          |MTAP16 NAL HDR |  decoding order number base   | NALU 1 Size   |
          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
          |  NALU 1 Size  |  NALU 1 DOND  |       NALU 1 TS offset        |
          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
          |  NALU 1 HDR   |  NALU 1 DATA                                  |
          +-+-+-+-+-+-+-+-+                                               +
          :                                                               :
          +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
          |               | NALU 2 SIZE                   |  NALU 2 DOND  |
          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
          |       NALU 2 TS offset        |  NALU 2 HDR   |  NALU 2 DATA  |
          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
          :                                                               :
          |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
          |                               :...OPTIONAL RTP padding        |
          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


          Figure 12.  An RTP packet including a multi-time aggregation
                      packet of type MTAP16 and two multi-time aggregation
                      units


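       Figure 13 presents an example of an RTP packet that contains a
       multi-time aggregation packet of type MTAP24 that contains two
       multi-time aggregation units, labeled as 1 and 2 in the figure.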


           0                   1                   2                   3
           0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
          |                          RTP Header                           |
          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
          |MTAP24 NAL HDR |  decoding order number base   | NALU 1 Size   |
          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
          |  NALU 1 Size  |  NALU 1 DOND  |       NALU 1 TS offs          |
          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
          |NALU 1 TS offs |  NALU 1 HDR   |  NALU 1 DATA                  |
          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
          :                                                               :
          +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
          |               | NALU 2 SIZE                   |  NALU 2 DOND  |
          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
          |       NALU 2 TS offset                        |  NALU 2 HDR   |
          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
          |  NALU 2 DATA                                                  |
          :                                                               :
          |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
          |                               :...OPTIONAL RTP padding        |
          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


          Figure 13.  An RTP packet including a multi-time aggregation
                      packet of type MTAP24 and two multi-time aggregation
                      units
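       A receiver has to walk the aggregation units of Figures 12 and 13
       one at a time.  The sketch below is a hypothetical Python helper,
       parse_mtap, that assumes the RTP header and any RTP padding have
       already been stripped, so the payload starts at the MTAP NAL unit
       header.  It returns each NAL unit with its DON, computed as
       (DONB + DOND) % 65536, and its raw TS offset.

```python
import struct


def parse_mtap(payload: bytes, ts_bits: int) -> list[tuple[int, int, bytes]]:
    """Split an MTAP16 (ts_bits=16) or MTAP24 (ts_bits=24) payload.

    Returns (DON, TS offset, NAL unit) tuples in encapsulation order.
    """
    if ts_bits not in (16, 24):
        raise ValueError("ts_bits must be 16 or 24")
    ts_len = ts_bits // 8
    donb = struct.unpack_from("!H", payload, 1)[0]  # octets 1-2 hold DONB
    pos = 3
    units = []
    while pos < len(payload):
        header_end = pos + 3 + ts_len  # size (2) + DOND (1) + TS offset
        if header_end > len(payload):
            raise ValueError("truncated aggregation unit header")
        size = struct.unpack_from("!H", payload, pos)[0]
        dond = payload[pos + 2]
        offset = int.from_bytes(payload[pos + 3:header_end], "big")
        nalu = payload[header_end:header_end + size]
        if len(nalu) != size:
            raise ValueError("truncated NAL unit")
        units.append(((donb + dond) % 65536, offset, nalu))
        pos = header_end + size
    return units
```

       Adding each TS offset to the RTP timestamp of the packet (modulo
       2^32) recovers the NALU-time of the corresponding NAL unit.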


    5.8.  Fragmentation Units (FUs)


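       This payload type allows fragmenting a NAL unit into several RTP
       packets.  Doing so on the application layer instead of relying on
       lower-layer fragmentation (e.g., by IP) has the following
       advantages: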


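       o  The payload format is capable of transporting NAL units bigger
          than 64 kbytes over an IPv4 network.  Such NAL units may be
          present in pre-recorded video, particularly in High Definition
          formats (there is a limit on the number of slices per picture,
          which results in a limit on the number of NAL units per picture,
          which may result in big NAL units).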


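       o  The fragmentation mechanism allows fragmenting a single picture
          and applying generic forward error correction as described in
          section 12.5.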


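       Fragmentation is defined only for a single NAL unit and not for any
       aggregation packets.  A fragment of a NAL unit consists of an
       integer number of consecutive octets of that NAL unit.  Each octet
       of the NAL unit MUST be part of exactly one fragment of that NAL
       unit.  Fragments of the same NAL unit MUST be sent in consecutive
       order with ascending RTP sequence numbers (with no other RTP packets
       being sent between the first and last fragment).  Similarly, a NAL
       unit MUST be reassembled in RTP sequence number order.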


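       When a NAL unit is fragmented and conveyed within fragmentation
       units (FUs), it is referred to as a fragmented NAL unit.  STAPs and
       MTAPs MUST NOT be fragmented.  FUs MUST NOT be nested.  That is, an
       FU MUST NOT contain another FU.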


       The RTP timestamp of an RTP packet carrying an FU is set to the
       NALU-time of the fragmented NAL unit.


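       Figure 14 presents the RTP payload format for FU-As.  An FU-A
       consists of a fragmentation unit indicator of one octet, a
       fragmentation unit header of one octet, and a fragmentation unit
       payload.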


           0                   1                   2                   3
           0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
          | FU indicator  |   FU header   |                               |
          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
          |                                                               |
          |                         FU payload                            |
          |                                                               |
          |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
          |                               :...OPTIONAL RTP padding        |
          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


          Figure 14.  RTP payload format for FU-A


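       Figure 15 presents the RTP payload format for FU-Bs.  An FU-B
       consists of a fragmentation unit indicator of one octet, a
       fragmentation unit header of one octet, a decoding order number
       (DON), and a fragmentation unit payload.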


           0                   1                   2                   3
           0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
          | FU indicator  |   FU header   |               DON             |
          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-|
          |                                                               |
          |                         FU payload                            |
          |                                                               |
          |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
          |                               :...OPTIONAL RTP padding        |
          +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


              Figure 15.  RTP payload format for FU-B


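       NAL unit type FU-B MUST be used in the interleaved packetization
       mode for the first fragmentation unit of a fragmented NAL unit.  NAL
       unit type FU-B MUST NOT be used in any other case.  In other words,
       in the interleaved packetization mode, each NALU that is fragmented
       has an FU-B as the first fragment, followed by one or more FU-A
       fragments.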


       The FU indicator octet has the following format:
          +---------------+
          |0|1|2|3|4|5|6|7|
          +-+-+-+-+-+-+-+-+
          |F|NRI|  Type   |
          +---------------+
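       Values equal to 28 and 29 in the Type field of the FU indicator
       octet identify an FU-A and an FU-B, respectively.  The use of the F
       bit is described in section 5.3.  The value of the NRI field MUST be
       set according to the value of the NRI field in the fragmented NAL
       unit.

       The FU header has the following format: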
          +---------------+
          |0|1|2|3|4|5|6|7|
          +-+-+-+-+-+-+-+-+
          |S|E|R|  Type   |
          +---------------+


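       S: 1 bit
          When set to one, the Start bit indicates the start of a
          fragmented NAL unit.  When the following FU payload is not the
          start of a fragmented NAL unit payload, the Start bit is set to
          zero.
       E: 1 bit
          When set to one, the End bit indicates the end of a fragmented
          NAL unit, i.e., the last byte of the payload is also the last
          byte of the fragmented NAL unit.  When the following FU payload
          is not the last fragment of a fragmented NAL unit, the End bit is
          set to zero.
       R: 1 bit
          The Reserved bit MUST be equal to 0 and MUST be ignored by the
          receiver.

       Type: 5 bits
          The NAL unit payload type as defined in Table 7-1 of [1].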
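       The following minimal Python sketch (parse_fu and FuInfo are
       hypothetical names) decodes the FU indicator and FU header octets
       described above and rebuilds the NAL unit header octet of the
       fragmented NAL unit from the F and NRI fields of the FU indicator
       and the Type field of the FU header.  It assumes the payload starts
       at the FU indicator and does not read the DON field of FU-Bs.

```python
from typing import NamedTuple

FU_A, FU_B = 28, 29


class FuInfo(NamedTuple):
    fu_type: int     # 28 (FU-A) or 29 (FU-B), taken from the FU indicator
    start: bool      # S bit of the FU header
    end: bool        # E bit of the FU header
    nal_header: int  # reconstructed header octet of the fragmented NAL unit


def parse_fu(payload: bytes) -> FuInfo:
    indicator, header = payload[0], payload[1]
    fu_type = indicator & 0x1F
    if fu_type not in (FU_A, FU_B):
        raise ValueError("payload is not a fragmentation unit")
    start = bool(header & 0x80)
    end = bool(header & 0x40)
    # The R bit (0x20) is ignored, as required for receivers.
    # F and NRI come from the FU indicator; Type comes from the FU header.
    nal_header = (indicator & 0xE0) | (header & 0x1F)
    return FuInfo(fu_type, start, end, nal_header)
```

       For example, a payload starting with the octets 0x7C 0x85 is the
       first fragment (S=1, E=0) of an FU-A that carries a NAL unit whose
       header octet is 0x65 (NRI equal to 3, Type equal to 5).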


       The value of DON in FU-Bs is selected as described in section 5.5.


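          Informative note: The DON field in FU-Bs allows gateways to
          fragment NAL units to FU-Bs without organizing the incoming NAL
          units to the NAL unit decoding order.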


       A fragmented NAL unit MUST NOT be transmitted in one FU; i.e., the
       Start bit and End bit MUST NOT both be set to one in the same FU
       header.


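       The FU payload consists of fragments of the payload of the
       fragmented NAL unit so that if the fragmentation unit payloads of
       consecutive FUs are sequentially concatenated, the payload of the
       fragmented NAL unit can be reconstructed.  The NAL unit type octet
       of the fragmented NAL unit is not included as such in the
       fragmentation unit payload; rather, the information of that octet is
       conveyed in the F and NRI fields of the FU indicator octet and in
       the Type field of the FU header.  An FU payload MAY have any number
       of octets and MAY be empty.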
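       A sender-side sketch of FU-A fragmentation, in Python, under the
       following assumptions: fragment_fu_a is a hypothetical helper,
       max_payload is the largest RTP payload the sender wants to emit,
       and every fragment except possibly the last carries the same number
       of octets.  As required above, the NAL unit header octet itself is
       not copied into the FU payloads; its F, NRI, and Type values travel
       in the FU indicator and FU header instead.  Empty FU payloads, which
       are permitted, are not generated by this sketch.

```python
def fragment_fu_a(nalu: bytes, max_payload: int) -> list[bytes]:
    """Split one NAL unit into FU-A payloads of at most max_payload octets."""
    if max_payload < 3:
        raise ValueError("max_payload must leave room for FU data")
    nal_header, body = nalu[0], nalu[1:]
    indicator = (nal_header & 0xE0) | 28  # F and NRI copied, Type = 28 (FU-A)
    chunk = max_payload - 2               # octets left after indicator + header
    pieces = [body[i:i + chunk] for i in range(0, len(body), chunk)]
    if len(pieces) < 2:
        # Start and End bits may not both be set in one FU header
        raise ValueError("NAL unit too small to fragment; send it unfragmented")
    fus = []
    for i, piece in enumerate(pieces):
        header = nal_header & 0x1F        # Type of the fragmented NAL unit
        if i == 0:
            header |= 0x80                # S bit on the first fragment
        if i == len(pieces) - 1:
            header |= 0x40                # E bit on the last fragment
        fus.append(bytes([indicator, header]) + piece)
    return fus
```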


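          Informative note: Empty FUs are allowed to reduce the latency of
          a certain class of senders in nearly lossless environments.
          These senders can be characterized as packetizing NALU fragments
          before the NALU is completely generated and, hence, before the
          NALU size is known.  If zero-length NALU fragments were not
          allowed, the sender would have to generate at least one bit of
          data of the following fragment before the current fragment could
          be sent.  Due to the characteristics of H.264, where sometimes
          several macroblocks occupy zero bits, this is undesirable and can
          add delay.  However, the (potential) use of zero-length NALUs
          should be carefully weighed against the increased risk of the
          loss of the NALU because of the additional packets employed for
          its transmission.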


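       If a fragmentation unit is lost, the receiver SHOULD discard all
       following fragmentation units in transmission order corresponding to
       the same fragmented NAL unit.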


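       A receiver in an endpoint or in a MANE MAY aggregate the first n-1
       fragments of a NAL unit to an (incomplete) NAL unit, even if
       fragment n of that NAL unit is not received.  In this case, the
       forbidden_zero_bit of the NAL unit MUST be set to one to indicate a
       syntax violation.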
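       The receiver behavior above can be sketched as follows.  This Python
       helper (reassemble_fu_a, a hypothetical name) assumes its input is
       the list of (RTP sequence number, FU-A payload) pairs for one
       fragmented NAL unit, already in RTP sequence number order.  When it
       detects a gap in the sequence numbers, it stops and discards all
       later fragments and, exercising the MAY above, returns the
       incomplete NAL unit with forbidden_zero_bit set to 1.

```python
from typing import Optional


def reassemble_fu_a(packets: list[tuple[int, bytes]]) -> Optional[bytes]:
    """Rebuild one fragmented NAL unit from (sequence number, FU-A payload) pairs.

    Input must be in RTP sequence number order.  Returns None when the
    first fragment (S bit set) is missing.
    """
    if not packets or not packets[0][1][1] & 0x80:
        return None  # without the Start fragment nothing usable can be built
    indicator, header = packets[0][1][0], packets[0][1][1]
    nal_header = (indicator & 0xE0) | (header & 0x1F)
    body = bytearray()
    prev_seq = None
    complete = False
    for seq, payload in packets:
        if prev_seq is not None and seq != (prev_seq + 1) & 0xFFFF:
            break  # gap: a fragment was lost, so later fragments are dropped
        body += payload[2:]
        prev_seq = seq
        if payload[1] & 0x40:  # End bit: the NAL unit is complete
            complete = True
            break
    if not complete:
        nal_header |= 0x80  # forbidden_zero_bit = 1 signals a syntax violation
    return bytes([nal_header]) + bytes(body)
```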


    6.  Packetization Rules


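       The packetization modes are introduced in section 5.2.  The
       packetization rules common to more than one of the packetization
       modes are specified in section 6.1.  The packetization rules for the
       single NAL unit mode, the non-interleaved mode, and the interleaved
       mode are specified in sections 6.2, 6.3, and 6.4, respectively.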


    6.1.  Common Packetization Rules


       All senders MUST enforce the following packetization rules
       regardless of the packetization mode in use:


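       o  Coded slice NAL units or coded slice data partition NAL units
          belonging to the same coded picture (and thus sharing the same
          RTP timestamp value) MAY be sent in any order permitted by the
          applicable profile defined in [1]; however, for delay-critical
          systems, they SHOULD be sent in their original coding order to
          minimize the delay.  Note that the coding order is not
          necessarily the scan order, but the order in which the NAL
          packets become available to the RTP stack.

       o  Parameter sets are handled in accordance with the rules and
          recommendations given in section 8.4.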


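       o  MANEs MUST NOT duplicate any NAL unit except for sequence or
          picture parameter set NAL units, as neither this memo nor the
          H.264 specification provides means to identify duplicated NAL
          units.  Sequence and picture parameter set NAL units MAY be
          duplicated to make their correct reception more probable, but any
          such duplication MUST NOT affect the contents of any active
          sequence or picture parameter set.  Duplication SHOULD be
          performed on the application layer and not by duplicating RTP
          packets (with identical sequence numbers).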
          
       Senders using the non-interleaved mode and the interleaved mode MUST
       enforce the following packetization rule:
       
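       o  MANEs MAY convert single NAL unit packets into one aggregation
          packet, convert an aggregation packet into several single NAL
          unit packets, or mix both concepts, in an RTP translator.  The RTP
          translator SHOULD take into account at least the following
          parameters: path MTU size, unequal protection mechanisms (e.g.,
          through packet-based FEC according to RFC 2733, especially for
          sequence and picture parameter set NAL units and coded slice data
          partition A NAL units), bearable latency of the system, and
          buffering capabilities of the receiver.

          Informative note: An RTP translator is required to handle RTCP
          as per RFC 3550.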


    6.2.  Single NAL Unit Mode


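       This mode is in use when the value of the OPTIONAL packetization-mode
       MIME parameter is equal to 0, the packetization-mode is not present,
       or no other packetization mode is signaled by external means.  All
       receivers MUST support this mode.  It is primarily intended for
       low-delay applications that are compatible with systems using ITU-T
       Recommendation H.241 (see section 12.1).  Only single NAL unit
       packets MAY be used in this mode.  STAPs, MTAPs, and FUs MUST NOT be
       used.  The transmission order of single NAL unit packets MUST comply
       with the NAL unit decoding order.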
       
    6.3.  Non-Interleaved Mode


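       This mode is in use when the value of the OPTIONAL packetization-mode
       MIME parameter is equal to 1 or the mode is turned on by external
       means.  This mode SHOULD be supported.  It is primarily intended for
       low-delay applications.  Only single NAL unit packets, STAP-As, and
       FU-As MAY be used in this mode.  STAP-Bs, MTAPs, and FU-Bs MUST NOT
       be used.  The transmission order of NAL units MUST comply with the
       NAL unit decoding order.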


    6.4.  Interleaved Mode


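       This mode is in use when the value of the OPTIONAL packetization-mode
       MIME parameter is equal to 2 or the mode is turned on by external
       means.  Some receivers MAY support this mode.  STAP-Bs, MTAPs,
       FU-As, and FU-Bs MAY be used.  STAP-As and single NAL unit packets
       MUST NOT be used.  The transmission order of packets and NAL units
       is constrained as specified in section 5.5.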
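       The packet types permitted in each packetization mode can be
       summarized as a lookup table.  The Python sketch below is
       illustrative only: packet_allowed is a hypothetical helper, and the
       numeric type values (1-23 for single NAL unit packets, 24 STAP-A,
       25 STAP-B, 26 MTAP16, 27 MTAP24, 28 FU-A, 29 FU-B) are those of the
       payload structures defined in section 5.

```python
# Type field of the first RTP payload octet -> payload structure:
# 1-23 single NAL unit packet, 24 STAP-A, 25 STAP-B,
# 26 MTAP16, 27 MTAP24, 28 FU-A, 29 FU-B.
SINGLE = frozenset(range(1, 24))

ALLOWED = {
    0: SINGLE,                            # single NAL unit mode
    1: SINGLE | {24, 28},                 # non-interleaved mode
    2: frozenset({25, 26, 27, 28, 29}),   # interleaved mode
}


def packet_allowed(packetization_mode: int, payload: bytes) -> bool:
    """True if the payload's structure may be used in the given mode."""
    return (payload[0] & 0x1F) in ALLOWED[packetization_mode]
```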
       
    7.  De-Packetization Process (Informative)


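       The de-packetization process is implementation dependent.
       Therefore, the following description should be seen as an example
       of a suitable implementation.  Other schemes may be used as well.
       Optimizations relative to the described algorithms are likely
       possible.  Section 7.1 presents the de-packetization process for the
       single NAL unit and non-interleaved packetization modes, whereas
       section 7.2 describes the process for the interleaved mode.  Section
       7.3 includes additional de-packetization guidelines for intelligent
       receivers.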
       
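       All normal RTP mechanisms related to buffer management apply.  In
       particular, duplicated or outdated RTP packets (as indicated by the
       RTP sequence number and the RTP timestamp) are removed.  To
       determine the exact time for decoding, factors such as a possible
       intentional delay to allow for proper inter-stream synchronization
       must be taken into account.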


    7.1.  Single NAL Unit and Non-Interleaved Mode


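       The receiver includes a receiver buffer to compensate for
       transmission delay jitter.  The receiver stores incoming packets in
       reception order into the receiver buffer.  Packets are de-packetized
       in RTP sequence number order.  If a de-packetized packet is a single
       NAL unit packet, the NAL unit contained in the packet is passed
       directly to the decoder.  If a de-packetized packet is an STAP-A,
       the NAL units contained in the packet are passed to the decoder in
       the order in which they are encapsulated in the packet.  If a
       de-packetized packet is an FU-A, all the fragments of the fragmented
       NAL unit are concatenated and passed to the decoder.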
       
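          Informative note: If the decoder supports Arbitrary Slice Order,
          coded slices of a picture can be passed to the decoder in any
          order regardless of their reception and transmission order.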
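       A minimal Python sketch of this de-packetization step for single NAL
       unit packets and STAP-As; depacketize is a hypothetical helper that
       takes one RTP payload with the RTP header and padding already
       removed.  FU-A payloads are intentionally rejected here, because
       their fragments must first be concatenated across packets.

```python
import struct


def depacketize(payload: bytes) -> list[bytes]:
    """Return the NAL units of a single NAL unit packet or an STAP-A."""
    nal_type = payload[0] & 0x1F
    if 1 <= nal_type <= 23:
        return [payload]  # single NAL unit packet: pass through unchanged
    if nal_type == 24:    # STAP-A: repeated 16-bit size + NAL unit
        units = []
        pos = 1
        while pos < len(payload):
            if pos + 2 > len(payload):
                raise ValueError("truncated NAL unit size field")
            size = struct.unpack_from("!H", payload, pos)[0]
            pos += 2
            if pos + size > len(payload):
                raise ValueError("truncated NAL unit")
            units.append(payload[pos:pos + size])  # keep encapsulation order
            pos += size
        return units
    raise ValueError(f"NAL unit type {nal_type} is not handled by this sketch")
```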


    7.2.  Interleaved Mode


       The general concept behind these de-packetization rules is to
       reorder NAL units from transmission order to the NAL unit decoding
       order.


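       The receiver includes a receiver buffer, which is used to compensate
       for transmission delay jitter and to reorder packets from
       transmission order to the NAL unit decoding order.  In this section,
       the receiver operation is described under the assumption that there
       is no transmission delay jitter.  To distinguish it from a practical
       receiver buffer that is also used for compensation of transmission
       delay jitter, the receiver buffer is hereafter called the
       deinterleaving buffer in this section.  Receivers SHOULD also
       prepare for transmission delay jitter; i.e., either reserve separate
       buffers for transmission delay jitter buffering and deinterleaving
       buffering or use a receiver buffer for both transmission delay
       jitter and deinterleaving.  Moreover, receivers SHOULD take
       transmission delay jitter into account in the buffering operation;
       e.g., by additional initial buffering before starting decoding and
       playback.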


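       This section is organized as follows: subsection 7.2.1 presents how
       to calculate the size of the deinterleaving buffer.  Subsection
       7.2.2 specifies how the receiver organizes received NAL units into
       the NAL unit decoding order.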
       
    7.2.1.  Size of the Deinterleaving Buffer


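       When the SDP Offer/Answer model or any other capability exchange
       procedure is used in session setup, the properties of the received
       stream SHOULD be such that the receiver capabilities are not
       exceeded.  In the SDP Offer/Answer model, the receiver can indicate
       its capabilities to allocate a deinterleaving buffer with the
       deint-buf-cap MIME parameter.  The sender indicates the requirement
       for the deinterleaving buffer size with the sprop-deint-buf-req MIME
       parameter.  It is therefore RECOMMENDED to set the deinterleaving
       buffer size, in terms of number of bytes, equal to or greater than
       the value of the sprop-deint-buf-req MIME parameter.  See section
       8.1 for further information on the deint-buf-cap and
       sprop-deint-buf-req MIME parameters and section 8.2.2 for further
       information on their use in the SDP Offer/Answer model.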
       
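       When a declarative session description is used in session setup,
       the sprop-deint-buf-req MIME parameter signals the requirement for
       the deinterleaving buffer size.  It is therefore RECOMMENDED to set
       the deinterleaving buffer size, in terms of number of bytes, equal to
       or greater than the value of the sprop-deint-buf-req MIME parameter.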


    7.2.2.  Deinterleaving Process


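       There are two buffering states in the receiver: initial buffering
       and buffering while playing.  Initial buffering occurs when the RTP
       session is initialized.  After initial buffering, decoding and
       playback are started, and the buffering-while-playing mode is used.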


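       Regardless of the buffering state, the receiver stores incoming NAL
       units, in reception order, in the deinterleaving buffer.  NAL units
       of aggregation packets are stored in the deinterleaving buffer
       individually.  The value of DON is calculated and stored for all NAL
       units.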


       The receiver operation is described below with the help of the
       following functions and constants:
       
       o  Function AbsDON is specified in section 8.1.


       o  Function don_diff is specified in section 5.5.


       o  Constant N is the value of the OPTIONAL sprop-interleaving-depth
          MIME type parameter (see section 8.1) incremented by 1.


       Initial buffering lasts until one of the following conditions is
       fulfilled:


       o  There are N VCL NAL units in the deinterleaving buffer.


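       o  If sprop-max-don-diff is present, don_diff(m,n) is greater than
          the value of sprop-max-don-diff, in which n corresponds to the NAL
          unit having the greatest value of AbsDON among the received NAL
          units and m corresponds to the NAL unit having the smallest value
          of AbsDON among the received NAL units.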


       o  Initial buffering has lasted for a duration equal to or greater
          than the value of the OPTIONAL sprop-init-buf-time MIME parameter.


       The NAL units to be removed from the deinterleaving buffer are
       determined as follows:


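       o  If the deinterleaving buffer contains at least N VCL NAL units,
          NAL units are removed from the deinterleaving buffer and passed to
          the decoder in the order specified below until the buffer contains
          N-1 VCL NAL units.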
          
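       o  If sprop-max-don-diff is present, all NAL units m for which
          don_diff(m,n) is greater than sprop-max-don-diff are removed from
          the deinterleaving buffer and passed to the decoder in the order
          specified below.  Herein, n corresponds to the NAL unit having the
          greatest value of AbsDON among the received NAL units.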


       The order in which NAL units are passed to the decoder is specified
       as follows:


       o  Let PDON be a variable that is initialized to 0 at the beginning
          of an RTP session.


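       o  For each NAL unit associated with a value of DON, a DON distance
          is calculated as follows.  If the value of DON of the NAL unit is
          larger than the value of PDON, the DON distance is equal to
          DON - PDON.  Otherwise, the DON distance is equal to
          65535 - PDON + DON + 1.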


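       o  NAL units are delivered to the decoder in ascending order of DON
          distance.  If several NAL units share the same value of DON
          distance, they can be passed to the decoder in any order.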


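       o  When a desired number of NAL units have been passed to the
          decoder, the value of PDON is set to the value of DON for the last
          NAL unit passed to the decoder.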
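       The DON-distance ordering above can be sketched in Python as
       follows.  The helper names (don_distance, release) are hypothetical,
       and the number of NAL units to release is passed in as count,
       because that number comes from the buffer-removal rules earlier in
       this section rather than from the ordering rule itself.  Python's
       sort is stable, so NAL units with equal DON distance keep their
       buffer order, which is one of the orders the text permits.

```python
def don_distance(don: int, pdon: int) -> int:
    """DON distance of a NAL unit relative to PDON, as defined above."""
    if don > pdon:
        return don - pdon
    return 65535 - pdon + don + 1


def release(buffer: list[tuple[int, bytes]], pdon: int, count: int):
    """Pass `count` buffered (DON, NAL unit) entries to the decoder.

    Returns (NAL units in delivery order, remaining buffer, updated PDON).
    """
    ordered = sorted(buffer, key=lambda entry: don_distance(entry[0], pdon))
    released, remaining = ordered[:count], ordered[count:]
    if released:
        pdon = released[-1][0]  # PDON becomes the DON of the last unit passed on
    return [nalu for _, nalu in released], remaining, pdon
```

       For example, with PDON equal to 65530, NAL units with DON values 2,
       65533, and 0 have DON distances 8, 3, and 6, so they are delivered
       in the order 65533, 0, 2 even though the DON field has wrapped.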


    7.3.  Additional De-Packetization Guidelines


       The following additional de-packetization rules may be used to
       implement an operational H.264 de-packetizer:


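       o  Intelligent RTP receivers (e.g., in gateways) may identify lost
          coded slice data partitions A (DPAs).  If a lost DPA is found, a
          gateway may decide not to send the corresponding coded slice data
          partitions B and C, as their information is meaningless for H.264
          decoders.  In this way a MANE can reduce network load by
          discarding useless packets without parsing a complex bitstream.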


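       o  Intelligent RTP receivers (e.g., in gateways) may identify lost
          FUs.  If a lost FU is found, a gateway may decide not to send the
          following FUs of the same fragmented NAL unit, as their
          information is meaningless for H.264 decoders.  In this way a MANE
          can reduce network load by discarding useless packets without
          parsing a complex bitstream.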


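       o  Intelligent receivers having to discard packets or NALUs should
          first discard all packets/NALUs in which the value of the NRI
          field of the NAL unit type octet is equal to 0.  This will
          minimize the impact on user experience and keep the reference
          pictures intact.  If more packets have to be discarded, then
          packets with a numerically lower NRI value should be discarded
          before packets with a numerically higher NRI value.  However,
          discarding any packets with an NRI bigger than 0 very likely leads
          to decoder drift and SHOULD be avoided.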
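       The NRI-based discard guideline can be sketched as follows (Python,
       with nri and drop_lowest_nri as hypothetical helpers).  The NRI is
       read from bits 1-2 of the first payload octet, which for
       aggregation and fragmentation packets is the payload header rather
       than the header of an individual NAL unit.  The surviving packets
       keep their original order.  The sketch does not prevent the caller
       from dropping packets with NRI greater than 0, which the guideline
       above advises against.

```python
def nri(packet: bytes) -> int:
    """NRI field (bits 1-2) of the first payload octet."""
    return (packet[0] >> 5) & 0x03


def drop_lowest_nri(packets: list[bytes], how_many: int) -> list[bytes]:
    """Drop `how_many` packets, lowest NRI first; keep the rest in order."""
    by_priority = sorted(range(len(packets)), key=lambda i: nri(packets[i]))
    victims = set(by_priority[:how_many])
    return [p for i, p in enumerate(packets) if i not in victims]
```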


    8.  Payload Format Parameters


       This section specifies the parameters that MAY be used to select
       optional features of the payload format and certain features of the
       bitstream.  The parameters are specified here as part of the MIME
       subtype registration for the ITU-T H.264 | ISO/IEC 14496-10 codec.  A
       mapping of the parameters into the Session Description Protocol (SDP)
       [5] is also provided for applications that use SDP.  Equivalent
       parameters could be defined elsewhere for use with control protocols
       that do not use MIME or SDP.


       Some parameters provide a receiver with the properties of the stream
       that will be sent.  The names of all these parameters start with
       "sprop" for stream properties.  Some of these "sprop" parameters are
       limited by other payload or codec configuration parameters.  For
       example, the sprop-parameter-sets parameter is constrained by the
       profile-level-id parameter.  The media sender selects all "sprop"
       parameters rather than the receiver.  This uncommon characteristic of
       the "sprop" parameters may not be compatible with some signaling
       protocol concepts, in which case the use of these parameters SHOULD
       be avoided.


    8.1.  MIME Registration


       The MIME subtype for the ITU-T H.264 | ISO/IEC 14496-10 codec is
       allocated from the IETF tree.


       The receiver MUST ignore any unspecified parameter.


       Media Type name:     video


       Media subtype name:  H264


       Required parameters: none














    Wenger, et al.              Standards Track                    [Page 37]




    RFC 3984           RTP Payload Format for H.264 Video      February 2005




       OPTIONAL parameters:
           profile-level-id:
                            A base16 [6] (hexadecimal) representation of
                            the following three bytes in the sequence
                            parameter set NAL unit specified in [1]: 1)
                            profile_idc, 2) a byte herein referred to as
                            profile-iop, composed of the values of
                            constraint_set0_flag, constraint_set1_flag,
                            constraint_set2_flag, and reserved_zero_5bits
                            in bit-significance order, starting from the
                            most significant bit, and 3) level_idc.  Note
                            that reserved_zero_5bits is required to be
                            equal to 0 in [1], but other values for it may
                            be specified in the future by ITU-T or ISO/IEC.


                            If the profile-level-id parameter is used to
                            indicate properties of a NAL unit stream, it
                            indicates the profile and level that a decoder
                            has to support in order to comply with [1] when
                            it decodes the stream.  The profile-iop byte
                            indicates whether the NAL unit stream also
                            obeys all constraints of the indicated profiles
                            as follows.  If bit 7 (the most significant
                            bit), bit 6, or bit 5 of profile-iop is equal
                            to 1, all constraints of the Baseline profile,
                            the Main profile, or the Extended profile,
                            respectively, are obeyed in the NAL unit
                            stream.


                            If the profile-level-id parameter is used for
                            capability exchange or session setup procedure,
                            it indicates the profile that the codec
                            supports and the highest level
                            supported for the signaled profile.  The
                            profile-iop byte indicates whether the codec
                            has additional limitations whereby only the
                            common subset of the algorithmic features and
                            limitations of the profiles signaled with the
                            profile-iop byte and of the profile indicated
                            by profile_idc is supported by the codec.  For
                            example, if a codec supports only the common
                            subset of the coding tools of the Baseline
                            profile and the Main profile at level 2.1 and
                            below, the profile-level-id becomes 42E015, in
                            which 42 stands for the Baseline profile, E0
                            indicates that only the common subset for all
                            profiles is supported, and 15 indicates level
                            2.1.

                                Informative note: Capability exchange and
                                session setup procedures should provide
                                means to list the capabilities for each
                                supported codec profile separately.  For
                                example, the one-of-N codec selection
                                procedure of the SDP Offer/Answer model can
                                be used (section 10.2 of [7]).


                            If no profile-level-id is present, the Baseline
                            Profile without additional constraints at Level
                            1 MUST be implied.


           max-mbps, max-fs, max-cpb, max-dpb, and max-br:
                            These parameters MAY be used to signal the
                            capabilities of a receiver implementation.
                            These parameters MUST NOT be used for any other
                            purpose.  The profile-level-id parameter MUST
                            be present in the same receiver capability
                            description that contains any of these
                            parameters.  The level conveyed in the value of
                            the profile-level-id parameter MUST be such
                            that the receiver is fully capable of
                            supporting.  max-mbps, max-fs, max-cpb, max-
                            dpb, and max-br MAY be used to indicate
                            capabilities of the receiver that extend the
                            required capabilities of the signaled level, as
                            specified below.


                            When more than one parameter from the set (max-
                            mbps, max-fs, max-cpb, max-dpb, max-br) is
                            present, the receiver MUST support all signaled
                            capabilities simultaneously.  For example, if
                            both max-mbps and max-br are present, the
                            signaled level with the extension of both the
                            frame rate and bit rate is supported.  That is,
                            the receiver is able to decode NAL unit
                            streams in which the macroblock processing rate
                            is up to max-mbps (inclusive), the bit rate is
                            up to max-br (inclusive), the coded picture
                            buffer size is derived as specified in the
                            semantics of the max-br parameter below, and
                            other properties comply with the level
                            specified in the value of the profile-level-id
                            parameter.


                            A receiver MUST NOT signal values of max-
                            mbps, max-fs, max-cpb, max-dpb, and max-br that
                            meet the requirements of a higher level,
                            referred to as level A herein, compared to the
                            level specified in the value of the profile-
                            level-id parameter, if the receiver can support
                            all the properties of level A.


                                Informative note: When the OPTIONAL MIME
                                type parameters are used to signal the
                                properties of a NAL unit stream, max-mbps,
                                max-fs, max-cpb, max-dpb, and max-br are
                                not present, and the value of profile-
                                level-id must always be such that the NAL
                                unit stream complies fully with the
                                specified profile and level.


           max-mbps:        The value of max-mbps is an integer indicating
                            the maximum macroblock processing rate in units
                            of macroblocks per second.  The max-mbps
                            parameter signals that the receiver is capable
                            of decoding video at a higher rate than is
                            required by the signaled level conveyed in the
                            value of the profile-level-id parameter.  When
                            max-mbps is signaled, the receiver MUST be able
                            to decode NAL unit streams that conform to the
                            signaled level, with the exception that the
                            MaxMBPS value in Table A-1 of [1] for the
                            signaled level is replaced with the value of
                            max-mbps.  The value of max-mbps MUST be
                            greater than or equal to the value of MaxMBPS
                            for the level given in Table A-1 of [1].
                            Senders MAY use this knowledge to send pictures
                            of a given size at a higher picture rate than
                            is indicated in the signaled level.


           max-fs:          The value of max-fs is an integer indicating
                            the maximum frame size in units of macroblocks.
                            The max-fs parameter signals that the receiver
                            is capable of decoding larger picture sizes
                            than are required by the signaled level conveyed
                            in the value of the profile-level-id parameter.
                            When max-fs is signaled, the receiver MUST be
                            able to decode NAL unit streams that conform to
                            the signaled level, with the exception that the
                            MaxFS value in Table A-1 of [1] for the
                            signaled level is replaced with the value of
                            max-fs.  The value of max-fs MUST be greater
                            than or equal to the value of MaxFS for the
                            level given in Table A-1 of [1].  Senders MAY
                            use this knowledge to send larger pictures at a
                            proportionally lower frame rate than is
                            indicated in the signaled level.


           max-cpb:         The value of max-cpb is an integer indicating
                            the maximum coded picture buffer size in units
                            of 1000 bits for the VCL HRD parameters (see
                            A.3.1 item i of [1]) and in units of 1200 bits
                            for the NAL HRD parameters (see A.3.1 item j of
                            [1]).  The max-cpb parameter signals that the
                            receiver has more memory than the minimum
                            amount of coded picture buffer memory required
                            by the signaled level conveyed in the value of
                            the profile-level-id parameter.  When max-cpb
                            is signaled, the receiver MUST be able to
                            decode NAL unit streams that conform to the
                            signaled level, with the exception that the
                            MaxCPB value in Table A-1 of [1] for the
                            signaled level is replaced with the value of
                            max-cpb.  The value of max-cpb MUST be greater
                            than or equal to the value of MaxCPB for the
                            level given in Table A-1 of [1].  Senders MAY
                            use this knowledge to construct coded video
                            streams with greater variation of bit rate
                            than can be achieved with the
                            MaxCPB value in Table A-1 of [1].


                                Informative note: The coded picture buffer
                                is used in the hypothetical reference
                                decoder (Annex C) of H.264.  The use of the
                                hypothetical reference decoder is
                                recommended in H.264 encoders to verify
                                that the produced bitstream conforms to the
                                standard and to control the output bitrate.
                                Thus, the coded picture buffer is
                                conceptually independent of any other
                                potential buffers in the receiver,
                                including de-interleaving and de-jitter
                                buffers.  The coded picture buffer need not
                                be implemented in decoders as specified in
                                Annex C of H.264, but rather standard-
                                compliant decoders can have any buffering
                                arrangements provided that they can decode
                                standard-compliant bitstreams.  Thus, in
                                practice, the input buffer for a video
                                decoder can be integrated with the
                                de-interleaving and de-jitter buffers of the
                                receiver.


           max-dpb:         The value of max-dpb is an integer indicating
                            the maximum decoded picture buffer size in
                            units of 1024 bytes.  The max-dpb parameter
                            signals that the receiver has more memory than
                            the minimum amount of decoded picture buffer
                            memory required by the signaled level conveyed
                            in the value of the profile-level-id parameter.
                            When max-dpb is signaled, the receiver MUST be
                            able to decode NAL unit streams that conform to
                            the signaled level, with the exception that the
                            MaxDPB value in Table A-1 of [1] for the
                            signaled level is replaced with the value of
                            max-dpb.  Consequently, a receiver that signals
                            max-dpb MUST be capable of storing the
                            following number of decoded frames,
                            complementary field pairs, and non-paired
                            fields in its decoded picture buffer:


                            Min(1024 * max-dpb / ( PicWidthInMbs *
                            FrameHeightInMbs * 256 * ChromaFormatFactor ),
                            16)


                            PicWidthInMbs, FrameHeightInMbs, and
                            ChromaFormatFactor are defined in [1].
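

                            As a rough illustration of this formula, the
                            Python sketch below computes how many frames a
                            receiver signaling a given max-dpb must be able
                            to store.  It assumes 4:2:0 sampling, for which
                            ChromaFormatFactor is 1.5, and rounds the
                            quotient down to whole frames; the function
                            name dpb_frames and the CIF picture size are
                            illustrative choices only.

```python
from fractions import Fraction
import math


def dpb_frames(max_dpb, pic_width_in_mbs, frame_height_in_mbs,
               chroma_format_factor=Fraction(3, 2)):
    """Frames, field pairs, or fields that fit in max_dpb * 1024 bytes.

    chroma_format_factor defaults to 1.5, the value for 4:2:0 sampling
    (an assumption of this sketch).
    """
    # Bytes per decoded frame: 256 luma samples per macroblock, scaled
    # by the chroma format factor to account for the chroma samples.
    frame_bytes = (pic_width_in_mbs * frame_height_in_mbs * 256
                   * Fraction(chroma_format_factor))
    # Exact fractions keep floating-point error out of the floor below.
    frames = Fraction(1024 * max_dpb) / frame_bytes
    # Only whole frames can be stored, and the result is capped at 16.
    return min(math.floor(frames), 16)


# CIF (352x288 luma samples) is 22 x 18 macroblocks.
print(dpb_frames(1000, 22, 18))    # 6
print(dpb_frames(100000, 22, 18))  # 16 (capped)
```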


                            The value of max-dpb MUST be greater than or
                            equal to the value of MaxDPB for the level
                            given in Table A-1 of [1].  Senders MAY use
                            this knowledge to construct coded video streams
                            with improved compression.


                                Informative note: This parameter was added
                                primarily to complement a similar codepoint
                                in the ITU-T Recommendation H.245, so as to
                                facilitate signaling gateway designs.  The
                                decoded picture buffer stores reconstructed
                                samples and is a property of the video
                                decoder only.  There is no relationship
                                between the size of the decoded picture
                                buffer and the buffers used in RTP,
                                especially de-interleaving and de-jitter
                                buffers.


           max-br:          The value of max-br is an integer indicating
                            the maximum video bit rate in units of 1000
                            bits per second for the VCL HRD parameters (see
                            A.3.1 item i of [1]) and in units of 1200 bits
                            per second for the NAL HRD parameters (see
                            A.3.1 item j of [1]).


                            The max-br parameter signals that the video
                            decoder of the receiver is capable of decoding
                            video at a higher bit rate than is required by
                            the signaled level conveyed in the value of the
                            profile-level-id parameter.  The value of max-
                            br MUST be greater than or equal to the value
                            of MaxBR for the level given in Table A-1 of
                            [1].


                            When max-br is signaled, the video codec of the
                            receiver MUST be able to decode NAL unit
                            streams that conform to the signaled level,
                            conveyed in the profile-level-id parameter,
                            with the following exceptions in the limits
                            specified by the level:
                            o The value of max-br replaces the MaxBR value
                              of the signaled level (in Table A-1 of [1]).
                            o When the max-cpb parameter is not present,
                              the result of the following formula replaces
                              the value of MaxCPB in Table A-1 of [1]:
                              (MaxCPB of the signaled level) * max-br /
                              (MaxBR of the signaled level).


                            For example, if a receiver signals capability
                            for Level 1.2 with max-br equal to 1550, this
                            indicates a maximum video bitrate of 1550
                            kbits/sec for VCL HRD parameters, a maximum
                            video bitrate of 1860 kbits/sec for NAL HRD
                            parameters, and a CPB size of 4036458 bits
                            (1550000 / 384000 * 1000 * 1000).
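

                            The arithmetic of this example can be
                            reproduced with the Python sketch below.  The
                            Level 1.2 values MaxBR = 384 and MaxCPB = 1000
                            are the ones implied by the example's own
                            calculation.  Treating max-cpb, when given, as
                            a size in units of 1000 bits (VCL HRD) is an
                            assumption of the sketch, and the function
                            name max_br_limits is hypothetical.

```python
from fractions import Fraction
import math


def max_br_limits(max_br, level_max_br, level_max_cpb, max_cpb=None):
    """Return (VCL bit rate, NAL bit rate, VCL CPB size in bits).

    max_br and level_max_br use units of 1000 bit/s; level_max_cpb and
    max_cpb use units of 1000 bits (VCL HRD), an assumption of this sketch.
    """
    if max_br < level_max_br:
        raise ValueError("max-br must be >= MaxBR of the signaled level")
    vcl_bit_rate = max_br * 1000  # units of 1000 bit/s for VCL HRD
    nal_bit_rate = max_br * 1200  # units of 1200 bit/s for NAL HRD
    if max_cpb is None:
        # MaxCPB is scaled by max-br / MaxBR when max-cpb is absent.
        cpb_units = Fraction(level_max_cpb * max_br, level_max_br)
    else:
        cpb_units = Fraction(max_cpb)
    # Convert to bits and round down to a whole number of bits.
    cpb_bits = math.floor(cpb_units * 1000)
    return vcl_bit_rate, nal_bit_rate, cpb_bits


# Level 1.2: MaxBR = 384, MaxCPB = 1000 (values from the example above).
print(max_br_limits(1550, 384, 1000))  # (1550000, 1860000, 4036458)
```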


                            The value of max-br MUST be greater than or
                            equal to the value of MaxBR for the signaled level
                            given in Table A-1 of [1].


                            Senders MAY use this knowledge to send higher
                            bitrate video as allowed in the level
                            definition of Annex A of H.264, to achieve
                            improved video quality.


                                Informative note: This parameter was added
                                primarily to complement a similar codepoint
                                in the ITU-T Recommendation H.245, so as to
                                facilitate signaling gateway designs.  No
                                assumption can be made from the value of
                                this parameter that the network is capable
                                of handling such bit rates at any given
                                time.  In particular, no conclusion can be
                                drawn that the signaled bit rate is
                                possible under congestion control
                                constraints.


          redundant-pic-cap:
                            This parameter signals the capabilities of a
                            receiver implementation.  When equal to 0, the
                            parameter indicates that the receiver makes no
                            attempt to use redundant coded pictures to
                            correct incorrectly decoded primary coded
                            pictures.  When equal to 0, the receiver is not
                            capable of using redundant slices; therefore, a
                            sender SHOULD avoid sending redundant slices to
                            save bandwidth.  When equal to 1, the receiver
                            is capable of decoding any such redundant slice
                            that covers a corrupted area in a primary
                            decoded picture (at least partly), and therefore
                            a sender MAY send redundant slices.  When the
                            parameter is not present, then a value of 0
                            MUST be used for redundant-pic-cap.  When
                            present, the value of redundant-pic-cap MUST be
                            either 0 or 1.


                            When the profile-level-id parameter is present
                            in the same capability signaling as the
                            redundant-pic-cap parameter, and the profile
                            indicated in profile-level-id is such that it
                            disallows the use of redundant coded pictures
                            (e.g., Main Profile), the value of redundant-
                            pic-cap MUST be equal to 0.  When a receiver
                            indicates redundant-pic-cap equal to 0, the
                            received stream SHOULD NOT contain redundant
                            coded pictures.


                                Informative note: Even if redundant-pic-cap
                                is equal to 0, the decoder is able to
                                ignore redundant coded pictures provided
                                that the decoder supports such a profile
                                (Baseline, Extended) in which redundant
                                coded pictures are allowed.


                                Informative note: Even if redundant-pic-cap
                                is equal to 1, the receiver may also choose
                                other error concealment strategies to
                                replace or complement decoding of redundant
                                slices.
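

                            A sender-side reading of the redundant-pic-cap
                            rules might look like the Python sketch below.
                            The dictionary of fmtp string values is a
                            hypothetical input format, profile_idc is taken
                            from the first byte of profile-level-id, and
                            only Baseline (66) and Extended (88) are
                            assumed to allow redundant coded pictures,
                            following the informative note above.

```python
# profile_idc values defined in H.264: 66 = Baseline, 77 = Main,
# 88 = Extended.  Treating only Baseline and Extended as allowing
# redundant coded pictures is an assumption of this sketch.
PROFILES_WITH_REDUNDANT_PICTURES = {66, 88}


def may_send_redundant_slices(fmtp):
    """Apply the redundant-pic-cap rules to a receiver's parameters.

    fmtp maps parameter names to string values (hypothetical input format).
    """
    cap = int(fmtp.get("redundant-pic-cap", "0"))  # absent means 0
    if cap not in (0, 1):
        raise ValueError("redundant-pic-cap must be 0 or 1")
    profile_level_id = fmtp.get("profile-level-id")
    if profile_level_id is not None:
        # The first byte (two hex digits) of profile-level-id is profile_idc.
        profile_idc = int(profile_level_id[:2], 16)
        if profile_idc not in PROFILES_WITH_REDUNDANT_PICTURES and cap != 0:
            raise ValueError("redundant-pic-cap must be 0 for this profile")
    # cap == 0: the sender SHOULD avoid redundant slices; cap == 1: it MAY send them.
    return cap == 1


print(may_send_redundant_slices({}))  # False
print(may_send_redundant_slices({"redundant-pic-cap": "1",
                                 "profile-level-id": "42A01E"}))  # True
```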


           sprop-parameter-sets:
                            This parameter MAY be used to convey
                            any sequence and picture parameter set NAL
                            units (herein referred to as the initial
                            parameter set NAL units) that MUST precede any
                            other NAL units in decoding order.  The
                            parameter MUST NOT be used to indicate codec
                            capability in any capability exchange
                            procedure.  The value of the parameter is the
                            base64 [6] representation of the initial
                            parameter set NAL units as specified in
                            sections 7.3.2.1 and 7.3.2.2 of [1].  The
                            parameter sets are conveyed in decoding order,
                            and no framing of the parameter set NAL units
                            takes place.  A comma is used to separate any
                            pair of parameter sets in the list.  Note that
                            the number of bytes in a parameter set NAL unit
                            is typically less than 10, but a picture
                            parameter set NAL unit can contain several
                            hundreds of bytes.
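

                            The Python sketch below splits a
                            sprop-parameter-sets value into its NAL units
                            and inspects each NAL unit header, using the
                            value from the SDP example in section 8.2.1 as
                            input.  Accepting only nal_unit_type 7
                            (sequence parameter set) and 8 (picture
                            parameter set) is a simplifying assumption of
                            the sketch.

```python
import base64

# nal_unit_type values from H.264: 7 = sequence parameter set (SPS),
# 8 = picture parameter set (PPS).
PARAMETER_SET_TYPES = {7: "SPS", 8: "PPS"}


def parse_sprop_parameter_sets(value):
    """Return [(type name, NAL unit bytes)] in the order listed.

    Each comma-separated item is the base64 encoding of exactly one
    parameter set NAL unit; no start code or length prefix is present.
    """
    units = []
    for item in value.split(","):
        nal = base64.b64decode(item.strip(), validate=True)
        if not nal:
            raise ValueError("empty parameter set")
        nal_type = nal[0] & 0x1F  # low five bits of the NAL unit header
        if nal_type not in PARAMETER_SET_TYPES:
            raise ValueError(f"NAL unit type {nal_type} is not a parameter set")
        units.append((PARAMETER_SET_TYPES[nal_type], nal))
    return units


for name, nal in parse_sprop_parameter_sets("Z0IACpZTBYmI,aMljiA=="):
    print(name, nal.hex())
# SPS 6742000a9653058988
# PPS 68c96388
```

                            Because no framing is applied, each decoded
                            item is one complete NAL unit, listed in
                            decoding order.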


                               Informative note: When several payload
                               types are offered in the SDP Offer/Answer
                               model, each with its own sprop-parameter-
                               sets parameter, then the receiver cannot
                               assume that those parameter sets do not use
                               conflicting storage locations (i.e.,
                               identical values of parameter set
                               identifiers).  Therefore, a receiver should
                               double-buffer all sprop-parameter-sets and
                               make them available to the decoder instance
                               that decodes a certain payload type.


           parameter-add:   This parameter MAY be used to signal whether
                            the receiver of this parameter is allowed to
                            add parameter sets in its signaling response
                            using the sprop-parameter-sets MIME parameter.
                            The value of this parameter is either 0 or 1.
                            0 is equal to false; i.e., it is not allowed to
                            add parameter sets.  1 is equal to true; i.e.,
                            it is allowed to add parameter sets.  If the
                            parameter is not present, its value MUST be 1.


           packetization-mode:
                            This parameter signals the properties of an
                            RTP payload type or the capabilities of a
                            receiver implementation.  Only a single
                            configuration point can be indicated; thus,
                            when capabilities to support more than one
                            packetization-mode are declared, multiple
                            configuration points (RTP payload types) must
                            be used.


                            When the value of packetization-mode is equal
                            to 0 or packetization-mode is not present, the
                            single NAL mode, as defined in section 6.2 of
                            RFC 3984, MUST be used.  This mode is in use in
                            standards using ITU-T Recommendation H.241 [15]
                            (see section 12.1).  When the value of
                            packetization-mode is equal to 1, the non-
                            interleaved mode, as defined in section 6.3 of
                            RFC 3984, MUST be used.  When the value of
                            packetization-mode is equal to 2, the
                            interleaved mode, as defined in section 6.4 of
                            RFC 3984, MUST be used.  The value of
                            packetization-mode MUST be an integer in the
                            range of 0 to 2, inclusive.
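

                            A receiver could resolve this parameter from a
                            parsed fmtp line as in the Python sketch below;
                            the dictionary input and the function name
                            packetization_mode are assumptions of the
                            sketch.

```python
PACKETIZATION_MODES = {
    0: "single NAL unit mode (section 6.2)",
    1: "non-interleaved mode (section 6.3)",
    2: "interleaved mode (section 6.4)",
}


def packetization_mode(fmtp):
    """Resolve packetization-mode from a dict of fmtp string values."""
    raw = fmtp.get("packetization-mode")
    # An absent parameter means single NAL unit mode (value 0).
    mode = 0 if raw is None else int(raw)
    if mode not in PACKETIZATION_MODES:
        raise ValueError("packetization-mode must be 0, 1, or 2")
    return mode


print(PACKETIZATION_MODES[packetization_mode({})])
print(PACKETIZATION_MODES[packetization_mode({"packetization-mode": "1"})])
```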


           sprop-interleaving-depth:
                            This parameter MUST NOT be present
                            when packetization-mode is not present or the
                            value of packetization-mode is equal to 0 or 1.
                            This parameter MUST be present when the value
                            of packetization-mode is equal to 2.


                            This parameter signals the properties of a NAL
                            unit stream.  It specifies the maximum number
                            of VCL NAL units that precede any VCL NAL unit
                            in the NAL unit stream in transmission order
                            and follow the VCL NAL unit in decoding order.
                            Consequently, it is guaranteed that receivers
                            can reconstruct NAL unit decoding order when
                            the buffer size for NAL unit decoding order
                            recovery is at least the value of sprop-
                            interleaving-depth + 1 in terms of VCL NAL
                            units.


                            The value of sprop-interleaving-depth MUST be
                            an integer in the range of 0 to 32767,
                            inclusive.
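

                            Given each VCL NAL unit's position in decoding
                            order, listed in transmission order, the value
                            follows directly from the definition above, as
                            in the Python sketch below.  The input order is
                            hypothetical, and the quadratic scan favors
                            clarity over speed.

```python
def interleaving_depth(decoding_positions):
    """Compute sprop-interleaving-depth from the definition.

    decoding_positions lists, in transmission order, the position of
    each VCL NAL unit in decoding order (distinct integers).
    """
    depth = 0
    for t, position in enumerate(decoding_positions):
        # Count VCL NAL units sent earlier but decoded later.
        sent_earlier_decoded_later = sum(
            1 for earlier in decoding_positions[:t] if earlier > position)
        depth = max(depth, sent_earlier_decoded_later)
    return depth


order = [0, 2, 1, 3, 5, 4, 6, 8, 7]  # hypothetical transmission order
depth = interleaving_depth(order)
print(depth, depth + 1)  # 1 2: a buffer of 2 VCL NAL units suffices
```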


           sprop-deint-buf-req:
                            This parameter MUST NOT be present when
                            packetization-mode is not present or the value
                            of packetization-mode is equal to 0 or 1.  It
                            MUST be present when the value of
                            packetization-mode is equal to 2.


                            sprop-deint-buf-req signals the required size
                            of the deinterleaving buffer for the NAL unit
                            stream.  The value of the parameter MUST be
                            greater than or equal to the maximum buffer
                            occupancy (in units of bytes) required in such
                            a deinterleaving buffer that is specified in
                            section 7.2 of RFC 3984.  It is guaranteed that
                            receivers can perform the deinterleaving of
                            interleaved NAL units into NAL unit decoding
                            order, when the deinterleaving buffer size is
                            at least the value of sprop-deint-buf-req in
                            terms of bytes.


                            The value of sprop-deint-buf-req MUST be an
                            integer in the range of 0 to 4294967295,
                            inclusive.
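

                            The presence rules and value ranges of
                            sprop-interleaving-depth and sprop-deint-buf-req
                            can be checked together, as in the Python
                            sketch below.  It assumes that
                            packetization-mode has already been resolved to
                            an integer and that parameters are held in a
                            dictionary of strings; check_interleaving_params
                            is a hypothetical name.

```python
# Parameters that are present only in the interleaved mode, with the
# upper end of their allowed range.
INTERLEAVED_ONLY = {
    "sprop-interleaving-depth": 32767,
    "sprop-deint-buf-req": 4294967295,
}


def check_interleaving_params(mode, fmtp):
    """Raise ValueError if the presence or range rules are violated.

    mode is the already resolved packetization-mode (0, 1, or 2);
    fmtp maps parameter names to string values.
    """
    for name, upper in INTERLEAVED_ONLY.items():
        if mode == 2 and name not in fmtp:
            raise ValueError(f"{name} is required when packetization-mode is 2")
        if mode != 2 and name in fmtp:
            raise ValueError(f"{name} is not allowed when packetization-mode is {mode}")
        if name in fmtp and not 0 <= int(fmtp[name]) <= upper:
            raise ValueError(f"{name} must be in the range 0 to {upper}")


check_interleaving_params(1, {})
check_interleaving_params(2, {"sprop-interleaving-depth": "3",
                              "sprop-deint-buf-req": "64000"})
```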


                                Informative note: sprop-deint-buf-req
                                indicates the required size of the
                                deinterleaving buffer only.  When network
                                jitter can occur, an appropriately sized
                                jitter buffer has to be provisioned for
                                as well.


           deint-buf-cap:   This parameter signals the capabilities of a
                            receiver implementation and indicates the
                            amount of deinterleaving buffer space in units
                            of bytes that the receiver has available for
                            reconstructing the NAL unit decoding order.  A
                            receiver is able to handle any stream for which
                            the value of the sprop-deint-buf-req parameter
                            is smaller than or equal to this parameter.


                            If the parameter is not present, then a value
                            of 0 MUST be used for deint-buf-cap.  The value
                            of deint-buf-cap MUST be an integer in the
                            range of 0 to 4294967295, inclusive.


                                Informative note: deint-buf-cap indicates
                                the maximum possible size of the
                                deinterleaving buffer of the receiver only.
                                When network jitter can occur, an
                                appropriately sized jitter buffer has to
                                be provisioned for as well.


           sprop-init-buf-time:
                            This parameter MAY be used to signal the
                            properties of a NAL unit stream.  The parameter
                            MUST NOT be present if the value of
                            packetization-mode is equal to 0 or 1.


                            The parameter signals the initial buffering
                            time that a receiver MUST buffer before
                            starting decoding to recover the NAL unit
                            decoding order from the transmission order.
                            The parameter is the maximum value of
                            (transmission time of a NAL unit - decoding
                            time of the NAL unit), assuming reliable and
                            instantaneous transmission, the same
                            timeline for transmission and decoding, and
                            that decoding starts when the first packet
                            arrives.


                            An example of specifying the value of sprop-
                            init-buf-time follows.  A NAL unit stream is
                            sent in the following interleaved order, in
                            which the value corresponds to the decoding
                            time and the transmission order is from left to
                            right:


                            0  2  1  3  5  4  6  8  7 ...


                            Assuming a steady transmission rate of NAL
                            units, the transmission times are:


                            0  1  2  3  4  5  6  7  8 ...


                            Subtracting the decoding time from the
                            transmission time column-wise results in the
                            following series:


                            0 -1  1  0 -1  1  0 -1  1 ...


                            Thus, in terms of intervals of NAL unit
                            transmission times, the value of
                            sprop-init-buf-time in this
                            example is 1.
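

                            The worked example can be reproduced with the
                            Python sketch below.  The final conversion into
                            90-kHz clock ticks assumes, purely for
                            illustration, that one NAL unit is transmitted
                            every 1/30 second.

```python
def init_buf_time(decoding_times, transmission_times):
    """Maximum of (transmission time - decoding time) over all NAL units."""
    return max(tx - dec for dec, tx in zip(decoding_times, transmission_times))


decoding = [0, 2, 1, 3, 5, 4, 6, 8, 7]     # listed in transmission order
transmission = list(range(len(decoding)))  # steady transmission rate
print([tx - dec for dec, tx in zip(decoding, transmission)])
# [0, -1, 1, 0, -1, 1, 0, -1, 1]
value = init_buf_time(decoding, transmission)
print(value)  # 1 (in NAL unit transmission intervals)

# Hypothetical: one NAL unit every 1/30 s, i.e. 90000 // 30 = 3000 ticks.
print(value * (90000 // 30))  # 3000
```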


                            The parameter is coded as a non-negative base10
                            integer representation in clock ticks of a 90-
                            kHz clock.  If the parameter is not present,
                            then no initial buffering time value is
                            defined.  Otherwise the value of sprop-init-
                            buf-time MUST be an integer in the range of 0
                            to 4294967295, inclusive.


                            In addition to the signaled sprop-init-buf-
                            time, receivers SHOULD take into account the
                            transmission delay jitter buffering, including
                            buffering for the delay jitter caused by
                            mixers, translators, gateways, proxies,
                            traffic-shapers, and other network elements.


           sprop-max-don-diff:
                            This parameter MAY be used to signal the
                            properties of a NAL unit stream.  It MUST NOT
                            be used to signal transmitter or receiver or
                            codec capabilities.  The parameter MUST NOT be
                            present if the value of packetization-mode is
                            equal to 0 or 1.  sprop-max-don-diff is an
                            integer in the range of 0 to 32767, inclusive.
                            If sprop-max-don-diff is not present, the value
                            of the parameter is unspecified.  sprop-max-
                            don-diff is calculated as follows:


                            sprop-max-don-diff = max{AbsDON(i) -
                            AbsDON(j)},
                            for any i and any j>i,


                            where i and j indicate the index of the NAL
                            unit in the transmission order and AbsDON
                            denotes a decoding order number of the NAL
                            unit that does not wrap around to 0 after
                            65535.  In other words, AbsDON is calculated as
                            follows: Let m and n be consecutive NAL units
                            in transmission order.  For the very first NAL
                            unit in transmission order (whose index is 0),
                            AbsDON(0) = DON(0).  For other NAL units,
                            AbsDON is calculated as follows:


                            If DON(m) == DON(n), AbsDON(n) = AbsDON(m)


                            If (DON(m) < DON(n) and DON(n) - DON(m) <
                            32768),
                            AbsDON(n) = AbsDON(m) + DON(n) - DON(m)


                            If (DON(m) > DON(n) and DON(m) - DON(n) >=
                            32768),
                            AbsDON(n) = AbsDON(m) + 65536 - DON(m) + DON(n)


                            If (DON(m) < DON(n) and DON(n) - DON(m) >=
                            32768),
                            AbsDON(n) = AbsDON(m) - (DON(m) + 65536 -
                            DON(n))


                            If (DON(m) > DON(n) and DON(m) - DON(n) <
                            32768),
                            AbsDON(n) = AbsDON(m) - (DON(m) - DON(n))


                            where DON(i) is the decoding order number of
                            the NAL unit having index i in the transmission
                            order.  The decoding order number is specified
                            in section 5.5 of RFC 3984.
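

                            The AbsDON recursion and the maximum above
                            translate directly into the Python sketch below,
                            which takes the 16-bit DON values in
                            transmission order.  Returning 0 when no pair
                            has a positive difference (for example, for a
                            stream already in decoding order) is an
                            assumption of the sketch, chosen because 0 is
                            the smallest value the parameter can take.

```python
def abs_don(dons):
    """Unwrap 16-bit DON values, given in transmission order, into AbsDON."""
    result = []
    for n, don_n in enumerate(dons):
        if n == 0:
            result.append(don_n)  # AbsDON(0) = DON(0)
            continue
        don_m, abs_m = dons[n - 1], result[-1]  # m is the previous NAL unit
        if don_m == don_n:
            result.append(abs_m)
        elif don_m < don_n and don_n - don_m < 32768:
            result.append(abs_m + don_n - don_m)
        elif don_m > don_n and don_m - don_n >= 32768:
            result.append(abs_m + 65536 - don_m + don_n)  # forward wrap
        elif don_m < don_n and don_n - don_m >= 32768:
            result.append(abs_m - (don_m + 65536 - don_n))  # backward wrap
        else:  # don_m > don_n and don_m - don_n < 32768
            result.append(abs_m - (don_m - don_n))
    return result


def sprop_max_don_diff(dons):
    """max{AbsDON(i) - AbsDON(j)} for any i and any j > i, floored at 0."""
    best, running_max = 0, None
    for value in abs_don(dons):
        if running_max is None:
            running_max = value
        else:
            # running_max is the largest AbsDON(i) over all i < j.
            best = max(best, running_max - value)
            running_max = max(running_max, value)
    return best


print(abs_don([65534, 0, 65535, 1]))             # [65534, 65536, 65535, 65537]
print(sprop_max_don_diff([65534, 0, 65535, 1]))  # 1
```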


                                Informative note: Receivers may use sprop-
                                max-don-diff to trigger which NAL units in
                                the receiver buffer can be passed to the
                                decoder.


         max-rcmd-nalu-size:
                            This parameter MAY be used to signal the
                            capabilities of a receiver.  The parameter MUST
                            NOT be used for any other purposes.  The value
                            of the parameter indicates the largest NALU
                            size in bytes that the receiver can handle
                            efficiently.  The parameter value is a
                            recommendation, not a strict upper boundary.
                            The sender MAY create larger NALUs but must be
                            aware that the handling of these may come at a
                            higher cost than NALUs conforming to the
                            limitation.


                            The value of max-rcmd-nalu-size MUST be an
                            integer in the range of 0 to 4294967295,
                            inclusive.  If this parameter is not specified,
                            no known limitation to the NALU size exists.
                            Senders still have to consider the MTU size
                            available between the sender and the receiver
                            and SHOULD run MTU discovery for this purpose.


                            This parameter is motivated by, for example, an
                            IP to H.223 video telephony gateway, where
                            NALUs smaller than the H.223 transport data
                            unit will be more efficient.  A gateway may
                            terminate IP; thus, MTU discovery will normally
                            not work beyond the gateway.


                                Informative note: Setting this parameter to
                                a lower than necessary value may have a
                                negative impact.


       Encoding considerations:
                            This type is only defined for transfer via RTP
                            (RFC 3550).


                            A file format of H.264/AVC video is defined in
                            [29].  This definition is utilized by other
                            file formats, such as the 3GPP multimedia file
                            format (MIME type video/3gpp) [30] or the MP4
                            file format (MIME type video/mp4).


       Security considerations:
                            See section 9 of RFC 3984.


       Public specification:
                            Please refer to RFC 3984 and its section 15.


       Additional information:
                            None


       File extensions:     none
       Macintosh file type code: none
       Object identifier or OID: none


       Person & email address to contact for further information:
                            stewe@stewe.org


       Intended usage:      COMMON


       Author:
                            stewe@stewe.org
       Change controller:
                            IETF Audio/Video Transport working group
                            delegated from the IESG.


    8.2.  SDP Parameters


    8.2.1.  Mapping of MIME Parameters to SDP


       The MIME media type video/H264 string is mapped to fields in the
       Session Description Protocol (SDP) [5] as follows:


       o  The media name in the "m=" line of SDP MUST be video.


       o  The encoding name in the "a=rtpmap" line of SDP MUST be H264 (the
          MIME subtype).


       o  The clock rate in the "a=rtpmap" line MUST be 90000.


       o  The OPTIONAL parameters "profile-level-id", "max-mbps", "max-fs",
          "max-cpb", "max-dpb", "max-br", "redundant-pic-cap", "sprop-
          parameter-sets", "parameter-add", "packetization-mode", "sprop-
          interleaving-depth", "deint-buf-cap", "sprop-deint-buf-req",
          "sprop-init-buf-time", "sprop-max-don-diff", and "max-rcmd-nalu-
          size", when present, MUST be included in the "a=fmtp" line of SDP.
          These parameters are expressed as a MIME media type string, in the
          form of a semicolon-separated list of parameter=value pairs.


       An example of media representation in SDP is as follows (Baseline
       Profile, Level 3.0, some of the constraints of the Main profile may
       not be obeyed):


          m=video 49170 RTP/AVP 98
          a=rtpmap:98 H264/90000
          a=fmtp:98 profile-level-id=42A01E;
                    sprop-parameter-sets=Z0IACpZTBYmI,aMljiA==
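

       A receiver might split the a=fmtp line of this example into its
       parameters as in the Python sketch below, with the two physical
       lines of the example joined into the single attribute line that
       SDP carries.  Splitting each pair at the first "=" keeps base64
       padding intact; lower-casing parameter names assumes they are
       matched case-insensitively, and parse_fmtp is a hypothetical
       helper.

```python
def parse_fmtp(line):
    """Split an SDP a=fmtp line into (payload type, {name: value})."""
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an a=fmtp line")
    payload_type, _, params = line[len(prefix):].partition(" ")
    result = {}
    for item in params.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate empty items, e.g. a trailing semicolon
        # Split at the first "=" only, so base64 padding stays in the value.
        name, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"malformed parameter: {item!r}")
        result[name.strip().lower()] = value.strip()
    return int(payload_type), result


pt, params = parse_fmtp("a=fmtp:98 profile-level-id=42A01E; "
                        "sprop-parameter-sets=Z0IACpZTBYmI,aMljiA==")
print(pt, params["sprop-parameter-sets"])  # 98 Z0IACpZTBYmI,aMljiA==

# profile-level-id is three bytes: profile_idc, profile-iop, level_idc.
profile_idc, profile_iop, level_idc = bytes.fromhex(params["profile-level-id"])
print(profile_idc, hex(profile_iop), level_idc)  # 66 0xa0 30
```

       The decoded profile_idc of 66 and level_idc of 30 correspond to
       the Baseline Profile and Level 3.0 named in the example.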


    8.2.2.  Usage with the SDP Offer/Answer Model


       When H.264 is offered over RTP using SDP in an Offer/Answer model [7]
       for negotiation for unicast usage, the following limitations and
       rules apply:


       o  The parameters identifying a media format configuration for H.264
          are "profile-level-id", "packetization-mode", and, if required by
          "packetization-mode", "sprop-deint-buf-req".  These three
          parameters MUST be used symmetrically; i.e., the answerer MUST
          either maintain all configuration parameters or remove the media
          format (payload type) completely, if one or more of the parameter
          values are not supported.


             Informative note: The requirement for symmetric use applies
             only for the above three parameters and not for the other
             stream properties and capability parameters.


          To simplify handling and matching of these configurations, the
          same RTP payload type number used in the offer SHOULD also be used
          in the answer, as specified in [7].  An answer MUST NOT contain a
          payload type number used in the offer unless the configuration
          ("profile-level-id", "packetization-mode", and, if present,
          "sprop-deint-buf-req") is the same as in the offer.


             Informative note: An offerer, when receiving the answer, has to
             compare payload types not declared in the offer based on media
             type (i.e., video/h264) and the above three parameters with any
             payload types it has already declared, in order to determine
             whether the configuration in question is new or equivalent to a
             configuration already offered.
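

       The following is a minimal sketch of the matching described above.
       It assumes that each payload type's fmtp parameters are already
       available as a Python dictionary.  The helpers config_key,
       find_equivalent_pt, and may_reuse_pt are hypothetical.  The sketch
       compares only the three configuration parameters and treats an
       absent "packetization-mode" as 0 (single NAL unit mode).  It fills
       in no other defaults.

```python
from typing import Mapping, Optional

ConfigKey = tuple[Optional[str], str, Optional[str]]


def config_key(params: Mapping[str, str]) -> ConfigKey:
    """The three parameters identifying an H.264 media format configuration."""
    plid = params.get("profile-level-id")
    return (
        plid.upper() if plid is not None else None,  # hex compared case-insensitively
        params.get("packetization-mode", "0"),      # absent means single NAL unit mode
        params.get("sprop-deint-buf-req"),
    )


def find_equivalent_pt(offer: Mapping[int, Mapping[str, str]],
                       params: Mapping[str, str]) -> Optional[int]:
    """Offerer side: map a payload type from the answer to an offered one."""
    key = config_key(params)
    for pt, offered in offer.items():
        if config_key(offered) == key:
            return pt
    return None  # a new configuration, not one already offered


def may_reuse_pt(offer: Mapping[int, Mapping[str, str]], pt: int,
                 params: Mapping[str, str]) -> bool:
    """Answerer side: an offered PT number may only carry the same configuration."""
    return pt not in offer or config_key(offer[pt]) == config_key(params)
```

       In the example of section 8.3, an offerer could use
       find_equivalent_pt to conclude that the answer's payload type 97
       carries the configuration it offered as payload type 98.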


       o  The parameters "sprop-parameter-sets", "sprop-deint-buf-req",
          "sprop-interleaving-depth", "sprop-max-don-diff", and "sprop-
          init-buf-time" describe the properties of the NAL unit stream that
          the offerer or answerer is sending for this media format
          configuration.  This differs from the normal usage of the
          Offer/Answer parameters: normally such parameters declare the
          properties of the stream that the offerer or the answerer is able
          to receive.  When dealing with H.264, the offerer assumes that the
          answerer will be able to receive media encoded using the
          configuration being offered.


             Informative note: The above parameters apply for any stream
             sent by the declaring entity with the same configuration; i.e.,
             they are dependent on their source.  Rather than being bound to
             the payload type, the values may have to be applied to another
             payload type when being sent, as they apply for the
             configuration.


       o  The capability parameters ("max-mbps", "max-fs", "max-cpb",
          "max-dpb", "max-br", "redundant-pic-cap", "max-rcmd-nalu-size") MAY be
          used to declare further capabilities.  Their interpretation
          depends on the direction attribute.  When the direction attribute
          is sendonly, then the parameters describe the limits of the RTP
          packets and the NAL unit stream that the sender is capable of
          producing.  When the direction attribute is sendrecv or recvonly,
          then the parameters describe the limitations of what the receiver
          accepts.


       o  As specified above, an offerer has to include the size of the
          deinterleaving buffer in the offer for an interleaved H.264
          stream.  To enable the offerer and answerer to inform each other
          about their capabilities for deinterleaving buffering, both
          parties are RECOMMENDED to include "deint-buf-cap".  This
          information MAY be used when the value for "sprop-deint-buf-req"
          is selected in a second round of offer and answer.  For
          interleaved streams, it is also RECOMMENDED to consider offering
          multiple payload types with different buffering requirements when
          the capabilities of the receiver are unknown.


       o  The "sprop-parameter-sets" parameter is used as described above.
          In addition, an answerer MUST maintain all parameter sets received
          in the offer in its answer.  Depending on the value of the
          "parameter-add" parameter, different rules apply: If "parameter-
          add" is false (0), the answer MUST NOT add any additional
          parameter sets.  If "parameter-add" is true (1), the answerer, in
          its answer, MAY add additional parameter sets to the
          "sprop-parameter-sets" parameter.  The answerer MUST also,
          independent of the value of "parameter-add", accept receiving a
          video stream using the sprop-parameter-sets it declared in the
          answer.


             Informative note: When parameter sets are added, care must be
             taken not to overwrite already transmitted parameter sets by
             using conflicting parameter set identifiers.
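

       A small function can check the answerer-side rules for
       "sprop-parameter-sets".  The following is a hypothetical sketch.  It
       compares parameter sets by their base64 strings.  This is a
       simplification that assumes both sides encode identical parameter
       sets identically.  It does not detect the conflicting identifiers
       mentioned in the note above.

```python
def split_parameter_sets(value: str) -> list[str]:
    """Split a sprop-parameter-sets value into its base64 entries."""
    return [entry.strip() for entry in value.split(",") if entry.strip()]


def answer_parameter_sets_ok(offered: str, answered: str,
                             parameter_add: bool) -> bool:
    """Check an answer's sprop-parameter-sets against the offer's value."""
    offered_sets = split_parameter_sets(offered)
    answered_sets = split_parameter_sets(answered)
    # The answerer MUST keep every parameter set received in the offer.
    if not all(ps in answered_sets for ps in offered_sets):
        return False
    # Additional parameter sets are allowed only when parameter-add is true.
    added = [ps for ps in answered_sets if ps not in offered_sets]
    return parameter_add or not added
```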


       For streams being delivered over multicast, the following rules apply
       in addition:


       o  The stream properties parameters ("sprop-parameter-sets", "sprop-
          deint-buf-req", "sprop-interleaving-depth", "sprop-max-don-diff",
          and "sprop-init-buf-time") MUST NOT be changed by the answerer.
          Thus, a payload type can either be accepted unaltered or removed.


       o  The receiver capability parameters "max-mbps", "max-fs", "max-
          cpb", "max-dpb", "max-br", and "max-rcmd-nalu-size" MUST be
          supported by the answerer for all streams declared as sendrecv or
          recvonly; otherwise, one of the following actions MUST be
          performed: the media format is removed, or the session rejected.


       o  The receiver capability parameter redundant-pic-cap SHOULD be
          supported by the answerer for all streams declared as sendrecv or
          recvonly as follows:  The answerer SHOULD NOT include redundant
          coded pictures in the transmitted stream if the offerer indicated
          redundant-pic-cap equal to 0.  Otherwise (when redundant-pic-cap
          is equal to 1), it is beyond the scope of this memo to recommend
          how the answerer should use redundant coded pictures.
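

       For multicast, the first rule above reduces to an equality check on
       the stream property parameters.  The following Python sketch uses
       hypothetical names.  It treats a parameter that is absent on both
       sides as unchanged.  If the check fails, the answerer would have to
       remove the payload type rather than alter it.  The sketch does not
       cover the receiver capability rules.

```python
from typing import Mapping

STREAM_PROPERTY_PARAMS = (
    "sprop-parameter-sets",
    "sprop-deint-buf-req",
    "sprop-interleaving-depth",
    "sprop-max-don-diff",
    "sprop-init-buf-time",
)


def multicast_answer_ok(offered: Mapping[str, str],
                        answered: Mapping[str, str]) -> bool:
    """True if no stream property parameter was added, removed, or changed."""
    return all(offered.get(name) == answered.get(name)
               for name in STREAM_PROPERTY_PARAMS)
```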


       Below are the complete lists of how the different parameters shall be
       interpreted in the different combinations of offer or answer and
       direction attribute.


       o  In offers and answers for which "a=sendrecv" or no direction
          attribute is used, or in offers and answers for which "a=recvonly"
          is used, the following interpretation of the parameters MUST be
          used.


          Declaring actual configuration or properties for receiving:


             - profile-level-id
             - packetization-mode


          Declaring actual properties of the stream to be sent (applicable
          only when "a=sendrecv" or no direction attribute is used):


             - sprop-deint-buf-req
             - sprop-interleaving-depth
             - sprop-parameter-sets
             - sprop-max-don-diff
             - sprop-init-buf-time


          Declaring receiver implementation capabilities:


             - max-mbps
             - max-fs
             - max-cpb
             - max-dpb
             - max-br
             - redundant-pic-cap
             - deint-buf-cap
             - max-rcmd-nalu-size


          Declaring how Offer/Answer negotiation shall be performed:


             - parameter-add


       o  In an offer or answer for which the direction attribute
          "a=sendonly" is included for the media stream, the following
          interpretation of the parameters MUST be used:


          Declaring actual configuration and properties of stream proposed
          to be sent:


             - profile-level-id
             - packetization-mode
             - sprop-deint-buf-req
             - sprop-max-don-diff
             - sprop-init-buf-time
             - sprop-parameter-sets
             - sprop-interleaving-depth


          Declaring the capabilities of the sender when it receives a
          stream:


             - max-mbps
             - max-fs
             - max-cpb
             - max-dpb
             - max-br
             - redundant-pic-cap
             - deint-buf-cap
             - max-rcmd-nalu-size


          Declaring how Offer/Answer negotiation shall be performed:


             - parameter-add
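

       The two lists above can be captured in a lookup, as in the
       following Python sketch.  The role labels are hypothetical.  A
       missing direction attribute is treated like "a=sendrecv".  The
       result "not applicable" for sprop-* parameters under "a=recvonly"
       reflects one reading of the parenthetical remark in the first list.

```python
from typing import Optional

CONFIG_PARAMS = {"profile-level-id", "packetization-mode"}
SPROP_PARAMS = {"sprop-deint-buf-req", "sprop-interleaving-depth",
                "sprop-parameter-sets", "sprop-max-don-diff",
                "sprop-init-buf-time"}
CAPABILITY_PARAMS = {"max-mbps", "max-fs", "max-cpb", "max-dpb", "max-br",
                     "redundant-pic-cap", "deint-buf-cap",
                     "max-rcmd-nalu-size"}


def param_role(name: str, direction: Optional[str] = None) -> str:
    """Classify an H.264 MIME parameter for a given SDP direction attribute."""
    direction = direction or "sendrecv"  # no attribute is treated like sendrecv
    if direction not in ("sendrecv", "recvonly", "sendonly"):
        raise ValueError(f"unsupported direction: {direction}")
    if name == "parameter-add":
        return "negotiation"
    if name in CAPABILITY_PARAMS:
        # For sendonly: capabilities of the sender when it receives a stream.
        return "receive capability"
    if name in CONFIG_PARAMS or name in SPROP_PARAMS:
        if direction == "sendonly":
            return "stream to be sent"
        if name in CONFIG_PARAMS:
            return "receive configuration"
        # sprop-* describe the sent stream only for sendrecv (or no attribute).
        return "stream to be sent" if direction == "sendrecv" else "not applicable"
    raise ValueError(f"unknown parameter: {name}")
```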


       Furthermore, the following considerations are necessary:


       o  Parameters used for declaring receiver capabilities are in general
          downgradable; i.e., they express the upper limit for a sender's
          possible behavior.  Thus, a sender MAY configure its encoder to use
          only values that are lower than or equal to these parameters.
          "sprop-parameter-sets" MUST NOT be used in a sender's declaration
          of its capabilities, as the limits of the values that are carried
          inside the parameter sets are implicit with the profile and level
          used.


       o  Parameters declaring a configuration point are not downgradable,
          with the exception of the level part of the "profile-level-id"
          parameter.  These parameters express values that a receiver expects
          to be used, and they must be used verbatim on the sender side.


       o  When a sender's capabilities are declared, and non-downgradable
          parameters are used in this declaration, then these parameters
          express a configuration that is acceptable.  In order to achieve
          high interoperability levels, it is often advisable to offer
          multiple alternative configurations; e.g., for the packetization
          mode.  It is impossible to offer multiple configurations in a
          single payload type.  Thus, when multiple configuration offers are
          made, each offer requires its own RTP payload type associated with
          the offer.


       o  A receiver SHOULD understand all MIME parameters, even if it only
          supports a subset of the payload format's functionality.  This
          ensures that a receiver is capable of understanding when an offer
          to receive media can be downgraded to what is supported by the
          receiver of the offer.


       o  An answerer MAY extend the offer with additional media format
          configurations.  However, to enable their usage, in most cases a
          second offer is required from the offerer to provide the stream
          properties parameters that the media sender will use.  This also
          has the effect that the offerer has to be able to receive this
          media format configuration, not only to send it.


       o  If an offerer wishes to have non-symmetric capabilities between
          sending and receiving, the offerer has to offer different RTP
          sessions; i.e., different media lines declared as "recvonly" and
          "sendonly", respectively.  This may have further implications on
          the system.
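

       The downgrading rules can be sketched as a check of the values a
       sender intends to use against a receiver's declaration.  In the
       sketch, "chosen" is a hypothetical representation of those values.
       The sketch assumes that the profile part of "profile-level-id"
       consists of the first two bytes (profile_idc and the constraint
       flags byte).  It compares level_idc numerically, which ignores
       special cases such as Level 1b.

```python
from typing import Mapping

LIMIT_PARAMS = ("max-mbps", "max-fs", "max-cpb", "max-dpb", "max-br")


def sender_choice_ok(declared: Mapping[str, str],
                     chosen: Mapping[str, str]) -> bool:
    """Check a sender's chosen settings against a receiver's declaration."""
    declared_plid = bytes.fromhex(declared["profile-level-id"])
    chosen_plid = bytes.fromhex(chosen["profile-level-id"])
    # Profile part (profile_idc and constraint byte) is not downgradable.
    if chosen_plid[:2] != declared_plid[:2]:
        return False
    # The level part may be downgraded to an equal or lower level_idc.
    if chosen_plid[2] > declared_plid[2]:
        return False
    # packetization-mode is a configuration point and is used verbatim.
    if chosen.get("packetization-mode", "0") != declared.get("packetization-mode", "0"):
        return False
    # Receiver capability parameters are upper limits for the sender.
    return all(int(chosen[p]) <= int(declared[p])
               for p in LIMIT_PARAMS if p in declared and p in chosen)
```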


    8.2.3.  Usage in Declarative Session Descriptions


       When H.264 over RTP is offered with SDP in a declarative style, as in
       RTSP [27] or SAP [28], the following considerations are necessary.


       o  All parameters capable of indicating the properties of both a NAL
          unit stream and a receiver are used to indicate the properties of
          a NAL unit stream.  For example, in this case, the parameter
          "profile-level-id" declares the values used by the stream, instead
          of the capabilities of the sender.  Consequently, the following
          interpretation of the parameters MUST be used:


          Declaring actual configuration or properties:


             - profile-level-id
             - sprop-parameter-sets
             - packetization-mode
             - sprop-interleaving-depth
             - sprop-deint-buf-req
             - sprop-max-don-diff
             - sprop-init-buf-time


          Not usable:


             - max-mbps
             - max-fs
             - max-cpb
             - max-dpb
             - max-br
             - redundant-pic-cap
             - max-rcmd-nalu-size
             - parameter-add
             - deint-buf-cap


       o  A receiver of the SDP is required to support all parameters and
          values of the parameters provided; otherwise, the receiver MUST
          reject (RTSP) or not participate in (SAP) the session.  It falls
          on the creator of the session to use values that are expected to
          be supported by the receiving application.
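

       A receiver's decision for declarative SDP might be sketched as
       follows.  The supported profiles, packetization modes, and maximum
       level are hypothetical defaults for a Baseline-only receiver.
       Parameters from the "Not usable" list are ignored.  Any other
       unknown parameter leads to rejection (RTSP) or non-participation
       (SAP).

```python
from typing import AbstractSet, Mapping

DECLARATIVE_PARAMS = {
    "profile-level-id", "sprop-parameter-sets", "packetization-mode",
    "sprop-interleaving-depth", "sprop-deint-buf-req",
    "sprop-max-don-diff", "sprop-init-buf-time",
}
NOT_USABLE_PARAMS = {
    "max-mbps", "max-fs", "max-cpb", "max-dpb", "max-br",
    "redundant-pic-cap", "max-rcmd-nalu-size", "parameter-add",
    "deint-buf-cap",
}


def accept_declarative(params: Mapping[str, str],
                       profiles: AbstractSet[int] = frozenset({66}),
                       modes: AbstractSet[str] = frozenset({"0", "1"}),
                       max_level_idc: int = 30) -> bool:
    """Return True to accept (RTSP) or join (SAP) the session."""
    for name in params:
        if name not in DECLARATIVE_PARAMS and name not in NOT_USABLE_PARAMS:
            return False  # a parameter this receiver does not understand
    if "profile-level-id" in params:
        profile_idc, _, level_idc = bytes.fromhex(params["profile-level-id"])
        if profile_idc not in profiles or level_idc > max_level_idc:
            return False
    return params.get("packetization-mode", "0") in modes
```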


    8.3.  Examples


       A SIP Offer/Answer exchange wherein both parties are expected to both
       send and receive could look like the following.  Only the media codec
       specific parts of the SDP are shown.  Some lines are wrapped due to
       text constraints.


          Offerer -> Answerer SDP message:


          m=video 49170 RTP/AVP 100 99 98
          a=rtpmap:98 H264/90000
          a=fmtp:98 profile-level-id=42A01E; packetization-mode=0;
                    sprop-parameter-sets=Z0IACpZTBYmI,aMljiA==
          a=rtpmap:99 H264/90000
          a=fmtp:99 profile-level-id=42A01E; packetization-mode=1;
                    sprop-parameter-sets=Z0IACpZTBYmI,aMljiA==
          a=rtpmap:100 H264/90000
          a=fmtp:100 profile-level-id=42A01E; packetization-mode=2;
                     sprop-parameter-sets=Z0IACpZTBYmI,aMljiA==;
                     sprop-interleaving-depth=45; sprop-deint-buf-req=64000;
                     sprop-init-buf-time=102478; deint-buf-cap=128000


       The above offer presents the same codec configuration in three
       different packetization formats.  PT 98 represents single NALU mode,
       PT 99 non-interleaved mode; PT 100 indicates the interleaved mode.
       In the interleaved mode case, the interleaving parameters that the
       offerer would use if the answer indicates support for PT 100 are also
       included.  In all three cases the parameter "sprop-parameter-sets"
       conveys the initial parameter sets that are required for the answerer
       when receiving a stream from the offerer when this configuration

       (profile-level-id and packetization mode) is accepted.  Note that the
       value for "sprop-parameter-sets", although identical in the example
       above, could be different for each payload type.
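

       Each entry of "sprop-parameter-sets" is a base64-encoded NAL unit.
       The hypothetical helper below decodes the entries and reads the
       one-byte NAL unit header.  In that byte, nal_ref_idc occupies bits
       5-6 and nal_unit_type occupies bits 0-4.  The helper inspects only
       the header, because the note at the end of this section states
       that these strings are illustrative.

```python
import base64

NAL_TYPE_NAMES = {1: "non-IDR slice", 5: "IDR slice", 6: "SEI", 7: "SPS", 8: "PPS"}


def describe_parameter_sets(value: str) -> list[tuple[int, int, str]]:
    """Return (nal_unit_type, nal_ref_idc, name) for each base64 NAL unit."""
    described = []
    for entry in value.split(","):
        nalu = base64.b64decode(entry.strip(), validate=True)
        header = nalu[0]  # forbidden_zero_bit | nal_ref_idc | nal_unit_type
        nal_type = header & 0x1F
        nal_ref_idc = (header >> 5) & 0x03
        described.append((nal_type, nal_ref_idc, NAL_TYPE_NAMES.get(nal_type, "other")))
    return described
```

       For the offer's value Z0IACpZTBYmI,aMljiA==, the first entry starts
       with byte 0x67, a sequence parameter set (type 7).  The second
       starts with byte 0x68, a picture parameter set (type 8).  Both have
       nal_ref_idc 3.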


         Answerer -> Offerer SDP message:


         m=video 49170 RTP/AVP 100 99 97
         a=rtpmap:97 H264/90000
         a=fmtp:97 profile-level-id=42A01E; packetization-mode=0;
                   sprop-parameter-sets=Z0IACpZTBYmI,aMljiA==,As0DEWlsIOp==,
                   KyzFGleR
         a=rtpmap:99 H264/90000
         a=fmtp:99 profile-level-id=42A01E; packetization-mode=1;
                   sprop-parameter-sets=Z0IACpZTBYmI,aMljiA==,As0DEWlsIOp==,
                   KyzFGleR; max-rcmd-nalu-size=3980
         a=rtpmap:100 H264/90000
         a=fmtp:100 profile-level-id=42A01E; packetization-mode=2;
                   sprop-parameter-sets=Z0IACpZTBYmI,aMljiA==,As0DEWlsIOp==,
                   KyzFGleR; sprop-interleaving-depth=60;
                   sprop-deint-buf-req=86000; sprop-init-buf-time=156320;
                   deint-buf-cap=128000; max-rcmd-nalu-size=3980


       As the Offer/Answer negotiation covers both sending and receiving
       streams, an offer indicates the exact parameters for what the offerer
       is willing to receive, whereas the answer indicates the same for what
       the answerer accepts to receive.  In this case the offerer declared
       that it is willing to receive payload type 98.  The answerer accepts
       this by declaring an equivalent payload type 97; i.e., it has
       identical values for the three parameters "profile-level-id",
       "packetization-mode", and "sprop-deint-buf-req".  This has the
       following implications for both the offerer and the answerer
       concerning the parameters that declare properties.  The offerer
       initially declared a certain value of the "sprop-parameter-sets" in
       the payload definition for PT=98.  However, as the answerer accepted
       this as PT=97, the values of "sprop-parameter-sets" in PT=98 must now
       be used instead when the offerer sends PT=97.  Similarly, when the
       answerer sends PT=98 to the offerer, it has to use the properties
       parameters it declared in PT=97.


       The answerer also accepts the reception of the two configurations
       that payload types 99 and 100 represent.  It provides the initial
       parameter sets for the answerer-to-offerer direction, as well as the
       buffering-related parameters that it will use to send the payload
       types.  It also provides the offerer with its memory limit for
       deinterleaving operations by providing a "deint-buf-cap" parameter.
       This is only useful if the offerer decides on making a second offer,
       where it can take the new value into account.  The "max-rcmd-nalu-
       size" indicates that the answerer can efficiently process NALUs up to

       the size of 3980 bytes.  However, there is no guarantee that the
       network supports this size.
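

       A sender could combine "max-rcmd-nalu-size" with the path MTU, as
       in the following sketch.  The header overhead assumes IPv4 without
       options, UDP, and a 12-byte RTP header without CSRCs or extensions.
       The path MTU is assumed to be known by other means.  This memo
       mandates none of this.

```python
from typing import Optional

# Assumption: IPv4 without options (20), UDP (8), fixed RTP header (12),
# no CSRCs and no RTP header extension.
IPV4_UDP_RTP_OVERHEAD = 20 + 8 + 12


def nalu_size_budget(path_mtu: int, max_rcmd_nalu_size: Optional[int] = None) -> int:
    """Largest NAL unit for a single NAL unit packet without IP fragmentation."""
    budget = path_mtu - IPV4_UDP_RTP_OVERHEAD
    if max_rcmd_nalu_size is not None:
        budget = min(budget, max_rcmd_nalu_size)
    if budget <= 0:
        raise ValueError("path MTU too small for IPv4/UDP/RTP headers")
    return budget
```

       With a path MTU of 1500 bytes, the budget is 1460 bytes, well below
       the 3980 bytes in the example answer.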


       Please note that the parameter sets in the above example do not
       represent a legal operation point of an H.264 codec.  The base64
       strings are only used for illustration.


    8.4.  Parameter Set Considerations


       The H.264 parameter sets are a fundamental part of the video codec
       and vital to its operation; see section 1.2.  Due to their
       characteristics and their importance for the decoding process, lost
       or erroneously transmitted parameter sets can hardly be concealed
       locally at the receiver.  A reference to a corrupt parameter set
       normally has fatal results for the decoding process.  Corruption could
       occur, for example, due to the erroneous transmission or loss of a
       parameter set data structure, but also due to the untimely
       transmission of a parameter set update.  Therefore, the following
       recommendations are provided as a guideline for the implementer of
       the RTP sender.


       Parameter set NALUs can be transported using three different
       principles:


       A. Using a session control protocol (out-of-band) prior to the actual
          RTP session.


       B. Using a session control protocol (out-of-band) during an ongoing
          RTP session.


       C. Within the RTP stream in the payload (in-band) during an ongoing
          RTP session.


       It is necessary to implement principles A and B within a session
       control protocol.  SIP and SDP can be used as described in the SDP
       Offer/Answer model and in the previous sections of this memo.  This
       section contains guidelines on how principles A and B must be
       implemented within session control protocols.  It is independent of
       the particular protocol used.  Principle C is supported by the RTP
       payload format defined in this specification.


       The picture and sequence parameter set NALUs SHOULD NOT be
       transmitted in the RTP payload unless reliable transport is provided
       for RTP, as a loss of a parameter set of either type will likely
       prevent decoding of a considerable portion of the corresponding RTP

       stream.  Thus, the transmission of parameter sets using a reliable
       session control protocol (i.e., usage of principle A or B above) is
       RECOMMENDED.


       In the rest of the section it is assumed that out-of-band signaling
       provides reliable transport of parameter set NALUs and that in-band
       transport does not.  If in-band signaling of parameter sets is used,
       the sender SHOULD take the error characteristics into account and use
       mechanisms to provide a high probability for delivering the parameter
       sets correctly.  Mechanisms that increase the probability for a
       correct reception include packet repetition, FEC, and retransmission.
       The use of an unreliable, out-of-band control protocol has similar
       disadvantages as the in-band signaling (possible loss) and, in
       addition, may also lead to difficulties in the synchronization (see
       below).  Therefore, it is NOT RECOMMENDED.


       Parameter sets MAY be added or updated during the lifetime of a
       session using principles B and C.  It is required that parameter sets
       are present at the decoder prior to the NAL units that refer to them.
       Updating or adding of parameter sets can result in further problems,
       and therefore the following recommendations should be considered.


       -  When parameter sets are added or updated, principle C is
          vulnerable to transmission errors as described above, and
          therefore principle B is RECOMMENDED.


       -  When parameter sets are added or updated, care SHOULD be taken to
          ensure that any parameter set is delivered prior to its usage.  It
          is common that no synchronization is present between out-of-band
          signaling and in-band traffic.  If out-of-band signaling is used,
          it is RECOMMENDED that a sender does not start sending NALUs
          requiring the updated parameter sets prior to acknowledgement of
          delivery from the signaling protocol.


       -  When parameter sets are updated, the following synchronization
          issue should be taken into account.  When overwriting a parameter
          set at the receiver, the sender has to ensure that the parameter
          set in question is not needed by any NALU present in the network
          or receiver buffers.  Otherwise, decoding with a wrong parameter
          set may occur.  To lessen this problem, it is RECOMMENDED either
          to overwrite only those parameter sets that have not been used for
          a sufficiently long time (to ensure that all related NALUs have
          been consumed), or to add a new parameter set instead (which may
          have negative consequences for the efficiency of the video
          coding).


       -  When new parameter sets are added, previously unused parameter set
          identifiers are used.  This avoids the problem identified in the

          previous paragraph.  However, in a multiparty session, unless a
          synchronized control protocol is used, there is a risk that
          multiple entities try to add different parameter sets for the same
          identifier, which has to be avoided.


       -  Adding or modifying parameter sets by using both principles B and
          C in the same RTP session may lead to inconsistencies of the
          parameter sets because of the lack of synchronization between the
          control and the RTP channel.  Therefore, principles B and C MUST
          NOT both be used in the same session unless sufficient
          synchronization can be provided.


       In some scenarios (e.g., when only the subset of this payload format
       specification corresponding to H.241 is used), it is not possible to
       employ out-of-band parameter set transmission.  In this case,
       parameter sets have to be transmitted in-band.  Here, the
       synchronization with the non-parameter-set-data in the bitstream is
       implicit, but the possibility of a loss has to be taken into account.
       The loss probability should be reduced using the mechanisms discussed
       above.


       -  When parameter sets are initially provided using principle A and
          then later added or updated in-band (principle C), there is a risk
          associated with updating the parameter sets delivered out-of-band.
          If receivers miss some in-band updates (for example, because of a
          loss or a late tune-in), those receivers will attempt to decode the
          bitstream using outdated parameters.  It is RECOMMENDED that
          parameter set IDs be partitioned between the out-of-band and in-
          band parameter sets.
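

       The partitioning recommended in the last item can be sketched as an
       allocator that hands out previously unused identifiers from disjoint
       out-of-band and in-band ranges.  The class name and the
       half-and-half split are hypothetical.  The ranges 0..31 for
       seq_parameter_set_id and 0..255 for pic_parameter_set_id come from
       H.264.

```python
class ParameterSetIdAllocator:
    """Allocates unused SPS/PPS identifiers from disjoint ranges per channel."""

    SPS_ID_COUNT = 32   # seq_parameter_set_id: 0..31
    PPS_ID_COUNT = 256  # pic_parameter_set_id: 0..255

    def __init__(self, out_of_band_sps: int = 16, out_of_band_pps: int = 128):
        # Hypothetical split: low IDs for out-of-band, high IDs for in-band.
        self._ranges = {
            ("sps", "out-of-band"): range(0, out_of_band_sps),
            ("sps", "in-band"): range(out_of_band_sps, self.SPS_ID_COUNT),
            ("pps", "out-of-band"): range(0, out_of_band_pps),
            ("pps", "in-band"): range(out_of_band_pps, self.PPS_ID_COUNT),
        }
        self._used: dict[str, set[int]] = {"sps": set(), "pps": set()}

    def allocate(self, kind: str, channel: str) -> int:
        """Return a never-used ID of kind 'sps'/'pps' for the given channel."""
        for ps_id in self._ranges[(kind, channel)]:
            if ps_id not in self._used[kind]:
                self._used[kind].add(ps_id)
                return ps_id
        raise RuntimeError(f"no unused {kind} id left for {channel} use")
```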


       To allow for maximum flexibility and best performance from the H.264
       coder, it is recommended, if possible, to allow any sender to add its
       own parameter sets to be used in a session.  Setting the
       "parameter-add" parameter to false should only be done in cases
       where the session topology prevents a participant from adding its
       own parameter sets.


    9.  Security Considerations


       RTP packets using the payload format defined in this specification
       are subject to the security considerations discussed in the RTP
       specification [4], and in any appropriate RTP profile (for example,
       [16]).  This implies that confidentiality of the media streams is
       achieved by encryption; for example, through the application of SRTP
       [26].  Because the data compression used with this payload format is
       applied end-to-end, any encryption needs to be performed after
       compression.


       A potential denial-of-service threat exists for data encodings using
       compression techniques that have non-uniform receiver-end
       computational load.  The attacker can inject pathological datagrams
       into the stream that are complex to decode and that cause the
       receiver to be overloaded.  H.264 is particularly vulnerable to such
       attacks, as it is extremely simple to generate datagrams containing
       NAL units that affect the decoding process of many future NAL units.
       Therefore, the usage of data origin authentication and data integrity
       protection of at least the RTP packet is RECOMMENDED; for example,
       with SRTP [26].


       Note that the appropriate mechanism to ensure confidentiality and
       integrity of RTP packets and their payloads is very dependent on the
       application and on the transport and signaling protocols employed.
       Thus, although SRTP is given as an example above, other possible
       choices exist.


       Decoders MUST exercise caution with respect to the handling of user
       data SEI messages, particularly if they contain active elements, and
       MUST restrict their domain of applicability to the presentation
       containing the stream.


       End-to-end security with authentication, integrity, or
       confidentiality protection will prevent a MANE from performing
       media-aware operations other than discarding complete packets.  With
       confidentiality protection, a MANE is even prevented from discarding
       packets in a media-aware way.  To perform its operations, a MANE
       must be a trusted entity that is included in the security context
       establishment.


    10.  Congestion Control


       Congestion control for RTP SHALL be used in accordance with RFC 3550
       [4], and with any applicable RTP profile; e.g., RFC 3551 [16].  An
       additional requirement if best-effort service is being used is:
       users of this payload format MUST monitor packet loss to ensure that
       the packet loss rate is within acceptable parameters.  Packet loss is
       considered acceptable if a TCP flow across the same network path, and
       experiencing the same network conditions, would achieve an average
       throughput, measured on a reasonable timescale, that is not less than
       the RTP flow is achieving.  This condition can be satisfied by
       implementing congestion control mechanisms to adapt the transmission
       rate (or the number of layers subscribed for a layered multicast
       session), or by arranging for a receiver to leave the session if the
       loss rate is unacceptably high.
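

       To judge whether the loss rate is acceptable, a sender needs an
       estimate of the throughput a TCP flow would achieve.  The sketch
       below uses the simplified steady-state model of Mathis et al., in
       which throughput is roughly MSS/RTT times sqrt(3/2)/sqrt(p).  This
       model is a swapped-in assumption, not something this memo
       specifies.  A real implementation might instead use the TFRC
       throughput equation.

```python
import math


def tcp_friendly_rate_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Approximate steady-state TCP throughput in bit/s (Mathis et al. model)."""
    if not 0.0 < loss_rate <= 1.0 or rtt_s <= 0:
        raise ValueError("need 0 < loss_rate <= 1 and rtt_s > 0")
    return (mss_bytes * 8 / rtt_s) * math.sqrt(1.5) / math.sqrt(loss_rate)


def loss_acceptable(rtp_rate_bps: float, mss_bytes: int, rtt_s: float,
                    loss_rate: float) -> bool:
    """True if a TCP flow under the same conditions would get at least our rate."""
    if loss_rate == 0:
        return True  # the model is undefined without loss; nothing to compare
    return rtp_rate_bps <= tcp_friendly_rate_bps(mss_bytes, rtt_s, loss_rate)
```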


       The bit rate adaptation necessary for obeying the congestion control
       principle is easily achievable when real-time encoding is used.
       However, when pre-encoded content is being transmitted, bandwidth
       adaptation requires the availability of more than one coded
       representation of the same content, at different bit rates, or the
       existence of non-reference pictures or sub-sequences [22] in the
       bitstream.  The switching between the different representations can
       normally be performed in the same RTP session; e.g., by employing a
       concept known as SI/SP slices of the Extended Profile, or by
       switching streams at IDR picture boundaries.  Only when non-
       downgradable parameters (such as the profile part of the
       profile/level ID) are required to be changed does it become necessary
       to terminate and re-start the media stream.  This may be accomplished
       by using a different RTP payload type.
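

       Switching at IDR picture boundaries requires locating the access
       units that contain IDR slices (NAL unit type 5).  The following is
       a minimal sketch.  It assumes that access units are already
       available as lists of NAL units, each starting with its one-byte
       NAL unit header.  The function names are hypothetical.

```python
def nal_unit_type(nalu: bytes) -> int:
    """Low five bits of the one-byte NAL unit header."""
    return nalu[0] & 0x1F


def idr_switch_points(access_units: list[list[bytes]]) -> list[int]:
    """Indices of access units containing an IDR slice (nal_unit_type 5)."""
    return [index for index, au in enumerate(access_units)
            if any(nalu and nal_unit_type(nalu) == 5 for nalu in au)]
```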


       MANEs MAY follow the suggestions outlined in section 7.3 and remove
       certain unusable packets from the packet stream when that stream was
       damaged due to previous packet losses.  This can help reduce the
       network load in certain special cases.


    11.  IANA Consideration


       IANA has registered one new MIME type; see section 8.1.


    12.  Informative Appendix: Application Examples


       This payload specification is very flexible in its use, in order to
       cover the extremely wide application space anticipated for H.264.
       However, this great flexibility also makes it difficult for an
       implementer to decide on a reasonable packetization scheme.  Some
       information on how to apply this specification to real-world
       scenarios is likely to appear in the form of academic publications
       and as test model software and its description in the near future.
       However, some preliminary usage scenarios are described here as well.


    12.1.  Video Telephony according to ITU-T Recommendation H.241
           Annex A


       H.323-based video telephony systems that use H.264 as an optional
       video compression scheme are required to support H.241 Annex A [15]
       as a packetization scheme.  The packetization mechanism defined in
       this Annex is technically identical with a small subset of this
       specification.


       When a system operates according to H.241 Annex A, parameter set NAL
       units are sent in-band.  Only Single NAL unit packets are used.  Many
       such systems do not send IDR pictures regularly, but only when
       required by user interaction or by control protocol means; e.g., when
       switching between video channels in a Multipoint Control Unit or for
       error recovery requested by feedback.


    12.2.  Video Telephony, No Slice Data Partitioning, No NAL Unit
           Aggregation


       The RTP part of this scheme is implemented and tested (though not the
       control-protocol part; see below).


       In most real-world video telephony applications, picture parameters
       such as picture size or optional modes never change during the
       lifetime of a connection.  Therefore, all necessary parameter sets
       (usually only one) are sent as a side effect of the capability
       exchange/announcement process, e.g., according to the SDP syntax
       specified in section 8.2 of this document.  As all necessary
       parameter set information is established before the RTP session
       starts, there is no need for sending any parameter set NAL units.
       Slice data partitioning is not used, either.  Thus, the RTP packet
       stream basically consists of NAL units that carry single coded
       slices.


       The encoder chooses the size of coded slice NAL units so that they
       offer the best performance.  Often, this is done by adapting the
       coded slice size to the MTU size of the IP network.  For small
    Wenger, et al.              Standards Track                    [Page 65]
       picture sizes, this may result in a one-picture-per-one-packet
       strategy.  Intra refresh algorithms clean up the loss of packets and
       the resulting drift-related artifacts.
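

       As a rough sketch of the size budget, assuming IPv4 without options
       (20 octets) or IPv6 without extension headers (40 octets), UDP (8
       octets), and a 12-octet RTP header with no CSRCs or header
       extension, the largest coded slice NAL unit that fits in a single
       packet is the path MTU minus these headers.  The function name is
       hypothetical.

```python
IPV4_HEADER = 20  # octets, no IP options assumed
IPV6_HEADER = 40  # octets, no extension headers assumed
UDP_HEADER = 8
RTP_HEADER = 12   # fixed RTP header, no CSRC list or extension assumed


def max_nal_unit_size(mtu: int, ipv6: bool = False) -> int:
    """Largest NAL unit (header octet included) that fits one RTP packet."""
    ip = IPV6_HEADER if ipv6 else IPV4_HEADER
    budget = mtu - ip - UDP_HEADER - RTP_HEADER
    if budget <= 0:
        raise ValueError("MTU too small for the IP/UDP/RTP headers")
    return budget
```

       For a 1500-octet Ethernet MTU, this gives 1460 octets over IPv4 and
       1440 octets over IPv6.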


    12.3.  Video Telephony, Interleaved Packetization Using NAL Unit
           Aggregation


       This scheme allows better error concealment and is used in
       H.263-based designs using RFC 2429 packetization [10].  It has been
       implemented, and good results were reported [12].


       The VCL encoder codes the source picture so that all macroblocks
       (MBs) of one MB line are assigned to one slice.  All slices with even
       MB row addresses are combined into one STAP, and all slices with odd
       MB row addresses into another.  Those STAPs are transmitted as RTP
       packets.  The establishment of the parameter sets is performed as
       discussed above.
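

       The even/odd grouping can be sketched as follows, assuming one slice
       per macroblock row that has already been encoded as a NAL unit.  The
       sketch builds STAP-A packets (NAL unit type 24): one header octet
       whose F bit is set if any aggregated unit has F set and whose NRI is
       the maximum NRI of the aggregated units, followed by each NAL unit
       prefixed with its 16-bit size in network byte order.  Function names
       are hypothetical.

```python
import struct

STAP_A = 24


def build_stap_a(nal_units: list[bytes]) -> bytes:
    """Aggregate NAL units into one STAP-A payload."""
    if not nal_units:
        raise ValueError("a STAP-A needs at least one NAL unit")
    f_bit, nri = 0, 0
    body = bytearray()
    for nalu in nal_units:
        if not 0 < len(nalu) <= 0xFFFF:
            raise ValueError("NAL unit size must fit in 16 bits")
        f_bit |= nalu[0] >> 7                   # F: set if any unit has it set
        nri = max(nri, (nalu[0] >> 5) & 0x03)   # NRI: maximum over all units
        body += struct.pack("!H", len(nalu))    # 16-bit NALU size, big-endian
        body += nalu
    header = (f_bit << 7) | (nri << 5) | STAP_A
    return bytes([header]) + bytes(body)


def interleave_rows(row_slices: list[bytes]) -> tuple[bytes, bytes]:
    """Put even-row slices into one STAP-A and odd-row slices into another."""
    return build_stap_a(row_slices[0::2]), build_stap_a(row_slices[1::2])
```

       Assuming 40 octets of IPv4/UDP/RTP headers per packet, the two STAPs
       cost 80 octets of header per CIF picture, instead of the 720 octets
       that 18 separate packets would need.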


       Note that the use of STAPs is essential here, as the high number of
       individual slices (18 for a CIF picture) would lead to unacceptably
       high IP/UDP/RTP header overhead (unless the source coding tool FMO is
       used, which is not assumed in this scenario).  Furthermore, some
       wireless video transmission systems, such as H.324M and the IP-based
       video telephony specified in 3GPP, are likely to use relatively small
       transport packet size.  For example, a typical MTU size of H.223 AL3
       SDU is around 100 bytes [17].  Coding individual slices according to
       this packetization scheme provides further advantage in communication
       between wired and wireless networks, as individual slices are likely
       to be smaller than the preferred maximum packet size of wireless
       systems.  Consequently, a gateway can convert the STAPs used in a
       wired network into several RTP packets with only one NAL unit, which
       are preferred in a wireless network, and vice versa.
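

       The gateway conversion from STAPs to single NAL unit packets amounts
       to walking the size-prefixed list inside a STAP-A (type 24) payload.
       A minimal sketch, assuming the STAP-A layout of one header octet
       followed by 16-bit size fields and NAL units; the function name is
       hypothetical.  Each returned NAL unit could then be sent unchanged as
       the payload of its own Single NAL unit packet.

```python
import struct

STAP_A = 24


def split_stap_a(payload: bytes) -> list[bytes]:
    """Return the NAL units carried in a STAP-A payload, in order."""
    if not payload or (payload[0] & 0x1F) != STAP_A:
        raise ValueError("not a STAP-A payload")
    units = []
    pos = 1  # skip the STAP-A header octet
    while pos < len(payload):
        if pos + 2 > len(payload):
            raise ValueError("truncated NALU size field")
        (size,) = struct.unpack_from("!H", payload, pos)
        pos += 2
        if size == 0 or pos + size > len(payload):
            raise ValueError("empty or truncated NAL unit")
        units.append(payload[pos:pos + size])
        pos += size
    return units
```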


    12.4.  Video Telephony with Data Partitioning


       This scheme has been implemented and has been shown to offer good
       performance, especially at higher packet loss rates [12].


       Data Partitioning is known to be useful only when some form of
       unequal error protection is available.  Normally, in single-session
       RTP environments, even error characteristics are assumed; i.e., the
       packet loss probability of all packets of the session is the same
       statistically.  However, there are means to reduce the packet loss
       probability of individual packets in an RTP session.  An FEC packet
       according to RFC 2733 [18], for example, specifies which media
       packets are associated with the FEC packet.


       In all cases, the incurred overhead is substantial but is in the same
       order of magnitude as the number of bits that would otherwise have
       been spent on intra information.  However, this mechanism does not add
       any delay to the system.


       Again, the complete parameter set establishment is performed through
       control protocol means.
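

       One way to obtain the unequal protection described above is to let
       FEC cover only the most important partition.  As a hypothetical
       sketch, the function below selects the RTP sequence numbers of
       packets carrying data partition A (H.264 NAL unit type 2, which holds
       the slice header and macroblock header information such as motion
       vectors) and groups them into blocks that one RFC 2733 FEC packet
       each could protect.  The block size and the choice to protect only
       partition A are assumptions, not part of this specification.

```python
PARTITION_A, PARTITION_B, PARTITION_C = 2, 3, 4  # H.264 nal_unit_type values


def fec_groups_for_partition_a(packets: list[tuple[int, bytes]],
                               block: int) -> list[list[int]]:
    """Group sequence numbers of partition-A packets into FEC blocks.

    packets: (RTP sequence number, Single NAL unit payload) pairs.
    block:   number of media packets protected by one FEC packet (assumed).
    """
    if block < 1:
        raise ValueError("block must be positive")
    protected = [seq for seq, payload in packets
                 if payload and (payload[0] & 0x1F) == PARTITION_A]
    return [protected[i:i + block] for i in range(0, len(protected), block)]
```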


    12.5.  Video Telephony or Streaming with FUs and Forward Error
           Correction


       This scheme has been implemented and has been shown to provide good
       performance, especially at higher packet loss rates [19].


       The most efficient means to combat packet losses for scenarios where
       retransmissions are not applicable is forward error correction (FEC).
       Although application layer, end-to-end use of FEC is often less
       efficient than an FEC-based protection of individual links
       (especially when links of different characteristics are in the
       transmission path), application layer, end-to-end FEC is unavoidable
       in some scenarios.  RFC 2733 [18] provides means to use generic,
       application layer, end-to-end FEC in packet-loss environments.  A
       binary forward error correcting code is generated by applying the XOR
       operation to the bits at the same bit position in different packets.
       The binary code can be specified by the parameters (n,k) in which k
       is the number of information packets used in the connection and n is
       the total number of packets generated for k information packets;
       i.e., n-k parity packets are generated for k information packets.
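

       The XOR construction can be sketched for the simplest case of one
       parity packet over k media packets (n = k + 1).  Shorter packets are
       zero-padded, which is why property b) favors equal lengths.  The
       sketch omits the RFC 2733 FEC header (including its length recovery
       field) and shows only the bitwise parity and the repair of a single
       lost packet; function names are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_parity(packets: List[bytes]) -> bytes:
    """Bitwise XOR of all packets, zero-padding shorter ones to equal length."""
    width = max(len(p) for p in packets)
    padded = [p.ljust(width, b"\x00") for p in packets]
    # zip(*padded) yields one tuple per octet position across all packets
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*padded))


def recover_one(received: List[Optional[bytes]], parity: bytes) -> bytes:
    """Rebuild the single missing packet (marked None) from the rest and the parity.

    The result is zero-padded to the parity length; RFC 2733 restores the
    true length separately, which this sketch does not model.
    """
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) != 1:
        raise ValueError("one parity packet can repair exactly one loss")
    others = [p for p in received if p is not None]
    return xor_parity(others + [parity])
```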


       When a code is used with parameters (n,k) within the RFC 2733
       framework, the following properties are well known:


       a) If applied over one RTP packet, RFC 2733 provides only packet
          repetition.


       b) RFC 2733 is most bit rate efficient if XOR-connected packets have
          equal length.


       c) At the same packet loss probability p and for a fixed k, the
          greater the value of n is, the smaller the residual error
          probability becomes.  For example, for a packet loss probability
          of 10%, k=1, and n=2, the residual error probability is about 1%,
          whereas for n=3, the residual error probability is about 0.1%.


       d) At the same packet loss probability p and for a fixed code rate
          k/n, the greater the value of n is, the smaller the residual error
          probability becomes.  For example, at a packet loss probability of
          p=10%, k=1 and n=2, the residual error rate is about 1%, whereas

          for an extended Golay code with k=12 and n=24, the residual error
          rate is about 0.01%.
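

       Property c) can be checked with a short calculation.  The sketch
       assumes independent packet losses with probability p and an
       idealized (n,k) code that recovers all k information packets
       whenever at most n-k of the n packets are lost.  For k=1 (pure
       repetition) this is exact and reproduces the 1% and 0.1% figures
       above; it is not a model of the extended Golay code in d), whose
       figure is not reproduced here.  The function name is hypothetical.

```python
from math import comb


def block_failure_probability(p: float, n: int, k: int) -> float:
    """P(more than n-k of n packets lost), assuming independent losses and
    an idealized code that repairs any n-k erasures."""
    if not 0 < k <= n:
        raise ValueError("require 0 < k <= n")
    return sum(comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(n - k + 1, n + 1))


for n in (2, 3):
    print(f"p=10%, k=1, n={n}: {block_failure_probability(0.1, n, 1):.4%}")
# p=10%, k=1, n=2: 1.0000%
# p=10%, k=1, n=3: 0.1000%
```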


       For applying RFC 2733 in combination with H.264 baseline coded video
       without using FUs, several options might be considered:


       1) The video encoder produces NAL units for which each video frame is
          coded in a single slice.  Applying FEC, one could use a simple
          code; e.g., (n=2, k=1).  That is, each NAL unit would basically
          just be repeated.  The disadvantage is obviously the bad code
          performance according to d), above, and the low flexibility, as
          only (n, k=1) codes can be used.


       2) The video encoder produces NAL units for which each video frame is
          encoded in one or more consecutive slices.  Applying FEC, one
          could use a better code, e.g., (n=24, k=12), over a sequence of
          NAL units.  Depending on the number of RTP packets per frame, a
          loss may introduce a significant delay, which is reduced when more
          RTP packets are used per frame.  Packets of completely different
          length might also be connected, which decreases bit rate
          efficiency according to b), above.  However, with some care and
          for slices of 1kb or larger, similar length (100-200 bytes
          difference) may be produced, which will not lower the bit
          efficiency catastrophically.


       3) The video encoder produces NAL units, for which a certain frame
          contains k slices of possibly almost equal length.  Then, applying
          FEC, a better code, e.g., (n=24, k=12), can be used over the
          sequence of NAL units for each frame.  The delay compared to that
          of 2), above, may be reduced, but several disadvantages are
          obvious.  First, the coding efficiency of the encoded video is
          lowered significantly, as slice-structured coding reduces intra-
          frame prediction and additional slice overhead is necessary.
          Second, pre-encoded content, or video received through a gateway,
          is usually not coded with k slices in a way that allows FEC to be
          applied.  Finally, the encoding of video producing k
          slices of equal length is not straightforward and might require
          more than one encoding pass.


       Many of the mentioned disadvantages can be avoided by applying FUs in
       combination with FEC.  Each NAL unit can be split into any number of
       FUs of basically equal length; therefore, FEC with a reasonable k and
       n can be applied, even if the encoder made no effort to produce
       slices of equal length.  For example, a coded slice NAL unit
       containing an entire frame can be split into k FUs, and a parity check
       code (n=k+1, k) can be applied.  However, this has the disadvantage
       that unless all created fragments can be recovered, the whole slice
       will be lost.  Thus a larger section is lost than would be if the
       frame had been split into several slices.
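

       Splitting one NAL unit into k FU-A packets of nearly equal length can
       be sketched as follows.  The sketch assumes the FU-A layout of this
       memo: an FU indicator octet carrying the original F and NRI bits with
       type 28, then an FU header with the Start (S), End (E), and reserved
       (R) bits and the original nal_unit_type, while the original NAL unit
       header octet is not repeated in the fragments.  Since S and E must
       not both be set in one FU, k is at least 2.  Function names are
       hypothetical.

```python
FU_A = 28


def fragment_fu_a(nalu: bytes, k: int) -> list[bytes]:
    """Split one NAL unit into k FU-A payloads of nearly equal length."""
    if len(nalu) < 3:
        raise ValueError("NAL unit too short to fragment")
    header, body = nalu[0], nalu[1:]
    if not 2 <= k <= len(body):
        raise ValueError("need 2 <= k <= payload length")
    fu_indicator = (header & 0xE0) | FU_A  # keep F and NRI, set type 28
    nal_type = header & 0x1F
    base, extra = divmod(len(body), k)     # the first `extra` fragments get one more octet
    fragments, pos = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        fu_header = nal_type
        if i == 0:
            fu_header |= 0x80              # S bit: first fragment
        if i == k - 1:
            fu_header |= 0x40              # E bit: last fragment
        fragments.append(bytes([fu_indicator, fu_header]) + body[pos:pos + size])
        pos += size
    return fragments


def reassemble_fu_a(fragments: list[bytes]) -> bytes:
    """Rebuild the NAL unit from a complete, in-order list of fragments.

    Detecting a missing middle fragment needs RTP sequence numbers,
    which this sketch does not model.
    """
    indicator, first_header = fragments[0][0], fragments[0][1]
    if not (first_header & 0x80) or not (fragments[-1][1] & 0x40):
        raise ValueError("missing start or end fragment")
    header = (indicator & 0xE0) | (first_header & 0x1F)
    return bytes([header]) + b"".join(f[2:] for f in fragments)
```

       Because the fragments differ in length by at most one octet, a
       (k+1, k) parity code over them wastes almost no padding.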


       The presented technique makes it possible to achieve good
       transmission error tolerance, even if no additional source coding
       layer redundancy (such as periodic intra frames) is present.
       Consequently, the same coded video sequence can be used to achieve
       the maximum compression efficiency and quality over error-free
       transmission and for transmission over error-prone networks.
       Furthermore, the technique allows the application of FEC to pre-
       encoded sequences without adding delay.  In this case, pre-encoded
       sequences that are not encoded for error-prone networks can still be
       transmitted almost reliably without adding extensive delays.  In
       addition, FUs of equal length result in a bit rate efficient use of
       RFC 2733.


       If the error probability depends on the length of the transmitted
       packet (e.g., in case of mobile transmission [14]), the benefits of
       applying FUs with FEC are even more obvious.  Basically, the
       flexibility of the size of FUs allows appropriate FEC to be applied
       for each NAL unit and unequal error protection of NAL units.


       When FUs and FEC are used, the incurred overhead is substantial but
       is in the same order of magnitude as the number of bits that have to
       be spent on intra-coded macroblocks if no FEC is applied.  In [19],
       it was shown that, at the same error rate and the same overall bit
       rate (including the overhead), the FEC-based approach improved the
       overall quality.


    12.6.  Low Bit-Rate Streaming


       This scheme has been implemented with H.263 and non-standard RTP
       packetization and has given good results [20].  There is no technical
       reason why similarly good results could not be achievable with H.264.


       In today's Internet streaming, some of the offered bit rates are
       relatively low in order to allow terminals with dial-up modems to
       access the content.  In wired IP networks, relatively large packets,
       say 500 - 1500 bytes, are preferred to smaller and more frequently
       occurring packets in order to reduce network congestion.  Moreover,
       use of large packets decreases the amount of RTP/UDP/IP header
       overhead.  For low bit-rate video, the use of large packets means
       that sometimes up to a few pictures should be encapsulated in one
       packet.


       However, loss of a packet including many coded pictures would have
       drastic consequences for visual quality, as there is practically no
       other way to conceal a loss of an entire picture than to repeat the
       previous one.  One way to construct relatively large packets and
       maintain possibilities for successful loss concealment is to
       construct MTAPs that contain interleaved slices from several
       pictures.  An MTAP should not contain spatially adjacent slices from
       the same picture or spatially overlapping slices from any picture.
       If a packet is lost, it is likely that a lost slice is surrounded by
       spatially adjacent slices of the same picture and spatially
       corresponding slices of the temporally previous and succeeding
       pictures.  Consequently, concealment of the lost slice is likely to
       be relatively successful.
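

       The two constraints on MTAP contents can be expressed as a check.
       The sketch assumes each picture is split into S slices that are
       horizontal bands indexed 0..S-1, so that slices s and s+1 of one
       picture are spatially adjacent and slices with equal index in
       different pictures overlap.  A diagonal assignment, placing slice
       (j + p) mod S of picture p into packet j, satisfies both constraints
       as long as there are at most S pictures.  Function names are
       hypothetical.

```python
from itertools import combinations


def valid_mtap(units: list[tuple[int, int]]) -> bool:
    """units: (picture index, slice band index) pairs placed in one MTAP."""
    for (pic_a, row_a), (pic_b, row_b) in combinations(units, 2):
        if row_a == row_b:
            return False  # spatially overlapping slices (or a duplicate)
        if pic_a == pic_b and abs(row_a - row_b) == 1:
            return False  # spatially adjacent slices of the same picture
    return True


def diagonal_mtaps(num_pictures: int, num_slices: int) -> list[list[tuple[int, int]]]:
    """Packet j carries slice (j + p) mod S of every picture p."""
    if num_pictures > num_slices:
        raise ValueError("needs at most one picture per slice position")
    return [[(p, (j + p) % num_slices) for p in range(num_pictures)]
            for j in range(num_slices)]
```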


    12.7.  Robust Packet Scheduling in Video Streaming


       Robust packet scheduling has been implemented with MPEG-4 Part 2 and
       simulated in a wireless streaming environment [21].  There is no
       technical reason why similar or better results could not be
       achievable with H.264.


       Streaming clients typically have a receiver buffer that is capable of
       storing a relatively large amount of data.  Initially, when a
       streaming session is established, a client does not start playing the
       stream back immediately.  Rather, it typically buffers the incoming
       data for a few seconds.  This buffering helps maintain continuous
       playback, as, in case of occasional increased transmission delays or
       network throughput drops, the client can decode and play buffered
       data.  Otherwise, without initial buffering, the client has to freeze
       the display, stop decoding, and wait for incoming data.  The
       buffering is also necessary for either automatic or selective
       retransmission at any protocol level.  If any part of a picture is
       lost, a retransmission mechanism may be used to resend the lost data.
       If the retransmitted data is received before its scheduled decoding
       or playback time, the loss is recovered perfectly.  Coded pictures
       can be ranked according to their importance in the subjective quality
       of the decoded sequence.  For example, non-reference pictures, such
       as conventional B pictures, are subjectively least important, as
       their absence does not affect decoding of any other pictures.  In
       addition to non-reference pictures, the ITU-T H.264 | ISO/IEC
       14496-10 standard includes a temporal scalability method called sub-
       sequences [22].  Subjective ranking can also be made on coded slice
       data partition or slice group basis.  Coded slices and coded slice
       data partitions that are subjectively the most important can be sent
       earlier than their decoding order indicates, whereas coded slices and
       coded slice data partitions that are subjectively the least important
       can be sent later than their natural coding order indicates.
       Consequently, any retransmitted parts of the most important slices
       and coded slice data partitions are more likely to be received before
       their scheduled decoding or playback time compared to the least
       important slices and slice data partitions.
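

       A toy version of this scheduling, using picture labels in the style
       of section 13.3: each picture in decoding order gets a hypothetical
       "advance" (how many slots ahead of its decoding position it is sent),
       and pictures are sent in order of decoding position minus advance,
       with ties going to the more advanced picture.  The importance classes
       and advance values are assumptions for illustration; sending less
       important data later would correspond to a negative advance.

```python
def send_order(decoding_order: list[str], advance: dict[str, int]) -> list[str]:
    """Reorder pictures for transmission.

    advance maps a picture-type letter (hypothetical classes such as "I")
    to the number of slots it is sent ahead of its decoding position.
    """
    def key(item: tuple[int, str]) -> tuple[int, int, int]:
        pos, name = item
        adv = advance.get(name[0], 0)
        # earlier target slot first; on ties, the more advanced picture first
        return (pos - adv, -adv, pos)

    return [name for _, name in sorted(enumerate(decoding_order), key=key)]
```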


    13.  Informative Appendix: Rationale for Decoding Order Number


    13.1.  Introduction


       The Decoding Order Number (DON) concept was introduced mainly to
       enable efficient multi-picture slice interleaving (see section 12.6)
       and robust packet scheduling (see section 12.7).  In both of these
       applications, NAL units are transmitted out of decoding order.  DON
       indicates the decoding order of NAL units and should be used in the
       receiver to recover the decoding order.  Example use cases for
       efficient multi-picture slice interleaving and for robust packet
       scheduling are given in sections 13.2 and 13.3, respectively.
       Section 13.4 describes the benefits of the DON concept in error
       resiliency achieved by redundant coded pictures.  Section 13.5
       summarizes the alternatives to DON that were considered and justifies
       why DON was chosen for this RTP payload specification.


    13.2.  Example of Multi-Picture Slice Interleaving


       An example of multi-picture slice interleaving follows.  A subset of
       a coded video sequence is depicted below in output order.  R denotes
       a reference picture, N denotes a non-reference picture, and the
       number indicates a relative output time.


          ... R1 N2 R3 N4 R5 ...


       The decoding order of these pictures from left to right is as
       follows:


          ... R1 R3 N2 R5 N4 ...


       The NAL units of pictures R1, R3, N2, R5, and N4 are marked with a
       DON equal to 1, 2, 3, 4, and 5, respectively.


       Each reference picture consists of three slice groups that are
       scattered as follows (a number denotes the slice group number for
       each macroblock in a QCIF frame):


          0 1 2 0 1 2 0 1 2 0 1
          2 0 1 2 0 1 2 0 1 2 0
          1 2 0 1 2 0 1 2 0 1 2
          0 1 2 0 1 2 0 1 2 0 1
          2 0 1 2 0 1 2 0 1 2 0
          1 2 0 1 2 0 1 2 0 1 2
          0 1 2 0 1 2 0 1 2 0 1
          2 0 1 2 0 1 2 0 1 2 0
          1 2 0 1 2 0 1 2 0 1 2
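

       The pattern above places the macroblock with raster-scan address a
       (11 macroblocks per row in QCIF) in slice group a mod 3.  This is an
       observation about the figure, not a statement of the H.264 dispersed
       slice group map formula.  Because 11 mod 3 is not 0, no macroblock
       shares a slice group with its horizontal or vertical neighbor.  A
       short sketch regenerates the map; the function name is hypothetical.

```python
QCIF_MB_COLS, QCIF_MB_ROWS = 11, 9  # 176x144 luma samples / 16
NUM_SLICE_GROUPS = 3


def slice_group_map(cols: int, rows: int, groups: int) -> list[list[int]]:
    """Slice group of each macroblock: address a goes to group a mod groups."""
    return [[(r * cols + c) % groups for c in range(cols)] for r in range(rows)]


for row in slice_group_map(QCIF_MB_COLS, QCIF_MB_ROWS, NUM_SLICE_GROUPS):
    print(" ".join(map(str, row)))
```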




       For the sake of simplicity, we assume that all the macroblocks of a
       slice group are included in one slice.  Three MTAPs are constructed
       from three consecutive reference pictures so that each MTAP contains
       three aggregation units, each of which contains all the macroblocks
       from one slice group.  The first MTAP contains slice group 0 of
       picture R1, slice group 1 of picture R3, and slice group 2 of
       picture R5.  The second MTAP contains slice group 1 of picture R1,
       slice group 2 of picture R3, and slice group 0 of picture R5.  The
       third MTAP contains slice group 2 of picture R1, slice group 0 of
       picture R3, and slice group 1 of picture R5.  Each non-reference
       picture is encapsulated into an STAP-B.
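

       The MTAP assignment described above is a rotation: MTAP m carries
       slice group (m + j) mod 3 of the j-th reference picture.  A sketch
       that also attaches the DON values of the example; the tuple layout
       and function name are hypothetical and only mirror the prose.

```python
from typing import Optional

REFERENCE = [("R1", 1), ("R3", 2), ("R5", 4)]  # (picture, DON) as in the example
NON_REFERENCE = [("N2", 3), ("N4", 5)]

Unit = tuple[str, Optional[int], int]  # (picture, slice group or None, DON)


def build_packets(groups: int = 3) -> list[tuple[str, list[Unit]]]:
    """Return (packet type, units) pairs in sending order."""
    packets = []
    for m in range(groups):
        units = [(pic, (m + j) % groups, don) for j, (pic, don) in enumerate(REFERENCE)]
        packets.append(("MTAP", units))
    for pic, don in NON_REFERENCE:
        packets.append(("STAP-B", [(pic, None, don)]))  # whole picture in one STAP-B
    return packets
```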


       Consequently, the transmission order of NAL units is the following:


          R1, slice group 0, DON 1, carried in MTAP,   RTP SN: N
          R3, slice group 1, DON 2, carried in MTAP,   RTP SN: N
          R5, slice group 2, DON 4, carried in MTAP,   RTP SN: N
          R1, slice group 1, DON 1, carried in MTAP,   RTP SN: N+1
          R3, slice group 2, DON 2, carried in MTAP,   RTP SN: N+1
          R5, slice group 0, DON 4, carried in MTAP,   RTP SN: N+1
          R1, slice group 2, DON 1, carried in MTAP,   RTP SN: N+2
          R3, slice group 0, DON 2, carried in MTAP,   RTP SN: N+2
          R5, slice group 1, DON 4, carried in MTAP,   RTP SN: N+2
          N2,                DON 3, carried in STAP-B, RTP SN: N+3
          N4,                DON 5, carried in STAP-B, RTP SN: N+4


       The receiver is able to organize the NAL units back in decoding order
       based on the value of DON associated with each NAL unit.
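

       A receiver-side sketch of this reordering: NAL units are collected
       as they arrive and released in ascending DON order.  It assumes that
       DON values do not wrap around within the buffered window (DON is a
       16-bit field, so a real receiver has to compare values modulo 2^16),
       and it relies on a stable sort so that NAL units sharing one DON keep
       their arrival order.  The function name is hypothetical.

```python
def decoding_order(received: list[tuple[str, int]]) -> list[str]:
    """received: (NAL unit label, DON) pairs in transmission order.

    sorted() is stable, so units with equal DON (for example, the slice
    groups of one picture) stay in the order in which they arrived.
    """
    return [label for label, _ in sorted(received, key=lambda unit: unit[1])]
```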


       If one of the MTAPs is lost, the spatially adjacent and temporally
       co-located macroblocks are received and can be used to conceal the
       loss efficiently.  If one of the STAPs is lost, the effect of the
       loss does not propagate temporally.


    13.3.  Example of Robust Packet Scheduling


       An example of robust packet scheduling follows.  The communication
       system used in the example consists of the following components in
       the order that the video is processed from source to sink:


          o camera and capturing
          o pre-encoding buffer
          o encoder
          o encoded picture buffer
          o transmitter
          o transmission channel
          o receiver
          o receiver buffer
          o decoder
          o decoded picture buffer
          o display


       The video communication system used in the example operates as
       follows.  Note that processing of the video stream happens gradually
       and at the same time in all components of the system.  The source
       video sequence is shot and captured to a pre-encoding buffer.  The
       pre-encoding buffer can be used to order pictures from sampling order
       to encoding order or to analyze multiple uncompressed frames for bit
       rate control purposes, for example.  In some cases, the pre-encoding
       buffer may not exist; instead, the sampled pictures are encoded right
       away.  The encoder encodes pictures from the pre-encoding buffer and
       stores the output, i.e., coded pictures, in the encoded picture
       buffer.  The transmitter encapsulates the coded pictures from the
       encoded picture buffer into transmission packets and sends them to a
       receiver through a transmission channel.  The receiver stores the
       received packets in the receiver buffer.  The receiver buffering
       process typically includes buffering for transmission delay jitter.
       The receiver buffer can also be used to recover correct decoding
       order of coded data.  The decoder reads coded data from the receiver
       buffer and produces decoded pictures as output into the decoded
       picture buffer.  The decoded picture buffer is used to recover the
       output (or display) order of pictures.  Finally, pictures are
       displayed.


       In the following example figures, I denotes an IDR picture, R denotes
       a reference picture, N denotes a non-reference picture, and the
       number after I, R, or N indicates the sampling time relative to the
       previous IDR picture in decoding order.  Values below the sequence of
       pictures indicate scaled system clock timestamps.  The system clock
       is initialized arbitrarily in this example, and time runs from left
       to right.  Each I, R, and N picture is mapped into the same timeline
       compared to the previous processing step, if any, assuming that

       encoding, transmission, and decoding take no time.  Thus, events
       happening at the same time are located in the same column throughout
       all example figures.


       A subset of a sequence of coded pictures is depicted below in
       sampling order.


           ...  N58 N59 I00 N01 N02 R03 N04 N05 R06 ... N58 N59 I00 N01 ...
           ... --|---|---|---|---|---|---|---|---|- ... -|---|---|---|- ...
           ...  58  59  60  61  62  63  64  65  66  ... 128 129 130 131 ...


          Figure 16.  Sequence of pictures in sampling order


       The sampled pictures are buffered in the pre-encoding buffer to
       arrange them in encoding order.  In this example, we assume that the
       non-reference pictures are predicted from both the previous and the
       next reference picture in output order, except for the non-reference
       pictures immediately preceding an IDR picture, which are predicted
       only from the previous reference picture in output order.  Thus, the
       pre-encoding buffer has to contain at least two pictures, and the
       buffering causes a delay of two picture intervals.  The output of the
       pre-encoding buffering process and the encoding (and decoding) order
       of the pictures are as follows:


                    ... N58 N59 I00 R03 N01 N02 R06 N04 N05 ...
                    ... -|---|---|---|---|---|---|---|---|- ...
                    ... 60  61  62  63  64  65  66  67  68  ...


          Figure 17.  Re-ordered pictures in the pre-encoding buffer


       The encoder or the transmitter can set the value of DON for each
       picture to a value of DON for the previous picture in decoding order
       plus one.
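

       Assigning DON this way is a running counter over decoding order.  The
       sketch assumes a 16-bit DON that wraps modulo 65536 and an arbitrary
       starting value; the function name is hypothetical.

```python
DON_MODULUS = 1 << 16  # DON is a 16-bit field


def assign_don(pictures_in_decoding_order: list[str],
               first_don: int = 0) -> list[tuple[str, int]]:
    """Give each picture the DON of its predecessor in decoding order plus one."""
    return [(pic, (first_don + i) % DON_MODULUS)
            for i, pic in enumerate(pictures_in_decoding_order)]
```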


       For the sake of simplicity, let us assume that:


       o  the frame rate of the sequence is constant,
       o  each picture consists of only one slice,
       o  each slice is encapsulated in a single NAL unit packet,
       o  there is no transmission delay, and
       o  pictures are transmitted at constant intervals (that is, 1 / frame
          rate).


       When pictures are transmitted in decoding order, they are received as
       follows:


                    ... N58 N59 I00 R03 N01 N02 R06 N04 N05 ...
                    ... -|---|---|---|---|---|---|---|---|- ...
                    ... 60  61  62  63  64  65  66  67  68  ...


          Figure 18.  Received pictures in decoding order


       The OPTIONAL sprop-interleaving-depth MIME type parameter is set to
       0, as the transmission (or reception) order is identical to the
       decoding order.


       The decoder has to buffer for one picture interval initially in its
       decoded picture buffer to organize pictures from decoding order to
       output order as depicted below:


                        ... N58 N59 I00 N01 N02 R03 N04 N05 R06 ...
                        ... -|---|---|---|---|---|---|---|---|- ...
                        ... 61  62  63  64  65  66  67  68  69  ...


          Figure 19.  Output order


       The amount of required initial buffering in the decoded picture
       buffer can be signaled in the buffering period SEI message or with
       the num_reorder_frames syntax element of H.264 video usability
       information.  num_reorder_frames indicates the maximum number of
       frames, complementary field pairs, or non-paired fields that precede
       any frame, complementary field pair, or non-paired field in the
       sequence in decoding order and that follow it in output order.  For
       the sake of simplicity, we assume that num_reorder_frames is used to
       indicate the initial buffer in the decoded picture buffer.  In this
       example, num_reorder_frames is equal to 1.
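

       The value 1 can be checked by counting, for each picture, how many
       pictures precede it in decoding order but follow it in output order.
       The sketch uses the sampling times of Figure 16 as output order (in
       this example, output order equals sampling order) and treats every
       picture as a frame; the function name is hypothetical.

```python
def max_precede_and_follow(first_order: list[str], rank: dict[str, int]) -> int:
    """Maximum, over all items, of the number of items that come earlier in
    first_order but have a larger rank (i.e., come later in the other order)."""
    worst = 0
    for i, item in enumerate(first_order):
        later = sum(1 for earlier in first_order[:i] if rank[earlier] > rank[item])
        worst = max(worst, later)
    return worst


decoding = "N58 N59 I00 R03 N01 N02 R06 N04 N05".split()
output_time = {"N58": 58, "N59": 59, "I00": 60, "N01": 61, "N02": 62,
               "R03": 63, "N04": 64, "N05": 65, "R06": 66}
print(max_precede_and_follow(decoding, output_time))  # 1
```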


       It can be observed that if the IDR picture I00 is lost during
       transmission and a retransmission request is issued when the value of
       the system clock is 62, there is one picture interval of time (until
       the system clock reaches timestamp 63) to receive the retransmitted
       IDR picture I00.


       Let us then assume that IDR pictures are transmitted two frame
       intervals earlier than their decoding position; i.e., the pictures
       are transmitted as follows:


                           ...  I00 N58 N59 R03 N01 N02 R06 N04 N05 ...
                           ... --|---|---|---|---|---|---|---|---|- ...
                           ...  62  63  64  65  66  67  68  69  70  ...


          Figure 20.  Interleaving: Early IDR pictures in sending order


       The OPTIONAL sprop-interleaving-depth MIME type parameter is set
       equal to 1 according to its definition.  (The value of sprop-
       interleaving-depth in this example can be derived as follows:
       Picture I00 is the only picture preceding picture N58 or N59 in
       transmission order and following it in decoding order.  Except for
       pictures I00, N58, and N59, the transmission order is the same as the
       decoding order of pictures.  As a coded picture is encapsulated into
       exactly one NAL unit, the value of sprop-interleaving-depth is equal
       to the maximum number of pictures preceding any picture in
       transmission order and following the picture in decoding order.)
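

       The same counting yields sprop-interleaving-depth, with transmission
       order in place of decoding order and DON as the rank.  The sketch
       assumes one NAL unit per picture, as in this example, and DON values
       assigned consecutively in decoding order; the function name is
       hypothetical.

```python
def interleaving_depth(transmission: list[str], don: dict[str, int]) -> int:
    """Max number of NAL units that precede a unit in transmission order
    and follow it in decoding order (i.e., carry a larger DON)."""
    depth = 0
    for i, unit in enumerate(transmission):
        depth = max(depth, sum(1 for prev in transmission[:i] if don[prev] > don[unit]))
    return depth


decoding = "N58 N59 I00 R03 N01 N02 R06 N04 N05".split()
don = {pic: n for n, pic in enumerate(decoding)}  # consecutive DON values
early_idr = "I00 N58 N59 R03 N01 N02 R06 N04 N05".split()
print(interleaving_depth(decoding, don), interleaving_depth(early_idr, don))  # 0 1
```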


       The receiver buffering process contains two pictures at a time
       according to the value of the sprop-interleaving-depth parameter and
       orders pictures from the reception order to the correct decoding
       order based on the value of DON associated with each picture.  The
       output of the receiver buffering process is as follows:


                                ... N58 N59 I00 R03 N01 N02 R06 N04 N05 ...
                                ... -|---|---|---|---|---|---|---|---|- ...
                                ... 63  64  65  66  67  68  69  70  71  ...


          Figure 21.  Interleaving: Receiver buffer


       Again, an initial buffering delay of one picture interval is needed
       to organize pictures from decoding order to output order, as depicted
       below:


                                    ... N58 N59 I00 N01 N02 R03 N04 N05 ...
                                    ... -|---|---|---|---|---|---|---|- ...
                                    ... 64  65  66  67  68  69  70  71  ...


          Figure 22.  Interleaving: Receiver buffer after reordering


       Note that the maximum delay that IDR pictures can undergo during
       transmission, including possible application, transport, or link
       layer retransmission, is equal to three picture intervals.  Thus, the
       loss resiliency of IDR pictures is improved in systems supporting
       retransmission compared to the case in which pictures were
       transmitted in their decoding order.


    13.4.  Robust Transmission Scheduling of Redundant Coded Slices


       A redundant coded picture is a coded representation of a picture or a
       part of a picture that is not used in the decoding process if the
       corresponding primary coded picture is correctly decoded.  There
       should be no noticeable difference between any area of the decoded
       primary picture and a corresponding area that would result from
       application of the H.264 decoding process for any redundant picture
       in the same access unit.  A redundant coded slice is a coded slice
       that is a part of a redundant coded picture.


       Redundant coded pictures can be used to provide unequal error
       protection in error-prone video transmission.  If a primary coded
       representation of a picture is decoded incorrectly, a corresponding
       redundant coded picture can be decoded.  Examples of applications and
       coding techniques using the redundant coded picture feature include
       the video redundancy coding [23] and the protection of "key pictures"
       in multicast streaming [24].


       One property of many error-prone video communications systems is that
       transmission errors are often bursty.  Therefore, they may affect
       more than one consecutive transmission packet in transmission order.
       In low bit-rate video communication, it is relatively common that an
       entire coded picture can be encapsulated into one transmission
       packet.  Consequently, a primary coded picture and the corresponding
       redundant coded pictures may be transmitted in consecutive packets in
       transmission order.  To make the transmission scheme more tolerant of
       bursty transmission errors, it is beneficial to transmit the primary
       coded picture and redundant coded picture separated by more than a
       single packet.  The DON concept enables this.
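
       The benefit of separating the two copies can be illustrated with the
       following Python sketch.  It assumes, as in the paragraph above, that
       each primary or redundant coded picture fits in one packet.  The
       function names, the delay parameter, and the DON assignment (2*i for
       primary picture i, 2*i + 1 for its redundant picture) are
       illustrative assumptions, and the 16-bit wrap-around of DON is
       ignored.  Sending NAL units out of decoding order in this way
       requires the interleaved packetization mode.

```python
from typing import List, Set, Tuple

# A packet is modelled as (don, kind, picture); kind is "P" (primary) or "R" (redundant).
Packet = Tuple[int, str, int]


def schedule(num_pictures: int, delay: int) -> List[Packet]:
    """Transmission order in which the redundant copy of picture i is sent
    right after the primary copy of picture i + delay (hypothetical scheme)."""
    order: List[Packet] = []
    for i in range(num_pictures + delay):
        if i < num_pictures:
            order.append((2 * i, "P", i))      # primary coded picture i
        j = i - delay
        if 0 <= j < num_pictures:
            order.append((2 * j + 1, "R", j))  # redundant coded picture j
    return order


def surviving_pictures(order: List[Packet], start: int, length: int) -> Set[int]:
    """Pictures that keep at least one copy after a burst of `length`
    consecutive packets, beginning at position `start`, is lost."""
    lost = range(start, start + length)
    return {pic for pos, (_, _, pic) in enumerate(order) if pos not in lost}


order = schedule(num_pictures=4, delay=2)
# Any burst of up to three lost packets leaves a copy of every picture.
print(all(surviving_pictures(order, s, 3) == {0, 1, 2, 3} for s in range(len(order))))
# Without the delay, a burst of two packets wipes out picture 0 entirely.
print(surviving_pictures(schedule(4, 0), 0, 2))  # {1, 2, 3}
# The receiver restores decoding order simply by sorting on DON.
print([f"{kind}{pic}" for _, kind, pic in sorted(order)])
```

       In this example, with delay=2 the two copies of every picture are at
       least three packet positions apart, so a burst must cover four
       consecutive packets to remove both.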


    13.5.  Remarks on Other Design Possibilities


       The slice header syntax structure of the H.264 coding standard
       contains the frame_num syntax element that can indicate the decoding
       order of coded frames.  However, using the frame_num syntax element
       to recover the decoding order is neither feasible nor desirable, for
       the following reasons:


       o  The receiver would be required to parse at least one slice header
          per coded picture (before passing the coded data to the decoder).


       o  Coded slices from multiple coded video sequences cannot be
          interleaved, as the frame_num syntax element is reset to 0 in
          each IDR picture.


       o  The coded fields of a complementary field pair share the same
          value of the frame_num syntax element.  Thus, the decoding order
          of the coded fields of a complementary field pair cannot be
          recovered based on the frame_num syntax element or any other
          syntax element of the H.264 coding syntax.
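
       The last two reasons can be made concrete with a small Python example
       using hypothetical data: two coded video sequences, A and B, each
       starting with an IDR picture, where A also contains a complementary
       field pair.  The CodedPicture type and the DON values are invented
       for illustration; in this example the DON values are distinct and
       increase in decoding order.

```python
from collections import Counter
from typing import List, NamedTuple


class CodedPicture(NamedTuple):
    label: str      # illustrative name: coded video sequence / picture
    frame_num: int  # value from the slice header
    don: int        # decoding order number from the RTP payload


# Hypothetical data: sequences A and B each start with an IDR picture (frame_num
# reset to 0); A contains a complementary field pair sharing frame_num 1.
# The list is in an interleaved transmission order.
received: List[CodedPicture] = [
    CodedPicture("B/IDR", 0, 3),
    CodedPicture("A/IDR", 0, 0),
    CodedPicture("A/bottom field", 1, 2),
    CodedPicture("B/P", 1, 4),
    CodedPicture("A/top field", 1, 1),
]

# frame_num cannot order these pictures: several of them share the same value.
clashes = sorted(v for v, n in Counter(p.frame_num for p in received).items() if n > 1)
print(clashes)  # [0, 1]

# Sorting on DON recovers the decoding order unambiguously.
print([p.label for p in sorted(received, key=lambda p: p.don)])
# ['A/IDR', 'A/top field', 'A/bottom field', 'B/IDR', 'B/P']
```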


       The RTP payload format for transport of MPEG-4 elementary streams
       [25] enables interleaving of access units and transmission of
       multiple access units in the same RTP packet.  However, according to
       subclause 7.4.1.2 of [1], an H.264 access unit comprises all NAL
       units associated with a primary coded picture.  Consequently, with
       that payload format, slices of different pictures could not be
       interleaved, and the multi-picture slice interleaving technique (see
       section 12.6) for improved error resilience could not be used.
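
       For contrast, the following Python sketch shows slice interleaving
       across pictures, which requires the decoding order of individual
       slices, rather than of whole access units, to be recoverable.  It
       assumes that each slice is carried in its own packet and that DON
       values are assigned consecutively in decoding order; the function
       name and the round-robin rule are illustrative choices, not the
       procedure of section 12.6.  Consecutive packets then carry slices of
       different pictures, so a short burst loss damages several pictures
       slightly instead of removing one picture entirely.

```python
from collections import Counter
from typing import List, Tuple

# A packet is modelled as (don, picture, slice_index).
Packet = Tuple[int, int, int]


def interleave_slices(slices_per_picture: List[int]) -> List[Packet]:
    """Send slice k of every picture before slice k + 1 of any picture (round-robin).

    DON values are assigned in decoding order: all slices of picture 0, then all
    slices of picture 1, and so on.
    """
    first_don = [sum(slices_per_picture[:p]) for p in range(len(slices_per_picture))]
    order: List[Packet] = []
    for k in range(max(slices_per_picture)):
        for pic, count in enumerate(slices_per_picture):
            if k < count:
                order.append((first_don[pic] + k, pic, k))
    return order


order = interleave_slices([3, 3, 3])
# Transmission order: slice 0 of pictures 0, 1, 2, then slice 1 of each, and so on.
print([(pic, k) for _, pic, k in order])

# A burst of up to three consecutive lost packets hits each picture at most once.
worst = max(
    max(Counter(pic for _, pic, _ in order[s:s + 3]).values())
    for s in range(len(order) - 2)
)
print(worst)  # 1

# Sorting on DON restores the decoding order for the decoder.
print([(pic, k) for _, pic, k in sorted(order)])
```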


    14.  Acknowledgements


       The authors thank Roni Even, Dave Lindbergh, Philippe Gentric,
       Gonzalo Camarillo, Gary Sullivan, Joerg Ott, and Colin Perkins for
       careful review.


    15.  References


    15.1.  Normative References


       [1]  ITU-T Recommendation H.264, "Advanced video coding for generic
            audiovisual services", May 2003.


       [2]  ISO/IEC International Standard 14496-10:2003.


       [3]  Bradner, S., "Key words for use in RFCs to Indicate Requirement
            Levels", BCP 14, RFC 2119, March 1997.


       [4]  Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson,
            "RTP: A Transport Protocol for Real-Time Applications", STD 64,
            RFC 3550, July 2003.


       [5]  Handley, M. and V. Jacobson, "SDP: Session Description
            Protocol", RFC 2327, April 1998.


       [6]  Josefsson, S., "The Base16, Base32, and Base64 Data Encodings",
            RFC 3548, July 2003.


       [7]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with
            Session Description Protocol (SDP)", RFC 3264, June 2002.


    15.2.  Informative References


       [8]  "Draft ITU-T Recommendation and Final Draft International
            Standard of Joint Video Specification (ITU-T Rec. H.264 |
            ISO/IEC 14496-10 AVC)", available from
            http://ftp3.itu.int/av-arch/jvt-site/2003_03_Pattaya/JVT-G050r1.zip,
            May 2003.


       [9]  Luthra, A., Sullivan, G.J., and T. Wiegand (eds.), "Special
            Issue on H.264/AVC", IEEE Transactions on Circuits and Systems
            for Video Technology, July 2003.


       [10] Bormann, C., Cline, L., Deisher, G., Gardos, T., Maciocco, C.,
            Newell, D., Ott, J., Sullivan, G., Wenger, S., and C. Zhu, "RTP
            Payload Format for the 1998 Version of ITU-T Rec. H.263 Video
            (H.263+)", RFC 2429, October 1998.


       [11] ISO/IEC IS 14496-2.


       [12] Wenger, S., "H.26L over IP", IEEE Transactions on Circuits and
            Systems for Video Technology, Vol. 13, No. 7, July 2003.


       [13] Wenger, S., "H.26L over IP: The IP Network Adaptation Layer",
            Proceedings Packet Video Workshop 02, April 2002.


       [14] Stockhammer, T., Hannuksela, M.M., and S. Wenger, "H.26L/JVT
            Coding Network Abstraction Layer and IP-based Transport", in
            Proc. ICIP 2002, Rochester, NY, September 2002.


       [15] ITU-T Recommendation H.241, "Extended video procedures and
            control signals for H.300 series terminals", 2004.


       [16] Schulzrinne, H. and S. Casner, "RTP Profile for Audio and Video
            Conferences with Minimal Control", STD 65, RFC 3551, July 2003.


       [17] ITU-T Recommendation H.223, "Multiplexing protocol for low bit
            rate multimedia communication", July 2001.


       [18] Rosenberg, J. and H. Schulzrinne, "An RTP Payload Format for
            Generic Forward Error Correction", RFC 2733, December 1999.


       [19] Stockhammer, T., Wiegand, T., Oelbaum, T., and F. Obermeier,
            "Video Coding and Transport Layer Techniques for H.264/AVC-Based
            Transmission over Packet-Lossy Networks", IEEE International
            Conference on Image Processing (ICIP 2003), Barcelona, Spain,
            September 2003.


       [20] Varsa, V. and M. Karczewicz, "Slice interleaving in compressed
            video packetization", Packet Video Workshop 2000.


       [21] Kang, S.H. and A. Zakhor, "Packet scheduling algorithm for
            wireless video streaming", International Packet Video Workshop
            2002.


       [22] Hannuksela, M.M., "Enhanced concept of GOP", JVT-B042, available
            from http://ftp3.itu.int/av-arch/video-site/0201_Gen/JVT-B042.doc,
            January 2002.


       [23] Wenger, S., "Video Redundancy Coding in H.263+", 1997
            International Workshop on Audio-Visual Services over Packet
            Networks, September 1997.


       [24] Wang, Y.-K., Hannuksela, M.M., and M. Gabbouj, "Error Resilient
            Video Coding Using Unequally Protected Key Pictures", in Proc.
            International Workshop VLBV03, September 2003.


       [25] van der Meer, J., Mackie, D., Swaminathan, V., Singer, D., and
            P. Gentric, "RTP Payload Format for Transport of MPEG-4
            Elementary Streams", RFC 3640, November 2003.


       [26] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
            Norrman, "The Secure Real-time Transport Protocol (SRTP)", RFC
            3711, March 2004.


       [27] Schulzrinne, H., Rao, A., and R. Lanphier, "Real Time Streaming
            Protocol (RTSP)", RFC 2326, April 1998.


       [28] Handley, M., Perkins, C., and E. Whelan, "Session Announcement
            Protocol", RFC 2974, October 2000.


       [29] ISO/IEC 14496-15: "Information technology - Coding of audio-
            visual objects - Part 15: Advanced Video Coding (AVC) file
            format".


       [30] Castagno, R. and D. Singer, "MIME Type Registrations for 3rd
            Generation Partnership Project (3GPP) Multimedia files", RFC
            3839, July 2004.


    Authors' Addresses


       Stephan Wenger
       TU Berlin / Teles AG
       Franklinstr. 28-29
       D-10587 Berlin
       Germany


       Phone: +49-172-300-0813
       EMail: stewe@stewe.org




       Miska M. Hannuksela
       Nokia Corporation
       P.O. Box 100
       33721 Tampere
       Finland


       Phone: +358-7180-73151
       EMail: miska.hannuksela@nokia.com




       Thomas Stockhammer
       Nomor Research
       D-83346 Bergen
       Germany


       Phone: +49-8662-419407
       EMail: stockhammer@nomor.de




       Magnus Westerlund
       Multimedia Technologies
       Ericsson Research EAB/TVA/A
       Ericsson AB
       Torshamsgatan 23
       SE-164 80 Stockholm
       Sweden


       Phone: +46-8-7190000
       EMail: magnus.westerlund@ericsson.com




       David Singer
       QuickTime Engineering
       Apple
       1 Infinite Loop MS 302-3MT
       Cupertino
       CA 95014
       USA


       Phone: +1 408 974-3162
       EMail: singer@apple.com


    Full Copyright Statement


       Copyright (C) The Internet Society (2005).


       This document is subject to the rights, licenses and restrictions
       contained in BCP 78, and except as set forth therein, the authors
       retain all their rights.


       This document and the information contained herein are provided on an
       "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
       OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
       ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
       INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
       INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
       WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.


    Intellectual Property


       The IETF takes no position regarding the validity or scope of any
       Intellectual Property Rights or other rights that might be claimed to
       pertain to the implementation or use of the technology described in
       this document or the extent to which any license under such rights
       might or might not be available; nor does it represent that it has
       made any independent effort to identify any such rights.  Information
       on the IETF's procedures with respect to rights in IETF Documents can
       be found in BCP 78 and BCP 79.


       Copies of IPR disclosures made to the IETF Secretariat and any
       assurances of licenses to be made available, or the result of an
       attempt made to obtain a general license or permission for the use of
       such proprietary rights by implementers or users of this
       specification can be obtained from the IETF on-line IPR repository at
       http://www.ietf.org/ipr.


       The IETF invites any interested party to bring to its attention any
       copyrights, patents or patent applications, or other proprietary
       rights that may cover technology that may be required to implement
       this standard.  Please address the information to the IETF at
       ietf-ipr@ietf.org.




    Acknowledgement


       Funding for the RFC Editor function is currently provided by the
       Internet Society.