Internet Multimedia Final Exam 06.09.2002

    1a. What are the advantages and disadvantages of using RTP-over-TCP to carry (a) video-on-demand, (b) Internet phone calls, compared to RTP-over-UDP? (Name at least two advantages and disadvantages each.)
    video-on-demand
    RTP-over-TCP: +: bandwidth-adaptive; delays and packet header size (since the payload is bigger) don't matter much; reliable; works through firewalls. -: rate averaging may cause receiver starvation.
    RTP-over-UDP: +: avoids receiver starvation. -: no congestion control.
    Internet phone calls
    RTP-over-TCP: -: longer delays due to retransmission. +: easier to get through firewalls; compensates for packet losses.
    RTP-over-UDP: +: lower packet-header overhead; no delays imposed by retransmission or flow/congestion control. -: needs application-layer reliability.
    1b. Why is DCT computed on small blocks (of 8x8 or 16x16) instead of on the whole image? (Assume the image size is 512x512.)

    DCT is computed on small blocks for efficiency reasons. DCT on an NxN block (without optimization) takes O(N^4) time. If we compute on smaller blocks of size 8x8, then there are (N/8)^2 such blocks and it takes (N/8)^2 * 8^4 = 64 N^2 operations, which is much smaller than N^4 (e.g., for N=512).
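
The operation counts above can be checked with a short calculation. This is only a sketch of the arithmetic, not a DCT implementation; the function names are illustrative, and the counts assume the naive (unoptimized) transform described in the answer.

```python
def dct_cost_full(n: int) -> int:
    """Naive 2-D DCT on an n x n image: n^2 outputs, each summing n^2 terms."""
    return n ** 4


def dct_cost_blocked(n: int, b: int = 8) -> int:
    """Same naive transform applied independently to each b x b block."""
    blocks = (n // b) ** 2      # number of blocks in the image
    return blocks * b ** 4      # each block costs b^4


n = 512
full = dct_cost_full(n)
blocked = dct_cost_blocked(n, 8)
speedup = full // blocked
print(full, blocked, speedup)
```

For N=512 and 8x8 blocks, the blocked version needs 4096 times fewer operations.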

    One disadvantage is that large blocks of the same luminance or chrominance value cannot be treated as one (DC) value. However, such blocks are rare in natural images.

    1c. How is RTSP similar to HTTP? How is RTSP different from HTTP? Can HTTP be used to request a stream? Which is in-band vs. out-of-band? How much state is maintained between requests in each protocol?
    RTSP is similar to HTTP in its message format, i.e., the use of a request line, headers and a message body. It differs from HTTP in:
    1d. Imagine compressing two audio files using gzip, a PCM (mu-law) audio file and a G.729-encoded audio file. How would you expect the compression ratios to compare? How well does this compression work? Justify your answer.

    In general, gzip does not work well for audio files. Gzip is based on dictionary coding, basically finding patterns and using fewer bits to encode repeated patterns. Usually, audio files do not have repeated patterns, so gzip won't work well.
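
The dependence on repeated patterns can be illustrated with Python's zlib module, which uses the same DEFLATE algorithm as gzip. This is a rough sketch: the pseudo-random bytes merely stand in for data without repeated patterns and are not real audio samples, and the helper name is made up for this example.

```python
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Original size divided by DEFLATE-compressed size."""
    return len(data) / len(zlib.compress(data, 9))


rng = random.Random(0)  # fixed seed so the run is repeatable

# 64 KiB of a short repeated pattern vs. 64 KiB without repeated patterns
# (synthetic stand-in data, not real audio).
repetitive = b"abcd" * 16384
patternless = bytes(rng.getrandbits(8) for _ in range(65536))

ratio_repetitive = compression_ratio(repetitive)
ratio_patternless = compression_ratio(patternless)
print(ratio_repetitive, ratio_patternless)
```

The repetitive input shrinks by a large factor, while the patternless input stays at roughly its original size.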

    Since G.729 is already compressed more than PCM (mu-law), one argument could be that applying gzip on top yields a higher compression ratio for the PCM file, because it is less compressed to begin with.

    Since G.729 uses a codebook (CELP), it just stores the codes that generate approximately the same audio instead of encoding the audio itself. So depending on the type of audio file, it may have repeated code sequences, which can result in better compression than for PCM.

    1e. If you were to listen to the error signal transmitted for an ADPCM codec, what would you hear? (E.g., a reduced-volume version of the input signal?)
    Roughly white noise, if the predictor is good.
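
A minimal sketch of why the error signal sounds noise-like, under simplifying assumptions: the input is a synthetic first-order autoregressive signal, the predictor is fixed and matches it exactly, and there is no quantization or adaptation (unlike a real ADPCM codec). The coefficient A and all names are chosen for illustration only.

```python
import random


def lag1_autocorr(x):
    """Normalized autocorrelation at lag 1 (close to 0 for white noise)."""
    mean = sum(x) / len(x)
    d = [v - mean for v in x]
    num = sum(a * b for a, b in zip(d, d[1:]))
    den = sum(v * v for v in d)
    return num / den


rng = random.Random(1)  # fixed seed so the run is repeatable
A = 0.9                 # assumed correlation between neighboring samples

# Synthetic, strongly correlated input: x[n] = A * x[n-1] + noise.
signal = [0.0]
for _ in range(20000):
    signal.append(A * signal[-1] + rng.gauss(0.0, 1.0))

# Prediction error: the sample minus its prediction from the previous sample.
error = [cur - A * prev for prev, cur in zip(signal, signal[1:])]

rho_signal = lag1_autocorr(signal)
rho_error = lag1_autocorr(error)
print(rho_signal, rho_error)
```

The input is strongly correlated from sample to sample, while the prediction error is nearly uncorrelated, which is what makes it sound like noise rather than a quieter copy of the input.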
    2. Consider the packet arrival sequence below, consisting of two talkspurts. These are PCMU (mu-law) RTP packets, with transmission starting at time 10.00.
    RTP sequence number   RTP timestamp   network delay (seconds)
    1                     160             0.4
    2                     320             0.3
    3                     480             0.35
    4                     800             0.32
    5                     960             0.27
    6                     1120            0.45

    Show a timeline as to when the packets are played out at the receiver. Your "algorithm" should minimize the end-to-end delay (a) without losing any packets, (b) while being allowed to drop one packet. Your algorithm can be non-causal, i.e., may look ahead in time. Compute the delay for the two cases.

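    The sampling and RTP clock rate is 8000 Hz, so every 160 RTP clock units correspond to 0.02 s. A packet arrives at its send time plus its network delay; e.g., packet 4 is sent at (800-160)/8000 = 0.08 s and arrives at 0.08 + 0.32 = 0.40 s. Thus, we have, assuming that the first packet is transmitted at time zero: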
    RTP seq. no.   RTP timestamp   network delay (s)   arrival time   playout time, no loss   playout time, one loss
    1              160             0.40                0.40           0.40                    0.40
    2              320             0.30                0.32           0.42                    0.42
    3              480             0.35                0.39           0.44                    0.44
    4              800             0.32                0.40           0.53                    0.40
    5              960             0.27                0.37           0.55                    0.42
    6              1120            0.45                0.57           0.57                    drop
    The packets with sequence numbers 4, 5 and 6 are part of the second talkspurt, as can be seen by the jump in timestamp. If no loss is allowed, the average delay is (3*0.40 + 3*0.45)/6 = 0.425 s. If we drop packet 6, the second talkspurt can be played out with a delay of 0.32 s (packet 4 is sent at 0.08 s and played at 0.40 s), which decreases the average delay to (3*0.40 + 2*0.32)/5 = 0.368 s.
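
The playout times in the table can be reproduced with a small non-causal scheduler. This is a sketch under the same assumptions as the answer (first packet sent at time zero, one fixed playout delay per talkspurt equal to the largest network delay among the packets kept); the function and variable names are illustrative.

```python
CLOCK_HZ = 8000   # PCMU RTP clock rate
FRAME_TS = 160    # timestamp units per packet (0.02 s)

# (sequence number, RTP timestamp, network delay in seconds)
packets = [
    (1, 160, 0.40), (2, 320, 0.30), (3, 480, 0.35),
    (4, 800, 0.32), (5, 960, 0.27), (6, 1120, 0.45),
]


def split_talkspurts(pkts):
    """Start a new talkspurt whenever the timestamp jumps by more than one frame."""
    spurts = [[pkts[0]]]
    for prev, cur in zip(pkts, pkts[1:]):
        if cur[1] - prev[1] > FRAME_TS:
            spurts.append([])
        spurts[-1].append(cur)
    return spurts


def schedule(pkts, dropped=()):
    """Return {seq: playout time}, using one look-ahead delay per talkspurt."""
    first_ts = pkts[0][1]
    playout = {}
    for spurt in split_talkspurts(pkts):
        kept = [p for p in spurt if p[0] not in dropped]
        delay = max(d for _, _, d in kept)        # smallest delay with no late packet
        for seq, ts, _ in kept:
            send_time = (ts - first_ts) / CLOCK_HZ
            playout[seq] = round(send_time + delay, 2)
    return playout


no_loss = schedule(packets)
one_loss = schedule(packets, dropped={6})
print(no_loss)
print(one_loss)
```

Dropping packet 6 removes the 0.45 s outlier, so the second talkspurt is scheduled by the 0.32 s delay of packet 4 instead.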
    3. The audio and video stream shown below need to be lip-synched. The figure shows the timing in the sender report. For simplicity, assume that the network has no jitter, so that the packets arrive, somewhat delayed, at the receiver. The delay for both audio and video packets is the same. (Is this a realistic assumption?) What audio packet should be played out when the video frame with timestamp 9000 is being displayed? Justify your answer!

    For the video stream, RTP ts 1800 corresponds to 17.18 s real-time, so that RTP ts 9000 corresponds to 17.26 s real-time. (Recall that RTP video stream clocks tick at 90,000 Hz.) For the audio stream, timestamp 560 corresponds to 17.23 s. 17.26 s is thus 30 ms away, or 240 timestamp units. Thus, the audio packet with timestamp 800 needs to be played out together with the video frame with timestamp 9000.
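
The timestamp mapping above can be written as two small conversions. This is a sketch using the reference points quoted in the answer (RTP timestamp and wallclock time taken from each stream's sender report); the function names are illustrative, and the 8000 Hz audio clock is the one implied by 30 ms corresponding to 240 timestamp units.

```python
def rtp_to_wallclock(ts, ref_ts, ref_time, clock_hz):
    """Map an RTP timestamp to wallclock seconds using a sender-report reference."""
    return ref_time + (ts - ref_ts) / clock_hz


def wallclock_to_rtp(t, ref_ts, ref_time, clock_hz):
    """Inverse mapping: wallclock seconds to the nearest RTP timestamp."""
    return ref_ts + round((t - ref_time) * clock_hz)


# Video: timestamp 1800 <-> 17.18 s, 90 kHz clock.
video_time = rtp_to_wallclock(9000, ref_ts=1800, ref_time=17.18, clock_hz=90000)

# Audio: timestamp 560 <-> 17.23 s, 8 kHz clock.
audio_ts = wallclock_to_rtp(video_time, ref_ts=560, ref_time=17.23, clock_hz=8000)

print(video_time, audio_ts)
```

The result is audio timestamp 800, matching the answer above.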

    It is not a realistic assumption that audio and video frames experience the same delay. For example, video frames are usually substantially longer than audio frames, so that the transmission time is longer. (Video frames typically occupy a whole packet of 1500 bytes, while low-rate audio frames are typically less than 200 bytes long.)

    4. Essay. RTSP and SIP are somewhat similar. Describe the similarities and highlight the important differences between the two protocols.


    Similarities:
    • Both SIP and RTSP are used to initiate multimedia sessions where the actual user data is carried "out-of-band", usually using RTP.
    • Both use SDP to describe sessions.
    • They have similar syntax, derived from HTTP or email.
    • RTSP and SIP both support a form of aggregate control, i.e., the ability to control multiple streams from different locations with one control session.
    • Both support redirection.
    • Both use DNS SRV records to resolve URIs. (The current SIP resolution mechanism can also use NAPTR records for additional functionality.)


    Differences:
    • RTSP is used for streaming media, e.g., video-on-demand, while SIP supports mainly interactive media sessions. This means, for example, that RTSP can issue commands to pause and fast forward in a media stream and has the notion of a time axis.
    • RTSP can carry data in-band, although this is rarely used.
    • RTSP URIs are closer to HTTP URIs, identifying a server and file, while SIP URIs identify users.
    • SIP supports proxies for routing requests; that's not really supported in RTSP.