Internet Multimedia Final Exam 20.12.2002

    1a. Compare the properties of the international telephone numbering system on one hand and IP addresses and DNS host names on the other.
    property | E.164 (international telephone numbers) | IP | DNS
    format | textual, numbers, ≤ 15 digits | 32-bit binary | dot-separated text
    assignment | geographic, function (800) | network topology/provider, function (multicast) | organization
    structure | fixed (NANP: country, area code, exchange) | variable-length prefix | country/type, organization, sub-organization
    capacity | > 10^15 | 10^9 | > 10^28 (with 16 letters)
    mapping | 800 to landline, directory service | ARP: IP to MAC, RARP: MAC to IP | DNS: DNS to/from IP
    density | very high, with area-code splits | high, but permanent | very low
    sharing of names | several home phones = one number; hunt groups | each interface = one IP | several hosts = one name; one host = several names
    local names | yes, several levels | no, but 'net 10' + NAT | yes, within domain
    1b. Give an example of an RSVP killer reservation, using a drawing. What feature of RSVP causes this problem?
    See class notes and RFC 2205, section 2.5: "The first killer reservation problem (KR-I) arises when there is already a reservation Q0 in place. If another receiver now makes a larger reservation Q1 > Q0, the result of merging Q0 and Q1 may be rejected by admission control in some upstream node. This must not deny service to Q0."

    The problem is caused by the merging of different reservations coming from receivers for the same flow and appears in any receiver-oriented reservation protocol with diverse receiver requirements. It is not caused by having small and large reservations for different flows - large flows blocking smaller ones is a generic problem of any reservation protocol.

    1c. What components are necessary and sufficient to create a network with delay bounds? What does the delay bound depend on? Explain the components briefly.

    For a packet network, delay bounds require policing (a token bucket to limit burstiness), scheduling (a fair scheduler, e.g., WFQ) and admission control (so that the sources do not overuse the resources). As an example, WFQ with a token-bucket-based policer can create a network with delay bounds.
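
The policing component can be sketched as follows. This is a minimal illustration, assuming a bucket that starts full and a caller that supplies timestamps; the class and parameter names are made up for the example and do not come from the class notes.

```python
class TokenBucket:
    """Hypothetical token-bucket policer for a single flow."""

    def __init__(self, rate_bytes_per_s: float, depth_bytes: float):
        self.rate = rate_bytes_per_s    # long-term rate r
        self.depth = depth_bytes        # maximum burst size b
        self.tokens = depth_bytes       # assumption: the bucket starts full
        self.last = 0.0                 # time of the previous check, in seconds

    def conforms(self, now: float, packet_bytes: int) -> bool:
        """Return True and consume tokens if the packet fits the profile."""
        elapsed = now - self.last
        # Refill at the token rate, but never beyond the bucket depth.
        self.tokens = min(self.depth, self.tokens + elapsed * self.rate)
        self.last = now
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return True
        return False                    # non-conforming: drop, mark or delay


if __name__ == "__main__":
    tb = TokenBucket(rate_bytes_per_s=100, depth_bytes=100)
    print(tb.conforms(0.0, 100), tb.conforms(0.0, 1), tb.conforms(0.5, 50))
```

A policer would drop or mark a non-conforming packet, while a shaper would hold it until enough tokens have accumulated.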

    The total delay depends on (1) delay due to burstiness; (2) delay due to packets in the same flow; (3) delay due to other competing packets at the routers; (4) transmission delay. Transmission delay depends on the physical medium. Burstiness is defined by the token bucket used. Delay due to packets in the same flow depends on the maximum packet size in the flow. Delay due to other competing packets at the intermediate routers depends on the maximum packet length encountered, number of hops and total bandwidth at each hop.

    A token bucket (or a leaky bucket) works on a single flow. If the bucket allows a burst of 100 bytes, it may cause a worst-case delay equal to the time needed to send 100 bytes at the flow's reserved rate. WFQ works on multiple input flows to generate the output for a given link. If a very large packet of the same flow is ahead of this packet, this packet is delayed. Similarly, if a very large packet is being served from some other queue, this packet (at the head of its queue) is delayed. These three components define the delay bound equation as discussed in class.

    In general, the maximum allowed packet size determines the delay. This is one of the motivations for fixed-size ATM cells.

    A non-work-conserving scheduler is not needed for delay bounds, but is needed for jitter bounds.
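
The three delay components can be combined into a single bound. The sketch below uses the commonly cited Parekh-Gallager form for a token-bucket-constrained flow crossing WFQ routers; it may not be the exact form given in class, it ignores propagation delay, and all parameter names are chosen for this example.

```python
def wfq_delay_bound(bucket_depth_bits, reserved_rate_bps,
                    max_flow_packet_bits, max_network_packet_bits,
                    link_rates_bps):
    """Worst-case queueing delay in seconds (assumed Parekh-Gallager form).

    bucket_depth_bits        burst size b allowed by the token bucket
    reserved_rate_bps        rate g guaranteed to the flow at every hop
    max_flow_packet_bits     largest packet of this flow
    max_network_packet_bits  largest packet of any flow
    link_rates_bps           one link rate per hop along the path
    """
    hops = len(link_rates_bps)
    # (1) draining the flow's own burst at the reserved rate
    burst_delay = bucket_depth_bits / reserved_rate_bps
    # (2) waiting behind a maximum-size packet of the same flow at later hops
    own_flow_delay = (hops - 1) * max_flow_packet_bits / reserved_rate_bps
    # (3) waiting for a maximum-size packet of another flow at each hop
    other_flow_delay = sum(max_network_packet_bits / r for r in link_rates_bps)
    return burst_delay + own_flow_delay + other_flow_delay


if __name__ == "__main__":
    # Illustrative numbers only: 100-byte burst, 64 kb/s reservation,
    # 100-byte flow packets, 1500-byte network maximum, three 10 Mb/s hops.
    bound = wfq_delay_bound(800, 64_000, 800, 12_000, [10e6, 10e6, 10e6])
    print(f"delay bound: {bound * 1000:.1f} ms")
```

The bound shrinks as the maximum packet size shrinks, which is the point made above about small fixed-size cells.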

    1d. Sketch the SIP INVITE request, with SDP, when you call up presidentti@tpk.fi from your office.
    INVITE sip:presidentti@tpk.fi SIP/2.0
    To:      Tarja Halonen <sip:presidentti@tpk.fi>
    From:    Henning Schulzrinne <sip:hgs@cs.columbia.edu>;tag=1234
    Call-ID: 1234@bart.cs.columbia.edu
    CSeq:    1 INVITE
    Via:     SIP/2.0/UDP bart.cs.columbia.edu
    Content-Type: application/sdp
    Content-Length: 50
    
    v=0  
    c=IN IP4 128.59.19.191
    m=audio 3456 RTP/AVP 0
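
A request like the one sketched above can be assembled programmatically so that Content-Length always matches the body. The helper below is a hypothetical illustration, not a complete user agent: it uses CRLF line endings, computes the length from the encoded body, and omits header fields and SDP lines (for example Max-Forwards, Contact, a Via branch parameter, and the o=, s= and t= lines) that a real implementation would add.

```python
def build_invite(callee_uri, caller_uri, call_id, via_host,
                 media_ip, media_port):
    """Return a minimal SIP INVITE with an SDP body as one string."""
    sdp_lines = [
        "v=0",
        f"c=IN IP4 {media_ip}",
        f"m=audio {media_port} RTP/AVP 0",   # payload type 0 = PCMU
    ]
    body = "".join(line + "\r\n" for line in sdp_lines)
    headers = [
        f"INVITE {callee_uri} SIP/2.0",
        f"To: <{callee_uri}>",
        f"From: <{caller_uri}>;tag=1234",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        f"Via: SIP/2.0/UDP {via_host}",
        "Content-Type: application/sdp",
        # Content-Length counts the bytes of the body only.
        f"Content-Length: {len(body.encode('utf-8'))}",
    ]
    # An empty line separates the header fields from the body.
    return "\r\n".join(headers) + "\r\n\r\n" + body


if __name__ == "__main__":
    print(build_invite("sip:presidentti@tpk.fi", "sip:hgs@cs.columbia.edu",
                       "1234@bart.cs.columbia.edu", "bart.cs.columbia.edu",
                       "128.59.19.191", 3456))
```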
    
    2. At 10 pm on some evening in the not too distant future, 1,000 users on a CATV network (bandwidth 10 Mb/s) have set their timers to start their video program to watch the evening news. The news are carried as a 20 kb/s bitstream using RTP and RTCP over multicast IP. How long, on average, does it take until everybody knows the number of other receivers? RTCP packets are 100 bytes long and the standard RTCP bandwidth (5% of data stream) is used.

    Note that the problem addresses the transient, not the steady-state behavior of the RTCP bandwidth scaling mechanism.

    According to RFC 1889 (and the code in its appendix), the delay before the first RTCP packet is based on one half of the minimum RTCP interval of 5 seconds, randomized by a factor between 0.5 and 1.5, so the first packet is sent between 1.25 and 3.75 seconds after joining the conference. The overall RTCP bandwidth is 1 kb/s or 125 bytes/s or 1.25 packets/s. If all receivers send their first RTCP packet in these 2.5 seconds, 1000 * 100 bytes will be sent in that time interval, or 320,000 bits/s.

    On a 10 Mb/s CATV network this is unlikely to cause a problem, so all receivers know about the other 999 after this first round of transmissions, that is, within about 3.75 seconds. After that, each receiver would send an RTCP packet every 800 seconds on average (1,000 receivers sharing 1.25 packets/s).

    For simplicity, we are ignoring the bandwidth division between senders and receivers, which would decrease the sending rate somewhat.
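
The arithmetic for the transient can be checked with a short script. It is a sketch under the same simplifications as the answer (no sender/receiver bandwidth split, no packet loss); the function and key names are invented for this example.

```python
def rtcp_startup(receivers, session_bps, rtcp_packet_bytes,
                 rtcp_fraction=0.05, first_window_s=(1.25, 3.75)):
    """Return RTCP figures for many receivers joining at the same time."""
    rtcp_bps = session_bps * rtcp_fraction
    packets_per_s = rtcp_bps / (8 * rtcp_packet_bytes)
    window_s = first_window_s[1] - first_window_s[0]
    # Everybody sends one packet somewhere inside the initial window.
    burst_bps = receivers * rtcp_packet_bytes * 8 / window_s
    # Afterwards the RTCP bandwidth is shared by all receivers.
    steady_interval_s = receivers / packets_per_s
    return {
        "rtcp_bps": rtcp_bps,
        "packets_per_s": packets_per_s,
        "burst_bps": burst_bps,
        "all_known_by_s": first_window_s[1],
        "steady_interval_s": steady_interval_s,
    }


if __name__ == "__main__":
    for name, value in rtcp_startup(1000, 20_000, 100).items():
        print(f"{name}: {value:g}")
```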

    Note that the RTCP packet interval never depends directly on the line speed, but only on the nominal RTP data rate. For one, an RTCP sender would in most cases have no way of ascertaining the minimum bandwidth among all the group members. Also, if every session had an RTCP bandwidth equal to a fraction of the physical-layer speed, 20 sessions on a physical medium would always saturate the wire with RTCP traffic.

    3. You can run SIP in proxy or redirect mode. Compute the call-setup delay for both modes and compare the server processing requirements for a server handling the oulu.fi domain, assuming that the caller is in New York and the callee is at ees1.oulu.fi, with a one-way delay between New York and Finland being 50 ms. You can ignore the delay within the Oulu University local area network.

    We assume that no packets are lost. For a redirect server, there are two round trips across the ocean (100 ms each): the first INVITE is answered by a redirect response, and the second INVITE goes directly to the callee. Counting the 50 ms that the final ACK needs to reach the callee, the total delay is 250 ms.

    For a proxy server, we can ignore the propagation delay within the oulu.fi domain. The call would take about 150 ms to set up.
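
Both figures follow from counting transatlantic round trips. The sketch below assumes no loss, no processing delay, and that the one-way trip of the final ACK is counted as part of call setup; the function name is hypothetical.

```python
def sip_setup_delay_ms(one_way_ms: float, mode: str) -> float:
    """Call-setup delay for a caller far away from the server and callee."""
    round_trips = {
        "proxy": 1,      # INVITE forwarded locally, final response returns
        "redirect": 2,   # INVITE + redirect response, then INVITE + response
    }
    if mode not in round_trips:
        raise ValueError(f"unknown mode: {mode}")
    # Each round trip costs two one-way delays; the ACK adds one more.
    return round_trips[mode] * 2 * one_way_ms + one_way_ms


if __name__ == "__main__":
    for mode in ("proxy", "redirect"):
        print(mode, sip_setup_delay_ms(50, mode), "ms")
```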

    Processing overhead is somewhat lower for a redirect server, since it only answers the initial INVITE and does not have to generate its own request toward the callee or relay the responses.

    4. Design task 2: Design a video conferencing service that integrates IP-based PCs as well as traditional telephones. Each conference can have anywhere from two to one hundred participants. The system should support many simultaneous conferences. On occasion, there will be large lectures where a lecture, for example, is being distributed to a large audience watching on PCs distributed across a modest number of local area networks. Describe the architecture and protocol components. Also, size the conference mixer and its network connection, assuming the following:
    • 10 simultaneous conferences with 20 participants each;
    • each participant listens to audio, but at most two speak at any one time;
    • there is one video channel per conference, for the active speaker;
    • video uses H.261 at 128 kb/s and 15 frames a second;
    • audio is G.729 with 10 ms packetization.

    To estimate the necessary server size for the server(s), describe the number of encodings and decodings that the server needs to perform.

    Describe the protocols needed for this service. How would you assure quality-of-service for the participants? How do participants dial into the conference? How do they find out who else is in the conference? Can you improve the efficiency of audio for low-bandwidth users (that may only get audio, rather than video and audio)? Do end systems need any special capabilities?

For this application, some combination of "multi-unicast" and multicast conferencing appears best. A central conference server can be used for small conferences, while a tree of servers can serve large conferences and lectures, where a single server may not have enough CPU resources or network bandwidth. The conference server or bridge acts as an RTP mixer.

Multicast is generally only supported within a local area network, so one approach is to install a mixer in each LAN and have it multicast the multimedia streams locally. Multi-unicast refers to the replication of content by a conference server, where the same or similar streams are replicated and sent via unicast to each participant.

Each video stream consumes about 128 kb/s, since the packet header overhead for video is negligible. (If we assume 15 frames a second, with 16 kB/s, each frame contains about 1,067 bytes of video on average. The IP, UDP and RTP headers add about 40 bytes or less than 4%.)

Each audio stream, however, consumes far more bandwidth than its 8 kb/s codec rate suggests: at 10 ms packetization, each audio block is 10 bytes long, but consumes 50 bytes of network bandwidth once the 40 bytes of packet headers have been added, for a total bandwidth of 5000 bytes/second or 40 kb/s. RTP header compression (RFC 2508) can reduce the header overhead to about two to four bytes per packet, on average.

Thus, for unicast, each participant consumes about 170 kb/s of bandwidth. In addition, the speaker also sends audio and video, so that the total bandwidth usage is (N+1) * 170 kb/s if there are N participants. Here, we assume that the sender also receives audio and its own video transmission. For N = 20, this is about 3.6 Mb/s per conference, or about 36 Mb/s for ten simultaneous conferences. The bandwidth increases by 40 kb/s if two people speak at the same time. In addition, each participant will need to send and receive RTCP, adding another 5% to that load. This calculation assumes that participants perform silence suppression. For multicast, receivers need to be prepared to receive up to two audio streams (and one video stream), but the total output bandwidth for the mixer is also the same.
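
The sizing of the mixer's network connection can be reproduced with a few lines of arithmetic. This sketch follows the (N+1) formula above, uses the unrounded 168 kb/s per stream rather than 170 kb/s, and treats a second speaker as one extra 40 kb/s inbound audio stream; the constant and function names are illustrative.

```python
VIDEO_KBPS = 128            # H.261 stream, header overhead neglected
AUDIO_PAYLOAD_BYTES = 10    # G.729, one 10 ms block per packet
HEADER_BYTES = 40           # IP + UDP + RTP headers
PACKETS_PER_S = 100         # 10 ms packetization


def audio_kbps() -> float:
    """Audio bandwidth on the wire, including packet headers."""
    return (AUDIO_PAYLOAD_BYTES + HEADER_BYTES) * PACKETS_PER_S * 8 / 1000


def conference_kbps(participants: int, speakers: int = 1,
                    rtcp_fraction: float = 0.05) -> float:
    """Mixer bandwidth for one unicast conference, including RTCP."""
    per_stream = VIDEO_KBPS + audio_kbps()
    media = (participants + 1) * per_stream
    media += (speakers - 1) * audio_kbps()   # each extra simultaneous speaker
    return media * (1 + rtcp_fraction)


if __name__ == "__main__":
    one = conference_kbps(20)
    print(f"audio stream: {audio_kbps():.0f} kb/s")
    print(f"one conference: {one / 1000:.2f} Mb/s")
    print(f"ten conferences: {10 * one / 1000:.1f} Mb/s")
```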

The CPU in the RTP mixer needs to

  • decode the incoming audio stream(s) into linear samples;
  • create two kinds of mixes: one for each source, containing the other speaker if active, and one for everybody else, containing all active speakers. Thus, as long as only one speaker is active, only one encoding is needed; if two are active, three are needed.
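
The mixing step and the resulting number of encodings can be sketched as follows. The code assumes the incoming audio has already been decoded to 16-bit linear samples, with one list per active speaker; it is an illustration with made-up names, not an implementation of a particular mixer.

```python
def mix(streams):
    """Sum linear PCM streams sample by sample, clipping to 16 bits."""
    if not streams:
        return None                      # nothing to send: silence
    return [max(-32768, min(32767, sum(samples)))
            for samples in zip(*streams)]


def build_mixes(decoded):
    """decoded maps each active speaker to its decoded samples."""
    # One mix for all listeners who are not speaking.
    mixes = {"everyone_else": mix(list(decoded.values()))}
    # One mix per speaker that leaves out the speaker's own voice.
    for speaker in decoded:
        others = [pcm for name, pcm in decoded.items() if name != speaker]
        mixes[speaker] = mix(others)
    return mixes


def encodings_needed(mixes):
    """Each non-silent mix has to be encoded once."""
    return sum(1 for m in mixes.values() if m is not None)


if __name__ == "__main__":
    one = build_mixes({"alice": [1000, -2000]})
    two = build_mixes({"alice": [1000, -2000], "bob": [30000, 500]})
    print(encodings_needed(one), encodings_needed(two))
```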

Users join a conference by sending a SIP request to the designated address, such as sip:physics_lecture@conference.oulu.fi. The server can then redirect the request to the correct local server, for load balancing.

Participants can find out about other participants via RTCP SDES announcements, sent periodically to the multicast group or to each unicast recipient by the conference server.

When participating via unicast, end systems don't need any special capabilities beyond silence suppression, that is, not sending audio when the user is not speaking. Otherwise, both the mixer's inbound bandwidth and its processing load would increase dramatically, since the mixer itself would have to distinguish silence from speech. For multicast, the end systems have to be able to mix audio streams.