Difference between revisions of "FAQ network"

From AMule Project FAQ
m (mail adress)
m (Corrected Spelling of Español in language selection)
 
(22 intermediate revisions by 11 users not shown)
Line 1: Line 1:
=Network speed: what you should know before asking questions=
+
<center>
by Froenchenko Leonid, lfroen@gmail.com
+
'''English''' |
 +
[[FAQ network-es|Espa&ntilde;ol]]
 +
</center>
  
 +
= Network speed: what you should know before asking questions =
 +
== Preface ==
 +
The purpose of this document is to clarify issues regarding network
 +
speed that arise from time to time in the [[aMule]] [[forum]]. Generally speaking, there are several reasons for questions about "[[FAQ_eD2k-Kademlia|aMule network]]":
 +
 +
* the speed reported by [[aMule]] doesn't match your provider's given rate;
 +
* poor performance of [[aMule]] itself or another network application on the same computer; or
 +
* the key factors influencing network performance while [[aMule]] is running.
 +
 +
The intended audience for this document is users who want to gain a better understanding of network functionality in general, and, in particular, its implications for [[aMule]] functionality.
 +
 +
However, this page is not a comprehensive, general purpose "[[FAQ_eD2k-Kademlia|Network FAQ]]". If you were expecting something else, you might be interested in the [[aMule is slow|aMule is slow FAQ]].
 
   
 
   
==Preface==
+
== Network speed - how fast is it? ==
The purpose of this document is to clarify different issues regarding network  
+
When talking about network speed, people use the unit "bps", which means
speed that pops up from time to time in amule forum. Generally speaking, there're several reasons for questions about "amule &amp; network":
+
"bits per second". The reason that a ''bit'' is used rather than a ''byte'' is mostly historical, but also arises from the engineering behind the system, specifically, the fact that not all networks in the world transfer their traffic in bytes.
<ul>
+
 
  <li>Speed reported by amule doesn't match provider given rate</li>
+
There is a convention to use a capital "B" in "Bps" when speed is marked
  <li>Poor performance of amule itself or another network application on the same computer</li>
+
in "bytes per second". However, this convention is not widely accepted. In particular, organizations like the [http://www.ietf.org/ IETF] and [http://www.ieee.org/ IEEE] have stuck to the original "bps".
  <li>What are key factors influencing network performance while amule is running
+
 
  </li>
+
== Prefixes ==
</ul>
+
Since their invention, networks have made a lot of progress. Today we have networks that transfer billions of bits per second. For measuring these speeds, we use prefixes such as ''"kilo"'', ''"mega"'', ''"giga"'', ''"tera"''.
Intended audience for this document are users who want to get better understanding of network functionality in general and in practical implication to amule functionality.<br>
+
 
This page, however, is not to be seen as comprehensive general purpose "Network
+
It is a <u>common mistake</u> to think that values with those prefixes are the same as in computer science, i.e., powers of 2. The truth is that, for historical reasons, prefixes in networking have a decimal, not a binary base.
FAQ". <br>
+
 
 +
{| cellpadding="2" cellspacing="2" border="1" width="100%" title="Table 1"
 +
| valign="middle" title="Table 1" align="center" bgcolor="#33ff33" | Prefix
 +
| valign="middle" align="center" bgcolor="#33ff33" | meaning in computers
 +
| valign="middle" title="Table 1" align="center" bgcolor="#33ff33" | meaning in networks
 +
| valign="middle" align="center" bgcolor="#33ff33" | difference, %
 +
|-
 +
| valign="top" | k (kilo)
 +
| valign="top" | 2^10 = 1024
 +
| valign="top" | 10^3 = 1000
 +
| valign="top" | 2%
 +
|-
 +
| valign="top" | M (mega)
 +
| valign="top" | 2^20 = 1,048,576
 +
| valign="top" | 10^6 = 1,000,000
 +
| valign="top" | 5%
 +
|-
 +
| valign="top" | G (giga)
 +
| valign="top" | 2^30 = 1,073,741,824
 +
| valign="top" | 10^9 = 1,000,000,000
 +
| valign="top" | 7%
 +
|-
 +
| valign="top" | T (tera)
 +
| valign="top" | 2^40 = 1,099,511,627,776
 +
| valign="top" | 10^12 = 1,000,000,000,000
 +
| valign="top" | 10%
 +
|}
 +
 
 +
As you can see from the table above, misreading the prefix introduces an error that grows with the prefix, from about 2% for kilo to roughly 10% for tera. Please note that the speed your provider quotes you is the "speed in network units"; i.e., calculated on a decimal basis. For example, when your provider tells you that your link is "ADSL 256/128", they mean 256000/128000 bps (bits per second). This means that the speed of your connection is 32000/16000 Bps (bytes per second), since there are eight bits to a byte.
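The conversion above can be checked with a few lines of Python. This is only an illustrative sketch; the function names are made up for this page and are not part of [[aMule]].

```python
def advertised_to_bytes_per_second(kbps: float) -> float:
    """Convert an advertised rate in kbps (decimal kilo) to bytes per second."""
    bits_per_second = kbps * 1000   # network "kilo" means 10^3, not 2^10
    return bits_per_second / 8      # eight bits to a byte


def prefix_error_percent(power: int) -> float:
    """How much larger the binary prefix is than the decimal one, in percent.

    power: 1 = kilo, 2 = mega, 3 = giga, 4 = tera.
    """
    return (2 ** (10 * power) / 10 ** (3 * power) - 1) * 100


if __name__ == "__main__":
    # "ADSL 256/128" from the example above
    print(advertised_to_bytes_per_second(256), advertised_to_bytes_per_second(128))
    for power, name in enumerate(("kilo", "mega", "giga", "tera"), start=1):
        print(name, round(prefix_error_percent(power), 1))
```

Running it for 256 and 128 kbps gives 32000 and 16000 bytes per second, matching the figures in the text.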
 +
 
 +
== Protocol overhead - what is it about? ==
 +
When [[aMule]] is running, it constantly "talks" with other [[client]]s and [[server]]s.
 +
This data exchange is needed to perform such tasks as identifying itself, requesting information about
 +
available [[FAQ_eD2k-Kademlia#What_is_a_source?|source]]s and [[file]]s, and performing [[search]]es.
 +
 
 +
Since this information is of no direct use to the user, it is called "overhead"; i.e., a necessary addition to the data you want to [[upload]] or [[download]].
 +
 
 +
[[aMule]] calls this "''connection overhead''". However, the number that [[aMule]] presents includes only the size of the actual data that [[aMule]] itself sends to the network stack. Later, this data is sent out on the network with even more overhead: that of the network protocols.
 +
How much is it? - let's see that in the next section.
 +
 
 +
== Network overhead ==
 +
First of all - we're talking about the [[IP address|IPv4]] network. Once upon a time, there
 +
was only one type of [[IP address|IP]] network. Now there are two - [[IP address|IP version 4]], the old protocol that we all know; and [[IP address|IP version 6]] - the new protocol made to fix the limitations of [[IP address|IPv4]].
 +
 
 +
The [[FAQ_eD2k-Kademlia|ED2K protocol]], by design, is unable to talk over an [[IP address|IPv6]] network, so users who have it (in Japan and China, for example) will not be able to connect "as is". Using [[IP address|IPv4]] means that each packet ([http://www.ietf.org/rfc/rfc793.txt TCP], [http://www.ietf.org/rfc/rfc768.txt UDP], [http://www.ietf.org/rfc/rfc792.txt ICMP]) will have an [[IP address|IPv4]] header.
 +
 
 +
The minimum size of this header is 20 bytes. The header can have optional parts (4 bytes each), and that is up to your provider; for example, mine adds one optional dword (4 bytes).
 +
 
 +
When talking to other clients and servers on [[FAQ_eD2k-Kademlia|ed2k network]], [[aMule]] uses the widely known [http://www.ietf.org/rfc/rfc793.txt TCP] protocol. [http://www.ietf.org/rfc/rfc768.txt UDP] is also used, but on a much smaller scale. As you may already know, [http://www.ietf.org/rfc/rfc793.txt TCP] is a reliable protocol, i.e. it guarantees that data that is sent from one side will arrive on the other or an error will be reported.
 +
 
 +
To achieve this, [http://www.ietf.org/rfc/rfc793.txt TCP] sends its own data in addition to the actual "payload" data being transferred. These data include [http://www.ietf.org/rfc/rfc793.txt TCP] client initial negotiation, checksums, sequence numbers and acknowledgments. All of this is in the ''[http://www.ietf.org/rfc/rfc793.txt TCP] header'' that is added to each packet sent. The size of this header is a minimum of 20 bytes.
 +
 
 +
While this is only a small overhead for a large bulk transfer, it can take a significant part of the bandwidth when small amounts of data are being exchanged. <u>This is exactly what happens in the [[FAQ_eD2k-Kademlia#What_is_a_source?|source]] discovery part of [[aMule]]</u>.
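To get a feeling for the numbers, here is a small Python sketch that computes what share of a packet is taken by the minimum [[IP address|IPv4]] and TCP headers. The payload sizes used in the demo (1460 bytes for a bulk transfer, 20 bytes for a short control message) are assumptions chosen for illustration, not values measured from [[aMule]].

```python
IPV4_HEADER = 20  # bytes, minimum IPv4 header size
TCP_HEADER = 20   # bytes, minimum TCP header size


def header_share(payload_bytes: int) -> float:
    """Fraction of the packet occupied by the IPv4 + TCP headers."""
    headers = IPV4_HEADER + TCP_HEADER
    return headers / (headers + payload_bytes)


if __name__ == "__main__":
    # Assumed payload sizes, for illustration only.
    for payload in (1460, 20):
        print(f"{payload:5d} bytes of payload: {header_share(payload):.1%} headers")
```

With a large payload the headers are a few percent of the packet; with a tiny payload they are most of it.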
 +
 
 +
Our [[client]] is trying to establish a connection and negotiate with a large number of other [[client]]s. Doing this, [[aMule]] opens new [http://www.ietf.org/rfc/rfc793.txt TCP] connections <u>''all the time''</u>. The number of connections opened is controlled by the ''"Maximum number of connections in 5 seconds"'' setting in the preferences.
 +
 
 +
A typical number is about 100. Each [http://www.ietf.org/rfc/rfc793.txt TCP] connection results in at least three packets traveling on the net: a SYN packet (the connection request), a SYN+ACK in reply when the connection is accepted (or a RST when it is refused), and a final ACK that establishes the session.
 +
 
 +
There is further overhead from [http://www.ietf.org/rfc/rfc1034.txt DNS] queries when an address is resolved, retries when a host doesn't reply, and so on.
 
   
 
   
==Network speed - how much is it ?==
+
=== At low level ===
While talking about network speed, people are using "bps" units, which mean
+
After passing through the [http://www.ietf.org/rfc/rfc793.txt TCP] and [[IP address|IP]] layers, packets go down to the network interface driver. What kind of driver this is depends on the way your computer is connected to the internet. For the sake of simplicity, we will assume that this computer is connected to the ISP directly; i.e., that you have no LAN (or switch or router) in between.
"bit per second". The reason for <i>bit</i> rather that <i>byte</i> is pretty
+
 
match historical, but also have engineering motivation behind. This motivation
+
Common setups include:
comes from the fact, that not all networks in the world are transferring bytes.<br>
+
There's also convention to use capital "B" in "Bps" when speed is marked
+
in "bytes per second". However, this convention is not widely accepted. Particularly, organizations like IETF and IEEE are stick to original "bps".<br>
+
 
   
 
   
==Prefixes==
+
* an analog modem, connected to a telephone line (ISDN modem falls in this category too);
Since their invention, networks made quite a progress, and now we have networks
+
* a cable modem, connected through ethernet, ISP gives you an [[IP address|IP]] address through [http://www.ietf.org/rfc/rfc2131.txt DHCP];
that transfers thousands and millions bits and more bits per second. For marking
+
* a cable modem, connected through ethernet, ISP requires you to configure PPPoE or PPTP tunnel;
those speeds, prefixes <i>"kilo"</i>, <i>"mega"</i>, <i>"giga"</i>, <i>"tera"</i>
+
* an ADSL modem, connected through ethernet. You must have a PPPoE or PPTP tunnel;
etc. are used. It is a <u>common mistake</u> to think that values with those prefixes are the same as in computer science, i.e. powers of 2. The truth is that, for historical reasons, prefixes in networking have a decimal base, and not a binary one.
+
* a variation on these, e.g., a modem connected to a computer via USB.
 
   
 
   
<table cellpadding="2" cellspacing="2" border="1" width="100%"
+
In each of these setups, different protocols are in use, and different headers are added to transmitted packets. There's one important thing to note: <u>''ethernet frames travel between the cable/ADSL modem and the computer, and don't reach the ISP''</u>. Consequently, they're not counted in rate calculations. By contrast, [http://www.ietf.org/rfc/rfc2516.txt PPPoE] and
title="Table 1">
+
[http://www.ietf.org/rfc/rfc2637.txt PPTP] headers <u>''do reach the ISP''</u>. In this respect, your particular provider may or may not choose to include them in their rate calculations. For this reason, we have excluded those headers from our calculations.  
    <tr>
+
 
      <th valign="middle" title="Table 1" align="center"
+
If you think that your ISP includes it, add 4 bytes to the size of each packet.
bgcolor="#33ff33">Prefix<br>
+
      </th>
+
      <th valign="middle" align="center" bgcolor="#33ff33">meaning in computers<br>
+
      </th>
+
      <th valign="middle" title="Table 1" align="center"
+
bgcolor="#33ff33">meaning in networks<br>
+
      </th>
+
      <th valign="middle" align="center" bgcolor="#33ff33">difference, %%<br>
+
      </th>
+
    </tr>
+
    <tr>
+
      <td valign="top">K (kilo)<br>
+
      </td>
+
      <td valign="top">2^10 = 1024<br>
+
      </td>
+
      <td valign="top">10^3 = 1000<br>
+
      </td>
+
      <td valign="top">2%<br>
+
      </td>
+
    </tr>
+
    <tr>
+
      <td valign="top">M (mega)<br>
+
      </td>
+
      <td valign="top">2^20 = 1,048,576<br>
+
      </td>
+
      <td valign="top">10^6 = 1,000,000<br>
+
      </td>
+
      <td valign="top">5%<br>
+
      </td>
+
    </tr>
+
    <tr>
+
      <td valign="top">G (giga)<br>
+
      </td>
+
      <td valign="top">2^30 = 1,073,741,624<br>
+
      </td>
+
      <td valign="top">10^9 = 1,000,000,000<br>
+
      </td>
+
      <td valign="top">7%<br>
+
      </td>
+
    </tr>
+
    <tr>
+
      <td valign="top">T (tera)<br>
+
      </td>
+
      <td valign="top">2^40 = 1,099,511,627,776<br>
+
      </td>
+
      <td valign="top">10^12 = 1,000,000,000,000<br>
+
      </td>
+
      <td valign="top">9%<br>
+
      </td>
+
    </tr>
+
    <tr>
+
      <td valign="top"><br>
+
      </td>
+
      <td valign="top"><br>
+
      </td>
+
      <td valign="top"><br>
+
      </td>
+
      <td valign="top"><br>
+
      </td>
+
    </tr>
+
 
+
</table>
+
<br>
+
As you can see from the table above the error in calculation is about 5% when the prefix
+
is incorrectly interpreted. Please note that the speed your provider tells
+
you is "speed in network", i.e. calculated on decimal base. <br>
+
For example when your provider tells you that your link is "ADSL 256/128" you
+
should understand that he means 256000/128000 bps. Which means, that you have
+
64000/16000 bytes per second speed in your link.<br>
+
 
   
 
   
==Protocol overhead - what is it about==
+
=== Example ===
When amule is running, it constantly "talks" with other "mules" and servers.
+
Let's see how much network overhead we have on a typical network. Our connection is a cable modem connected via an ethernet link to a PC directly (no router between them).  
This data exchange is needed to identify itself, request information about
+
 
available sources and files, perform searches and so on. Since this information
+
In this setup we have [[IP address|IPv4]] packets sent over ethernet.  
has no use for the user itself, it's called "overhead" i.e. inevitable addition
+
 
to the data you actually want to upload or download. Amule calls this "<i>connection
+
Let's say we have 10 new connections opened each second, and all of them are accepted (a successfully established [http://www.ietf.org/rfc/rfc793.txt TCP] session). This alone sums up to (I'm counting data going up, from my computer to the net):
overhead</i>". However, the number amule presents, includes only the size of the actual
+
 
data that amule itself is sending to the network stack. Later, this data is
+
''10 connections * 2 packets * (20 bytes of TCP + 20 bytes of [[IP address|IPv4]]) = 800 bytes of overhead.''
sent down to the net with more overhead - now of network protocols. How much
+
 
is it - lets see that in the next section.<br>
+
This means that we are starting with 800 * 8 = 6400 bps (6.4 Kbps) of "''invisible''"
+
==Network overhead==
+
First of all - we're talking about IPv4 network. Once upon a time, there
+
was only one type of IP network. Now there's 2 - IP version &nbsp;4, the old
+
we all know; and IP version 6 - the new one. ED2K protocol by design, is
+
unable to talk over IPv6 network, so users who have it (in Japan and China
+
for example) will not be able to connect "as is". Using IPv4 means, that each
+
packet (TCP, UDP, ICMP) will have IPv4 header. The minimum size of this header
+
is 20 bytes. Header can have optional parts (each 4 bytes) and it's up to
+
your provider &nbsp;- for example my add 1 option dword.<br>
+
When talking to other thing on ed2k network, amule uses the widely known TCP protocol.
+
UDP is also used, but in much smaller scale. As the reader might know, TCP is a reliable protocol, i.e. it's guaranteed that data which sent from one side will arrive on the other or an error will be reported. In order to achieve this, TCP send its own data in addition to the actual transfer. This data includes TCP client initial negotiation, checksums, sequence numbers and acknowledgments. All this is in the <i>TCP header</i> which is added to each packet sent. The size of this header
+
is 20 bytes minimum. While being small overhead for large bulk transfer, it
+
can take significant part of bandwidth when small amounts of data are being
+
exchanged. <u>This is exactly what happens on source discovery part of amule</u>.
+
Our client is trying to establish a connection and negotiate with a large number
+
of other clients. Doing this, amule opens new TCP connections <u><i>all the
+
time</i></u>. The amount of those connections is controlled by the <i>"Maximum
+
number of connections in 5 seconds"</i> setting in the preferences. A typical number
+
is about 100. Each TCP connection results in at least 3 packets traveling
+
the net - one is a SYN packet, i.e. connection request, and one an ACK or a RST
+
when the connection is accepted or refused, and SYN+ACK to establish the session.
+
There's more overhead of DNS queries when an address is resolved, retries when a
+
host doesn't reply and so on.<br>
+
+
===On low level:===
+
After passing TCP and IP layers packets go down to the network interface
+
driver. The kind of this driver depends on the way your computer is connected to the internet. For simplicity sake we will assume that this computer is connected to the ISP directly, i.e. you have no LAN (or switch or router) between.
+
Common setups that I'm aware of:<br>
+
+
<ol>
+
  <li>Analog modem, connected to telephone line (ISDN modem falls in this category too)</li>
+
  <li>Cable modem, connected through ethernet, ISP gives you an IP address through DHCP</li>
+
  <li>Cable modem, connected through ethernet, ISP requires you to configure PPPoE or PPTP tunnel</li>
+
  <li>ADSL modem, connected through ethernet. You must have a PPPoE or PPTP tunnel</li>
+
  <li>Variation of above - modem connected to PC by USB.&nbsp;</li>
+
+
</ol>
+
In each of above setups there are different protocols in use, and different headers added to transmitted packets. But there's one important thing to note: <u><i>ethernet frames traveling between cable/ADSL modem and PC don't reach the ISP</i></u>. And consequently they are not counted in rate calculations. PPPoE and
+
PPTP headers, on the contrary <u><i>do reach the ISP</i></u>. Whether or not
+
your particular provider includes them in rate calculations I obviously have
+
no idea about. For this reason I will exclude those headers from my calculations.
+
If you think that your ISP includes it, add 4 bytes to the size of each packet.<br>
+
+
===Example:===
+
Let's see how much network overhead we have on a typical network. Our connection  
+
is a cable modem connected via an ethernet link to a PC directly (no router between them).  
+
In this setup we have IPv4 packets sent over ethernet. <br>
+
Lets say we have 10 new connections opened each second, and all are being accepted
+
(successfully established TCP session). This alone sums up to (I'm counting data
+
going up - from my computer to the net):<br>
+
<br>
+
<i>10 connection * 2 packets * (20 bytes of TCP + 20 bytes of IPv4) = 800 bytes of overhead. </i><br>
+
<br>
+
This means that we are starting with&nbsp; 1.16*8 Kbps of "<i>invisible"</i>
+
 
overhead caused by the very way the network works. Now, let's assume that
after each connection is established our amule sends something to the other side
+
after each connection is established, [[aMule]] sends something to the other side and waits to receive an answer.
and waits to receive an answer.<br>
+
 
<br>
+
''Total of 800 bytes + 800 bytes = 1600 bytes per second = 12800 bps = 12.8 Kbps''
<i>10 connections * (1 packet of data + 1 ACK)*(20 bytes of TCP + 20 bytes of IPv4) = 800</i><i> bytes of overhead. <br>
+
 
<br>
+
Total of 800 bytes + 800 bytes = 1600 bytes per second = 6400 bps = 6.4 Kbps<br>
+
</i><br>
+
 
What we have here is 12.8 Kbps of network overhead alone. Taking into account
 
that amule has other data to send (uploads) and it is not the only network
application running we will have the following picture: Most chances that your
+
application running we will have the following picture:  
link to provider is not that fast. &nbsp;Amule will <u><i>try</i></u> to open
+
 
10 connections per second and will <u><i>try</i></u> to upload on the specified  
+
Most likely the link to your provider is not that fast. [[aMule]] will <u>''try''</u> to open 10 connections per second and will <u>''try''</u> to upload at the specified speed.
speed. Your operating system will share all available bandwidth between those and between amule and other network applications (browser for example). Actual results will vary depending on specific OS settings.<br>
+
 
+
Your operating system will share all available bandwidth among these tasks, and between [[aMule]] and other network applications (a browser, for example). Actual results will vary depending on specific OS settings.
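The arithmetic of this example can be reproduced with a short Python sketch. The function name and constants are hypothetical; the packet counts (two upstream handshake packets per connection, then one data packet and one ACK per connection) are the ones assumed in the example, and only the minimum header sizes are counted.

```python
TCP_HEADER = 20   # bytes, minimum
IPV4_HEADER = 20  # bytes, minimum
HEADERS = TCP_HEADER + IPV4_HEADER


def upstream_overhead_bps(connections_per_second: int) -> int:
    """Upstream header overhead in bits per second for the example above."""
    # Two handshake packets go up for each new connection.
    handshake_bytes = connections_per_second * 2 * HEADERS
    # Then one data packet and one ACK per connection.
    exchange_bytes = connections_per_second * 2 * HEADERS
    return (handshake_bytes + exchange_bytes) * 8


if __name__ == "__main__":
    bps = upstream_overhead_bps(10)
    print(bps, "bps =", bps / 1000, "Kbps")
```

For 10 connections per second this gives 1600 bytes per second, i.e. 12800 bps.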
==ACK bottleneck==
+
 
 +
== ACK bottleneck ==
 
In all the calculations above there was one assumption: zero download. But downloading is what [[aMule]] was built for. So let's examine how the overhead
above affects your downloading speed. The answer is in TCP protocol. When TCP is sending  
+
above affects your downloading speed. The answer lies in the [http://www.ietf.org/rfc/rfc793.txt TCP] protocol.
data, it requires from the other side to acknowledge the reception. So if client  
+
 
A is sending data to client B by TCP, B has to send a special ACK packets to A which tells B "ok, I got it". If, however, A doesn't receive the ACK packets  
+
When [http://www.ietf.org/rfc/rfc793.txt TCP] is sending data, it requires the other side to acknowledge reception. So if client A is sending data to [[client]] B over [http://www.ietf.org/rfc/rfc793.txt TCP], B has to send special ACK packets to A, which tell A "ok, I got it". If, however, A doesn't receive the ACK packets in time, it will assume that the packet was lost.
in time, he will assume that either packet is lost. So, without going deeply  
+
 
into TCP specification: <u><i>if B fails to send ACK to A, as a result A will
+
So, without going deeply into the [http://www.ietf.org/rfc/rfc793.txt TCP] specification: <u>''if B fails to send ACKs to A in time, A will transmit more slowly''</u>.
transmit slower</i></u>. <br>
+
 
Now let's see the situation in amule. We saw in the previous chapter, that the uplink  
+
Now let's see the situation in [[aMule]]. We saw in the previous section that the uplink stream is congested by connection requests and uploads. As a result, there's a good chance that ACK packets for a file we are downloading <u>''will not be sent on time''</u>.
stream is congested by connection requests and uploads. As a result, there's a
+
 
good chance that ACK packets for a file we are downloading <u><i>will not be sent  
+
The remote party will notice this and slow down. This is one more reason why the upstream should not be too congested.
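A rough way to see why late ACKs hurt: a TCP sender can have at most one window of unacknowledged data in flight, so it cannot send faster than one window per round trip. The sketch below is a simplified model, not a description of any particular TCP implementation; the window size and the two round-trip times are assumed values chosen for illustration.

```python
def max_tcp_throughput(window_bytes: int, round_trip_seconds: float) -> float:
    """Upper bound on throughput in bytes per second: one window per round trip."""
    return window_bytes / round_trip_seconds


if __name__ == "__main__":
    window = 65535  # assumed window size in bytes
    # Assumed round-trip times: ACKs sent promptly vs. ACKs queued
    # behind uploads on a congested uplink.
    for label, rtt in (("ACKs sent promptly", 0.125), ("ACKs delayed", 0.5)):
        print(f"{label}: at most {max_tcp_throughput(window, rtt):.0f} bytes/s")
```

In this model, every extra moment an ACK waits in the upload queue lengthens the round trip and lowers the ceiling on the download speed.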
on time</i></u>. The remote party will notice this and slow down. This is one  
+
more reason why the upstream should better not be too congested.<br>
+
+
==Is there something I can do ?==
+
OK, now that you understood why your network is so slow while amule is
+
running you will maybe look for a way to fix this. The answer in 2 words: "rate limit".
+
The first thing you should do is to assign realistic rate limits in amule
+
itself. If you have a uplink rate of 128 Kbps don't set amules upload limit to
+
16 (kilobytes per second) just because 128/8=16.<br>
+
A better, but far more complicated solution is to use the QoS and packet scheduling
+
services of your OS. For example, you can give a higher priority to ACK packets
+
to solve the above mentioned "ACK bottleneck" problem. The QoS topic, however, is beyond
+
scope of this article.<br>
+
<br>
+
 
   
 
   
==Router (switch, home network):&nbsp; is there any difference ?==
+
== Is there anything I can do? ==
 +
OK, now that you understand why your network is so slow while [[aMule]] is
 +
running, you may be looking for a way to fix this. The answer in two words: "rate limit".
 +
 
 +
The first thing you should do is to assign realistic rate limits in [[aMule]]
 +
itself. If you have an uplink rate of 128 Kbps, don't set [[aMule]]'s [[upload]] limit to 16 (kilobytes per second) just because 128/8 = 16; leave room for protocol overhead, ACK packets, and other applications.
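As a starting point, the limit can be derived from the advertised uplink rate with some headroom left over. The sketch below is only an illustration: the function name is made up, and the 80% headroom factor is an assumption, not a value recommended by [[aMule]]; the right value for your link has to be found by experiment.

```python
def suggested_upload_limit(uplink_kbps: float, headroom: float = 0.8) -> float:
    """Suggest an upload limit in kilobytes per second.

    uplink_kbps: advertised uplink rate in kbps (decimal kilo).
    headroom:    assumed fraction of the link given to uploads (hypothetical default).
    """
    raw_kilobytes_per_second = uplink_kbps * 1000 / 8 / 1000
    return raw_kilobytes_per_second * headroom


if __name__ == "__main__":
    # A 128 Kbps uplink: the raw figure is 16, the suggestion is lower.
    print(suggested_upload_limit(128))
```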
 +
 
 +
A better, but far more complicated, solution is to use the QoS and packet scheduling services of your OS. For example, you can give a higher priority to ACK packets to solve the above-mentioned "ACK bottleneck" problem.
 +
 
 +
The QoS topic, however, is beyond the scope of this article.
 +
 
 +
== Router (switch, home network): is there any difference? ==
 
When the cable coming from your ISP is connected to some switching or routing
device, which in turn is connected to several PCs, bandwidth is shared between
them. So, having N computers connected, an ideal device would simply provide  
+
them.  
each one of them with 1/N of the total bandwidth. The situation may vary in real  
+
 
life, and your particular device may have different idea about fairness. Since
+
So, having N computers connected, an ideal device would simply provide  
you're not going to have the hardware specs of your router chipset the only
+
each one of them with 1/N of the total bandwidth. The situation may vary in real life, and your particular device may have a different idea about fairness.
advice here is "try and see yourself". <br>
+
 
 +
Editor's note 1: For many cheap SOHO routers, however, this is not the case. Many of them do not apply any advanced packet scheduling or bandwidth allocation; in fact, they simply don't care at all. So if there are N connections in total, each TCP connection may expect to get roughly 1/N of the bandwidth, regardless of which PC made the connection. A PC that makes more connections may therefore consume more bandwidth than a PC with fewer connections. Some routers do perform more advanced bandwidth allocation, though. Also, if some of the PCs are not using their bandwidth, the router is usually willing to give it to the other clients. So if yours is the only active PC behind the router, in an ideal world you can expect speeds similar to a direct connection. In the real world, your mileage may vary.
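The per-connection sharing described in this note can be illustrated with a small Python sketch. It is an idealized model with made-up PC names and numbers; real routers behave less predictably.

```python
def share_per_pc(total_bandwidth: float, connections_by_pc: dict) -> dict:
    """Split bandwidth equally per connection and sum the shares for each PC."""
    total_connections = sum(connections_by_pc.values())
    if total_connections == 0:
        return {pc: 0.0 for pc in connections_by_pc}
    per_connection = total_bandwidth / total_connections
    return {pc: n * per_connection for pc, n in connections_by_pc.items()}


if __name__ == "__main__":
    # Hypothetical home network: one PC runs a P2P client with 90
    # connections, the other has 10 connections open in a browser.
    print(share_per_pc(1000.0, {"p2p_pc": 90, "browser_pc": 10}))
```

In this model the PC with 90 connections ends up with nine times the bandwidth of the PC with 10, rather than the half an ideal per-PC split would give it.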
 +
 
 +
Editor's note 2: While switches are usually fairly simple devices that handle P2P traffic at full speed, the situation is somewhat worse when it comes to routing, NAT (internet connection sharing), and using a VPN with SOHO routers. Some (usually the cheapest) SOHO routers have too weak a CPU, too little RAM, or similar hardware limitations. If the router's CPU is too weak, it may fail to cope with routing and/or NAT at full channel speed, and aMule does produce a lot of packets. The total speed can therefore be lower than with a direct connection (without a router in the middle). If there is too little RAM on board, this limits the maximum possible number of connections and/or may cause instability or strange issues when the router runs out of memory while handling many connections, and there are a lot of TCP/IP connections and a lot of UDP traffic. This is especially true if the router also performs NAT, since each TCP connection and UDP packet exchange is tracked by NAT for some time; with many TCP connections and UDP packets, NAT may need a lot of RAM for connection tracking, or it will have to limit the number of connections. In the worst case, a poorly engineered router will fail to detect this and will run out of RAM system-wide, leading to overall instability, crashes, reboots, and other strange behavior.
 +
 
 +
Editor's note 3: If you're running NAT, you definitely want to set up port forwarding on your router (or allow UPnP to do this for you). If remote peers are unable to initiate connections to your aMule (which is the case when NAT is used without port forwarding), you will be unable to reach some peers. This degrades aMule's performance.
 +
 
 +
Some hints:
 +
* Choose your router carefully! Read P2P-related and router-related forums to find out which routers fail under heavy P2P load and which ones are OK. Generally, you may prefer full-featured routers with extras such as VPN support, a built-in downloader, an FTP server, USB storage support, etc. Such devices are powerful enough "by design" simply to cope with all their features. A poor choice of router may result in a poor P2P experience.
 +
* Set up port forwarding on your router (or allow UPnP to do this for you).
 +
* Prefer not to use ISPs that require a VPN connection, since VPN takes extra processing power on the router and may be slower because of the router's limited hardware resources. For ethernet routers, the ideal case is when the ISP assigns you an IP address via DHCP and uses plain Ethernet without any extra layers such as PPPoE or PPTP.
 +
 
 +
Since you're not going to have the hardware specs of your router's chipset, the only advice here is "try it and see for yourself".
 +
 
 +
Editor's note 4: Many SOHO routers are really just small computers with RAM, a CPU, and some networking hardware (such as a built-in switch), running ordinary Linux to display the web interface (often served by thttpd), to perform routing, NAT, and firewalling (often implemented via iptables), or even to apply traffic priorities or act as a small FTP server. So if you really feel like a hardcore networking and Linux guru, you may be able to figure out how bandwidth scheduling, NAT, routing, etc. work, which limits are in play, and even tweak all this as you like (taking hardware limits into account, of course). But all this requires a decent amount of knowledge and involves some risk. If you're not such a guru, or do not want to apply hardcore tweaks to your router, you're better off choosing your device carefully and just having fun.
 +
 
 +
== Multiple links ==
 +
Until now, we have talked about computers that are connected to the network through
 +
a single interface. While this is the most common case, it is not the only one. A user
 +
may choose to connect via two (or more) links provided by different ISPs.
 +
There are two reasons for this decision that I know of: link redundancy and
 +
load balancing.
 +
 
 +
=== Link redundancy ===
 +
In the case of link redundancy, the second link becomes operational when the primary
 +
link fails. This can be done automatically or by an explicit user command.
 +
When this setup is used, [[aMule]], along with other network applications, must be
 +
restarted when the links are switched. This allows it to bind to the new address,
 +
reconnect to the server, and receive a new ID. If [[aMule]] is connected via a [http://www.ietf.org/rfc/rfc3022.txt NAT]-enabled
 +
router (it doesn't matter whether you have a [[FAQ_eD2k-Kademlia#What_is_LowID_and_HighID?|low or high ID]]) and the links
 +
are switched <u>''on the router''</u>, a restart is not needed.
 +
 
 +
=== Load balancing ===
 +
This is a far more complicated case. Both (all) links are simultaneously active,
 +
and traffic is distributed between them. The problem is that [[aMule]]
 +
binds to <u>''all interfaces on the system''</u>, i.e. 0.0.0.0. But on [[FAQ_eD2k-Kademlia|ed2k]]
 +
your ID is your [[IP address|IP]] address, <u>''and you cannot have two''</u>.<br>
 +
So the problem is that [[aMule]] does not explicitly choose the source
 +
address for ''<u>outgoing</u>'' [http://www.ietf.org/rfc/rfc793.txt TCP] connections. Note that it
 +
<u>''doesn't matter''</u> on which interface it listens. This is exactly the <u>''opposite''</u>
 +
of <u>''server''</u> applications like FTP or HTTP. When a client
 +
tries to connect to a server, it discovers the server's IP address by resolving DNS. The resolver's
 +
reply will contain all [[IP address|IP]] addresses of the specified host, and a client should try them all. The server, in turn, may choose not to listen on
 +
one of them and thus prevent the client from using that interface. In our case
 +
[[aMule]] <u>''is a [[client]]''</u>, and the [[server|ed2k server]] discovers its address from the
 +
[[FAQ_eD2k-Kademlia#What_is_a_source?|source]] [[IP address|IP]] in the connection request. That's where the [[server|ed2k server]] will try to connect.
 +
If the connection succeeds, the [[client]] is assigned a [[FAQ_eD2k-Kademlia#What_is_LowID_and_HighID?|high ID]]; if it doesn't, the client gets a low ID.
 +
The only solution in this situation (until [[aMule]] is able to
 +
bind to a specific address) is to use [[aMule]] on your "primary" link.<br>
 +
You can, however, make [http://www.kernel.org Linux] send packets through the interface of your choice.
 +
But most probably they will be dropped by your ISP's router as "spoofed", because the source [[IP address|IP]] address doesn't match the address the ISP assigned to that interface.

Latest revision as of 15:04, 24 September 2008

English | Español

Network speed: what you should know before asking questions

Preface

The purpose of this document is to clarify issues regarding network speed that arise from time to time in the aMule forum. Generally speaking, there are several reasons for questions about "aMule network":

  • the speed reported by aMule doesn't match your provider's given rate;
  • poor performance of aMule itself or another network application on the same computer; or
  • the key factors influencing network performance while aMule is running.

The intended audience for this document is users who want to gain a better understanding of network functionality in general, and, in particular, its implications for aMule functionality.

However, this page is not a comprehensive, general purpose "Network FAQ". If you were expecting something else, you might be interested in the aMule is slow FAQ.

Network speed - how fast is it?

When talking about network speed, people use the unit "bps", which means "bits per second". The reason that a bit is used rather than a byte is mostly historical, but also arises from the engineering behind the system, specifically, the fact that not all networks in the world transfer their traffic in bytes.

There is a convention to use a capital "B" in "Bps" when speed is marked in "bytes per second". However, this convention is not widely accepted. In particular, organizations like the IETF and IEEE have stuck to the original "bps".

Prefixes

Since their invention, networks have made a lot of progress. Today we have networks that transfer billions of bits per second. For measuring these speeds, we use prefixes such as "kilo", "mega", "giga", "tera".

It is a common mistake to think that values with those prefixes are the same as in computer science, i.e., powers of 2. The truth is that, for historical reasons, prefixes in networking have a decimal, not a binary base.

Prefix     Meaning in computers         Meaning in networks           Difference, %
k (kilo)   2^10 = 1,024                 10^3 = 1,000                  2%
M (mega)   2^20 = 1,048,576             10^6 = 1,000,000              5%
G (giga)   2^30 = 1,073,741,824         10^9 = 1,000,000,000          7%
T (tera)   2^40 = 1,099,511,627,776     10^12 = 1,000,000,000,000     9%

As you can see from the table above, the error in calculation ranges from about 2% to 9% when the prefix is incorrectly interpreted. Please note that the speed your provider quotes you is the "speed in network units"; i.e., calculated on a decimal basis. For example, when your provider tells you that your link is "ADSL 256/128", they mean 256000/128000 bps (bits per second). This means that the speed of your connection is 32000/16000 Bps (bytes per second), since there are eight bits to a byte.
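The conversions above can be checked with a few lines of Python. This is only an illustrative sketch: the function names are made up for this example, and the difference is computed relative to the binary value, which is what reproduces the rounded percentages in the table.

```python
# Decimal ("network") versus binary ("computer") meaning of each prefix.
DECIMAL = {"k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}
BINARY = {"k": 2**10, "M": 2**20, "G": 2**30, "T": 2**40}


def difference_percent(prefix):
    """Gap between the binary and decimal value, relative to the binary one."""
    return (BINARY[prefix] - DECIMAL[prefix]) / BINARY[prefix] * 100


def link_rate_bytes_per_second(kilobits_per_second):
    """Convert an advertised rate in (decimal) kilobits/s to bytes/s."""
    return kilobits_per_second * DECIMAL["k"] / 8


if __name__ == "__main__":
    for prefix in DECIMAL:
        print(f"{prefix}: {difference_percent(prefix):.1f}%")
    # The "ADSL 256/128" example from the text: downlink and uplink in bytes/s.
    print(link_rate_bytes_per_second(256), link_rate_bytes_per_second(128))
```

For the "ADSL 256/128" link this gives 32000 and 16000 bytes per second, matching the figures in the paragraph above.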

Protocol overhead - what is it about?

When aMule is running, it constantly "talks" with other clients and servers. This data exchange is needed to perform such tasks as identifying itself, requesting information about available sources and files, and performing searches.

Since this information is of no direct use to the user, it is called "overhead"; i.e., a necessary addition to the data you want to upload or download.

aMule calls this "connection overhead". However, the number that aMule presents includes only the size of the actual data that aMule itself sends to the network stack. Later, these data are sent out on the network with even more overhead: that of the network protocols. How much is it? Let's see in the next section.

Network overhead

First of all - we're talking about the IPv4 network. Once upon a time, there was only one type of IP network. Now there are two - IP version 4, the old protocol that we all know; and IP version 6 - the new protocol made to fix the limitations of IPv4.

The ED2K protocol is, by design, unable to talk over an IPv6 network, so users who have only IPv6 (in Japan and China, for example) will not be able to connect "as is". Using IPv4 means that each packet (TCP, UDP, ICMP) will have an IPv4 header.

The minimum size of this header is 20 bytes. The header can have optional parts (each 4 bytes long), and whether they are present is up to your provider; mine, for example, adds an optional [dword].

When talking to other clients and servers on the ed2k network, aMule uses the widely known TCP protocol. UDP is also used, but on a much smaller scale. As you may already know, TCP is a reliable protocol, i.e. it guarantees that data sent from one side will arrive at the other, or an error will be reported.

To achieve this, TCP sends its own data in addition to the actual "payload" data being transferred. These data include TCP client initial negotiation, checksums, sequence numbers and acknowledgments. All of this is in the TCP header that is added to each packet sent. The size of this header is a minimum of 20 bytes.

While this is only a small overhead for a large bulk transfer, it can take up a significant part of the bandwidth when small amounts of data are being exchanged. This is exactly what happens during the source discovery part of aMule's work.

Our client is trying to establish a connection and negotiate with a large number of other clients. Doing this, aMule opens new TCP connections all the time. The number of connections opened is controlled by the "Maximum number of connections in 5 seconds" setting in the preferences.

A typical number is about 100. Each TCP connection results in at least three packets traveling on the net: a SYN packet, i.e. the connection request; a SYN+ACK reply when the connection is accepted (or an RST when it is refused); and a final ACK that establishes the session.

There's more overhead from DNS queries when an address is resolved, retries when a host doesn't reply, and so on.
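To get a feeling for how much the headers weigh, here is a small Python sketch that computes the share of each packet taken by the minimum TCP and IPv4 headers. The payload sizes used in the demo (1460 and 10 bytes) are assumptions chosen for illustration, not values measured from aMule.

```python
IPV4_HEADER = 20  # bytes, minimum size without options
TCP_HEADER = 20   # bytes, minimum size without options


def overhead_fraction(payload_bytes):
    """Fraction of a packet's bytes that is TCP/IPv4 header rather than payload."""
    headers = IPV4_HEADER + TCP_HEADER
    return headers / (headers + payload_bytes)


if __name__ == "__main__":
    # Assumed payload sizes: a full-sized bulk-transfer packet, a tiny
    # protocol message, and a bare SYN/ACK with no payload at all.
    for payload in (1460, 10, 0):
        print(f"{payload:5d} bytes of payload: "
              f"{overhead_fraction(payload):.1%} of the packet is headers")
```

A packet with no payload, such as a bare SYN or ACK, is pure overhead, which is why opening many connections is expensive compared with one long transfer.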

At low level

After passing through the TCP and IP layers, packets go down to the network interface driver. What kind of driver this is depends on the way your computer is connected to the internet. For the sake of simplicity, we will assume that this computer is connected to the ISP directly; i.e., that you have no LAN (or switch or router) in between.

Common setups include:

  • an analog modem, connected to a telephone line (ISDN modem falls in this category too);
  • a cable modem, connected through ethernet, ISP gives you an IP address through DHCP;
  • a cable modem, connected through ethernet, ISP requires you to configure PPPoE or PPTP tunnel;
  • an ADSL modem, connected through ethernet. You must have a PPPoE or PPTP tunnel;
  • a variation on these, e.g., a modem connected to a computer via USB.

In each of these setups, different protocols are in use, and different headers are added to transmitted packets. There's one important thing to note: ethernet frames travel between the cable/ADSL modem and the computer, and don't reach the ISP. Consequently, they're not counted in rate calculations. PPPoE and PPTP headers, in contrast, do reach the ISP, and your particular provider may or may not choose to include them in its rate calculations. For this reason, we have excluded those headers from our calculations.

If you think that your ISP includes it, add 4 bytes to the size of each packet.

Example

Let's see how much network overhead we have on a typical network. Our connection is a cable modem connected via an ethernet link to a PC directly (no router between them).

In this setup we have IPv4 packets sent over ethernet.

Let's say we have 10 new connections opened each second, and all are being accepted (successfully established TCP sessions). This alone sums up to (I'm counting data going up, from my computer to the net):

10 connections * 2 packets * (20 bytes of TCP + 20 bytes of IPv4) = 800 bytes of overhead per second.

This means that we are starting with 800 * 8 = 6400 bps = 6.4 Kbps of "invisible" overhead caused by the very way the network works. Now, let's assume that after each connection is established our aMule sends something to the other side and waits to receive an answer, which (again at about two upstream packets per connection) adds another 800 bytes of headers per second.

Total of 800 bytes + 800 bytes = 1600 bytes per second = 12800 bps = 12.8 Kbps

What we have here is 12.8 Kbps of network overhead alone. Taking into account that aMule has other data to send (uploads) and that it is not the only network application running, we will have the following picture:

Most likely the link to your provider is not that fast. aMule will try to open 10 connections per second and will try to upload at the specified speed.

Your operating system will share all available bandwidth between those and between aMule and other network applications (browser for example). Actual results will vary depending on specific OS settings.
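The arithmetic of this example can be reproduced with a short Python sketch. The function name is hypothetical, and the figure of four upstream packets per connection (two for the handshake plus two for the first request and the acknowledgment of its answer) is an assumption inferred from the "800 bytes + 800 bytes" total above.

```python
HEADER_BYTES = 20 + 20  # minimum TCP header + minimum IPv4 header


def upstream_overhead_bps(connections_per_second, packets_per_connection):
    """Header-only upstream traffic, in bits per second."""
    bytes_per_second = (connections_per_second
                        * packets_per_connection
                        * HEADER_BYTES)
    return bytes_per_second * 8


if __name__ == "__main__":
    handshake_only = upstream_overhead_bps(10, 2)
    with_first_request = upstream_overhead_bps(10, 4)
    print(f"handshake only:     {handshake_only / 1000} Kbps")
    print(f"with first request: {with_first_request / 1000} Kbps")
```

On a 128 Kbps uplink, this header-only traffic is already a noticeable fraction of the capacity before any upload or download acknowledgment has been sent.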

ACK bottleneck

In all calculations above there was one assumption: zero download. But downloading is what aMule was built for. So let's examine how the overhead above affects your downloading speed. The answer is in the TCP protocol.

When TCP is sending data, it requires that the other side acknowledge the reception. So if client A is sending data to client B by TCP, B has to send special ACK packets to A, which tell A "ok, I got it". If, however, A doesn't receive an ACK packet in time, it will assume that a packet was lost.

So, without going deeply into TCP specification: if B fails to send ACK to A, as a result A will transmit slower.

Now let's see the situation in aMule. We saw in the previous chapter, that the uplink stream is congested by connection requests and uploads. As a result, there's a good chance that ACK packets for a file we are downloading will not be sent on time.

The remote party will notice this and slow down. This is one more reason why the upstream should not be too congested.

Is there anything I can do?

OK, now that you understand why your network is so slow while aMule is running, you may be looking for a way to fix this. The answer in two words: "rate limit".

The first thing you should do is to assign realistic rate limits in aMule itself. If you have an uplink rate of 128 Kbps, don't set aMule's upload limit to 16 (kilobytes per second) just because 128/8 = 16; that would leave no room for the protocol overhead and ACK packets described above.

A better, but far more complicated solution is to use the QoS and packet scheduling services of your OS. For example, you can give a higher priority to ACK packets to solve the above mentioned "ACK bottleneck" problem.

The QoS topic, however, is beyond the scope of this article.

Router (switch, home network): is there any difference?

When the cable coming from your ISP is connected to some switching or routing device, which in turn is connected to several PCs, bandwidth is shared between them.

So, with N computers connected, an ideal device would simply provide each one of them with 1/N of the total bandwidth. The situation may vary in real life, and your particular device may have a different idea about fairness.

Editor's note 1: for many cheap SOHO routers, this is not the case. Many of them do not apply any advanced packet scheduling or bandwidth allocation; they simply don't care at all. So if there are N connections in total, each TCP connection may expect to get something like 1/N of the bandwidth, regardless of which PC made the connection. A PC that makes more connections may therefore consume more bandwidth than a PC with fewer connections. Some routers do more advanced bandwidth allocation, though. Also, if some of the PCs are not using their bandwidth, the router is usually willing to give it to other clients. So if yours is the only active PC behind the router, in an ideal world you can expect to see a speed similar to a direct connection. In the real world, your mileage may vary.
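The difference between the two sharing models can be sketched in Python. This is a simplified illustration, not a description of any real router firmware; the PC names, connection counts and link speed are made-up example values.

```python
def per_pc_share(total_bps, connections_by_pc):
    """Ideal device: every PC gets 1/N of the link, N being the number of PCs."""
    n = len(connections_by_pc)
    return {pc: total_bps / n for pc in connections_by_pc}


def per_connection_share(total_bps, connections_by_pc):
    """Simple device: every connection gets an equal slice of the link,
    so a PC with more open connections ends up with more bandwidth."""
    total_connections = sum(connections_by_pc.values())
    return {pc: total_bps * count / total_connections
            for pc, count in connections_by_pc.items()}


if __name__ == "__main__":
    # Hypothetical home network: one PC runs a P2P client with 90 open
    # connections, the other only browses with 10 connections.
    connections = {"p2p_pc": 90, "browsing_pc": 10}
    print(per_pc_share(1_000_000, connections))
    print(per_connection_share(1_000_000, connections))
```

Under the per-connection model the P2P machine ends up with nine tenths of the link, which is the effect the note above describes.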

Editor's note 2: While switches are usually pretty dumb devices that handle P2P traffic at full speed without trouble, the situation is somewhat worse when it comes to routing, NAT (internet connection sharing) and VPN on SOHO routers. Some (usually the cheapest) SOHO routers have a CPU that is too weak, too little RAM, or similar hardware limitations. If the router's CPU is too weak, it may fail to cope with routing and/or NAT at full line speed, and aMule DOES produce lots of packets. The total speed can then be lower than with a direct connection (without a router in the middle). If there is too little RAM on board, this limits the maximum number of connections and/or may cause instability or strange issues when the router runs out of memory while dealing with lots of connections, and there ARE lots of TCP/IP connections and UDP traffic. This is especially true if the router also performs NAT, since NAT tracks each TCP connection and UDP exchange for some time; with lots of TCP connections and UDP packets, NAT may need a lot of RAM for connection tracking, or it will have to limit the number of connections. In the worst case, a poorly engineered router will not even detect this situation and will run out of RAM system-wide, leading to instability, crashes, reboots and other strange behavior.

Editor's note 3: If you're running NAT, you definitely want to set up port forwarding on your router (or allow UPnP to do this for you). If remote peers are unable to initiate connections to your aMule (which is the case when NAT is used without port forwarding), you will be unable to reach some peers. This degrades aMule's performance.

Some hints:

* Choose your router carefully! Read P2P-related and router-related forums to find out which routers fail under heavy P2P load and which ones are OK. Generally, you may prefer full-featured routers with extras like VPN support, a built-in downloader, an FTP server, USB storage support, etc. Such devices have to be powerful enough "by design" just to cope with all their features. A poor choice of router may result in a poor P2P experience.
* Set up port forwarding on your router (or allow UPnP to do this for you).
* Prefer ISPs that do not use a VPN connection method, since VPN takes extra processing power on the router and may be slower because of the router's limited hardware resources. For ethernet routers, the ideal case is when the ISP assigns you an IP address via DHCP and uses plain ethernet without any extra layers like PPPoE or PPTP.

Since you're not going to have the hardware specs of your router's chipset, the only advice here is "try it and see for yourself".

Editor's note 4: many SOHO routers are actually just small computer-like devices with RAM, a CPU and some networking hardware (like a built-in switch) that run ordinary Linux to display the web interface (often served by thttpd), perform routing, NAT and firewalling (often implemented via iptables), or even apply traffic priorities or act as a small FTP server, etc. So if you really feel like a hardcore networking and Linux guru, you may be able to figure out how bandwidth scheduling, NAT, routing, etc. work and which limits are in play, and even tweak all this as you like (taking hardware limits into account, of course). But all this requires a decent amount of knowledge, and some risk is involved. If you're not such a guru or do not want to apply hardcore tweaks to your router, you're better off choosing a device carefully and just having fun.

Multiple links

Until now, we have talked about computers that are connected to the network through a single interface. Although this is the most common setup, it is not mandatory. A user may choose to connect via two (or more) links provided by different ISPs. There are two reasons for this decision that I know about: link redundancy and load balancing.

Link redundancy

With link redundancy, the second link becomes operational when the primary link fails. The switch can happen automatically or by explicit user command. When this setup is used, aMule, along with other network applications, must be restarted when the links are switched. This allows it to bind to the new address, reconnect to the server and receive a new ID. If aMule is connected via a router with NAT enabled (it doesn't matter if you have low or high ID), and the links are switched on the router, no restart is needed.

Load balancing

This is a far more complicated case. Both (all) links are active simultaneously, and traffic is distributed between them. The problem is that aMule binds to all interfaces on the system, i.e. 0.0.0.0. But on ed2k your ID is your IP address, and you cannot have two.
So the problem is that aMule does not explicitly choose the source address for outgoing TCP connections. Note that it doesn't matter on which interface it listens. This is exactly the opposite of server applications like FTP or HTTP. When a client tries to connect to a server, it discovers the server's IP address by resolving DNS. The resolver's reply will contain all IP addresses of the specified host, and a client should try them all. The server, in turn, may choose not to listen on one of them and thus prevent the client from using that interface. In our case aMule is a client, and the ed2k server discovers its address from the source IP in the connection request. That's where the ed2k server will try to connect. If the connection succeeds, the client is assigned a high ID; if it doesn't, the client gets a low ID. The only solution in this situation (until aMule is able to bind to a specific address) is to use aMule on your "primary" link.
You can, however, make Linux send packets through the interface of your choice. But most probably they will be dropped by your ISP's router as "spoofed", because the source IP address doesn't match the address the ISP assigned to that interface.
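For readers wondering what "explicitly choosing the source address" means at the socket level, here is a generic Python sketch. It is not aMule code (which, as described above, does not do this); `connect_from` is a hypothetical helper, and the loopback addresses in the demo only stand in for the address of one of your links.

```python
import socket


def connect_from(source_ip, destination, timeout=5.0):
    """Open a TCP connection whose source address is source_ip.

    destination is a (host, port) tuple. Port 0 in source_address lets
    the operating system pick any free local port.
    """
    return socket.create_connection(
        destination, timeout=timeout, source_address=(source_ip, 0)
    )


if __name__ == "__main__":
    # Demo on the loopback interface: a local listener plays the server
    # and reports the source address it sees in the connection request.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)

    client = connect_from("127.0.0.1", listener.getsockname())
    server_side, peer = listener.accept()
    print("server sees the client at", peer[0])

    for sock in (client, server_side, listener):
        sock.close()
```

An application that binds its outgoing sockets this way always presents the same source IP to the server, whichever link the operating system would otherwise have picked.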