Clusters - a Proposed Topology for a Peer-to-Peer Network

Philip Ganchev (pgan002)
philip@cs.pitt.edu
2000/10/9

Abstract

This is a proposal to modify the Gnutella protocol, in particular the topology of its networks. The current Gnutella protocol has a number of limitations, among them redundant network traffic.

The proposed architecture is an attempt to reduce the redundant network traffic. It is orthogonal to efforts to address the other issues, and orthogonal to other solutions, such as caching. It specifies the protocol only to the level required by the architecture.

Introduction

Gnutella is a peer-to-peer file-sharing Internet service. The network of peers consists of nodes and connections between them. A Gnutella node knows only its own neighbours, and passes new multicast packets (search requests) to all of them. Since networks have no enforced structure, they are often highly cyclic. As a result, some nodes receive the same message from more than one neighbour, which is redundant traffic.
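
The effect of cycles on flooding can be illustrated with a small simulation. This is only a sketch under simplifying assumptions that are not part of the Gnutella specification as described here: the function name `flood` and the adjacency-dictionary format are hypothetical, there is no time-to-live limit, and a node does not send a message back to the neighbour it came from.

```python
from collections import deque

def flood(adjacency, origin):
    """Flood one message from origin; return (deliveries, duplicates).

    Assumption: a node forwards a message the first time it sees it,
    to every neighbour except the one that sent it, and drops repeats.
    """
    seen = {origin}
    deliveries = 0
    duplicates = 0
    queue = deque((origin, n) for n in adjacency[origin])
    while queue:
        sender, receiver = queue.popleft()
        deliveries += 1
        if receiver in seen:
            duplicates += 1          # redundant copy, dropped on arrival
            continue
        seen.add(receiver)
        for nxt in adjacency[receiver]:
            if nxt != sender:
                queue.append((receiver, nxt))
    return deliveries, duplicates

# Acyclic network: a-b, a-c
tree = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
# Cyclic network: a-b, b-c, c-a
triangle = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}

print(flood(tree, "a"))      # (2, 0): no redundant copies
print(flood(triangle, "a"))  # (4, 2): half of the traffic is redundant
```

Adding the single edge b-c to the tree doubles the number of transmissions without reaching any additional node.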

Related Proposals

One approach to more intelligent routing is to reveal a larger part of a node's neighbourhood. This approach has been the basis of several proposals for the new version of the Gnutella protocol:
  • Ben Houston's Grid Proposal;
  • Joseph Wasson's Proposal for New Gnutella'esque Architecture;
  • Sebastien Lambla's Monaco Proposal;
  • and in fact most proposals addressing the problem, including this one.

    The Grid Proposal suggests that the whole network have a regular structure, for example a 2D matrix in which each internal node is connected to four other nodes. The message-passing protocol can then be optimised for the network's underlying structure. This solution has two limitations. First, it imposes on the network a global structure that is difficult to maintain as nodes connect to and disconnect from the network at a high rate. Second, some nodes have lower connection bandwidth than others (for example, a computer using a 56 kbit/s modem has lower bandwidth than one with a cable connection), yet a regular structure gives every node the same number of connections. Nodes with low bandwidth would therefore become bottlenecks, while those with high bandwidth would have unused capacity. The Proposal also leaves open the problem of connecting two networks.
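
To make the 2D-matrix example concrete, the sketch below computes a node's neighbours in such a grid. The function name `grid_neighbours` and the (row, column) addressing are assumptions made for illustration; the Grid Proposal itself may address nodes differently.

```python
def grid_neighbours(row, col, rows, cols):
    """Return the grid positions adjacent to (row, col) in a rows x cols matrix."""
    candidates = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    # Keep only positions that lie inside the matrix.
    return [(r, c) for r, c in candidates if 0 <= r < rows and 0 <= c < cols]

# An internal node of a 3 x 3 grid has four neighbours, a corner node two.
print(len(grid_neighbours(1, 1, 3, 3)))  # 4
print(len(grid_neighbours(0, 0, 3, 3)))  # 2
```

The number of connections depends only on a node's position in the matrix, not on its bandwidth, which is the second limitation noted above.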

    Wasson's and Lambla's proposed architectures are both hierarchical. They have faster-connection nodes act as servers or proxies for many slower-connection nodes, each of which connects to only a few servers. This distributes the connection load more evenly. However, both schemes are vulnerable to failure of, or malicious behaviour by, nodes high in the hierarchy. They are also arguably complex to implement.

    Proposed Topology

    Rather than enforcing topology structure on the network at large, it may be easier to manage it on a more local level. The network can be viewed as a supernetwork of clusters of nodes (Figure 1). Each cluster is a network with regular topology, known to the nodes in the cluster. Thus, each cluster is a small 'grid' à la Grid Proposal. Let us call two nodes 'co-nodes' if they are in the same cluster. Different clusters may have different topologies. Additionally, a node in a cluster may be connected to a node in another cluster. Those two nodes act as proxies between the two 'neighbouring clusters'.

    Message passing within a cluster would be done according to a protocol optimised for the cluster's topology, making it efficient and fast. For example, if a cluster is fully connected, a node initiating or receiving a multicast message can pass it to all its co-nodes (its neighbours). None of them should then resend the message within the cluster.

    This eliminates some of the traffic redundancy due to cyclicity, but there is still cyclicity between clusters. To remove some of that, we can prevent co-nodes from connecting to the same neighbouring cluster. This can be achieved by giving an ID to each cluster, and having nodes know their cluster's ID and the IDs of all neighbouring clusters. Note that a node does not know which of its co-nodes is connected to a given neighbouring cluster. Nor does it know which node of a neighbouring cluster is connected to its own cluster. Of course, there are still cycles spanning groups of three or more clusters. Some cyclicity is desirable in the network because of the high node-turnover rate, but how much is desirable is still an open question.
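
The two mechanisms described above, multicast inside a fully connected cluster and the cluster-ID rule that keeps co-nodes from linking to the same neighbouring cluster, might be sketched as follows. The class `Cluster` and its method names are hypothetical. The object stands for the knowledge shared by all members of one cluster; it deliberately records only the IDs of neighbouring clusters, not which co-node holds each link.

```python
class Cluster:
    """Shared view of one fully connected cluster (illustrative only)."""

    def __init__(self, cluster_id, members):
        self.cluster_id = cluster_id
        self.members = set(members)
        self.neighbour_ids = set()   # IDs of neighbouring clusters

    def may_connect(self, other_id):
        """A member may link to another cluster only if no co-node already does."""
        return other_id != self.cluster_id and other_id not in self.neighbour_ids

    def register_link(self, other_id):
        """Record a new inter-cluster link; refuse a second link to the same cluster."""
        if not self.may_connect(other_id):
            return False
        self.neighbour_ids.add(other_id)
        return True

    def multicast(self, origin):
        """In a fully connected cluster the origin sends once to each co-node
        and no recipient resends within the cluster."""
        return [(origin, m) for m in sorted(self.members) if m != origin]

cluster = Cluster("A", ["n1", "n2", "n3"])
print(cluster.register_link("B"))   # True: first link from cluster A to cluster B
print(cluster.register_link("B"))   # False: a co-node is already connected to B
print(cluster.multicast("n1"))      # [('n1', 'n2'), ('n1', 'n3')]
```

With n members, the intra-cluster multicast costs n - 1 transmissions and produces no duplicates.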

    Figure 1. An example instance of the proposed architecture. The connections between nodes in a cluster are not shown.

    Protocol

    On entering the network, a node can either form a new cluster, or join an existing one. (It is not certain which nodes should be given that choice.) A new cluster's topology is specified by its founding node. An optimal protocol is determined by the topology and can be hard-coded into an implementation of a node.

    On joining a cluster, a node finds a place in the structure by iterated referral: if it contacts a member node that does not know of a spare place in the structure, the member node refers it to its neighbours. The cluster then makes whatever other connections to the new node are necessary to preserve the topology. The new node receives a list of all neighbouring clusters from one of its neighbours and is free to connect to nodes of other clusters.
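
Iterated referral can be sketched as a search over the cluster's members. The proposal does not fix the order in which referrals are followed; the breadth-first order used here, the function name `find_spare_place`, and the `has_spare` callback are all assumptions made for illustration.

```python
from collections import deque

def find_spare_place(first_contact, neighbours, has_spare):
    """Follow referrals from first_contact until a member reports a spare place.

    neighbours: member -> list of that member's neighbours in the cluster
    has_spare:  function(member) -> True if the member knows of a spare place
    Returns the member that can admit the new node, or None if the cluster is full.
    """
    visited = {first_contact}
    queue = deque([first_contact])
    while queue:
        member = queue.popleft()
        if has_spare(member):
            return member
        # The member knows of no spare place and refers the newcomer on.
        for referred in neighbours[member]:
            if referred not in visited:
                visited.add(referred)
                queue.append(referred)
    return None

# A ring-shaped cluster of four members in which only "c" knows of a spare place.
ring = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "a"]}
print(find_spare_place("a", ring, lambda m: m == "c"))  # c
```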

    When a node connects to an outside cluster, it passes the ID of the outside cluster around its home cluster.

    A multicast message is passed through the cluster, and to neighbouring clusters. It reaches each member node once and travels to each neighbouring cluster.
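
At the level of the supernetwork, this behaviour might be simulated as below. The sketch treats each cluster as a unit that delivers a message once to each of its members and forwards it to every neighbouring cluster except the one it arrived from. The function name `cluster_multicast`, the data layout, and the assumption that a cluster recognises and drops a message it has already handled are illustrative and not specified by the proposal.

```python
from collections import deque

def cluster_multicast(members, links, origin_cluster):
    """Propagate one message through a network of clusters.

    members: cluster ID -> list of member nodes
    links:   cluster ID -> list of neighbouring cluster IDs
    Returns (copies received per node, duplicate arrivals at clusters).
    The originating node is counted as having received the message once.
    """
    received = {}
    reached = set()
    duplicates = 0
    queue = deque([(None, origin_cluster)])
    while queue:
        came_from, cluster = queue.popleft()
        if cluster in reached:
            duplicates += 1          # arrived again via an inter-cluster cycle
            continue
        reached.add(cluster)
        for node in members[cluster]:
            received[node] = received.get(node, 0) + 1
        for nxt in links[cluster]:
            if nxt != came_from:
                queue.append((cluster, nxt))
    return received, duplicates

# Three clusters joined in a cycle: A-B, B-C, C-A.
members = {"A": ["a1", "a2"], "B": ["b1"], "C": ["c1", "c2"]}
links = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
received, duplicates = cluster_multicast(members, links, "A")
print(received)    # every node holds exactly one copy
print(duplicates)  # 2: the remaining redundancy comes from the three-cluster cycle
```

Within clusters no node receives the message twice; the only duplicates are those crossing the cycle of three clusters.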

    There is a problem when a node disconnects: its co-nodes still believe that their cluster neighbours every cluster it did before. One solution is to rediscover the neighbouring clusters, as they were found initially. But this may incur more communication overhead than the whole scheme saves.
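
The rediscovery solution and its cost can be sketched as follows. The sketch assumes a fully connected cluster in which every remaining member re-announces each of its own outside links to all of its co-nodes, at one message per co-node per link. The function name `rebuild_neighbour_ids` and the message-counting model are assumptions, not part of the proposal.

```python
def rebuild_neighbour_ids(own_links, departed):
    """Recompute the cluster's list of neighbouring cluster IDs after a departure.

    own_links: node -> set of cluster IDs that this node itself is linked to
    departed:  the node that has disconnected
    Returns (new set of neighbouring cluster IDs, number of messages sent).
    """
    remaining = {n: ids for n, ids in own_links.items() if n != departed}
    neighbour_ids = set()
    messages = 0
    for ids in remaining.values():
        neighbour_ids |= ids
        # Each outside link is re-announced to every other remaining member.
        messages += len(ids) * (len(remaining) - 1)
    return neighbour_ids, messages

own_links = {"n1": {"B"}, "n2": {"C"}, "n3": set()}
print(rebuild_neighbour_ids(own_links, "n1"))  # ({'C'}, 1)
```

Under this model the cost grows with both the cluster size and the number of outside links, which is the overhead concern raised above.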

    Discussion

    The proposed architecture should reduce the traffic redundancy due to network cyclicity. It does not make anonymity or security any weaker than they are in the current Gnutella. It is decentralised, and less susceptible to the failure of single nodes than a hierarchical architecture.

    It allows networks to merge naturally, in the same way as connections are made within a network.

    The architecture must be validated against simulations and empirical data. It is unclear whether its assumptions hold in real Gnutella-type networks. The exact overhead it incurs is also unclear.

    Further consideration (albeit purely abstract) can be given to using clusters of clusters, or having nodes belong to more than one cluster.

    References

    1. Gnutella.
    2. gPulp (formerly known as GnutellaNG - "New Generation").
    3. Ben Houston, Grid Proposal, 2000/8/7.
    4. Sébastien Lambla, Monaco Proposal, 2000/4/14.
    5. Ben Houston, Peer-to-Peer Networking Ideas.
    6. Joseph Wasson, Proposal For the New Gnutellesque Architecture.