August 9, 2020

Blockchain and P2P Exchange: Repeating Scalability Mistakes

In 2001, when BitTorrent was invented, another p2p network that was already well known and worked on mostly the same principles began to lose ground. It was a victim of its own success: it had almost 150,000 users, and the community-run directory servers could not cope with the load. Within three hours of connecting, a new “server” would hit its bandwidth limit and drop off the network.


How cryptocurrency can (not) learn from the mistakes of others

How is this related to cryptocurrencies? You may run into similar problems when trying to fully synchronize a crypto wallet, for example a full Ethereum client. Distributed technology has scalability issues. For those who began their “online career” by unwittingly documenting the decline of the p2p network mentioned above, the situation is surprisingly familiar.

In this article I will try to describe the roots of the problem, show how it relates to cryptocurrencies, analyze how it was solved in p2p networks, and consider why those solutions are partially or completely unsuitable for the cryptosphere.

Distributed Scalability History

You can draw many parallels between the cryptosphere and p2p file sharing. Many of them are wonderfully described in Simon Morris’s four-part article series “Bittorrent Lessons for Crypto”. I agree with most of his points, except for one thing: BitTorrent was not the first system to provide simple and reliable file sharing over slow and unreliable channels. Even if you leave aside Z-Modem from 1986, which already had many reliability features, BT was only a simplified version of an existing technology: a system that let many people download a single file and share fragments with each other to increase download speed and, most importantly, reduce the load on whoever originally shared the file. That system also let you place hyperlinks to shared files on regular websites. There was at least one such system: eDonkey 2000, also known as eDonkey or ed2k. The principles of operation of BitTorrent and eDonkey are so similar that BitTorrent could almost be called a clone of the latter.

The main difference between BT and ed2k is that in ed2k you could access files without a third-party website. No need to go to Pirate Bay. You could just search the eDonkey directory server you were connected to and access all the files shared by other members, using a good old search bar. What happened behind that search bar was both the blessing and the curse of the network.

If you do not have technical knowledge, then simply skip the text below in italics.

In order to download a file with BitTorrent, you first need to find and download a torrent file from some web server and open it in a BT client. The client then takes care of all the rather simple things itself: it connects to the tracker or trackers for the desired file, gets a list of everyone from whom fragments of this file can be downloaded, and starts downloading.

Neither the tracker nor the participants can be found without the torrent file, which someone must post on a website. This means that if you want to publish a file, you need to:

  • run a tracker for this file
  • create a torrent file with the path to the tracker
  • upload the torrent file to a website
  • distribute the link to the torrent file

Here an additional level of complexity arises, an additional intermediary. Of course, sites that publish such links, such as Pirate Bay, very often break the law.

In order to publish a file in eDonkey, you had to:

  • launch the eDonkey client
  • put the file in the desired folder on the computer

And that’s it! Anyone connected to the same "eDonkey server" could find the file by simply typing part of its name in the search bar. Push the button, and you're done. Everyone who downloaded the same file automatically shared fragments of it with others, as in BitTorrent. Thus, the ed2k server played the role of both the website and the tracker.
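The dual role described above can be pictured with a small toy model. This is only a sketch under stated assumptions, not the actual eDonkey protocol: the class `ToyDirectoryServer`, its method names, the peer addresses and the file identifiers are all hypothetical and exist only for illustration.

```python
from collections import defaultdict


class ToyDirectoryServer:
    """Hypothetical in-memory index acting as both search site and tracker."""

    def __init__(self):
        # file_id -> set of peer addresses that share the file
        self.sources = defaultdict(set)
        # file_id -> set of (lower-cased) names the file was published under
        self.names = defaultdict(set)

    def publish(self, peer, file_id, file_name):
        """A client announces a file found in its shared folder."""
        self.sources[file_id].add(peer)
        self.names[file_id].add(file_name.lower())

    def search(self, text):
        """'Website' role: find file ids whose name contains the text."""
        text = text.lower()
        return sorted(
            file_id
            for file_id, names in self.names.items()
            if any(text in name for name in names)
        )

    def peers_for(self, file_id):
        """'Tracker' role: list the peers that can serve fragments."""
        return sorted(self.sources.get(file_id, ()))


server = ToyDirectoryServer()
server.publish("alice:4662", "hash-1", "Holiday Video.avi")
server.publish("bob:4662", "hash-1", "holiday_video_final.avi")
print(server.search("holiday"))    # ['hash-1']
print(server.peers_for("hash-1"))  # ['alice:4662', 'bob:4662']
```

Note that the same file can appear under several names, because the index is keyed by the file identifier rather than by the name.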

Please note that neither of them stores the actual file data, so neither infringes copyright directly. Of course, everyone who uploads or downloads the actual file does violate these rights.

In addition, ed2k hyperlinks could be published on websites. If users found it more convenient to get a link from a trusted website, they could use this option as well.

Of course, no one expected a single server to do all the work of the search service and track information about every file on the Internet. Therefore, anyone could run their own eDonkey server. The servers connected to each other, forming their own network, as is the case with cryptocurrencies.

However, there was one small problem. How to find the files that “my server” - the server to which I connect - does not track?

How did BitTorrent solve this problem? It didn't, at least in the beginning. If your torrent file does not have an active tracker, nothing will work. (This feature of the system also prevented some problems from occurring in BT, but I will talk about this another time *.)

eDonkey servers, on the other hand, sent a list of all the servers they knew about when a client connected. This allowed the client to ... send a file request to all servers. That is why the eDonkey network died.

An unlimited number of eDonkey servers could exist on the Internet, and every client could request a file from your server. Adding a new server did not reduce the load on the other servers; it only increased eDonkey's overall search traffic on the Internet. As more and more clients learned about an eDonkey server, its Internet connection, often a home DSL line, became saturated. The only way to carry on was to get a different IP address from the provider.

Adding a new client also affected every server that client could find. But I’ll tell you how this discovery happened another time. It turned out that the maximum number of clients the servers could withstand across the whole network was about 150,000. That is nothing by modern standards, but remember that a DSL connection at the time had a bandwidth of 256 Kbps.
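A rough back-of-the-envelope model shows why adding servers did not help. The 150,000 clients and the 256 Kbps link come from the text above; the search rate (6 searches per client per hour) and the 100-byte query size are made-up assumptions chosen only for illustration, and both function names are hypothetical.

```python
def broadcast_queries_per_second(clients, queries_per_client_per_hour, servers):
    """Query rate seen by EACH server when clients ask every server they know."""
    # `servers` is intentionally unused: in a broadcast search every server
    # receives every query, so adding servers does not lower this number.
    return clients * queries_per_client_per_hour / 3600


def link_capacity_queries_per_second(link_kbps, query_bytes):
    """How many queries of a given size fit through a link each second."""
    return link_kbps * 1000 / 8 / query_bytes


# Assumed workload: 6 searches per client per hour (hypothetical figure).
for servers in (10, 100, 1000):
    rate = broadcast_queries_per_second(150_000, 6, servers)
    print(servers, "servers ->", rate, "queries/s per server")  # always 250.0

# Assumed query size: 100 bytes (hypothetical figure).
print(link_capacity_queries_per_second(256, 100))  # 320.0
```

Under these assumptions a single DSL line is already close to saturation from incoming searches alone, and the number of servers never appears in the result.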

I participated in the eDonkey community and was in the midst of this disaster. I looked for solutions to the problem and tried, unsuccessfully, to convince people not to use the network-hammering tools that increased traffic, and so things went out of control.

That is why I laughed when I first heard about how Bitcoin works, which happened long before its official appearance. I should have started mining instead. Understanding technical limitations does not guarantee an understanding of the principle of greed.

The problem of cryptocurrency scalability

If you know anything about distributed ledgers, then you may know that every full client of such a network must hold a full copy of the ledger (or at least the current state of each account in the ledger and some history), which means it has to download every update on the network. As with eDonkey servers, creating another full client does not reduce the load on the other full clients; it only burdens everyone, since clients now need to send copies of transactions to yet another computer, which also generates transactions of its own.

Creating a “light” client, or “leech”, does not help the full clients either. They all still have to process the transactions of one more client, and on top of that at least one of them now receives a request from the light client every time it needs to check its balance.

Thus, the load on the network increases with every new client, whether full or light, and nothing built into the tools can reduce it. Sound familiar?

If there are miners on the network, they face the same problem: they have to process and sign every transaction. Typically, their computers and connections are better suited to the server role; after all, they make money on transactions. But even then, overloading the connection is only a matter of time, especially if miners become the de facto servers of the network once all clients turn into light ones because the requirements for full clients are too high for non-commercial use.

The file sharing solution

If you are an avid techie, you might think, “Why didn't they just use distributed hash tables?”

And here's why: it all happened before distributed hash tables were invented.

A DHT (distributed hash table) spreads a hash table, essentially an ordinary database, over a network of computers so that each one is responsible for part of the data. This way you can store much more data than fits into the memory of any single participating computer. If everything is done correctly, each participant receives requests only for “its” part of the hash table, which distributes the load across many nodes. Adding nodes to a DHT reduces the load on each node rather than increasing it. That is why variants of DHTs are used almost everywhere today, starting with Google.
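The core idea, that each key has exactly one responsible node and that more nodes means less work per node, can be sketched in a few lines. This is a simplified ring-based consistent hashing scheme, not the routing protocol of any particular DHT; real systems add routing tables, replication and handling of nodes that join or leave. The names `ToyDHT`, `ring_position` and `max_keys_per_node` are hypothetical, and SHA-256 is an arbitrary choice of hash.

```python
import hashlib
from bisect import bisect_left
from collections import Counter


def ring_position(name):
    """Map any string to a point on a 2**32 ring (SHA-256 is an arbitrary choice)."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class ToyDHT:
    """Hypothetical ring: a key belongs to the first node at or after its position."""

    def __init__(self, node_names):
        self.ring = sorted((ring_position(n), n) for n in node_names)
        self.positions = [pos for pos, _ in self.ring]

    def responsible_node(self, key):
        # bisect_left may return len(ring); the modulo wraps around to node 0.
        index = bisect_left(self.positions, ring_position(key)) % len(self.ring)
        return self.ring[index][1]


def max_keys_per_node(node_count, key_count=10_000):
    """Largest number of keys any single node is responsible for."""
    dht = ToyDHT([f"node-{i}" for i in range(node_count)])
    load = Counter(dht.responsible_node(f"file-{k}") for k in range(key_count))
    return max(load.values())


for node_count in (1, 10, 100):
    print(node_count, "nodes -> busiest node holds", max_keys_per_node(node_count), "keys")
```

With a single node, that node holds every key; as nodes are added, even the busiest node holds a shrinking share. The split is uneven in this naive version, which is one reason production designs are more elaborate.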

In fact, a DHT is also the solution for ed2k (or rather, for the Overnet protocol) and for BitTorrent's problem of separate trackers. In Overnet, as in BitTorrent's DHT, each client becomes a tracker for a certain part of the network, providing far more resources for search. Overnet eliminates the need for a whole network of eDonkey servers.

Can cryptocurrency learn from the mistakes of others?

Distributed hash tables are like a magic wand. They allow you to shard almost any information, protecting nodes from too many requests. Can this solution be used for cryptocurrencies?

The short answer is no. The longer answer: possibly, but not while keeping all of the original promises of blockchain.

The reliability of a distributed ledger rests for the most part on the fact that distributed ledgers, unlike distributed hash tables, are distributed only in the way newspapers are: everyone gets their own copy. Splitting up the ledger reduces the availability of the data and, consequently, its reliability.

Here is another analogy with file sharing: you can download a movie only if there are enough sources on the network at the same time to download the entire file. If 300 sources have the beginning of the file, 50 have its end, and no one has the middle, then the movie will never finish downloading. That is sad, but not the end of the world. It is much worse when part of the money in your wallet goes missing.

Of course, you can replicate each part of the ledger many times to make sure enough clients hold it ... But how many is enough? Is it acceptable if all your money can go offline with a chance of 1 in a thousand? 1 in a million? As long as the nodes belong to ordinary people, you never know how many participants may go offline at any given time. What if there is a massive power outage in Sudan and all copies of your wallet happen to be stored there, even though you are in the United States?
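The "how many is enough" question can be made concrete with a simple calculation. The sketch below assumes that every node holding a copy is offline independently with the same probability, and the value 0.5 is a made-up figure for home computers; both function names are hypothetical.

```python
def chance_all_replicas_offline(p_offline, replicas):
    """Probability that every copy is unreachable, ASSUMING independent failures."""
    return p_offline ** replicas


def replicas_needed(p_offline, target):
    """Smallest replica count whose all-offline chance is at most `target`."""
    if not 0 < p_offline < 1 or not 0 < target < 1:
        raise ValueError("p_offline and target must be strictly between 0 and 1")
    replicas = 1
    while chance_all_replicas_offline(p_offline, replicas) > target:
        replicas += 1
    return replicas


# Assumed: each home node is offline half of the time (hypothetical figure).
print(replicas_needed(0.5, 1 / 1000))       # 10
print(replicas_needed(0.5, 1 / 1_000_000))  # 20
```

The independence assumption is exactly what a regional power outage breaks: if all copies sit behind the same failure, the real risk is far higher than this formula suggests, no matter how many replicas there are.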

Of course, you can always run a full client yourself, and the sharding protocol will make sure a copy of your wallet and transaction history is always available on your computer. But look at it from the other side: say you are selling something, and the buyer is the only participant who holds the evidence that the funds exist ... Would you agree to such a deal?

There is also the small matter of network security, which is provided by miners. Of course, with sharding not every miner can sign every block. Otherwise the server role in a network without servers would simply shift to the miners: each miner would have to process every transaction on the network, heavily loading its connection.

This means that sharding automatically reduces what the network spends on security, making it easier to attack. Today only two cryptocurrency networks can afford to reduce their security spending: Bitcoin and Ethereum. Almost all other cryptocurrency networks have already suffered 51% attacks. This proves one thing: the computing power available to attackers is simply staggering.

All this leads us to the following question. If you shard the cryptocurrency ledger, who can guarantee that the fragment you hold is really part of the ledger? In the end you have a network of untrusted nodes. What happens if some part of the ledger falls into the hands of an attacker?

Three arguments are usually made against this concern:

  1. Logistics (no one will be able to capture all the nodes on which a certain fragment is stored).
  2. Statistics (the probability that all nodes with a certain fragment will be on the same network is very small).
  3. Encryption (cryptographic protection prevents this problem).

The problem is that people tend to greatly underestimate the size and strength of cybercriminal networks in the world of cryptocurrencies. For example, at the beginning of 2019 there was a 51% attack on Ethereum Classic, which could have cost the attackers $55 million (or at least half of that amount, given the decline in cryptocurrency prices). The attackers later returned $100 thousand worth of the stolen tokens. It turned out to be just a test or a warning. The attack seems to have cost the attacker no more than $1 million, because that is supposedly about how much was in his wallet.

If someone can get hold of a huge amount of processing power at a relatively low price (compared with the money that can be stolen), that opens up incredible opportunities. Points 1 and 2 can be dealt with by DDoSing the "real" nodes and taking their place. As for point 3, let me turn to history.

On the eDonkey network, the file name was not the key to a file; any member could change the name. Files were addressed by cryptographic hashes. Suppose you clicked on a verified link, for example to download a new blockbuster. The link gave you the hash and the file size.

After a few hours or days the movie finished downloading, you made popcorn in the microwave and, sitting on the sofa with your significant other, started the movie and ... turned it off right away, because it turned out to be hardcore porn.

After a long talk with your (possibly) significant other, you begin to understand what happened. And what happened is this. Files were addressed using the MD4 hash algorithm. You cannot turn a hash back into the data, because information is lost, but with some effort you can find other data that has the same hash and size. Doing this by exhaustive search is hard; it takes a lot of computing power and time. But cracking MD4 does not require exhaustive search: a few years ago a vulnerability was found in the algorithm that makes it possible to construct data for a specific hash.
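The addressing scheme described above, a hash plus a file size, can be sketched as follows. This is an illustration only: it uses SHA-256 from Python's standard `hashlib` as a stand-in, because MD4 is not guaranteed to be available there, and the function names `make_link` and `matches_link` are hypothetical rather than part of any ed2k implementation.

```python
import hashlib


def make_link(data):
    """Hypothetical link contents: (hex digest, size in bytes). No file name."""
    return hashlib.sha256(data).hexdigest(), len(data)


def matches_link(data, link):
    """Accept downloaded data only if both its size and its hash match the link."""
    digest, size = link
    return len(data) == size and hashlib.sha256(data).hexdigest() == digest


movie = b"the blockbuster you wanted"
link = make_link(movie)
print(matches_link(movie, link))                          # True
print(matches_link(b"something else entirely!", link))    # False
```

The check is only as strong as the hash function: if an attacker can construct different data with the same digest and size, `matches_link` accepts it, which is the failure described in the story above.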

Breaking modern hashes requires more processing power than before. But, as we have already seen, the computing power at the attackers' disposal is simply colossal, and acquiring it appears to be much easier than you would think. Security takes more than a couple of cryptographic hashes.

I am not saying it is impossible to scale a blockchain network. It can be done by reworking it and sacrificing some of the properties mentioned above. The result will be something other than the world's best-replicated distributed ledger, of which everyone has a copy. But then why bother at all? It is much easier to solve all these problems with traditional databases or private blockchains, without spending a huge amount of electricity on proof of work.

*: I was one of those people who provided the ed2k community with a service that BitTorrent simply didn't need: server lists. Torrent files were an ugly intermediate step, but they included the path to the trackers. ed2k:// links did not, and at that time almost everyone had dynamic IP addresses and DNS support in the eDonkey protocol was severely lacking, which made it impossible to even enter the eDonkey network without an up-to-date list of servers.
