Notional Cosmos Hub Report - November 2023

Given the events of the past week, it is worth taking a bit of space in this report to address some elephants in the room. As you may have seen in this tweet, a significant number of engineers have decided to resign from Notional. One now-resigned engineer, Long, had responsibility for some of our validator keys, as he publicly noted here. As we described in this announcement, we have no reason to believe Long would harm our delegators (for example, by double signing with the keys he controls and getting a validator jailed), but in the interest of security we have nonetheless been forced to bring down all of our existing validators.

Some significant restructuring is now in the cards for us, but we are in the process of hiring and training new developers and will be standing up new validators across the Cosmos in the near future. Please stay tuned to our official Twitter account for updates on this and please be cautious of impersonator accounts.

With all that said, our work on protecting the Cosmos Hub has not ceased, and we have some important updates to share, so let’s dive in.

Peer-to-Peer Storms

Our CEO Jacob recently released a complete Twitter package containing all the work we have done, in collaboration with other incredible teams, on the security threat now dubbed “Peer-to-Peer Storms” (or “P2P Storms” for short). Now that everything is public, we can dive into this threat in more detail.

What is a P2P Storm?

A P2P storm occurs when validators across a network struggle to keep up with the volume of transactions being broadcast. It is called ‘peer-to-peer’ because it originates at the CometBFT level where validators gossip transaction information to each other before coming to consensus on a given block. It can occur through natural (but abnormally high) network usage, but it can also be induced by carefully designed attacks. The Notional Security team has evidence of P2P storms having occurred on several different Cosmos mainnets including Sentinel, Stride, and Terra.

How can it happen?

The critical condition for a P2P storm to occur is for the mempool of a Cosmos blockchain to become full very quickly. In our complete vulnerability report, we identify three methods an attacker could employ to induce this state:

  1. BananaKing - Named after transactions created by a wallet on Osmosis, the BananaKing method involves filling the mempool with a small number of unusually large transactions. These transactions are accepted because IBC-go does not enforce length limits on the receiver and memo fields.
  2. Goldilocks - Transactions do not have to be crafted maliciously to be dangerous. Several valid transaction types currently exist which can also be spammed to produce a P2P Storm. These transactions are smaller than the BananaKing style ones, and so more of them are required to quickly fill the validator mempools.
  3. GattlingGun - Finally, perfectly normal transactions can also be used to fill the mempool, but need to be spammed extremely rapidly by an attacker or by anomalous user network activity. It is conditions like this that led to the halting of the Terra blockchain during the collapse of Luna in 2022.

There may be other ways to induce a P2P storm on Cosmos chains, but these are the three explicitly identified by our research.
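To make the difference between the three styles concrete, here is a toy calculation. It assumes a node's mempool is bounded both by a transaction count and by a total byte size; the limits and transaction sizes below are hypothetical numbers chosen for illustration, not values measured on any chain or taken from our report.

```python
from dataclasses import dataclass


@dataclass
class MempoolLimits:
    """Hypothetical per-node mempool limits (not taken from any specific chain)."""
    max_txs: int          # maximum number of transactions held
    max_total_bytes: int  # maximum combined size of held transactions


def txs_to_fill(limits: MempoolLimits, tx_bytes: int) -> int:
    """Number of same-sized transactions needed to hit whichever limit comes first."""
    by_bytes = -(-limits.max_total_bytes // tx_bytes)  # ceiling division
    return min(limits.max_txs, by_bytes)


# Assumed limits, for illustration only.
limits = MempoolLimits(max_txs=5_000, max_total_bytes=100_000_000)

# Assumed transaction sizes for the three styles described above.
styles = {
    "BananaKing": 1_000_000,  # few, unusually large transactions
    "Goldilocks": 50_000,     # valid, mid-sized transactions
    "GattlingGun": 500,       # ordinary transactions sent very rapidly
}

for name, size in styles.items():
    count = txs_to_fill(limits, size)
    print(f"{name}: {count} transactions of {size} bytes fill the mempool")
```

Under these assumed numbers, the large-transaction style hits the byte limit after only 100 transactions, while the small-transaction style has to reach the 5,000-transaction count limit, which is why it depends on a very high send rate.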

What are the on-chain effects?

In their most extreme versions, P2P Storms can cause Cosmos blockchains to halt completely. In one instance on the Cosmos Hub Replicated Security testnet, a BananaKing-style attack halted the chain for over 30 minutes. More typically, these network conditions result in severely degraded block production times, which can put user funds at risk on chains where liveness and safety are closely intertwined, such as those hosting leveraged DeFi protocols.

Going forward?

Although Jack Zampolin of StrangeLove Labs and Zarko Milosevic of Informal Systems have claimed that P2P storms are not a bug, our research indicates that they can have devastating impacts on blockchain performance if executed as an attack. Our full vulnerability report provides a more complete history, description, and data analysis of P2P Storms, but the good news is that our efforts over the past month have been paying off. With Proposals 833, 834, and 835, we have introduced several parameter changes that help to mitigate the effect of storms on the Cosmos Hub. We are encouraged to see other Cosmos chains adopting similar changes as well. In Pull Request 2800, we are also working on some commits that can prevent consensus failures during P2P storms, and in PR 4917 in IBC-go we are trying to fix the issues that lead to BananaKing-style transactions. Due to the complexity of the issue, a complete solution has not yet been identified and research is still ongoing.
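As a rough illustration of the kind of check that addresses BananaKing-style transactions, the sketch below rejects a transfer whose receiver or memo field is too long. This is not the code from the IBC-go pull request: the function name and both limits are assumptions made for this example.

```python
# Assumed limits in bytes, for illustration only.
MAX_RECEIVER_LEN = 2048
MAX_MEMO_LEN = 32768


def validate_transfer_fields(receiver: str, memo: str) -> None:
    """Raise ValueError if either field exceeds its assumed byte limit."""
    if len(receiver.encode("utf-8")) > MAX_RECEIVER_LEN:
        raise ValueError("receiver field exceeds maximum length")
    if len(memo.encode("utf-8")) > MAX_MEMO_LEN:
        raise ValueError("memo field exceeds maximum length")


# A normal transfer passes silently; an oversized memo is rejected.
validate_transfer_fields("cosmos1example", "thanks!")
try:
    validate_transfer_fields("cosmos1example", "x" * 1_000_000)
except ValueError as err:
    print(f"rejected: {err}")
```

Rejecting oversized fields before a transaction is accepted means an attacker can no longer fill the mempool with only a handful of transactions.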

Validator Keys

Our circumstances in the past weeks have revealed a deep weakness within the Cosmos. To understand this weakness, and how we intend to begin contributing to its repair, we need a short review of how validator nodes work.

Validator nodes, unlike normal accounts, have two pairs of private and public keys. The first is the consensus key, which is used to sign blocks during the CometBFT consensus rounds. The second is the valoper key (validator + operator = valoper), which is used to sign normal transactions such as claiming rewards, bonding, unbonding, and so on. For validators to maintain proper security practices, both of these key pairs should be alterable through a process called key rotation, but at present, in Cosmos, this process does not exist for either pair of keys.

For those unfamiliar, key rotation is the process by which the keys for an account are changed, and it has a few basic steps:

  1. A new public-private key pair is generated.
  2. The validator's configuration is updated to use the new key pair.
  3. A key rotation transaction is created and executed by the chain.
  4. On completion of the transaction, all references to the old key are replaced by the new key.
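The steps above can be sketched with a toy model. Everything here is hypothetical: `ToyChain`, `rotate_key`, and the random hex strings standing in for keys are illustrations only, and none of it reflects the actual SDK design or uses real cryptography.

```python
import secrets
from dataclasses import dataclass, field


def generate_key_pair() -> tuple[str, str]:
    """Stand-in for real key generation: random hex strings, NOT real cryptography."""
    private = secrets.token_hex(32)
    public = "pub-" + secrets.token_hex(16)
    return private, public


@dataclass
class ToyChain:
    """Hypothetical chain state mapping a validator name to its public key."""
    validator_keys: dict[str, str] = field(default_factory=dict)

    def rotate_key(self, validator: str, old_pub: str, new_pub: str) -> None:
        # Steps 3 and 4: execute the rotation and replace the old key reference.
        if self.validator_keys.get(validator) != old_pub:
            raise ValueError("old key does not match chain state")
        self.validator_keys[validator] = new_pub


chain = ToyChain()
_, old_pub = generate_key_pair()
chain.validator_keys["val-1"] = old_pub       # state before rotation

new_priv, new_pub = generate_key_pair()       # step 1: generate a new pair
node_config = {"signing_key": new_priv}       # step 2: point the node at it
chain.rotate_key("val-1", old_pub, new_pub)   # steps 3-4: rotate on chain

print(chain.validator_keys["val-1"] == new_pub)
```

The point of the model is that the validator's identity (`"val-1"`) survives while the key behind it changes, so a key that may have been exposed can be retired without abandoning the validator.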

There is good news, though! The rotation of consensus keys is already being worked on and should be available in SDK version 0.51. Since it remains unclear when exactly Cosmos blockchains might start upgrading to this version, we may need to speed up its availability by back-porting it to older versions of the SDK once we deem it safe.

Rotation of the valoper keys, however, is not currently being worked on by any team in Cosmos, even though it is another essential piece of proper validator risk management. We intend to make its development a key focus under our Prop 104 mandate, so that the entirety of the Cosmos may benefit from greater security for its validators.

Conclusion

We believe in a safe, transparent, and flourishing Cosmos, and this most recent sequence of events will not deter us from contributing to it. We continue to view P2P storms as a pressing matter for the Cosmos and will continue our research into how to best temper the effects of these network conditions. And we are committed to helping other validators in the Cosmos avoid the issues we have recently faced by contributing to the development of key-rotation tools. As always, live tracking of our activity on the Cosmos Hub can be found here.