Analysis of price feed data requests #2071

drcpu-github · 2021-09-15T20:55:48Z

drcpu-github
Sep 15, 2021
Collaborator

While most people that read below post will be familiar with the terminology I am using, I want to first preface it with a small introduction to make sure everyone can follow it. The Witnet oracle operates using a commit-reveal scheme powered by crowd-attestation. Essentially this means a "random" number of nodes is (deterministically) selected to solve the data requests being sent into the network. When a node is eligble to solve a data request, it will first publishes a commit transaction and, one or more epochs later, publish the result referenced by this commit on the network. The miners in the network will then validate the revealed values using the rules described in the data request and one miner will include the final result into a block. Whenever a reveal is in accordance with these rules, the node is marked as being honest and is rewarded. However, when a reveal does not adhere to these rules, it is marked as a liar and the node that published this result will lose its collateral.

Recently I noticed that a significant part of the data requests for the price feeds maintained by the Witnet Foundation are composed of many reveals which are marked as liars. All of these data requests use a standard deviation filter of 1.5, but the number of witnesses is either 10 or 100 depending on the request.

I looked at some data requests in detail and one example of a data request that stood out is this one. The final revealed value is 2580284 (which is revealed by 8/10 witnesses). It also contains two liars which reveal the value 2580172. Technically, these two reveals should be marked as liars as they exceed the standard deviation times 1.5 (it's approximately 2.5x away from the average). However, it seems rather ridiculous to mark a reveal which deviates from the average by 0,004% as a liar.

I also looked at two of my nodes (running on the same VPS) and I noticed that while the revealed values are often the same, sometimes the reported prices are slightly different, even if they are fetched in the exact same second (as printed by the debug log). This means the can most likely be attributed to an almost simultaneous price update by the external API.

I then analyzed the data requests from the last 10-ish days and will summarize some numbers below.

Data requests analyzed: 9257
Perfect data requests: 1797
Data requests with liars: 7460
Data requests with only non-revealers: 635
Data requests with >10% liars: 2198

In total, I found 9.2k price feed data requests of which 1.8k did not contain a single liar. The overwhelming majority of 7.5k, however, did. I was specifically intrigued by the last category. In almost 25% of the data requests more than 10% of all witnesses are marked as liars. This seems excessive. Note that I have filtered all non-revealers from the list of reveals before building this category. Essentially, this means that the data requests in this category do indeed exclusively contain more than 10% 'wrong prices'.

In total, there are 167.967 reveals in this last category of which 28.839 (17.2%) are marked as lies. Below table summarizes how far they are from the average revealed value (note that this is different from the final tallied average value):

stdev times	amount
< 2.0	14373
< 2.5	9885
< 3.0	3648
>>>	933

From this table we can see that almost half of all liars would be gone if we take a standard deviation of 2. Maybe 2.5 is even better as that would reduce the amount of liars to only 2.7%.

Of course, standard deviation in itself does not show the complete picture. If the average standard deviation is big, we may indeed want filter using a strict 1.5x as to prevent price feed manipulation. Therefore, I also compared the difference in percent between the prices marked as liars and the final tallied price.

difference (%)	amount
< 0.01%	4228
< 0.03%	8479
< 0.05%	5350
< 0.1%	6507
< 0.2%	3288
>>>	987

Now, as evidenced by above table, the difference in reported price is quite often extremely small. More than half of all reveals marked as a liar differ by less than 0.05% from the final tally value. Linking back to my manual comparison of two nodes, this is most likely just normal price volatility and the fact that some nodes fetch price data from an external API a little bit later (but well within the default 2s data request solving window). Given the extremely small differences, the final reported value would hardly change if there would be less liars. 10% more witnesses being included in the average price calculation are not going to noticeably change the final reported value taking into account that they are often reporting a price that is less than 0.05% different.

In my opinion, we really should not be punishing nodes that are revealing a price that is so close to what other nodes are reporting. Obviously, there is no perfect solution. There will always be a (small) number of nodes that report a (small) outlier which will get caught by the standard deviation (percentage filters would really help here). However, I'd say that the default price feeds operated by the Witnet Foundation should really aim to reduce the number of liars. Unneccessarily punishing so many nodes should be avoided. My recommendation for now would be that a standard deviation of 2 to 2.5 is perfectly acceptable. What are your thoughts on this?

stiucsib86 · 2021-10-01T01:20:34Z

stiucsib86
Oct 1, 2021

In my opinion, we really should not be punishing nodes that are revealing a price that is so close to what other nodes are reporting.
Unneccessarily punishing so many nodes should be avoided.

Exactly my pain point. My node has been going no where and kept getting slashed (heavily) whenever the reputation slowly building up. Hope this can be considered and some form of improvement can be implemented soon.

0 replies

aesedepece · 2021-10-01T09:39:24Z

aesedepece
Oct 1, 2021
Maintainer

There are already two initiatives being worked on that will help mitigating this effect:

Next time that the price feeds will be updated, they'll be using much wider filters, in line with @drcpu-github's analysis and recommendations.
Next release of witnet-rust will handle timeouts more consistently, which should theoretically avoid some cases of unfair slashing (PR).

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Analysis of price feed data requests #2071

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 2 comments

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Select a reply

Analysis of price feed data requests #2071

drcpu-github Sep 15, 2021 Collaborator

Replies: 2 comments

stiucsib86 Oct 1, 2021

aesedepece Oct 1, 2021 Maintainer

drcpu-github
Sep 15, 2021
Collaborator

stiucsib86
Oct 1, 2021

aesedepece
Oct 1, 2021
Maintainer