started on script changes

rserranon · Nov 6, 2018 · 82deada
1 parent a0e98a9
commit 82deada

Showing 5 changed files with 132 additions and 80 deletions.
diff --git a/ch06.asciidoc b/ch06.asciidoc
@@ -8,7 +8,7 @@
 [.lead]
 The ability to lock and unlock coins is the mechanism by which we transfer Bitcoin. Locking is leaving some Bitcoins to someone else. Unlocking is spending some Bitcoins that have been left for you.
 
-In this chapter we examine this locking/unlocking mechanism, which is often called a smart contract. Script is what combines what's covered in the first part of the book. Script is the glue that makes transactions work with digital signatures. Script essentially allows people to be able to prove that they have the right to spend certain outputs. We're getting a little ahead of ourselves, though, so let's start with how Script works and go from there.
+In this chapter we examine this locking/unlocking mechanism, which is often called a smart contract. Script is what allows the elliptic curve cryptography (Chapter 3) to be evaluated within the transaction (Chapter 5). Script is the glue that makes transactions work with digital signatures. Script essentially lets people prove that they have the right to spend certain outputs. We're getting a little ahead of ourselves, though, so let's start with how Script works and go from there.
 
 === Mechanics of Script
 
@@ -21,7 +21,7 @@ Bitcoin has the digital equivalent of a contract in Script. Script is a limited
 ====
 Turing completeness in a programming language essentially means that you have the ability to do loops. Loops are a useful construct in programming, so you may be wondering at this point why Script doesn't allow for loops.
 
-There are a lot of reasons for this, but let's start with program execution. Anyone can create a Script program that every full node on the network executes this program. If Script were Turing Complete, it's possible for the loop to go on executing forever. This would essentially cause every full node to enter and never leave that loop and would thus be an easy way to attack the network. A single script that has an infinite loop could take down Bitcoin! This would not be good, for obvious reasons and would be a large systematic vulnerabilty. Ethereum, which has Turing Completeness in its smart contract language, Solidity, handles this problem by forcing contracts to pay for program execution with something called "gas". An infinite loop will exhaust whatever gas is in the contract as by definition, it will run an infinite number of times.
+There are a lot of reasons for this, but let's start with program execution. Anyone can create a Script program, and every full node on the network executes it. If Script were Turing Complete, it would be possible for a loop to go on executing forever. This would essentially cause every full node to enter and never leave that loop and would thus be an easy way to attack the network through what would be called a Denial of Service attack (DoS). A single script that has an infinite loop could take down Bitcoin! This would not be good, for obvious reasons, and would be a large systemic vulnerability. Ethereum, which has Turing Completeness in its smart contract language, Solidity, handles this problem by forcing contracts to pay for program execution with something called "gas". An infinite loop will exhaust whatever gas is in the contract because, by definition, it would run an infinite number of times.
 
 There are other reasons to avoid Turing completeness. Smart contracts with Turing completeness are very difficult to analyze. A Turing Complete smart contract's execution conditions are very difficult to enumerate, so it's easy to create unintended behavior, causing bugs. Bugs in a smart contract mean that it's vulnerable to being unintentionally spent, which means the contract writer would lose money.
 ====
@@ -49,11 +49,11 @@ A typical operation might be something like OP_DUP, which will duplicate the top
 .OP_DUP duplicates the top element
 image::op_dup.png[OP_DUP]
 
-At the end of processing all the items in the stack, the top element of the stack must be non-zero for the script to execute successfully. Having no elements on the stack or having the top element be zero would result in a failed execution. Failed execution generally means that the transaction which includes the unlocking script is invalid and not accepted on the network.
+At the end of processing all the instructions, the top element of the stack must be non-zero for the script to execute successfully. Having no elements on the stack or having the top element be zero would result in a failed execution. Failed execution generally means that the transaction which includes the unlocking script is invalid and not accepted on the network.
 
 === Example Operations
 
-There are many other operations besides OP_DUP. OP_HASH160 does a sha256 followed by a ripemd160 to the top element of the stack (consuming 1) and putting a new element back (putting 1). Note in the diagram that y = ripemd160(sha256(x))
+There are many other operations besides OP_DUP. OP_HASH160 applies sha256 followed by ripemd160 to the top element of the stack (consuming 1) and puts the resulting element back (putting 1). Note in the diagram that y = ripemd160(sha256(x)).
 
 .OP_HASH160 does a SHA256 followed by RIPEMD160 to the top element
 image::op_hash160.png[OP_HASH160]
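Review note: the success rule stated in this hunk (after all instructions are processed, the stack must be non-empty and its top element non-zero) can be sketched as below. `op_dup` follows the chapter text; `evaluate` is a hypothetical helper written only for illustration, and treating the empty byte string `b''` as the sole encoding of zero is a simplifying assumption, not necessarily how the book's evaluator works.

```python
def op_dup(stack):
    # Duplicate the top element; fail if there is nothing to duplicate.
    if len(stack) < 1:
        return False
    stack.append(stack[-1])
    return True


def evaluate(instructions):
    """Hypothetical evaluator: callables are opcodes, bytes are elements."""
    stack = []
    for instruction in instructions:
        if callable(instruction):
            if not instruction(stack):
                return False  # the opcode itself failed
        else:
            stack.append(instruction)  # elements are pushed as-is
    # Success requires a non-empty stack whose top element is non-zero.
    # Assumption: zero is represented only by the empty byte string b''.
    return len(stack) > 0 and stack[-1] != b''
```

With this sketch, `evaluate([b'\x01', op_dup])` succeeds, while an empty script, a script that leaves `b''` on top, or an `op_dup` on an empty stack all fail.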
@@ -63,18 +63,60 @@ Another very important operation is OP_CHECKSIG. OP_CHECKSIG consumes 2 elements
 .OP_CHECKSIG checks if the signature for the pubkey is valid or not
 image::op_checksig.png[OP_CHECKSIG]
 
+==== Coding opcodes
+
+Given this, we can now code OP_DUP. This opcode simply duplicates the top element of the stack.
+
+[source,python]
+----
+def op_dup(stack):
+    if len(stack) < 1:  # <1>
+        return False
+    stack.append(stack[-1])  # <2>
+    return True
+----
+<1> We need at least one element to duplicate; otherwise, we can't execute this opcode.
+<2> This is how we duplicate the top element of the stack.
+
+Note that we return a boolean with this opcode, as a way to tell whether the operation was successful.
+
+Here's another one for OP_HASH256. This opcode will consume the top element, perform a hash256 operation on it and put the result on the stack.
+
+
+[source,python]
+----
+def op_hash256(stack):
+    if len(stack) < 1:
+        return False
+    element = stack.pop()
+    stack.append(hash256(element))
+    return True
+----
+
+==== Exercise {counter:exercise}
+
+Write the `op_hash160` function.
+
 === Parsing the script fields
 
-Both ScriptPubKey and ScriptSig are parsed the same way. If the byte is between 0x01 and 0x4B (which we call n), we read the next n bytes as an element. Otherwise, the byte represents an operation, which we have to look up. Here are some operations and their byte codes:
+Both ScriptPubKey and ScriptSig are parsed the same way. If the byte is between 0x01 and 0x4b (which we call n), we read the next n bytes as an element. Otherwise, the byte represents an operation, which we have to look up. Here are some operations and their byte codes:
 
 * 0x00 - OP_0
 * 0x51 - OP_1
-* 0x5F - OP_15
+* 0x60 - OP_16
 * 0x76 - OP_DUP
 * 0x93 - OP_ADD
 * 0xa9 - OP_HASH160
 * 0xac - OP_CHECKSIG
 
+[NOTE]
+.Longer than 75-byte elements
+====
+You might be wondering what would happen if you had an element that's longer than 0x4b (75 in decimal) bytes. There are 3 specific opcodes for this, namely, OP_PUSHDATA1, OP_PUSHDATA2 and OP_PUSHDATA4. OP_PUSHDATA1 means that the next byte contains how many bytes we need to read for the element. OP_PUSHDATA2 means that the next 2 bytes contain how many bytes we need to read for the element. OP_PUSHDATA4 means that the next 4 bytes contain how many bytes we need to read for the element.
+
+Practically speaking, this means if we have an element that's between 76 and 255 bytes inclusive, we use OP_PUSHDATA1, length of the element, element. For anything between 256 bytes and 520 bytes inclusive, we use OP_PUSHDATA2. Anything larger than 520 bytes is actually not allowed by consensus, so OP_PUSHDATA4 is unnecessary.
+====
+
 There are many more and the full list can be found at http://wiki.bitcoin.it
 
 ==== Coding a Script parser and serializer
@@ -85,53 +127,87 @@ Given this rule, we can write a very basic parser. We assume that we have some l
 ----
 class Script:
 
-    def __init__(self, items):
-        self.items = items  # <1>
-
-    def __repr__(self):
-        result = ''
-        for item in self.items:
-            if type(item) == int:
-                result += '{} '.format(OP_CODES[item])
-            else:
-                result += '{} '.format(item.hex())
-        return result
+    def __init__(self, instructions):
+        self.instructions = instructions  # <1>
 
     @classmethod
     def parse(cls, s):
-        length = read_varint(s)
-        items = []
+        length = read_varint(s)  # <2>
+        instructions = []
         count = 0
-        while count < length:
-            current = s.read(1)
+        while count < length:  # <3>
+            current = s.read(1)  # <4>
             count += 1
-            current_byte = current[0]
-            if current_byte >= 1 and current_byte <= 75:  # <2>
+            current_byte = current[0]  # <5>
+            if current_byte >= 1 and current_byte <= 75:  # <6>
                 n = current_byte
-                items.append(s.read(n))
+                instructions.append(s.read(n))
                 count += n
-            else:
+            elif current_byte == 76:  # <7>
+                data_length = little_endian_to_int(s.read(1))
+                instructions.append(s.read(data_length))
+                count += data_length + 1
+            elif current_byte == 77:  # <8>
+                data_length = little_endian_to_int(s.read(2))
+                instructions.append(s.read(data_length))
+                count += data_length + 2
+            else:  # <9>
+                # we have an op code. set the current byte to op_code
                 op_code = current_byte
-                items.append(op_code)
-        return cls(items)
+                # add the op_code to the list of instructions
+                instructions.append(op_code)
+        if count != length:  # <10>
+            raise SyntaxError('parsing script failed')
+        return cls(instructions)
+----
+<1> Each instruction is either an opcode to be executed or an element to be pushed onto the stack.
+<2> We get the length of the entire script.
+<3> We need to go until the right number of bytes are consumed.
+<4> The byte determines if we have an opcode or element.
+<5> This converts the byte into an integer in Python.
+<6> For a number between 1 and 75 inclusive, we know the next n bytes are an element.
+<7> 76 is OP_PUSHDATA1, so the next byte tells us how many bytes to read.
+<8> 77 is OP_PUSHDATA2, so the next two bytes tell us how many bytes to read.
+<9> We have an opcode that we store.
+<10> Script should have consumed exactly the length of bytes we expected, otherwise we raise an error.
+
+We can similarly write a very basic serializer.
 
-    def serialize(self):
+[source,python]
+----
+class Script:
+...
+    def raw_serialize(self):
         result = b''
-        for item in self.items:
-            if type(item) == int:
-                result += int_to_little_endian(item, 1)
+        for instruction in self.instructions:
+            if type(instruction) == int:  # <1>
+                result += int_to_little_endian(instruction, 1)
             else:
-                length = len(item)
-                prefix = int_to_little_endian(length, 1)
-                result += prefix + item
-        total = len(result)
-        return encode_varint(total) + result
-
+                length = len(instruction)
+                if length <= 75:  # <2>
+                    result += int_to_little_endian(length, 1)
+                elif length > 75 and length < 0x100:  # <3>
+                    result += int_to_little_endian(76, 1)
+                    result += int_to_little_endian(length, 1)
+                elif length >= 0x100 and length <= 520:  # <4>
+                    result += int_to_little_endian(77, 1)
+                    result += int_to_little_endian(length, 2)
+                else:  # <5>
+                    raise ValueError('too long an instruction')
+                result += instruction
+        return result
 
-OP_CODES = {...}
+    def serialize(self):
+        result = self.raw_serialize()
+        total = len(result)
+        return encode_varint(total) + result  # <6>
 ----
-<1> The `items` attribute is a list of items in this script. p2pkh (later in this chapter), would be OP_DUP, OP_HASH160, 20-byte hash, OP_EQUALVERIFY, OP_CHECKSIG, or 5 items.
-<2> If the byte is between 1 and 75 inclusive, we have an element.
+<1> If the instruction is an integer, we know that's an opcode.
+<2> If the length is between 1 and 75 inclusive, we just encode the length as a single byte.
+<3> For anything from 76 to 255, we put OP_PUSHDATA1 first, and then encode the length as a single byte.
+<4> For anything from 256 to 520, we put OP_PUSHDATA2 first, and then encode the length as two bytes in little endian.
+<5> Any element longer than 520 bytes cannot be serialized.
+<6> We prepend with the length of the entire script.
 
 === Combining the script fields
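Review note: the parsing rules added in this file can be exercised with a self-contained sketch. `parse_instructions` mirrors the body of `Script.parse` as a plain function; `little_endian_to_int` and `read_varint` are stand-ins written here on the assumption that the book's helpers behave like this, and the 25-byte p2pkh-shaped input is made-up sample data.

```python
from io import BytesIO


def little_endian_to_int(b):
    # Assumed equivalent of the book's helper of the same name.
    return int.from_bytes(b, 'little')


def read_varint(s):
    # Assumed helper: Bitcoin's variable-length integer encoding.
    i = s.read(1)[0]
    if i == 0xfd:
        return little_endian_to_int(s.read(2))
    if i == 0xfe:
        return little_endian_to_int(s.read(4))
    if i == 0xff:
        return little_endian_to_int(s.read(8))
    return i


def parse_instructions(s):
    length = read_varint(s)  # length of the entire script
    instructions = []
    count = 0
    while count < length:
        current_byte = s.read(1)[0]
        count += 1
        if 1 <= current_byte <= 75:
            # The byte itself is the element length.
            instructions.append(s.read(current_byte))
            count += current_byte
        elif current_byte == 76:  # OP_PUSHDATA1
            data_length = little_endian_to_int(s.read(1))
            instructions.append(s.read(data_length))
            count += data_length + 1
        elif current_byte == 77:  # OP_PUSHDATA2
            data_length = little_endian_to_int(s.read(2))
            instructions.append(s.read(data_length))
            count += data_length + 2
        else:
            instructions.append(current_byte)  # an opcode
    if count != length:
        raise SyntaxError('parsing script failed')
    return instructions


# OP_DUP OP_HASH160 <20-byte element> OP_EQUALVERIFY OP_CHECKSIG
raw = bytes([25, 0x76, 0xa9, 20]) + b'\x11' * 20 + bytes([0x88, 0xac])
instructions = parse_instructions(BytesIO(raw))
```

Here `instructions` holds three integer opcodes around one 20-byte element. A length prefix that disagrees with the bytes actually consumed raises `SyntaxError`.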
 

diff --git a/code-ch05/op.py b/code-ch05/op.py
@@ -1,5 +1,10 @@
 import hashlib
 
+from helper import (
+    hash160,
+    hash256,
+)
+
 
 def encode_num(num):
     if num == 0:
@@ -624,19 +629,14 @@ def op_sha256(stack):
 
 
 def op_hash160(stack):
-    if len(stack) < 1:
-        return False
-    element = stack.pop()
-    h160 = hashlib.new('ripemd160', hashlib.sha256(element).digest()).digest()
-    stack.append(h160)
-    return True
+    raise NotImplementedError
 
 
 def op_hash256(stack):
     if len(stack) < 1:
         return False
     element = stack.pop()
-    stack.append(hashlib.sha256(hashlib.sha256(element).digest()).digest())
+    stack.append(hash256(element))
     return True
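Review note: this hunk replaces the inline double SHA-256 with `hash256` imported from `helper`. The helper itself is not part of this diff, so the sketch below only assumes that it is equivalent to the expression it replaces; it is not the actual `helper.py` source.

```python
import hashlib


def hash256(s):
    """Two rounds of SHA-256 (assumed to match helper.hash256)."""
    return hashlib.sha256(hashlib.sha256(s).digest()).digest()


def op_hash256(stack):
    # Consume the top element and push its hash256 in its place.
    if len(stack) < 1:
        return False
    element = stack.pop()
    stack.append(hash256(element))
    return True
```

Under that assumption, the refactored `op_hash256` leaves exactly the same 32-byte value on the stack as the removed line did.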
 
 

diff --git a/code-ch05/script.py b/code-ch05/script.py
@@ -29,9 +29,6 @@ def __repr__(self):
                 result += '{} '.format(instruction.hex())
         return result
 
-    def __add__(self, other):
-        return Script(self.instructions + other.instructions)
-
     @classmethod
     def parse(cls, s):
         # get the length of the entire field
@@ -66,11 +63,6 @@ def parse(cls, s):
                 data_length = little_endian_to_int(s.read(2))
                 instructions.append(s.read(data_length))
                 count += data_length + 2
-            elif current_byte == 78:
-                # op_pushdata4
-                data_length = little_endian_to_int(s.read(4))
-                instructions.append(s.read(data_length))
-                count += data_length + 4
             else:
                 # we have an op code. set the current byte to op_code
                 op_code = current_byte
@@ -101,14 +93,10 @@ def raw_serialize(self):
                     # 76 is pushdata1
                     result += int_to_little_endian(76, 1)
                     result += int_to_little_endian(length, 1)
-                elif length >= 0x100 and length < 0x10000:
+                elif length >= 0x100 and length <= 520:
                     # 77 is pushdata 2
                     result += int_to_little_endian(77, 1)
                     result += int_to_little_endian(length, 2)
-                elif length >= 0x10000 and length < 0x100000000:
-                    # 78 is pushdata 4
-                    result += int_to_little_endian(78, 1)
-                    result += int_to_little_endian(length, 4)
                 else:
                     raise ValueError('too long an instruction')
                 result += instruction
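Review note: with OP_PUSHDATA4 removed and the upper bound set to 520, the length-prefix logic of `raw_serialize` has four outcomes. The sketch below isolates it in a hypothetical `push_prefix` function (not a name from the repository). It uses `<= 75` for the first branch so that a 75-byte element gets the single-byte form the chapter text describes; the first branch of the actual file is not visible in this hunk.

```python
def push_prefix(length):
    """Return the bytes written before an element of `length` bytes."""
    if length <= 75:
        # The length byte doubles as the push instruction.
        return length.to_bytes(1, 'little')
    if length < 0x100:
        # 76 is OP_PUSHDATA1: one length byte follows.
        return bytes([76]) + length.to_bytes(1, 'little')
    if length <= 520:
        # 77 is OP_PUSHDATA2: two little-endian length bytes follow.
        return bytes([77]) + length.to_bytes(2, 'little')
    # Elements longer than 520 bytes cannot be serialized.
    raise ValueError('too long an instruction')
```

The boundary values worth checking are 75, 76, 255, 256, 520 and 521, since each sits on an edge between two branches.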

diff --git a/code-ch06/op.py b/code-ch06/op.py
@@ -1,5 +1,10 @@
 import hashlib
 
+from helper import (
+    hash160,
+    hash256,
+)
+
 
 def encode_num(num):
     if num == 0:
@@ -624,19 +629,14 @@ def op_sha256(stack):
 
 
 def op_hash160(stack):
-    if len(stack) < 1:
-        return False
-    element = stack.pop()
-    h160 = hashlib.new('ripemd160', hashlib.sha256(element).digest()).digest()
-    stack.append(h160)
-    return True
+    raise NotImplementedError
 
 
 def op_hash256(stack):
     if len(stack) < 1:
         return False
     element = stack.pop()
-    stack.append(hashlib.sha256(hashlib.sha256(element).digest()).digest())
+    stack.append(hash256(element))
     return True
 
 

diff --git a/code-ch06/script.py b/code-ch06/script.py
@@ -29,9 +29,6 @@ def __repr__(self):
                 result += '{} '.format(instruction.hex())
         return result
 
-    def __add__(self, other):
-        return Script(self.instructions + other.instructions)
-
     @classmethod
     def parse(cls, s):
         # get the length of the entire field
@@ -66,11 +63,6 @@ def parse(cls, s):
                 data_length = little_endian_to_int(s.read(2))
                 instructions.append(s.read(data_length))
                 count += data_length + 2
-            elif current_byte == 78:
-                # op_pushdata4
-                data_length = little_endian_to_int(s.read(4))
-                instructions.append(s.read(data_length))
-                count += data_length + 4
             else:
                 # we have an op code. set the current byte to op_code
                 op_code = current_byte
@@ -101,14 +93,10 @@ def raw_serialize(self):
                     # 76 is pushdata1
                     result += int_to_little_endian(76, 1)
                     result += int_to_little_endian(length, 1)
-                elif length >= 0x100 and length < 0x10000:
+                elif length >= 0x100 and length <= 520:
                     # 77 is pushdata 2
                     result += int_to_little_endian(77, 1)
                     result += int_to_little_endian(length, 2)
-                elif length >= 0x10000 and length < 0x100000000:
-                    # 78 is pushdata 4
-                    result += int_to_little_endian(78, 1)
-                    result += int_to_little_endian(length, 4)
                 else:
                     raise ValueError('too long an instruction')
                 result += instruction