started on script changes
jimmysong committed Nov 6, 2018
1 parent a0e98a9 commit 82deada
Showing 5 changed files with 132 additions and 80 deletions.
156 changes: 116 additions & 40 deletions ch06.asciidoc
@@ -8,7 +8,7 @@
[.lead]
The ability to lock and unlock coins is the mechanism by which we transfer Bitcoin. Locking is leaving some Bitcoins to someone else. Unlocking is spending some Bitcoins that have been left for you.

In this chapter we examine this locking/unlocking mechanism, which is often called a smart contract. Script is what combines what's covered in the first part of the book. Script is the glue that makes transactions work with digital signatures. Script essentially allows people to be able to prove that they have the right to spend certain outputs. We're getting a little ahead of ourselves, though, so let's start with how Script works and go from there.
In this chapter we examine this locking/unlocking mechanism, which is often called a smart contract. Script is what allows the elliptic curve cryptography (Chapter 3) to be evaluated within the transaction (Chapter 5). Script is the glue that makes transactions work with digital signatures. Script essentially allows people to be able to prove that they have the right to spend certain outputs. We're getting a little ahead of ourselves, though, so let's start with how Script works and go from there.

=== Mechanics of Script

@@ -21,7 +21,7 @@ Bitcoin has the digital equivalent of a contract in Script. Script is a limited
====
Turing completeness in a programming language essentially means that you have the ability to do loops. Loops are a useful construct in programming, so you may be wondering at this point why Script doesn't allow for loops.
There are a lot of reasons for this, but let's start with program execution. Anyone can create a Script program that every full node on the network executes. If Script were Turing Complete, it would be possible for a loop to go on executing forever. This would essentially cause every full node to enter and never leave that loop and would thus be an easy way to attack the network. A single script that has an infinite loop could take down Bitcoin! This would not be good, for obvious reasons, and would be a large systemic vulnerability. Ethereum, which has Turing Completeness in its smart contract language, Solidity, handles this problem by forcing contracts to pay for program execution with something called "gas". An infinite loop will exhaust whatever gas is in the contract as, by definition, it will run an infinite number of times.
There are a lot of reasons for this, but let's start with program execution. Anyone can create a Script program that every full node on the network executes. If Script were Turing Complete, it would be possible for a loop to go on executing forever. This would essentially cause every full node to enter and never leave that loop and would thus be an easy way to attack the network through what would be called a Denial of Service attack (DoS). A single script that has an infinite loop could take down Bitcoin! This would not be good, for obvious reasons, and would be a large systemic vulnerability. Ethereum, which has Turing Completeness in its smart contract language, Solidity, handles this problem by forcing contracts to pay for program execution with something called "gas". An infinite loop will exhaust whatever gas is in the contract as, by definition, it will run an infinite number of times.
There are other reasons to avoid Turing Completeness and that's because smart contracts with Turing completeness are very difficult to analyze. A Turing Complete smart contract's execution conditions are very difficult to enumerate and thus easy to create behavior that's unintended, causing bugs. Bugs in a smart contract mean that it's vulnerable to being unintentionally spent, which means the contract writer would lose money.
====
@@ -49,11 +49,11 @@ A typical operation might be something like OP_DUP, which will duplicate the top
.OP_DUP duplicates the top element
image::op_dup.png[OP_DUP]

At the end of processing all the items in the stack, the top element of the stack must be non-zero for the script to execute successfully. Having no elements on the stack or having the top element be zero would result in a failed execution. Failed execution generally means that the transaction which includes the unlocking script is invalid and not accepted on the network.
At the end of processing all the instructions, the top element of the stack must be non-zero for the script to execute successfully. Having no elements on the stack or having the top element be zero would result in a failed execution. Failed execution generally means that the transaction which includes the unlocking script is invalid and not accepted on the network.
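
The success rule can be sketched in a few lines. This is an illustration only, not code from the chapter: `is_truthy` and `script_succeeded` are hypothetical names, and the sketch assumes stack elements are `bytes`, where an empty or all-zero element (including "negative zero", with only the sign bit of the last byte set) counts as zero.

```python
def is_truthy(element):
    # Hypothetical helper: any non-zero byte makes the element non-zero,
    # except a lone sign bit (0x80) in the last byte, which is negative zero.
    for i, byte in enumerate(element):
        if byte != 0:
            if i == len(element) - 1 and byte == 0x80:
                return False
            return True
    return False


def script_succeeded(stack):
    # Hypothetical helper: an empty stack or a zero top element is a failure.
    return len(stack) > 0 and is_truthy(stack[-1])


empty_result = script_succeeded([])
zero_result = script_succeeded([b'\x01', b''])
one_result = script_succeeded([b'', b'\x01'])
```

Only the top element matters here: `zero_result` is a failure even though a non-zero element sits below the top.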

=== Example Operations

There are many other operations besides OP_DUP. OP_HASH160 does a sha256 followed by a ripemd160 to the top element of the stack (consuming 1) and putting a new element back (putting 1). Note in the diagram that y = ripemd160(sha256(x))
There are many other operations besides OP_DUP. OP_HASH160 does a sha256 followed by a ripemd160 to the top element of the stack (consuming 1) and putting a new element back (putting 1). Note in the diagram that y = ripemd160(sha256(x)).

.OP_HASH160 does a SHA256 followed by RIPEMD160 to the top element
image::op_hash160.png[OP_HASH160]
@@ -63,18 +63,60 @@ Another very important operation is OP_CHECKSIG. OP_CHECKSIG consumes 2 elements
.OP_CHECKSIG checks if the signature for the pubkey is valid or not
image::op_checksig.png[OP_CHECKSIG]

==== Coding opcodes

Given this, we can now code OP_DUP. This opcode simply duplicates the top element of the stack.

[source,python]
----
def op_dup(stack):
    if len(stack) < 1: # <1>
        return False
    stack.append(stack[-1]) # <2>
    return True
----
<1> We have to have at least one element to duplicate, otherwise, we can't execute this opcode.
<2> This is how we duplicate the top element of the stack.

Note that we return a boolean with this opcode, as a way to tell whether the operation was successful.
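
As a quick illustration of that return value, the sketch below repeats `op_dup` and calls it on a plain Python list standing in for the stack. The sample stack contents are made up for the example.

```python
def op_dup(stack):
    if len(stack) < 1:
        return False
    stack.append(stack[-1])
    return True


# A non-empty stack: the top element is duplicated in place.
stack = [b'\x01', b'\x02']
ok = op_dup(stack)

# An empty stack: nothing to duplicate, so the opcode reports failure.
empty_stack = []
failed = op_dup(empty_stack)
```

The opcode mutates the list it is given, so the caller only needs the boolean to decide whether to keep evaluating.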

Here's another one for OP_HASH256. This opcode will consume the top element, perform a hash256 operation on it and put the result on the stack.


[source,python]
----
def op_hash256(stack):
    if len(stack) < 1:
        return False
    element = stack.pop()
    stack.append(hash256(element))
    return True
----
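
`hash256` comes from the `helper` module and is not defined in this section. The op.py change in this commit swaps two rounds of SHA-256 for it, so the sketch below assumes that definition in order to be runnable on its own.

```python
import hashlib


def hash256(data):
    # Assumed definition: two rounds of SHA-256.
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def op_hash256(stack):
    if len(stack) < 1:
        return False
    element = stack.pop()
    stack.append(hash256(element))
    return True


stack = [b'hello world']
ok = op_hash256(stack)
```

The element is consumed and replaced, so the stack depth is unchanged and the new top element is always 32 bytes.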

==== Exercise {counter:exercise}

Write the `op_hash160` function.

=== Parsing the script fields

Both ScriptPubKey and ScriptSig are parsed the same way. If the byte is between 0x01 and 0x4B (which we call n), we read the next n bytes as an element. Otherwise, the byte represents an operation, which we have to look up. Here are some operations and their byte codes:
Both ScriptPubKey and ScriptSig are parsed the same way. If the byte is between 0x01 and 0x4b (which we call n), we read the next n bytes as an element. Otherwise, the byte represents an operation, which we have to look up. Here are some operations and their byte codes:

* 0x00 - OP_0
* 0x51 - OP_1
* 0x5F - OP_15
* 0x60 - OP_16
* 0x76 - OP_DUP
* 0x93 - OP_ADD
* 0xa9 - OP_HASH160
* 0xac - OP_CHECKSIG

[NOTE]
.Longer than 75-byte elements
====
You might be wondering what would happen if you had an element that's longer than 0x4b (75 in decimal) bytes. There are 3 specific opcodes for this, namely, OP_PUSHDATA1, OP_PUSHDATA2 and OP_PUSHDATA4. OP_PUSHDATA1 means that the next byte contains how many bytes we need to read for the element. OP_PUSHDATA2 means that the next 2 bytes contain how many bytes we need to read for the element. OP_PUSHDATA4 means that the next 4 bytes contain how many bytes we need to read for the element.
Practically speaking, this means if we have an element that's between 76 and 255 bytes inclusive, we use OP_PUSHDATA1, length of the element, element. For anything between 256 bytes and 520 bytes inclusive, we use OP_PUSHDATA2. Anything larger than 520 bytes is actually not allowed by consensus, so OP_PUSHDATA4 is unnecessary.
====
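
The length ranges in this note can be sketched as a small lookup. `push_style` is a hypothetical name used only for this illustration; it returns a label rather than serialized bytes.

```python
def push_style(length):
    # Hypothetical helper: name the push form used for an element of this length.
    if 1 <= length <= 75:
        return 'length byte'      # the length itself is the leading byte
    elif length <= 255:
        return 'OP_PUSHDATA1'     # one byte of length follows the opcode
    elif length <= 520:
        return 'OP_PUSHDATA2'     # two bytes of length follow the opcode
    raise ValueError('element length {} cannot be pushed'.format(length))


styles = [push_style(n) for n in (75, 76, 255, 256, 520)]
```

The boundaries worth checking are 75/76 and 255/256, since an off-by-one at either produces an element that cannot be parsed back.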

There are many more and the full list can be found at http://wiki.bitcoin.it

==== Coding a Script parser and serializer
@@ -85,53 +127,87 @@ Given this rule, we can write a very basic parser. We assume that we have some l
----
class Script:
    def __init__(self, items):
        self.items = items # <1>
    def __repr__(self):
        result = ''
        for item in self.items:
            if type(item) == int:
                result += '{} '.format(OP_CODES[item])
            else:
                result += '{} '.format(item.hex())
        return result
    def __init__(self, instructions):
        self.instructions = instructions # <1>
    @classmethod
    def parse(cls, s):
        length = read_varint(s)
        items = []
        length = read_varint(s) # <2>
        instructions = []
        count = 0
        while count < length:
            current = s.read(1)
        while count < length: # <3>
            current = s.read(1) # <4>
            count += 1
            current_byte = current[0]
            if current_byte >= 1 and current_byte <= 75: # <2>
            current_byte = current[0] # <5>
            if current_byte >= 1 and current_byte <= 75: # <6>
                n = current_byte
                items.append(s.read(n))
                instructions.append(s.read(n))
                count += n
            else:
            elif current_byte == 76: # <7>
                data_length = little_endian_to_int(s.read(1))
                instructions.append(s.read(data_length))
                count += data_length + 1
            elif current_byte == 77: # <8>
                data_length = little_endian_to_int(s.read(2))
                instructions.append(s.read(data_length))
                count += data_length + 2
            else: # <9>
                # we have an op code. set the current byte to op_code
                op_code = current_byte
                items.append(op_code)
        return cls(items)
                # add the op_code to the list of instructions
                instructions.append(op_code)
        if count != length: # <10>
            raise SyntaxError('parsing script failed')
        return cls(instructions)
----
<1> Each instruction is either an opcode to be executed or an element to be pushed onto the stack.
<2> We get the length of the entire script.
<3> We need to go until the right number of bytes has been consumed.
<4> The byte determines if we have an opcode or element.
<5> This converts the byte into an integer in Python.
<6> For a number between 1 and 75, we know the next n bytes are an element.
<7> 76 is OP_PUSHDATA1, so the next byte tells us how many bytes to read
<8> 77 is OP_PUSHDATA2, so the next two bytes tell us how many bytes to read
<9> We have an opcode that we store.
<10> Script should have consumed exactly the length of bytes we expected, otherwise we raise an error.
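
To try the parsing rules without the rest of the class, here is a standalone sketch. `parse_instructions` is a hypothetical function name, and `read_varint` and `little_endian_to_int` are minimal stand-ins written for this example, assumed to behave like the helpers the parser calls. The input is a made-up p2pkh-shaped script with a dummy 20-byte element.

```python
from io import BytesIO


def little_endian_to_int(b):
    # Stand-in helper: interpret bytes as a little-endian integer.
    return int.from_bytes(b, 'little')


def read_varint(s):
    # Stand-in helper: read a Bitcoin variable-length integer from a stream.
    i = s.read(1)[0]
    if i == 0xfd:
        return little_endian_to_int(s.read(2))
    elif i == 0xfe:
        return little_endian_to_int(s.read(4))
    elif i == 0xff:
        return little_endian_to_int(s.read(8))
    return i


def parse_instructions(s):
    length = read_varint(s)
    instructions = []
    count = 0
    while count < length:
        current_byte = s.read(1)[0]
        count += 1
        if 1 <= current_byte <= 75:
            # the byte is the element length
            instructions.append(s.read(current_byte))
            count += current_byte
        elif current_byte == 76:
            # OP_PUSHDATA1: one length byte follows
            data_length = little_endian_to_int(s.read(1))
            instructions.append(s.read(data_length))
            count += data_length + 1
        elif current_byte == 77:
            # OP_PUSHDATA2: two length bytes follow
            data_length = little_endian_to_int(s.read(2))
            instructions.append(s.read(data_length))
            count += data_length + 2
        else:
            # anything else is stored as an opcode
            instructions.append(current_byte)
    if count != length:
        raise SyntaxError('parsing script failed')
    return instructions


dummy_hash = bytes(range(20))
raw = bytes([25, 0x76, 0xa9, 20]) + dummy_hash + bytes([0x88, 0xac])
instructions = parse_instructions(BytesIO(raw))
```

Opcodes come back as integers and elements as `bytes`, which is the distinction the serializer relies on.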

We can similarly write a very basic serializer.

    def serialize(self):
[source,python]
----
class Script:
    ...
    def raw_serialize(self):
        result = b''
        for item in self.items:
            if type(item) == int:
                result += int_to_little_endian(item, 1)
        for instruction in self.instructions:
            if type(instruction) == int: # <1>
                result += int_to_little_endian(instruction, 1)
            else:
                length = len(item)
                prefix = int_to_little_endian(length, 1)
                result += prefix + item
        total = len(result)
        return encode_varint(total) + result
                length = len(instruction)
                if length <= 75: # <2>
                    result += int_to_little_endian(length, 1)
                elif length > 75 and length < 0x100: # <3>
                    result += int_to_little_endian(76, 1)
                    result += int_to_little_endian(length, 1)
                elif length >= 0x100 and length <= 520: # <4>
                    result += int_to_little_endian(77, 1)
                    result += int_to_little_endian(length, 2)
                else: # <5>
                    raise ValueError('too long an instruction')
                result += instruction
        return result
OP_CODES = {...}
    def serialize(self):
        result = self.raw_serialize()
        total = len(result)
        return encode_varint(total) + result # <6>
----
<1> The `items` attribute is a list of items in this script. p2pkh (later in this chapter), would be OP_DUP, OP_HASH160, 20-byte hash, OP_EQUALVERIFY, OP_CHECKSIG, or 5 items.
<2> If the byte is between 1 and 75 inclusive, we have an element.
<1> If the instruction is an integer, we know that's an opcode.
<2> If the length is between 1 and 75 inclusive, we just encode the length as a single byte.
<3> For anything from 76 to 255, we put OP_PUSHDATA1 first, and then encode the length as a single byte.
<4> For anything from 256 to 520, we put OP_PUSHDATA2 first, and then encode the length as two bytes in little endian.
<5> Any element longer than 520 bytes cannot be serialized.
<6> We prepend with the length of the entire script.
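
The serialization rules can be exercised outside the class with the sketch below. The free functions `raw_serialize` and `serialize` take the instruction list directly, and `int_to_little_endian` and `encode_varint` are minimal stand-ins written for this example, assumed to match the helpers used in the chapter. The sketch treats a 75-byte element as a plain length-prefixed push.

```python
def int_to_little_endian(n, length):
    # Stand-in helper: encode an integer as little-endian bytes.
    return n.to_bytes(length, 'little')


def encode_varint(i):
    # Stand-in helper: encode an integer as a Bitcoin varint.
    if i < 0xfd:
        return bytes([i])
    elif i < 0x10000:
        return b'\xfd' + int_to_little_endian(i, 2)
    elif i < 0x100000000:
        return b'\xfe' + int_to_little_endian(i, 4)
    elif i < 0x10000000000000000:
        return b'\xff' + int_to_little_endian(i, 8)
    raise ValueError('integer too large: {}'.format(i))


def raw_serialize(instructions):
    result = b''
    for instruction in instructions:
        if type(instruction) == int:
            # an integer is an opcode: one byte
            result += int_to_little_endian(instruction, 1)
        else:
            length = len(instruction)
            if length <= 75:
                result += int_to_little_endian(length, 1)
            elif length < 0x100:
                result += int_to_little_endian(76, 1)
                result += int_to_little_endian(length, 1)
            elif length <= 520:
                result += int_to_little_endian(77, 1)
                result += int_to_little_endian(length, 2)
            else:
                raise ValueError('too long an instruction')
            result += instruction
    return result


def serialize(instructions):
    result = raw_serialize(instructions)
    return encode_varint(len(result)) + result


boundary = raw_serialize([b'\x00' * 75])
pushdata1 = raw_serialize([b'\x00' * 76])
pushdata2 = raw_serialize([b'\x00' * 256])
```

Feeding the three lengths 75, 76 and 256 through the function covers each branch at its lower boundary.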

=== Combining the script fields

14 changes: 7 additions & 7 deletions code-ch05/op.py
@@ -1,5 +1,10 @@
import hashlib

from helper import (
    hash160,
    hash256,
)


def encode_num(num):
    if num == 0:
@@ -624,19 +629,14 @@ def op_sha256(stack):


def op_hash160(stack):
    if len(stack) < 1:
        return False
    element = stack.pop()
    h160 = hashlib.new('ripemd160', hashlib.sha256(element).digest()).digest()
    stack.append(h160)
    return True
    raise NotImplementedError


def op_hash256(stack):
    if len(stack) < 1:
        return False
    element = stack.pop()
    stack.append(hashlib.sha256(hashlib.sha256(element).digest()).digest())
    stack.append(hash256(element))
    return True


14 changes: 1 addition & 13 deletions code-ch05/script.py
@@ -29,9 +29,6 @@ def __repr__(self):
                result += '{} '.format(instruction.hex())
        return result

    def __add__(self, other):
        return Script(self.instructions + other.instructions)

    @classmethod
    def parse(cls, s):
        # get the length of the entire field
@@ -66,11 +63,6 @@ def parse(cls, s):
                data_length = little_endian_to_int(s.read(2))
                instructions.append(s.read(data_length))
                count += data_length + 2
            elif current_byte == 78:
                # op_pushdata4
                data_length = little_endian_to_int(s.read(4))
                instructions.append(s.read(data_length))
                count += data_length + 4
            else:
                # we have an op code. set the current byte to op_code
                op_code = current_byte
@@ -101,14 +93,10 @@ def raw_serialize(self):
                    # 76 is pushdata1
                    result += int_to_little_endian(76, 1)
                    result += int_to_little_endian(length, 1)
                elif length >= 0x100 and length < 0x10000:
                elif length >= 0x100 and length <= 520:
                    # 77 is pushdata 2
                    result += int_to_little_endian(77, 1)
                    result += int_to_little_endian(length, 2)
                elif length >= 0x10000 and length < 0x100000000:
                    # 78 is pushdata 4
                    result += int_to_little_endian(78, 1)
                    result += int_to_little_endian(length, 4)
                else:
                    raise ValueError('too long an instruction')
                result += instruction
14 changes: 7 additions & 7 deletions code-ch06/op.py
@@ -1,5 +1,10 @@
import hashlib

from helper import (
    hash160,
    hash256,
)


def encode_num(num):
    if num == 0:
@@ -624,19 +629,14 @@ def op_sha256(stack):


def op_hash160(stack):
    if len(stack) < 1:
        return False
    element = stack.pop()
    h160 = hashlib.new('ripemd160', hashlib.sha256(element).digest()).digest()
    stack.append(h160)
    return True
    raise NotImplementedError


def op_hash256(stack):
    if len(stack) < 1:
        return False
    element = stack.pop()
    stack.append(hashlib.sha256(hashlib.sha256(element).digest()).digest())
    stack.append(hash256(element))
    return True


14 changes: 1 addition & 13 deletions code-ch06/script.py
@@ -29,9 +29,6 @@ def __repr__(self):
                result += '{} '.format(instruction.hex())
        return result

    def __add__(self, other):
        return Script(self.instructions + other.instructions)

    @classmethod
    def parse(cls, s):
        # get the length of the entire field
@@ -66,11 +63,6 @@ def parse(cls, s):
                data_length = little_endian_to_int(s.read(2))
                instructions.append(s.read(data_length))
                count += data_length + 2
            elif current_byte == 78:
                # op_pushdata4
                data_length = little_endian_to_int(s.read(4))
                instructions.append(s.read(data_length))
                count += data_length + 4
            else:
                # we have an op code. set the current byte to op_code
                op_code = current_byte
@@ -101,14 +93,10 @@ def raw_serialize(self):
                    # 76 is pushdata1
                    result += int_to_little_endian(76, 1)
                    result += int_to_little_endian(length, 1)
                elif length >= 0x100 and length < 0x10000:
                elif length >= 0x100 and length <= 520:
                    # 77 is pushdata 2
                    result += int_to_little_endian(77, 1)
                    result += int_to_little_endian(length, 2)
                elif length >= 0x10000 and length < 0x100000000:
                    # 78 is pushdata 4
                    result += int_to_little_endian(78, 1)
                    result += int_to_little_endian(length, 4)
                else:
                    raise ValueError('too long an instruction')
                result += instruction
