Skip to content
/ extended Public

Extended 80-bit floating-point numbers for Go

License

Notifications You must be signed in to change notification settings

depp/extended

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

8 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

80-bit Extended-Precision Floating-Point Numbers

This is a Go library that provides a type for representing 80-bit extended-precision floating-point numbers. It is licensed under the terms of the MIT license, see LICENSE.txt for details.

Example

package main

import (
    "fmt"
    "github.com/depp/extended"
)

func main() {
    e := extended.Extended{
        SignExponent: 0x3fff,
        Fraction:     0xC000000000000000,
    }
    // Prints "Value: 1.5"
    fmt.Println("Value:", e.Float64())
}

Rounding, Infinity, and NaN

This library uses round-to-even when converting from 80-bit floats to 64-bit floats. This should be what you’re used to, and what you expect! In round-to-even, when an 80-bit float is exactly half-way between two possible float64 values, the value with a zero in the least-significant bit is chosen (or the value with the larger exponent is chosen, if the values have different exponents).

Values which are outside the range of possible float64 values are rounded to infinity.

Infinity and NaN are preserved. Different types of NaN values are not distinguished from each other, but the sign of NaN values is preserved during conversion.

About

Extended 80-bit floating-point numbers for Go

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages