  Sep 10, 2021
      Implement the big-integer arithmetic algorithm. · fc0c8680
      Alexander Huszagh authored
      Replaces the existing decimal implementation, yielding substantial
      performance improvements for near-halfway cases. This is especially
      fast with a large number of digits.
      
      **Big Integer Implementation**
      
      A small subset of big-integer arithmetic has been added, via the
      `bigint` struct. It uses a stack-allocated vector with enough bits to
      store a float with the largest possible number of significant digits.
      This is log2(10^(769 + 342)) ≈ 3691 bits, accounting for the largest
      possible exponent magnitude and number of digits, and is then rounded
      up to 4k bits.
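      
      As a rough sketch of the sizing (the names and the exact rounded size
      here are illustrative, not the library's definitions):
      
      ```cpp
      #include <cstddef>
      #include <cstdint>
      
      // Worst case: 769 significant digits scaled by up to 10^342, so the
      // bigint must hold log2(10^(769 + 342)) ~= 3691 bits of magnitude.
      using limb = uint64_t;                           // see limb selection below
      constexpr size_t limb_bits = sizeof(limb) * 8;   // 64
      constexpr size_t bigint_bits = 4096;             // assumed "4k" rounding
      constexpr size_t bigint_limbs = bigint_bits / limb_bits;  // 64 limbs
      ```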
      
      The limb size is determined by the architecture: most 64-bit
      architectures have efficient 128-bit multiplication, either via a
      single hardware instruction or 2 native multiplications for the high
      and low bits. This includes x86_64, mips64, s390x, aarch64, powerpc64,
      and riscv64; the only known exceptions are sparcv8 and sparcv9.
      Therefore, we define a limb size of 64 bits on 64-bit architectures
      except SPARC, and otherwise fall back to 32-bit limbs.
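      
      A sketch of how the limb selection might look (the macro tests here are
      an approximation of the real detection logic):
      
      ```cpp
      #include <cstddef>
      #include <cstdint>
      
      // 64-bit limbs where 64x64 -> 128-bit multiplication is cheap; SPARC is
      // the known exception, and 32-bit targets also take the 32-bit path.
      #if defined(__SIZEOF_INT128__) && !defined(__sparc__)
      using limb = uint64_t;
      constexpr size_t limb_bits = 64;
      #else
      using limb = uint32_t;
      constexpr size_t limb_bits = 32;
      #endif
      ```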
      
      A simple stackvector is used, which just has operations to add elements,
      index, and truncate the vector.
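      
      A minimal sketch of that interface (illustrative, not the exact
      fast_float API):
      
      ```cpp
      #include <cstddef>
      #include <cstdint>
      
      using limb = uint64_t;  // assuming 64-bit limbs, as above
      
      template <size_t capacity>
      struct stackvec {
        limb data[capacity];  // fixed storage on the stack: no heap allocation
        size_t length = 0;
      
        // Add an element, failing (rather than writing out of bounds) if full.
        bool try_push(limb value) {
          if (length == capacity) { return false; }
          data[length++] = value;
          return true;
        }
        limb& operator[](size_t i) { return data[i]; }
        // Truncate to a shorter length; this never grows the vector.
        void truncate(size_t n) { if (n < length) { length = n; } }
        size_t len() const { return length; }
      };
      ```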
      
      `bigint` is then just a wrapper around this, with methods for
      big-integer arithmetic. For our algorithms, we only need multiplication
      by a power (x * b^N), multiplication by a bigint or scalar value, and
      addition of a bigint or scalar value. Scalar addition and multiplication
      use compiler extensions when possible (__builtin_add_overflow and
      __uint128_t); when unavailable, we implement simple logic shown to
      optimize well on MSVC. Big-integer multiplication is done via
      grade-school multiplication, which at these sizes is more efficient
      than asymptotically faster algorithms. Multiplication by a power is
      done via bitshifts for powers of 2, and for powers of 5, by iterative
      multiplications by a large power of 5 followed by one multiplication by
      a small scalar power of 5.
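      
      A sketch of the scalar primitives described above (function names are
      assumptions, and the multiply assumes `__uint128_t` support):
      
      ```cpp
      #include <cstdint>
      
      // x + y, reporting overflow via the compiler intrinsic when available.
      inline uint64_t scalar_add(uint64_t x, uint64_t y, bool& overflow) {
      #if defined(__GNUC__) || defined(__clang__)
        uint64_t z;
        overflow = __builtin_add_overflow(x, y, &z);
        return z;
      #else
        uint64_t z = x + y;  // unsigned wraparound is well-defined
        overflow = z < x;    // simple logic that optimizes well on MSVC
        return z;
      #endif
      }
      
      // x * y + carry, returning the low 64 bits and storing the high 64 bits
      // in carry (a 32-bit limb build would use a plain uint64_t product).
      inline uint64_t scalar_mul(uint64_t x, uint64_t y, uint64_t& carry) {
        __uint128_t z = (__uint128_t)x * y + carry;
        carry = (uint64_t)(z >> 64);
        return (uint64_t)z;
      }
      ```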
      
      **compute_float**
      
      `compute_float` has been slightly modified: if the algorithm cannot
      round correctly, it returns a normalized, extended-precision adjusted
      mantissa with the `power2` shifted by `INT16_MIN`, so the exponent is
      always negative. `compute_error` and `compute_error_scaled` have been
      added.
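      
      A sketch of the error signaling (the constant and helper names are
      assumptions):
      
      ```cpp
      #include <cstdint>
      
      struct adjusted_mantissa {
        uint64_t mantissa = 0;
        int32_t power2 = 0;  // biased binary exponent; >= 0 when valid
      };
      
      // Shift power2 far negative to flag "cannot round correctly".
      constexpr int32_t invalid_am_bias = INT16_MIN;
      
      inline adjusted_mantissa mark_invalid(adjusted_mantissa am) {
        am.power2 += invalid_am_bias;  // the exponent is now always negative
        return am;
      }
      
      // The caller detects the marker and removes the bias before digit_comp.
      inline bool is_invalid(const adjusted_mantissa& am) {
        return am.power2 < 0;
      }
      ```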
      
      **Digit Optimizations**
      
      To improve performance for numbers with many digits,
      `parse_eight_digits_unrolled` is used for both the integer and fraction
      digits, inside a while loop rather than two nested if statements. This
      adds no noticeable performance cost for common floats, but dramatically
      improves performance for numbers with many digits (without these
      optimizations, ~65% of the total runtime cost is in
      `parse_number_string`).
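      
      A sketch of the loop shape (both helpers exist in fast_float, though
      the signatures and the surrounding function here are illustrative):
      
      ```cpp
      #include <cstdint>
      
      bool is_made_of_eight_digits_fast(const char* p);    // assumed signature
      uint32_t parse_eight_digits_unrolled(const char* p); // assumed signature
      
      inline bool is_integer(char c) { return c >= '0' && c <= '9'; }
      
      // Shared by the integer and fraction digits: one while loop consuming
      // 8 digits at a time, instead of two nested if statements.
      uint64_t parse_digits(const char* p, const char* end, uint64_t value) {
        while (end - p >= 8 && is_made_of_eight_digits_fast(p)) {
          value = value * 100000000 + parse_eight_digits_unrolled(p);
          p += 8;
        }
        while (p != end && is_integer(*p)) {  // at most 7 leftover digits
          value = 10 * value + uint64_t(*p - '0');
          ++p;
        }
        return value;
      }
      ```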
      
      **Parsed Number**
      
      Two fields have been added to `parsed_number_string`, containing
      slices of the integer and fraction digits. This is extremely cheap,
      since the work is already done: the strings are pre-tokenized during
      parsing. On overflow, this allows us to re-parse the tokenized strings
      without re-checking whether each character is a digit. Likewise, the
      big-integer algorithms can merely re-parse the pre-tokenized strings.
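      
      A sketch of the extended struct (field names are assumptions):
      
      ```cpp
      #include <cstddef>
      #include <cstdint>
      
      // Pointer + length view; see the `slice` type under "Rust-Isms" below.
      struct slice {
        const char* ptr;
        size_t len;
      };
      
      struct parsed_number_string {
        int64_t exponent;
        uint64_t mantissa;
        bool negative;
        bool too_many_digits;  // the 64-bit mantissa overflowed while parsing
        slice integer;         // new: pre-tokenized integer digits
        slice fraction;        // new: pre-tokenized fraction digits
      };
      ```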
      
      **Slow Algorithm**
      
      The new algorithm is `digit_comp`, which takes the parsed number string
      and the `adjusted_mantissa` from `compute_float`. The significant digits
      are parsed into a big integer, and the exponent relative to the
      significant digits is calculated. If the exponent is >= 0, we use
      `positive_digit_comp`, otherwise, we use `negative_digit_comp`.
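      
      A sketch of the dispatch (signatures are illustrative):
      
      ```cpp
      #include <cstdint>
      
      struct bigint { /* stackvec of limbs, as sketched above */ };
      struct parsed_number_string;  // as sketched above
      struct adjusted_mantissa { uint64_t mantissa; int32_t power2; };
      
      // Parses the significant digits into `big` and returns the exponent
      // relative to those digits.
      int32_t parse_mantissa(bigint& big, const parsed_number_string& num);
      adjusted_mantissa positive_digit_comp(bigint& big, int32_t exponent);
      adjusted_mantissa negative_digit_comp(bigint& big, adjusted_mantissa am,
                                            int32_t exponent);
      
      adjusted_mantissa digit_comp(const parsed_number_string& num,
                                   adjusted_mantissa am) {
        // (any INT16_MIN bias from compute_float has been removed by now)
        bigint big;
        int32_t exponent = parse_mantissa(big, num);
        return (exponent >= 0) ? positive_digit_comp(big, exponent)
                               : negative_digit_comp(big, am, exponent);
      }
      ```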
      
      `positive_digit_comp` is quite simple: we scale the significant digits
      by `10^exponent`, then take the high 64 bits for the native float,
      determine whether any lower bits were truncated, and use that to direct
      rounding.
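      
      A sketch under those assumptions (the `bigint` methods, the rounding
      helper, and the binary64 bias constant are illustrative):
      
      ```cpp
      #include <cstdint>
      
      struct adjusted_mantissa { uint64_t mantissa = 0; int32_t power2 = 0; };
      
      struct bigint {
        bool pow10(uint32_t exp);              // multiply by 10^exp
        uint64_t hi64(bool& truncated) const;  // top 64 bits + truncation flag
        int32_t bit_length() const;            // index of the highest set bit
      };
      
      void round_nearest_tie_even(adjusted_mantissa& am, bool truncated);
      
      adjusted_mantissa positive_digit_comp(bigint& big, int32_t exponent) {
        big.pow10(uint32_t(exponent));      // scale the digits by 10^exponent
        adjusted_mantissa am;
        bool truncated;
        am.mantissa = big.hi64(truncated);  // truncated bits act as sticky bits
        // 1075 = 52 mantissa bits + 1023 exponent bias, for binary64.
        am.power2 = big.bit_length() - 64 + 1075;
        round_nearest_tie_even(am, truncated);
        return am;
      }
      ```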
      
      `negative_digit_comp` is a little more complex, but still fairly
      straightforward: we use the parsed significant digits as the real
      digits, and calculate the theoretical digits from `b+h`, the halfway
      point between `b` and `b+u`, the next-positive float. To get `b`, we
      round the adjusted mantissa down, create an extended-precision
      representation, and calculate the halfway point. We now have a base-10
      exponent for the real digits, and a base-2 exponent for the theoretical
      digits. We scale these two to the same exponent by multiplying the
      theoretical digits by `5**-real_exp`. We then get the base-2 exponent
      as `theor_exp - real_exp`; if this is positive, we multiply the
      theoretical digits by `2**exp`, otherwise, we multiply the real digits
      by `2**-exp`. Now both are scaled to the same magnitude, and we simply
      compare the digits in the big integers, and use that comparison to
      direct rounding.
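      
      A sketch of this flow (all helper names are assumptions):
      
      ```cpp
      #include <cstdint>
      
      struct adjusted_mantissa { uint64_t mantissa = 0; int32_t power2 = 0; };
      struct extended { uint64_t mantissa; int32_t exponent; };
      
      struct bigint {
        void from_u64(uint64_t value);  // initialize from a 64-bit scalar
        bool pow2(uint32_t exp);        // multiply by 2^exp (bitshifts)
        bool pow5(uint32_t exp);        // multiply by 5^exp
        int compare(const bigint& other) const;  // -1, 0, or +1
      };
      
      adjusted_mantissa round_down(adjusted_mantissa am);  // get b
      extended halfway(adjusted_mantissa b);               // b+h as m * 2^exp
      void round_by_comparison(adjusted_mantissa& am, int ord);
      
      adjusted_mantissa negative_digit_comp(bigint& real_digits,
                                            adjusted_mantissa am,
                                            int32_t real_exp) {
        // Theoretical digits of b+h: round down to b, take the halfway point.
        extended theor = halfway(round_down(am));
        bigint theor_digits;
        theor_digits.from_u64(theor.mantissa);
      
        // Scale both to the same magnitude: first 5^-real_exp, then 2^exp.
        theor_digits.pow5(uint32_t(-real_exp));  // real_exp < 0 here
        int32_t pow2_exp = theor.exponent - real_exp;
        if (pow2_exp > 0) {
          theor_digits.pow2(uint32_t(pow2_exp));
        } else if (pow2_exp < 0) {
          real_digits.pow2(uint32_t(-pow2_exp));
        }
      
        // Compare real vs. theoretical digits to direct the rounding of b.
        round_by_comparison(am, real_digits.compare(theor_digits));
        return am;
      }
      ```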
      
      **Rust-Isms**
      
      A few Rust-isms have been added, since they simplify the logic and
      assertions. These can be trivially removed or reworked, as needed.
      
      - a `slice` type has been added, which is a pointer and length.
      - `FASTFLOAT_ASSERT`, `FASTFLOAT_DEBUG_ASSERT`, and `FASTFLOAT_TRY` have
        been added
        - `FASTFLOAT_ASSERT` aborts, even in release builds, if the condition
          fails.
        - `FASTFLOAT_DEBUG_ASSERT` defaults to `assert`, for logic errors.
        - `FASTFLOAT_TRY` is like a Rust `Option` type, which propagates
          errors.
      
      Specifically, `FASTFLOAT_TRY` is useful in combination with
      `FASTFLOAT_ASSERT` to ensure there are no memory corruption errors
      possible in the big-integer arithmetic. Although the `bigint` type
      ensures we have enough storage for all valid floats, memory issues are
      quite a severe class of vulnerabilities, and due to the low performance
      cost of checks, we abort if we would have out-of-bounds writes. This can
      only occur when we are adding items to the vector, which is a very small
      number of steps. Therefore, we abort if our memory safety guarantees
      ever fail. lexical has never aborted, so it's unlikely we will ever fail
      these guarantees.
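      
      For reference, sketches of the three macros (close to, but not
      guaranteed to match, the real definitions):
      
      ```cpp
      #include <cassert>
      #include <cstdlib>
      
      // Aborts even in release builds if the condition fails.
      #define FASTFLOAT_ASSERT(x)       { if (!(x)) ::abort(); }
      // Standard assert, for logic errors caught during development.
      #define FASTFLOAT_DEBUG_ASSERT(x) assert(x)
      // Propagates failure like Rust's `?` on an Option: return early on error.
      #define FASTFLOAT_TRY(x)          { if (!(x)) return false; }
      ```
      
      For example, wrapping a push with `FASTFLOAT_TRY(vec.try_push(value))`
      (using the `try_push` sketched earlier) propagates a failed push up to
      a caller that guards the whole operation with `FASTFLOAT_ASSERT`.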
  Jun 21, 2021
      Merge pull request #84 from musicinmybrain/system-doctest-option · 8159e8bc
      Daniel Lemire authored
      Add a SYSTEM_DOCTEST CMake option
      v1.1.2
      Add a SYSTEM_DOCTEST CMake option · fe8e477e
      Benjamin A. Beasley authored
      This option is off by default, maintaining the previous behavior. When
      enabled (along with FASTFLOAT_TEST), it bypasses the FetchContent
      machinery for doctest so that a system-wide installation of the doctest
      header can be easily used. In this case, the header doctest/doctest.h
      should be available on the compiler’s include path.
      
      This option is especially useful for Linux distributions and others that
      need to run the tests in fully offline build environments.
      
      Fixes #83.