Fix ReduceByKey general algorithm to work with parallel ScanInclusive
Also use a bit representation for the states. This relies on an advanced algorithm, but the fix changes very little in the current implementation.
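
For illustration only, here is a minimal sketch of how a reduce-by-key can be expressed through an inclusive scan with bit-encoded states. It is not the code from this commit: the names kStart, kEnd, Item, Combine and ReduceByKeySketch are hypothetical, the reduction is assumed to be integer addition, and equal keys are assumed to be adjacent. The point is that the combining operator is associative, so a parallel inclusive scan may group operands in any order and still give the same result.

```cpp
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical state bits (illustrative names, not taken from the commit).
constexpr std::uint8_t kStart = 1u << 0; // first element of a run of equal keys
constexpr std::uint8_t kEnd = 1u << 1;   // last element of a run of equal keys

struct Item
{
  int value;
  std::uint8_t state;
};

// Associative segmented-sum operator. If the right operand begins a run,
// the running sum restarts there; otherwise the two sums are added.
// The start bit is OR-ed, the end bit is taken from the right operand.
inline Item Combine(const Item& a, const Item& b)
{
  const int value = (b.state & kStart) ? b.value : a.value + b.value;
  const std::uint8_t state =
    static_cast<std::uint8_t>(((a.state | b.state) & kStart) | (b.state & kEnd));
  return Item{ value, state };
}

// Assumes keys.size() == values.size() and that equal keys are adjacent.
inline std::vector<std::pair<int, int>> ReduceByKeySketch(const std::vector<int>& keys,
                                                          const std::vector<int>& values)
{
  const std::size_t n = keys.size();
  std::vector<Item> items(n);

  // Step 1: mark run boundaries with state bits.
  for (std::size_t i = 0; i < n; ++i)
  {
    std::uint8_t s = 0;
    if (i == 0 || keys[i] != keys[i - 1]) { s |= kStart; }
    if (i + 1 == n || keys[i] != keys[i + 1]) { s |= kEnd; }
    items[i] = Item{ values[i], s };
  }

  // Step 2: inclusive scan. This call is sequential; because Combine is
  // associative, a parallel scan could be substituted.
  std::inclusive_scan(items.begin(), items.end(), items.begin(), Combine);

  // Step 3: the last element of each run now holds that run's total.
  std::vector<std::pair<int, int>> out;
  for (std::size_t i = 0; i < n; ++i)
  {
    if (items[i].state & kEnd) { out.emplace_back(keys[i], items[i].value); }
  }
  return out;
}
```

A non-associative operator can look correct under a sequential scan and still fail once the scan is evaluated in parallel, which is why the sketch keeps both the running value and the state bits inside the scanned element.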