Transducers

Last published: April 11, 2023 by 'stm'

Defines 37 Classes
Extends 47 Classes


Transducers encapsulate the processing of elements from a sequence independently of the data source and drain.


# Overview

## Encapsulate
Implementations of enumeration methods, such as #collect:, share the logic of how to process a single element.
However, that logic is reimplemented each and every time. Transducers make it explicit and thereby facilitate reuse and coherent behavior.
For example:
- #collect: requires mapping: (aBlock1 map)
- #select: requires filtering: (aBlock2 filter)
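
For instance, here is a sketch of #collect: in terms of a map transducer and a collecting reducing function (using #transduce:reduce:init:, which is introduced below):

squares := #(1 2 3)
    transduce: [:x | x * x] map
    reduce: [:col :e | col add: e; yourself]
    init: OrderedCollection new.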


## Compose
In practice, algorithms often require multiple processing steps, e.g., mapping only a filtered set of elements.
Transducers are inherently composable and thereby make the combination of steps explicit.
Since transducers do not build intermediate collections, their composition is memory-efficient.
For example:
- (aBlock1 filter) * (aBlock2 map) "(1.) filter and (2.) map elements"
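
A minimal sketch of such a composition, assuming an OrderedCollection as drain (the messages are introduced below):

evenSquares := (1 to: 10)
    transduce: [:x | x even] filter * [:x | x * x] map
    reduce: [:col :e | col add: e; yourself]
    init: OrderedCollection new.
"-> the squares of 2, 4, 6, 8 and 10"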


## Re-Use
Transducers are decoupled from the input and output sources, and hence, they can be reused in different contexts.
For example:
- enumeration of collections
- processing of streams
- communicating via channels
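
For example, one and the same filter transducer can be bound to a collection or to a stream (a sketch; stream support is shown later in this description):

odd := #odd filter.
fromCollection := (1 to: 10) transduce: odd reduce: #+ init: 0.
fromStream := #(1 3 5 6) readStream transduce: odd reduce: #+ init: 0.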



# Usage by Example

We build a coin-flipping experiment and count the occurrences of heads and tails.

First, we associate random numbers with the sides of a coin.

| scale label sample collect steps result coin protocol |
scale := [:x | (x * 2 + 1) floor] map.
label := #(heads tails) replace.

"Scale is a transducer that maps numbers x between 0 and 1 to 1 and 2, respectively.
Label is a transducer that replaces the numbers with heads and tails by lookup in an array.
Next, we draw a number of samples."

sample := 1000 take.

"Sample is a transducer that takes 1000 elements from a source.
We keep track of the occurrences of heads and tails using a bag."

collect := [:bag :c | bag add: c; yourself].

"Collect is binary block (reducing function) that collects events in a bag.
We assemble the experiment by transforming the block using the transducers."

steps := (scale * label * sample) transform: collect.

"From left to right we see the steps involved: scale, label, sample and collect.
Transforming assembles these steps into a binary block (reducing function) we can use to run the experiment."

result := Random new
    reduce: steps
    init: Bag new.

"Here, we use #reduce:init:, which is mostly similar to #inject:into:.
To execute a transformation and a reduction together, we can use #transduce:reduce:init:."

result := Random new
    transduce: scale * label * sample
    reduce: collect
    init: Bag new.

"We can also split the flow in (arbitrary) logical parts that can be re-used in other experiments."

coin := Random new transduce: scale * label.
protocol := collect initializer: [Bag new].
result := coin transduce: sample reduce: protocol.

Coin is an eduction, i.e., it binds transducers to a source and understands #reduce:init:, among other messages.
Protocol is a transformed reduction, i.e., it binds transducers to a reducing function and an initial value.
Sending #transduce:reduce: yields a new Bag with 1000 samples.
For the common case of accumulating elements in a collection or stream, we can just use #accumulate.

| col |
Bag accumulate. "accumulate in a new Bag"
col := Bag new.
col accumulate. "accumulate in col"
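
Putting things together, the whole experiment can presumably be condensed to:

result := Random new
    transduce: scale * label * sample
    reduce: Bag accumulate.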



# Basic Concepts

## Reducing Functions

A reducing function represents a single step in processing a data sequence.
It takes an accumulated result and a value, and returns a new accumulated result.
For example:

collect := [:col :e | col add: e; yourself].
sum := #+.

A reducing function can also be ternary, i.e., it takes an accumulated result, a key and a value.
For example:

collect := [:dict :k :v | dict at: k put: v; yourself].

Reducing functions may be equipped with an optional completing action.
It is invoked exactly once after processing finishes, e.g., to free resources.

stream := [:str :e | str nextPut: e; yourself] completing: #close.
absSum := #+ completing: #abs.

A reducing function can end processing early by signaling Reduced with a result.
This mechanism also enables the treatment of infinite sources.

nonNil := [:res :e | e ifNil: [Reduced signalWith: res] ifNotNil: [res]].
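
For example, a reducing function that sums elements but stops at the first nil might look like this (a sketch):

sumUntilNil := [:res :e | e ifNil: [Reduced signalWith: res] ifNotNil: [res + e]].
#(1 2 nil 4) reduce: sumUntilNil init: 0. "-> 3"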

The primary way to process a data sequence is the reducing protocol: the message #reduce:init:, or #transduce:reduce:init: if transducers are involved.
The behavior is similar to #inject:into:, but in addition it takes care of:
- handling binary and ternary reducing functions,
- invoking the completing action after finishing, and
- stopping the reduction if Reduced is signaled.
The message #transduce:reduce:init: just combines the transformation and the reducing step.
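
For example, both messages can be used directly on a collection:

total := #(1 2 3 4) reduce: #+ init: 0. "-> 10"
oddTotal := #(1 2 3 4) transduce: #odd filter reduce: #+ init: 0. "-> 4"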

However, as reducing functions are step-wise in nature, an application may choose other means to process its data.


## Reducibles

A data source is called reducible if it implements the reducing protocol.
Default implementations are provided for collections and streams.
Additionally, blocks taking no arguments are reducible, too.
This allows adapting to custom data sources without additional effort.
For example:

"Xtreams adaptor"
xtream := filename reading.
reducible := [[xtream get] on: Incomplete do: [Reduced signal]].

"natural numbers"
n := 0.
reducible := [n := n+1].
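
Such a block source can then be processed like any other reducible, e.g., collecting the first ten natural numbers (a sketch):

n := 0.
firstTen := [n := n + 1]
    transduce: 10 take
    reduce: [:col :e | col add: e; yourself]
    init: OrderedCollection new.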


## Transducers

A transducer is an object that transforms a reducing function into another.
Transducers encapsulate common steps in processing data sequences, such as map, filter, concatenate, and flatten.
A transducer transforms a reducing function into another via #transform: in order to add those steps.
They can be composed using #*, which yields a new transducer that applies both transformations.
Most transducers require an argument, typically blocks, symbols or numbers:

square := Map function: #squared.
take := Take number: 1000.

To facilitate compact notation, the argument types implement corresponding methods:

squareAndTake := #squared map * 1000 take.

Transducers requiring no argument are singletons and can be accessed by their class name.

flattenAndDedupe := Flatten * Dedupe.
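
Assuming Dedupe drops consecutive duplicates, the composed transducer could be applied like this (a sketch):

"flatten nested arrays, then drop consecutive duplicates (assumed semantics)"
result := #(#(1 1) #(1 2) #(3 3))
    transduce: Flatten * Dedupe
    reduce: [:col :e | col add: e; yourself]
    init: OrderedCollection new.
"presumably -> an OrderedCollection with 1, 2 and 3"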



# Advanced Concepts

## Reductions

A reduction binds an initial value or a block yielding an initial value to a reducing function.
The idea is to define a ready-to-use process that can be applied in different contexts.
Reducibles handle reductions via #reduce: and #transduce:reduce:.
For example:

sum := #+ init: 0.
sum1 := #(1 1 1) reduce: sum.
sum2 := (1 to: 1000) transduce: #odd filter reduce: sum.

asSet := [:set :e | set add: e; yourself] initializer: [Set new].
set1 := #(1 1 1) reduce: asSet.
set2 := (1 to: 1000) transduce: #odd filter reduce: asSet.

By combining a transducer with a reduction, a process can be further modified.

sumOdds := sum transduce: #odd filter.
setOdds := asSet transduce: #odd filter.
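
The modified reductions are again ready to use:

sum3 := #(1 2 3 4) reduce: sumOdds. "presumably -> 4"
set3 := #(1 2 3 4) reduce: setOdds. "presumably -> a Set with 1 and 3"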


## Eductions

An eduction combines a reducible data source with a transducer.
The idea is to define a transformed (virtual) data source that need not be stored in memory.

odds1 := #(1 2 3) readStream transduce: #odd filter.
odds2 := (1 to: 1000) transduce: #odd filter.

Depending on the underlying source, eductions can be processed once (streams, e.g., odds1) or multiple times (collections, e.g., odds2).
Since no intermediate data is stored, transducers act lazily, i.e., they are invoked each time the eduction is processed.
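
For example, the collection-based eduction odds2 understands #reduce:init: and can be consumed repeatedly:

oddSum1 := odds2 reduce: #+ init: 0.
oddSum2 := odds2 reduce: #+ init: 0. "same result; the underlying interval can be re-read"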



# Origins

Transducers is based on the Clojure library of the same name and its ideas. Please see:
http://clojure.org/transducers