Version: 2.0.x

ZSet

`ZSet` is a version of a set that generalizes the measure of "how many times" each element appears in the set.

Conceptually, we can think of a `ZSet[A, B]` as a `Map[A, B]`, where `A` represents the elements of the set and `B` represents some notion of how many times each element appears.

We can then define a `Set` as a `ZSet` where the measure of how many times each element appears is a `Boolean`.

``import zio.prelude.ZSettype Set[+A] = ZSet[A, Boolean]``

That is, in a `Set` the measure of how many times an element appears is just yes or no. Either the element is in the set or it is not.

However, we can also define many other variants of sets by choosing different types to measure how many times an element appears.

For example, by using the `Natural` new type from ZIO Prelude, which describes integers greater than or equal to zero, we can define the type of a `MultiSet` or "bag" in some programming languages.

``import zio.prelude.ZSetimport zio.prelude.newtypes.Naturaltype MultiSet[+A] = ZSet[A, Natural]``

This ends up being a quite useful data type and can represent anything where the number of times an element appears matters but the order doesn't matter, such as a shopping cart.

By using other measures of how many times an item appears we can describe other variants of sets such as `Int` for sets where items can appear a negative number of times or `Double` for sets where the number of times an item appears can be fractional.

`ZSet` is focused on using the functional abstractions and new types in ZIO Prelude to describe operators on these generalizations of sets in as practical a way as possible.

Constructing ZSets​

The most basic way to construct a `ZSet` is from a `Map` where the keys in the map represent the elements in the `ZSet` and the values in the map represent the number of times each element appears.

``val shoppingList: ZSet[String, Int] =  ZSet.fromMap(Map("apples" -> 2, "oranges" -> 3, "bananas" -> 5))// shoppingList: ZSet[String, Int] = ZSet(apples -> 2, bananas -> 5, oranges -> 3)``

We can also construct a `ZSet` from existing collection types.

If we have a normal `Set` from the Scala standard library, we can convert it to a `ZSet` using the `fromSet` operator.

``val set: ZSet[Int, Boolean] =  ZSet.fromSet(Set(1, 2, 3))// set: ZSet[Int, Boolean] = ZSet(1 -> true, 2 -> true, 3 -> true)``

If we have an `Iterable` we can convert it into a `ZSet` using the `fromIterable` constructor. The measure of how many times each item appears will just be how many times it appears in the `Iterable`.

``val multiSet: MultiSet[Int] = ZSet.fromIterable(List(1, 1, 2))// multiSet: MultiSet[Int] = ZSet(1 -> 2, 2 -> 1)``

Finally, we can just construct an empty `ZSet` using the `empty` operator and combine `ZSet` with other `ZSet` values later.

``val empty: ZSet[Nothing, Nothing] =  ZSet.empty// empty: ZSet[Nothing, Nothing] = ZSet()``

Operators on ZSets​

As a generalization of sets, `ZSet` supports many of the same operators that exist on sets.

Set Membership​

We can determine how many times an element appears in the set using the `apply` operator. This requires that the measure type has an `Identity` defined for it.

If the element exists in the set the measure of how many times it appears will be returned. Otherwise the identity element will be returned.

Thus, `shoppingList("apples")` will return `2` and `shoppingList("eggs")` will return `0` while `set(1)` will return `true` and `set(4)` will return `false`.

Set Union​

One simple way of combining sets is the union of two sets.

The union of two sets contains all elements that appear in either set. If an element appears in both sets the measure of how many times it appears in the union will be the maximum of the two measures, as determined by an `Ord` defined on the measure type.

``val left: ZSet[Int, Int] =  ZSet.fromMap(Map(1 -> 2, 2 -> 3, 3 -> 4))// left: ZSet[Int, Int] = ZSet(1 -> 2, 2 -> 3, 3 -> 4)val right: ZSet[Int, Int] =  ZSet.fromMap(Map(2 -> 2, 3 -> 3, 4 -> 4))// right: ZSet[Int, Int] = ZSet(2 -> 2, 3 -> 3, 4 -> 4)val union: ZSet[Int, Int] =  left.union(right)// union: ZSet[Int, Int] = ZSet(1 -> 2, 2 -> 3, 3 -> 4, 4 -> 4)``

Here the result will be `ZSet(1 -> 2, 2 -> 3, 3 -> 4, 4 -> 4)` because all elements in both sets will be included in the union of the two sets.

Set Intersection​

Set intersection is another basic way of combining sets.

The intersection of two sets only contains the elements that appear in both sets. If an element appears in both sets the measure of how many times it appears in the intersection will be the minimum of the two measures, as determined by an `Ord` defined on the measure type.

``val left: ZSet[Int, Int] =  ZSet.fromMap(Map(1 -> 2, 2 -> 3, 3 -> 4))// left: ZSet[Int, Int] = ZSet(1 -> 2, 2 -> 3, 3 -> 4)val right: ZSet[Int, Int] =  ZSet.fromMap(Map(2 -> 2, 3 -> 3, 4 -> 4))// right: ZSet[Int, Int] = ZSet(2 -> 2, 3 -> 3, 4 -> 4)val intersection: ZSet[Int, Int] =  left.intersect(right)// intersection: ZSet[Int, Int] = ZSet(2 -> 2, 3 -> 3)``

The result will be `ZSet(2 -> 2, 3 -> 3)` because the only commonality of the two sets is that they contain these elements.

Set Difference​

We can also compute the difference of two sets.

The difference of two sets is obtained by subtracting the measure of how many times each element appears in the second set from how many times it appears in the first set. This requires an `Inverse` defined on the `Sum` of the measure type to tell us what it means to "subtract" for this type.

``val left: ZSet[Int, Int] =  ZSet.fromMap(Map(1 -> 2, 2 -> 3, 3 -> 4))// left: ZSet[Int, Int] = ZSet(1 -> 2, 2 -> 3, 3 -> 4)val right: ZSet[Int, Int] =  ZSet.fromMap(Map(2 -> 2, 3 -> 3, 4 -> 4))// right: ZSet[Int, Int] = ZSet(2 -> 2, 3 -> 3, 4 -> 4)val diff: ZSet[Int, Int] =  left.diff(right)// diff: ZSet[Int, Int] = ZSet(1 -> 2, 2 -> 1, 3 -> 1, 4 -> -4)``

The result will be `ZSet(1 -> 2, 2 -> 1, 3 -> 1, 4 -> -4)`.

Note that the measure of how many times `4` appears in the resulting set is negative. This is permissible because `Int` allows negative numbers.

If we didn't want this we could define the measure type to be a type like `Natural` that did not permit negative numbers.

``val left: MultiSet[String] =  ZSet.fromIterable(List("apple", "banana"))// left: MultiSet[String] = ZSet(banana -> 1, apple -> 1)val right: MultiSet[String] =  ZSet.fromIterable(List("banana", "banana", "banana"))// right: MultiSet[String] = ZSet(banana -> 3)val diff: MultiSet[String] =  left.diff(right)// diff: MultiSet[String] = ZSet(banana -> 0, apple -> 1)``

Now the measure of how many times each item appears is a natural number so "banana" cannot appear a negative number of times.

Set Combination​

Another helpful way of combining sets that makes more sense for generalized sets is the notion of "adding" two sets. This is less relevant for traditional sets because an element can never appear more than once in a traditional set, so "adding" reduces to set union.

However, for generalized sets this way of combining can be quite intuitive. For example if we think of a multiset as John's shopping list and another multiset as Jane's shopping list, the combination of the two includes everything on both shopping lists, with overlapping items appearing the total number of times they appear on both lists.

This requires a `Commutative` defined on the `Sum` of the measure type to tell us how to do this addition.

``val janeShoppingList: ZSet[String, Int] =  ZSet.fromMap(Map("apples" -> 4, "bananas" -> 5))// janeShoppingList: ZSet[String, Int] = ZSet(apples -> 4, bananas -> 5)val johnShoppingList: ZSet[String, Int] =  ZSet.fromMap(Map("eggs" -> 12, "bananas" -> 1))// johnShoppingList: ZSet[String, Int] = ZSet(eggs -> 12, bananas -> 1)val combinedList: ZSet[String, Int] =  janeShoppingList.combine(johnShoppingList)// combinedList: ZSet[String, Int] = ZSet(eggs -> 12, apples -> 4, bananas -> 6)``

The result is `ZSet("apples" -> 4, "bananas" -> 6, "eggs" -> 12)`, the combination of what each of them wants to buy.

Set Product​

We can also define the product of two sets.

This is the Cartesian product of the two sets, with each combination of elements from the two sets appearing in the product set and the measure of how many times an element appears being the product of the two measures. This requires a `Commutative` defined on the `Prod` of the measure type to tell us how to do this multiplication.

A nice example of this is modeling discrete probabilities. We can define the set of all possible outcomes of a dice roll along with their associated probabilities like this:

``val die: ZSet[Int, Double] =  ZSet.fromMap(    Map(      1 -> 1.0 / 6.0,      2 -> 1.0 / 6.0,      3 -> 1.0 / 6.0,      4 -> 1.0 / 6.0,      5 -> 1.0 / 6.0,      6 -> 1.0 / 6.0    )  )// die: ZSet[Int, Double] = ZSet(5 -> 0.16666666666666666, 1 -> 0.16666666666666666, 6 -> 0.16666666666666666, 2 -> 0.16666666666666666, 3 -> 0.16666666666666666, 4 -> 0.16666666666666666)``

The product of this set with itself represents the set of all possible outcomes of rolling two dice.

``val pair: ZSet[(Int, Int), Double] =  die.zip(die)// pair: ZSet[(Int, Int), Double] = ZSet((5,4) -> 0.027777777777777776, (2,6) -> 0.027777777777777776, (4,1) -> 0.027777777777777776, (5,2) -> 0.027777777777777776, (1,1) -> 0.027777777777777776, (1,6) -> 0.027777777777777776, (6,4) -> 0.027777777777777776, (3,2) -> 0.027777777777777776, (3,3) -> 0.027777777777777776, (4,3) -> 0.027777777777777776, (5,6) -> 0.027777777777777776, (2,2) -> 0.027777777777777776, (3,4) -> 0.027777777777777776, (2,1) -> 0.027777777777777776, (6,1) -> 0.027777777777777776, (2,5) -> 0.027777777777777776, (4,2) -> 0.027777777777777776, (4,4) -> 0.027777777777777776, (1,2) -> 0.027777777777777776, (6,5) -> 0.027777777777777776, (6,2) -> 0.027777777777777776, (1,4) -> 0.027777777777777776, (4,5) -> 0.027777777777777776, (4,6) -> 0.027777777777777776, (2,4) -> 0.027777777777777776, (6,3) -> 0.027777777777777776, (1,5) -> 0.027777777777777776, (3,1) -> 0.027777777777777776, (5,1) -> 0.027777777777777776, (3,6) -> 0.027777777777777776, (2,3) -> 0.027777777777777776, (3,5) -> 0.027777777777777776, (1,3) -> 0.027777777777777776, (5,5) -> 0.027777777777777776, (5,3) -> 0.027777777777777776, (6,6) -> 0.027777777777777776)``

This set has thirty six elements, corresponding to each possible combination of outcomes from the two dice rolls, each with a probability of one in thirty six.

This gets more interesting if we say that matters is not the individual dice rolls but their sum.

We can express this in code with the `zipWith` operator, which allows us to combine the elements from the product of the two sets with a function. If multiple elements from the product are mapped to the same value the measures of how many times each value appears will be added, using a `Commutative` instance defined on the `Sum` of the measure type.

``val sum: ZSet[Int, Double] =  die.zipWith(die)(_ + _)// sum: ZSet[Int, Double] = ZSet(5 -> 0.1111111111111111, 10 -> 0.08333333333333333, 6 -> 0.1388888888888889, 9 -> 0.1111111111111111, 2 -> 0.027777777777777776, 12 -> 0.027777777777777776, 7 -> 0.16666666666666669, 3 -> 0.05555555555555555, 11 -> 0.05555555555555555, 8 -> 0.1388888888888889, 4 -> 0.08333333333333333)``

The resulting set has eleven elements, corresponding to the possible sums from two dice rolls of two to twelve. The measure for each value corresponds to the probability of that sum being rolled.

So for example the probability of seven is about 16.7% whereas the probability of rolling two or twelve is only about 2.8%.

We can take this even further using the `flatMap` operator.

The `zipWith` operator lets us combine two sets when the sets are independent of each other. With the `flatMap` operator we can create a new set based on each element of the first set and then "flatten" the resulting set down.

We could rewrite the example above using `flatMap` and a for comprehension like this:

``val sum: ZSet[Int, Double] =  for {    a <- die    b <- die  } yield a + b// sum: ZSet[Int, Double] = ZSet(5 -> 0.1111111111111111, 10 -> 0.08333333333333333, 6 -> 0.1388888888888889, 9 -> 0.1111111111111111, 2 -> 0.027777777777777776, 12 -> 0.027777777777777776, 7 -> 0.16666666666666669, 3 -> 0.05555555555555555, 11 -> 0.05555555555555555, 8 -> 0.1388888888888889, 4 -> 0.08333333333333333)``

This allows us to express conditional logic, such as the probability distribution of outcomes for a subsequent event depending on the outcome of a previous event.

Transforming Elements​

We can transform set elements using the `map` operator.

For example, we could add one to each value in a set like this:

``val setPlusOne: ZSet[Int, Boolean] =  set.map(_ + 1)// setPlusOne: ZSet[Int, Boolean] = ZSet(2 -> true, 3 -> true, 4 -> true)``

This requires a `Commutative` instance defined for the `Sum` of the measure type so that we can combine the measures if we map two elements to the same value.

Transforming Measures​

We can also transform the measure we use to represent how many times an item appears using the `transform` operator. For example, perhaps we have constructed a set where the measure is an `Int` but we would like to transform it to a `Double` so we can compose it with other `ZSet` values with `Double` as the measure type.

We can do that with `transform` like this:

``def constant(n: Int): ZSet[Int, Double] =  ZSet(n).transform(_.toDouble)``

Extracting ZSet Values​

If the number of times a value appears in the set is a natural number then we can use normal collection operators to fold over its values, treating it much like a normal set

``val fruits: MultiSet[String] =  ZSet.fromIterable(List("apple", "banana", "orange", "apple", "orange"))// fruits: MultiSet[String] = ZSet(banana -> 1, orange -> 2, apple -> 2)val fruitCount: Int =  fruits.foldLeft(0)((n, _) => n + 1)// fruitCount: Int = 5``

This will return `5` because there are five elements in the set we are folding over.

We can convert a `ZSet` back to a `Map` from elements to a measure of how many times they appear using the `toMap` operator.

We can also always convert a `ZSet` back to a `Set` using the `toSet` operator. Any elements in the `ZSet` with a measure equal to zero based on an `Identity` defined on the `Sum` of the measure type will be discarded from the resulting set.