Mahir Exercism • julia

Interfaces

Ringkasan Pelajaran

# Introduction

About

In many object-oriented languages, it is common to extend behaviors by subclassing a parent class, inheriting methods from the parent while also adding new ones. Thus we get an object hierarchy.

Julia has no classes, so by extension it has no subclasses. How do we inherit and extend existing behaviors?

We saw in previous concepts how to create a type hierarchy, as well as learning how Julia uses multiple dispatch to choose methods based on argument types.

Julia also has several informal interfaces, which let your custom types hook into behaviors already defined for standard types.

This is most easily illustrated with the Iterator interface.

Iterators

Previously in the syllabus, we have seen several ways that Julia can operate on each element of a collection.

Clearly, Julia knows how to step through standard collections item by item: vectors and ranges in the above examples, also Sets, Dicts, etc.

How do we add this capability to our own custom collections, especially if they can be lazily iterated like a generator?

Minimal requirements

To create a sequence for iteration, we need (at least) two things:

  1. A way to get the first item.
  2. A way to get the next item from the current state.

Julia achieves these tasks with methods of the Base.iterate() function with the following steps:

  1. Base.iterate(iter) is automatically called first.
    • If there is nothing to iterate, returns nothing.
    • Else, returns the Tuple of the first returned item and the first state: (firstitem, firststate).
  2. Base.iterate(iter, state) is automatically called if an item was returned the previous iteration, taking firststate as its state argument.
    • If iteration terminates, returns nothing.
    • Else, returns the Tuple of the next item and the next state: (nextitem, nextstate).
  3. Repeats step 2 until termination.

These steps are all done “under-the-hood”, which means the method calls and the state are managed internally by the iteration protocol (with the state remaining opaque), and only the item is exposed to high-level constructs, such as loops. In other words, we only ever see the item while everything else remains in a black box.

Multiple dispatch is central to this. The type of iter determines which methods are called, and the number of arguments determines whether the first or next item is returned.

Methods for standard types are already defined: Vector, Range, Set, etc (currently 14 types in total).

For a custom MyType to share the same behaviors, we first need to define:

  • Base.iterate(iter::MyType)
  • Base.iterate(iter::MyType, state)

All of this probably sounds confusing in the abstract, but it can be quite simple. For illustration, imagine that we want a type that gives powers of a given integer n, up to a maximum count.

First, define a custom type:

struct Powers
    n::Int      # base number to raise to a power
    count::Int  # maximum length of the sequence
end

We can now define the iterate methods for this type, but what are the first item and first state?

To get the first item, some iterators need a complicated setup, but in this toy case, we can notice that the first item returned is just the base number, so we simply need to check if the value of count is greater than or equal to one.

Now we can consider the first state, which carries information that is used for the next iteration. It is up to you to decide what information you need to keep track of to do this, and it can be any type.

In this case, we could just keep track of the “index” to raise n to, or we can find a more performant option by using the most recent power of n (i.e. the returned item) and the next “index” in the sequence. These two numbers can be wrapped in a Tuple and returned as the state.

function Base.iterate(p::Powers)
    # if iteration should continue
    if 1 p.count
        # return (firstitem, firststate)
        return (p.n, (p.n, 2)) 
    end
    # else terminate iteration
    return nothing
end

In this example, the first item (p.n) is returned to the loop and the first state (p.n, 2) is passed to the iterate(P::Powers, state) method. There we can process the state to produce the next item by multiplying the current item by the base number, thereby increasing its power by 1.

function Base.iterate(p::Powers, state)
    # unpack the state
    curr, index = state

    # create next item                 
    next = curr * p.n

    # if iteration should continue
    if index  p.count
        # return (nextitem, nextstate)
        return (next, (next, index + 1))
    end
    # else terminate iteration
    return nothing
end

If the index value is still within count, the Tuple is returned with the next item and the incremented (next, index + 1) state. When the index value exceeds count, then nothing is returned instead, to indicate iteration has terminated.

Maybe you have noticed that the iterate(p::Powers) and iterate(p::Powers, state) methods have fairly similar function bodies. In fact, if we’re clever, this similarity allows for a streamlined implementation by using only the two argument version with a default argument for the state: Base.iterate(p::Powers, state=(1, 1)) With that, we wouldn’t need the separate single argument iterate(p::Powers). This is commonly done in practice when possible.

When we introduced optional arguments in the [Functions][concept-functions] concept, nothing was said about implementation.

Since then, we learned about multiple dipatch, so it now makes more sense to know that Julia automatically converts a function _with_ optional arguments to multiple methods _without_ optional arguments.

Defining `Base.iterate(P::Powers, state=(1, 1))` is exactly equivalent to defining both `Base.iterate(P::Powers, state)` and `Base.iterate(P::Powers)` separately.

[concept-functions]: https://exercism.org/tracks/julia/concepts/functions

So far, we already have some useful functionality: for the first 4 powers of 3, we can loop through the results, check if a given number is in the results, and apply aggregate functions such as sum().

julia> for p in Powers(3, 4)
            println(p)
       end
3
9
27
81

julia> 9 ∈ Powers(3, 4) # ∈ is a synonym for in
true

julia> sum(Powers(3, 4))
120

Additional methods

Despite this good start, not everything works yet.

julia> [p for p in Powers(3, 4)]
ERROR: MethodError: no method matching length(::Powers)
The function `length` exists, but no method is defined for this combination of argument types.
You may need to implement the `length` method or define `IteratorSize` for this type to be `SizeUnknown`.

Sure enough, length(Powers(3, 4)) gives exactly the same error message.

This is easily fixed, as the length is part of the type constructor: 4 in this case. We just need to define Base.length() for this type, then we can use comprehensions and collect() on the sequence.

julia> Base.length(P::Powers) = P.count

julia> length(Powers(3, 4))
4

julia> [p for p in Powers(3, 4)]
4-element Vector{Int64}:
  3
  9
 27
 81

julia> collect(Powers(3, 4))
4-element Vector{Any}:
  3
  9
 27
 81

julia> Set(Powers(3, 4))
Set{Int64} with 4 elements:
  81
  27
  9
  3

For efficiency, it is best to also define Base.eltype(): the element type of the result. This allows functions such as collect() to pre-assign space for the output Vector{Int}, instead of repeatedly doing a push!() to a Vector{Any}.

Note the syntax, which you may not have seen before: this method is defined on the type, not an instance of the type.

julia> Base.eltype(::Type{Powers}) = Int

Iterators are not required to have finite length. Use of “infinite” sequences is more limited, but looping over them may be useful so long as there is some way to break out of the loop. For example, we could define a variant of Powers that generated values smaller than a certain limit (perhaps typemax(Int), to avoid integer overflow).

Other interfaces

Indexing

The Iterators interface allows us to take an arbitrary custom type and have it behave as a sequence, just by defining a few simple methods.

Suppose we also wanted to be able to index into the sequence:

julia> P = Powers(3, 4)
Powers(3, 4)

julia> P[2]
ERROR: MethodError: no method matching getindex(::Powers, ::Int64)
The function `getindex` exists, but no method is defined for this combination of argument types.

We can add this behavior, and the manual explains how.

Minimally, we need to define getindex().

julia> function Base.getindex(P::Powers, i::Int)
          1 ≤ i ≤ P.count || throw(BoundsError(P, i))
          return p.n ^ i
       end

julia> p = Powers(3, 4)
Powers(3, 4)

julia> p[3]
27

We also need to define:

  • firstindex() and lastindex() to support P[begin] and P[end].
  • Variants of getindex() to support indexing with vectors and ranges.
  • In theory, setindex!(), though it is hard to see what that would even mean in this example.

However, by this point we are doing a lot of work just to make our iterator look more like a Vector.

AbstractArray

It may be worth thinking about alternative interfaces that provide a better starting point.

In this case, we could consider making Powers a subtype of AbstractArray, which gives us a lot of array-like behavior for free.

For the Powers sequence, the corresponding array is of integers, with 1 dimension. You will sometimes see this written AbstractArray{Int, 1}, but the aliases AbstractVector{Int} for 1-D and AbstractMatrix{Int} for 2-D are more common in newer Julia code.

julia> struct PowersArray <: AbstractVector{Int}
           n::Int
           count::Int
       end

Rather than length(), we now need to define size() with a tuple of lengths along each dimension: only one dimension in this case.

julia> Base.size(PA::PowersArray) = (PA.count,)

How do we want to index the entries? In the Multi Dimensional Arrays Concept we introduced the two main approaches:

  • One index per dimension, so M[row, col] for a Matrix M.
  • A single index, corresponding to the layout of entries in memory

These are called Cartesian and Linear respectively.

We only have one dimension in the example, so linear indexing is appropriate.

julia> Base.IndexStyle(::Type{PowersArray}) = IndexLinear()

julia> Base.getindex(PA::PowersArray, i::Int) = PA.n ^ i

julia> p = PowersArray(3, 4)
4-element PowersArray:
  3
  9
 27
 81

julia> p[3]
27

Note that we did not define iterate() in this case. Julia supplies a default implementation automatically, based on our definition of getindex().

Defining our own version of iterate() is still possible, and in some cases might be more efficient.

Using the AbstractArray interface is especially valuable if we want higher-dimensional iteration. At the risk of stretching the toy example beyond its breaking point, this might mean the first m powers of numbers in 1:n, which will iteratively define a size (m, n) array.

struct PowersMatrix{Int} <: AbstractMatrix{Int}
    ns::AbstractVector{Int}
    count::Int
end

Base.size(PM::PowersMatrix) = (PM.count, length(PM.ns))
Base.IndexStyle(::Type{PowersMatrix}) = IndexCartesian()
Base.getindex(PM::PowersMatrix, i::Int, j::Int) = PM.ns[j] ^ i

julia> PM = PowersMatrix(2:6, 4)
4×5 PowersMatrix{Int64}:
  2   3    4    5     6
  4   9   16   25    36
  8  27   64  125   216
 16  81  256  625  1296

julia> PM[3, 2]
27

julia> PM[7] # column major linear indexing
27

julia> PM[21]
BoundsError: attempt to access 4×5 PowersMatrix{Int64} at index [21]
[...]

Rounding

Previous concepts have mentioned various sorts of rounding for numeric types.

julia> round(1.7)
2.0

julia> floor(1.7)
1.0

julia> ceil(1.7)
2.0

# specify output type
julia> round(Int, 1.7)
2

julia> round(2.4 + 7.3im)
2.0 + 7.0im

The round() function is more flexible than this, allowing you to specify the RoundingMode (of the 7 currently supported), decimal places in the output, and the number base.

We might want to have the same functions handle our custom types. Commonly this means composite types such as coordinates.

julia> struct Point3D
           x::Float64
           y::Float64
           z::Float64
       end

julia> p = Point3D(7.63, 9.24, 4.50)
Point3D(7.63, 9.24, 4.5)

julia> round(p)
ERROR: MethodError: no method matching round(::Point3D, ::RoundingMode{:Nearest})
The function `round` exists, but no method is defined for this combination of argument types.

Unsurprisingly, there is a Rounding interface, which minimally requires us to define round() for our type.

julia> Base.round(P3D::Point3D, r::RoundingMode) = Point3D(round(P3D.x, r), round(P3D.y, r),round(P3D.z, r))

julia> round(p)
Point3D(8.0, 9.0, 4.0)

julia> ceil(p)
Point3D(8.0, 10.0, 5.0)

Because we include RoundingMode in the definition, floor() and ceil() work automatically.

Of course, we are free to define more complicated forms of rounding.

  • We might want to round Point3D to be an integer distance from the origin, in the same direction.
  • For angles, we might want to round to the 16 directions used by weather forecasters (“wind SSW at 15 knots”), meaning multiples of π/8 radians.
  • We might want to add the ability to specify digits, sigdigits or base.
  • There are special options for time periods

Summary

Interfaces are quite confusing to explain, because the Julia approach is so different to most other languages. In particular, the name of the interface might never be mentioned in the implementation: very unlike subclassing or mixins.

However, they are surprisingly simple in practice:

  1. Define a suitable type, deciding whether subtyping is useful or not.
  2. Find out which methods are required (from the documentation).
  3. Implement those methods for your custom type; typically just a few methods, and quite short and simple.
  4. Let Multiple Dispatch take care of the rest, automatically.

Originally from Exercism julia concepts