Chars

# Introduction

About chars

This is potentially a big subject! It is possible to write a long book about it, and several people have done so (search Amazon for “unicode book” to see some examples).

A very brief history

Handling characters in computers was much simpler in earlier decades, when programmers assumed that English was the only important language. So: 26 letters, upper and lower case, 10 digits, several punctuation marks, plus a code (0x07) to ring a bell, and it all fitted into 7 bits: the ASCII character set.

Naturally, people started asking what about à, ä and Ł, then other people started asking about ऄ, ஹ and ญ, and young people wanted emojis 😱. What to do?

To cut a long story short, many smart and patient people had to serve on committees for years, working out the details of the Unicode character set, and of encodings such as UTF-8, and lots of software needed a very complicated rewrite. Also, lots of new bugs were introduced.

To prevent everything breaking, the Unicode/UTF-8 design ensures that the first 128 codes (0 to 127) are identical to ASCII (even the bell).
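As a minimal sketch of what this compatibility means in practice (assuming the JVM target, and using a hypothetical helper named `utf8Bytes`), ASCII characters encode to the same single byte in UTF-8, while characters outside ASCII need more than one byte:

```kotlin
// Hypothetical helper: the UTF-8 bytes of a string, shown as unsigned values (0..255).
fun utf8Bytes(text: String): List<Int> =
    text.toByteArray(Charsets.UTF_8).map { it.toInt() and 0xFF }

fun main() {
    println(utf8Bytes("A"))       // [65], the same as the ASCII code
    println(utf8Bytes("\u0007"))  // [7], the bell
    println(utf8Bytes("\u00E0"))  // [195, 160], à needs two bytes
}
```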

Characters in Kotlin

Languages designed after about 2005 have the huge advantage that a reasonably stable Unicode standard already existed.

Kotlin (first released in 2011) was able to assume that users would use a variety of (human) languages, and would need Unicode to express them.

[Characters][ref-char] in Kotlin are 16-bit UTF-16 code units, the same as a JVM `char`, so a single `Char` can hold any [`codepoint`][wiki-codepoint] up to `\uFFFF`.
This is enough to express most written alphabets, but not the entire range of emojis.

The full Unicode standard defines code points up to U+10FFFF, which need 21 bits, and a single displayed character (called a [`grapheme`][wiki-grapheme]) may combine several code points.

Kotlin `Strings` support this full standard by using two `Char` values (a surrogate pair) for each code point above `\uFFFF`.
For example, 😱 would be `\uD83D` and `\uDE31`.
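
The two-`Char` representation of 😱 can be seen in a short sketch. It assumes the JVM target, where `codePointAt` and `codePointCount` are available on `String`; `countCodePoints` is a hypothetical helper name.

```kotlin
// Hypothetical helper: number of Unicode code points in a string (JVM only).
fun countCodePoints(text: String): Int = text.codePointCount(0, text.length)

fun main() {
    val scream = "\uD83D\uDE31"  // 😱 written as its two UTF-16 code units

    println(scream.length)                       // 2: length counts Chars
    println(countCodePoints(scream))             // 1: a single code point
    println(scream.codePointAt(0).toString(16))  // 1f631
    println(scream[0].isHighSurrogate())         // true
    println(scream[1].isLowSurrogate())          // true
}
```

One consequence is that indexing or reversing such a string Char by Char can split a surrogate pair.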

Unfortunately, the JVM `String` and `char` types are not grapheme-aware, and for compatibility neither are Kotlin's.

[wiki-codepoint]: https://en.wikipedia.org/wiki/Code_point
[wiki-grapheme]: https://en.wikipedia.org/wiki/Grapheme
[ref-char]: https://kotlinlang.org/docs/characters.html

Character literals are written in single quotes, and are distinct from strings, which are written in double quotes. This is probably obvious to people from the C/C++ world, but potentially confusing to Python and JavaScript programmers.

val a = 'a'
a::class.qualifiedName  // => kotlin.Char
a.code  // => 97

val jha = 'झ'  // Devanagari alphabet
jha.code  // => 2333

val heart = '❤'  // heart emoji
heart.code  // => 10084

Char.MAX_VALUE.code // => 65535 (the largest value a single Char can hold)

val notChar = 'abc' // => compile error: Too many characters in a character literal

Converting between Char and Int is straightforward:

a.code  // => 97
Char(97) // => 'a'

The compiler allows some forms of integer arithmetic on Chars:

'a' + 5     // => 'f'
'c' - 'a'   // => 2
'c' + 'a'   // => error!

'f' + ('A' - 'a')  // => 'F' (same as 'f'.uppercaseChar())

'f'.dec() // => 'e' (decrement)
'f'.inc() // => 'g' (increment)
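
As an illustration of this arithmetic, here is a minimal sketch of a letter shift (as used in a Caesar cipher). The function name `shiftLetter` is hypothetical, and the sketch handles only the lowercase ASCII letters 'a'..'z'.

```kotlin
// Hypothetical helper: shift a lowercase ASCII letter by `shift` places, wrapping past 'z'.
fun shiftLetter(c: Char, shift: Int): Char {
    if (c !in 'a'..'z') return c            // leave every other character untouched
    val offset = (c - 'a' + shift).mod(26)  // Char - Char gives an Int; mod keeps it in 0..25
    return 'a' + offset                     // Char + Int gives a Char
}

fun main() {
    println(shiftLetter('a', 5))  // f
    println(shiftLetter('y', 3))  // b
    println("kotlin".map { shiftLetter(it, 1) }.joinToString(""))  // lpumjo
}
```

Using `mod` rather than `%` keeps the offset non-negative, so negative shifts also wrap correctly.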

Some functions for Char

As always, there are far too many functions to discuss here, so this is just a selection.

  • For appropriate alphabets, change case with uppercaseChar() and lowercaseChar(), which return a Char (uppercase() and lowercase() return a String).
  • Test case with isUpperCase() and isLowerCase().
  • Test character type with:
    • isLetter(), covers many alphabets (the Lu, Ll, Lt, Lm, and Lo categories in Unicode)
    • isDigit(), any decimal digit: 0..9 and the digits of other scripts (the Nd category in Unicode)
    • isLetterOrDigit(), combines the previous two
    • isWhitespace(), any whitespace character (the Zs, Zl, and Zp categories in Unicode, plus control characters such as tab and newline)

'झ'.isLetter()      // => true
'A'.isLowerCase()   // => false
'4'.isDigit()       // => true
'\t'.isWhitespace() // => true  (tab character)
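
These tests combine naturally with the collection functions available on strings. The sketch below uses a hypothetical helper, `summarize`, to count letters, digits and whitespace characters in a string.

```kotlin
// Hypothetical helper: count the letters, digits and whitespace characters in a string.
fun summarize(text: String): Triple<Int, Int, Int> {
    val letters = text.count { it.isLetter() }
    val digits = text.count { it.isDigit() }
    val spaces = text.count { it.isWhitespace() }
    return Triple(letters, digits, spaces)
}

fun main() {
    println(summarize("Room 42\tis free"))  // (10, 2, 3)
}
```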

Also, regular expressions (which will be the subject of a later Concept) allow powerful search and manipulation.

Char List and String interconversions

To convert from a String to a List of Chars, we can use toList().

To convert a List of Chars to a String, there is the joinToString() function, which takes a separator (often the empty string) as argument.

val kt = "kotlin".toList()  // => [k, o, t, l, i, n]
kt.joinToString("")   // => "kotlin"
kt.joinToString("_")  // => "k_o_t_l_i_n"

Note that joinToString() operates on a collection such as a List or Array. To convert a single Char to a 1-character string, use toString().

'a'.toString() // => "a"
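
The round trip from String to List and back is useful whenever the characters need processing in between. A minimal sketch, with `reversedLetters` as a hypothetical helper name:

```kotlin
// Hypothetical helper: keep only the letters of a string and return them in reverse order.
fun reversedLetters(text: String): String =
    text.toList()                  // String -> List<Char>
        .filter { it.isLetter() }  // drop digits, spaces and punctuation
        .reversed()
        .joinToString("")          // List<Char> -> String

fun main() {
    println(reversedLetters("k-o-t 1 l.i.n"))  // niltok
}
```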

To check if a character is present in a String, or a Char list or array, we have in, which maps to the contains() function.

val clist = "kotlin".toList()  // => [k, o, t, l, i, n]
't' in clist     // => true
't' in "kotlin"  // => true
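
The membership test is handy inside a predicate. The sketch below counts vowels; the helper name `countVowels` and the choice of "aeiou" as the vowel set are assumptions made for illustration.

```kotlin
// Hypothetical helper: count how many characters of `text` are vowels.
fun countVowels(text: String): Int {
    val vowels = "aeiou"
    // `in` calls vowels.contains(char) for each lowercased character.
    return text.count { it.lowercaseChar() in vowels }
}

fun main() {
    println(countVowels("Kotlin"))    // 2
    println(countVowels("Exercism"))  // 3
}
```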

Originally from Exercism kotlin concepts