Menengah Exercism • rust

Char

Ringkasan Pelajaran

# Introduction

About

Rust implements the char type to represent a single character. A char literal is placed within single quotes, like 'a'. Each char is four bytes in size and represents a single Unicode Scalar Value.

However, a char is not always what we think of as a letter. There are some languages, e.g. Hindi, that use diacritics, which are special symbols which modify the character they are attached to. Although the diacritic in Rust is a separate char, it is the diacritic and the character it modifies that we commonly think of as a letter.

The term for a character and its diacritic is grapheme cluster. There are external crates that can be used to process grapheme clusters, such as unicode-segmentation.

Example

pub fn main() {
    let text = "ü"; // a "u" with a diacritic
    let text_vec: Vec<char> = text.chars().collect(); // this gets the chars in "ü"
    println!("{:?}", text_vec.len()); // this prints the number of chars in "ü"
    println!("{:?}", text_vec[0]); // this prints the first char in "ü"
    println!("{:?}", text_vec[1]); // this prints the second char in "ü"
}

prints

2
'u'
'\u{308}'

‘\u{308}’ is another way of writing a char literal. \u indicates it is a unicode char with {308} being the unique Unicode number for that character or diacritic.


Originally from Exercism rust concepts