
3/8/2016

Binaries, strings and char lists - Elixir
http://elixir-lang.org/getting-started/binaries-strings-and-char-lists.html


Binaries, strings and char lists


1 UTF-8 and Unicode
2 Binaries (and bitstrings)
3 Char lists

In Basic types, we learned about strings and used the is_binary/1 function for checks:

iex> string = "hello"
"hello"
iex> is_binary(string)
true


In this chapter, we will understand what binaries are, how they associate with strings, and what a single-quoted value, 'like this', means in Elixir.

UTF-8 and Unicode


A string is a UTF-8 encoded binary. In order to understand exactly what we mean by that, we need to understand the difference between bytes and code points.

The Unicode standard assigns code points to many of the characters we know. For example, the letter a has code point 97 while the letter ł has code point 322. When writing the string "hełło" to disk, we need to convert these code points to bytes. If we adopted a rule that said one byte represents one code point, we wouldn't be able to write "hełło", because it uses the code point 322 for ł, and one byte can only represent a number from 0 to 255. But of course, given you can actually read "hełło" on your screen, it must be represented somehow. That's where encodings come in.

When representing code points in bytes, we need to encode them somehow. Elixir chose the UTF-8 encoding as its main and default encoding. When we say a string is a UTF-8 encoded binary, we mean a string is a bunch of bytes organized in a way to represent certain code points, as specified by the UTF-8 encoding.

Since we have code points like ł assigned to the number 322, we actually need more than one byte to represent them. That's why we see a difference when we calculate the byte_size/1 of a string compared to its String.length/1:

iex> string = "hełło"
"hełło"
iex> byte_size(string)
7
iex> String.length(string)
5
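The gap between 7 bytes and 5 code points follows from UTF-8's length rules: code points 0 to 127 take one byte, up to 2047 take two, up to 65535 take three, and the rest (up to 0x10FFFF) take four. As a minimal sketch of those rules (Utf8Width is a hypothetical module, not part of Elixir), summing the per-code-point widths of "hełło" (code points 104, 101, 322, 322, 111) reproduces its byte_size/1:

```elixir
defmodule Utf8Width do
  # Hypothetical helper: how many bytes UTF-8 needs for one code point.
  def bytes(cp) when cp in 0..0x7F, do: 1
  def bytes(cp) when cp in 0x80..0x7FF, do: 2
  # Surrogates are reserved for UTF-16 and cannot be encoded in UTF-8.
  def bytes(cp) when cp in 0xD800..0xDFFF,
    do: raise(ArgumentError, "surrogate code point #{cp} is not valid in UTF-8")
  def bytes(cp) when cp in 0x800..0xFFFF, do: 3
  def bytes(cp) when cp in 0x10000..0x10FFFF, do: 4
end

# Code points of "hełło": h, e, ł, ł, o
code_points = [104, 101, 322, 322, 111]

total = code_points |> Enum.map(&Utf8Width.bytes/1) |> Enum.sum()
IO.puts(total)               # 7, the same as byte_size("hełło")
IO.puts(length(code_points)) # 5, the same as String.length("hełło")
```

Anything above 0x10FFFF matches no clause and raises FunctionClauseError, which is the intended behaviour for a value that is not a code point.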

Note: if you are running on Windows, there is a chance your terminal does not use UTF-8 by default. You can change the encoding of your current session by running chcp 65001 before entering IEx (iex.bat).

UTF-8 requires one byte to represent each of the code points h, e and o, but two bytes to represent ł. In Elixir, you can get a code point's value by using ?:

iex> ?a
97
iex> ?ł
322


You can also use the functions in the String module to split a string into its code points:

iex> String.codepoints("hełło")
["h", "e", "ł", "ł", "o"]
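String.codepoints/1 returns each code point as its own small string, so byte_size/1 can be applied to each one. A short sketch (CodepointBytes is a hypothetical name) lines up every code point with its integer value and its size in bytes, which makes the two-byte ł visible. It extracts the integer with the utf8 binary modifier, which this chapter covers further on:

```elixir
defmodule CodepointBytes do
  # Hypothetical helper: {code point as string, integer value, byte count}.
  def breakdown(string) when is_binary(string) do
    for char <- String.codepoints(string) do
      # Each element holds exactly one code point, so a single utf8
      # segment matches it and binds the integer value.
      <<cp::utf8>> = char
      {char, cp, byte_size(char)}
    end
  end
end

IO.inspect(CodepointBytes.breakdown("hełło"))
# [{"h", 104, 1}, {"e", 101, 1}, {"ł", 322, 2}, {"ł", 322, 2}, {"o", 111, 1}]
```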

You will see that Elixir has excellent support for working with strings. It also supports many of the Unicode operations. In fact, Elixir passes all the tests showcased in the article "The string type is broken".

However, strings are just part of the story. If a string is a binary, and we have used the is_binary/1 function, Elixir must have an underlying type empowering strings. And it does. Let's talk about binaries!


Binaries (and bitstrings)


In Elixir, you can define a binary using <<>>:

iex> <<0, 1, 2, 3>>
<<0, 1, 2, 3>>
iex> byte_size(<<0, 1, 2, 3>>)
4

A binary is just a sequence of bytes. Of course, those bytes can be organized in any way, even in a sequence that does not make them a valid string:

iex> String.valid?(<<239, 191, 191>>)
false
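Because a binary can hold bytes that are not valid UTF-8, code that receives arbitrary bytes sometimes walks them one code point at a time. The sketch below is hypothetical (Utf8Scrub is not an Elixir module, and replacing bad bytes with "?" is an arbitrary choice for illustration). It uses the utf8 and binary modifiers shown later in this chapter: a utf8 segment only matches when the next bytes decode to a code point, so any byte it cannot decode falls through to the second clause:

```elixir
defmodule Utf8Scrub do
  # Hypothetical helper: copies a binary, replacing every byte that does not
  # begin a decodable UTF-8 code point with "?".
  def scrub(bin) when is_binary(bin), do: scrub(bin, "")

  # The utf8 segment matches only a well-formed code point.
  defp scrub(<<cp::utf8, rest::binary>>, acc), do: scrub(rest, <<acc::binary, cp::utf8>>)
  # Otherwise skip a single byte and emit a replacement marker.
  defp scrub(<<_byte, rest::binary>>, acc), do: scrub(rest, <<acc::binary, "?">>)
  defp scrub(<<>>, acc), do: acc
end

IO.inspect(String.valid?(<<104, 105, 255>>))   # false
IO.inspect(Utf8Scrub.scrub(<<104, 105, 255>>)) # "hi?"
```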

The string concatenation operator <> is actually a binary concatenation operator:

iex> <<0, 1>> <> <<2, 3>>
<<0, 1, 2, 3>>

A common trick in Elixir is to concatenate the null byte <<0>> to a string to see
its inner binary representation:

iex> "hełło" <> <<0>>
<<104, 101, 197, 130, 197, 130, 111, 0>>
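The null-byte trick works because IEx prints a binary as a string only when all of its bytes form printable UTF-8, and <<0>> is not printable. Two existing functions show the same bytes without altering the value: Erlang's :binary.bin_to_list/1 returns them as a list of integers, and inspect/2 accepts the binaries: :as_binaries option to force the <<...>> form:

```elixir
string = "hełło"

# Erlang's :binary module exposes the raw bytes as a list of integers.
bytes = :binary.bin_to_list(string)
IO.inspect(bytes) # [104, 101, 197, 130, 197, 130, 111]

# Ask Inspect to print the binary in <<...>> form; no null byte needed.
IO.puts(inspect(string, binaries: :as_binaries))
# <<104, 101, 197, 130, 197, 130, 111>>
```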

Each number given to a binary is meant to represent a byte and therefore must be between 0 and 255. Binaries allow modifiers to be given to store numbers bigger than 255 or to convert a code point to its UTF-8 representation:

iex> <<255>>
<<255>>
iex> <<256>> # truncated
<<0>>
iex> <<256 :: size(16)>> # use 16 bits (2 bytes) to store the number
<<1, 0>>
iex> <<256 :: utf8>> # the number is a code point
"Ā"
iex> <<256 :: utf8, 0>>
<<196, 128, 0>>
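To see what the utf8 modifier does under the hood, here is a hand-rolled encoder written only with the size modifier from above. It is a teaching sketch (Utf8Sketch is a hypothetical module; in real code <<cp :: utf8>> is what you would use): the code point's bits are split into groups, and each group is placed behind the UTF-8 marker bits, which are 0, 110, 1110 or 11110 in the first byte and 10 in every continuation byte:

```elixir
defmodule Utf8Sketch do
  # Hypothetical teaching encoder; real code should use <<cp::utf8>>.

  # One byte: 0xxxxxxx
  def encode(cp) when cp in 0..0x7F, do: <<cp>>

  # Two bytes: 110xxxxx 10xxxxxx (11 payload bits split 5 + 6)
  def encode(cp) when cp in 0x80..0x7FF do
    <<a::size(5), b::size(6)>> = <<cp::size(11)>>
    <<0b110::size(3), a::size(5), 0b10::size(2), b::size(6)>>
  end

  # Surrogates cannot be encoded in UTF-8.
  def encode(cp) when cp in 0xD800..0xDFFF do
    raise ArgumentError, "surrogate code point #{cp} is not valid in UTF-8"
  end

  # Three bytes: 1110xxxx 10xxxxxx 10xxxxxx (16 payload bits split 4 + 6 + 6)
  def encode(cp) when cp in 0x800..0xFFFF do
    <<a::size(4), b::size(6), c::size(6)>> = <<cp::size(16)>>
    <<0b1110::size(4), a::size(4), 0b10::size(2), b::size(6), 0b10::size(2), c::size(6)>>
  end

  # Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx (21 bits split 3 + 6 + 6 + 6)
  def encode(cp) when cp in 0x10000..0x10FFFF do
    <<a::size(3), b::size(6), c::size(6), d::size(6)>> = <<cp::size(21)>>
    <<0b11110::size(5), a::size(3), 0b10::size(2), b::size(6),
      0b10::size(2), c::size(6), 0b10::size(2), d::size(6)>>
  end
end

IO.inspect(Utf8Sketch.encode(322), binaries: :as_binaries) # <<197, 130>>
IO.inspect(Utf8Sketch.encode(256) == <<256::utf8>>)        # true
```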


If a byte has 8 bits, what happens if we pass a size of 1 bit?

iex> <<1 :: size(1)>>
<<1::size(1)>>
iex> <<2 :: size(1)>> # truncated
<<0::size(1)>>
iex> is_binary(<< 1 :: size(1)>>)
false
iex> is_bitstring(<< 1 :: size(1)>>)
true
iex> bit_size(<< 1 :: size(1)>>)
1

The value is no longer a binary, but a bitstring: just a bunch of bits! So a binary is a bitstring where the number of bits is divisible by 8.
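That rule can be checked directly: bit_size/1 works on any bitstring, and a bitstring is a binary exactly when its bit count leaves no remainder when divided by 8. A minimal sketch (BitstringCheck.binary_by_bit_count?/1 is a hypothetical helper) compares that test with is_binary/1 on a few samples:

```elixir
defmodule BitstringCheck do
  # Hypothetical helper: a bitstring is a binary when its bit count is a multiple of 8.
  def binary_by_bit_count?(bits) when is_bitstring(bits) do
    rem(bit_size(bits), 8) == 0
  end
end

samples = [<<1::size(1)>>, <<3::size(12)>>, <<0, 1, 2>>, <<>>]

for s <- samples do
  # The bitstring, its bit count, and both checks side by side.
  IO.inspect({s, bit_size(s), is_binary(s), BitstringCheck.binary_by_bit_count?(s)})
end
```

Both checks agree on every sample: the 1-bit and 12-bit values are bitstrings only, while <<0, 1, 2>> (24 bits) and the empty binary (0 bits) are binaries.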
We can also pattern match on binaries / bitstrings:

iex> <<0, 1, x>> = <<0, 1, 2>>
<<0, 1, 2>>
iex> x
2
iex> <<0, 1, x>> = <<0, 1, 2, 3>>
** (MatchError) no match of right hand side value: <<0, 1, 2, 3>>

Note that each entry in the binary pattern is expected to match exactly 8 bits. However, we can match on the rest of the binary with the binary modifier:

iex> <<0, 1, x :: binary>> = <<0, 1, 2, 3>>
<<0, 1, 2, 3>>
iex> x
<<2, 3>>

The pattern above only works if the binary is at the end of <<>>. Similar results
can be achieved with the string concatenation operator <>:

iex> "he" <> rest = "hello"
"hello"
iex> rest
"llo"

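The end-of-pattern restriction applies to a binary segment without a size. When the size is known, for example from a length byte matched earlier in the same pattern, a binary-size(n) segment can appear anywhere inside <<>>. A sketch that decodes a made-up format in which each field is one length byte followed by that many bytes (both LengthPrefixed and the format are hypothetical):

```elixir
defmodule LengthPrefixed do
  # Hypothetical format: <<length byte, field bytes..., next field...>>
  def decode(<<>>), do: []

  def decode(<<len, field::binary-size(len), rest::binary>>) do
    # len is bound first, then used as the size of the next segment.
    [field | decode(rest)]
  end
end

IO.inspect(LengthPrefixed.decode(<<2, "hi", 3, "abc">>)) # ["hi", "abc"]
```

Input whose length byte promises more bytes than remain matches neither clause and raises FunctionClauseError, so malformed data is rejected rather than silently truncated.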
This finishes our tour of bitstrings, binaries and strings. A string is a UTF-8 encoded binary, and a binary is a bitstring where the number of bits is divisible by 8. Although this shows the flexibility Elixir provides for working with bits and bytes, 99% of the time you will be working with binaries and using the is_binary/1 and byte_size/1 functions.

Char lists
A char list is nothing more than a list of characters:

iex> 'hełło'
[104, 101, 322, 322, 111]
iex> is_list 'hełło'
true
iex> 'hello'
'hello'

You can see that, instead of containing bytes, a char list contains the code points of the characters between single quotes (note that IEx will only output code points if any of the characters is outside the ASCII range). So while double quotes represent a string (i.e. a binary), single quotes represent a char list (i.e. a list).
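Since a char list is just a list of code points, converting it by hand only means encoding each integer with the utf8 modifier, and going back means reading a binary one code point at a time. A sketch with comprehensions (CharListSketch is hypothetical; the to_string/1 and to_char_list/1 functions described below are what you would normally call):

```elixir
defmodule CharListSketch do
  # Encode every code point as UTF-8 and collect the bytes into one binary.
  def to_utf8(code_points) when is_list(code_points) do
    for cp <- code_points, into: "", do: <<cp::utf8>>
  end

  # A binary generator walks the string one UTF-8 code point at a time.
  def to_code_points(string) when is_binary(string) do
    for <<cp::utf8 <- string>>, do: cp
  end
end

IO.inspect(CharListSketch.to_code_points("hełło"))     # [104, 101, 322, 322, 111]
IO.puts(CharListSketch.to_utf8([104, 101, 322, 322, 111])) # hełło
```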
In practice, char lists are used mostly when interfacing with Erlang, in
particular old libraries that do not accept binaries as arguments. You can
convert a char list to a string and back by using the to_string/1 and
to_char_list/1 functions:

iex> to_char_list "hełło"
[104, 101, 322, 322, 111]
iex> to_string 'hełło'
"hełło"
iex> to_string :hello
"hello"
iex> to_string 1
"1"

Note that those functions are polymorphic. They not only convert char lists to
strings, but also integers to strings, atoms to strings, and so on.
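The polymorphism comes from protocols: Kernel's to_string/1 dispatches to the String.Chars protocol, and to_char_list/1 to List.Chars, so any data type that implements the protocol is accepted. As a sketch, a hypothetical Point struct can opt in by implementing String.Chars, after which both to_string/1 and string interpolation accept it:

```elixir
defmodule Point do
  # Hypothetical struct used only for this illustration.
  defstruct x: 0, y: 0
end

defimpl String.Chars, for: Point do
  # Called by to_string/1 and by "#{...}" interpolation.
  def to_string(%Point{x: x, y: y}), do: "(#{x}, #{y})"
end

IO.puts(to_string(%Point{x: 1, y: 2})) # (1, 2)
IO.puts("origin: #{%Point{}}")         # origin: (0, 0)
```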
With binaries, strings, and char lists out of the way, it is time to talk about key-value data structures.

2012-2016 Plataformatec. All rights reserved.
