ISO/IEC 8859-1

ISO/IEC 8859-1:1998
MIME / IANA: ISO-8859-1
Alias(es): iso-ir-100, csISOLatin1, latin1, l1, IBM819, CP819
Standard: ISO/IEC 8859
ISO/IEC 8859-1:1998, Information technology: 8-bit single-byte coded graphic character sets, Part 1: Latin alphabet No. 1, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings; its first edition was published in 1987. It is generally intended for Western European languages (see below for a list). It is the basis for most popular 8-bit character sets, including Windows-1252, and for the first block of characters in Unicode.

ISO-8859-1 is the IANA preferred name for this standard when supplemented with the C0 and C1 control codes from ISO/IEC 6429. The following other aliases are registered for ISO-8859-1: iso-ir-100, csISOLatin1, latin1, l1, IBM819, CP819.

The Windows-1252 code page coincides with ISO-8859-1 for all codes except the range 128 to 159 (hex 80 to 9F), where the little-used C1 controls are replaced with additional characters, including all the missing characters provided by ISO-8859-15. Code page 28591, also known as Windows-28591, is the actual ISO-8859-1 code page.
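The alias list and the relationship to Windows-1252 can be checked with a short sketch. It assumes Python's standard codec registry, where "iso-8859-1" and "cp1252" are Python's own labels for the two encodings; the helper names same_codec and differing_bytes are made up for this illustration.

```python
import codecs

# Aliases registered for ISO-8859-1, as listed in the text above.
ALIASES = ["iso-ir-100", "csISOLatin1", "latin1", "l1", "IBM819", "CP819"]


def same_codec(alias, reference="iso-8859-1"):
    """Return True if `alias` resolves to the same Python codec as `reference`."""
    return codecs.lookup(alias).name == codecs.lookup(reference).name


def differing_bytes():
    """Byte values that decode differently under ISO-8859-1 and Windows-1252."""
    diffs = []
    for value in range(256):
        raw = bytes([value])
        latin = raw.decode("iso-8859-1")
        try:
            windows = raw.decode("cp1252")
        except UnicodeDecodeError:
            windows = None  # byte left undefined in Python's cp1252 table
        if windows != latin:
            diffs.append(value)
    return diffs


print(all(same_codec(alias) for alias in ALIASES))
diffs = differing_bytes()
print(hex(diffs[0]), hex(diffs[-1]), len(diffs))
```

Under these assumptions the two encodings disagree only in the range hex 80 to 9F, which is the C1 area described above.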
Coverage
ISO 8859-1 encodes what it refers to as "Latin alphabet no. 1", consisting of 191 characters from the Latin script. This character-encoding scheme is used throughout the Americas, Western Europe, Oceania, and much of Africa. It is also commonly used in most standard romanizations of East Asian languages. Each character is encoded as a single eight-bit code value. These code values can be used in almost any data interchange system to communicate in the following languages (with a few exceptions due to missing characters, as noted):
Languages with complete coverage

Afrikaans, Albanian, Basque, Breton, Catalan, Corsican, Danish, English (UK and US), Faroese, Galician, German, Icelandic, Indonesian, Irish (new orthography), Italian, Latin (basic classical orthography), Leonese, Luxembourgish (basic classical orthography), Malay, Manx, Norwegian (Bokmål and Nynorsk), Occitan, Portuguese, Rhaeto-Romanic, Scottish Gaelic, Spanish, Swahili, Swedish, Walloon
Languages commonly supported but with incomplete coverage

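Catalan. Missing characters: Ŀ, ŀ (deprecated). Typical workaround: L·, l·.
Czech. Missing characters: Č, č, Ď, ď, Ě, ě, Ň, ň, Ř, ř, Š, š, Ť, ť, Ů, ů, Ž, ž, ch. Typical workaround: digraph ch. Supported by: ISO-8859-2, Windows-1250.
Dutch. Missing characters: Ĳ, ĳ. Typical workaround: digraphs IJ, ij.
Estonian. Missing characters: Š, š, Ž, ž (only present in loanwords). Typical workaround: Sh, sh, Zh, zh. Supported by: ISO-8859-15, Windows-1252.
Finnish. Missing characters: Š, š, Ž, ž (only present in loanwords). Typical workaround: Sh, sh, Zh, zh. Supported by: ISO-8859-15, Windows-1252.
French. Missing characters: Œ, œ, and the very rare Ÿ. Typical workaround: digraphs OE, oe, and Y without the diaeresis. Supported by: ISO-8859-15, Windows-1252.
Hungarian. Missing characters: Ő, ő, Ű, ű. Typical workaround: Ö, ö (or Õ, õ; sometimes Ô, ô), Ü, ü (sometimes Û, û). Supported by: ISO-8859-2, Windows-1250.
Irish (traditional orthography). Missing characters: Ḃ, ḃ, Ċ, ċ, Ḋ, ḋ, Ḟ, ḟ, Ġ, ġ, Ṁ, ṁ, Ṡ, ṡ, Ṫ, ṫ. Typical workaround: Bh, bh, Ch, ch, Dh, dh, Fh, fh, Gh, gh, Mh, mh, Sh, sh, Th, th. Supported by: ISO-8859-14.
Latin with macrons. Missing characters: Ā, ā, Ē, ē, Ī, ī, Ō, ō, Ū, ū. Supported by: ISO-8859-13, Windows-1257.
Māori. Missing characters: Ā, ā, Ē, ē, Ī, ī, Ō, ō, Ū, ū. Supported by: ISO-8859-13, Windows-1257.
Turkish. Missing characters: İ, ı, Ğ, ğ, Ş, ş. Typical workaround: I, i, G, g, S, s. Supported by: ISO-8859-3, ISO-8859-9, Windows-1254.
Welsh. Missing characters: Ẁ, ẁ, Ẃ, ẃ, Ŵ, ŵ, Ẅ, ẅ, Ỳ, ỳ, Ŷ, ŷ, Ÿ. Supported by: ISO-8859-14.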
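Whether a given text falls within this coverage can be tested mechanically. The following is a minimal sketch, assuming Python's built-in "iso-8859-1" codec; the function names graphic_byte_values and missing_characters and the sample words are illustrative only.

```python
def graphic_byte_values():
    """Byte values assigned to graphic characters: hex 20-7E and hex A0-FF."""
    return [b for b in range(256) if 0x20 <= b <= 0x7E or 0xA0 <= b <= 0xFF]


def missing_characters(text):
    """Return the distinct characters of `text` that ISO-8859-1 cannot encode."""
    missing = []
    for ch in text:
        try:
            ch.encode("iso-8859-1")
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing


# 95 characters in the ASCII range plus 96 in the upper range give 191.
print(len(graphic_byte_values()))

# Every encodable character occupies exactly one byte.
print(len("Grüße".encode("iso-8859-1")))

for word in ["Grüße", "Œuvre", "İstanbul"]:
    print(word, missing_characters(word))
```

A text with an empty result can be stored in ISO-8859-1 without loss; anything else needs one of the workarounds or alternative encodings listed above.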
Quotation marks
For some languages listed above, the correct typographical quotation marks are missing, as only « », " ", and ' ' are included. This scheme also does not provide for oriented (6- or 9-shaped) single or double quotation marks. Some fonts will display the spacing grave accent (0x60) and the apostrophe (0x27) as a matching pair of oriented single quotation marks, but this is not considered part of the modern standard.
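The gap can be seen by trying to encode each kind of quotation mark. This is a small sketch assuming Python's "iso-8859-1" codec; the helper name encodable and the choice of sample marks are illustrative.

```python
def encodable(ch):
    """Return True if the single character `ch` exists in ISO-8859-1."""
    try:
        ch.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


QUOTES = {
    "\u00ab": "left-pointing guillemet",
    "\u00bb": "right-pointing guillemet",
    '"': "straight double quote",
    "'": "apostrophe / straight single quote",
    "\u2018": "left single quotation mark (6-shaped)",
    "\u2019": "right single quotation mark (9-shaped)",
    "\u201c": "left double quotation mark",
    "\u201d": "right double quotation mark",
    "\u201e": "low double quotation mark",
}

for ch, name in QUOTES.items():
    print(f"U+{ord(ch):04X} {name}: {'present' if encodable(ch) else 'missing'}")
```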
History
ISO 8859-1 was based on the Multinational Character Set used by Digital Equipment Corporation in the popular VT220 terminal. It was developed within ECMA, the European Computer Manufacturers Association, and published in March 1985 as ECMA-94, by which name it is still sometimes known. The second edition of ECMA-94 [2] (June 1986) also included ISO 8859-2, ISO 8859-3, and ISO 8859-4 as part of the specification.

In 1985 Commodore adopted ISO 8859-1 for its new AmigaOS operating system. The Seikosha MP-1300AI impact dot-matrix printer, used with the Amiga 1000, included this encoding. [citation needed]

In 1992, the IANA registered the character map ISO_8859-1:1987, more commonly known by its preferred MIME name of ISO-8859-1 (note the extra hyphen compared with ISO 8859-1), a superset of ISO 8859-1, for use on the Internet. This map assigns the C0 and C1 control characters to the unassigned code values and thus provides for 256 characters via every possible 8-bit value.

ISO-8859-1 is (according to the standards at least) the default encoding of documents delivered via HTTP with a MIME type beginning with "text/"; however, the draft HTML 5 specification requires that documents advertised as ISO-8859-1 actually be parsed with the Windows-1252 encoding. [3] It is the default encoding of the values of certain descriptive HTTP headers, and defines the repertoire of characters allowed in HTML 3.2 documents (HTML 4.0, however, is based on Unicode). ISO-8859-1 and Windows-1252 are often assumed to be the encoding of text on Unix and Microsoft Windows in the absence of locale or other information; this assumption is only gradually being replaced by Unicode encodings such as UTF-8 or UTF-16.
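Because the IANA map assigns a character to every possible 8-bit value, any byte sequence is valid ISO-8859-1 and survives a decode and re-encode unchanged. The sketch below illustrates this, assuming Python's "iso-8859-1" codec; the function name roundtrips is hypothetical.

```python
ALL_BYTES = bytes(range(256))


def roundtrips(data, encoding):
    """Return True if `data` decodes under `encoding` and re-encodes to itself."""
    try:
        return data.decode(encoding).encode(encoding) == data
    except UnicodeError:
        return False  # some byte sequence is not valid in this encoding


# Every byte value is assigned, so arbitrary data is accepted.
print(roundtrips(ALL_BYTES, "iso-8859-1"))

# Each byte decodes to the character whose code point equals the byte value.
print([ord(ch) for ch in ALL_BYTES.decode("iso-8859-1")] == list(range(256)))

# UTF-8, by contrast, rejects many byte sequences.
print(roundtrips(ALL_BYTES, "utf-8"))
```

This property is one reason the encoding is a common fallback when nothing else is known about a byte stream: decoding never fails, although the result may not be what the author intended.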
Codepage layout
ISO/IEC 8859-1
    _0   _1   _2   _3   _4   _5   _6   _7   _8   _9   _A   _B   _C   _D   _E   _F
0_
1_
2_  SP   !    "    #    $    %    &    '    (    )    *    +    ,    -    .    /
3_  0    1    2    3    4    5    6    7    8    9    :    ;    <    =    >    ?
4_  @    A    B    C    D    E    F    G    H    I    J    K    L    M    N    O
5_  P    Q    R    S    T    U    V    W    X    Y    Z    [    \    ]    ^    _
6_  `    a    b    c    d    e    f    g    h    i    j    k    l    m    n    o
7_  p    q    r    s    t    u    v    w    x    y    z    {    |    }    ~
8_
9_
A_  NBSP ¡    ¢    £    ¤    ¥    ¦    §    ¨    ©    ª    «    ¬    SHY  ®    ¯
B_  °    ±    ²    ³    ´    µ    ¶    ·    ¸    ¹    º    »    ¼    ½    ¾    ¿
C_  À    Á    Â    Ã    Ä    Å    Æ    Ç    È    É    Ê    Ë    Ì    Í    Î    Ï
D_  Ð    Ñ    Ò    Ó    Ô    Õ    Ö    ×    Ø    Ù    Ú    Û    Ü    Ý    Þ    ß
E_  à    á    â    ã    ä    å    æ    ç    è    é    ê    ë    ì    í    î    ï
F_  ð    ñ    ò    ó    ô    õ    ö    ÷    ø    ù    ú    û    ü    ý    þ    ÿ

The row label gives the high hexadecimal digit of the code value and the column label the low digit; for example, é in row E_, column _9 has the code value hex 00E9 (decimal 233). Rows 0_, 1_, 8_, and 9_ and position 7F are not assigned to graphic characters. SP is the space, NBSP the no-break space, and SHY the soft hyphen.
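The layout can also be regenerated programmatically rather than copied by hand. The following is one possible sketch, assuming Python's "iso-8859-1" codec; the names NAMES, cell, and chart are invented for this example, and the column width is an arbitrary choice.

```python
# Characters without a visible glyph are shown by an abbreviated name.
NAMES = {0x20: "SP", 0xA0: "NBSP", 0xAD: "SHY"}


def cell(value):
    """Text shown for one code value; empty if no graphic character is assigned."""
    if value in NAMES:
        return NAMES[value]
    if 0x20 <= value <= 0x7E or 0xA0 <= value <= 0xFF:
        return bytes([value]).decode("iso-8859-1")
    return ""


def chart():
    """Build a 16 x 16 chart: rows are the high hex digit, columns the low one."""
    header = "    " + "".join(f"_{low:X}".ljust(5) for low in range(16))
    lines = [header.rstrip()]
    for high in range(16):
        cells = "".join(cell(high * 16 + low).ljust(5) for low in range(16))
        lines.append((f"{high:X}_  " + cells).rstrip())
    return "\n".join(lines)


print(chart())
```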
Similar character sets

ISO-8859-1 was incorporated as the first 256 code points of ISO/IEC 10646 and Unicode. The lower range 32 to 126 (hex 20 to 7E, the G0 subset) maps exactly to the same coded G0 subset of the ISO 646 US variant (commonly known as ASCII), whose ISO 2022 standard switch sequence is "ESC ( B". The higher range 160 to 255 (hex A0 to FF, the G1 subset) maps exactly to the same subset initiated by the ISO 2022 standard switch sequence "ESC . A".

ISO/IEC 8859-1 is missing some characters for French and Finnish text, as well as the euro sign. In order to provide some of these characters, ISO/IEC 8859-15 was developed as an update of ISO/IEC 8859-1. This required, however, the removal of some infrequently used characters from ISO/IEC 8859-1, including fraction symbols and letter-free diacritics: ¤, ¦, ¨, ´, ¸, ¼, ½, and ¾.

The popular Windows-1252 character set adds all the missing characters provided by ISO/IEC 8859-15, plus a number of typographic symbols, by replacing the rarely used C1 controls in the range 128 to 159 (hex 80 to 9F). It is very common to mislabel text data with the charset label ISO-8859-1, even though the data is really Windows-1252 encoded. Many web browsers and e-mail clients will interpret ISO-8859-1 control codes as Windows-1252 characters in order to accommodate such mislabeling, but this is not standard behaviour, and care should be taken to avoid generating these characters in ISO-8859-1 labeled content.

The Apple Macintosh computer introduced a character encoding called Mac Roman, or Mac-Roman, in 1984. It was meant to be suitable for Western European desktop publishing. It is a superset of ASCII, like ISO-8859-1, and has most of the characters that are in ISO-8859-1, but in a totally different arrangement. A later version, registered with IANA as "Macintosh", replaced the generic currency sign (¤) with the euro sign (€). The few printable characters that are in ISO 8859-1 but not in this set are often a source of trouble when editing text on websites using older Macintosh browsers (including the last version of Internet Explorer for Mac). However, the extra characters that Windows-1252 has in the C1 codepoint range are all supported in MacRoman.

DOS had code page 850, which had all printable characters that ISO-8859-1 had (albeit in a totally different arrangement) plus the most widely used graphic characters from code page 437.
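Two of the relationships described here can be illustrated in code: the positions that ISO/IEC 8859-15 reassigns, and the lenient handling of mislabeled Windows-1252 data. This is a sketch only, assuming Python's "iso-8859-1", "iso-8859-15", and "cp1252" codecs; the function names are hypothetical, and decode_lenient is a simplified stand-in for what browsers do, not a reproduction of any particular client's algorithm.

```python
import re

# Any byte in the C1 range, hex 80 to 9F.
C1_BYTES = re.compile(rb"[\x80-\x9f]")


def replaced_in_8859_15():
    """Byte values in hex A0-FF whose character differs between the two parts."""
    return [
        b for b in range(0xA0, 0x100)
        if bytes([b]).decode("iso-8859-1") != bytes([b]).decode("iso-8859-15")
    ]


def decode_lenient(data):
    """Decode data labeled ISO-8859-1, treating C1 bytes as Windows-1252.

    If any byte falls in hex 80-9F, the label is assumed to be wrong and the
    data is decoded as Windows-1252 (undefined bytes become U+FFFD).
    """
    if C1_BYTES.search(data):
        return data.decode("cp1252", errors="replace")
    return data.decode("iso-8859-1")


print([hex(b) for b in replaced_in_8859_15()])
print(decode_lenient(b"\x93quoted\x94"))
print(decode_lenient(b"caf\xe9"))
```

Content that is meant to be strictly ISO-8859-1 should avoid bytes in the C1 range altogether, so that both strict and lenient decoders produce the same text.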
References
[1] http://en.wikipedia.org/w/index.php?title=Template:Infobox_character_encoding&action=edit
[2] http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-094.pdf
[3] HTML 5 Draft Recommendation 12 April 2010, 8.1 Character encodings (http://dev.w3.org/html5/spec/Overview.html#character-encodings-0), retrieved 2010-04-12.
External links
ISO/IEC 8859-1:1998 (http://www.iso.org/iso/en/CatalogueDetailPage.CatalogueDetail?CSNUMBER=28245&ICS1=35&ICS2=40&ICS3=)
ISO/IEC 8859-1:1998 (ftp://std.dkuug.dk/JTC1/sc2/wg3/docs/n411.pdf) - 8-bit single-byte coded graphic character sets, Part 1: Latin alphabet No. 1 (draft dated February 12, 1998, published April 15, 1998)
Standard ECMA-94 (http://www.ecma-international.org/publications/standards/Ecma-094.htm): 8-Bit Single Byte Coded Graphic Character Sets - Latin Alphabets No. 1 to No. 4, 2nd edition (June 1986)
ISO-IR 100 (http://www.itscj.ipsj.or.jp/ISO-IR/100.pdf) Right-Hand Part of Latin Alphabet No.1 (February 1, 1986)
Windows Code pages (http://msdn.microsoft.com/goglobal/bb964656)
Differences between ANSI, ISO-8859-1 and MacRoman Character Sets (http://www.alanwood.net/demos/charsetdiffs.html)
The Letter Database (http://www.eki.ee/letter/)
The ISO 8859 Alphabet Soup (http://czyborra.com/charsets/iso8859.html) - Roman Czyborra's summary of ISO character sets
Article Sources and Contributors

ISO/IEC 8859-1 Source: http://en.wikipedia.org/w/index.php?oldid=589926607 Contributors: Achurch, Adam78, Adelton, Al shopov, Alxeedo, Amakuha, Andre Engels, Anon user, Anthony, Anrion, Athantor, Auslli, Avjewe, Babak info, Barklund, Basil.bourque, Bearcat, Ben morphett, Bennylin, Bgwhite, BiT, Brion VIBBER, Brycen, Bukzor, Burzuchius, Caoimhin, Ceplm, Cfsenel, Choster, ChrisGualtieri, Christian List, Chrullrich, Circular17, Conversion script, Copyeditor42, Crissov, Curps, CyberSkull, Dakart, DanielPharos, Dbachmann, Deh, Denelson83, Diberri, Docu, Don4of4, Droll, Dthomsen8, Dtobias, Dysprosia, Elektron, Ellmist, Emk (ja), Evertype, Fool4jesus, Furrykef, GPHemsley, Gaius Cornelius, Goh wz, GregorB, Gwinkless, Gyopi, Gtz, Harris7, Harryboyles, Here, Icairns, Incnis Mrsi, Indefatigable, IronGargoyle, Ixfd64, JTN, Jasen betts, Jeronimo, Jkl, John, Jor, Keka, Khukri, Konxykogure, Kooo, Ksn, Kwamikagami, Kwi, LauraALo, Lee Daniel Crocker, Liftarn, Liliana-60, LittleBenW, Livajo, Lmatt, Loadmaster, LoveEncounterFlow, Madacs, Magioladitis, ManuelGR, Martin.Budden, Mat cross, Matthiaspaul, Michael Peter Fustumum, Mikeo, Miles, Mjb, Monedula, Mxn, Mzajac, Naohiro19, Natural Cut, NatusRoma, Nbarth, Nickj, Nikevich, Nikola Smolenski, Nohat, Nsaa, OwenBlacker, Oz1cz, Paddu, Patrick, Paul Magnussen, Pengo, Perey, Phenry, Phil Boswell, PierreAbbat, Pjacobi, Plugwash, Pne, Poccil, Polluks, Poogis, Prof Wrong, Proxyma, QuartierLatin1968, Quota, R'n'B, RARPSL, Raffaele Megabyte, Raise exception, Rama, Red King, RedWolf, Rgrg, Rick Block, RickBeton, Rje, RoToRa, Rogper, Ruhrjung, Ruud Koot, Sandrarossi, Saric, Sburke, Shaun, Simo Kaupinmki, Sl, Sladen, Smb1001, Some jerk on the Internet, Spitzak, Stuartyeates, Stubblyhead, Suruena, TJRC, Tamfos, Tedickey, Telfordbuck, Tevildo, The Nut, Theopolisme, Thistheman, TimR, Timc, Tobias Conradi, Toby Bartels, Torzsmokus, Tox, Truthflux, UTF-8, Urhixidur, Vanisaac, Wavelength, Woohookitty, WorldlyWebster, Wrp103, Yop83, ZanderSchubert, 
ZeroUm, Zundark, Ævar Arnfjörð Bjarmason, 170 anonymous edits
License
Creative Commons Attribution-Share Alike 3.0 //creativecommons.org/licenses/by-sa/3.0/
