http://msdn.microsoft.com/en-us/library/cc488003

How to: Convert Between Legacy Encodings and Unicode (C# Programming Guide)
Visual Studio 2010

In C#, all strings in memory are encoded as Unicode (UTF-16). When you bring data from storage into a string object, the data is automatically converted to UTF-16. If the data contains only ASCII values from 0 through 127, the conversion requires no extra effort on your part. However, if the source text contains extended ASCII byte values (128 through 255), the extended characters will be interpreted by default according to the current code page. To specify that the source text should be interpreted according to a different code page, use the System.Text.Encoding class as shown in the following example.
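To see why the choice of code page matters, the following minimal sketch decodes the same four bytes twice: once as code page 737 and once as ISO-8859-1 (code page 28591). The class and method names (CodePageDemo, Decode, ToCodePoints) are illustrative assumptions and are not part of the original topic. The sketch also assumes a runtime that provides CodePagesEncodingProvider (.NET Core, .NET 5 or later, or .NET Framework 4.6 or later). On the .NET Framework version that shipped with Visual Studio 2010, that type does not exist and the static constructor should be removed.

```csharp
using System;
using System.Text;

static class CodePageDemo
{
    // Assumption: on .NET Core / .NET 5+ legacy code pages such as 737 are
    // only available after the code-pages provider has been registered.
    static CodePageDemo()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Decodes the bytes by using the encoding for the given code page.
    public static string Decode(byte[] bytes, int codePage)
    {
        return Encoding.GetEncoding(codePage).GetString(bytes);
    }

    // Formats each UTF-16 code unit of the string as hexadecimal, separated by '-'.
    public static string ToCodePoints(string text)
    {
        var sb = new StringBuilder();
        foreach (char c in text)
        {
            if (sb.Length > 0) sb.Append('-');
            sb.Append(((int)c).ToString("X"));
        }
        return sb.ToString();
    }

    static void Main()
    {
        byte[] bytes = { 0xAF, 0xAC, 0xAE, 0x9E };

        // Interpreted as code page 737 (DOS Greek): Greek letters.
        Console.WriteLine(ToCodePoints(Decode(bytes, 737)));   // 3C8-3C5-3C7-3B7

        // Interpreted as ISO-8859-1: each byte maps to the same code point.
        Console.WriteLine(ToCodePoints(Decode(bytes, 28591))); // AF-AC-AE-9E
    }
}
```

The bytes are identical in both calls. Only the Encoding object determines which characters they become.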

Example
The following example shows how to convert a text file that has been encoded in an 8-bit legacy encoding, interpreting the source text according to code page 737 (DOS Greek).

using System;
using System.Text;

class ANSIToUnicode
{
    static void Main()
    {
        // Create a file that contains the Greek word ψυχη (psyche) when interpreted by using
        // code page 737 ((DOS) Greek). You can also create the file by using Character Map
        // to paste the characters into Microsoft Word and then "Save As" by using the DOS
        // (Greek) encoding. (Word will actually create a six-byte file by appending "\r\n" at the end.)
        System.IO.File.WriteAllBytes(@"greek.txt", new byte[] { 0xAF, 0xAC, 0xAE, 0x9E });

        // Specify the code page to correctly interpret byte values
        Encoding encoding = Encoding.GetEncoding(737); // (DOS) Greek code page
        byte[] codePageValues = System.IO.File.ReadAllBytes(@"greek.txt");

        // Same content is now encoded as UTF-16
        string unicodeValues = encoding.GetString(codePageValues);

        // Show that the text content is still intact in Unicode string
        // (Add a reference to System.Windows.Forms.dll)
        System.Windows.Forms.MessageBox.Show(unicodeValues);

        // Same content "ψυχη" is stored as UTF-8
        System.IO.File.WriteAllText(@"greek_unicode.txt", unicodeValues);

        // Conversion is complete. Show the bytes to prove the conversion.
        Console.WriteLine("8-bit encoding byte values:");
        foreach (byte b in codePageValues)
            Console.Write("{0:X}-", b);
        Console.WriteLine();

        Console.WriteLine("Unicode values:");
        string unicodeString = System.IO.File.ReadAllText("greek_unicode.txt");
        System.Globalization.TextElementEnumerator enumerator =
            System.Globalization.StringInfo.GetTextElementEnumerator(unicodeString);
        while (enumerator.MoveNext())
        {
            string s = enumerator.GetTextElement();
            int i = Char.ConvertToUtf32(s, 0);
            Console.Write("{0:X}-", i);
        }
        Console.WriteLine();

        // Keep the console window open in debug mode.
        Console.Write("Press any key to exit.");
        Console.ReadKey();
    }
    /*
     * Output:
        8-bit encoding byte values:
        AF-AC-AE-9E
        Unicode values:
        3C8-3C5-3C7-3B7
    */
}
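The example above converts in one direction only, from a legacy code page to Unicode. The reverse direction can be sketched as follows, using Encoding.GetBytes to go from a string back to code page 737 and Encoding.Convert to re-encode bytes directly from one encoding to another. The names UnicodeToLegacy, Encode, and Transcode are hypothetical helpers introduced for illustration. As an assumption, the static constructor registers CodePagesEncodingProvider, which is needed on .NET Core and .NET 5 or later and is not available on the .NET Framework version that shipped with Visual Studio 2010 (remove it there).

```csharp
using System;
using System.Text;

static class UnicodeToLegacy
{
    // Assumption: required on .NET Core / .NET 5+ so that code page 737 is available.
    static UnicodeToLegacy()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Encodes a UTF-16 string as bytes in the given code page.
    public static byte[] Encode(string text, int codePage)
    {
        return Encoding.GetEncoding(codePage).GetBytes(text);
    }

    // Re-encodes bytes from one code page to another without an
    // explicit intermediate string in the calling code.
    public static byte[] Transcode(byte[] bytes, int fromCodePage, int toCodePage)
    {
        return Encoding.Convert(
            Encoding.GetEncoding(fromCodePage),
            Encoding.GetEncoding(toCodePage),
            bytes);
    }

    static void Main()
    {
        // The same four Greek letters that the example above produces.
        string text = "\u03C8\u03C5\u03C7\u03B7";

        // Unicode string back to code page 737 bytes.
        Console.WriteLine(BitConverter.ToString(Encode(text, 737)));
        // AF-AC-AE-9E

        // Code page 737 bytes re-encoded as UTF-8 (code page 65001).
        byte[] legacy = { 0xAF, 0xAC, 0xAE, 0x9E };
        Console.WriteLine(BitConverter.ToString(Transcode(legacy, 737, 65001)));
        // CF-88-CF-85-CF-87-CE-B7
    }
}
```

Converting to a legacy code page can lose information: characters that the target code page cannot represent are not preserved, so a round trip is only safe when every character exists in that code page.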

See Also
Other Resources
Strings (C# Programming Guide)

© 2012 Microsoft. All rights reserved.