The byte order mark (BOM) is the character U+FEFF placed at the beginning of a
data stream, where it can be used as a signature defining the byte order and
encoding form, primarily of unmarked plaintext files.
A BOM is useful at the beginning of files that are typed as text but whose byte
order (big-endian or little-endian) is not known. It can also serve as a hint
indicating that the file is in Unicode.
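To make the "signature" idea concrete, here is a minimal sketch that looks at the first bytes of a file and reports which BOM, if any, it starts with. The function name bom_kind is hypothetical, head -c is assumed to be available (it is common but not strictly POSIX), and UTF-32 signatures are deliberately ignored.

```shell
#!/bin/sh
# bom_kind is a hypothetical helper name, not a standard utility.
# It prints which byte order mark (if any) starts the given file.
# Note: UTF-32 is not handled; a UTF-32LE file would be reported as UTF-16LE.
bom_kind() {
    # First three bytes as lowercase hex with spaces and newlines removed.
    sig=$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')
    case "$sig" in
        efbbbf) echo "UTF-8" ;;
        feff*)  echo "UTF-16BE" ;;
        fffe*)  echo "UTF-16LE" ;;
        *)      echo "none" ;;
    esac
}

# Demo on a temporary file that starts with the UTF-8 BOM (octal escapes).
demo=$(mktemp)
printf '\357\273\277hello\n' > "$demo"
bom_kind "$demo"    # prints: UTF-8
rm -f "$demo"
```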
When loading through Informatica, if the reader does not recognize the target
file as UTF-8 because no BOM is present, you can add a byte order mark to the
file manually so that trademark symbols and other UTF-8 encoded characters are
read properly.
Use a post-session command in conjunction with the Output Field Names option:
sed -i '1s/^/\xEF\xBB\xBF/' TargetFileName
Example:
~]$ vi BOM
~]$ file BOM
BOM: ASCII text
~]$ sed -i '1s/^/\xEF\xBB\xBF/' BOM
~]$ file BOM
BOM: UTF-8 Unicode text
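The sed one-liner above relies on the \xEF escape, which is a GNU sed extension, and it adds another BOM each time it runs. As a hedged alternative, the sketch below prepends the BOM only when it is missing and uses printf octal escapes instead. The name add_utf8_bom is hypothetical, and head -c and mktemp are assumed to be available.

```shell
#!/bin/sh
# add_utf8_bom is a hypothetical helper: prepend EF BB BF unless already present.
add_utf8_bom() {
    file=$1
    # First three bytes as lowercase hex with spaces and newlines removed.
    sig=$(head -c 3 "$file" | od -An -tx1 | tr -d ' \n')
    if [ "$sig" = "efbbbf" ]; then
        return 0    # already marked; do not add a second BOM
    fi
    tmp=$(mktemp) || return 1
    # Write BOM + original content to a temp file, then copy it back so the
    # original file keeps its own permissions.
    { printf '\357\273\277'; cat "$file"; } > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return $status
}
```

It could be called from a post-session command as add_utf8_bom TargetFileName after sourcing the script; running it twice leaves a single BOM.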
We need to generate a UTF-8 plain text file. What I did was define the target as
a flat file and set the code page to "UTF-8 encoding of Unicode".
If I put a 3-byte BOM at the beginning of the file, the file can be properly
identified. But it seems that Informatica does not automatically insert a BOM in
the file, and if I use a string concatenation function to add the 3 bytes,
Informatica encodes them so that they are no longer a BOM.
In Informatica, if you want to export data containing Unicode characters to a
UTF-8 file with a BOM, you can use the following method:

Chr(65279) || '??'

Just take the UTF-16 BOM, 0xFEFF (decimal 65279), and use the Chr function to
convert it to a string. Informatica will convert Chr(65279) to the 3-byte UTF-8
BOM (0xEF 0xBB 0xBF) when writing the file.
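The relationship between code point 65279 and the bytes EF BB BF follows from the standard UTF-8 three-byte layout (1110xxxx 10xxxxxx 10xxxxxx). The sketch below only illustrates that arithmetic; it says nothing about how Informatica performs the conversion internally, and the function name utf8_bytes_3 is hypothetical.

```shell
#!/bin/sh
# utf8_bytes_3 is a hypothetical helper: print the three UTF-8 bytes for a
# code point in the range U+0800..U+FFFF (the 3-byte form only).
utf8_bytes_3() {
    cp=$1
    b1=$(( 0xE0 | (cp >> 12) ))           # 1110xxxx: top 4 bits
    b2=$(( 0x80 | ((cp >> 6) & 0x3F) ))   # 10xxxxxx: middle 6 bits
    b3=$(( 0x80 | (cp & 0x3F) ))          # 10xxxxxx: low 6 bits
    printf '%02X %02X %02X\n' "$b1" "$b2" "$b3"
}

utf8_bytes_3 65279    # prints: EF BB BF
```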

Set Chr(65279) || 'file header' as the first line in the Expression and then
load it into the target file (format UTF-8). 65279 is hexadecimal 0xFEFF.
I'm using the output headers option in the session, so I will probably build the
BOM generation there with a custom command. The issue can be considered resolved.
Since Informatica would not pass backslashes (it converts them to forward
slashes), I couldn't use the following in the Header command:
echo -e \xEF\xBB\xBFMy,Header,Fields,Comma,Delimited
Instead I had to use a post-session command in conjunction with the Output Field
Names header option (this leaves the # prefix):
sed -i '1s/^/\xEF\xBB\xBF/' MyTargetFile
The reader I was using does not recognize the target file as UTF-8 because no
BOM is provided. After I manually put a byte order mark in the file, the UTF-8
encoded characters were read properly. Is there a setting in Informatica to
automatically create the BOM at file creation based on the output code page? If
not, how would you recommend prepending the output with it?
I am able to read and write the data. I imported your workflow and ran it
without any changes.
Source : UTF-8
Target : UTF-8
Integration Service : Unicode
