
History of Code Pages and Unicode in SAP
Sumit Kothiyal, Basis Consultant

1
History of Code pages in SAP

With the need for multinational language support (MNLS) in SAP, SAP developed
blended code pages as a workaround for the problems that MNLS caused. It also
proposed the MDMP solution, which was not very popular at the time. In the
meantime, MDMP became more stable and was preferred to blended code pages.
As of the Unicode-enabled Basis release 6.10, a universal solution to the
problem is available.
Pre-Unicode Solutions by SAP
• Single Code Page System
• Blended Code Page System (Release 3.0D)
• MDMP System Configuration (Release 3.1I)

Language Combinations before Unicode


• It is also possible to specify a customer specific language; this language must use one of the code
pages that SAP supports, see Note 0112065 for more information.

2
First question first: what is a code page?
This is a good question to start with, because we should understand what a code page is
before we look at the different encoding techniques that SAP uses to store character data.

• Code Page: All data in the database, including characters, is stored as a sequence of
bytes/numbers. For character data, a code page defines the mapping between a byte
sequence and a character (a letter, symbol, ideograph, dingbat, etc.). A code page is a
matrix of code points; a code point is the combination of the coordinate values for a given
cell in the code page matrix (see example). A code page is used whenever character
data is processed on the application server, displayed on the front end, or rendered by
the printer. To avoid errors while processing character data, SAP uses code
pages based on ISO standards. Each code page is assigned a unique four-digit number.

In the example code page, the character "A" has the code point '41' in hexadecimal
notation, and the character "}" has the code point '7D'. The empty fields are reserved
for non-printing characters, such as END OF LINE, or they have not been assigned a
character. The ASCII characters are shaded in the example.
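
The code points quoted above can be reproduced with a short sketch. It assumes that the rows of the matrix correspond to the high half-byte and the columns to the low half-byte (the original diagram is not reproduced here, so the orientation is an assumption), and it uses Python's "latin-1" codec as a stand-in for a 1-byte code page. The function name matrix_position is made up for this illustration.

```python
def matrix_position(ch: str, codec: str = "latin-1") -> tuple[str, int, int]:
    """Return (hex code point, row, column) for a character in a 1-byte code page.

    Assumption: row = high half-byte, column = low half-byte of the code point.
    """
    raw = ch.encode(codec)
    if len(raw) != 1:
        raise ValueError("character does not map to a single byte in this code page")
    value = raw[0]
    return f"{value:02X}", value >> 4, value & 0x0F


# "A" is stored as byte 0x41, "}" as byte 0x7D
print(matrix_position("A"))
print(matrix_position("}"))
```

The same lookup works for any character that the chosen code page contains; a character outside the code page raises an error instead of returning a code point.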

3
Single Code Page System
A single code page system uses one standard code page, which supports a specific set of
languages. All application servers and the database use this one system code page. This
may be a 1-byte code page like Latin 1 (for Western Europe) or Latin 2 (for Eastern
Europe), or a multi-byte Japanese code page. If your system landscape goes beyond one of
these regions, however, a single code page system is no longer sufficient.
If you are using single code page systems, the conversion to Unicode is straightforward
and, in fact, mostly automated.
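
Why one code page stops being sufficient can be seen in a small sketch. It uses Python's "iso8859_1" (Latin 1) and "iso8859_2" (Latin 2) codecs as stand-ins for the SAP system code pages; the helper name fits is made up for this illustration.

```python
def fits(text: str, codec: str) -> bool:
    """Return True if every character of text exists in the given code page."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# The same stored byte means different characters in different code pages.
stored = bytes([0xA3])
print(stored.decode("iso8859_1"))  # pound sign in Latin 1
print(stored.decode("iso8859_2"))  # L with stroke in Latin 2

# A Latin 1 system can hold German text, but not Polish or Japanese text.
for sample in ("Grüße", "Łódź", "東京"):
    print(sample, fits(sample, "iso8859_1"), fits(sample, "iso8859_2"))
```

As long as all users stay within the character set of the one system code page, nothing goes wrong; the problems start when a second language region is added.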

4
SAP Blended Code Pages (R/3 3.0D – R/3 4.6D)

We start with blended code pages, as this was the first solution for multi-language
support in SAP.

From R/3 3.0D on, SAP application servers could run multi-byte blended code pages, which contain
characters from several standard code pages. Blended code pages are not standard code pages, but
SAP-customized code pages created to support a larger number of language combinations
in a single code page. Such an approach covers only a fixed set of language combinations, however, and
does not allow any flexibility regarding additional code pages. There are two types of SAP blended
code pages: ambiguous blended code pages and unambiguous blended code pages.

Ambiguous Blended Code Pages
• 6100 SAP Unification (ISO1 + ISO2 + ISO7 + SJIS1)
• 6200 SAP Asian Unification (ASCII + SJIS1 + Asian)
• 6500 SAP Diocletian (ISO1 with ISO7 D7 and F7)

Unambiguous Blended Code Pages
• 6230 SAP Asian UnificationT (trad. Chinese + SJIS1)
• 6240 SAP Asian UnificationC (simp. Chinese + SJIS1)
• 6250 SAP Asian UnificationK (Korean + SJIS1)
• 6300 SAP Eurojapan (ISO1 + SJIS1)
• 6400 SAP Silk Road (ISO7 + SJIS1)
• 6600 SAP Nagamasa (Thai + SJIS1)
• 6700 SAP Trans Siberian (ISO5 + SJIS1)

5
What is the difference?

When you use an ambiguous blended code page, several characters can be assigned to one
and the same byte sequence; in simple terms, two characters can share the same code point.
A character can also be represented by different byte sequences.
When you use an unambiguous blended code page, each byte sequence is assigned exactly
one character; in simple terms, each code point refers to exactly one character. Here,
too, a character can be represented by different byte sequences.
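
The distinction can be checked mechanically. The following is only a sketch: the two toy tables are invented for illustration and are not excerpts from any real SAP blended code page, and the function names are hypothetical.

```python
from collections import defaultdict


def shared_code_points(char_to_bytes: dict[str, bytes]) -> dict[bytes, set[str]]:
    """Return every byte sequence that is assigned to more than one character."""
    by_bytes: dict[bytes, set[str]] = defaultdict(set)
    for ch, raw in char_to_bytes.items():
        by_bytes[raw].add(ch)
    return {raw: chars for raw, chars in by_bytes.items() if len(chars) > 1}


def is_unambiguous(char_to_bytes: dict[str, bytes]) -> bool:
    """A table is unambiguous if each byte sequence has exactly one character."""
    return not shared_code_points(char_to_bytes)


# Invented toy tables (NOT real SAP code page contents).
AMBIGUOUS_TOY = {"A": b"\x41", "×": b"\xd7", "Χ": b"\xd7"}
UNAMBIGUOUS_TOY = {"A": b"\x41", "×": b"\xd7", "Χ": b"\x83\xd4"}

print(is_unambiguous(AMBIGUOUS_TOY), shared_code_points(AMBIGUOUS_TOY))
print(is_unambiguous(UNAMBIGUOUS_TOY))
```

In the ambiguous toy table the byte D7 stands for two different characters, so the system cannot tell from the stored byte alone which one was meant.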

6
MDMP - Multiple Display/Multiple Processing

History: The MDMP solution was introduced with R/3 3.1I when the blended code
page solution turned out to be no longer sufficient, because the size of a code page limits
the number and the combination of languages that can be supported in a single code page
system, as shown in the diagram on the previous slide. MDMP was initially treated as a
temporary solution for R/3 systems, with the restrictions explained in SAP Notes 747036,
745030 and 73606. CRM, SCM, BI and other non-R/3 components never supported MDMP.
Since Web AS 6.20, the standard code page technology for SAP systems has been Unicode.

Support: Existing MDMP installations are supported up to SAP NetWeaver 2004 (ERP 2004).
With SAP NetWeaver 7.0 (ERP 6.0) and all higher releases, including all enhancement
packages, MDMP is out of support; SAP Note 79991 provides the details of MDMP support. SAP
systems with more than one system code page must be converted to Unicode before or
during the upgrade to SAP NetWeaver 7.0 (ERP 6.0). See SAP Notes 838402 and 928729 for
more information.

7
Technical Information

In an MDMP system, more than one code page is used in order to allow more languages in
the system. The catch is that the characters used by these languages are not all in the same
code page. The code page used on the application server is selected dynamically, based on
the user’s logon language. To sum up, only the characters that are in the active code page
can be displayed properly, even though all characters are stored correctly in the database.
Let’s look at this from the user’s perspective: if a user wants to enter Japanese, he/she
must log on in Japanese. To ensure that no data corruption occurs, the following restrictions
must be followed:
• Global data must contain only 7-bit ASCII characters, which are in all code pages.
• Users may use only the characters of their logon language or 7-bit ASCII.
• Batch processes must be assigned the correct user ID and language.

Let us take an example to understand this: MDMP dynamically assigns a code page, which
maps the hex values to natural-language characters, based on the code page that contains
the language of the user’s logon session. A Japanese user working in an R/3 MDMP system
(logged on with the Japanese language) can view texts that were originally entered in Kanji
(Japanese), but this user cannot correctly view text data originally entered by a user logged
on in Russian. This is because the hex values that represent the Russian text data do not
map correctly to Kanji characters; the Japanese user would see “garbage” characters
when viewing, for instance, a customer name that was entered with Russian characters.
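
The Japanese/Russian example can be simulated in a few lines. This is only a sketch of the principle, not of SAP's implementation: the language-to-code-page table is hypothetical, and Python's "shift_jis" and "iso8859_5" codecs stand in for the Japanese and Russian code pages.

```python
# Hypothetical mapping of logon language to the active code page.
LOGON_CODE_PAGE = {"JA": "shift_jis", "RU": "iso8859_5"}


def store(text: str, logon_language: str) -> bytes:
    """Encode the text with the code page of the user who enters it."""
    return text.encode(LOGON_CODE_PAGE[logon_language])


def display(raw: bytes, logon_language: str) -> str:
    """Interpret the stored bytes with the code page of the user who views them."""
    return raw.decode(LOGON_CODE_PAGE[logon_language], errors="replace")


customer_name = "Иванов"
stored = store(customer_name, "RU")   # the bytes in the database are correct

print(display(stored, "RU"))          # readable for the Russian logon
print(display(stored, "JA"))          # "garbage" for the Japanese logon

# 7-bit ASCII is the same in both code pages, so it survives either logon.
print(display(store("SAP", "RU"), "JA"))
```

The bytes in the database never change; only the code page used to interpret them does, which is why the data looks corrupt to one user and correct to another.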

8
End of Support for MDMP SAP Systems

Restrictions in MDMP:
• Users can only use the characters of their logon language or 7-bit ASCII.
• Incorrect or faulty locales can lead to data corruption.
• English texts are "fixed" in one code page and therefore must be repeatedly translated.
• Handling errors are likely: for example, users must log on with the correct language, batch processes must
be assigned the correct user ID and language, and the correct device type must be used.
• Global data (from tables without language flag) must contain only 7-bit ASCII characters, which are
in all code pages; otherwise data corruption can occur.
• As MDMP is an SAP proprietary solution, mixed MDMP data cannot be interpreted by most third-party
products. Only RFC and BAPI communication is possible.
• SAP cannot guarantee that Java texts containing Unicode data are properly interpreted by MDMP
systems. Java is always based on Unicode.
• Integration of WebDynpro and MDMP systems is problematic.
• SAP cannot guarantee that data coming from the internet containing Unicode data are properly
interpreted by MDMP systems. The relationships between the language keys and code pages in MDMP
systems are only well-defined with SAP systems.
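
The 7-bit ASCII restriction on global data is easy to test for. The following sketch shows one way such a check might look; the function names are made up for this illustration and do not correspond to any SAP tool.

```python
def non_ascii_positions(value: str) -> list[tuple[int, str]]:
    """Return (position, character) for every character outside 7-bit ASCII."""
    return [(i, ch) for i, ch in enumerate(value) if ord(ch) > 0x7F]


def is_safe_global_value(value: str) -> bool:
    """Global data (no language flag) is safe only if it is pure 7-bit ASCII."""
    return not non_ascii_positions(value)


for value in ("Material 4711", "Müller", "Иванов"):
    print(value, is_safe_global_value(value), non_ascii_positions(value))
```

Any value that fails this check depends on a specific code page and would be at risk in a table that carries no language information.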

9
Introduction of Unicode in SAP
Language, the way in which we all communicate, is one of the most interesting features of
human society. There are many languages, divided into many language families, and every
language inevitably changes even over a relatively short time, which shows that the only
constant is change. One reason is the continuous communication between speakers of different
languages who try to speak one common language and change it slightly in the process. For
example, an Indian speaking British English uses a different modulation and intonation and,
over time, brings words from his native language into it.

A significant challenge during the fast-paced development of information technology and
computers was therefore to encode language and the characters associated with it in a
form suitable for machines, so that data could be stored and exchanged. Data exchange was,
and still is, challenging, because standards must be defined to ensure the smoothest
possible exchange between different computers and programs. With time, it became clear
that the variety of formats introduced, mainly due to increasing globalization, was
still unable to represent languages sufficiently well, and that errors even occurred during data
exchange between heterogeneous IT platforms. The solution to this omnipresent problem was
Unicode. For the first time, a uniform standard was agreed on globally. The idea is the fixed
assignment of one number to every character, which guarantees that texts in
any language can be displayed and transmitted without error, both today and in the future.

Today all SAP applications are available in Unicode-based versions, and new products
such as SAP XI and SAP NetWeaver Portal are delivered only in Unicode versions. SAP plans
to end support for the obsolete solutions for combining languages and code pages in R/3,
such as MDMP, single code pages and blended code pages, and is doing so step by step.
ERP 2005 no longer supports MDMP, and after 2007 all new installations of applications
based on SAP NetWeaver will be possible only in Unicode.

10
What is Unicode exactly?

Unicode = universally encoded character set to store information from any language

Unicode:
• Defines properties for each character
• Standardizes script behavior
• Provides a standard algorithm for bidirectional text
• Defines cross-mappings to other standards
• Defines a unique code value for every character, regardless of the platform,
program or programming language used
• The Unicode Standard primarily encodes scripts rather than languages
• A script covers several languages that historically share the same set of symbols
• In many cases a script may serve to write dozens of languages (e.g. the Latin script)
• In other cases one script corresponds to a single language (e.g. Hangul)
• Additionally, it includes punctuation marks, diacritics, mathematical symbols,
technical symbols, musical symbols, arrows, dingbats etc.
• In all, the Unicode Standard comprises more than 95,000 characters, ideograph sets and symbols.

The Unicode Standard


• The Unicode Standard is a character coding system designed to support the worldwide interchange,
processing and display of written text of the diverse languages and technical disciplines of the
modern world.
• In addition, it supports classical and historical texts of many written languages.
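
Several of the points above (a unique code value, properties for each character, the bidirectional class) can be inspected with Python's standard unicodedata module. This is a minimal sketch; the helper name describe and the choice of sample characters are made up for this illustration.

```python
import unicodedata


def describe(ch: str) -> dict[str, str]:
    """Collect the code value and a few Unicode properties of one character."""
    return {
        "code_value": f"U+{ord(ch):04X}",             # the unique number of the character
        "name": unicodedata.name(ch),                 # official character name
        "category": unicodedata.category(ch),         # general category, e.g. Lu = uppercase letter
        "bidi_class": unicodedata.bidirectional(ch),  # input to the bidirectional algorithm
    }


# A Latin letter, a Hebrew letter (written right to left) and a Hangul syllable
for ch in ("A", "א", "한"):
    print(describe(ch))
```

The code value is the same on every platform and in every program, which is exactly the property that the code-page-based solutions could not offer.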

11
What is Unicode exactly? Contd.

Where is Unicode used?


• The Unicode Standard has been adopted by many software and hardware vendors
• Most operating systems support Unicode
• Unicode is required for international document and data interchange, the Internet and the WWW, and
therefore by modern standards such as:
Java, C#, Perl, Python
Markup languages such as XML, HTML, XHTML, MathML, WML etc.
JavaScript
LDAP
CORBA etc.

12
Unicode-compliant SAP products (SAP Note 79991)

mySAP Business Intelligence (BW)


• The Unicode version of mySAP BW 3.5 is available via Ramp-Up
• The conversion of existing BW installations is handled as a customer project
• SAP Note 643813 collects all relevant SAP Notes concerning Unicode-based SAP BW
installations

mySAP Product Lifecycle Management (PLM)


• The Unicode version of mySAP PLM 4.0 is available via Ramp-Up

SAP R/3 Enterprise (Ext. 1.10 & higher)


SAP Exchange Infrastructure

Why do we need Unicode?

The answer to this question is fairly straightforward, as the points below explain:
1. Global IT systems can hold multilingual data without any
restrictions.
2. Web interfaces open the door to a global customer base,
so multiple regions and languages must be supported simultaneously.
3. SAP has integrated J2EE, but it cannot fully support web standards without Unicode;
with Unicode it can take advantage of XML and Java functionality.
4. Only Unicode makes it possible to integrate heterogeneous SAP and non-SAP
system landscapes.

13
Guidelines for Unicode Conversion Projects

As explained earlier, all SAP applications are available today in Unicode-based
versions, and new software products from SAP such as the SAP NetWeaver Exchange
Infrastructure (SAP XI) or the SAP NetWeaver Portal are now delivered only as Unicode
versions. Support for MDMP-based R/3 systems is also being terminated, so the need
for Unicode systems has grown in order to globalize a system without any restrictions.

Unicode Conversion: Below is a rough overview of the conversion of one SAP system,
which shows the phases of a conversion. The strategy for a conversion remains the same:
preparation is very important, followed by the conversion itself, and then the
post-processing phase.

14
Information Gathering, Evaluation and Analysis

Before a Unicode conversion starts, we must gather as much information as possible and
clarify the specific situation at the customer. In general, cost and effort are the focus. It is
essential to have a business justification for the conversion.

The following points describe the possible factors to be taken into account in this step:
Understanding the Unicode conversion process and its outcome
Acquiring relevant customer-specific information, such as:
• Overview of the system landscape (systems, releases, support packages, front-end
software, and so on)
• Database sizes (in GB), the 50 largest tables, and the hardware configuration of all
relevant systems
• Requirements pertaining to tolerable downtime for individual systems and their impact on
the business
• Code page setup of all systems (MDMP, single code page, blended code page)
• Description and configuration of the interfaces between the systems and to non-SAP
systems.
• Existing add-on solutions in the systems (SAP and non-SAP)
• Number and type of existing custom and modified developments in the SAP system
• Existing rollout plans in other countries for the different systems
• Planned system mergers
• Possible conversion strategies

15
Information Gathering, Evaluation and Analysis contd.

Gathering the experience of other customers
Creating the first “rough” estimate of effort
Defining the business case and creating an initial project plan
Evaluating the consequences if the Unicode conversion is postponed; for more information, read SAP Note 79991

16
Determining Factors of a Conversion Project

Certain factors are involved in deciding on a Unicode conversion. The best approach is
to check which of the basic reasons below apply before you start or even plan
the conversion.
• Your company is planning to upgrade from an existing MDMP system to SAP ERP 5.0 or ERP 6.0;
MDMP is no longer supported as of ERP 6.0 (more information in SAP Note 79991).
• You want to use English as the central logon language for all countries or languages.
• You want to use Java technologies such as ESS/MSS in the MDMP environment.
• Support for dialects is needed (such as Canadian French).
• You need to display certain characters that are not supported in MDMP.
• An Internet connection is needed.
• Java integration is needed.
• You need to consolidate systems with different code page configurations.

17
Determining Factors contd.

The duration of the project depends on many factors; a few main factors are shown below. Of
course, it also depends on the availability of resources and their state of knowledge. As
categorized below, the duration depends on the languages used, the SAP solution
used and the platform used. The hardware requirements for Unicode are different, so they
also have to be examined in the planning stage and the hardware should be sized accurately.

18
Determining Factors contd.

As a minimum value for the conversion of a three-system landscape, you can assume
about four weeks of project runtime. On average, these projects take about three to four
months. For very large MDMP systems with many custom ABAP objects or interfaces to
other MDMP systems, the runtime can even be more than a year.

19
Determining Factors contd.

Specialists needed: A Unicode project needs not only Basis/NetWeaver experts but also
expertise in ABAP enabling and in the interface area. Transactions
SPUMG/SPUM4 and SUMG are generally executed by SAP NetWeaver/Basis experts. For the
preparation of the system vocabulary, however, experts in vocabulary creation are needed.
The export and import procedures and their optimization are comparable to an upgrade and
require technical knowledge. Testing is generally the responsibility of the application team.

20
Release Changes and Unicode Conversion

We need to find the best possible way of combining the conversion and the upgrade of our system;
in other words, we should ask: “How can an upgrade and a Unicode conversion be combined?”

Upgrades (release changes) and Unicode conversions are both projects during which a
great deal of application testing is necessary. Although these are two logically independent steps,
there is still the question of how well the two tasks can be combined. This is particularly
interesting in an upgrade from a non-Unicode-capable release with MDMP to SAP ERP 6.0, because
MDMP is no longer supported in the target release (see SAP Note 79991).

The possibilities below can be considered when deciding the strategy for the upgrade and
the Unicode conversion:

1. Separate projects: The upgrade is treated as a separate project from the Unicode conversion, or vice versa.
This is the greatest possible separation of upgrade and Unicode conversion.

2. Upgrade and Unicode conversion on the same weekend: This depends on whether the runtime of the upgrade and
the conversion can be accommodated in one weekend. Normally the Unicode conversion itself takes around 40 hours of
downtime, so with the upgrade downtime added it is highly unlikely that both procedures can be finished within the 48 hours
of a weekend. Another drawback of this approach is the increased complexity of the project,
for example in handling the ABAP objects during the upgrade from the source release 4.6C.

21
Release Changes and Unicode Conversion Contd.
3. Upgrade and Unicode conversion on different weekends: It is possible to perform the conversion
and the upgrade in one project, but on different weekends for the production system. Assuming that
the upgrade is performed before the conversion, tests must be performed in both the non-Unicode
and the Unicode system, because in this case the non-Unicode system goes live on the new release. The
advantages of this approach are that one sandbox system can be used for both the upgrade and the
conversion, and that although tests are performed twice, they follow shortly after one another
in an identical procedure.
4. Combined Upgrade and Unicode Conversion (CU&UC): The CU&UC method was developed primarily for
MDMP customers who are on SAP R/3 4.6C and are moving to the target release ECC 6.0; refer to SAP Note 928729
for more information. The major component of this approach is SPUM4, which is the equivalent of SPUMG. The principle
behind SPUM4 and SPUMG is that the transaction is executed online during production operation.
Because transaction SPUMG runs for at least a matter of days for MDMP customers,
executing it in the target release during the downtime is not feasible. SPUMG was therefore implemented in
SAP R/3 4.6C as SPUM4, so that it can be executed online in this release. However, Unicode-enabling
transactions such as UCCHECK are not available in this release, so this work has to be done on the sandbox or the upgraded
system, and the results can then be transported into the production system later. Refer to the diagram below for a
sample procedure:

22
Release Changes and Unicode Conversion Contd.

5. Twin Upgrade and Unicode Conversion (TU&UC): As explained earlier, CU&UC cannot be performed for
releases prior to SAP R/3 4.6C. For these releases the TU&UC method has been developed and
used successfully. In this method a twin system is created as a copy of the production system and an
upgrade is performed without Unicode conversion. Because transaction SPUM4 is available
only on SAP R/3 4.6C, it cannot be used on earlier releases, so the idea is to upgrade the twin system
to the target release, use SPUMG, which is available starting from ECC 5.0, and then do
the Unicode conversion. The results of SPUMG can then be transported later to the upgraded
production system, and SUMG can be used to make any corrections needed in the
upgraded and Unicode-converted production system. For more information on TU&UC limitations and FAQs,
see SAP Note 959698. Refer to the diagram below for a sample TU&UC procedure:

23
Summary

The focus of this presentation was to make clear what character encoding is and what it implies in SAP,
starting from single code pages, blended code pages (ambiguous and unambiguous)
and MDMP in SAP systems, and then to show the challenges involved in converting
these systems to Unicode, including the different conversion procedures and the
complexities involved. A new installation is relatively simple, because it hardly differs from
the installation of a non-Unicode system. In the conversion of a three-system landscape,
on the other hand, there are many different options for the implementation of
Unicode, as explained in the later part of the presentation. We also looked at how to estimate the effort of a
Unicode project and at the various factors involved. Here, database size, the possible use of
MDMP, the number of custom programs, and the type and number of interfaces all play
significant parts. Finally, the comparison of a Unicode conversion with an upgrade project
made it clear that, depending on the conditions, the Unicode conversion can be done without major difficulty
given proper planning and pre-analysis of the impact. The difference between CU&UC and
TU&UC was explained, and an overview of these conversion procedures was given.

24
