Migrating From C++ To C#

Introduction

Since its beginning in the 1980s, C++ has come a long way. It has a large
established user base, tested software, its own tools (compilers, etc), and lots of
experienced programmers. It has also developed its own idioms and techniques
for programmers to write effective software. C++ programmers are comfortable
in getting things done with the facilities that are provided with it in an efficient
manner.

.NET is a powerful new platform with a great deal of promise. C# is designed
from the ground up to harness the power of this new framework. It provides a
whole host of features and is strongly based on C++. C# is an object oriented
language and is the first component-oriented language in the C family. It also
makes writing Windows and Web applications faster and easier. C# is gaining
wide acceptance and it is clear that it is here to stay for a long time.

C# is not a replacement for C++, and it is more than likely that both will be used
widely for the foreseeable future. However, there are many practical cases where
there is a necessity to migrate from C++ to C#. For instance, your company's
policy may be to change all existing code to .NET, or perhaps you wish to take
advantage of some of the facilities made available in .NET.

The question is, how do we make the transition as smooth as possible while
getting the best results? Adopting a new language doesn't just mean converting
the existing code from C++ to C#. By just knowing the syntax, a C++
programmer cannot straightaway start programming in C#. These two
languages differ largely by their design and approach towards problem solving,
which makes the language transition harder.

System Requirements

It is preferable that the reader has access to the C# compiler available in
Microsoft Visual C# .NET. This case study is for programmers coming from
a C++ background who are new to C# or have just started programming in it.
Programmers who already have a good understanding of C# are in a better
position to understand the approaches taken in the conversion process.

Case Study Structure

The case study consists of three main sections:


• The approach:

In this section, we briefly cover the basic theory that is necessary for
understanding the issues in conversion. You may not yet be clear about a few
of the C# features mentioned here; they are covered in the following
section.

• Comparing C++ and C# features:

In this section we look at the different features of the two languages which are
necessary to make the conversion possible.

• Steps in converting existing code:

The steps that are required for converting the existing code from C++ to C# are
covered in this section. An example of converting class hierarchies from C++ to
C# is also covered.

The Approach

What is the best approach for getting equipped for a smooth transition from C++
to C#? Understanding! Migrating from one language to another involves a
considerable effort. This is not because of a change in syntax, rather because of
changes in methodology - design approach, underlying technology, and the
approach towards problem solving. Understanding that there is such a
fundamental shift, and having the knowledge of where the major differences lie,
will help a lot.

The underlying translation models for C++ and C# are quite different. C++
follows a static linkage model, meaning that the source code is compiled by the
compiler to result in object code. The object files are linked to result in an
executable file. The operating system loads and controls the execution of the
program. The language features are designed with this approach in mind. For
example, there is no support for reflection. Moreover, the code is only source
code portable and not much runtime support is available.

C# follows an entirely different translation model - it combines ahead-of-time
compilation with just-in-time compilation at run time. The source code is
first compiled to an intermediate format known as MSIL (Microsoft Intermediate
Language). A virtual machine, referred to as the CLR (Common Language
Runtime), then takes over: it compiles the MSIL to native code just in time
and executes it. The execution is thus in the hands of the CLR, and the code
executed is referred to as managed code. This change in translation model is
reflected in the language features as well.

A very important difference is in the area of memory management: the
programmer no longer has complete control of the lifetime of the objects on
the heap. The garbage collector takes care of reclaiming objects that are no
longer reachable, so there is no need for the keyword "delete". However, there
are destructors in C#. If the "delete" keyword is not available in C#, then
what is the use of destructors? In reality, the destructor syntax in C# is
very misleading, especially for programmers from a C++ background. C#
destructors are actually finalizers: the garbage collector calls them at some
unspecified time before it reclaims the object, not at a point the programmer
chooses.

Another issue to understand is the change in design criteria. C++ is designed
with experienced programmers in mind and 'trusts the programmer' (in the C
tradition). So no extensive runtime checking is done, and there are implicit
casts and promotions in function calls. These features have proven to be very
useful, but also very bug-prone. Therefore, only experienced programmers
should use them. C#, however, is designed so that even novice users can learn
it fairly easily, and is also designed with robust software in mind. It
performs extensive runtime checking, allows very few implicit conversions and
tries to make the life of the programmer easier.

How can this understanding of the change in language design help in the
transition from C++ to C#? Let us use an example. A single argument
constructor also serves the purpose of a conversion operator in C++. When a
conversion is required, that constructor will be called implicitly because
the language 'trusts the programmer': it is assumed that the C++ programmer
is aware of it. Such implicit calls may lead to subtle bugs, like:

class Stack{
public:
    Stack (int initialCapacity);
    // constructor that takes int as an argument
    // other members
};

// now consider the code

Stack s(10);
s = 25;

// implicit conversion: a new Stack object is created with the int as
// argument, as if the programmer had written
// s = Stack(25);
// beware! the programmer may have written this without being aware that
// the constructor with an int argument is called for the conversion
// from int to Stack

To avoid such problems, C# never uses a single argument constructor as a
conversion operator. If a conversion is wanted, you have to define a
conversion operator yourself and mark it with the implicit or explicit
keyword. Knowing the C++ behavior, you can better appreciate why these
keywords exist. When you write the equivalent C# code for C++ code that has
single argument constructors, you will need to examine whether a conversion
operator needs to be implemented at all and, if so, decide whether it should
be declared as explicit or implicit.
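Before porting, it can help to make the hidden conversions visible on the C++ side. The following is a minimal sketch, assuming a C++11 compiler; LooseStack, StrictStack and demoImplicitConversion are hypothetical names used only for this illustration. It shows that marking the constructor explicit turns the silent conversion into something that must be written out, which is the situation C# enforces by default.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical classes for illustration only.
class LooseStack {
public:
    LooseStack(int initialCapacity) : capacity_(initialCapacity) {}  // converting constructor
    int capacity() const { return capacity_; }
private:
    int capacity_;
};

class StrictStack {
public:
    explicit StrictStack(int initialCapacity) : capacity_(initialCapacity) {}
    int capacity() const { return capacity_; }
private:
    int capacity_;
};

// Compile-time view of the difference: only LooseStack accepts an int silently.
static_assert(std::is_convertible<int, LooseStack>::value,
              "int converts implicitly to LooseStack");
static_assert(!std::is_convertible<int, StrictStack>::value,
              "int does not convert implicitly to StrictStack");

inline int demoImplicitConversion() {
    LooseStack s(10);
    s = 25;               // compiles: silently builds LooseStack(25) and assigns it

    StrictStack t(10);
    // t = 25;            // would not compile: the constructor is explicit
    t = StrictStack(25);  // the conversion has to be spelled out

    assert(s.capacity() == 25);
    assert(t.capacity() == 25);
    return s.capacity();
}
```

Every assignment that stops compiling after adding explicit is a place where the C# version needs either a deliberate conversion operator or an ordinary constructor call.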

The problem solving approach also differs considerably in these two languages.
Consider writing a simple calculator program. You require a postfix expression
evaluator, and for that you may prefer to have your own reusable version of
Stack. The interface for Stack is well-known and the logic is pretty
straightforward. Still, your approach towards solving such problems may be
entirely different depending on the language you use.

In C++, you would write a template class for the stack. If you want to evaluate an
integral expression then you will instantiate an integer version from that Stack
template class. It has its own benefits like static type checking. You can use this
same implementation for any type of expression, for example floating point
expression, without any changes. It is also extensible.
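The C++ approach described above can be sketched as follows. This is an illustrative outline rather than a reference implementation: the Stack interface shown (push, pop, empty) and the evalPostfix helper are assumptions made for this example, and the expression is assumed to be a space-separated postfix string.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Minimal template stack; the element type is checked at compile time.
template <typename T>
class Stack {
public:
    void push(const T& value) { items_.push_back(value); }
    T pop() {
        if (items_.empty()) throw std::runtime_error("pop on empty stack");
        T top = items_.back();
        items_.pop_back();
        return top;
    }
    bool empty() const { return items_.empty(); }
private:
    std::vector<T> items_;
};

// Evaluates a space-separated postfix expression such as "3 4 + 2 *".
template <typename T>
T evalPostfix(const std::string& expr) {
    Stack<T> operands;
    std::istringstream in(expr);
    std::string token;
    while (in >> token) {
        if (token == "+" || token == "-" || token == "*" || token == "/") {
            T rhs = operands.pop();   // the right operand was pushed last
            T lhs = operands.pop();
            if (token == "+") operands.push(lhs + rhs);
            else if (token == "-") operands.push(lhs - rhs);
            else if (token == "*") operands.push(lhs * rhs);
            else operands.push(lhs / rhs);
        } else {
            std::istringstream num(token);
            T value;
            if (!(num >> value)) throw std::runtime_error("bad token: " + token);
            operands.push(value);
        }
    }
    T result = operands.pop();
    if (!operands.empty()) throw std::runtime_error("malformed expression");
    return result;
}
```

The same template serves both cases mentioned in the text: evalPostfix<int> evaluates an integral expression and evalPostfix<double> a floating point one, with no change to Stack.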

In C#, all objects come from the common base class 'object', and so you can write
a Stack class which stores 'objects'. Since all the objects inherit from this class, you
can store virtually any object in that Stack. When you retrieve the elements, you
have to employ dynamic type checking to make sure that the types don't mix up.

As you can see, even for the same well-defined problem, the problem solving
approach differs considerably and you make a different set of decisions and an
entirely different implementation depending on the language you are using!

Another important factor in the transition from C++ to C# is that it is a
transition from an unmanaged environment to a managed environment. In C++
there is only trivial support from the runtime available, whereas C# has the
sophisticated .NET runtime environment. C++ programmers need to make a special
effort to understand the advantages of the managed environment. For example,
reflection is a powerful feature which can be used to generate and execute
assemblies dynamically. Runtime checks ensure that the security privileges are
available for providing access to resources. You have array bounds checking,
versioning support and, most important of all, components that are created
from any language can interact freely. However, it should be noted that the
managed environment also comes with restrictions: you can no longer allocate
objects anywhere you wish - instances of classes can only be allocated on the
heap. Also, in the version of .NET discussed here you cannot do generic
programming with templates as you could in C++ (generics were added to the
language later, in C# 2.0). The concept and benefits of a managed environment
are new to C++ programmers, and hence exposure to the facilities of the
underlying framework is essential to get the most out of C#.

In essence, having a broad picture of these two languages and understanding
the differences in the underlying technology and approaches to design and
problem solving are essential for migrating from C++ to C#.

Comparing C++ and C# Features

The first requirement that is needed to move from C++ to C# is a shift in your
mindset. C++ is a language which trusts the programmer. This provides the
programmer with the ability to do whatever he wants. This power does have
drawbacks though - it can be misused and can end up causing major headaches.
C# on the other hand, doesn't trust the programmer as much. It takes many of
the responsibilities from the programmer and enables him to concentrate on the
bigger picture. It removes a few features that were error prone, and introduces
new ones that simplify programming.

Let us now compare the features available in the two languages.

Data Types

The types in C++ can be subdivided into three categories: primitive types,
aggregate types, and pointer types. The primitive types are: bool, char, int,
float, double, wchar_t. The aggregate types are those that are composed of
other types. These include arrays, structures, unions and enums. Both pointers
and references are classed as pointer types. In C#, things are a little
different, as it only has value and reference types. A variable of a value
type stores the data itself, whereas a variable of a reference type stores a
reference, which points to the actual data. The value types can be thought of
as equivalent to the primitive types in C++. They are derived from the class
System.ValueType. These types can be stored in the stack frame of a method.
Objects of reference types cannot be stored in the stack frame, only on the
heap.

However, one difference between C++ and C# data types is their size. While the
size of most types is implementation-dependent in C++, the sizes are fixed in
C#. We need to be cautious while converting between the available types. For
example, C++ has long double, which occupies 10 bytes on many implementations.
There is no long double type in C#, and double occupies 8 bytes. There is a
new type, decimal, available in C# that occupies 16 bytes. Since decimal
occupies 6 more bytes than such a long double, you may think that decimal can
store any value that long double can. However, decimal isn't designed to give
a wider range; it gives a more precise value, as needed for currency values.
If your intention in using long double was higher precision, you don't have
any problems; however, if it was a wider range, you may have trouble.
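One way to find size-related surprises before porting is to pin the C++ side to the fixed widths that C# guarantees (sbyte 8 bits, short 16, int 32, long 64). The sketch below assumes a C++11 compiler; the alias names and the helpers fitsInCsInt and longDoubleIsWiderThanDouble are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical aliases: each pairs a C# built-in type with a C++ type
// of the same fixed width.
typedef std::int8_t  CsSByte;  // C# sbyte: 8 bits
typedef std::int16_t CsShort;  // C# short: 16 bits
typedef std::int32_t CsInt;    // C# int:   32 bits
typedef std::int64_t CsLong;   // C# long:  64 bits

static_assert(sizeof(CsInt) == 4, "C# int is always 4 bytes");
static_assert(sizeof(CsLong) == 8, "C# long is always 8 bytes");

// Returns true when the value would survive being stored in a C# int.
inline bool fitsInCsInt(long long value) {
    return value >= std::numeric_limits<CsInt>::min() &&
           value <= std::numeric_limits<CsInt>::max();
}

// Reports whether long double carries more mantissa bits than double on
// this compiler; where it does, a port to C# double loses precision.
inline bool longDoubleIsWiderThanDouble() {
    return std::numeric_limits<long double>::digits >
           std::numeric_limits<double>::digits;
}
```

Replacing plain int and long with such aliases in the code that is about to be migrated makes any dependence on the implementation-defined sizes show up in C++, where it is easier to debug.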

Unsigned types such as uint and ulong are supported in C#, but they are better
avoided in public interfaces, because exposing them makes the code non
CLS-compliant (CLS: Common Language Specification).

References

We have seen many C# programmers considering C++ references equivalent to
C# references. This is wrong! Actually C# references are closer to C++
pointers. Remember that references in C++ serve as a name alias. They are sure
to refer to an object, and sure to refer to the same object throughout the
scope of the reference. However, it's different in C#. Just like pointers, C#
references can be defined without initializers, they can refer to different
objects at different times, and they can even refer to nothing - the null
reference (an attempt to use a null reference throws NullReferenceException;
NullPointerException is the Java name). So:

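//C++
MyClass &r0 = NULL;      // error: a reference cannot be null
MyClass &r1;             // error: a reference needs an initializer
MyClass &r2 = obj;       // OK: r2 is an alias for obj from now on
r2 = anotherObj;         // does not rebind r2: copies anotherObj into obj

//C# (ref is a keyword in C#, so the variable is named r)
MyClass r;               // OK: initializer not needed
r = anObj;
r = anotherObj;          // OK: r now refers to a different object
r = null;                // allowed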

You can think of C# references as 'restricted and safe C++ pointers':

// C++
string * s;
s = new string;

// C#
string s;
s = new string('a', 3); // System.String has no parameterless constructor

Declarations and Definitions

With the discussion of C# references, you may have noticed one drastic
difference in the semantics of the following statement:

string str;

This same statement will be interpreted in different ways by these languages.
A C++ compiler sees this statement as a definition of a variable called str.
It allocates a new stack object (or data area if declared globally) and calls
the default constructor on the allocated object (a string object in this
case). A C# compiler sees the same statement as a declaration for the
reference variable str. It allocates space for the reference alone. It neither
allocates space for an object nor calls a constructor. This should be done
explicitly by the programmer:

str = new string('a', 3);

This statement now allocates memory for the object on the heap and then calls
the constructor (System.String has no parameterless constructor, so arguments
are required here).

C# combines declarations and definitions together, whereas C++ clearly
distinguishes between the two. For this reason, there are no function
prototypes and forward declarations in C#. The C# compiler carefully checks
for definite assignment - you cannot use a variable without initializing it.
Such facilities help avoid bugs, and greatly simplify the life of the
programmer.

Structs

Except for the default access specifier, C++ does not differentiate between
structs and classes. Both of them are functionally the same. A struct can
contain methods and can be inherited by a class. C# takes a different path.
Here structs are value types: they can still contain methods, properties and
constructors and can implement interfaces, but they cannot inherit from a
class or another struct, and no class can inherit from them. The advantage is
that, as they are value types, they can be stored in the stack frame or inline
in a containing object or array. They do not require any indirection and so
can be more efficient than classes for small types.

When you want to group some related data with little or no behavior attached,
structs are the best solution. When you want to model a real world entity
with both data and substantial behavior, classes have to be used. For example,
'Point' in a graph is a simple aggregate type, and for that a struct can be
used. Implementing a 'Vehicle' type may require encapsulating lots of data and
methods operating on it, and for that, classes are better suited. Actually
there are no hard-and-fast rules for deciding between structs and classes. A
good rule-of-thumb is to use structs for the simplest aggregate types and
classes for any non-trivial types.

One notable advantage of structs is that local struct variables are allocated
on the stack itself and array elements are stored inline in the array, so
there is no per-object memory overhead. A lot of memory is saved when hundreds
of objects are created, for example in a big array of a struct type. When you
use a class type, each object is allocated separately on the heap and hence
memory overhead is involved (in the version of .NET current at the time of
writing, about 10 more bytes are occupied for each heap object compared to an
equivalent stack object). So, using structs for small types can save
significant amounts of memory. For a struct type, a single allocation creates
all ten elements:

MyStruct [] sArr = new MyStruct[10];

Whereas for the class type, the array holds only references and each object
has to be created separately:

MyClass [] oArr = new MyClass[10];
for(int i = 0; i < 10; i++)
    oArr[i] = new MyClass();

Arrays

Arrays are the simplest data structures that are widely used in programming.
In C++, arrays are treated as a contiguous block of memory. The low level
nature of arrays creates problems with object oriented programming. A base
class pointer cannot be used to iterate through an array of derived class
objects:

class Base{
public:
    // Base class data members
    virtual void boo();
};

class Derived: public Base{
public:
    // Derived class data members
    virtual void boo();
};

void foo(){
    Derived dArr[10];
    Base * bPtr = dArr;
    for(int i = 0; i < 10; i++)
        bPtr[i].boo();
    // Will not work properly: bPtr[i] steps by sizeof(Base)
}

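This is because the size of a base class object may not be equal to the size
of a derived class object. The expression bPtr[i] advances the pointer by
i * sizeof(Base), so when Derived is larger than Base every element after the
first is read from the wrong address, and the compiler cannot detect this at
compile time. In C#, an array of a class type does not hold the objects
themselves: it holds references, which all have the same size, and the runtime
knows the actual type of every object. Treating an array of Derived as an
array of Base is therefore legal and safe (an attempt to store an incompatible
object in such an array is rejected at run time).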
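The same arrangement can be reproduced in C++ by iterating over pointers instead of over the objects themselves, which is close to what a C# array of a class type does internally. The sketch below is only an illustration: Shape, Triangle, totalSides and demoPointerArray are hypothetical names, not part of the original example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class Shape {                      // hypothetical base class
public:
    virtual ~Shape() {}
    virtual int sides() const { return 0; }
};

class Triangle : public Shape {    // hypothetical derived class, larger than Shape
public:
    Triangle() : padding_() {}
    virtual int sides() const { return 3; }
private:
    int padding_[4];               // extra data so that the sizes differ
};

// Walks a sequence of pointers: every element has the size of a pointer,
// whatever the size of the object it points to.
inline int totalSides(const std::vector<Shape*>& shapes) {
    int total = 0;
    for (std::size_t i = 0; i < shapes.size(); ++i)
        total += shapes[i]->sides();   // virtual dispatch picks Triangle::sides
    return total;
}

inline int demoPointerArray() {
    assert(sizeof(Triangle) > sizeof(Shape));  // indexing Triangles as Shapes would misstep
    Triangle triangles[3];
    std::vector<Shape*> shapes;
    for (std::size_t i = 0; i < 3; ++i)
        shapes.push_back(&triangles[i]);       // store addresses, not objects
    return totalSides(shapes);
}
```

C++ code that already iterates through base class pointers in this way usually maps directly onto a C# array or collection of the base type.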

C# does not treat arrays as mere contiguous memory locations. It adds
object-oriented characteristics by providing a class, System.Array, from which
all arrays inherit. This class abstracts the operations on an array, and a
reference of type System.Array can refer to an array of any element type. As
arrays are instances of a class, they are always reference types, and this
holds good even for arrays of value types. This helps in bounds checking for
every access to an array, but the drawback is that an array can be allocated
on the heap only.

Both languages support rectangular and jagged arrays. For rectangular
arrays, a chunk of plain memory is allocated and indexing is done on it. In
C++, jagged arrays can be implemented by having a pointer array and allocating
memory dynamically for each sub-array. The same idea is followed in C#, but
instead of pointers, references are used. This makes optimal use of space,
since the sub-arrays may be of varying length. The compromise is that an
additional indirection is needed to access the sub-arrays. This access
overhead is not there in a rectangular array, since all the sub-arrays are of
the same size.

// C++ language example for 'rectangular arrays'
float rectArr[5][20];

// C# rectangular arrays, note the difference in syntax
float [,] rect = new float [5,20];

// C++ language example for 'jagged arrays'
float **ptr;
ptr = new float *[5];
for (int i = 0; i < 5; i++)
    ptr[i] = new float [20];

// C# example for 'jagged arrays'
float [] [] ptr;
ptr = new float[5][];
for(int i = 0; i < 5; i++)
    ptr[i] = new float[20];

When more than one method of representation is supported, at some point the
user will need to switch from one representation to another. C# provides no
built-in conversion between rectangular and jagged arrays, so the elements
have to be copied into a new array of the other kind. (Boxing and unboxing,
discussed later, convert between value types and object references, not
between array representations.) It also should be noted that C# supports
'Indexer' members that allow array-like access to data structures.

Enums

In C, enumeration constants are of type int; in C++, the underlying type of an
enumeration depends on the range of the enumeration constants declared. C#, as
an improvement over the old enumeration, allows you to specify the underlying
type of the enumeration:

enum holidays : byte{
    Sunday = 0,
    Saturday = 1
}

C# enums differ from C/C++ enums in that the enumerated constants need to be
qualified by the name of the enumeration when they are used.

enum workingDay { mon, tue, wed, thur, fri };

workingDay today;
today = workingDay.mon;
// note that mon is qualified by workingDay

This name.member syntax helps the enumeration constants to remain in a
separate namespace, thus preventing them from polluting the global namespace.
Furthermore, it prevents name clashes between two different enums:

// C#: no name clashes with other enum members
enum Days { mon, tue, wed, thur, fri, sat, sun };
enum CosmicObjs { earth, mars, jupiter, sun, moon };
enum Companies { sun, microsoft, dell, digital, compaq };

Days myDay = Days.sun;
Companies computer = Companies.sun;
CosmicObjs cosmicObject = CosmicObjs.sun;
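C++ code that can be compiled as C++11 or later can adopt the same discipline before migration by using scoped enumerations (enum class), which also require qualified names and allow the underlying type to be chosen. The sketch below reuses the enumeration names from the C# example; demoScopedEnums is a hypothetical helper added for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Scoped enumerations: the constants live inside the enum's own scope,
// so three different 'sun' constants can coexist.
enum class Days : std::uint8_t { mon, tue, wed, thur, fri, sat, sun };
enum class CosmicObjs { earth, mars, jupiter, sun, moon };
enum class Companies { sun, microsoft, dell, digital, compaq };

static_assert(sizeof(Days) == 1, "Days uses the one-byte underlying type requested");

inline bool demoScopedEnums() {
    Days myDay = Days::sun;                      // qualified, as in C#
    Companies computer = Companies::sun;
    CosmicObjs cosmicObject = CosmicObjs::sun;

    // No implicit conversion to int exists, so the cast must be written out.
    return static_cast<int>(myDay) == 6 &&
           static_cast<int>(computer) == 0 &&
           static_cast<int>(cosmicObject) == 3;
}
```

Enumerations written this way translate to C# almost mechanically, with :: replaced by the dot.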

Variable Length Argument Lists

Experience has shown that programmers prefer C style printf format, because it
is convenient for exact format specification and is easy to use. C# provides
'params' for the support of variable length argument lists. So you can write your
functions using this facility as in:

int MyPrintf(string format, params object [] args);

For printing, C follows the format string with a variable length argument
list; C++ uses << with cout; Java has overloaded the + operator. In C#, the
format string contains numbered placeholders, and the numbering starts at
zero:

Console.WriteLine("{0} {1} {2}", i, obj, "someString");

Writing 'Unsafe' Code

C++ is good for writing low-level code with features like pointer arithmetic,
which is useful for systems programming. C# understands the importance of
that, and allows 'unsafe casts', pointers, and pointer arithmetic to be
performed in code segments that are explicitly labeled as unsafe. Note that
the keyword 'unsafe' may be misleading - the code still runs under the CLR;
the keyword just specifies that the code cannot be verified as type-safe and
that it may perform low-level operations. Also, it is neither as easy nor as
powerful as in C++.

Argument Passing

When we pass a variable to a method in C++, we cannot be sure whether it will
get modified or not. To ensure that the variable is not modified, the
programmer should use the const qualifier for that argument in that method. C#
has no const qualifier for arguments. Instead, arguments are passed by value
unless stated otherwise, and a method that needs to modify the caller's
variable, for example to deliver multiple return values, must explicitly use
the ref or out keyword.

When we pass an argument to such a method, the caller should be aware that the
variable may be modified. The ref keyword indicates this: it has to be written
in the method invocation as well as in the method definition:

//C++
void foo(MyClass & arg1, MyClass & arg2){
    // other code;
    arg1 = newValue1;
    arg2 = newValue2;
}

foo(obj1, obj2);
// Note: the caller may not expect obj1 and obj2 will change

//C#
void foo(ref MyClass arg1, ref MyClass arg2){
    arg1 = newValue1;
    arg2 = newValue2;
}

foo(ref obj1, ref obj2);
// Now the programmer is aware that obj1 & obj2 may be changed

In a few cases, we may want the method itself to give the argument its first
value. Passing an uninitialized variable with the ref keyword is flagged as an
error by the compiler, because a variable must be definitely assigned before
its first use. One elementary way to avoid the error is to initialize the
variable with a default value and then pass it to the method. C# introduces
another keyword for this situation. Instead of ref, we can use out, which
doesn't force the caller to initialize the variable. However, it is mandatory
for the method to assign some value to it.

//C#
void foo(ref MyClass arg1, out MyClass arg2){
    // other code;
    arg1 = someValue; // optional
    arg2 = someValue; // need to assign some value
}

MyClass obj1, obj2;
obj1 = aValue; // need to initialize
foo(ref obj1, out obj2); // note obj2 is not initialized

Class Abstraction

Just like C++, the basic unit of abstraction is a class. The access specifiers
public, protected and private have the same meaning in both the languages. In
addition, C# provides internal and protected internal access specifiers. The
internal members are available to the whole assembly and the protected
internal members to the assembly and the derived classes. Why do you ever need
these access specifiers? There are a few cases where members need to be
accessible to other classes in the same assembly but shouldn't be exposed to
external classes. Since friend access is not there in C#, this can be a useful
feature, particularly when you are designing libraries.

Inheritance

C# doesn't support multiple class inheritance. It only supports single
inheritance, but a class can still implement multiple interfaces. Pure
abstract classes in C++ can be treated as interfaces in C#. There are many
restrictions on interfaces. Their members are implicitly public and abstract,
and no fields are allowed (not even const fields). However, one interface can
inherit from another interface.

C# only supports public inheritance. Not having private or protected
inheritance doesn't affect the functionality as such. There are a few
inconveniences with this approach. For example, once a class implements
ICloneable, all the classes that inherit from that class automatically become
cloneable, as only public inheritance is available.

The Object Base Class

The version of C# discussed here doesn't support templates, as .NET doesn't
support them yet (generics arrived later, in C# 2.0). However, a weaker form
of generic programming is supported in C# through the System.Object base
class. This is the root class for all objects. This includes the value types
like structs and ints and reference types like arrays and strings. This
property is exploited in the Collections provided in the framework, which work
in terms of Objects.

The standard libraries of both C++ and C# provide support for the container
classes. Consider this example of using the vector class:

MyClass obj;
string str = "string object";
const int size = 5;

vector<MyClass> vect(size);

vect[0] = obj;
vect[1] = str;
// Compiler Error: vect can store only MyClass and not others
// insert more elements
// iterator provides a pointer-like syntax for
// traversing the container

cout << vect[0] << vect[1] << endl;
vector<MyClass>::iterator iter = vect.begin();
while(iter != vect.end()){
    cout << *iter;
    // calls overloaded << operator of MyClass
    ++iter;
}

Thus, you can have elements of only one type, and the traversing and accessing
is done through iterators. With C#, .NET provides an equivalent container class
for vector - the ArrayList container:

// Creates and initializes a new ArrayList
MyClass obj = new MyClass();
string str = "string object";
ArrayList arrLst = new ArrayList();
arrLst.Add(obj);
arrLst.Add(str);

// can simply use the foreach statement for traversing the collection
foreach(MyClass elem in arrLst){
    Console.WriteLine(" {0} ", elem);
}
// throws 'InvalidCastException' as the second element is a string

Operator Overloading

In C++ almost all the operators can be overloaded - there are only a few
operators, such as the conditional operator ?:, the member access operators .
and .*, the scope resolution operator :: and sizeof, that cannot be
overloaded. C# provides support for operator overloading but to a limited
extent. The syntax for overloading the operators is:

// C++
<return type> ClassName::operator <the operator> (arguments)
// usage example
class MyClass{
public:
    MyClass operator + (MyClass &rhs);
};

// C#
public static <return type> operator <the operator> (arguments)
// usage example
class MyClass{
    public static MyClass operator + (MyClass lhs, MyClass rhs){
        // build and return the result
    }
}

The main difference is that while you can have member or global (mostly
friend) functions in C++, you have static methods for overloading in C#.

Although the syntax looks similar there are a few constraints imposed by C# for
operator overloading. The most important are:

• The methods should be declared as public and static.
• Many of the operators are required to be overloaded in pairs. For
example, if you define == you should overload the != operator also.
• If you define the + operator, the compiler defines the += operator for you
to make things easier.

//C++
class CPPClass{
protected: // can be public or protected or private
    bool operator ==(CPPClass &rhs);
    // the other argument is passed implicitly by the 'this' pointer
    // note no != operator defined

    bool operator ++(); // type of the return value is not forced

    bool operator +(CPPClass &rhs);
    bool operator +=(CPPClass &rhs);
    // += is not implicitly defined

    friend int operator -(CPPClass &lhs, CPPClass &rhs);
    // both member and non-member (friend) functions are allowed;
    // a static member function cannot be an operator
};

//C#
class CSharpClass{
    // note that all the operators are public and static
    public static bool operator ==(CSharpClass lhs, CSharpClass rhs){ /* ... */ }
    public static bool operator !=(CSharpClass lhs, CSharpClass rhs){ /* ... */ }
    // relational operators should be overloaded in pairs

    public static CSharpClass operator ++(CSharpClass arg){ /* ... */ }
    // return type and argument types are forced for a few operators

    public static CSharpClass operator +(CSharpClass lhs, CSharpClass rhs){ /* ... */ }
    // += is implicitly defined by the compiler when
    // binary + is defined
}

Exceptions

Exception handling in C# is similar to C++, with some simplifications. In C++,
the exception specification of a method lists all the possible exceptions that
the method might throw. When a method doesn't list any exceptions, beware that
it is then allowed to throw any exception, and there is no constraint for a
method to catch the exceptions thrown. Further, exceptions are not only thrown
in the form of classes, but also in the form of primitive types.

The C# exception handling mechanism is much simpler. C# has no exception
specifications at all: a method signature does not list the exceptions it may
throw, and catching exceptions is not mandatory. The main restriction is that
only objects of System.Exception (or of classes derived from it) can be
thrown.

//C++
void foo(){
    throw 10;
    throw MyException();
    throw "This is an Error";
}

void boo() throw (int, Exception){
    throw "Something is wrong";
    // violates the specification: not a compile-time error,
    // but std::unexpected() is called at run time
}

void doo() throw (){
    // promises that no exceptions will be thrown
}

//C#
void foo(){
    // no exception specification: the signature says nothing
    // about which exceptions may be thrown
}

void boo(){
    throw new IOException(); // OK: derived from System.Exception
    // throw 10;
    // Error: only objects derived from System.Exception can be thrown
}
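A practical preparation step on the C++ side is to throw only class types derived from std::exception, so that every throw site has a direct counterpart in C#. The following sketch illustrates the idea; ParseError, parseDigit and describeFailure are hypothetical names invented for this example.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical exception type, derived from the standard hierarchy.
class ParseError : public std::runtime_error {
public:
    explicit ParseError(const std::string& what) : std::runtime_error(what) {}
};

// Throws a class object rather than an int or a string literal.
inline int parseDigit(char c) {
    if (c < '0' || c > '9')
        throw ParseError(std::string("not a digit: ") + c);
    return c - '0';
}

// Catches by base class reference, much as a C# catch (Exception e)
// block would catch any exception object.
inline std::string describeFailure(char c) {
    try {
        parseDigit(c);
        return "ok";
    } catch (const std::exception& e) {
        return e.what();
    }
}
```

Code that throws ints or string literals, as in the C++ foo() above, needs an exception class to be designed during the port anyway, so introducing it first in C++ keeps the two versions comparable.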

Namespaces

Namespaces are supported in C++ for better organizing the code and are
valuable in large-scale programming. In C#, the syntax for declaring and
organizing classes in a namespace is similar to that of C++. There is no concept of
header files (C# design is such that there is no need for header files, for example,
it combines declarations and definitions) and you have to use the using directive
to open up the members in the namespace for access in the code. You can also
have aliases:

using alias_name = namespace_or_type;

Just like in C++, you can have nested namespaces. The syntax is a bit
different:

namespace outer.inner{
    // some members
}

Note that you have to use one namespace within another for a similar goal in
C++:

namespace outer{
    namespace inner{
        // some members
    }
}

There is an important difference in how the two languages package code. In
both C++ and C#, namespaces are logical entities. What C# adds is the assembly
as the physical unit of packaging and deployment. Namespaces and assemblies
are independent of each other: one assembly can contain several namespaces,
and one namespace can be spread over several assemblies. Physical enforcement
therefore happens at the assembly level, for example through the internal
access specifier, rather than at the namespace level.

Properties

It is common for a C++ programmer to provide get and set methods for data
members. Not only does this help in abstracting the details, but it also gives
a few advantages: the user cannot assign illegal values to a field, such as
500 to a field called age, and the programmer can offer a read-only version of
a member, such as the size of a container.

class MyClass{
private:
int someInt;
int length;
public:
inline int getLength(){
return length;
}
inline int getSomeInt(){
return someInt;
}
inline void setSomeInt(int arg){
if(arg >= minValue && arg <= maxValue)
someInt = arg;
else
error("illegal value");
}
};

//usage:
MyClass anObj;
anObj.setSomeInt(100);
int len = anObj.getLength();

As most of these methods are inlined, performance isn't affected. However,
there are two problems with such functions. The first is that the syntax for
calling them is a bit unwieldy. The second is that the approach sits uneasily
with object-oriented guidelines: an object is supposed to expose behavior, not
implementation, and a get/set pair for every field effectively exposes the
object's private fields to the user. C# provides a new way to handle this
situation: properties.

Properties are very much like the get-set methods, but syntactically different.
Consider this example written with properties in C#:

class MyClass{
    // minValue, maxValue and error() are assumed to be defined
    // elsewhere, as in the C++ version above
    private int someInt;
    private int length;
    public int Length{
        get{
            return length;
        }
    }
    public int SomeInt{
        get{
            return someInt;
        }
        set{
            if(value >= minValue && value <= maxValue)
                someInt = value;
            else
                error("illegal value");
        }
    }
}
//usage:
MyClass anObj = new MyClass();
anObj.SomeInt = 100;
// set the value of the field through the property's set accessor
int len = anObj.Length;
// get the value of the field through the property's get accessor

Note that a variable named value is used in the set accessor. It is an implicit
parameter supplied by the compiler, and its type is the same as that of the
property. As we can see, the syntax is more intuitive to use.
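
To make the C++ side of the comparison concrete, here is a small, compilable
sketch of the get/set idiom with validation, using the age example mentioned
above. The class Person, the bounds minAge and maxAge, and the helper rejects
are assumptions chosen for illustration; the original snippet leaves its bounds
and error function undefined.

```cpp
#include <stdexcept>

class Person {
public:
    // Illustrative bounds; the article does not specify them.
    static constexpr int minAge = 0;
    static constexpr int maxAge = 150;

    // Read access: the C++ analogue of a C# get accessor.
    int getAge() const { return age_; }

    // Write access with validation: the analogue of a C# set accessor.
    void setAge(int value) {
        if (value < minAge || value > maxAge)
            throw std::out_of_range("illegal value for age");
        age_ = value;
    }

private:
    int age_ = 0;
};

// Returns true if the setter refused the value, false if it was stored.
inline bool rejects(Person& p, int value) {
    try {
        p.setAge(value);
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

With this class, p.setAge(500) throws and leaves the stored age unchanged,
which is the same guarantee the C# set accessor gives through a friendlier
syntax.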

Indexers

We tend to have many container classes that are used to hold a set of objects.
Stacks, Queues, Maps and Hashtables are just a few such important containers.
There are many other objects that can also be viewed as containers. For example,
a menu can be thought of as a container of menu items. In most cases we will
need to access the objects in a container through an index. In C++ this can
be done by overloading the array subscript operator []. We can overload it not
only for integers, but for any type we want, which sometimes makes the
subscripting more meaningful:

class EmployeeContainer{
private:
Employee emp[100];
public:
Employee& operator[](int empNo){
//return the employee with the empNo
}
Employee& operator[](string name){
// return the employee with the name
}
};

void foo(){
EmployeeContainer empCont;
// add the employees to the container
Employee emp1 = empCont[5];
empCont["Pranni"].age = 24;
}
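
The operator bodies above are left as comments. A minimal compilable sketch of
the same idea might look like the following; the Employee fields, the add
method, the use of std::vector instead of a fixed array, and the choice to
throw on an unknown name are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

struct Employee {
    std::string name;
    int age;
};

class EmployeeContainer {
public:
    void add(const Employee& e) { emp.push_back(e); }

    // Subscript by position; at() throws std::out_of_range on a bad index.
    Employee& operator[](std::size_t empNo) { return emp.at(empNo); }

    // Subscript by name: linear search, throws if no employee matches.
    Employee& operator[](const std::string& name) {
        for (Employee& e : emp)
            if (e.name == name) return e;
        throw std::out_of_range("no employee named " + name);
    }

private:
    std::vector<Employee> emp;
};
```

Because both overloads return a reference, an expression such as
empCont["Pranni"].age = 24 modifies the stored employee rather than a copy.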

C# introduces indexers to address this problem of indexing a container. The
equivalent EmployeeContainer class can be written in C# as:

class EmployeeContainer{
    private Employee[] emp = new Employee[100];
    public Employee this[int empNo]{
        // implement it like a property
        get{
            return emp[empNo];
        }
        set{
            emp[empNo] = value;
        }
    }

    public Employee this[string name]{
        // implement it like a property
        get{
            // getting Employee index mapped by string info
        }
        set{
            // code for setting Employee detail at index position
        }
    }
}

void foo(){
    EmployeeContainer empCont = new EmployeeContainer();
    // add the employees to the container
    Employee emp1 = empCont[5];
    empCont["Pranni"].age = 24;
}

Attributes

Attributes are a significant addition to C#. When you are creating your own
types or components, there is a necessity to associate related details of the
components and their elements. In COM you used type libraries to achieve such
functionality. Traditionally, comments and macros are used in C++
programming for storing the metadata about the class and/or its members. C#'s
attributes are far more powerful and you can give meta-information for many
language elements: fields, methods, events, etc. You can retrieve and examine
such meta-information at runtime using reflection (discussed later). There are
two types of attributes: intrinsic (predefined) and custom attributes.

C# supports a preprocessing facility, but there is no separate tool: it is
handled by the compiler itself. The preprocessor support is restricted, though;
for example, you cannot define macros. One use of the preprocessor is
conditional methods, which are implemented through the Conditional attribute.
It is an intrinsic attribute that causes calls to the method to be included
only when the given symbol is defined. In C++ you would use the preprocessor
facilities directly.

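//C# code

#define DEBUG
// such definitions must appear at the beginning of the file,
// before any other code (including using directives)

using System;
using System.Diagnostics;

class MyClass{
    [Conditional("DEBUG")]
    public static void debugFunction(string message){
        Console.WriteLine(message);
    }
    // other members
}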

C#'s conditional methods are very powerful when used with the Debug and
Trace classes available with the System.Diagnostics namespace. There are many
such useful attributes; one is Serializable, which is discussed later.

You can define your own custom attributes. You have to derive your class from
the Attribute class, and you can mark it with the AttributeUsage attribute to
state where it may be applied. Here is a simple example for maintaining
comments from the author of the code:

using System;

[AttributeUsage(AttributeTargets.All, AllowMultiple=true)]
// tells that this attribute can be used on any program element
// and there can be multiple entries for each use of attribute
public class CommentAttribute : Attribute{
    public CommentAttribute(string comment){
        this.commentText = comment;
    }
    private string commentText;
    public string CommentText{
        get{
            return commentText;
        }
    }
}

[Comment("Written by Ganni and Pranni")]
class GuineaPig {
    // ...
}

class Test {
    public static void Main(){
        Attribute[] attributes =
            Attribute.GetCustomAttributes(typeof(GuineaPig));
        //This static method GetCustomAttributes
        //is used to retrieve the attribute info
        foreach(CommentAttribute attribute in attributes)
            Console.WriteLine(attribute.CommentText);
    }
}

You can use the custom attributes with the same special syntax as in intrinsic
attributes and there is no need to call the constructor explicitly - you can initialize
the attribute directly. The static method GetCustomAttributes of the Attribute
class is used for retrieving the attributes by passing the type.

Callback Functions

Function pointers are a useful facility in C/C++. The following shows a
realistic use of them. Say you want to write a menu program. The aim is to
call, at runtime, the function that corresponds to the item selected in the
menu. Therefore, we have to declare a function pointer whose signature matches
the functions that are written for the menu:

void (*menuSelector)( );
// get the input from the user - selection of the menu item

switch(select){
    case NEW : menuSelector = &New; break;
    case OPEN : menuSelector = &Open; break;
    // assign the address of the corresponding function to menuSelector
}

menuSelector( );
// now call the selected functionality

Calling a function through such a pointer, whose value is determined at
runtime, is known as a 'callback'. C# supports callback functions through
'delegates' (you can also think of them as an improved version of 'function
objects' in C++).

Delegates closely resemble function pointers, but C# guarantees that delegates
are type-safe, secure, and object-oriented. A delegate holds a reference to a
method so that the method can be called later. A single delegate can even hold
several methods, which are called in turn. Callbacks are valuable for event
handling, and C# also supports events, which are useful in event-driven
programming such as Windows Forms:

public delegate void Selector();
// Selector is the type that can be used to instantiate
// delegates that take no arguments and return nothing

// the members below belong to the Test class that is instantiated further down
class Test{
    public Selector menuSelector;

    public void New(){
        Console.WriteLine("You selected 'New' option");
    }

    public void Open(){
        Console.WriteLine("You selected 'Open' option");
    }
}

string select;
// get the value of select from calling the menu...

Test t = new Test();
switch(select){
    case "New" : t.menuSelector = new Selector(t.New);
        break;
    case "Open" : t.menuSelector = new Selector(t.Open);
        break;
    // ...
    // register the selected method to menuSelector
}
t.menuSelector( );
// call the delegate and it will in turn call the registered method
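
For a C++ reference point, the closest standard building block to a delegate
that holds several methods is a list of std::function objects. The sketch below
is only an approximation written for illustration: the Selector class here is a
hypothetical stand-in, not a library type, and the Menu struct records calls in
a log instead of printing them.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a delegate that can hold several handlers.
class Selector {
public:
    // Register one more handler; handlers run in registration order.
    void add(std::function<void()> handler) {
        handlers.push_back(std::move(handler));
    }
    // Invoke every registered handler in turn.
    void operator()() const {
        for (const auto& h : handlers) h();
    }
private:
    std::vector<std::function<void()>> handlers;
};

struct Menu {
    std::vector<std::string> log;   // records which options were selected
    void New()  { log.push_back("New"); }
    void Open() { log.push_back("Open"); }
};
```

A caller would register member functions through lambdas, for example
s.add([&m]{ m.New(); }), and then invoke s() to run everything registered so
far. Unlike a delegate, this sketch offers no way to remove a handler and no
compile-time name for the handler signature beyond void().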

Reflection and RTTI

When doing object-oriented programming, we often treat an object as if it were
of a more general type. For example, we can view a Dog as a mammal, an animal,
or even simply a living thing. When we hold such a generalized reference, we
sometimes want to know the exact type and act accordingly. Given a living
thing, we might perform some operations on a mammal that we wouldn't on an
amphibian, and even more specific operations if it were a Dog. In such cases,
RTTI (Run-Time Type Identification) comes into the picture. C++ provides the
typeid operator and a set of classes that enable querying the type of an object
at runtime. This operator returns the exact (dynamic) type of the object only
if the static type of the expression has at least one virtual function:

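#include <iostream>
#include <typeinfo>   // required for typeid
using namespace std;

class Base{
    // no virtual methods
    void Base1();
    void Base2();
};
class Derived1 : public Base{
    virtual void vMethod();
};
class Derived2: public Derived1{
};

void foo(){
    Derived2 d2Obj;
    Base* bPtr;
    bPtr = &d2Obj;
    cout<<typeid(*bPtr).name()<<endl;
    Derived1 *dPtr;
    dPtr = &d2Obj;
    cout<<typeid(*dPtr).name()<<endl;
}
//output (the spelling of the names is implementation-defined;
//this is the form printed by Visual C++):
// class Base
// class Derived2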
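
Because the text returned by name() varies between compilers, a portable way to
observe the same behavior is to compare type_info objects directly. The
following sketch restates the hierarchy above with public members and inline
bodies so that it compiles on its own; the two helper functions are
hypothetical names introduced for the illustration.

```cpp
#include <typeinfo>

class Base {                        // no virtual functions: not polymorphic
public:
    void base1() {}
};

class Derived1 : public Base {      // first class in the chain with a virtual
public:
    virtual ~Derived1() {}
    virtual void vMethod() {}
};

class Derived2 : public Derived1 {};

// Base is not polymorphic, so typeid(*p) yields the static type Base,
// whatever object p really points to.
inline bool seenThroughBaseIsBase(Base* p) {
    return typeid(*p) == typeid(Base);
}

// Derived1 is polymorphic, so typeid(*p) inspects the object at run time.
inline bool seenThroughDerived1IsDerived2(Derived1* p) {
    return typeid(*p) == typeid(Derived2);
}
```

Passing the address of a Derived2 object to both helpers returns true in each
case: through a Base pointer the object is reported as Base, and through a
Derived1 pointer it is reported as Derived2.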

Reflection is a feature more commonly found in dynamic (interpreted) languages.
It is a powerful facility, as we can dynamically load classes, create objects,
change their properties, and invoke methods on them. Although fully exploiting
the power of reflection will not be explored in this case study (see
http://www.csharptoday.com/content.asp?id=1852&WROXEMPTOKEN=1518115ZIn19JBRkpiV5wX71qk
for a whole piece on the topic), here is a sample that looks up a type by name,
creates an instance, and invokes a method on it dynamically:

using System;
using System.Reflection;

class ReflectionTest{
// this method will be called dynamically
public void InvokeDynamic(){
Console.WriteLine("Hello, dynamic world!");
}
public static void Main(){
Type t = Type.GetType("ReflectionTest");
// get the type by passing the name of this class
MethodInfo m = t.GetMethod("InvokeDynamic");
object o = Activator.CreateInstance(t);
// Activator is a class defined in System namespace
// you can use it to create objects (remote or local)
m.Invoke(o, null);
// the second argument is the list of arguments passed
// to Invoke - null in this case
}
}
// output:
// Hello, dynamic world!

Memory Management

Moving from C++ to C# takes away some of the programmer's freedom. C++ allows
you to decide whether to create an object on the stack or on the heap, whereas
C# doesn't. The change from an unmanaged to a managed environment has drawbacks
too. In C#, instances of reference types (classes) are always allocated on the
heap; only value types (such as structs and the built-in numeric types) can be
allocated on the stack. So, you have to create every class object on the heap
explicitly with 'new', even those objects you used to declare as local or
static objects in C++. In C#, in addition to using 'new' for allocating heap
objects, you can use it with value types (structs) to call their constructors.

The difference in where objects are allocated is significant. For example,
when a value type is converted to a reference type, memory needs to be
allocated on the heap and the value copied into it. This process is referred to
as 'boxing'. For example:

int i = 10;
object o = i;

Note that you don't need an explicit cast here, as it is an 'upcast'. When the
conversion is done from reference type to value type, it is referred to as
'unboxing'. However you need explicit casting to do that as it is a 'downcast':

int i = 10;
object iRef = i;
int j = iRef + 100; // doesn't compile, needs explicit cast
int k = (int)iRef +100; // now OK

Such conversions have no direct counterpart in C++, as there is no common base
class. Boxing and unboxing are costly operations and should be avoided whenever
possible, as boxing allocates a new object on the heap and unboxing copies the
value back out of it.

Garbage Collection

The burden of managing memory is greatly reduced in C#, as the garbage
collector automatically reclaims unreferenced objects. With garbage collection,
most memory-management problems, such as dangling pointers and memory leaks,
are gone. Garbage collection covers only memory, however; other resources, such
as network connections, still need to be released when the object is reclaimed.
This is done in the Finalize method. C# still supports C++'s destructor syntax,
but C# destructors are 'syntactic sugar' for finalizers.

~MyClass(){
    // release resources like database connections
}

is equivalent to:

protected override void Finalize(){
    try{
        // release resources like database connections
    }
    finally{
        base.Finalize();
    }
}

The expanded form is a little tedious to type (and the C# compiler expects the
destructor syntax rather than a hand-written override of Finalize), hence the
destructor syntax is convenient. The meaning of destructors is not the same in
the two languages even though the syntax is the same. There is no assurance
that the object will be garbage collected, or that its finalizer will be
called, immediately when there are no more references to it. If important
resources such as file handles or database connections are released in C++
destructor code, you shouldn't rely on Finalize for them in C#. Rather, you
have to implement the IDisposable interface, provide the Dispose method, and
write the code for releasing such connections or handles there.

using System;   // IDisposable, GC and String live in System

class MyClass : IDisposable{
    MyClass(){
        // get resources
    }

    public void Deallocate(){
        // code for releasing resources here
    }

    public void Dispose(){
        Deallocate();
        GC.SuppressFinalize(this);
        // since Dispose is called, the Finalize method should
        // not be called... so tell GC to suppress call to
        // Finalizer method
    }

    ~MyClass(){
        Deallocate();
    }

    public static void Main(String []args){
        MyClass obj = new MyClass();
        // use obj;
        obj.Dispose();
    }
}

To be more precise: it is not possible to determine exactly when the garbage
collector will run, and so C# doesn't have deterministic finalization. To
overcome this, you implement the IDisposable interface and provide the
implementation for the Dispose method. After you have finished with the object,
you release its resources by calling the Dispose method explicitly. Who is
responsible for calling this method for objects that come from various sources?
The time-honored C++ principle for disposing of heap objects applies here too:
'whoever allocated the memory has to release it'.
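
It helps to see exactly what a C++ programmer is giving up. In C++, a
destructor runs at a known point (the end of the enclosing scope for local
objects), and locals are destroyed in reverse order of construction. The sketch
below records that order in a log; Connection and useTwoConnections are
hypothetical names used only for this illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical resource wrapper: logs when it is acquired and released.
class Connection {
public:
    Connection(const std::string& name, std::vector<std::string>& log)
        : name_(name), log_(log) {
        log_.push_back("open " + name_);
    }
    ~Connection() { log_.push_back("close " + name_); }

    // A resource handle should not be copied implicitly.
    Connection(const Connection&) = delete;
    Connection& operator=(const Connection&) = delete;

private:
    std::string name_;
    std::vector<std::string>& log_;
};

inline std::vector<std::string> useTwoConnections() {
    std::vector<std::string> log;
    {
        Connection a("a", log);
        Connection b("b", log);
        log.push_back("work");
    }   // b is destroyed first, then a, right here
    return log;
}
```

The returned log reads: open a, open b, work, close b, close a. In C# the
release steps happen at that point only if Dispose is called there explicitly;
a finalizer alone gives no such ordering or timing guarantee.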

Steps in Converting Existing Code

There are cases where systems that are written in C++ need to be ported to C#.
The .NET environment can use C++ code directly in two cases:

• When the classes are written in Managed Extensions to C++
• If they are COM components

If the application is written as COM components, then the components can be
used directly in .NET by means of the Type Library Importer (tlbimp.exe)
utility. It reads the COM type library information and converts it to an
equivalent .NET assembly, a proxy that contains the necessary metadata.
However, it should be noted that the component's code is still unmanaged.

'Managed Extensions to C++' (MEC) is a set of extensions to the C++ language
provided by Microsoft that allows code to be compiled for the .NET environment.
Most existing C++ code is not written as components, so it cannot be used
directly from C#. MEC is also new to the programming world, so legacy code is
unlikely to have been written in it.

C# provides support for low-level programming and has facilities for making use
of legacy code. For example, functions exported from DLLs can be called by
declaring them with the DllImport attribute. You have to declare such methods
as extern, which plays a role similar to extern in C++ when accessing functions
written in other languages. The attribute can be applied only to methods
implemented externally. Say you want to use your favorite MessageBox from
traditional Windows programming:

using System;
using System.Runtime.InteropServices;

// inside a class:
[DllImport("User32.dll")]
public static extern int MessageBox
    (IntPtr h, string m, string c, uint type);
// now you can use it in your C# code

This feature is of great use if your code is a library or framework rather than
a full-fledged application. You can keep the functions in DLLs, declare them in
your C# code, and call them from there.
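
On the C++ side, a function is easiest to reach through DllImport when it is
exported with C linkage, so that its name is not mangled. The following is a
minimal sketch under that assumption; the macro MYLIB_API, the function
add_numbers and the library name mylib.dll mentioned in the comment are all
invented for the example.

```cpp
// Export macro: C linkage everywhere, plus a DLL export on Windows.
#if defined(_WIN32)
#  define MYLIB_API extern "C" __declspec(dllexport)
#else
#  define MYLIB_API extern "C"
#endif

// A plain function with a C-compatible signature (no classes, no overloads).
// A matching C# declaration could look like this (hypothetical names):
//   [DllImport("mylib.dll", CallingConvention = CallingConvention.Cdecl)]
//   public static extern int add_numbers(int a, int b);
MYLIB_API int add_numbers(int a, int b) {
    return a + b;
}
```

The calling convention is spelled out in the C# comment because DllImport
defaults to the Windows API convention, which differs from the default C
convention on 32-bit Windows.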

When you want to convert existing C++ code to run under the .NET platform,
the following decisions need to be made. If the code is simple enough to be
rewritten without much effort, you can rewrite it in C#. In practice, C++ code
may involve low-level programming such as accessing hardware features. Such
functionality can be handled in C# itself to some extent, thanks to its support
for C-like structures and restricted use of native pointers. Where full control
over resources is required, you can do explicit memory management as well. Such
code has to be placed in 'unsafe' blocks. If the code is too complex to be
handled with the facilities available in 'unsafe' code, it can be converted
directly from C++ to Managed Extensions to C++. Code written like that is
accessible from C# code. All this means that tested, legacy C++ code need not
be discarded: you can still use it under the .NET environment, albeit as
unmanaged code.

Aiming for a one-to-one correspondence of functionality leads to poor design
and fragile code. Translating C++ code line by line is not feasible, as the two
languages differ considerably in their functionality and support. For instance,
in C# all functions have to be placed inside classes, as no global functions or
data are supported; C# strictly enforces the class as the basic abstraction
mechanism. So, when you are moving to C#, it is better to adopt the C# mindset
and not think in terms of C++. To illustrate how these ideas materialize,
consider the following example of converting class hierarchies.

Converting the class hierarchies

Designing class hierarchies differs drastically in C++ and C#. This is because
C# supports neither multiple class inheritance nor non-public inheritance: a
class has a single base class, and that inheritance is always public. Consider
the following hierarchy available in C++:

class Base1{
    // pure abstract base class
};

class Base2{
    // abstract base class
};

class Base3{
    // concrete class
};

class Derived: public Base1, protected Base2, private Base3{
};

Base1 can be represented as an interface, since a C++ pure abstract base class
is equivalent to an interface in C#. Base2 can be an abstract class in C#. The
problem arises because multiple class inheritance is involved, and there can be
only one base class in C#. If possible, try to convert Base2 into an interface.
An abstract class typically implements only a few of its methods; those
implementations can be moved into the concrete class, which makes it feasible
to turn Base2 into an interface. The difficulty arises when Base2 has data
members. In that case, making it an interface is not feasible, as an interface
cannot hold fields and moving the data members elsewhere is not advisable.

In general, this can be solved by having Base3 inherit from Base2. Since Base2
is an abstract class, it serves better as the base than Base3 would as a base
class for Base2. The C++ code uses private, protected, and public inheritance.
How can these be handled in C#? C# supports only public inheritance, so you are
forced to use it for all three kinds of inheritance found in the C++ code.
Using public inheritance doesn't affect the functionality; the real difference
lies in abstraction. In the C# solution, all the inherited members are exposed,
and the hierarchy looks like this:

interface IBase1{
}
// the naming convention in C# suggests that interface names
// use an I prefix

abstract class Base2{
}

class Base3 : Base2{
}

class Derived : Base3, IBase1{
    // in C#, the base class must be listed before any interfaces
}

Having the exact C++ hierarchy in C# is not possible. However, a close
approximation can be achieved by understanding the inheritance model supported
in each language.
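
One way to rehearse this restructuring is to apply it to the C++ code first, so
that the hierarchy already has the single-base-plus-interfaces shape before any
C# is written. The sketch below does this under stated assumptions: the member
functions describe and compute and the field seed are hypothetical, added only
so that the classes have something to show.

```cpp
#include <string>

// Pure abstract class: plays the role of the C# interface IBase1.
class IBase1 {
public:
    virtual ~IBase1() {}
    virtual std::string describe() const = 0;
};

// Abstract class that owns data, so it stays a class, not an interface.
class Base2 {
public:
    virtual ~Base2() {}
    virtual int compute() const = 0;
protected:
    int seed = 10;
};

// The concrete class now derives from the abstract one.
class Base3 : public Base2 {
public:
    int compute() const override { return seed * 2; }
};

// One concrete base plus one interface, all inherited publicly.
class Derived : public Base3, public IBase1 {
public:
    std::string describe() const override { return "Derived"; }
};
```

A Derived object can then be used through an IBase1 reference or a Base2
reference, which is the same set of views the C# hierarchy offers. What is lost
relative to the original C++ design is the hiding that private and protected
inheritance provided.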

Case Study Review

Migrating from C++ to C# is not as easy as it may seem. C# is strongly based on
C++, but the two languages differ in their design. The syntactic similarities
between the two languages can be misleading, as there are many semantic and
pragmatic differences. There are many places where a C++ programmer can get
truly lost when starting to program in C#.

A C++ programmer needs a good understanding of the migration process and a
clear approach to get the best results from such a transition. The two
languages differ in many fundamental ways: design approach, memory management,
problem-solving approach, and the underlying translation technology are just a
few of the differences. To get the best results, it is essential that the
programmer has an overall view of such issues.

The second section of the case study is not just looking at the differences in
features. Rather, it's a discussion of how the transition can be done from C++ to
C# by analyzing its features. Naturally, a clear picture emerges of what to expect
and what not to expect in such a transition.

When existing code has to be converted from C++ to C#, a set of decisions needs
to be made. If the code is available as COM components, it can be used directly
instead of being converted manually. If the code is a library or framework
available as DLLs, then no conversion needs to be done and it can be used
directly from C#. Managed Extensions to C++ can be used to make the application
available in the .NET environment with minimal changes to the code. Finally, a
decision needs to be made on whether it is necessary to rewrite the whole code
in C#. In that case, line-by-line conversion of code is not feasible and the
transition will need significant effort on the programmer's part. It will also
necessitate a change in design approach and new strategies.

All rights reserved. Copyright Jan 2004.
