
A simple makefile

Suppose you are writing a C++ program which has two source modules,
processing.cxx and gui.cxx, along with numerous include files. If you were to build
your program from scratch, you would need to execute something like these commands:

c++ -c processing.cxx -o processing.o
c++ -c gui.cxx -o gui.o
c++ processing.o gui.o -o my_program

The first two commands are compilation commands, and the third invokes the linker to
combine the two object files into a single executable. If you make changes to gui.cxx
but not to processing.cxx, then you don't need to reexecute the first command, but you
do need to execute the last two commands. makepp can figure this out for you
automatically.

(If you've never worked with make before, you may be thinking that you could combine
the above three commands into a single command, like this:

c++ processing.cxx gui.cxx -o my_program

When you omit the -c option to the compiler, it combines the compilation and linking
step. This is often quite convenient when you are not writing a makefile. However, it's
not a good idea to do this in a makefile, because it always recompiles both modules even
if one of them hasn't changed, and this can take a significant amount of extra time.)

In order to use makepp to control the build process, you'll need to write a makefile.
The makefile is a text file that contains the recipe for building your program. It usually
resides in the same directory as the sources, and it is usually called Makefile.

Each one of these commands should be a separate rule in a makefile. A rule is an
instruction for building one or more output files from one or more input files. Makepp
determines which rules need to be reexecuted by determining whether any of the input
files for a rule have changed since the last time the rule was executed.

A rule has a syntax like this:

output_filenames : input_filenames
    actions

The first line of the rule contains a space-separated list of output files, followed by a
colon, followed by a space-separated list of input files. The output files are also called
targets, and the input files are also called dependencies; we say that the target file
depends on the dependencies, because if any of the dependencies change, the target must
be rebuilt.

The remaining lines of the rule (the actions) are shell commands to be executed. Each
action must be indented with at least one space (traditional make requires a tab
character). Usually, there's just one action line, but there can be as many as you want;
each line is executed sequentially, and if any one of them fails, the remainder are not
executed. The rule ends at the first line which is not indented.

You can place the rules in any order in the makefile, but it is traditional to write the rule
that links the program first, followed by the compilation rules. One reason for this is that
if you simply type “makepp”, then makepp attempts to build the first target in the file,
which means that it will build your whole program and not just a piece of it. (If you want
to build something other than the first target, you have to specify the name of the target
on the command line, e.g., “makepp processing.o”.)

The above compilation commands should be written as three separate rules. A makefile
for building this program could look like this:

# Link command:
my_program: processing.o gui.o
    c++ processing.o gui.o -o my_program

# Compilation commands:
processing.o: processing.cxx
    c++ -c processing.cxx -o processing.o

gui.o: gui.cxx
    c++ -c gui.cxx -o gui.o

(Characters on a line following a # are ignored; they are just comments. You do not need
the “# Link command:” comment in the makefile at all.)

To use this makefile, simply cd to the directory and type “makepp”. Makepp will attempt
to build the first target in the makefile, which is my_program. (If you don't want it to
build the first target, then you have to supply the name of the target you actually want
to build on the command line.)

When makepp attempts to build my_program, it realizes that it first must build
processing.o and gui.o before it can execute the link command. So it looks at the other
rules in the makefile to determine how to build these.

In order to build processing.o, makepp uses the second rule. Since processing.o
depends on processing.cxx, makepp will also try to make processing.cxx. There is
no rule to make processing.cxx; it must already exist.
Makepp checks whether processing.cxx has changed since the last time processing.o
was built. By default, it determines this by looking at the dates on the file. Makepp
remembers what the date of processing.cxx was the last time processing.o was made
by storing it in a separate file (in a subdirectory called .makepp). Makepp will execute
the actions to build the target if any of the following is true:

• The target does not exist.
• The target exists, but makepp does not have any information about the last build.
• The date on any input file has changed since the last build.
• The date on any target has changed since the last build.
• The actions have changed since the last build.
• The last build occurred on a different architecture (different CPU type or operating
  system type).

It might seem a little funny that makepp executes the action if either the output file or the
input files have changed since the last build. Makepp is designed to guarantee that your
build is correct, according to the commands in the makefile. If you go and modify the file
yourself, then makepp can't guarantee that the modified file is actually correct, so it
insists on rebuilding. (For more information on how makepp decides whether to rebuild,
and how you can control this, see makepp_signatures.)

Now processing.o might not depend only on processing.cxx; if processing.cxx
includes any .h files, then it needs to be recompiled if any of those .h files has changed,
even if processing.cxx itself has not changed. You could modify the rule like this:

# Unnecessary listing of .h files
processing.o: processing.cxx processing.h simple_vector.h list.h
    c++ -c processing.cxx -o processing.o

However, it is a real nuisance to modify the makefile every time you change the list of
files that are included, and it is also extremely error prone. You would not only have to
list the files that processing.cxx includes, but also all the files that those files include,
etc. You don't have to do this. Makepp is smart enough to check for include files
automatically. Any time it sees a command that looks like a C or C++ compilation (by
looking at the first word of the action), it reads in the source files looking for #include
directives. It knows where to look for include files by scanning for -I options on your
compiler command line. Any files which are included are automatically added to the
dependency list, as are any files which those files themselves include. If any of them has
changed, the file will be recompiled.

Once makepp knows that processing.o is up to date, it then determines whether gui.o
needs to be rebuilt by applying the same procedure to the third rule. When both
processing.o and gui.o are known to be built correctly, then makepp applies the same
procedure to see if the link command needs to be reexecuted.

The above makefile will work, but even for this simple problem, an experienced user is
not likely to write his makefile this way. Several improvements are discussed in the next
sections.

Using variables

So far, our makefile for compiling our program of two modules looks like this:

# Link command:
my_program: processing.o gui.o
    c++ processing.o gui.o -o my_program

# Compilation commands:
processing.o: processing.cxx
    c++ -c processing.cxx -o processing.o

gui.o: gui.cxx
    c++ -c gui.cxx -o gui.o

This works wonderfully, but suppose now we want to change some compilation options.
Or maybe we want to use a different compiler. We'd have to change all three compilation
lines.

Similarly, suppose we want to change the list of modules to compile. We'd have to
change it in two places.

Duplication of information like this is a recipe for disaster. If you go and change your
makefile, it's pretty much guaranteed that at some point, you or someone else will forget
to change one of the places. Depending on what the change is (especially if it affects
preprocessor definitions), this can lead to subtle and hard-to-debug problems in your
program.

The way to avoid duplication of information is to specify the information only once and
store it in a variable, which can be accessed each time the information is needed.

# Define the symbols we might want to change:
CXX := c++
CXXFLAGS := -g

OBJECTS := processing.o gui.o

my_program: $(OBJECTS)
    $(CXX) $(OBJECTS) -o my_program

processing.o: processing.cxx
    $(CXX) $(INCLUDES) $(CXXFLAGS) -c processing.cxx -o processing.o

gui.o: gui.cxx
    $(CXX) $(CXXFLAGS) -c gui.cxx -o gui.o

Here $(CXX) expands to be the value of the variable CXX, and similarly for $(CXXFLAGS)
and $(OBJECTS). Now we can just change one line in the makefile, and all relevant
compilation commands are affected.

In fact, we don't even need to change the makefile to change compilation options.
Assignments specified on the command line override assignments in the makefile. For
example, we could type this to the shell:

makepp CXXFLAGS="-g -O2"

which overrides the setting of CXXFLAGS in the makefile. It is as if the makefile contained
the line

CXXFLAGS := -g -O2

instead of the definition it does contain.

CADENCE COMMAND LINE OPTIONS.

Multiple Step mode uses the ncvlog and ncelab commands to compile and elaborate your
design; Single Step mode uses the ncverilog command.

COMMANDS FOR MULTIPLE STEP MODE:

To explain the commands, the design file is assumed to be tb_spi_ifc_top.v.

Creation of new project:

nclaunch -new

This creates the work library (INCA_libs/worklib), cds.lib, and nclaunch.key.


Before you can simulate your design, you must compile and elaborate it.

Compile:

Compiling the design produces an internal representation for each HDL design unit in the
source files.

ncvlog -work worklib -cdslib cds.lib -logfile ncvlog.log (or -logfile ncvlog.log
-append_log, or -nolog) -errormax 15 -update -linedebug -status tb_spi_ifc_top.v

The -status option gives the following information:

ncvlog: Memory Usage - 6.4M program + 4.9M data = 11.2M total
ncvlog: CPU Usage - 0.0s system + 0.1s user = 0.1s total (0.1s, 44.1% cpu)

Elaborate:

Elaborating the design constructs a design hierarchy based on the instantiation and
configuration information in the design, establishes signal connectivity, and computes
initial values for all objects in the design.

ncelab -Message -work worklib -cdslib cds.lib -logfile ncelab.log -errormax 15
-access +wc -status worklib.tb_spi_ifc_top:module

Simulate - run mode ‘Graphical UI’

ncsim -Message -gui -cdslib cds.lib -logfile ncsim.log -errormax 15 -licqueue -status
worklib.tb_spi_ifc_top:module
run

Simulate - run mode ‘Batch’

ncsim -Message -batch -cdslib cds.lib -logfile ncsim.log -errormax 15 -licqueue
-status worklib.tb_spi_ifc_top:module

The -licqueue option makes the simulation wait until a license becomes available.

Dumping the waveform file for debugging

ncsim -Message -batch -cdslib cds.lib -logfile ncsim.log -input run.cmd -errormax 15
-licqueue -status worklib.tb_spi_ifc_top:module

Contents of the run.cmd file:

database -open waves -into waves.shm -default
probe -create -shm -all -variables -depth all
run

Options for code coverage:

For code coverage, the elaboration and simulation options are different; the commands are
as follows.

Instrument the design by passing the coverage configuration file named covfile.cf during
elaboration:

ncelab -Message -covfile covfile.cf -work worklib -cdslib cds.lib -logfile ncelab.log
-errormax 15 -access +wc -status worklib.tb_spi_ifc_top:module

Simulate the design by passing the simulation input file named sim_code.tcl:

ncsim -Message -input sim_code.tcl -cdslib cds.lib -logfile ncsim.log -errormax 15
-licqueue -status worklib.tb_spi_ifc_top:module

Contents of file covfile.cf:

select_coverage -block -expr -toggle -instance *…
select_fsm -module *
set_hit_count_limit 4
set_assign_scoring
set_glitch_strobe 100 ps
set_toggle_strobe 100 ps

Particular instances and modules of interest for coverage can be specified instead of *…
and * respectively.

Contents of sim_code.tcl:

coverage -setup -dut tb_spi_ifc_top.u_spi_wb
coverage -setup -testname mic_arb1
coverage -code -score b:e
coverage -code -database -local_db mic_arb1.cov
coverage -fsm -database -local_db mic_arb1.fsm
coverage -toggle -database -local_db mic.mst
run
exit

After simulation, a directory named “cov_work” will be created containing the following
coverage data files:

spi_wb.dgn and mic_arb1.cov → Block and expression coverage
mic.mst and mic_arb1.tog → Toggle coverage
mic_arb1.fsm → FSM coverage

Invoke the reporting tool,

ictr -g

and load the above files to analyze the coverage data.

The above commands give block, expression, toggle, and FSM coverage. For functional
coverage, assertions have to be included in the design; this is not covered by the above
commands.

Some time back, in a Cadence demo/presentation, we were discussing ‘-access +rwc’
during elaboration of the design, and the tutorial PDF says:
“This option provides full access (read, write, and connectivity access) to simulation
objects so that you can probe objects and scopes to a simulation database and debug the
design”
I came across one example of accessing a simulation object and wanted to share it with
all of you.

During simulation we can force/deposit a value onto any signal (deposit is like force,
except that the time can be controlled):

force tb_spi_ifc_top.boot_load = 1'h1;
deposit tb_spi_ifc_top.boot_load = 1'h1 -after 100ns -absolute

(These can also be issued using the ‘Simulation’ menu of SimVision.)

This is nothing but accessing a simulation object, and for this we need access to be
enabled.

A good template for your Verilog file is shown below.

// timescale directive tells the simulator the base units and precision of the simulation
`timescale 1 ns / 10 ps

module module_name (/* inputs and outputs */);

// parameter declarations
parameter parameter_name = parameter_value;

// input and output declarations
input in1;
input in2;             // single-bit inputs
output [msb:lsb] out;  // a bus output

// internal signal, register-type declarations (only assigned within always blocks)
reg reg_variable1;
reg [msb:lsb] reg_variable2;

// internal signal, net-type declarations (only assigned outside always blocks)
wire net_variable1;

// hierarchy - instantiating another module
reference_name instance_name (
    .pin1 (net1),
    .pin2 (net2),
    // ...
    .pinn (netn)
);

// synchronous procedures
always @ (posedge clock)
begin
    // ...
end

// combinational procedures
always @ (signal1 or signal2 or signal3)
begin
    // ...
end

assign net_variable1 = /* combinational logic */;

endmodule
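
As a concrete (and purely hypothetical) filling-in of this template, a minimal registered
adder might look like the module below; the module and signal names are just examples,
not taken from the original text.

// Example: a trivial registered adder written by filling in the template above.
`timescale 1 ns / 10 ps
module adder_reg (clk, a, b, sum);

input clk;
input [3:0] a;
input [3:0] b;              // two 4-bit operands
output [4:0] sum;           // registered 5-bit result

reg [4:0] sum;              // assigned within an always block, so declared as reg
wire [4:0] sum_next;        // assigned outside an always block, so declared as wire

assign sum_next = a + b;    // combinational logic

// synchronous procedure
always @ (posedge clk)
begin
    sum <= sum_next;
end

endmodule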

by metafor:
If a number is already divisible by 5 (as in the initial case of a number 0), then the number is equal to 5k,
where k = 0, 1, 2, 3, etc.

A 0 shifted into the LSB represents a multiplication by 2:

2*(5k) = 10k = 5k' for some integer k'.

A 1 shifted into the LSB represents a multiplication by 2 and an addition of 1:

2*(5k) + 1 = 10k + 1 = 5k' + 1 for some integer k'.

The key is to keep track of the modulus part of that equation; the part not divisible by 5, let's call it m, so the
number = 5k + m.

Each time a 0 bit is shifted, the component divisible by 5 (5k) is multiplied by 2 and remains divisible by 5.
The part not divisible by 5 (m) will be multiplied by 2 as well. This works out to:

2(5k + m) = 10k + 2m = 5k' + 2m for some integer k'.

When a 1 is shifted in, the number is multiplied by 2 and then added to 1:

2(5k + m) + 1 = 10k + 2m + 1 = 5k' + 2m + 1 = 5k' + m' for some integers k' and m'.

When the m' portion (2m + 1) reaches 5, the number is again divisible by 5 and the number can once again
be represented:

5k for some integer k.

An example:

Shift 1: 2(5k) + 1
Shift 0: 2(5k + 1) = 10k + 2 = 5k' + 2
Shift 1: 2(5k' + 2) + 1 = 10k' + 4 + 1 = 5k'' + 5 = 5k''' for some integers k'' and k'''.

Here we see that the modulus component reached 5 through first a multiply by 2 (0*2 = 0) and addition of 1
(0*2 + 1 = 1), then a multiply by 2 (1*2 = 2), then a multiply by 2 and an addition of 1 (2*2 + 1 = 5).

When the modulus reaches 5, it is effectively 0 again.

In the case where the modulus goes from, say 3 to 6 (shift of 0), the value of 6 is equal to 5+1, so the
modulus becomes 1 again. The algebra goes:

5k + 6 = 5k + 5 + 1 = 5(k + 1) + 1 = 5k'' + 1 where k'' is an integer equal to k+1.

To implement this in hardware is easy. A 4-bit register is used to keep track of the modulus. Every entry of a
0 in the LSB will represent a multiplication by 2; so the register is left-shifted by 1. Every entry of a 1 in the
LSB will represent a multiplication by 2 and an addition of 1. So the register is left-shifted by 1 and
incremented by 1.

Or, simply, m_new[3:0] = {m[2:0],1'b0} + LSB.

There is a need to catch the case when the modulus becomes greater than or equal to 5, so a comparator
(implement how you'd like) is needed to test m_new[3:0] against 5.

If m_new[3:0] is greater than or equal to 5, 5 must be subtracted from m_new[3:0], so:

m_new_qual[3:0] = (m_new[3:0] >= 4'b0101) ? (m_new[3:0] - 4'b0101) : m_new[3:0]


At the edge of the clock, assign m[3:0] = m_new_qual[3:0].

The number is divisible by 5 when m[3:0] == 0.
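
Putting the pieces together, a minimal Verilog sketch of such a detector could look like the
module below. This is only an illustration of the scheme described above; the module and
signal names (div5_serial, din, div5, the synchronous reset) are assumptions, not something
given in the original post. Bits are assumed to arrive MSB first.

// Serial "divisible by 5" detector: m tracks the value received so far, modulo 5.
module div5_serial (
    input  wire clk,
    input  wire rst_n,   // active-low synchronous reset (assumed)
    input  wire din,     // serial data input, MSB first
    output wire div5     // high when the value shifted in so far is divisible by 5
);

    reg  [3:0] m;        // running modulus, always in the range 0..4

    // Shifting in a bit doubles the number and adds the new bit: m_new = 2*m + din.
    wire [3:0] m_new      = {m[2:0], 1'b0} + din;
    // If the modulus reaches 5 or more, subtract 5 to bring it back into 0..4.
    wire [3:0] m_new_qual = (m_new >= 4'd5) ? (m_new - 4'd5) : m_new;

    always @(posedge clk)
        if (!rst_n)
            m <= 4'd0;          // an empty (zero) number is divisible by 5
        else
            m <= m_new_qual;

    assign div5 = (m == 4'd0);

endmodule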


You are designing a circuit that implements two operations A and B as shown below.

NOTES:
1. At any point in time the circuit is doing either A or B.
2. Delays through modules of each operation are given in the figure below.
3. The circuit must have registers on all inputs. No registers are needed on the outputs.
4. The delay through a register is 5ns.
5. Operation A occurs 70% of the time while operation B occurs 30% of the time.

What is the clock period that will result in highest overall performance?

And the questions increase in difficulty.

8 REACTIONS:

Nikhil 1 said on 11:10 PM, February 06, 2008 : Assuming the internal modules cannot be
further pipelined, the clock would have to be greater than 50 + 5 = 55 ns. This is assuming registers
are put in between all the internal modules.
I don't see the relevance of the 70-30 division, as we have to take the critical path into account.
What am I missing?

Vivek 2 said on 2:57 PM, February 17, 2008 : The higher the frequency, the higher the
performance. But we need to check the minimum delay being taken by any unit in the
design. All "input registers" are taking a delay of "5 ns". So the system clock should be designed
taking this into consideration. Other information is irrelevant.
Mk 3 said on 11:11 AM, July 12, 2008 : As there is no register needed at the output, the 5ns register
delay will not come into the picture. Because within the 20ns+5ns period the processing unit "g" can put
the correct data into the unit "h", the clock period will be 50ns and not 50ns + 5ns.
Also, looking from another angle: if 50ns is the period, all the internal pipeline units (f,g,k) can put the
data properly into the next unit (g,h,l respectively) for the pipeline to function perfectly. If there is an
output delay requirement from our critical-path unit (i.e. h) then that delay will add up into the clock
period.

Anonymous 4 said on 11:41 AM, August 13, 2008 : I think it should be 50ns, not 55.

SaurabhS 5 said on 10:45 AM, April 10, 2010 : The answer would be 35 ns.

For a detailed solution, mail me at mr.saurabh.srivastava@gmail.com

abhishek 6 said on 2:14 AM, January 23, 2011 : It should be 55ns ( 50+5)

SaurabhS 7 said on 11:38 AM, January 23, 2011 : OK, guys, let's cross-check.

With a 55ns clock period (call it C5), the scheduling would be:
Stages - 45 (40+5), 25 (20+5), 55 (50+5)
C5 will take 3 cycles (operation A).
Stages - 35, 25 (for operation B)
C5 will take 2 clock cycles.
Total average time:
55*(0.7*3 + 0.3*2) = 148.5

Now with C3, a clock period of 35ns:
Operation A takes 5 clocks.
Operation B takes 2 clocks.
Total average time:
35*(0.7*5 + 0.3*2) = 143.5

Clear now?

Anonymous 8 said on 1:49 PM, February 27, 2011 : Yes, but with a 35ns clock, the register at g's
input will never receive the output from f before the next clock tick. The clock period has to be
sufficient to accommodate the maximum combinational path delay in the pipe stages and the
setup/hold time of the register. The delays in each module are a clear indication that they are
combinational and cannot be split further. I agree with Mk's answer: the register setup time will be
added to the previous stage's delay. Also, nothing has been said about the output of h. So if we
assume the output is left combinational (not connected to any FF), then the clock period
will be 45ns, although this will cause a warning during PAR.

The circuit below, which implements operation C, has been designed such that it has two modules i and j, with the
delays shown in the figure below. The latency of circuit b is 2 and the delay through a register is 5ns.
Assume that circuits a and b are stages of a pipeline as shown in the figure below.

NOTES:
1. The pipeline has 2 stages where the first stage of the pipeline contains circuit a (executing operation A or B)
and the second stage of the pipeline contains circuit b (executing operation C).
2. Stage 2 reads the output of stage 1 as soon as it is available even if the two operations of stage 1 (i.e. A and B)
have imbalanced latencies.
3. The pipeline has registers only on the inputs of the stages.
4. The performance and cost overhead due to control circuitry (e.g. Mux) is not considered in this preliminary
analysis.

Based on your analysis in the earlier post, what is the total time required to execute 100 input
parcels sent through the pipeline?

vikas 1 said on 6:04 PM, June 20, 2008 : If operation C, which has 2 stages, is considered to be
pipelined, then in total we have 3 pipeline stages. Tclk = 40+5 = 45 ns.
Number of jobs to be executed = 100.
Total time = time for the 1st job + time for the remaining 99 jobs.
The 1st job appears after 3 clock cycles; after that we get an output every clock cycle.
So, total time = (3*45) + (99*45) = 4590 ns.
Am I missing something?

Two stages are added to the pipeline discussed in the earlier post to form the circuit shown below.

1.
2. Stage A decides to send the input either to stages B-C or to stage D.
3. 35% of data from stage A is sent to stages B-C while 65% of data from stage A is sent to stage D.
4. All information given in the earlier post is applicable to this pipeline as well.

Extend the power reduction scheme that you developed in the earlier post and apply it to the new
pipeline. Calculate how much power will be saved when applying this scheme to the new circuit.

Your task is to develop a method to reduce the power consumption of the 2-stage image processing pipeline
shown below (clock inputs are not shown for simplicity).

NOTES:

1.
2. The capacitance of an individual gate (e.g. AND, OR) is 1.
3. The capacitance of a flip flop is 2.
4. The circuit uses a valid-bit protocol to distinguish between valid and invalid data. The valid bit arrives in the
same clock cycle as the valid data.
5. The environment sends valid data once every 5 clock cycles (e.g. one clock cycle with valid data, followed by 4
clock cycles of invalid data).
6. Your power reduction scheme shall not:
   change the clock speed or any characteristics of the implementation technology (e.g. supply or threshold voltage).
   add or delete any signals between the circuit and the environment.
   change the functional behaviour of the circuit for valid data.
7. Your power reduction scheme may increase the latency of the pipeline.

Describe/illustrate your power reduction scheme. Calculate how much less power (as a percentage)
the new circuit (i.e. with your scheme) consumes compared to the original circuit (i.e. without the
power reduction scheme).
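
As an aside, one concrete illustration of the valid-bit protocol in note 4 is to let the valid
bit act as a load enable for the pipeline's data registers, so that the data path does not
toggle during the four invalid cycles. The Verilog below is only a hedged sketch of that idea,
not the complete answer to the exercise (which also asks for the power calculation); the
module and signal names are illustrative assumptions.

// One pipeline stage whose data register loads only when the incoming valid bit is set.
module valid_gated_stage #(parameter W = 8) (
    input  wire         clk,
    input  wire         valid_in,
    input  wire [W-1:0] data_in,
    output reg          valid_out,
    output reg  [W-1:0] data_out
);
    always @(posedge clk) begin
        valid_out <= valid_in;        // the valid bit always advances down the pipe
        if (valid_in)
            data_out <= data_in;      // data register holds its value on invalid cycles
    end
endmodule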

TCM and cache interactions


In the event that a TCM and a cache both contain the requested
address, it is architecturally Unpredictable which memory the
instruction data is returned from. It is expected that such an event
only arises from a failure to invalidate the cache when the base
register of the TCM is changed, and so is clearly a programming error.

For a Harvard arrangement of caches and TCM, data reads and writes
can access any Instruction TCM configured as local memory for both
reads and writes. This ensures that accesses to literal pools, Undefined
instructions, and SWI numbers are possible, and aids debugging. For
this reason, an Instruction TCM configured as local memory must
behave as a unified TCM, but can be optimized for instruction fetches.
This requirement only exists for the TCMs when configured as Local
RAM.

You must not program an Instruction TCM to the same base address
as a Data TCM and, if the two RAMs are different sizes, the regions in
physical memory of the two RAMs must not be overlapped unless each
TCM is configured to operate as SmartCache. This is because the
resulting behavior is architecturally Unpredictable.

If a Data and an Instruction TCM overlap, and either is not configured
as SmartCache, it is Unpredictable which memory the instruction data
is returned from.

In these cases, you must not rely on the behavior of the ARM1136JF-S
processor in code that is intended to be ported to other ARM platforms.
