Kernel Modules and Applications

• A module runs in kernel space; an application runs in user space.
• An application performs a single task; a module registers itself to serve future requests.
• A module is linked only to the kernel: the only functions it can call are those exported by the kernel, and there are no libraries to link to. Kernel headers:
/usr/src/linux2.4.x/include/linux
/usr/src/linux2.4.x/include/asm
• Application libraries: headers in /usr/include (e.g. printf()).
• A kernel fault is fatal, whereas an application fault (such as a segmentation fault) is comparatively harmless.
Namespace pollution
• Declare every symbol static unless it must be visible outside the module.
• Give the remaining global symbols a prefix that is lowercase and unique to the module.
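Usage Count
The system keeps a usage count for every module to determine whether the module can be safely removed.
MOD_INC_USE_COUNT: increments the count for the current module.
MOD_DEC_USE_COUNT: decrements the count.
MOD_IN_USE: evaluates to true if the count is not zero.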
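The effect of the three macros can be sketched in user space. This is only a model, not kernel code: the struct and the demo_* names are hypothetical, and the real macros operate on the current module rather than on an explicit argument.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* User-space model of a module usage count (hypothetical names). */
struct demo_module {
    int use_count;   /* number of current users of the module */
};

/* Models MOD_INC_USE_COUNT: the module gained a user (e.g. on open). */
void demo_inc_use_count(struct demo_module *m)
{
    assert(m != NULL);
    m->use_count++;
}

/* Models MOD_DEC_USE_COUNT: a user went away (e.g. on release). */
void demo_dec_use_count(struct demo_module *m)
{
    assert(m != NULL);
    if (m->use_count > 0)
        m->use_count--;
}

/* Models MOD_IN_USE: true while the count is not zero. */
bool demo_in_use(const struct demo_module *m)
{
    assert(m != NULL);
    return m->use_count != 0;
}

/* Removal is refused while the module is in use: 0 on success, -1 if busy. */
int demo_try_remove(const struct demo_module *m)
{
    return demo_in_use(m) ? -1 : 0;
}
```

A driver would typically increment the count in open and decrement it in release, so that removal fails while any process still holds the device open.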
Security Issues
Security has two faces: deliberate and incidental.
Deliberate: the damage a user can cause through the misuse of existing programs.
Incidental: the damage caused by incidentally exploiting bugs.
Any security check in the system is enforced by kernel code.
Driver writers should avoid encoding security policy in their code. Security is a policy issue that is often best handled at higher levels within the kernel, under the control of the system administrator.
Security Issues
Device access can affect security through:
1. Device operations that affect global resources (such as setting an interrupt line).
2. Operations that could affect other users (such as setting a default block size on a tape drive).
3. Buffer overrun errors.
Precautions
1. Any input received from user processes should be treated with great suspicion.
2. Be careful with uninitialized memory.
3. Specific operations that could affect the system (e.g., reloading the firmware on an adapter board, formatting a disk) should probably be restricted to privileged users.
4. Be careful, also, when receiving software from third parties, especially when the kernel is concerned. A maliciously modified kernel could allow anyone to load a module, thus opening an unexpected back door via create_module.
Precautions
5. The Linux kernel can be compiled to have no module support whatsoever, thus closing any related security holes. All needed drivers must then be built directly into the kernel itself.
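Precautions 1 and 2 can be illustrated with a small user-space sketch. The buffer size, the struct, and the demo_store name are hypothetical, and a real driver would copy from user space with copy_from_user rather than memcpy.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define DEMO_BUF_SIZE 16   /* hypothetical size of the driver's internal buffer */

struct demo_dev {
    char buf[DEMO_BUF_SIZE];
    size_t len;            /* number of valid bytes in buf */
};

/*
 * Store caller-supplied data only after validating its length.
 * Returns the number of bytes stored, or -EINVAL if the request
 * would overrun the buffer (the device state is left unchanged).
 */
long demo_store(struct demo_dev *dev, const char *src, size_t count)
{
    assert(dev != NULL && src != NULL);
    if (count > sizeof(dev->buf))
        return -EINVAL;                     /* reject instead of overrunning */
    memset(dev->buf, 0, sizeof(dev->buf));  /* never leave stale bytes behind */
    memcpy(dev->buf, src, count);
    dev->len = count;
    return (long)count;
}
```

Clearing the buffer before use means that a later read can never hand back leftover memory contents to another user.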
The Kernel Symbol Table
insmod, which loads a module into the kernel, resolves undefined symbols against the table of public kernel symbols.
The table contains the addresses of global kernel items: functions and variables.
The symbol table can be read from /proc/ksyms or with the ksyms command.
When a module is loaded, any symbol exported by the module becomes part of the kernel symbol table.
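As a rough user-space model of this behaviour (not the kernel's actual data structures), the table can be pictured as a list of name/address pairs that grows as modules export symbols. All demo_* names and the addresses used are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_SYMS 8    /* hypothetical table capacity */

struct demo_symbol {
    const char *name;
    unsigned long addr;
};

struct demo_symtab {
    struct demo_symbol syms[DEMO_MAX_SYMS];
    size_t count;
};

/* Add an exported symbol to the table: 0 on success, -1 if the table is full. */
int demo_export(struct demo_symtab *t, const char *name, unsigned long addr)
{
    assert(t != NULL && name != NULL);
    if (t->count >= DEMO_MAX_SYMS)
        return -1;
    t->syms[t->count].name = name;
    t->syms[t->count].addr = addr;
    t->count++;
    return 0;
}

/* Resolve an undefined symbol: its address, or 0 if it is not in the table. */
unsigned long demo_resolve(const struct demo_symtab *t, const char *name)
{
    assert(t != NULL && name != NULL);
    for (size_t i = 0; i < t->count; i++)
        if (strcmp(t->syms[i].name, name) == 0)
            return t->syms[i].addr;
    return 0;
}
```

In this model, loading a module means resolving each of its undefined names with demo_resolve and then calling demo_export for whatever it exports, so that later modules can build on it.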
The Kernel Symbol Table
New modules can be stacked on top of other modules.
Module stacking is useful in complex projects.
For example, the video-for-linux set of drivers exports symbols used by lower-level device drivers for specific hardware.
When using stacked modules, use the modprobe utility, which loads any other modules that are required by your module.
A module that exports no symbols should say so explicitly by invoking the macro EXPORT_NO_SYMBOLS.
A module that exports only a subset of its symbols must define the macro EXPORT_SYMTAB before including <linux/module.h>.
Kernel Symbols
• EXPORT_NO_SYMBOLS: export no symbols.
• EXPORT_SYMBOL(name): export the symbol name.
• EXPORT_SYMBOL_NOVERS(name): export the symbol name with no versioning information.
I/O Ports & I/O Memory
Driver programmers may need to allocate I/O ports, I/O memory, and interrupt lines explicitly.
Whereas system memory is anonymous and may be allocated from anywhere, I/O memory, ports, and interrupts have very specific roles.
A driver needs to be able to allocate the exact ports it needs, not just some ports.
I/O Ports & I/O Memory
The job of a typical driver consists largely of writing and reading I/O ports and I/O memory.
Access to I/O ports and I/O memory (collectively, I/O regions) happens both at initialization time and during normal operations.
A device driver should be guaranteed exclusive access to its I/O regions to prevent interference from other drivers.
The Linux developers have implemented a request/free mechanism for I/O regions; the mechanism is purely a software abstraction.
Information about registered resources is available in /proc/ioports and /proc/iomem.
Access to I/O regions
The programming interface used to access the I/O registry is made up of three functions:
1. int check_region(unsigned long start, unsigned long len);
2. struct resource *request_region(unsigned long start, unsigned long len, char *name);
3. void release_region(unsigned long start, unsigned long len);
I/O Ports
1. check_region():- May be called to see whether a range of ports is available for allocation. It returns a negative error code (-EBUSY or -EINVAL) if the answer is no.
2. request_region():- Actually allocates the port range, returning a non-NULL pointer value if the allocation succeeds.
3. release_region():- Releases the ports.
The three functions are actually macros, and they are declared in <linux/ioport.h>.
I/O Ports
Sequence for registering ports:
#include <linux/ioport.h>
#include <linux/errno.h>
static int skull_detect(unsigned int port, unsigned int range)
{
    int err;

    if ((err = check_region(port, range)) < 0)
        return err; /* busy */
    if (skull_probe_hw(port, range) != 0)
        return -ENODEV; /* not found */
    request_region(port, range, "skull"); /* "Can't fail" */
    return 0;
}
I/O Ports
Any I/O ports allocated by the driver must eventually be released; skull does it from within cleanup_module:
static void skull_release(unsigned int port, unsigned int range)
{
    release_region(port, range);
}
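The check/request/release protocol above can be modelled in user space to show why it guarantees exclusive access. This is a simplified sketch, not the kernel's resource tree: the demo_* names and the fixed-size registry are assumptions, and overflow of start + len is ignored.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define DEMO_MAX_REGIONS 8   /* hypothetical registry capacity */

struct demo_region {
    unsigned long start, len;
    const char *name;
    int used;
};

static struct demo_region demo_regions[DEMO_MAX_REGIONS];

/* 0 if [start, start+len) is free, -EINVAL for an empty range, -EBUSY on overlap. */
int demo_check_region(unsigned long start, unsigned long len)
{
    if (len == 0)
        return -EINVAL;
    for (size_t i = 0; i < DEMO_MAX_REGIONS; i++) {
        const struct demo_region *r = &demo_regions[i];
        if (r->used && start < r->start + r->len && r->start < start + len)
            return -EBUSY;
    }
    return 0;
}

/* Claim the range: pointer to the registry entry, or NULL on failure. */
struct demo_region *demo_request_region(unsigned long start, unsigned long len,
                                        const char *name)
{
    assert(name != NULL);
    if (demo_check_region(start, len) != 0)
        return NULL;
    for (size_t i = 0; i < DEMO_MAX_REGIONS; i++) {
        struct demo_region *r = &demo_regions[i];
        if (!r->used) {
            r->start = start;
            r->len = len;
            r->name = name;
            r->used = 1;
            return r;
        }
    }
    return NULL;   /* registry full */
}

/* Give the range back so that another driver may claim it. */
void demo_release_region(unsigned long start, unsigned long len)
{
    for (size_t i = 0; i < DEMO_MAX_REGIONS; i++) {
        struct demo_region *r = &demo_regions[i];
        if (r->used && r->start == start && r->len == len)
            r->used = 0;
    }
}
```

Because every driver goes through the same registry, a second request for an overlapping range fails until the first owner releases it.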
I/O Memory
Information about I/O memory is available in the /proc/iomem file.
To obtain access to a certain I/O memory region, the driver should use the following calls:
1. int check_mem_region(unsigned long start, unsigned long len);
2. int request_mem_region(unsigned long start, unsigned long len, char *name);
3. int release_mem_region(unsigned long start, unsigned long len);
Typical usage:
if (check_mem_region(mem_addr, mem_size)) {
    printk("drivername: memory already in use\n");
    return -EBUSY;
}
request_mem_region(mem_addr, mem_size, "drivername");
device_struct Structure
When a character device is registered with the kernel, its file_operations structure and name are added to the global chrdevs array of device_struct structures, which is indexed by major number. This array is called the character device switch table.
struct device_struct {
    const char *name;
    struct file_operations *fops;
};
So by looking up chrdevs[YOUR_MAJOR].fops, the kernel knows how to talk to the device and what entry points it supports.
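A minimal user-space sketch of such a switch table follows. It assumes a reduced file_operations with a single open entry; the demo_* names, the table size, and the rejection of major 0 are simplifications made for illustration and do not mirror the kernel's own registration code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define DEMO_MAX_CHRDEV 255   /* hypothetical number of major numbers */

struct demo_file_operations {
    int (*open)(void);        /* reduced to one entry point for the sketch */
};

struct demo_device_struct {
    const char *name;
    struct demo_file_operations *fops;
};

static struct demo_device_struct demo_chrdevs[DEMO_MAX_CHRDEV];

int demo_open_calls;          /* counts how often the sample driver was opened */

/* Sample driver entry point used to exercise the table. */
int demo_sample_open(void)
{
    demo_open_calls++;
    return 0;
}

/* Put a driver into the table slot selected by its major number. */
int demo_register_chrdev(unsigned int major, const char *name,
                         struct demo_file_operations *fops)
{
    assert(name != NULL && fops != NULL);
    if (major == 0 || major >= DEMO_MAX_CHRDEV)
        return -EINVAL;
    if (demo_chrdevs[major].fops != NULL)
        return -EBUSY;        /* major number already taken */
    demo_chrdevs[major].name = name;
    demo_chrdevs[major].fops = fops;
    return 0;
}

/* Dispatch an open to whichever driver owns the major number. */
int demo_open(unsigned int major)
{
    if (major >= DEMO_MAX_CHRDEV || demo_chrdevs[major].fops == NULL)
        return -ENODEV;       /* no driver registered */
    if (demo_chrdevs[major].fops->open == NULL)
        return 0;             /* a missing open method always succeeds */
    return demo_chrdevs[major].fops->open();
}
```

The lookup is a plain array index, which is why dispatching by major number at open time is cheap.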
Major & Minor Nos
ls -l /dev
crw-rw-rw- 1 root root 1, 3 Feb 23 1999 null
crw------- 1 root root 10, 1 Feb 23 1999 psaux
crw------- 1 rubini tty 4, 1 Aug 16 22:22 tty1
crw-rw-rw- 1 root dialout 4, 64 Jun 30 11:19 ttyS0
crw-rw-rw- 1 root dialout 4, 65 Aug 16 00:00 ttyS1
crw------- 1 root sys 7, 1 Feb 23 1999 vcs1
crw------- 1 root sys 7, 129 Feb 23 1999 vcsa1
crw-rw-rw- 1 root root 1, 5 Feb 23 1999 zero
Major & Minor Nos
The major number identifies the driver associated with the device.
e.g. /dev/null and /dev/zero are both managed by driver 1, whereas virtual consoles and serial terminals are managed by driver 4.
The kernel uses the major number at open time to dispatch execution to the appropriate driver.
The minor number is used only by the driver specified by the major number; it typically identifies one instance among the devices that the driver manages.
For example, block major 22 is the official number for the secondary IDE controller, and the IDE subsystem identifies the master and slave devices and their partitions by minor number.
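How a major/minor pair is packed into one device number can be sketched as follows. The sketch assumes the 2.4-era layout of 8 bits for each number; the demo_* names are hypothetical stand-ins for the kernel's own MKDEV, MAJOR, and MINOR macros.

```c
#include <assert.h>

#define DEMO_MINORBITS 8                              /* assumed: 8-bit minor */
#define DEMO_MINORMASK ((1U << DEMO_MINORBITS) - 1)

/* Pack a major/minor pair into one device number. */
unsigned int demo_mkdev(unsigned int major, unsigned int minor)
{
    assert(major < 256 && minor < 256);
    return (major << DEMO_MINORBITS) | minor;
}

/* Extract the major number: selects the driver. */
unsigned int demo_major(unsigned int dev)
{
    return dev >> DEMO_MINORBITS;
}

/* Extract the minor number: interpreted only by that driver. */
unsigned int demo_minor(unsigned int dev)
{
    return dev & DEMO_MINORMASK;
}
```

For the ttyS0 entry in the listing (4, 64), demo_mkdev(4, 64) yields 0x440, and the two accessors recover 4 and 64 again.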
Major & Minor Nos
Syntax for mknod:
mknod name type major minor
# mknod /dev/lp0 c 6 0
File Operations
struct file_operations {
    loff_t (*llseek)(struct file *, loff_t, int);
    ssize_t (*read)(struct file *, char *, size_t, loff_t *);
    ssize_t (*write)(struct file *, const char *, size_t, loff_t *);
    unsigned int (*poll)(struct file *, struct poll_table_struct *);
    int (*ioctl)(struct inode *, struct file *, unsigned int, unsigned long);
    int (*open)(struct inode *, struct file *);
    int (*release)(struct inode *, struct file *);
};
File Operations
loff_t (*llseek)(struct file *file, loff_t offset, int mode);
The llseek method is used to change the current read/write position in a file; the new position is returned as a positive return value.
loff_t is a "long offset" type.
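The position arithmetic behind an llseek method can be sketched in user space. The struct demo_file type with its size field and the demo_llseek name are hypothetical; a real method receives a struct file and works with loff_t.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>    /* SEEK_SET, SEEK_CUR, SEEK_END */

/* Reduced stand-in for struct file plus the device size (hypothetical). */
struct demo_file {
    long long f_pos;  /* current read/write position */
    long long size;   /* size of the device data */
};

/* Returns the new position, or -EINVAL for a bad mode or a negative result. */
long long demo_llseek(struct demo_file *filp, long long off, int whence)
{
    long long newpos;

    assert(filp != NULL);
    switch (whence) {
    case SEEK_SET:                  /* absolute position */
        newpos = off;
        break;
    case SEEK_CUR:                  /* relative to the current position */
        newpos = filp->f_pos + off;
        break;
    case SEEK_END:                  /* relative to the end of the data */
        newpos = filp->size + off;
        break;
    default:
        return -EINVAL;
    }
    if (newpos < 0)
        return -EINVAL;
    filp->f_pos = newpos;           /* only updated on success */
    return newpos;
}
```

A failed seek leaves f_pos untouched, so the caller can retry with corrected arguments.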
ssize_t (*read)(struct file *file, char *buf, size_t count, loff_t *offset);
Used to retrieve data from the device.
A null pointer in this position causes the read system call to fail with -EINVAL.
On success, it returns the number of bytes successfully read.
File Operations
ssize_t (*write)(struct file *file, const char *buf, size_t count, loff_t *offset);
Sends data to the device.
If the method is missing, -EINVAL is returned to the program calling the write system call.
On success, the return value is the number of bytes successfully written.
unsigned int (*poll)(struct file *, struct poll_table_struct *);
poll is the back end of the poll and select system calls, both of which are used to inquire whether a device is readable, writable, or in some special state. Either system call can block until a device becomes readable or writable.
File Operations
int (*ioctl)(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg);
Offers a way to issue device-specific commands (like formatting a track of a floppy disk, which is neither reading nor writing).
If the device doesn't offer an ioctl entry point, the system call returns -ENOTTY for any request that isn't predefined.
File Operations
int (*open)(struct inode *inode, struct file *file);
This is always the first operation performed on the device file, but the driver is not required to declare a corresponding method. If this entry is NULL, opening the device always succeeds, but your driver isn't notified.
File Operations
int (*release) (struct inode * inode , struct file *file);
This operation is invoked when the file structure is being
released. Like open, release can be missing.

File Operations
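struct file {
    mode_t f_mode;        /* whether the file is readable, writable, or both */
    loff_t f_pos;         /* current read/write position */
    unsigned int f_flags; /* file flags such as O_RDONLY and O_NONBLOCK */
};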
Debugging Techniques
• Debugging by Printing
• Debugging by Querying
• Debugging by Watching
Debugging Techniques
Debugging by Printing: printk lets you classify messages according to their severity by associating different loglevels, or priorities, with the messages. The loglevel is indicated with a macro.
printk(KERN_DEBUG "Here I am: %s:%i\n", __FILE__, __LINE__);
printk(KERN_CRIT "I'm trashed; giving up on %p\n", ptr);
Debugging Techniques
There are eight possible loglevel strings, defined in the header
<linux/kernel.h>:
KERN_EMERG:-Used for emergency messages, usually those
that precede a crash.
KERN_ALERT:- A situation requiring immediate action.
KERN_CRIT:-Critical conditions, often related to serious
hardware or software failures.
KERN_ERR:-Used to report error conditions; device drivers
will often use KERN_ERR to report hardware difficulties.
KERN_WARNING:-Warnings about problematic situations
that do not, in themselves, create serious problems with the
system.
Debugging Techniques
KERN_NOTICE:-Situations that are normal, but still worthy of
note. A number of security-related conditions are reported at
this level.
KERN_INFO:-Informational messages. Many drivers print
information about the hardware they find at startup time at this
level.
KERN_DEBUG:-Used for debugging messages.
Each string (in the macro expansion) represents an integer in angle brackets, ranging from 0 to 7.
A printk statement with no specified priority defaults to DEFAULT_MESSAGE_LOGLEVEL, specified in kernel/printk.c as an integer.
Debugging Techniques
Based on the loglevel, the kernel may print the message to the current console, a serial line, or a parallel printer.
If the priority is less than the integer variable console_loglevel, the message is displayed.
If both klogd and syslogd are running, kernel messages are appended to /var/log/messages or otherwise treated depending on the syslogd configuration.
klogd is a system daemon that intercepts and logs Linux kernel messages.
sysklogd provides two system utilities that support system logging and kernel message trapping.
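The loglevel prefix and the console_loglevel comparison described above can be sketched in user space. The demo_* names are hypothetical, and the default of 4 (the value of KERN_WARNING) is an assumption about DEFAULT_MESSAGE_LOGLEVEL rather than something taken from these slides.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_DEFAULT_MESSAGE_LOGLEVEL 4   /* assumed default level */

/* Loglevel of a message: the digit in a leading "<n>", else the default. */
int demo_msg_loglevel(const char *msg)
{
    assert(msg != NULL);
    if (msg[0] == '<' && msg[1] >= '0' && msg[1] <= '7' && msg[2] == '>')
        return msg[1] - '0';
    return DEMO_DEFAULT_MESSAGE_LOGLEVEL;
}

/* Nonzero if the message would be displayed on the console. */
int demo_shown_on_console(const char *msg, int console_loglevel)
{
    /* Lower numbers mean higher priority. */
    return demo_msg_loglevel(msg) < console_loglevel;
}
```

With a console loglevel of 7, a "<3>" error message is displayed while a "<7>" debug message only reaches the log buffer.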


Debugging Techniques
If klogd is not running, the message won't reach user space unless you read /proc/kmsg.
See /proc/sys/kernel/printk: its first two integers are the current console loglevel and the default level for messages.
• We can also write a program to set console_loglevel.
Debugging by Printk
How Messages Get Logged:-The printk function writes
messages into a circular buffer that is LOG_BUF_LEN
(defined in kernel/printk.c) bytes long. It then wakes any
process that is waiting for messages, that is, any process that is
sleeping in the syslog system call or that is reading
/proc/kmsg.
If the circular buffer fills up, printk wraps around and starts
adding new data to the beginning of the buffer, overwriting the
oldest data.
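The wrap-around behaviour can be illustrated with a tiny user-space ring buffer. The 8-byte length and the demo_* names are chosen only for the illustration; the kernel's buffer is LOG_BUF_LEN bytes long and also handles waking up readers, which is omitted here.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_LOG_BUF_LEN 8   /* deliberately tiny so that wrap-around is visible */

struct demo_log {
    char buf[DEMO_LOG_BUF_LEN];
    size_t head;             /* index where the next byte is written */
    size_t len;              /* number of valid bytes, at most DEMO_LOG_BUF_LEN */
};

/* Append a message; once the buffer is full the oldest bytes are overwritten. */
void demo_log_write(struct demo_log *log, const char *msg)
{
    assert(log != NULL && msg != NULL);
    for (; *msg != '\0'; msg++) {
        log->buf[log->head] = *msg;
        log->head = (log->head + 1) % DEMO_LOG_BUF_LEN;
        if (log->len < DEMO_LOG_BUF_LEN)
            log->len++;
    }
}

/* Copy the contents, oldest byte first, into out as a NUL-terminated string. */
size_t demo_log_read(const struct demo_log *log, char *out, size_t cap)
{
    size_t start, n = 0;

    assert(log != NULL && out != NULL && cap > 0);
    start = (log->head + DEMO_LOG_BUF_LEN - log->len) % DEMO_LOG_BUF_LEN;
    while (n < log->len && n + 1 < cap) {
        out[n] = log->buf[(start + n) % DEMO_LOG_BUF_LEN];
        n++;
    }
    out[n] = '\0';
    return n;
}
```

Writing "abcde" and then "fghij" into the 8-byte buffer leaves "cdefghij": the two oldest bytes have been overwritten by the newest ones.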
Debugging by Printk
Turning the Messages On and Off:-
Each print statement can be enabled or disabled by removing or
adding a single letter to the macro's name.
All the messages can be disabled at once by changing the value
of the CFLAGS variable before compiling.
The same print statement can be used in kernel code and user-
level code, so that the driver and test programs can be managed
in the same way with regard to extra messages.
Debugging by Printk
Implementation of these features can be defined in scull.h:
#undef PDEBUG /* undef it, just in case */
#ifdef SCULL_DEBUG
#  ifdef __KERNEL__
     /* This one if debugging is on, and kernel space */
#    define PDEBUG(fmt, args...) printk(KERN_DEBUG "scull: " fmt, ## args)
#  else
     /* This one for user space */
#    define PDEBUG(fmt, args...) fprintf(stderr, fmt, ## args)
#  endif
#else
#  define PDEBUG(fmt, args...) /* not debugging: nothing */
#endif

#undef PDEBUGG
#define PDEBUGG(fmt, args...) /* nothing: it's a placeholder */
Debugging by Printk
To simplify the process further, add the following lines to your makefile:
# Comment/uncomment the following line to disable/enable debugging
DEBUG = y

# Add your debugging flag (or not) to CFLAGS
ifeq ($(DEBUG),y)
  DEBFLAGS = -O -g -DSCULL_DEBUG # "-O" is needed to expand inlines
else
  DEBFLAGS = -O2
endif

CFLAGS += $(DEBFLAGS)
Debugging by Querying
Debugging with printk has disadvantages, such as the risk of system crashes and a noticeable slowdown of the system.
Debugging by querying avoids them: relevant information is obtained from the system only when it is needed.
Two main techniques are available to driver developers for querying the system:
1. Creating a file in the /proc filesystem.
2. Using the ioctl driver method.
Debugging by Querying
Using the /proc Filesystem:-
The /proc filesystem is a special, software-created filesystem that is used by the kernel to export information to the world.
Each file under /proc is tied to a kernel function that generates the file's "contents" on the fly when the file is read.
e.g. 1. /proc/modules always returns a list of the currently loaded modules.
2. ps, top, and uptime get their information from /proc.
The /proc filesystem is dynamic.
All modules that work with /proc should include <linux/proc_fs.h> to define the proper functions.
Debugging by Querying
To create a read-only /proc file, your driver must implement a function that produces the data when the file is read.
Function used to read a /proc file:
int (*read_proc)(char *page, char **start, off_t offset, int count, int *eof, void *data);
page: the buffer where you write your data.
start: used by the function to say where the interesting data has been written in page.
offset, count: same meaning as in the read method.
eof: points to an integer that the driver sets to signal that there is no more data.
data: a driver-specific data pointer.
An older interface, get_info, takes fewer arguments:
int (*get_info)(char *page, char **start, off_t offset, int count);
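The shape of a read_proc implementation can be sketched in user space with the same argument list. The demo_stats structure, its fields, and the demo_read_proc name are hypothetical; the sketch assumes the whole output fits into one page and therefore ignores start and offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/types.h>   /* off_t */

/* Hypothetical driver state that the /proc file reports. */
struct demo_stats {
    int devices;
    long bytes_used;
};

/* Fill page with text, mark end-of-file, and return the number of bytes. */
int demo_read_proc(char *page, char **start, off_t offset, int count,
                   int *eof, void *data)
{
    const struct demo_stats *st = data;
    int len;

    assert(page != NULL && eof != NULL && st != NULL);
    (void)start;          /* unused: all output is produced in one call */
    (void)offset;
    if (count <= 0)
        return 0;
    len = snprintf(page, (size_t)count, "devices: %d\nbytes used: %ld\n",
                   st->devices, st->bytes_used);
    if (len < 0)
        return 0;
    if (len >= count)
        len = count - 1;  /* output was truncated to fit the buffer */
    *eof = 1;             /* nothing more to return */
    return len;
}
```

The data pointer is how one function can serve several /proc entries: each entry is registered with a pointer to its own state.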
Debugging by Querying
When a process reads the file (using the read system call), the request must reach your module, so the read_proc function has to be connected to an entry in the /proc hierarchy.
In kernel 2.4 (and in 2.2 if sysdep.h is included), the simplest way is to call create_proc_read_entry.
Debugging by Querying
Call used by scull to make its /proc function available as /proc/scullmem:
create_proc_read_entry("scullmem",
                       0    /* default mode */,
                       NULL /* parent dir */,
                       scull_read_procmem,
                       NULL /* client data */);
Debugging by Querying
The ioctl Method:
We can implement a few ioctl commands tailored for debugging.
These commands can copy relevant data structures from the driver to user space, where you can examine them.
The only extra requirement is another program that issues the ioctl and displays the results.
Debugging by Watching
Sometimes minor problems can be tracked down just by watching the behaviour of an application in user space.
Ways to watch a user-space program:
1) Run a debugger on it to step through its functions.
2) Add print statements.
3) Run the program under strace.
The strace command is a powerful tool that shows all the system calls issued by a user-space program, together with the arguments to the calls and their return values in symbolic form.
Debugging by Watching
When a system call fails, both the symbolic value of the error (e.g., ENOMEM) and the corresponding string are displayed.
strace receives its information from the kernel itself.
A program can be traced regardless of whether or not it was compiled with debugging support.
Tracing can also be attached to a running process.
Trace information is often used to support bug reports.
Example command: strace ls /dev > /dev/scull0
