
Web Application Framework

June-July

2011

Summer Training Project Report 2011
Submitted by: S.Ganesan
Software Engineering
Delhi College of Engineering

CERTIFICATE

This is to certify that

2 |Web Application Development

CHAPTER 1: Project Abstract


Web Application Framework

A web application framework is software designed to support the development of dynamic websites, web applications and web services. The framework aims to alleviate the overhead associated with common activities performed in web development.

NEED

As the design of the World Wide Web was not inherently dynamic, early hypertext consisted of hand-coded HTML that was published on web servers. Any modifications to published pages needed to be performed by the pages' author. To provide a dynamic web page that reflected user inputs, the Common Gateway Interface (CGI) standard was introduced for interfacing external applications with web servers. CGI could adversely affect server load, though, since each request had to start a separate process.

Programmers wanted tighter integration with the web server to enable high-traffic web applications. The Apache HTTP Server, for example, supports modules that can extend the web server with arbitrary code execution or forward specific requests to a web server that can handle dynamic content. Some web servers (such as Apache Tomcat) were specifically designed to handle dynamic content by executing code written in some languages.

Around the same time, new languages were being developed specifically for use on the web, such as PHP and Active Server Pages.

While the vast majority of languages available to programmers for creating dynamic web pages have libraries to help with common tasks, web applications often require specific libraries for tasks such as creating HTML (for example, Java Server). Eventually, mature "full stack" frameworks appeared that gathered multiple libraries useful for web development into a single cohesive software stack for web developers to use.

ARCHITECTURES

From an architecture perspective, there are generally five major types: request-based, component-based, hybrid, meta, and RIA-based.

Model-view-controller (MVC)

Many frameworks follow the model-view-controller (MVC) architectural pattern to separate the data model with business rules from the user interface. This is generally considered a good practice as it modularizes code, promotes code reuse, and allows multiple interfaces to be applied.

Push-based vs. Pull-based

Most MVC frameworks follow a push-based architecture. These frameworks use actions that do the required processing, and then "push" the data to the view layer to render the results. Ruby on Rails and Spring MVC are good examples of this architecture. An alternative to this is pull-based architecture, sometimes also called "component-based". These frameworks start with the view layer, which can then "pull" results from multiple controllers as needed. In this architecture, multiple controllers can be involved with a single view. Struts2, Lift and Tapestry are examples of pull-based architectures.

Content Management Systems

A content management system (CMS) is the collection of procedures used to manage work flow in a collaborative environment. These procedures can be manual or computer-based. The procedures are designed to do the following:

- Allow for a large number of people to contribute to and share stored data
- Control access to data, based on user roles (defining which information users or user groups can view, edit, publish, etc.)
- Aid in easy storage and retrieval of data
- Reduce repetitive duplicate input
- Improve the ease of report writing
- Improve communication between users

In a CMS, data can be defined as nearly anything: documents, movies, pictures, phone numbers, scientific data, and so forth. CMSs are frequently used for storing, controlling, semantically enriching, and publishing documentation. Serving as a central repository, the CMS increases the version level of new updates to an already existing file. Version control is one of the primary advantages of a CMS.


CHAPTER 2: Features of a Framework


Web template system

Dynamic web pages usually consist of a static part (HTML) and a dynamic part, which is code that generates HTML. The code that generates the HTML can do this based on variables in a template, or on code. The text to be generated can come from a database, thereby making it possible to dramatically reduce the number of pages in a site.

Caching

Web caching is the caching of web documents in order to reduce bandwidth usage, server load, and perceived "lag". A web cache stores copies of documents passing through it; subsequent requests may be satisfied from the cache if certain conditions are met. Some application frameworks provide mechanisms for caching documents and bypassing various stages of the page's preparation, such as database access or template interpretation.

Security

Some web application frameworks come with authentication and authorization frameworks that enable the web server to identify the users of the application and restrict access to functions based on some defined criteria.

Database access and mapping

Many web application frameworks create a unified API to a database backend, enabling web applications to work with a variety of databases with no code changes, and allowing programmers to work with higher-level concepts. For higher performance, database connections should be pooled, as e.g. AOLserver does. Additionally, some object-oriented frameworks contain mapping tools to provide object-relational mapping, which maps objects to tuples. Other features web application frameworks may provide include transactional support and database migration tools.

URL mapping

A framework's URL mapping facility is the mechanism by which the framework interprets URLs. A URL mapping system that uses pattern matching or URL rewriting allows more "friendly" URLs to be used, increasing the simplicity of the site and allowing for better indexing by search engines. For example, a URL that ends with "/page.cgi?cat=science&topic=physics" could be changed to simply "/page/science/physics". This makes the URL easier to read and provides search engines with better information about the structural layout of the site. A graph traversal approach also tends to result in the creation of friendly URLs. A shorter URL such as "/page/science" tends to exist by default as that is simply a shorter form of the longer traversal to "/page/science/physics".

Ajax

Ajax, shorthand for "Asynchronous JavaScript and XML", is a web development technique for creating interactive web applications. The intent is to make web pages feel more responsive by exchanging small amounts of data with the server behind the scenes, so that the entire web page does not have to be reloaded each time the user requests a change. This is intended to increase the web page's interactivity, speed, and usability.

Automatic configuration

Some frameworks minimize web application configuration through the use of introspection and/or following known conventions. For example, many Java frameworks use Hibernate as a persistence layer, which can generate a database schema at runtime capable of persisting the necessary information. This allows the application designer to design business objects without needing to explicitly define a database schema. Frameworks such as Ruby on Rails can also work in reverse, that is, define properties of model objects at runtime based on a database schema.

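As a rough illustration of the URL mapping feature described in this chapter, the sketch below uses pattern matching to resolve the friendly URLs "/page/science/physics" and "/page/science". The route table, the handler names (show_topic, show_category) and the resolve function are hypothetical and are not taken from any particular framework.

```python
import re

# Hypothetical route table: each pattern maps a friendly URL to a handler name.
ROUTES = [
    (re.compile(r"^/page/(?P<cat>[^/]+)/(?P<topic>[^/]+)$"), "show_topic"),
    (re.compile(r"^/page/(?P<cat>[^/]+)$"), "show_category"),
]


def resolve(path):
    """Return (handler_name, params) for the first matching pattern, else None."""
    for pattern, handler in ROUTES:
        match = pattern.match(path)
        if match:
            return handler, match.groupdict()
    return None


print(resolve("/page/science/physics"))
print(resolve("/page/science"))
```

The named groups play the role of the cat and topic query parameters in the "/page.cgi?cat=science&topic=physics" form, so a handler can receive the same values from either style of URL.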

CHAPTER 3: MODULE DESCRIPTION


.htaccess

In several web servers (most commonly Apache), .htaccess (hypertext access) is the default name of a directory-level configuration file that allows for decentralized management of web server configuration. The .htaccess file is placed inside the web tree, and is able to override a subset of the server's global configuration; the extent of this subset is defined by the web server administrator. The original purpose of .htaccess was to allow per-directory access control (e.g. requiring a password to access the content), hence the name. Nowadays .htaccess can override many other configuration settings, mostly related to content control, e.g. content type and character set, CGI handlers, etc.

In the Apache web server, the format of .htaccess is the same as the server's global configuration file; other web servers (such as Sun Java System Web Server and Zeus Web Server) implement the same syntax, even though their configuration files are very different. Directives in the .htaccess file apply to the current directory, and to all sub-directories (unless explicitly disabled in the server configuration), but for reasons of performance and security, cannot affect their parent directories. The file name begins with a dot because dot-files are by convention hidden files on Unix-like operating systems.

Purpose

Authorization, authentication: .htaccess files are often used to specify the security restrictions for the particular directory, hence the filename "access". The .htaccess file is often accompanied by a .htpasswd file which stores valid usernames and their passwords.[3]
Rewriting URLs: Servers often use .htaccess to rewrite long, overly comprehensive URLs to shorter and more memorable ones.
Blocking: Use allow/deny to block users by IP address or domain. Also used to block bad bots, rippers and referrers, and often to restrict access by search engine spiders.
SSI: Enable server-side includes.
Directory listing: Control how the server will react when no specific web page is specified.
Customized error responses: Change the page that is shown when a server-side error occurs, for example HTTP 404 Not Found.
MIME types: Instruct the server how to treat different file types.
Cache Control: .htaccess files allow a server to control caching by web browsers and proxies to reduce bandwidth usage, server load, and perceived lag.
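The per-directory behaviour described above (settings apply to a directory and its sub-directories, but never to its parents) can be modelled with a small sketch. This is a simplified illustration, not Apache's actual merge algorithm: the OVERRIDES table stands in for parsed .htaccess files, and the effective_settings function and the directory layout are assumptions made for the example.

```python
from pathlib import PurePosixPath

# Hypothetical per-directory settings, standing in for parsed .htaccess files.
OVERRIDES = {
    "/": {"DirectoryIndex": "index.html", "Options": "-Indexes"},
    "/docs": {"Options": "+Indexes"},
    "/docs/private": {"AuthType": "Basic"},
}


def effective_settings(directory):
    """Merge settings from the web root down to `directory`; deeper levels win."""
    path = PurePosixPath(directory)
    # `parents` lists the nearest ancestor first, so reverse it to start at the root.
    chain = list(path.parents)[::-1] + [path]
    settings = {}
    for level in chain:
        settings.update(OVERRIDES.get(str(level), {}))
    return settings


print(effective_settings("/docs/private"))
print(effective_settings("/"))
```

Because the walk only ever moves from the root towards the requested directory, a setting made in "/docs" changes "/docs/private" but leaves "/" untouched.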

Web Cache

A web cache is a mechanism for the temporary storage (caching) of web documents, such as HTML pages and images, to reduce bandwidth usage, server load, and perceived lag. A web cache stores copies of documents passing through it; subsequent requests may be satisfied from the cache if certain conditions are met. It should not be confused with a web archive, a site that keeps old versions of web pages. Web caches are used in various systems:

- A search engine may cache a website.
- A forward cache is a cache outside the web server's network, e.g. on the client software's ISP or company network.
- A network-aware forward cache is just like a forward cache but only caches heavily accessed items.
- A reverse cache sits in front of one or more web servers and web applications, accelerating requests from the Internet.
- A client, such as a web browser, can store web content for reuse. For example, if the back button is pressed, the local cached version of a page may be displayed instead of a new request being sent to the web server.
- A web proxy sitting between the client and the server can evaluate HTTP headers and choose to store web content.
- A content delivery network can retain copies of web content at various points throughout a network.

Cache Control

HTTP defines three basic mechanisms for controlling caches: freshness, validation, and invalidation.

Freshness allows a response to be used without re-checking it on the origin server, and can be controlled by both the server and the client. For example, the Expires response header gives a date when the document becomes stale, and the Cache-Control: max-age directive tells the cache how many seconds the response is fresh for.

Validation can be used to check whether a cached response is still good after it becomes stale. For example, if the response has a Last-Modified header, a cache can make a conditional request using the If-Modified-Since header to see if it has changed. The ETag (entity tag) mechanism also allows for both strong and weak validation.

Invalidation is usually a side effect of another request that passes through the cache. For example, if a URL associated with a cached response subsequently gets a POST, PUT or DELETE request, the cached response will be invalidated.
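The three mechanisms above can be sketched as a small decision function. This is a minimal model under stated assumptions, not a complete HTTP cache: the CachedResponse class, cache_decision and note_request are hypothetical names, freshness is reduced to a max-age comparison, and validation is represented only by the presence of an ETag.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CachedResponse:
    body: str
    stored_at: float             # when the cache stored the response, in seconds
    max_age: int                 # lifetime taken from "Cache-Control: max-age"
    etag: Optional[str] = None   # validator from the ETag header, if present


def cache_decision(entry: Optional[CachedResponse], now: float) -> str:
    """Decide how a cache should answer a GET for one URL."""
    if entry is None:
        return "fetch"                 # nothing stored yet
    if now - entry.stored_at < entry.max_age:
        return "serve-from-cache"      # freshness: no contact with the origin
    if entry.etag is not None:
        return "revalidate"            # validation: send a conditional request
    return "fetch"                     # stale, and no validator to check with


def note_request(cache: Dict[str, CachedResponse], method: str, url: str) -> None:
    """Invalidation: a write request for a URL drops its cached response."""
    if method in {"POST", "PUT", "DELETE"}:
        cache.pop(url, None)


cache = {"/news": CachedResponse("<html>...</html>", stored_at=0.0, max_age=60, etag='"v1"')}
print(cache_decision(cache.get("/news"), now=30.0))
print(cache_decision(cache.get("/news"), now=90.0))
note_request(cache, "POST", "/news")
print(cache_decision(cache.get("/news"), now=91.0))
```

In a real cache the "revalidate" branch would issue a conditional request to the origin server and reuse the stored body if the server reports that nothing has changed.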

Security

The security module identifies the users of the application and restricts access to functions based on some defined criteria. It includes two important steps:

1. Authentication
2. Authorization

Authentication

Authentication is the act of confirming the truth of an attribute of a datum or entity. This might involve confirming the identity of a person, tracing the origins of an artifact, ensuring that a product is what its packaging and labeling claims to be, or assuring that a computer program is a trusted one. Common examples of access control involving authentication include:

- Asking for photo ID when a contractor first arrives at a house to perform work.
- Using a CAPTCHA as a means of asserting that a user is a human being and not a computer program.
- A computer program using a blind credential to authenticate to another program.
- Entering a country with a passport.
- Logging in to a computer.
- Using a confirmation e-mail to verify ownership of an e-mail address.
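The two steps named at the start of this section, authentication followed by authorization, can be sketched together. Everything here is illustrative: the USERS and PERMISSIONS tables, the role names and the handle_request function are assumptions, and a real application would keep credentials in a database rather than in memory.

```python
import hashlib
import hmac
import os


def hash_password(password, salt):
    """Derive a password hash with PBKDF2 so plain passwords are never stored."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


# Hypothetical user store and role table.
_salt = os.urandom(16)
USERS = {"alice": {"salt": _salt, "hash": hash_password("s3cret", _salt), "role": "editor"}}
PERMISSIONS = {"editor": {"view", "edit"}, "guest": {"view"}}


def authenticate(username, password):
    """Step 1: confirm the caller is who they claim to be."""
    user = USERS.get(username)
    if user is None:
        return False
    candidate = hash_password(password, user["salt"])
    return hmac.compare_digest(candidate, user["hash"])


def is_authorized(role, action):
    """Step 2: check the requested action against the role's access rights."""
    return action in PERMISSIONS.get(role, set())


def handle_request(username, password, action):
    if not authenticate(username, password):
        return "401 Unauthorized"
    if not is_authorized(USERS[username]["role"], action):
        return "403 Forbidden"
    return "200 OK"


print(handle_request("alice", "s3cret", "edit"))
print(handle_request("alice", "wrong", "edit"))
print(handle_request("alice", "s3cret", "publish"))
```

Keeping the two checks in separate functions mirrors the distinction drawn in this section: a failed identity check and a missing access right are different outcomes and are reported differently.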

Authorization

Authorization is the function of specifying access rights to resources, which is related to information security and computer security in general and to access control in particular. More formally, "to authorize" is to define access policy. For example, human resources staff are normally authorized to access employee records, and this policy is usually formalized as access control rules in a computer system. During operation, the system uses the access control rules to decide whether access requests from (authenticated) consumers shall be approved (granted) or disapproved (rejected). Resources include individual files' or items' data, computer programs, computer devices and functionality provided by computer applications.

Access control in computer systems and networks relies on access policies. The access control process can be divided into two phases: 1) Policy definition phase where access is authorized, and 2) Policy enforcement phase where access requests are approved or disapproved. Authorization is thus the function of the policy definition phase which precedes the policy enforcement phase where access requests are approved or disapproved based on the previously defined authorizations.

Most modern, multi-user operating systems include access control and thereby rely on authorization. Access control also makes use of authentication to verify the identity of consumers. When a consumer tries to access a resource, the access control process checks that the consumer has been authorized to use that resource. Authorization is the responsibility of an authority, such as a department manager, within the application domain, but is often delegated to a custodian such as a system administrator. Authorizations are expressed as access policies in some type of "policy definition application", e.g. in the form of an access control list or a capability, on the basis of the "principle of least privilege": consumers should only be authorized to access whatever they need to do their jobs. Older and single-user operating systems often had weak or non-existent authentication and access control systems.

"Anonymous consumers" or "guests" are consumers that have not been required to authenticate. They often have limited authorization. On a distributed system, it is often desirable to grant access without requiring a unique identity. Familiar examples of access tokens include keys and tickets: they grant access without proving identity.

Trusted consumers are often authorized for unrestricted access to resources on a system, but must be authenticated so that the access control system can make the access approval decision. "Partially trusted" consumers and guests will often have restricted authorization in order to protect resources against improper access and usage. The access policy in some operating
systems, by default, grants all consumers full access to all resources. Others do the opposite, insisting that the administrator explicitly authorizes a consumer to use each resource. Even when access is controlled through a combination of authentication and access control lists, the problem of maintaining the authorization data is not trivial, and often represents as much administrative burden as managing authentication credentials. It is often necessary to change or remove a user's authorization: this is done by changing or deleting the corresponding access rules on the system. Using atomic authorization is an alternative to per-system authorization management, where a trusted third party securely distributes authorization information.

URL Mapping

URL mapping allows portal administrators to create constant, user-friendly URLs and map them to portal pages. As administrators create the URLs, they can define human-readable names for them. These can be easily remembered and are therefore more user friendly. The self-defined URLs can be published externally and thereby made available to portal users. For example, a computer store that uses the portal could create a user-defined URL products/hardware/laptops for the page on which their laptop product line is advertised. This URL can be appended to a portal prefix that was defined by the store, for example http://www.fancy_xyz_computers.com/wps/portal/products/hardware/laptops. Clicking on such a mapped URL from outside of the portal takes the user to the desired portal page. Users can also combine several mapped contexts into the representation of a full valid URL, type that full mapped URL into the address field of the browser and thereby get to the portal page.

Application Programming Interface

An application programming interface (API) is a particular set of rules and specifications that software programs can follow to communicate with each other. It serves as an interface between different software programs and facilitates their interaction, similar to the way the user interface facilitates interaction between humans and computers. An API can be created for applications, libraries, operating systems, etc., as a way of defining their "vocabularies" and resource request conventions (e.g. function-calling conventions). It may include specifications for routines, data structures, object classes, and protocols used to communicate between the consumer program and the implementer program of the API. An API can be:

- General, the full set of an API that is bundled in the libraries of a programming language, e.g. Standard Template Library in C++ or Java API.
- Specific, meant to address a specific problem, e.g. Google Maps API or Java API for XML Web Services.
- Language-dependent, meaning it is only available by using the syntax and elements of a particular language, which makes the API more convenient to use.
- Language-independent, written so that it can be called from several programming languages. This is a desirable feature for a service-oriented API that is not bound to a specific process or system and may be provided as remote procedure calls or web services.

For example, a website that allows users to review local restaurants is able to layer their reviews over maps taken from Google Maps, because Google Maps has an API that facilitates this functionality. Google Maps' API controls what information a third-party site can use and how they can use it.

The term API may be used to refer to a complete interface, a single function, or even a set of APIs provided by an organization. Thus, the scope of meaning is usually determined by the context of usage.

Database Transaction Support

A transaction comprises a unit of work performed within a database management system (or similar system) against a database, and treated in a coherent and reliable way independent of other transactions. Transactions in a database environment have two main purposes:

1. To provide reliable units of work that allow correct recovery from failures and keep a database consistent even in cases of system failure, when execution stops (completely or partially) and many operations upon a database remain uncompleted, with unclear status.
2. To provide isolation between programs accessing a database concurrently. Without isolation the programs' outcomes are possibly erroneous.

Transactions provide an "all-or-nothing" proposition, stating that each work-unit performed in a database must either complete in its entirety or have no effect whatsoever. Further, the system must isolate each transaction from other transactions, results must conform to existing constraints in the database, and transactions that complete successfully must get written to durable storage.

Purpose for this support

Databases and other data stores which treat the integrity of data as paramount often include the ability to handle transactions to maintain the integrity of data. A single transaction consists of one or more independent units of work, each reading and/or writing information to a database or other data store. When this happens it is often important to ensure that all such processing leaves the database or data store in a consistent state.
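The "all-or-nothing" behaviour described above can be demonstrated with Python's built-in sqlite3 module. The accounts table, the account names and the transfer function are assumptions made for this sketch, and both accounts are assumed to exist. The connection is used as a context manager, which commits when the block succeeds and rolls back when it raises.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` between two existing accounts as a single unit of work."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()

print(transfer(conn, "a", "b", 30), balances(conn))
print(transfer(conn, "a", "b", 500), balances(conn))
```

In the failing transfer both UPDATE statements have already run when the error is raised, yet neither change survives, because the rollback discards the whole unit of work.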


A transactional database is a DBMS where write transactions on the database are able to be rolled back if they are not completed properly (e.g. due to power or connectivity loss). Most modern relational database management systems fall into the category of databases that support transactions.

In a database system a transaction might consist of one or more data-manipulation statements and queries, each reading and/or writing information in the database. Users of database systems consider consistency and integrity of data as highly important. A simple transaction is usually issued to the database system in a language like SQL wrapped in a transaction, using a pattern similar to the following:

1. Begin the transaction
2. Execute a set of data manipulations and/or queries
3. If no errors occur then commit the transaction and end it
4. If errors occur then roll back the transaction and end it

If no errors occurred during the execution of the transaction then the system commits the transaction. A transaction commit operation applies all data manipulations within the scope of the transaction and persists the results to the database. If an error occurs during the transaction, or if the user specifies a rollback operation, the data manipulations within the transaction are not persisted to the database. In no case can a partial transaction be committed to the database since that would leave the database in an inconsistent state.

Internally, multi-user databases store and process transactions, often by using a transaction ID or XID.

SQL is inherently transactional, and a transaction is automatically started when another ends. Some databases extend SQL and implement a START TRANSACTION statement, but while seemingly signifying the start of the transaction it merely deactivates auto commit. The result of any work done after this point will remain invisible to other database users until the system processes a COMMIT statement. A ROLLBACK statement can also occur, which will undo any work performed since the last transaction. Both COMMIT and ROLLBACK will end the transaction and start a new one. If auto commit was disabled using START TRANSACTION, auto commit will often also be re-enabled. Some database systems allow the synonyms BEGIN, BEGIN WORK and BEGIN TRANSACTION, and may have other options available.

Web Server

Web server can refer to either the hardware (the computer) or the software (the computer application) that helps to deliver content that can be accessed through the Internet.

The primary function of a web server is to deliver web pages to clients on request. This means delivery of HTML documents and any additional content that may be included by a document, such as images, style sheets and scripts. A client, commonly a web browser or web crawler, initiates communication by making a request for a specific resource using HTTP and the server responds with the content of that resource or an error message if unable to do so. The resource is typically a real file on the server's secondary memory, but this is not necessarily the case and depends on how the web server is implemented.

While the primary function is to serve content, a full implementation of HTTP also includes ways of receiving content from clients. This feature is used for submitting web forms, including uploading of files.

Many generic web servers also support server-side scripting, e.g., Apache HTTP Server and PHP. This means that the behavior of the web server can be scripted in separate files, while the actual server software remains unchanged. Usually, this function is used to create HTML documents "on-the-fly" as opposed to returning fixed documents. This is referred to as dynamic and static content respectively. The former is primarily used for retrieving and/or modifying information from databases. The latter is, however, typically much faster and more easily cached.

Web servers are not always used for serving the World Wide Web. They can also be found embedded in devices such as printers, routers and webcams, serving only a local network. The web server may then be used as a part of a system for monitoring and/or administrating the device in question. This usually means that no additional software has to be installed on the client computer, since only a web browser is required (which is now included with most operating systems).
Content Management System

A component content management system (CCMS) is a content management system that manages content at a granular level (component) rather than at the document level. Each component represents a single topic, concept or asset (e.g., image, table, product description). Components can be as large as a chapter or as small as a definition or even a word. Components in multiple content assemblies (content types) can be viewed as components or as traditional documents.

Each component is only stored one time in the content management system, providing a single, trusted source of content. These components are then reused (rather than copied and pasted) within a document or across multiple documents. This ensures that content is consistent across the entire documentation set. Each component has its own lifecycle (owner, version, approval, use) and can be tracked individually or as part of an assembly.

Component content management (CCM) is typically used for multi-channel customer-facing content (marketing, usage, learning, support). CCM can be a separate system or be a functionality of another content management system type (e.g., enterprise content management or web content management).

Benefits of managing content at the component level:

1. Greater consistency and accuracy.
2. Reduced maintenance costs.
3. Reduced delivery costs.
4. Reduced translation costs.
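The single-source reuse described in this section can be illustrated with a short sketch. The component IDs, the sample text and the render function are hypothetical; the point is only that documents hold references to components rather than copies of their text.

```python
# Hypothetical component store: each component is kept exactly once, keyed by an ID.
COMPONENTS = {
    "intro": "This guide covers the example device.",
    "warning": "Unplug the device before opening it.",
    "steps": "Remove the four screws.",
}

# Documents are assemblies that reference components by ID instead of copying text.
DOCUMENTS = {
    "user_guide": ["intro", "warning", "steps"],
    "quick_start": ["warning", "steps"],
}


def render(doc_id):
    """Assemble a document from the current text of its components."""
    return "\n".join(COMPONENTS[component_id] for component_id in DOCUMENTS[doc_id])


# Editing the component once changes every document that reuses it.
COMPONENTS["warning"] = "Unplug the device and wait one minute before opening it."
print(render("user_guide"))
print(render("quick_start"))
```

Since the warning text lives in one place, the two assemblies cannot drift apart, which is the consistency benefit listed above.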

