June-July
2011
Summer Training Project Report 2011 Submitted by: S.Ganesan Software Engineering Delhi College of Engineering
CERTIFICATE
ARCHITECTURES
From an architecture perspective, there are generally five major types: request-based, component-based, hybrid, meta, and RIA-based.

Model-view-controller (MVC)
Many frameworks follow the model-view-controller (MVC) architectural pattern to separate the data model and its business rules from the user interface. This is generally considered a good
3 |Web Application Development
practice as it modularizes code, promotes code reuse, and allows multiple interfaces to be applied.

Push-based vs. Pull-based
Most MVC frameworks follow a push-based architecture. These frameworks use actions that do the required processing and then "push" the data to the view layer to render the results. Ruby on Rails and Spring MVC are good examples of this architecture. An alternative is the pull-based architecture, sometimes also called "component-based". These frameworks start with the view layer, which can then "pull" results from multiple controllers as needed. In this architecture, multiple controllers can be involved with a single view. Struts2, Lift, and Tapestry are examples of pull-based architectures.

Content Management Systems
A content management system (CMS) is the collection of procedures used to manage workflow in a collaborative environment. These procedures can be manual or computer-based. The procedures are designed to do the following:
- Allow for a large number of people to contribute to and share stored data
- Control access to data, based on user roles (defining which information users or user groups can view, edit, publish, etc.)
- Aid in easy storage and retrieval of data
- Reduce repetitive duplicate input
- Improve the ease of report writing
- Improve communication between users
In a CMS, data can be defined as nearly anything: documents, movies, pictures, phone numbers, scientific data, and so forth. CMSs are frequently used for storing, controlling, semantically enriching, and publishing documentation. Serving as a central repository, the CMS increments the version number each time an existing file is updated. Version control is one of the primary advantages of a CMS.
"/page/science" tends to exist by default as that is simply a shorter form of the longer traversal to "/page/science/physics".

Ajax
Ajax, shorthand for "Asynchronous JavaScript and XML", is a web development technique for creating interactive web applications. The intent is to make web pages feel more responsive by exchanging small amounts of data with the server behind the scenes, so that the entire web page does not have to be reloaded each time the user requests a change. This is intended to increase the web page's interactivity, speed, and usability.

Automatic configuration
Some frameworks minimize web application configuration through the use of introspection and/or following known conventions. For example, many Java frameworks use Hibernate as a persistence layer, which can generate a database schema at runtime capable of persisting the necessary information. This allows the application designer to design business objects without needing to explicitly define a database schema. Frameworks such as Ruby on Rails can also work in reverse, that is, define properties of model objects at runtime based on a database schema.
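The idea of deriving a schema from business objects by introspection, and of reading object properties back from a schema, can be illustrated with a minimal Python sketch. This is not how Hibernate or Ruby on Rails work internally. The `Article` class, the `SQL_TYPES` mapping and both helper functions are hypothetical names chosen for illustration, and SQLite is used only because it ships with Python.

```python
import sqlite3
from dataclasses import dataclass, fields

# Hypothetical mapping from Python field types to SQLite column types.
SQL_TYPES = {int: "INTEGER", str: "TEXT", float: "REAL"}


@dataclass
class Article:
    """Hypothetical business object; no schema is written by hand."""
    id: int
    title: str
    rating: float


def create_table_sql(model) -> str:
    """Build a CREATE TABLE statement by inspecting the dataclass fields."""
    columns = ", ".join(f"{f.name} {SQL_TYPES[f.type]}" for f in fields(model))
    return f"CREATE TABLE {model.__name__.lower()} ({columns})"


def column_names(conn, table: str) -> list:
    """The reverse direction: read property names from an existing schema."""
    # Each row of PRAGMA table_info is (cid, name, type, notnull, default, pk).
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


conn = sqlite3.connect(":memory:")
conn.execute(create_table_sql(Article))
```

Calling `column_names(conn, "article")` afterwards returns the column names that were generated from the class, which is the starting point a framework would need to define model properties from a schema at runtime.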
Directory listing: Control how the server reacts when no specific web page is specified.
Customized error responses: Change the page that is shown when the server returns an error, for example HTTP 404 Not Found.
MIME types: Instruct the server how to treat different file types.
Cache control: .htaccess files allow a server to control caching by web browsers and proxies to reduce bandwidth usage, server load, and perceived lag.

Web Cache
A web cache is a mechanism for the temporary storage (caching) of web documents, such as HTML pages and images, to reduce bandwidth usage, server load, and perceived lag. A web cache stores copies of documents passing through it; subsequent requests may be satisfied from the cache if certain conditions are met. It should not be confused with a web archive, a site that keeps old versions of web pages. Web caches can be used in various systems.
- A search engine may cache a website.
- A forward cache is a cache outside the web server's network, e.g. on the client software's ISP or company network.
- A network-aware forward cache is just like a forward cache but only caches heavily accessed items.
- A reverse cache sits in front of one or more web servers and web applications, accelerating requests from the Internet.
- A client, such as a web browser, can store web content for reuse. For example, if the back button is pressed, the local cached version of a page may be displayed instead of a new request being sent to the web server.
- A web proxy sitting between the client and the server can evaluate HTTP headers and choose to store web content.
- A content delivery network can retain copies of web content at various points throughout a network.

Cache Control
HTTP defines three basic mechanisms for controlling caches: freshness, validation, and invalidation.
- Freshness allows a response to be used without re-checking it on the origin server, and can be controlled by both the server and the client. For example, the Expires response header gives a date when the document becomes stale, and the Cache-Control: max-age directive tells the cache how many seconds the response is fresh for.
- Validation can be used to check whether a cached response is still good after it becomes stale. For example, if the response has a Last-Modified header, a cache can make a conditional request using the If-Modified-Since header to see if it has changed. The ETag (entity tag) mechanism also allows for both strong and weak validation.
- Invalidation is usually a side effect of another request that passes through the cache. For example, if the URL associated with a cached response subsequently gets a POST, PUT or DELETE request, the cached response will be invalidated.
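The three mechanisms above can be sketched in a few lines of Python. This is a simplified illustration rather than a conforming HTTP cache: the `CachedResponse` class and the function names are hypothetical, and only the max-age directive and the Last-Modified / If-Modified-Since pair from the text are modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedResponse:
    """Hypothetical cache entry holding only what the sketch needs."""
    body: str
    stored_at: float                     # time the response was stored, in seconds
    max_age: int                         # value of Cache-Control: max-age
    last_modified: Optional[str] = None  # value of the Last-Modified header


def is_fresh(entry: CachedResponse, now: float) -> bool:
    """Freshness: the entry may be reused while its age is below max-age."""
    return (now - entry.stored_at) < entry.max_age


def validation_headers(entry: CachedResponse) -> dict:
    """Validation: headers for a conditional request once the entry is stale."""
    if entry.last_modified is None:
        return {}
    return {"If-Modified-Since": entry.last_modified}


def invalidate_on(cache: dict, method: str, url: str) -> None:
    """Invalidation: a POST, PUT or DELETE to a URL drops its cached entry."""
    if method in {"POST", "PUT", "DELETE"}:
        cache.pop(url, None)
```

A cache built on these helpers would serve the stored body while `is_fresh` is true, send the headers from `validation_headers` when it is not, and call `invalidate_on` for every request that passes through it.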

Security
Identify the users of the application, and restrict access to functions based on some defined criteria. The security module includes two important steps:
1. Authentication
2. Authorization

Authentication
Authentication is the act of confirming the truth of an attribute of a datum or entity. This might involve confirming the identity of a person, tracing the origins of an artifact, ensuring that a product is what its packaging and labeling claims to be, or assuring that a computer program is a trusted one. Common examples of access control involving authentication include:
- Asking for photo ID when a contractor first arrives at a house to perform work
- Using a CAPTCHA as a means of asserting that a user is a human being and not a computer program
- A computer program using a blind credential to authenticate to another program
- Entering a country with a passport
- Logging in to a computer
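For the last example above, logging in to a computer, a password check might be sketched as follows. The report does not prescribe a method, so this is only one possible approach: the in-memory `users` dictionary, the function names and the iteration count are assumptions made for illustration, and PBKDF2 is used simply because it is available in the Python standard library.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor, chosen for illustration only


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a key from the password so the plain text is never stored."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def register(users: dict, name: str, password: str) -> None:
    """Store a random salt and the derived key for a new user."""
    salt = os.urandom(16)
    users[name] = (salt, hash_password(password, salt))


def authenticate(users: dict, name: str, password: str) -> bool:
    """Confirm the claimed identity by re-deriving and comparing the key."""
    record = users.get(name)
    if record is None:
        return False
    salt, expected = record
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(hash_password(password, salt), expected)
```

After `register(users, "alice", "s3cret")`, a call to `authenticate(users, "alice", "s3cret")` succeeds, while a wrong password or an unknown user name is rejected.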

Authorization
Authorization is the function of specifying access rights to resources, which is related to information security and computer security in general and to access control in particular. More formally, "to authorize" is to define access policy. For example, human resources staff are normally authorized to access employee records, and this policy is usually formalized as access control rules in a computer system. During operation, the system uses the access control rules to decide whether access requests from (authenticated) consumers shall be approved (granted) or disapproved (rejected). Resources include individual files' or items' data, computer programs, computer devices and functionality provided by computer applications.

Access control in computer systems and networks relies on access policies. The access control process can be divided into two phases: 1) Policy definition phase where access is authorized, and 2) Policy enforcement phase where access requests are approved or disapproved. Authorization is thus the function of the policy definition phase which precedes the policy enforcement phase where access requests are approved or disapproved based on the previously defined authorizations.

Most modern, multi-user operating systems include access control and thereby rely on authorization. Access control also makes use of authentication to verify the identity of consumers. When a consumer tries to access a resource, the access control process checks that the consumer has been authorized to use that resource. Authorization is the responsibility of an authority, such as a department manager, within the application domain, but is often delegated to a custodian such as a system administrator. Authorizations are expressed as access policies in some type of "policy definition application", e.g.
in the form of an access control list or a capability, on the basis of the "principle of least privilege": consumers should only be authorized to access whatever they need to do their jobs. Older and single-user operating systems often had weak or non-existent authentication and access control systems. "Anonymous consumers" or "guests" are consumers that have not been required to authenticate. They often have limited authorization. On a distributed system, it is often desirable to grant access without requiring a unique identity. Familiar examples of access tokens include keys and tickets: they grant access without proving identity. Trusted consumers are often authorized for unrestricted access to resources on a system, but must be authenticated so that the access control system can make the access approval decision. "Partially trusted" consumers and guests will often have restricted authorization in order to protect resources against improper access and usage. The access policy in some operating
systems, by default, grants all consumers full access to all resources. Others do the opposite, insisting that the administrator explicitly authorizes a consumer to use each resource. Even when access is controlled through a combination of authentication and access control lists, the problem of maintaining the authorization data is not trivial, and often represents as much administrative burden as managing authentication credentials. It is often necessary to change or remove a user's authorization: this is done by changing or deleting the corresponding access rules on the system. Using atomic authorization is an alternative to per-system authorization management, where a trusted third party securely distributes authorization information.

URL Mapping
URL Mapping allows portal administrators to create constant, user-friendly URLs and map them to portal pages. As administrators create the URLs, they can define human-readable names for them. These can be easily remembered and are therefore more user-friendly. The self-defined URLs can be published externally and thereby made available to portal users. For example, a computer store that uses the portal could create a user-defined URL products/hardware/laptops for the page on which their laptop product line is advertised. This URL can be appended to a portal prefix that was defined by the store, for example http://www.fancy_xyz_computers.com/wps/portal/products/hardware/laptops. Clicking on such a mapped URL from outside of the portal takes the user to the desired portal page. Users can also combine several mapped contexts into the representation of a full valid URL, type that full mapped URL into the address field of the browser and thereby get to the portal page.

Application Programming Interface
An application programming interface (API) is a particular set of rules and specifications that software programs can follow to communicate with each other.
It serves as an interface between different software programs and facilitates their interaction, similar to the way the user interface facilitates interaction between humans and computers. An API can be created for applications, libraries, operating systems, etc., as a way of defining their "vocabularies" and resources request conventions (e.g. function-calling conventions). It may include specifications for routines, data structures, object classes, and protocols used to communicate between the consumer program and the implementer program of the API. An API can be:
- General, the full set of an API that is bundled in the libraries of a programming language, e.g. Standard Template Library in C++ or Java API.
- Specific, meant to address a specific problem, e.g. Google Maps API or Java API for XML Web Services.
- Language-dependent, meaning it is only available by using the syntax and elements of a particular language, which makes the API more convenient to use.
- Language-independent, written so that it can be called from several programming languages. This is a desirable feature for a service-oriented API that is not bound to a specific process or system and may be provided as remote procedure calls or web services.

For example, a website that allows users to review local restaurants is able to layer their reviews over maps taken from Google Maps, because Google Maps has an API that facilitates this functionality. Google Maps' API controls what information a third-party site can use and how they can use it.
The term API may be used to refer to a complete interface, a single function, or even a set of APIs provided by an organization. Thus, the scope of meaning is usually determined by the context of usage.

Database Transaction Support
A transaction comprises a unit of work performed within a database management system (or similar system) against a database, and treated in a coherent and reliable way independent of other transactions. Transactions in a database environment have two main purposes:
1. To provide reliable units of work that allow correct recovery from failures and keep a database consistent even in cases of system failure, when execution stops (completely or partially) and many operations upon a database remain uncompleted, with unclear status.
2. To provide isolation between programs accessing a database concurrently. Without isolation the programs' outcomes are possibly erroneous.
Transactions provide an "all-or-nothing" proposition, stating that each work-unit performed in a database must either complete in its entirety or have no effect whatsoever. Further, the system must isolate each transaction from other transactions, results must conform to existing constraints in the database, and transactions that complete successfully must get written to durable storage.

Purpose for this support
Databases and other data stores which treat the integrity of data as paramount often include the ability to handle transactions to maintain the integrity of data. A single transaction consists of one or more independent units of work, each reading and/or writing information to a database or other data store. When this happens it is often important to ensure that all such processing leaves the database or data store in a consistent state.
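The "all-or-nothing" behaviour described above can be demonstrated with a short Python sketch using the standard `sqlite3` module. The `accounts` table, the account names and the `transfer` function are hypothetical and exist only for this illustration; other database systems expose the same commit and rollback behaviour through their own drivers.

```python
import sqlite3


def transfer(conn, src: str, dst: str, amount: int) -> bool:
    """Move an amount between two accounts as a single unit of work."""
    try:
        # Used as a context manager, the connection commits if the block
        # succeeds and rolls back if an exception escapes from it.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False  # both updates were undone by the rollback


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()
```

A transfer that would overdraw the source account raises inside the block, so neither update is persisted and the table is left exactly as it was before the call.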
A transactional database is a DBMS where write transactions on the database are able to be rolled back if they are not completed properly (e.g. due to power or connectivity loss). Most modern relational database management systems fall into the category of databases that support transactions. In a database system a transaction might consist of one or more data-manipulation statements and queries, each reading and/or writing information in the database. Users of database systems consider consistency and integrity of data as highly important. A simple transaction is usually issued to the database system in a language like SQL wrapped in a transaction, using a pattern similar to the following:
1. Begin the transaction
2. Execute a set of data manipulations and/or queries
3. If no errors occur then commit the transaction and end it
4. If errors occur then roll back the transaction and end it
If no errors occurred during the execution of the transaction then the system commits the transaction. A transaction commit operation applies all data manipulations within the scope of the transaction and persists the results to the database. If an error occurs during the transaction, or if the user specifies a rollback operation, the data manipulations within the transaction are not persisted to the database. In no case can a partial transaction be committed to the database since that would leave the database in an inconsistent state. Internally, multiuser databases store and process transactions, often by using a transaction ID or XID. SQL is inherently transactional, and a transaction is automatically started when another ends. Some databases extend SQL and implement a START TRANSACTION statement, but while seemingly signifying the start of the transaction it merely deactivates autocommit. The result of any work done after this point will remain invisible to other database users until the system processes a COMMIT statement.
A ROLLBACK statement can also occur, which will undo any work performed since the start of the current transaction. Both COMMIT and ROLLBACK will end the transaction and start a new one. If autocommit was disabled using START TRANSACTION, autocommit will often also be re-enabled. Some database systems allow the synonyms BEGIN, BEGIN WORK and BEGIN TRANSACTION, and may have other options available.

Web Server
The term web server can refer to either the hardware (the computer) or the software (the computer application) that helps to deliver content that can be accessed through the Internet.
The primary function of a web server is to deliver web pages to clients on request. This means delivery of HTML documents and any additional content that may be included by a document, such as images, style sheets and scripts. A client, commonly a web browser or web crawler, initiates communication by making a request for a specific resource using HTTP and the server responds with the content of that resource or an error message if unable to do so. The resource is typically a real file on the server's secondary memory, but this is not necessarily the case and depends on how the web server is implemented.

While the primary function is to serve content, a full implementation of HTTP also includes ways of receiving content from clients. This feature is used for submitting web forms, including uploading of files.

Many generic web servers also support server-side scripting, e.g., Apache HTTP Server and PHP. This means that the behavior of the web server can be scripted in separate files, while the actual server software remains unchanged. Usually, this function is used to create HTML documents "on-the-fly" as opposed to returning fixed documents. This is referred to as dynamic and static content respectively. The former is primarily used for retrieving and/or modifying information from databases. The latter is, however, typically much faster and more easily cached.

Web servers are not always used for serving the World Wide Web. They can also be found embedded in devices such as printers, routers and webcams, serving only a local network. The web server may then be used as a part of a system for monitoring and/or administering the device in question. This usually means that no additional software has to be installed on the client computer, since only a web browser is required (which is now included with most operating systems).
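The request and response cycle described above, including the difference between static and dynamic content, can be sketched with Python's built-in `http.server` module. This is a toy for illustration, not a production server. The `STATIC` dictionary stands in for files on disk, and the `/time` path, the `respond` function and the `serve` helper are hypothetical names.

```python
from datetime import datetime, timezone
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical static resources, standing in for real files on disk.
STATIC = {"/index.html": b"<h1>Hello</h1>"}


def respond(path: str):
    """Map a requested resource to a (status, body) pair."""
    if path in STATIC:
        return 200, STATIC[path]  # static content: returned unchanged
    if path == "/time":
        # Dynamic content: the document is created on the fly.
        now = datetime.now(timezone.utc).isoformat()
        return 200, f"<p>{now}</p>".encode("utf-8")
    return 404, b"<h1>404 Not Found</h1>"  # error message for unknown resources


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = respond(self.path)
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Start the server; blocks until interrupted."""
    HTTPServer(("localhost", port), Handler).serve_forever()
```

Calling `serve()` and opening http://localhost:8000/index.html in a browser returns the fixed document, while /time returns a page that differs on every request.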

Content Management System
A component content management system (CCMS) is a content management system that manages content at a granular level (component) rather than at the document level. Each component represents a single topic, concept or asset (e.g., image, table, product description). Components can be as large as a chapter or as small as a definition or even a word. Components in multiple content assemblies (content types) can be viewed as components or as traditional documents. Each component is only stored one time in the content management system, providing a single, trusted source of content. These components are then reused (rather than copied and pasted)
within a document or across multiple documents. This ensures that content is consistent across the entire documentation set. Each component has its own lifecycle (owner, version, approval, use) and can be tracked individually or as part of an assembly. Component content management (CCM) is typically used for multi-channel customer-facing content (marketing, usage, learning, support). CCM can be a separate system or be a functionality of another content management system type (e.g., enterprise content management or web content management). Benefits of managing content at the component level:
1. Greater consistency and accuracy.
2. Reduced maintenance costs.
3. Reduced delivery costs.
4. Reduced translation costs.
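The single-source reuse that underlies these benefits can be shown with a small Python sketch. It models only the idea of storing each component once and assembling documents by reference. The component IDs, the two manuals and both helper functions are hypothetical and do not describe any particular CCMS product.

```python
# Hypothetical component store: each component is stored exactly once.
components = {
    "intro-a": "Model A quick start.",
    "intro-b": "Model B quick start.",
    "safety": "Unplug the device before cleaning.",
}

# Hypothetical assemblies: a document is an ordered list of component IDs.
assemblies = {
    "manual-a": ["intro-a", "safety"],
    "manual-b": ["intro-b", "safety"],
}


def render(document_id: str) -> str:
    """Assemble a document by looking up each referenced component."""
    return "\n".join(components[cid] for cid in assemblies[document_id])


def documents_using(component_id: str) -> list[str]:
    """Track which assemblies reuse a given component."""
    return sorted(doc for doc, parts in assemblies.items() if component_id in parts)
```

Because both manuals refer to the same "safety" component instead of holding a copy, editing `components["safety"]` once changes the output of `render` for every document returned by `documents_using("safety")`.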