Saturday, 29 April 2017

Apache administration for beginners

Web Server

A web server is software (and the computer that runs it) that processes requests via HTTP, the basic network protocol used to distribute information on the World Wide Web (WWW). When the web server receives an HTTP request, it responds with an HTTP response, such as sending back an HTML page.
Apache is one of many web servers. Web servers available on the market include Apache HTTP Server, Microsoft IIS, Nginx, Lighttpd, Jigsaw, KLone, Abyss Web Server, Oracle HTTP Server, X5 webserver, Zeus Web Server, IBM HTTP Server, Google Web Server, Oracle iPlanet Web Server, Red Hat web server, etc.

Apache

The Apache HTTP Server is the world's most widely used web server software. Originally based on the NCSA HTTPd server, development of Apache began in early 1995 after work on the NCSA code stalled. Apache played a key role in the initial growth of the World Wide Web, quickly overtaking NCSA HTTPd as the dominant HTTP server, and has remained the most popular since April 1996. In 2009, it became the first web server software to serve more than 100 million websites.

Apache is developed and maintained by an open community of developers under the auspices of the Apache Software Foundation. Most commonly used on a Unix-like system (usually Linux), the software is available for a wide variety of operating systems besides Unix, including eComStation, Microsoft Windows, NetWare, OpenVMS, OS/2, and TPF. Released under the Apache License, Apache is free and open-source software.


Apache version   Initial release   Latest release
1.3              1998-06-06        2010-02-03 (1.3.42)
2.0              2002-04-06        2013-07-10 (2.0.65)
2.2              2005-12-01        2015-07-17 (2.2.31)
2.4              2012-02-21        2015-12-14 (2.4.18)

Client (web browser): A client connects to a server (Apache HTTP Server) using the specified protocol (HTTP) and requests a resource by its URL path.

Server (Apache HTTP Server): The server sends a response consisting of a status code and, optionally, a response body. The status code indicates whether the request was successful and, if not, what kind of error occurred. This tells the client what it should do with the response.

In order to connect to a server, the client first has to resolve the server name to an IP address, the location on the Internet where the server resides. Thus, for your web server to be reachable, its server name must be listed in DNS.

If you don't know how to do this, you'll need to contact your network administrator, or Internet service provider, to perform this step for you.

More than one hostname may point to the same IP address, and more than one IP address can be attached to the same physical server. Thus, you can run more than one web site on the same physical server, using a feature called virtual hosts.

If you are testing a server that is not Internet-accessible, you can put host names in your hosts file in order to do local resolution. For example, you might want to put a record in your hosts file to map a request for www.example.com to your local system, for testing purposes.

 This entry would look like:
127.0.0.1 www.example.com
A hosts file will probably be located at /etc/hosts or C:\Windows\system32\drivers\etc\hosts.

Features of Apache:
       1. Modules
       2. Aliases
       3. Virtual Hosting

Modules: One of Apache's key features is its modular construction. After installation you can add extra functionality quickly and easily by loading modules, without having to recompile the source code. For example, you can load mod_dir for basic directory handling, or mod_auth (split into mod_auth_basic and mod_authn_file in Apache 2.2) to authenticate users against a text file.

The modular approach makes it easy for third-party developers to add functionality to the server. You can also customize Apache for a site by developing your own modules with the Apache module API.

Aliases: Apache supports aliases, which let it serve content from file system locations other than those directly underneath the document root.
As a result, an Apache server can reference content anywhere on the computer, or even on other computers, without having to move or duplicate the information.

Virtual Hosting: Virtual hosting is a useful feature of Apache that enables the simultaneous hosting of multiple websites on a single computer.

Virtual hosting has many practical applications. For example, an ISP commonly configures the websites of different companies as virtual hosts on a single machine, which avoids the need for a separate computer for each site.

From version 1.1 onwards, Apache supports both IP-based and name-based virtual hosting.

The IP-based system determines which virtual host Apache should serve from the IP address the connection arrived on, so each virtual host domain needs its own dedicated IP address.

Name-based virtual hosting identifies virtual hosts by host name (the Host header sent by the client), which lets multiple virtual hosts share one IP address.
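A minimal name-based setup might look like the following sketch. The host names and directory paths are hypothetical placeholders. Both containers use the same address and port, and Apache chooses between them by comparing the request's Host header with each ServerName.

```apache
# Apache 2.2 needs this line to enable name-based hosting on *:80;
# Apache 2.4 ignores it and logs a warning.
NameVirtualHost *:80

# Hypothetical first site (also the fallback for unmatched Host headers).
<VirtualHost *:80>
    ServerName www.example.com
    DocumentRoot "/var/www/example.com"
</VirtualHost>

# Hypothetical second site sharing the same IP address and port.
<VirtualHost *:80>
    ServerName www.example.org
    DocumentRoot "/var/www/example.org"
</VirtualHost>
```

A request whose Host header matches neither name is served by the first container listed, so that one acts as the default site.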
                
The httpd.conf file is the main configuration file in Apache.

The Apache Directory Structure: 
The Apache software is typically distributed into the following subdirectories:

cgi-bin
This is where many, if not all, of the interactive programs that you write will reside. These will be programs written in Perl, Java, or other programming languages.
conf
This directory will contain your configuration files.
htdocs
This directory will contain your actual hypertext documents. This directory will typically have many subdirectories. This directory is known as the DocumentRoot.
icons
This directory contains the icons (small images) that Apache will use when displaying information or error messages.
images
This directory will contain the image files (GIF or JPG) that you will use on your web site.
logs
This directory will contain your log files, the access_log and error_log files.
sbin
This directory contains the server executables, such as httpd and apachectl.

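Main configuration files in Apache:
1. The Apache software is configured by changing settings in text files in the Apache conf (configuration) directory.
2. Older Apache releases used the four configuration files listed below; the main configuration file is usually called httpd.conf. Modern releases keep all directives in httpd.conf (and any files it includes), and access.conf and srm.conf are no longer used.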
access.conf
This is the security configuration file. It contains instructions about which users should be able to access what information.
httpd.conf
This is the server configuration file. It typically contains directives that affect how the server runs, such as the user and group IDs it should use when running, the location of other files, etc.
srm.conf
This is the resource configuration file. It contains directives that define where documents are found, how to map addresses to filenames, etc.
mime.types
A configuration file that relates filename extensions to file types.



httpd.conf File Sections

The httpd.conf file has three main sections.

Section 1: Global environment

The directives in this section affect the overall operation of the server, such as the number of concurrent requests it can handle or where it can find its configuration data.
AccessConfig /dev/null
ResourceConfig /dev/null
            These tell Apache 1.x not to read the separate access.conf and srm.conf files.
ServerType standalone
            Specifies whether the Apache server should be run under the inetd daemon or as a standalone server. (This directive was removed in Apache 2.0.)
ServerRoot "/etc/httpd"
            Do not add a trailing slash. Configuration and log files are stored in subdirectories of this root directory.
PidFile run/httpd.pid
            The file in which the server records its process identification number when it starts.
StartServers 5
            The number of server processes started at startup.
Timeout 300
            Sets how long, in seconds, Apache waits during certain operations (such as receiving or sending data) before giving up.
KeepAlive Off
            Whether or not to allow persistent connections (more than one request per connection). Set to Off to deactivate.

Section 2: Main Server Configuration
            This section contains the directives for the main server. The values of these directives are also used as the default values for virtual hosts unless a virtual host section specifies different values.
Port 80
            (Apache 1.3; Apache 2.0 and later use Listen instead.)
User apache
Group apache
ServerAdmin root@localhost
ServerName www.easynomad:80
DocumentRoot "/var/www/html"
DirectoryIndex index.html
            A request for http://www.easynomad.com/offers/ then returns the index.html file in the offers directory.
ErrorLog
Alias
ScriptAlias /cgi-bin/ "/var/www/easynomad/cgi-bin/"

Section 3: Virtual Hosts
          This section lets you set up virtual host containers so that a single server can host multiple sites.
<VirtualHost *:80>
   ServerAdmin
   DocumentRoot
   ServerName
   ErrorLog
   CustomLog
</VirtualHost>
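For illustration, here is the same container filled in for one site. This is a sketch: the address, names and paths are hypothetical placeholders, and the "combined" log format is assumed to be defined by a LogFormat line elsewhere in httpd.conf, as it is in the stock configuration.

```apache
<VirtualHost *:80>
    # Hypothetical administrator address and site name.
    ServerAdmin webmaster@example.com
    DocumentRoot "/var/www/example"
    ServerName www.example.com
    # Relative log paths are resolved against ServerRoot.
    ErrorLog logs/example-error_log
    CustomLog logs/example-access_log combined
</VirtualHost>
```

Any directive left out of the container falls back to the value set for the main server in Section 2.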


Apache Directives 

General Configuration Tips

When configuring the Apache HTTP Server, edit /etc/httpd/conf/httpd.conf and then reload, restart, or stop and start the httpd process.

Before editing httpd.conf, make a copy of the original file. Creating a backup makes it easier to recover from mistakes made while editing the configuration file.

If a mistake is made and the Web server does not work correctly, first review recently edited passages in httpd.conf to verify there are no typos.

Next look in the Web server's error log, /var/log/httpd/error_log. The error log may not be easy to interpret, depending on your level of expertise. However, the last entries in the error log should provide useful information.

The following subsections contain a list of short descriptions for many of the directives included in httpd.conf.

ServerRoot
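The ServerRoot directive specifies the top-level directory in which the server's configuration, error, and log files are kept; relative paths in other directives are resolved against it. (Website content lives under the DocumentRoot instead.) By default, ServerRoot is set to "/etc/httpd" for both secure and non-secure servers.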
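The following sketch shows how ServerRoot affects relative paths, assuming the Red Hat style layout described here. On such systems /etc/httpd/logs is typically a symbolic link to /var/log/httpd, which is why the logs end up in /var/log/httpd.

```apache
# No trailing slash on ServerRoot.
ServerRoot "/etc/httpd"

# Relative paths are resolved against ServerRoot, so these refer to
# /etc/httpd/run/httpd.pid and /etc/httpd/logs/error_log.
PidFile run/httpd.pid
ErrorLog logs/error_log
```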
PidFile
PidFile names the file where the server records its process ID (PID).
Timeout
Timeout defines, in seconds, how long the server waits to receive or send data during communications before giving up. Timeout is set to 300 seconds by default, which is appropriate for most situations.
KeepAlive
KeepAlive sets whether the server allows more than one request per connection and can be used to prevent any one client from consuming too much of the server's resources.
By default, KeepAlive is set to off. If KeepAlive is set to on and the server becomes very busy, the server can quickly spawn the maximum number of child processes, and in this situation it slows down significantly. If KeepAlive is enabled, it is a good idea to set KeepAliveTimeout low and to monitor the /var/log/httpd/error_log file on the server, which reports when the server is running out of child processes.
MaxKeepAliveRequests
This directive sets the maximum number of requests allowed per persistent connection. The Apache Project recommends a high setting, which improves the server's performance. MaxKeepAliveRequests is set to 100 by default, which should be appropriate for most situations.
KeepAliveTimeout
KeepAliveTimeout sets the number of seconds the server waits after a request has been served before it closes the connection. Once the server receives a request, the Timeout directive applies instead. KeepAliveTimeout is set to 15 seconds by default.
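If you decide to enable persistent connections, the three related directives might be set together as in this sketch. The timeout of 5 seconds is an illustrative choice that follows the advice above to keep KeepAliveTimeout low; it is not a value taken from the default configuration.

```apache
# Allow several requests over one TCP connection.
KeepAlive On
# Close the connection after this many requests on it.
MaxKeepAliveRequests 100
# Seconds to wait for the next request before closing an idle connection.
KeepAliveTimeout 5
```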
IfModule
<IfModule> and </IfModule> tags create a conditional container whose directives are processed under one of two conditions. The directives are processed if the module named in the starting <IfModule> tag is loaded. Or, if an exclamation point [!] appears before the module name, the directives are processed only if the specified module is not loaded.
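Both forms are sketched below. IfModule accepts the module's source file name (mod_mime.c) or, from Apache 2.1 on, its module identifier (mpm_winnt_module). The user and group name apache are assumed, matching the defaults described later in this post.

```apache
# Processed only if mod_mime is loaded.
<IfModule mod_mime.c>
    TypesConfig /etc/mime.types
</IfModule>

# Processed only if the Windows MPM is NOT loaded; User and Group
# have no meaning on Windows.
<IfModule !mpm_winnt_module>
    User apache
    Group apache
</IfModule>
```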
MPM Specific Server-Pool Directives
In Apache HTTP Server 2.0, responsibility for managing the characteristics of the server pool falls to a group of modules called MPMs (Multi-Processing Modules). The characteristics of the server pool differ depending on which MPM is used. For this reason, an IfModule container is needed to define the server pool for the MPM in use.
By default, Apache HTTP Server 2.0 defines the server pool for both the prefork and worker MPMs.
The following is a list of directives found within the MPM-specific server-pool containers.
StartServers
StartServers sets how many server processes are created upon startup. Since the Web server dynamically kills and creates server processes based on traffic load, it is not necessary to change this parameter. The Web server is set to start 8 server processes at startup for the prefork MPM and 2 for the worker MPM.
MaxRequestsPerChild
MaxRequestsPerChild sets the total number of requests each child server process serves before the child dies. The main reason for setting MaxRequestsPerChild is to limit the effect of memory leaks in long-lived processes. The default MaxRequestsPerChild for the prefork MPM is 1000 and for the worker MPM is 0, which means unlimited.
MaxClients
MaxClients sets a limit on the total number of server processes, or simultaneously connected clients, that can run at one time. The main purpose of this directive is to keep a runaway Apache HTTP Server from crashing the operating system. For busy servers this value should be set high. The server's default is 150 regardless of the MPM in use. With the prefork MPM, MaxClients cannot exceed 256 unless ServerLimit is raised as well.
MinSpareServers and MaxSpareServers
These values are only used with the prefork MPM. They adjust how the Apache HTTP Server dynamically adapts to the perceived load by maintaining an appropriate number of spare server processes based on the number of incoming requests. The server checks the number of servers waiting for a request and kills some if there are more than MaxSpareServers or creates some if the number of servers is less than MinSpareServers.
The default MinSpareServers value is 5; the default MaxSpareServers value is 20. These default settings should be appropriate for most situations. Be careful not to increase the MinSpareServers to a large number as doing so creates a heavy processing load on the server even when traffic is light.
MinSpareThreads and MaxSpareThreads
These values are only used with the worker MPM. They adjust how the Apache HTTP Server dynamically adapts to the perceived load by maintaining an appropriate number of spare server threads based on the number of incoming requests. The server checks the number of server threads waiting for a request and kills some if there are more than MaxSpareThreads, or creates some if the number of idle threads is less than MinSpareThreads.
The default MinSpareThreads value is 25; the default MaxSpareThreads value is 75. These default settings should be appropriate for most situations. The value for MaxSpareThreads must be greater than or equal to the sum of MinSpareThreads and ThreadsPerChild, or Apache HTTP Server automatically corrects it.
ThreadsPerChild
This value is only used with the worker MPM. It sets the number of threads within each child process. The default value for this directive is 25.
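Putting the defaults quoted above together, the two MPM-specific containers could look like this sketch (Apache 2.0/2.2 directive names; Apache 2.4 renames MaxClients to MaxRequestWorkers and MaxRequestsPerChild to MaxConnectionsPerChild). Only the block for the MPM actually compiled in takes effect.

```apache
# Used only when the prefork MPM is in use.
<IfModule prefork.c>
    StartServers          8
    MinSpareServers       5
    MaxSpareServers      20
    MaxClients          150
    MaxRequestsPerChild 1000
</IfModule>

# Used only when the worker MPM is in use.
<IfModule worker.c>
    StartServers          2
    MaxClients          150
    MinSpareThreads      25
    # Must be at least MinSpareThreads + ThreadsPerChild (25 + 25).
    MaxSpareThreads      75
    ThreadsPerChild      25
    # 0 means child processes are never recycled.
    MaxRequestsPerChild   0
</IfModule>
```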
Listen
The Listen directive identifies the ports on which the Web server accepts incoming requests. By default, the Apache HTTP Server listens on port 80 for non-secure Web communications and (in the /etc/httpd/conf.d/ssl.conf file, which defines any secure servers) on port 443 for secure Web communications.
If the Apache HTTP Server is configured to listen on a port under 1024, only the root user can start it. For port 1024 and above, httpd can be started as a regular user.
The Listen directive can also be used to specify particular IP addresses over which the server accepts connections.
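A short sketch of both forms. The address 192.0.2.10 and port 8080 are hypothetical; the address must actually be configured on the machine or httpd will fail to bind at startup.

```apache
# Accept connections on port 80 on every local address.
Listen 80
# Accept connections on port 8080, but only on this one address.
Listen 192.0.2.10:8080
```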
Include
Include allows other configuration files to be included at runtime.
The path to these configuration files can be absolute or relative to the ServerRoot.
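For example, Red Hat style configurations pull in per-package files from a conf.d directory with a wildcard. This sketch assumes ServerRoot is /etc/httpd, as described above.

```apache
# Relative to ServerRoot, so this reads every /etc/httpd/conf.d/*.conf file.
Include conf.d/*.conf
```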
LoadModule
LoadModule is used to load in Dynamic Shared Object (DSO) modules.
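Each LoadModule line names the module identifier followed by the shared object file, here given relative to ServerRoot. The modules directory shown is an assumption; the exact location varies between distributions.

```apache
# Basic directory handling (DirectoryIndex, trailing-slash redirects).
LoadModule dir_module modules/mod_dir.so
# Server status reporting (used with ExtendedStatus).
LoadModule status_module modules/mod_status.so
```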
ExtendedStatus
The ExtendedStatus directive controls whether Apache generates basic (off) or detailed (on) server status information when the server-status handler is called. The server-status handler is enabled inside a <Location> container.
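A sketch of enabling detailed status at /server-status for local requests only. It uses the Apache 2.2 Order/Deny/Allow syntax that this post describes; Apache 2.4 would use "Require ip 127.0.0.1" instead. It assumes mod_status is loaded.

```apache
ExtendedStatus On

<Location /server-status>
    SetHandler server-status
    # Refuse everyone except the local machine.
    Order deny,allow
    Deny from all
    Allow from 127.0.0.1
</Location>
```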
IfDefine
The IfDefine tags surround configuration directives that are applied if the "test" stated in the IfDefine tag is true. The directives are ignored if the test is false.
The test in the IfDefine tags is a parameter name (for example, HAVE_PERL). If the parameter is defined, meaning that it is provided as an argument to the server's start-up command (for example, httpd -DHAVE_PERL), then the test is true and the directives contained in the IfDefine tags are applied when the Web server starts.
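A minimal sketch using the HAVE_PERL parameter mentioned above. The module path assumes mod_perl is installed in the usual modules directory.

```apache
# Applied only when httpd is started with -DHAVE_PERL.
<IfDefine HAVE_PERL>
    LoadModule perl_module modules/mod_perl.so
</IfDefine>
```

Starting the server with httpd -DHAVE_PERL loads mod_perl; starting it without the flag skips the block entirely.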
User
The User directive sets the user name of the server process and determines what files the server is allowed to access. Any files inaccessible to this user are also inaccessible to clients connecting to the Apache HTTP Server.
By default User is set to apache.
Group
Specifies the group name of the Apache HTTP Server processes.
By default Group is set to apache.
ServerAdmin
Set the ServerAdmin directive to the email address of the Web server administrator. This email address shows up in error messages on server-generated Web pages, so users can report a problem by sending email to the server administrator.
By default, ServerAdmin is set to root@localhost.
A common way to set up ServerAdmin is to set it to webmaster@example.com. Then alias webmaster to the person responsible for the Web server in /etc/aliases and run /usr/bin/newaliases.
ServerName
ServerName specifies a hostname and port number (matching the Listen directive) for the server. The ServerName does not need to match the machine's actual hostname. For example, the Web server may be www.example.com while the server's hostname is actually foo.example.com. The value specified in ServerName must be a valid Domain Name System (DNS) name that the system can resolve; do not make one up.
The following is a sample ServerName directive:
ServerName www.example.com:80
When specifying a ServerName, be sure the IP address and server name pair are included in the /etc/hosts file.
UseCanonicalName
When set to on, this directive configures the Apache HTTP Server to reference itself using the value specified in the ServerName and Port directives. When UseCanonicalName is set to off, the server instead uses the value used by the requesting client when referring to itself.
UseCanonicalName is set to off by default.
DocumentRoot
The DocumentRoot is the directory which contains most of the HTML files which are served in response to requests. The default DocumentRoot for both the non-secure and secure Web servers is the /var/www/html directory. For example, the server might receive a request for the following document:
http://example.com/foo.html

The server looks for the following file in the default directory:
/var/www/html/foo.html

Directory
<Directory /path/to/directory> and </Directory> tags create a container used to enclose a group of configuration directives which apply only to a specific directory and its subdirectories. Any directive which is applicable to a directory may be used within Directory tags.
By default, very restrictive parameters are applied to the root directory (/), using the Options and AllowOverride directives. Under this configuration, any directory on the system which needs more permissive settings has to be explicitly given those settings.
In the default configuration, another Directory container is configured for the DocumentRoot which assigns less rigid parameters to the directory tree so that the Apache HTTP Server can access the files residing there.
The Directory container can also be used to configure additional cgi-bin directories for server-side applications outside of the directory specified in the ScriptAlias directive.
To accomplish this, the Directory container must set the ExecCGI option for that directory.
For example, if CGI scripts are located in /home/my_cgi_directory, add the following Directory container to the httpd.conf file:

<Directory /home/my_cgi_directory>
    Options +ExecCGI
</Directory>

Next, the AddHandler directive must be uncommented to identify files with the .cgi extension as CGI scripts.
For this to work, permissions for CGI scripts, and the entire path to the scripts, must be set to 0755.
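As a sketch, both steps can also be kept together in one container, since AddHandler is allowed inside Directory. This assumes mod_cgi is loaded and reuses the example path /home/my_cgi_directory from above.

```apache
<Directory /home/my_cgi_directory>
    # Allow CGI execution in this directory only.
    Options +ExecCGI
    # Treat *.cgi files here as CGI scripts.
    AddHandler cgi-script .cgi
</Directory>
```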
Options
The Options directive controls which server features are available in a particular directory. For example, under the restrictive parameters specified for the root directory, Options is set to only FollowSymLinks. No features are enabled, except that the server is allowed to follow symbolic links in the root directory.
By default, in the DocumentRoot directory, Options is set to include Indexes and FollowSymLinks. Indexes permits the server to generate a directory listing for a directory if no DirectoryIndex (for example, index.html) is specified. FollowSymLinks allows the server to follow symbolic links in that directory.
AllowOverride
The AllowOverride directive sets whether any Options can be overridden by the declarations in an .htaccess file. By default, both the root directory and the DocumentRoot are set to allow no .htaccess overrides.
Order
The Order directive controls the order in which Allow and Deny directives are evaluated. The server is configured to evaluate the Allow directives before the Deny directives for the DocumentRoot directory.
Allow
Allow specifies which client can access a given directory. The client can be all, a domain name, an IP address, a partial IP address, a network/netmask pair, and so on. The DocumentRoot directory is configured to Allow requests from all, meaning everyone has access.
Deny
Deny works like Allow, except that it specifies who is denied access. By default, the DocumentRoot is not configured to deny requests from anyone.
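The defaults described for Options, AllowOverride, Order, and Allow combine into two Directory containers roughly like the following sketch (Apache 2.2 syntax; Apache 2.4 replaces Order/Allow with "Require all granted").

```apache
# Restrictive settings for the whole file system.
<Directory />
    Options FollowSymLinks
    AllowOverride None
</Directory>

# Less rigid settings for the document tree.
<Directory "/var/www/html">
    Options Indexes FollowSymLinks
    AllowOverride None
    # Evaluate Allow first, then Deny; everyone is allowed.
    Order allow,deny
    Allow from all
</Directory>
```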
UserDir
UserDir is the subdirectory within each user's home directory where they should place personal HTML files which are served by the Web server. This directive is set to disable by default.
The name for the subdirectory is set to public_html in the default configuration. For example, the server might receive the following request:
http://example.com/~username/foo.html
The server would look for the file:
/home/username/public_html/foo.html
In the above example, /home/username/ is the user's home directory (note that the default path to users' home directories may vary).
Make sure that the permissions on the users' home directories are set correctly. Users' home directories must be set to 0711. The read (r) and execute (x) bits must be set on the users' public_html directories (0755 also works). Files that are served from users' public_html directories must be set to at least 0644.
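To turn per-user pages on, the default "disable" setting can be replaced with a sketch like this one, which assumes mod_userdir is loaded. Keeping root excluded is a common precaution rather than a requirement.

```apache
# Serve ~username URLs from each user's public_html directory.
UserDir public_html
# Never serve pages for the root account.
UserDir disabled root
```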
DirectoryIndex
The DirectoryIndex is the default page served by the server when a user requests an index of a directory by specifying a forward slash (/) at the end of the directory name.
When a user requests the page http://example/this_directory/, they get either the DirectoryIndex page if it exists or a server-generated directory list. The default for DirectoryIndex is index.html and the index.html.var type map. The server tries to find either of these files and returns the first one it finds. If it does not find one of these files and Options Indexes is set for that directory, the server generates and returns a listing, in HTML format, of the subdirectories and files within the directory, unless the directory listing feature is turned off.
AccessFileName
AccessFileName names the file which the server should use for access control information in each directory. The default is .htaccess.
Immediately after the AccessFileName directive, a set of Files tags apply access control to any file beginning with a .ht. These directives deny Web access to any .htaccess files (or other files which begin with .ht) for security reasons.
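In the default configuration the pair looks roughly like this sketch (Apache 2.2 syntax). The "~" tells Files to treat its argument as a regular expression.

```apache
AccessFileName .htaccess

# Refuse Web access to .htaccess, .htpasswd and other .ht* files.
<Files ~ "^\.ht">
    Order allow,deny
    Deny from all
</Files>
```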
CacheNegotiatedDocs
By default, the Web server asks proxy servers not to cache any documents which were negotiated on the basis of content (that is, they may change over time or because of the input from the requester). If CacheNegotiatedDocs is set to on, this function is disabled and proxy servers are allowed to cache such documents.
TypesConfig
TypesConfig names the file which sets the default list of MIME type mappings (file name extensions to content types). The default TypesConfig file is /etc/mime.types. Instead of editing /etc/mime.types, the recommended way to add MIME type mappings is to use the AddType directive.
DefaultType
DefaultType sets a default content type for the Web server to use for documents whose MIME types cannot be determined. The default is text/plain.
HostnameLookups
HostnameLookups can be set to on, off, or double. If HostnameLookups is set to on, the server automatically resolves the IP address of each connection to a hostname. Resolving the IP address means that the server makes one or more connections to a DNS server, adding processing overhead. If HostnameLookups is set to double, the server performs a double-reverse DNS lookup, adding even more processing overhead.
To conserve resources on the server, HostnameLookups is set to off by default.
If hostnames are required in server log files, consider running one of the many log analyzer tools that perform the DNS lookups more efficiently and in bulk when rotating the Web server log files.
ErrorLog
ErrorLog specifies the file where server errors are logged. By default, this directive is set to /var/log/httpd/error_log.
LogLevel
LogLevel sets how verbose the error messages in the error logs are. LogLevel can be set (from least verbose to most verbose) to emerg, alert, crit, error, warn, notice, info, or debug. The default LogLevel is warn.
LogFormat
The LogFormat directive configures the format of the various Web server log files. The actual LogFormat used depends on the settings given in the CustomLog directive.
The following are the format options if the CustomLog directive is set to combined:
%h (remote host's IP address or hostname)
Lists the remote IP address of the requesting client. If HostnameLookups is set to on, the client hostname is recorded unless it is not available from DNS.
%l (rfc931)
Not used. A hyphen [-] appears in the log file for this field.
%u (authenticated user)
If authentication was required, the user name of the authenticated user is recorded. Usually this is not used, so a hyphen [-] appears in the log file for this field.
%t (date)
Lists the date and time of the request.
%r (request string)
Lists the request string exactly as it came from the browser or client.
%s (status)
Lists the HTTP status code which was returned to the client host.
%b (bytes)
Lists the size of the response body in bytes, excluding HTTP headers.
%{Referer}i (referrer)
Lists the URL of the webpage which referred the client host to the Web server.
%{User-Agent}i (user-agent)
Lists the type of Web browser making the request.
CustomLog
CustomLog identifies the log file and the log file format. By default, the log is recorded to the /var/log/httpd/access_log file.
The default CustomLog format is combined. The following illustrates the combined log file format:  remotehost rfc931 user date "request" status bytes referrer user-agent
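The logging directives above fit together as in this sketch. The LogFormat string is the standard definition of the combined format; the log path is relative to ServerRoot.

```apache
LogLevel warn

# Define the "combined" nickname from the fields described above.
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined

# Write the access log in that format.
CustomLog logs/access_log combined
```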
ServerSignature
The ServerSignature directive adds a line containing the Apache HTTP Server version and the ServerName to any server-generated documents, such as error messages sent back to clients. ServerSignature is set to on by default.
It can also be set to off or to EMail. EMail adds a mailto:ServerAdmin HTML tag to the signature line of auto-generated responses.
Alias
The Alias setting allows directories outside the DocumentRoot directory to be accessible. Any URL that begins with the alias is mapped to the alias' path. By default, one alias for an icons/ directory is already set up; the icons/ directory can be accessed by the Web server even though it is not in the DocumentRoot.
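The default icons alias looks roughly like the following sketch. The path /var/www/icons is assumed from the Red Hat layout; the Directory container is needed so that the aliased location is actually allowed to be served (Apache 2.2 syntax).

```apache
# /icons/foo.gif is served from /var/www/icons/foo.gif.
Alias /icons/ "/var/www/icons/"

<Directory "/var/www/icons">
    Options Indexes MultiViews
    AllowOverride None
    Order allow,deny
    Allow from all
</Directory>
```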
ScriptAlias
The ScriptAlias directive defines where CGI scripts are located. Generally, it is not good practice to leave CGI scripts within the DocumentRoot, where they can potentially be viewed as text documents. For this reason, a special directory outside of the DocumentRoot directory containing server-side executables and scripts is designated by the ScriptAlias directive. This directory is known as a cgi-bin and is set to /var/www/cgi-bin/ by default.
It is possible to establish directories for storing executables outside of the cgi-bin directory, using a Directory container with the ExecCGI option as described under Directory above.
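A sketch of the default cgi-bin mapping (Apache 2.2 syntax). Every file under the aliased directory is treated as a CGI program, so no ExecCGI option or AddHandler line is needed there.

```apache
# /cgi-bin/test.cgi runs /var/www/cgi-bin/test.cgi as a CGI script.
ScriptAlias /cgi-bin/ "/var/www/cgi-bin/"

<Directory "/var/www/cgi-bin">
    AllowOverride None
    Options None
    Order allow,deny
    Allow from all
</Directory>
```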
Redirect
When a webpage is moved, Redirect can be used to map the file location to a new URL. The format is as follows:

Redirect /<old-path>/<file-name> http://<current-domain>/<current-path>/<file-name>

In this example, replace <old-path> with the old path information for <file-name> and <current-domain> and <current-path> with the current domain and path information for <file-name>.
With this directive in place, any requests for <file-name> at the old location are automatically redirected to the new location.
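A concrete sketch of the template above, with hypothetical paths and domain. Redirect matches by URL-path prefix, so the second line also sends /old-section/a.html to /new-section/a.html; the optional "permanent" keyword returns status 301 instead of the default 302.

```apache
# Single moved page (temporary redirect, status 302).
Redirect /old-path/page.html http://www.example.com/new-path/page.html

# Whole moved section (permanent redirect, status 301).
Redirect permanent /old-section http://www.example.com/new-section
```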
IndexOptions
IndexOptions controls the appearance of server-generated directory listings by adding icons, file descriptions, and so on. If Options Indexes is set, the Web server generates a directory listing when the Web server receives an HTTP request for a directory without an index.
First, the Web server looks in the requested directory for a file matching the names listed in the DirectoryIndex directive (usually, index.html). If an index.html file is not found, Apache HTTP Server creates an HTML directory listing of the requested directory. The appearance of this directory listing is controlled, in part, by the IndexOptions directive.
The default configuration turns on FancyIndexing. This means that a user can re-sort a directory listing by clicking on column headers. Another click on the same header switches from ascending to descending order. FancyIndexing also shows different icons for different files, based upon file extensions.
The AddDescription option, when used in conjunction with FancyIndexing, presents a short description for the file in server generated directory listings.
IndexOptions has a number of other parameters which can be set to control the appearance of server generated directories. Parameters include IconHeight and IconWidth, to make the server include HTML HEIGHT and WIDTH tags for the icons in server generated webpages; IconsAreLinks, for making the icons act as part of the HTML link anchor along with the filename and others.
AddIconByEncoding
This directive names icons which are displayed next to files with MIME encoding in server-generated directory listings. For example, by default, the Web server shows the compressed.gif icon next to MIME encoded x-compress and x-gzip files in server-generated directory listings.
AddIconByType
This directive names icons which are displayed next to files with MIME types in server generated directory listings. For example, the server shows the icon text.gif next to files with a mime-type of text, in server generated directory listings.
AddIcon
AddIcon specifies which icon to show in server generated directory listings for files with certain extensions. For example, the Web server is set to show the icon binary.gif for files with .bin or .exe extensions.
DefaultIcon
DefaultIcon specifies the icon displayed in server-generated directory listings for files which have no other icon specified. The unknown.gif image file is the default.
AddDescription
When using FancyIndexing as an IndexOptions parameter, the AddDescription directive can be used to display user-specified descriptions for certain files or file types in a server generated directory listing. The AddDescription directive supports listing specific files, wildcard expressions, or file extensions.
ReadmeName
ReadmeName names the file which, if it exists in the directory, is appended to the end of server-generated directory listings. The Web server first tries to include the file as an HTML document and then tries to include it as plain text. By default, ReadmeName is set to README.html.
HeaderName
HeaderName names the file which, if it exists in the directory, is prepended to the start of server-generated directory listings. Like ReadmeName, the server tries to include it as an HTML document if possible or in plain text if not.
IndexIgnore
IndexIgnore lists file extensions, partial file names, wildcard expressions or full filenames. The Web server does not include any files which match any of those parameters in server generated directory listings.
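A sketch of these listing-related directives used together. The description text and the ignore patterns are examples only, so adjust them to your own files:

```apache
# Show a description next to .gz files (requires FancyIndexing)
AddDescription "GZIP compressed document" .gz
# Append README.html and prepend HEADER.html to listings, if present
ReadmeName README.html
HeaderName HEADER.html
# Hide dotfiles, editor backups, the header/readme files and version-control directories
IndexIgnore .??* *~ *# HEADER* README* RCS CVS
```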
AddEncoding
AddEncoding maps filename extensions to a particular encoding type. AddEncoding can also be used to instruct some browsers to uncompress certain files as they are downloaded.
AddLanguage
AddLanguage associates file name extensions with specific languages. This directive is useful for Apache HTTP Servers which serve content in multiple languages based on the client Web browser's language settings.
LanguagePriority
LanguagePriority sets precedence for different languages in case the client Web browser has no language preference set.
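A short sketch of encoding and language settings. English and French are assumed here purely as example languages, and for language variants to be chosen at all, content negotiation (for example Options +MultiViews) must also be enabled for the directory:

```apache
# Mark .Z and .gz files as compressed so browsers may decompress them on download
AddEncoding x-compress .Z
AddEncoding x-gzip .gz
# Map extensions to languages, e.g. index.html.en and index.html.fr (example languages)
AddLanguage en .en
AddLanguage fr .fr
# Prefer English, then French, when the client states no preference
LanguagePriority en fr
```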
AddType
Use the AddType directive to define or override MIME type and file extension pairs. The following example directive tells the Apache HTTP Server to recognize the .tgz file extension: AddType application/x-tar .tgz
AddHandler
AddHandler maps file extensions to specific handlers. For example, the cgi-script handler can be matched with the extension .cgi to automatically treat a file ending with .cgi as a CGI script. The following is a sample AddHandler directive for the .cgi extension.
AddHandler cgi-script .cgi
This directive enables CGI scripts outside of the cgi-bin directory to function in any directory on the server that has the ExecCGI option set in its <Directory> container.
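As a sketch, a <Directory> container that lets .cgi files run as CGI scripts outside cgi-bin might look like this; the /var/www/html/scripts path is a hypothetical example:

```apache
# Hypothetical directory in which .cgi files may run as CGI scripts
<Directory "/var/www/html/scripts">
    # ExecCGI must be enabled here for the cgi-script handler to execute files
    Options +ExecCGI
    AddHandler cgi-script .cgi
</Directory>
```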
Action
Action specifies a MIME content type and CGI script pair, so that whenever a file of that media type is requested, a particular CGI script is executed.
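For example, a sketch that passes every requested GIF image through a CGI script; /cgi-bin/images.cgi is a hypothetical script name, and mod_actions must be loaded:

```apache
# Whenever a file of type image/gif is requested, run this hypothetical script instead.
# The script receives the requested file in PATH_INFO and PATH_TRANSLATED.
Action image/gif /cgi-bin/images.cgi
```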
ErrorDocument
The ErrorDocument directive associates an HTTP response code with a message or a URL to be sent back to the client. By default, the Web server outputs a simple and usually cryptic error message when an error occurs. The ErrorDocument directive forces the Web server to instead output a customized message or page.
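A sketch of the three forms ErrorDocument accepts: a plain message, a local URL-path and an external URL. The /missing.html path and the example.com address are placeholders:

```apache
# Plain text message, starting with a double quote
ErrorDocument 500 "Sorry, something went wrong on our side."
# Local page served instead of the default 404 message (placeholder path)
ErrorDocument 404 /missing.html
# External page (placeholder URL); the client receives a redirect rather than the 403 status
ErrorDocument 403 http://www.example.com/forbidden.html
```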
BrowserMatch
The BrowserMatch directive allows the server to define environment variables and take appropriate actions based on the User-Agent HTTP header field — which identifies the client's Web browser type. By default, the Web server uses BrowserMatch to deny connections to specific browsers with known problems and also to disable keepalives and HTTP header flushes for browsers that are known to have problems with those actions.
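Two lines of the kind found in older stock configurations, shown as a sketch. Here nokeepalive, downgrade-1.0 and force-response-1.0 are special environment variables that Apache itself acts on:

```apache
# Disable keepalive for a browser known to mishandle it
BrowserMatch "Mozilla/2" nokeepalive
# Treat this buggy beta as an HTTP/1.0 client and answer with HTTP/1.0 responses
BrowserMatch "MSIE 4\.0b2;" nokeepalive downgrade-1.0 force-response-1.0
```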
Location
The <Location> and </Location> tags create a container in which access control based on URL can be specified.
For instance, to allow people connecting from within the server's domain to see status reports, use the following directives:
<Location /server-status>
    SetHandler server-status
    Order deny,allow
    Deny from all
    Allow from <.example.com>
</Location>
Replace <.example.com> with the second-level domain name for the Web server.
To provide server configuration reports (including installed modules and configuration directives) to requests from inside the domain, use the following directives:

<Location /server-info>
    SetHandler server-info
    Order deny,allow
    Deny from all
    Allow from <.example.com>
</Location>
Again, replace <.example.com> with the second-level domain name for the Web server.
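The Order, Deny and Allow directives above are Apache 2.2 syntax. On Apache 2.4 (without mod_access_compat loaded) the same restriction is written with Require; a sketch, with example.com again standing in for your own domain:

```apache
<Location "/server-status">
    SetHandler server-status
    # Allow only clients whose host name is within example.com
    # (placeholder domain; this needs reverse DNS lookups)
    Require host example.com
</Location>
```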
ProxyRequests
To configure the Apache HTTP Server to function as a proxy server, remove the hash mark (#) from the beginning of the <IfModule mod_proxy.c> line (and its closing </IfModule> line), the ProxyRequests line, and each line in the <Proxy> stanza. Set the ProxyRequests directive to On, and set which domains are allowed access to the server in the Allow from directive of the <Proxy> stanza.
Proxy
<Proxy *> and </Proxy> tags create a container which encloses a group of configuration directives meant to apply only to the proxy server. Many directives which are allowed within a <Directory> container may also be used within <Proxy> container.
ProxyVia
The ProxyVia directive controls whether or not an HTTP Via: header line is sent along with requests or replies which go through the Apache proxy server. If ProxyVia is set to On, the Via: header shows the hostname; Full adds the Apache HTTP Server version; Off passes along any existing Via: lines unchanged; and Block removes Via: lines.
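Put together, an uncommented forward-proxy block in the Apache 2.2 style might look like the sketch below; .example.com is a placeholder for the domain whose clients may use the proxy. Keep the Allow restriction in place: ProxyRequests On without it turns the server into an open proxy.

```apache
<IfModule mod_proxy.c>
    # Act as a forward proxy
    ProxyRequests On
    <Proxy *>
        # Only clients from the placeholder domain may use the proxy
        Order deny,allow
        Deny from all
        Allow from .example.com
    </Proxy>
    # Add a Via: header naming this host
    ProxyVia On
</IfModule>
```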
Cache Directives
A number of commented cache directives are supplied by the default Apache HTTP Server configuration file. In most cases, uncommenting these lines by removing the hash mark (#) from the beginning of the line is sufficient. The following, however, is a list of some of the more important cache-related directives.
CacheEnable — Specifies whether the cache is a disk, memory, or file descriptor cache. By default CacheEnable configures a disk cache for URLs at or below /.
CacheRoot — Specifies the name of the directory containing cached files. The default CacheRoot is the /var/httpd/proxy/ directory.
CacheSize — Specifies how much space the cache can use in kilobytes. The default CacheSize is 5 KB.
The following is a list of some of the other common cache-related directives.
CacheMaxExpire — Specifies how long HTML documents are retained (without a reload from the originating Web server) in the cache. The default is 24 hours (86400 seconds).
CacheLastModifiedFactor — Specifies the creation of an expiry (expiration) date for a document which did not come from its originating server with its own expiry set. The default CacheLastModifiedFactor is set to 0.1, meaning that the expiry date for such documents equals one-tenth of the amount of time since the document was last modified.
CacheDefaultExpire — Specifies the default expiry time, in seconds, for a document that was received using a protocol that does not support expiry times. The default is 1 hour (3600 seconds).
NoProxy — Specifies a space-separated list of subnets, IP addresses, domains, or hosts whose content is not cached. This setting is most useful for Intranet sites.
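As a sketch only: on Apache 2.2 with mod_disk_cache loaded, the cache directives might be uncommented as follows. The values mirror the defaults described above and /var/cache/mod_proxy is an assumed path. Module and directive names differ between versions (Apache 2.4 uses mod_cache_disk, for example), so check against your own httpd.conf.

```apache
<IfModule mod_disk_cache.c>
    # Cache URLs at or below / on disk
    CacheEnable disk /
    # Assumed cache directory; it must be writable by the httpd user
    CacheRoot /var/cache/mod_proxy
    # Keep documents at most 24 hours (86400 seconds)
    CacheMaxExpire 86400
    # Expiry = 0.1 x time since Last-Modified when no expiry is supplied
    CacheLastModifiedFactor 0.1
    # Default expiry of 1 hour (3600 seconds)
    CacheDefaultExpire 3600
</IfModule>
```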
NameVirtualHost
The NameVirtualHost directive associates an IP address and port number, if necessary, for any name-based virtual hosts. Name-based virtual hosting allows one Apache HTTP Server to serve different domains without using multiple IP addresses.
To enable name-based virtual hosting, uncomment the NameVirtualHost configuration directive and add the correct IP address. Then add more VirtualHost containers for each virtual host.
VirtualHost
<VirtualHost> and </VirtualHost> tags create a container outlining the characteristics of a virtual host. The VirtualHost container accepts most configuration directives.
A commented VirtualHost container is provided in httpd.conf, which illustrates the minimum set of configuration directives necessary for each virtual host.
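A minimal name-based virtual hosting sketch; the host names and DocumentRoot paths are placeholders. NameVirtualHost is required on Apache 2.2 but is deprecated and has no effect on 2.4.

```apache
# Name-based virtual hosts on all addresses, port 80 (needed on Apache 2.2)
NameVirtualHost *:80

# The first vhost listed also answers requests for unmatched host names
<VirtualHost *:80>
    ServerName www.example.com
    DocumentRoot /var/www/example.com
</VirtualHost>

<VirtualHost *:80>
    ServerName www.example.org
    DocumentRoot /var/www/example.org
</VirtualHost>
```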
SetEnvIf
SetEnvIf sets environment variables based on the headers of incoming requests. It is not solely an SSL directive, though it is present in the supplied /etc/httpd/extra/conf/ssl.conf file. Its purpose in this context is to disable HTTP keepalive and to allow SSL to close the connection without a close notify alert from the client browser. This setting is necessary for certain browsers that do not reliably shut down the SSL connection.
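The line in question typically looks like the sketch below. The User-Agent pattern targets Internet Explorer, which historically needed this workaround, and the trailing backslashes continue one directive across several lines:

```apache
# For Internet Explorer clients: no keepalive, allow an unclean SSL shutdown,
# and fall back to HTTP/1.0 behaviour
SetEnvIf User-Agent ".*MSIE.*" \
         nokeepalive ssl-unclean-shutdown \
         downgrade-1.0 force-response-1.0
```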

Apachectl Commands

Apachectl: This is the Apache HTTP Server control interface, which helps administrators manage the httpd daemon.
This utility includes a variety of commands for starting, stopping, checking httpd status and running syntax tests.
Apachectl start:
The start command starts the httpd daemon. An error message is displayed if httpd is already running.
Restart:
If httpd is running, the restart command restarts the daemon, first checking the configuration files as configtest does to make sure the daemon doesn't die. If the daemon is not running, this command starts it.
Graceful:
The graceful command restarts the httpd daemon, or starts it if it is not running. Unlike a normal restart, it lets current connections continue instead of aborting them.
configtest:
The configtest command carries out a configuration syntax test. It parses the configuration files and reports either Syntax OK or detailed information about the syntax error. While this command can't check whether the configuration files do what you expect them to do, it does make sure all configuration syntax is correct.
Full status:
The fullstatus command displays a full status report from mod_status. To use it, you need both mod_status enabled on the server and a text-based browser such as lynx installed on the system.
Status:
The status command displays a brief status report, similar to fullstatus but without the list of requests currently being served.

Apache Redirects

What you are trying to accomplish here is to have one resource (either a page or an entire site) redirect a visitor to a completely different page or site, and while doing so tell the visitor's browser that the redirect is either permanent (301) or temporary (302).
Therefore you need to do three things:
Have 2 resources - one source page or website, and one destination page or website.
When an attempt to access the source resource is made, the webserver transfers the visitor to the destination instead.
During the transfer, the webserver reports to the visitor that a redirect is happening and it's either temporary or permanent.
The ability to control the "status" argument in the redirect directive (which sets whether it's a 301 or 302) within Apache is only available in version 1.2 and above. You are best off using version 2 or above for maximum stability, security and usefulness. 

301 Redirect
A function of a web server that redirects the visitor from the current page or site to another page or site, while returning a response code that says that the original page or site has been permanently moved to the new location. Search engines like this information and will readily transfer link popularity (and PageRank) to the new site quickly and with few issues. They are also not as likely to cause issues with duplication filters. SEOs like 301 redirects, and they are usually the preferred way to deal with multiple domains pointing at one website.
302 Redirect
A function of a web server that redirects the visitor from the current page or site to another page or site, while returning a response code that says that the original page or site has been temporarily moved to the new location. Search engines will often interpret these as a park, and take their time figuring out how to handle the setup. Try to avoid a 302 redirect on your site if you can (unless it truly is only a temporary redirect), and never use them as some form of click tracking for your outgoing links, as they can result in a "website hijacking" under some circumstances.

mod_rewrite
Mod_Rewrite is an Apache extension module which allows URLs to be rewritten on the fly. SEOs often use it to convert dynamic URLs with multiple query string parameters into static-looking URLs. An example of this would be to convert the dynamic URL domain.com/search.php?day=31&month=may&year=2005 to domain.com/search-31-may-2005.htm
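A sketch of a rule behind that example, as it might appear in an .htaccess file at the document root. It assumes search.php accepts day, month and year parameters exactly as in the dynamic URL above:

```apache
RewriteEngine On
# /search-31-may-2005.htm -> search.php?day=31&month=may&year=2005 (internal rewrite)
RewriteRule ^search-([0-9]+)-([a-z]+)-([0-9]+)\.htm$ search.php?day=$1&month=$2&year=$3 [NC,L]
```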

htaccess
.htaccess (Hypertext Access) is the default name of Apache's directory-level configuration file. It provides the ability to customize, per directory, configuration directives defined in the main configuration file. You can put mod_rewrite rules in the .htaccess file.
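Apache only honours .htaccess directives that the main configuration permits through AllowOverride. A sketch of the httpd.conf side, assuming a document root of /var/www/html:

```apache
# Assumed document root; adjust to your own
<Directory "/var/www/html">
    # Allow .htaccess files here to use mod_rewrite and Redirect directives
    AllowOverride FileInfo
    # mod_rewrite in per-directory context also needs FollowSymLinks (or SymLinksIfOwnerMatch)
    Options +FollowSymLinks
</Directory>
```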

httpd.conf
Apache is configured by placing directives in plain text configuration files. The main configuration file is usually called httpd.conf. The location of this file is set at compile-time, but may be overridden with the -f command line flag. In addition, other configuration files may be added using the Include directive, and wildcards can be used to include many configuration files. Any directive may be placed in any of these configuration files. Changes to the main configuration files are only recognized by Apache when it is started or restarted.
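For example, a sketch of the Include directive with a wildcard, assuming the conf.d directory layout used by Red Hat-style installs. A relative path is resolved against ServerRoot:

```apache
# Load every file ending in .conf from ServerRoot/conf.d, in alphabetical order
Include conf.d/*.conf
```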

Redirection (302)
A default redirection function of IIS that redirects the visitor from the current page or site to another page or site, while returning a response code that says that the original page or site has been temporarily moved to the new location. Search engines will often interpret these as a park, and take their time figuring out how to handle the setup. Try to avoid a 302 redirect on your site if you can (unless it truly is only a temporary redirect), and never use them as some form of click tracking for your outgoing links, as they can result in a "website hijacking" under some circumstances.

 Permanent Redirection (301)
An optional function of IIS that redirects the visitor from the current page or site to another page or site, while returning a response code that says that the original page or site has been permanently moved to the new location. Search engines like this information and will readily transfer link popularity (and PageRank) to the new site quickly and with few issues. They are also not as likely to cause issues with duplication filters. SEOs like 301 redirects, and they are usually the preferred way to deal with multiple domains pointing at one website.

Mod_Rewrite and the Apache Redirect
If you have the mod_rewrite extension installed (it comes with most Apache installs as a default) you can use it to dynamically change URLs using arguments on the fly - this is NOT a 301 redirect, but rather closely related behavior. For example, if you wanted to redirect .htm files from an old server to their equivalent .php files on a new one using a 301 redirect, you would use a combination of mod_rewrite and the redirect directive to do the redirection + URL change.
You could do it on a file by file basis by making a really long list of possible redirects in the .htaccess file by hand without mod_rewrite, but that would be a real pain on a server with a lot of files, or a completely dynamic system. Therefore these 2 functions are often used together.
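One way to handle that .htm-to-.php case, sketched here with mod_rewrite's own R=301 flag doing both the rewrite and the redirect, placed in the old server's .htaccess; www.newdomain.com is a placeholder:

```apache
RewriteEngine On
# /page.htm on the old server -> 301 to http://www.newdomain.com/page.php
RewriteRule ^(.*)\.htm$ http://www.newdomain.com/$1.php [R=301,L]
```

In .htaccess context the leading slash is stripped before matching, which is why the substitution adds it back after the domain.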
Syntax for a 301 Redirect
The syntax for the redirect directive is:
Redirect /yourdirectory http://www.newdomain.com/newdirectory
If the client requests http://myserver/yourdirectory/foo.txt, it will be told to access http://www.newdomain.com/newdirectory/foo.txt instead.
Note: Redirect directives take precedence over Alias and ScriptAlias directives, irrespective of their ordering in the configuration file. Also, the URL-path must be an absolute path beginning with a slash, not a relative path, even when used with .htaccess files or inside of <Directory> sections, and the target is normally a fully qualified URL.
If you use Redirect without the status argument, it will return a status code of 302 by default. This default behaviour has given me problems over the years as an SEO, so it's important to remember to specify the status, like this:
Redirect permanent /one http://www.newdomain.com/two
or
Redirect 301 /two http://www.newdomain.com/other
Both of which will return the 301 status code. If you wanted to return a 302 you could either not specify anything, or use "302" or "temp" as the status argument above.
You can also use 2 other directives - RedirectPermanent URL-path URL (returns a 301 and works the same as Redirect permanent URL-path URL) and RedirectTemp URL-path URL (same, but for a 302 status).
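Written out, the two shorthand forms look like this; the paths are placeholders:

```apache
# Same as: Redirect permanent /one http://www.newdomain.com/two  (301)
RedirectPermanent /one http://www.newdomain.com/two
# Same as: Redirect temp /sale http://www.newdomain.com/sale-over.html  (302)
RedirectTemp /sale http://www.newdomain.com/sale-over.html
```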
For more global changes, you would use RedirectMatch, which takes a regular expression in place of the URL-path:
RedirectMatch 301 ^(.*)$ http://www.newdomain.com$1
or
RedirectMatch permanent ^(.*)$ http://www.newdomain.com$1
These arguments will match any file requested at the old account, change the domain, and redirect it to the file of the same name at the new account.
You would use these directives in either the .htaccess file or the httpd.conf file. It's most common to do it in the .htaccess file because it's the easiest and doesn't require a restart, but the httpd.conf method has less overhead and works fine as well.
Simple Domain 301 Redirect Checklist
This assumes you just have a new domain (with no working pages under it) and want it to redirect properly to your main domain.
1. Ensure that you have 2 accounts - the old site and the new site (they do not have to be on different IPs or different machines).
2. Your main (proper or canonical) site should be pointed at the new site using DNS. All your other domains should be pointed at the old site using DNS. Parking them there is fine at this point.
3. Find the .htaccess file at the root of your old account. Yes, it starts with a "." We will be working with this file. The new site does not need any changes made to it - the old site does all the redirection work.
4. Download the .htaccess file and open it in a text only editor.
5. Add this code:
Redirect 301 / http://www.newdomain.com/
6. Then upload the file to your root folder and test your new redirect. Make sure you also check it using an HTTP header viewer just to be sure it shows as a 301.
Control Panel Method
cPanel redirect
Log into your cPanel, and look for "Redirects" under Site Management
Put the current directory in the first box
Put the new directory in the second box
Choose the type (temporary or permanent) temporary=302 and permanent=301
Click "Add" and you're done
You can only do 302 redirects (or frame forwarding - bad!) using the Plesk control panel - use .htaccess for 301s instead.
If you use Ensim, the only way to redirect is by using the .htaccess file (no control panel option at this time).
Basic Old Website to New Website Redirection
This is used when you have an existing website (with pages) and want to move it to a new domain, while keeping all your page names and the links to them.
1. Ensure that you have 2 websites - the old site and the new site, and that they are on different accounts (they do not have to be on different IPs or different machines).
2. Your main (proper or canonical) site should be pointed at the new site using DNS. All your old domains should be pointed at the old site using DNS.
3. Find the .htaccess file at the root of your old account. Yes, it starts with a "."  We will be working with this file. The new site does not need any changes made to it - the old site does all the redirection work.
4. Download the .htaccess file and open it in a text only editor.
5a. If you have mod_rewrite installed, add this code:
Options +FollowSymLinks
RewriteEngine on
RewriteCond %{HTTP_HOST} !^newdomain\.com
RewriteRule ^(.*)$ http://www.newdomain.com/$1 [R=301,L]
5b. If you don't have mod_rewrite installed, you really should. If you can't install it, then you can use this code instead:
RedirectMatch 301 ^(.*)$ http://www.newdomain.com$1
6. Then upload the file to your root folder and test your new redirect. Make sure you also check it using an HTTP header viewer just to be sure it shows as a 301.
FrontPage on Apache
After you've done the basic Apache 301 redirection described in this article, you will also need to change the .htaccess files in:
_vti_bin
_vti_bin/_vti_adm
_vti_bin/_vti_aut
Replace "Options None" with "Options +FollowSymLinks"
Those folders are part of your FrontPage extensions on the server, so you will have to use FTP to get to them, since FrontPage hides these folders by default to prevent them from accidentally being messed with by novice users.
More Complicated Redirects
You can't use a control panel in Apache currently for these - .htaccess only.
Redirecting everything to a single page
This is common when you are totally changing the new website from the old and you just want all your links and requests from the old site to be directed to a spot on your new site (usually the home page). You actually need to do it on a page by page basis.
Redirect 301 /oldfile1.htm http://www.newdomain.com
Redirect 301 /oldfile2.htm http://www.newdomain.com
Redirect 301 /oldfile3.htm http://www.newdomain.com
Redirection while changing the filename
This example will redirect all the files on the old account that end in .html to the same file on the new account, but with a .php extension. You can also use this technique within the same account if you want to change all your extensions but don't want to lose your incoming links to the old pages. This is common when people switch from static HTML files to dynamic ones while keeping the same domain name, for example.
Just change the "html" and "php" parts of the below example to your specific situation, if needed.
RedirectMatch 301 (.*)\.html$ http://www.newdomain.com$1.php
Redirection while changing the filename, but keeping the GET arguments
Sometimes, you will want to change to a different CMS, but keep your database the same, or you want to switch everything but you like the arguments and don't want to change them.
RedirectMatch 301 /oldcart.php(.*) http://www.newdomain.com/newcart.php$1
This will result in "http://www.olddomain.com/oldcart.php?Cat_ID=Blue" being redirected to "http://www.newdomain.com/newcart.php?Cat_ID=Blue"



URL Rewriting

Most dynamic sites include variables in their URLs that tell the site what information to show the user. Typically, this gives URLs like the following, telling the relevant script on a site to load product number 7.
http://www.pets.com/show_a_product.php?product_id=7

The problems with this kind of URL structure are that the URL is not at all memorable. It's difficult to read out over the phone (you'd be surprised how many people pass URLs this way). Search engines and users alike get no useful information about the content of a page from that URL. You can't tell from that URL that that page allows you to buy a Norwegian Blue Parrot (lovely plumage). It's a fairly standard URL - the sort you'd get by default from most CMSes. Compare that to this URL:
http://www.pets.com/products/7/

Clearly a much cleaner and shorter URL. It's much easier to remember, and vastly easier to read out. That said, it doesn't exactly tell anyone what it refers to. But we can do more:
http://www.pets.com/parrots/norwegian-blue/

Now we're getting somewhere. You can tell from the URL, even when it's taken out of context, what you're likely to find on that page. Search engines can split that URL into words (hyphens in URLs are treated as spaces by search engines, whereas underscores are not), and they can use that information to better determine the content of the page. It's an easy URL to remember and to pass to another person.

Unfortunately, the last URL cannot be easily understood by a server without some work on our part. When a request is made for that URL, the server needs to work out how to process that URL so that it knows what to send back to the user. URL rewriting is the technique used to "translate" a URL like the last one into something the server can understand.
Platforms and Tools

Depending on the software your server is running, you may already have access to URL rewriting modules. If not, most hosts will enable or install the relevant modules for you if you ask them very nicely.

Apache is the easiest system to get URL rewriting running on. It usually comes with its own built-in URL rewriting module, mod_rewrite, enabled, and working with mod_rewrite is as simple as uploading correctly formatted and named text files.

IIS, Microsoft's server software, doesn't include URL rewriting capability as standard, but there are add-ons out there that can provide this functionality. ISAPI_Rewrite is the one I recommend working with, as I've so far found it to be the closest to mod_rewrite's functionality. Instructions for installing and configuring ISAPI_Rewrite can be found at the end of this article.

The code that follows is based on URL rewriting using mod_rewrite.
Basic URL Rewriting

To begin with, let's consider a simple example. We have a website, and we have a single PHP script that serves a single page. Its URL is:
http://www.pets.com/pet_care_info_07_07_2008.php

We want to clean up the URL, and our ideal URL would be:
http://www.pets.com/pet-care/

In order for this to work, we need to tell the server to internally redirect all requests for the URL "pet-care" to "pet_care_info_07_07_2008.php". We want this to happen internally, because we don't want the URL in the browser's address bar to change.

To accomplish this, we need to first create a text document called ".htaccess" to contain our rules. It must be named exactly that (not ".htaccess.txt" or "rules.htaccess"). This would be placed in the root directory of the server (the same folder as "pet_care_info_07_07_2008.php" in our example). There may already be an .htaccess file there, in which case we should edit that rather than overwrite it.

The .htaccess file is a configuration file for the server. If there are errors in the file, the server will display an error message (usually with an error code of "500"). If you are transferring the file to the server using FTP, you must make sure it is transferred using the ASCII mode, rather than BINARY. We use this file to perform 2 simple tasks in this instance - first, to tell Apache to turn on the rewrite engine, and second, to tell apache what rewriting rule we want it to use. We need to add the following to the file:
# Turn on the rewriting engine
RewriteEngine On
# Handle requests for "pet-care"
RewriteRule ^pet-care/?$ pet_care_info_07_07_2008.php [NC,L]

A couple of quick items to note: any line beginning with a hash symbol (#) in an .htaccess file is ignored as a comment, and I'd recommend you use comments liberally. Apache does not allow a comment on the same line as a directive, so each comment goes on its own line above the rule it describes. Also, the "RewriteEngine" line should only be used once per .htaccess file (please note that I've not included this line in the code examples from here onwards).

The "RewriteRule" line is where the magic happens. Together with its comment, the rule can be broken down into 5 parts:

    RewriteRule - Tells Apache that this line refers to a single RewriteRule.
    ^pet-care/?$ - The "pattern". The server will check the URL of every request to the site to see if this pattern matches. If it does, then Apache will swap the URL of the request for the "substitution" section that follows. (In an .htaccess file the leading slash is removed from the URL before matching, so the pattern does not start with one.)
    pet_care_info_07_07_2008.php - The "substitution". If the pattern above matches the request, Apache uses this URL instead of the requested URL.
    [NC,L] - "Flags", that tell Apache how to apply the rule. In this case, we're using two flags. "NC" tells Apache that this rule should be case-insensitive, and "L" tells Apache not to process any more rules if this one is used.
    # Handle requests for "pet-care" - Comment, on the line above the rule, explaining what the rule does (optional but recommended)

The rule above is a simple method for rewriting a single URL, and is the basis for almost all URL rewriting rules.
Patterns and Replacements

The rule above allows you to redirect requests for a single URL, but the real power of mod_rewrite comes when you start to identify and rewrite groups of URLs based on patterns they contain.

Let's say you want to change all of your site URLs as described in the first pair of examples above. Your existing URLs look like this:
http://www.pets.com/show_a_product.php?product_id=7

And you want to change them to look like this:
http://www.pets.com/products/7/

Rather than write a rule for every single product ID, you of course would rather write one rule to manage all product IDs. Effectively you want to change URLs of this format:
http://www.pets.com/show_a_product.php?product_id={a number}

And you want to change them to look like this:
http://www.pets.com/products/{a number}/

In order to do so, you will need to use "regular expressions". These are patterns, defined in a specific format that the server can understand and handle appropriately. A typical pattern to identify a number would look like this:
[0-9]+

The square brackets contain a range of characters, and "0-9" indicates all the digits. The plus symbol indicates that the pattern will identify one or more of whatever precedes the plus - so this pattern effectively means "one or more digits" - exactly what we're looking to find in our URL.

The entire "pattern" part of the rule is treated as a regular expression by default - you don't need to turn this on or activate it at all.
# Handle product requests
RewriteRule ^products/([0-9]+)/?$ show_a_product.php?product_id=$1 [NC,L]

The first thing I hope you'll notice is that we've wrapped our pattern in brackets. This allows us to "back-reference" (refer back to) that section of the URL in the following "substitution" section. The "$1" in the substitution tells Apache to put whatever matched the earlier bracketed pattern into the URL at this point. You can have lots of backreferences, and they are numbered in the order they appear.

And so, this RewriteRule will now mean that Apache internally rewrites all requests for domain.com/products/{number}/ to show_a_product.php?product_id={same number}.
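Backreferences are numbered in the order their opening brackets appear. As a sketch, assuming show_a_product.php also accepted a hypothetical page parameter, two groups could be captured like this:

```apache
# /products/7/2/ -> show_a_product.php?product_id=7&page=2
# (the page parameter is hypothetical)
RewriteRule ^products/([0-9]+)/([0-9]+)/?$ show_a_product.php?product_id=$1&page=$2 [NC,L]
```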
Regular Expressions

A complete guide to regular expressions is rather beyond the scope of this article. However, important points to remember are that the entire pattern is treated as a regular expression, so always be careful of characters that are "special" characters in regular expressions.

The most common instance of this is when people use a period in their pattern. In a pattern, this actually means "any character" rather than a literal period, and so if you want to match a period (and only a period) you will need to "escape" the character - precede it with another special character, a backslash, that tells Apache to take the next character literally.

For example, this RewriteRule will not just match the URL "rss.xml" as intended - it will also match "rss1xml", "rss-xml" and so on.
# Change feed URL
RewriteRule ^rss.xml$ rss.php [NC,L]

This does not usually present a serious problem, but escaping characters properly is a very good habit to get into early. Here's how it should look:
# Change feed URL
RewriteRule ^rss\.xml$ rss.php [NC,L]

This only applies to the pattern, not to the substitution. Other characters that require escaping (referred to as "metacharacters") follow, with their meaning in brackets afterwards:

    . (any character)
    * (zero or more of the preceding)
    + (one or more of the preceding)
    {} (minimum to maximum quantifier)
    ? (zero or one of the preceding; after another quantifier, makes it ungreedy)
    ! (at start of string means "negative pattern")
    ^ (start of string, or "negative" if at the start of a range)
    $ (end of string)
    [] (match any of contents)
    - (range if used between square brackets)
    () (group, backreferenced group)
    | (alternative, or)
    \ (the escape character itself)

Using regular expressions, it is possible to search for all sorts of patterns in URLs and rewrite them when they match. Time for another example - earlier we wanted to be able to identify this URL and rewrite it:

http://www.pets.com/parrots/norwegian-blue/

And we want to be able to tell the server to interpret this as the following, but for all products:
http://www.pets.com/get_product_by_name.php?product_name=norwegian-blue

And we can do that relatively simply, with the following rule:
# Process parrots
RewriteRule ^parrots/([A-Za-z0-9-]+)/?$ get_product_by_name.php?product_name=$1 [NC,L]
With this rule, any URL that starts with "parrots" followed by a slash (parrots/), then one or more (+) of any combination of letters, numbers and hyphens ([A-Za-z0-9-]), will be matched. Note the hyphen at the end of the set of characters within square brackets: it must be placed there to be treated literally rather than as a range separator. We reference the product name in brackets with $1 in the substitution.

We can make it even more generic, if we want, so that it doesn't matter what directory a product appears to be in, it is still sent to the same script, like so:
# Process all products
RewriteRule ^[A-Za-z-]+/([A-Za-z0-9-]+)/?$ get_product_by_name.php?product_name=$1 [NC,L]
As you can see, we've replaced "parrots" with a pattern that matches letters and hyphens. That rule will now match anything in the parrots directory or in any other directory whose name is made up of one or more letters and hyphens.
Flags

Flags are added to the end of a rewrite rule to tell Apache how to interpret and handle the rule. They can be used to tell Apache to treat the rule as case-insensitive, to stop processing rules if the current one matches, or a variety of other options. They are comma-separated, and contained in square brackets. Here's a list of the flags, with their meanings (this information is included on the cheat sheet, so no need to try to learn them all).

    C (chained with next rule)
    CO=cookie (set specified cookie)
    E=var:value (set environment variable var to value)
    F (forbidden - sends a 403 Forbidden response to the user)
    G (gone - sends a 410 Gone response; the resource no longer exists)
    H=handler (set handler)
    L (last - stop processing rules)
    N (next - restart processing from the first rule, using the rewritten URL)
    NC (case insensitive)
    NE (do not escape special URL characters in output)
    NS (ignore this rule if the request is a subrequest)
    P (proxy - i.e., apache should grab the remote content specified in the substitution section and return it)
    PT (pass through - use when processing URLs with additional handlers, e.g., mod_alias)
    R (temporary redirect to new URL)
    R=301 (permanent redirect to new URL)
    QSA (append query string from request to substituted URL)
    S=x (skip next x rules)
    T=mime-type (force specified mime type)
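A few of these flags in combination, as a sketch. The .bak and old-catalogue paths are hypothetical examples, not rules from this site.

```apache
# QSA: /parrots/norwegian-blue/?colour=blue is rewritten to
#   get_product_by_name.php?product_name=norwegian-blue&colour=blue
# Without QSA, ?colour=blue would be dropped, because the substitution
# contains its own query string.
RewriteRule ^parrots/([A-Za-z0-9-]+)/?$ get_product_by_name.php?product_name=$1 [QSA,NC,L]

# F: refuse any request for a file ending in .bak with 403 Forbidden.
# A single hyphen as the substitution means "leave the URL unchanged".
# F (like G) stops rule processing by itself, so L is not needed.
RewriteRule \.bak$ - [F,NC]

# G: tell clients that /old-catalogue/ has gone for good (410 Gone).
RewriteRule ^old-catalogue/ - [G,NC]
```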

Moving Content
# Temporary move
RewriteRule ^article/?$ http://www.new-domain.com/article/ [R,NC,L]

Adding an "R" flag to the flags section changes how a RewriteRule works. Instead of rewriting the URL internally, Apache will send a message back to the browser (an HTTP header) to tell it that the document has moved temporarily to the URL given in the "substitution" section. Either an absolute or a relative URL can be given in the substitution section. The header sent back includes a code, 302, that indicates the move is temporary.
# Permanent move
RewriteRule ^article/?$ http://www.new-domain.com/article/ [R=301,NC,L]

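If the move is permanent, append "=301" to the "R" flag to have Apache tell the browser the move is permanent. Both kinds of redirect make the browser fetch the new address and show it in the address bar. The difference is that a 301 tells browsers and search engines the change is permanent, so browsers may cache the redirect and search engines will update their indexes to the new URL.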

This is one of the most common methods of rewriting URLs of items that have moved to a new URL (for example, it is in use extensively on this site to forward users to new post URLs whenever they are changed).
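Redirects can also capture part of the path, so a whole section moves in one rule. This is a sketch: both domains and the section names are placeholders.

```apache
# Permanently redirect everything under /articles/ to the same path
# on a new domain:
#   /articles/                  -> http://www.new-domain.com/articles/
#   /articles/2017/apache-tips/ -> http://www.new-domain.com/articles/2017/apache-tips/
RewriteRule ^articles/(.*)$ http://www.new-domain.com/articles/$1 [R=301,NC,L]

# With a relative substitution, the redirect stays on the same site.
RewriteRule ^old-section/(.*)$ /new-section/$1 [R=301,NC,L]
```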
Conditions

Rewrite rules can be preceded by one or more rewrite conditions, and these can be strung together. This lets you apply certain rules only to a subset of requests. Personally, I use this most often when applying rules to a subdomain or alternative domain, as rewrite conditions can be run against a variety of criteria, not just the URL. Here's an example:
RewriteCond %{HTTP_HOST} ^addedbytes\.com [NC]
RewriteRule ^(.*)$ http://www.addedbytes.com/$1 [L,R=301]
The rewrite rule above redirects all requests, no matter what for, to the same URL at "www.addedbytes.com". Without the condition, this rule would create a loop, with every request matching that rule and being sent back to itself. The rule is intended to only redirect requests missing the "www" URL portion, though, and the condition preceding the rule ensures that this happens.

The condition operates in a similar way to the rule. It starts with "RewriteCond" to tell mod_rewrite this line refers to a condition. Following that is what should actually be tested, and then the pattern to test. Finally come the flags, in square brackets, as with a RewriteRule.

The string to test (the second part of the condition) can be a variety of different things. You can test the domain being requested, as with the above example, or you could test the browser being used, the referring URL (commonly used to prevent hotlinking), the user's IP address, or a variety of other things (see the "server variables" section for an outline of how these work).

The pattern is almost exactly the same as that used in a RewriteRule, with a couple of small exceptions. A pattern that starts with certain characters may not be treated as a regular expression at all, as described in the following "exceptions" section. This means that if you wish to use a regular expression pattern starting with <, >, or a hyphen, you should escape that first character with a backslash.

Rewrite conditions can, like rewrite rules, be followed by flags, and there are two you will commonly use. "NC", as with rules, tells Apache to treat the condition as case-insensitive. The other is "OR". If you want to apply a rule when either of two conditions matches, rather than repeating the rule, add the "OR" flag to the first condition; if either matches, the following rule will be applied. By default, if a rule is preceded by multiple conditions, it is only applied if all of the conditions match.
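A sketch of "OR" in use, sending requests for either of two old domains to the main site. The domain names are placeholders.

```apache
# Without [OR], both conditions would have to match, which can never
# happen here, since a request carries only one Host header.
RewriteCond %{HTTP_HOST} ^old-domain\.com$ [NC,OR]
RewriteCond %{HTTP_HOST} ^another-old-domain\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [L,R=301]
```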
Exceptions and Special Cases

Rewrite conditions can be tested in a few different ways. They do not need to be treated as regular expression patterns, although this is the most common way they are used. Here are the various ways rewrite conditions can be processed:

    <Pattern (is test string lower than pattern)
    >Pattern (is test string greater than pattern)
    =Pattern (is test string equal to pattern)
    -d (is test string a valid directory)
    -f (is test string a valid file)
    -s (is test string a valid file with size greater than zero)
    -l (is test string a symbolic link)
    -F (is test string a valid file, and accessible (via subrequest))
    -U (is test string a valid URL, and accessible (via subrequest))
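A common use of the file tests is to hand every request that does not correspond to a real file or directory to a single script. This is a sketch: index.php and its "path" parameter are hypothetical. A leading "!" negates a condition, so "!-f" means "is not an existing file".

```apache
# Only rewrite when the request is NOT an existing file...
RewriteCond %{REQUEST_FILENAME} !-f
# ...and NOT an existing directory (both conditions must match).
RewriteCond %{REQUEST_FILENAME} !-d
# QSA keeps any query string the visitor sent.
RewriteRule ^(.*)$ index.php?path=$1 [QSA,L]
```

Images, stylesheets and other real files are then served as normal, and only "virtual" URLs reach the script.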

Server Variables

Server variables are a selection of items you can test when writing rewrite conditions. This allows you to apply rules based on all sorts of request parameters, including browser identifiers, referring URL or a multitude of other strings. Variables are of the following format:
%{VARIABLE_NAME}

And "VARIABLE_NAME" can be replaced with any one of the following items:

    HTTP Headers
        HTTP_USER_AGENT
        HTTP_REFERER
        HTTP_COOKIE
        HTTP_FORWARDED
        HTTP_HOST
        HTTP_PROXY_CONNECTION
        HTTP_ACCEPT
    Connection Variables
        REMOTE_ADDR
        REMOTE_HOST
        REMOTE_USER
        REMOTE_IDENT
        REQUEST_METHOD
        SCRIPT_FILENAME
        PATH_INFO
        QUERY_STRING
        AUTH_TYPE
    Server Variables
        DOCUMENT_ROOT
        SERVER_ADMIN
        SERVER_NAME
        SERVER_ADDR
        SERVER_PORT
        SERVER_PROTOCOL
        SERVER_SOFTWARE
    Dates and Times
        TIME_YEAR
        TIME_MON
        TIME_DAY
        TIME_HOUR
        TIME_MIN
        TIME_SEC
        TIME_WDAY
        TIME
    Special Items
        API_VERSION
        THE_REQUEST
        REQUEST_URI
        REQUEST_FILENAME
        IS_SUBREQ
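As an illustration of server variables in conditions, here is a sketch of the hotlink protection mentioned in the "Conditions" section. example.com stands in for your own domain, and 192.0.2.10 is a documentation-only IP address.

```apache
# Allow requests with no Referer at all (some browsers, privacy tools and
# proxies strip it), and requests referred by your own pages.
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?example\.com/ [NC]
# Everything else asking for an image gets 403 Forbidden.
RewriteRule \.(gif|jpe?g|png)$ - [F,NC]

# Another variable: refuse all requests from one IP address.
RewriteCond %{REMOTE_ADDR} ^192\.0\.2\.10$
RewriteRule ^ - [F]
```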

Working With Multiple Rules

The more complicated a site, the more complicated the set of rules governing it can be. This can be problematic when it comes to resolving conflicts between rules. You will find this issue rears its ugly head most often when you add a new rule to a file, and it doesn't work. What you may find, if the rule itself is not at fault, is that an earlier rule in the file is matching the URL and so the URL is not being tested against the new rule you've just added.
# Process product requests
RewriteRule ^([A-Za-z0-9-]+)/([A-Za-z0-9-]+)/?$ get_product_by_name.php?category_name=$1&product_name=$2 [NC,L]
# Process blog posts
RewriteRule ^([A-Za-z0-9-]+)/([A-Za-z0-9-]+)/?$ get_blog_post_by_title.php?category_name=$1&post_title=$2 [NC,L]
In the example above, the product pages of a site and the blog post pages have identical patterns. The second rule will never match a URL, because anything that would match that pattern will have already been matched by the first rule.

There are a few ways to work around this. Several CMSes (including WordPress) handle this by adding an extra portion to the URL to denote the type of request, like so:
# Process product requests
RewriteRule ^products/([A-Za-z0-9-]+)/([A-Za-z0-9-]+)/?$ get_product_by_name.php?category_name=$1&product_name=$2 [NC,L]
# Process blog posts
RewriteRule ^blog/([A-Za-z0-9-]+)/([A-Za-z0-9-]+)/?$ get_blog_post_by_title.php?category_name=$1&post_title=$2 [NC,L]
You could also write a single PHP script to process all requests, which checks whether the second part of the URL matches a blog post or a product. I usually go for this option: while it may increase the load on the server slightly, it gives much cleaner URLs.
# Process product and blog requests
RewriteRule ^([A-Za-z0-9-]+)/([A-Za-z0-9-]+)/?$ get_product_or_blog_post.php?category_name=$1&item_name=$2 [NC,L]

There are certain situations where you can work around this issue by writing more precise rules and ordering your rules intelligently. Imagine a blog where there were two archives - one by topic and one by year.
# Get archive by topic
RewriteRule ^([A-Za-z0-9-]+)/?$ get_archives_by_topic.php?topic_name=$1 [NC,L]
# Get archive by year
RewriteRule ^([A-Za-z0-9-]+)/?$ get_archives_by_year.php?year=$1 [NC,L]

The above rules will conflict. Of course, years are numeric and always four digits long, so you can make that rule more precise. By running it first, the only type of conflict you could encounter would be a topic with a 4-digit number for a name.
# Get archive by year
RewriteRule ^([0-9]{4})/?$ get_archives_by_year.php?year=$1 [NC,L]
# Get archive by topic
RewriteRule ^([A-Za-z0-9-]+)/?$ get_archives_by_topic.php?topic_name=$1 [NC,L]
mod_rewrite

Apache's mod_rewrite comes as standard with most Apache hosting accounts, so if you're on shared hosting, you are unlikely to have to do anything. If you're managing your own box, then you most likely just have to turn on mod_rewrite. If you are using Apache 1.x, you will need to edit your httpd.conf file and remove the leading '#' from the following lines:
#LoadModule rewrite_module modules/mod_rewrite.so
#AddModule mod_rewrite.c

If you are using Apache2 on a Debian-based distribution, you need to run the following command and then restart Apache:
sudo a2enmod rewrite

Other distributions and platforms differ. If the above instructions are not suitable for your system, then Google is your friend. You may need to edit your apache2 configuration file and add "rewrite" to the "APACHE_MODULES" list, or edit httpd.conf, or even download and compile mod_rewrite yourself. For the majority, however, installation should be simple.
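If you are not sure mod_rewrite will be available everywhere a file is deployed, the rules can be wrapped in an IfModule block. This is a sketch reusing the parrots rule. In an .htaccess file, an unknown directive causes a 500 Internal Server Error. Inside the wrapper, the directives are simply skipped when the module is missing.

```apache
<IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteRule ^parrots/([A-Za-z0-9-]+)/?$ get_product_by_name.php?product_name=$1 [NC,L]
</IfModule>
```

The trade-off is that a missing module then fails silently (the pretty URLs just return 404s), so it can be easier to debug without the wrapper.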
ISAPI_Rewrite

ISAPI_Rewrite is a URL rewriting plugin for IIS based on mod_rewrite and is not free. It performs most of the same functionality as mod_rewrite, and there is a good quality ISAPI_Rewrite forum where most common questions are answered. As ISAPI_Rewrite works with IIS, installation is relatively simple - there are installation instructions available.

ISAPI_Rewrite rules go into a file named httpd.ini. Errors will go into a file named httpd.parse.errors by default.
Leading Slashes

I have found myself tripped up numerous times by leading slashes in URL rewriting systems. Whether they should be used in the pattern or the substitution section of a RewriteRule, or in a RewriteCond statement, is a constant source of frustration to me. This may be partly because I work with different URL rewriting engines, but I would advise being careful with leading slashes: if a rule is not working, that's often a good place to start looking. I never include leading slashes in mod_rewrite rules in .htaccess files and always include them in ISAPI_Rewrite. In mod_rewrite the difference comes down to context. In an .htaccess file, the URL path a RewriteRule pattern is matched against has its leading slash removed. In the main server or virtual host configuration, it starts with a slash.
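The two contexts side by side, as a sketch. The domain and file names are placeholders, and the first fragment assumes RewriteEngine On appears earlier in the same .htaccess file.

```apache
# Fragment 1: .htaccess in the document root. The path matched here is
# "old-url.htm", with no leading slash.
RewriteRule ^old-url\.htm$ /new-url.htm [R=301,L]

# Fragment 2: main server configuration. The same request's path is
# "/old-url.htm", so the pattern needs the leading slash.
<VirtualHost *:80>
    ServerName www.domain.com
    RewriteEngine On
    RewriteRule ^/old-url\.htm$ /new-url.htm [R=301,L]
</VirtualHost>
```

A pattern beginning with ^/? (an optional slash) matches in either context, which is handy for rules that might be moved between the two.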
Sample Rules

To redirect an old domain to a new domain:
RewriteCond %{HTTP_HOST} old_domain\.com [NC]
RewriteRule ^(.*)$ http://www.new_domain.com/$1 [L,R=301]
To redirect all requests missing "www" (yes www):
RewriteCond %{HTTP_HOST} ^domain\.com [NC]
RewriteRule ^(.*)$ http://www.domain.com/$1 [L,R=301]

To redirect all requests with "www" (no www):
RewriteCond %{HTTP_HOST} ^www\.domain\.com [NC]
RewriteRule ^(.*)$ http://domain.com/$1 [L,R=301]

Redirect old page to new page:
RewriteRule ^old-url\.htm$ http://www.domain.com/new-url.htm [NC,R=301,L]
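A sketch of a complete .htaccess combining two of the sample rules, with domain.com as a placeholder. RewriteEngine On switches rewriting on for the directory; without it, the rules are not applied.

```apache
RewriteEngine On

# Old page first, so a request for domain.com/old-url.htm goes straight
# to its final address in a single redirect.
RewriteRule ^old-url\.htm$ http://www.domain.com/new-url.htm [NC,R=301,L]

# Then add "www" to any other request that is missing it.
RewriteCond %{HTTP_HOST} ^domain\.com$ [NC]
RewriteRule ^(.*)$ http://www.domain.com/$1 [L,R=301]
```

If the order were reversed, that request would be redirected twice: once to add "www", and again to the new page.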


.htaccess Error Documents

In Apache, you can set up each directory on your server individually, giving them different properties or requirements for access. And while you can do this through normal Apache configuration, some hosts may wish to give users the ability to set up their own virtual server how they like. And so we have .htaccess files, a way to set Apache directives on a directory by directory basis without the need for direct server access, and without being able to affect other directories on the same server.

One up-side of this (amongst many) is that with a few short lines in an .htaccess file, you can tell your server that, for example, when a user asks for a page that doesn't exist, they are shown a customized error page instead of the bog-standard error page they've seen a million times before. If you visit http://www.addedbytes.com/random_made_up_address then you'll see this in action - instead of your browser's default error page, you see an error page sent by my server to you, telling you that the page you asked for doesn't exist.

This has a fair few uses. For example, my 404 (page not found) error page also sends me an email whenever somebody ends up there, telling me which page they were trying to find, and where they came from to find it - hopefully, this will help me to fix broken links without needing to trawl through mind-numbing error logs.

[Aside: If you set up your custom error page to email you whenever a page isn't found, remember that "/favicon.ico" requests failing doesn't mean that a page is missing. Internet Explorer 5 assumes everyone has a "favicon" and so asks the server for it. It's best to filter error messages about missing "/favicon.ico" files from your error logging, if you plan to do any.]

Setting up your .htaccess file is a piece of cake. First things first, open Notepad (or, better yet, EditPlus2: http://www.editplus.com/), and add the following to a new document:
ErrorDocument 404     /404.html

Next you need to save the file. You need to save it as ".htaccess". Not ".htaccess.txt", or "mysite.htaccess", just ".htaccess". I know it sounds strange, but that is what these files are: just .htaccess files. Nothing else. Happy? If not, take a look at this .htaccess guide (http://wsabstract.com/howto/htaccess.shtml), which also explains the naming convention of .htaccess in a little more depth. If you do use Notepad, you may need to rename the file after saving it, and you can do this before or after uploading the file to your server.

Now, create a page called 404.html, containing whatever you want a visitor to your site to see when they try to visit a page that doesn't exist. Now, upload both to your website, and type in a random, made-up address. You should, with any luck, see your custom error page instead of the traditional "Page Not Found" error message. If you do not see that, then there is a good chance your server does not support .htaccess, or it has been disabled. I suggest the next thing you do is check quickly with your server administrator that you are allowed to use .htaccess to serve custom error pages.
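If you manage the server yourself, whether .htaccess files are read at all is controlled by AllowOverride in the main configuration (httpd.conf or a site file, not the .htaccess itself). This is a sketch, and the directory path is a placeholder for your document root.

```apache
<Directory "/var/www/html">
    # FileInfo allows .htaccess files to use ErrorDocument and the
    # mod_rewrite directives. With "AllowOverride None", .htaccess
    # files in this directory are ignored entirely.
    AllowOverride FileInfo
</Directory>
```

If you also use mod_rewrite in .htaccess, Apache additionally requires Options FollowSymLinks (or SymLinksIfOwnerMatch) to be enabled for that directory.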

If all went well, and you are now viewing a custom 404 (page not found) error page, then you are well on your way to a complete set of error documents to match your web site. There are more errors out there, you know, not just missing pages. Of course, you can also use PHP, ASP or CFML pages as error documents - very useful for keeping track of errors.

You can customize these directives a great deal. For example, you can add directives for any of the status codes below, to show custom pages for any error the server may report. You can also, if you want, specify a full URL instead of a relative one. And if you are truly adventurous, you could even put the HTML to be displayed in case of an error directly into the .htaccess file, as below. In Apache 2.0 and later, enclose the message in double quotation marks at both ends, and don't use unescaped double quotes inside it (use single quotes for HTML attributes, as in the example). Apache 1.3 used a different convention: the message started with a single, unmatched double quote and had none at the end.
ErrorDocument 404 "Oops, that page was <b>not found</b>. Please try a different one or <a href='mailto:owner@site.com'>email the site owner</a> for assistance."
Server response codes

A server response code is a three-digit number sent by a server to a user in response to a request for a web page or document. It tells the user whether the request can be completed, whether the server needs more information, or whether the server cannot complete the request. Usually these codes are sent 'silently', so as a user you never see them. However, there are some common ones you may wish to set up error pages for. Most people will only ever need error pages for codes 400 (Bad Request), 401 (Unauthorized), 403 (Forbidden), 404 (Not Found) and 500 (Internal Server Error), and you would be wise to always have an error document for 404 errors at the very least.

It is also relatively important to ensure that any error page is over 512 bytes in size. Internet Explorer 5, when sent an error page of less than 512 bytes, will display its own default error document instead of yours. Feel free to use padding if this is an issue; personally, I'm not going to increase the size of a page because Internet Explorer 5 doesn't behave well.

In order to set up an error page for any other error codes, you simply add more lines to your .htaccess file. If you wanted to have error pages for the above five errors, your .htaccess file might look something like this:
ErrorDocument 400     /400.html
ErrorDocument 401     /401.html
ErrorDocument 403     /403.html
ErrorDocument 404     /404.html
ErrorDocument 500     /500.html
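A sketch of alternative forms for the same directive. These are variations on the lines above, not extra lines to add alongside them, and the file names and domain are hypothetical.

```apache
# A script can serve as the error document, e.g. to log or email details.
ErrorDocument 404 /errors/not-found.php

# A full URL also works, but Apache then sends the visitor a redirect,
# so the browser (and search engines) never see the original 500 status.
ErrorDocument 500 http://www.example.com/server-error.html

# Note: an ErrorDocument for 401 must point to a local path, never a full URL.
```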
