Web Server
A web server is software that processes requests via HTTP, the basic network
protocol used to distribute information on the World Wide Web (WWW). When the
web server receives an HTTP request, it responds with an HTTP response, such
as an HTML page.
Apache is one such web server. Other web servers available include Microsoft
IIS, Nginx, Lighttpd, Jigsaw, klone, Abyss web server, Oracle HTTP Server,
X5 webserver, Zeus webserver, IBM HTTP Server, Google web server, Oracle
iPlanet web server, and Redhat web server.
Apache
The Apache
HTTP Server is the world's most used web server software. Originally
based on the NCSA HTTPd server, development of Apache began in early
1995 after work on the NCSA code stalled. Apache played a key role in the
initial growth of the World Wide Web, quickly overtaking NCSA HTTPd as
the dominant HTTP server, and has remained the most popular since April
1996. In 2009, it became the first web server software to serve more than 100
million websites.
Apache is
developed and maintained by an open community of developers under the auspices
of the Apache Software Foundation. Most commonly used on
a Unix-like system (usually Linux), the software is
available for a wide variety of operating systems besides Unix,
including eComStation, Microsoft
Windows, NetWare, OpenVMS, OS/2, and TPF. Released under
the Apache License, Apache is free and open-source software.
Apache Version | Initial release | Latest release
1.3 | 1998-06-06 | 2010-02-03 (1.3.42)
2.0 | 2002-04-06 | 2013-07-10 (2.0.65)
2.2 | 2005-12-01 | 2015-07-17 (2.2.31)
2.4 | 2012-02-21 | 2015-12-14 (2.4.18)
Client (Web browser): A client connects to a server (Apache HTTP Server) with
the specified protocol (HTTP) and makes a request for a resource using the
URL-path.
Server (Apache HTTP Server): The server sends a response consisting of a
status code and, optionally, a response body. The status code indicates whether
the request was successful and, if not, what kind of error condition there was.
This tells the client what it should do with the response.
In order
to connect to a server, the client will first have to resolve the server name
to an IP address - the location on the Internet where the server resides. Thus,
in order for your web server to be reachable, it is necessary that the
servername be in DNS.
If you
don't know how to do this, you'll need to contact your network administrator,
or Internet service provider, to perform this step for you.
More than
one hostname may point to the same IP address, and more than one IP address can
be attached to the same physical server. Thus, you can run more than one web
site on the same physical server, using a feature called virtual hosts.
If you
are testing a server that is not Internet-accessible, you can put host names in
your hosts file in order to do local resolution. For example, you might want to
put a record in your hosts file to map a request for www.example.com to
your local system, for testing purposes.
This entry would look like:
127.0.0.1 www.example.com
A hosts
file will probably be located
at /etc/hosts or C:\Windows\system32\drivers\etc\hosts.
Features of Apache
1. Modules
2. Aliases
3. Virtual Hosting
Modules: One of Apache's key features is its modular construction. After
installation, you can add extra functionality quickly and easily by loading
modules, without having to recompile the source code. For example, you can
load mod_dir for basic directory handling or mod_auth to authenticate users
against a text file.
The modular approach makes it easy for third-party developers to add
functionality to the server. You can also customize Apache for a site by
developing your own modules using the Apache module API.
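As a rough illustration of loading modules at startup, the fragment below is a minimal sketch. It assumes a build with dynamic module support and module files under a modules/ directory relative to ServerRoot. The mod_auth line uses the Apache 1.3/2.0 module name; later releases split that functionality across other modules.

```apache
# Load mod_dir so that directory requests can be answered with an index file.
LoadModule dir_module modules/mod_dir.so

# Load mod_auth for text-file based user authentication (Apache 1.3/2.0 naming).
LoadModule auth_module modules/mod_auth.so
```

Because modules are loaded from the configuration file, adding or removing one only requires editing this list and restarting the server.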
Alias: Apache supports the use of aliases, which enable it to serve content
from file system locations other than those directly underneath the specified
document root.
As a result, an Apache server can reference any content on a computer, or even
on other computers, without having to move or duplicate the information.
Virtual Hosting: Virtual hosting is a useful feature of Apache that enables the
simultaneous hosting of multiple websites on a single computer.
Virtual hosting has many practical applications. For example, an ISP commonly
configures websites for different companies as virtual hosts, which removes the
need for a separate computer for each site.
From version 1.1 onwards, Apache supports both IP-based and name-based virtual
hosting.
The IP-based system establishes which virtual host Apache should serve by using
the connection's IP address, so it requires each virtual host domain to have a
dedicated IP address.
Name-based virtual hosting uses the host name to identify the virtual host,
which enables the use of one IP address for multiple virtual hosts.
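A name-based setup might look like the following sketch. The host names and document roots are placeholders chosen for illustration, and the NameVirtualHost line applies to Apache 2.0/2.2 only (Apache 2.4 no longer needs it).

```apache
# Required on Apache 2.0/2.2 for name-based hosting; unnecessary on 2.4.
NameVirtualHost *:80

# First site: selected when the request's Host header is www.example.com.
<VirtualHost *:80>
    ServerName www.example.com
    DocumentRoot "/var/www/example-com"
</VirtualHost>

# Second site on the same IP address and port.
<VirtualHost *:80>
    ServerName www.example.org
    DocumentRoot "/var/www/example-org"
</VirtualHost>
```

Both sites share one IP address. Apache chooses between them using the host name the client sends with each request.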
The httpd.conf file is the main configuration file in Apache.
The Apache Directory Structure:
The
Apache software is typically distributed into the following subdirectories:
cgi-bin | This is where many, if not all, of the interactive programs that you write will reside. These will be programs written with Perl, Java, or other programming languages.
conf | This directory will contain your configuration files.
htdocs | This directory will contain your actual hypertext documents. It will typically have many subdirectories. This directory is known as the DocumentRoot.
icons | This directory contains the icons (small images) that Apache will use when displaying information or error messages.
images | This directory will contain the image files (GIF or JPG) that you will use on your web site.
logs | This directory will contain your log files - the access_log and error_log files.
sbin | Use nogroup
Main configuration file in Apache:
1. The Apache software is configured by changing settings in several text files
in the Apache conf (configuration) directory.
2. There are four configuration files used by Apache. The main configuration
file is usually called httpd.conf.
access.conf | This is the security configuration file. It contains instructions about which users should be able to access what information.
httpd.conf | This is the server configuration file. It typically contains directives that affect how the server runs, such as the user and group IDs it should use when running, the location of other files, etc.
srm.conf | This is the resource configuration file. It contains directives that define where documents are found, how to change addresses to filenames, etc.
mime.types | A configuration file that relates filename extensions to file types.
httpd.conf File Sections
The httpd.conf file has 3 main sections.
Section 1: Global environment
The directives in this section affect the overall operation of the server, such
as the number of concurrent requests it can handle or where it can find its
configuration data.
AccessConfig /dev/null
ResourceConfig /dev/null
ServerType standalone
Specifies whether the Apache server should be run under the inetd daemon or as
a standalone server.
ServerRoot "/etc/httpd"
Do not put a slash at the end. Configuration and log files are stored in
subdirectories of this root directory.
PidFile run/httpd.pid
The file in which the server should record its process identification number
when it starts.
StartServers 5
The number of server processes that are run at start-up.
Timeout 300
Sets the period, in seconds, for which Apache waits during certain operations
before sending or receiving a timeout signal.
KeepAlive off
Whether or not to allow persistent connections (more than one request per
connection). Set to "off" to deactivate.
Section 2: Main Server Configuration
This section contains the directives for the main server. The values of these
directives are also used as the default values for virtual hosts unless the
virtual host section of the file specifies different values.
Port number: 80
User and Group: apache
ServerAdmin: root@localhost
ServerName: www.easynomad:80
DocumentRoot: "/var/www/html"
DirectoryIndex: http://www.easynomad.com/offers/
ErrorLog:
Alias:
ScriptAlias: /cgi-bin/ "/var/www/html" "/var/www/easynomad/cgi-bin"
Section 3: Virtual Hosts
This section enables you to set up virtual host containers so that one server
can host multiple sites.
<VirtualHost>
ServerAdmin:
DocumentRoot:
ServerName:
ErrorLog:
CustomLog:
</VirtualHost>
Apache Directives
General Configuration Tips
When configuring the Apache HTTP Server, edit /etc/httpd/conf/httpd.conf and
then either reload, restart, or stop and start the httpd process.
Before editing httpd.conf, make a copy of the original file. Creating a backup
makes it easier to recover from mistakes made while editing the configuration
file.
If a
mistake is made and the Web server does not work correctly, first review
recently edited passages in httpd.conf to verify there are no typos.
Next look
in the Web server's error log, /var/log/httpd/error_log. The error log may
not be easy to interpret, depending on your level of expertise. However, the
last entries in the error log should provide useful information.
The
following subsections contain a list of short descriptions for many of the
directives included in httpd.conf.
ServerRoot
The ServerRoot directive specifies the top-level directory under which the
server's configuration and log files are kept. By default, ServerRoot is set
to "/etc/httpd" for both secure and non-secure servers.
Timeout
Timeout defines, in seconds, the amount of time that the server waits for
receipts and transmissions during communications. Timeout is set to 300 seconds
by default, which is appropriate for most situations.
KeepAlive sets whether the server allows more than one request per connection
and can be used to prevent any one client from consuming too much of the
server's resources.
By default, KeepAlive is set to off. If KeepAlive is set to on and the server
becomes very busy, the server can quickly spawn the maximum number of child
processes. In this situation, the server slows down significantly. If KeepAlive
is enabled, it is a good idea to set KeepAliveTimeout low and monitor
the /var/log/httpd/error_log log file on the server. This log reports
when the server is running out of child processes.
MaxKeepAliveRequests
This directive sets the maximum number of requests allowed per persistent
connection. The Apache Project recommends a high setting, which improves the
server's performance. MaxKeepAliveRequests is set to 100 by default, which
should be appropriate for most situations.
KeepAliveTimeout sets
the number of seconds the server waits after a request has been served before
it closes the connection. Once the server receives a request,
the Timeout directive applies instead. KeepAliveTimeout is
set to 15 seconds by default.
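The three keep-alive directives are usually set together. The fragment below is a sketch of enabling persistent connections with a short idle timeout, as suggested above. The value 5 is an illustrative choice, not a documented recommendation.

```apache
# Allow persistent connections (the default described above is off).
KeepAlive On

# Requests allowed per persistent connection (the stated default).
MaxKeepAliveRequests 100

# Seconds to wait for the next request on an idle connection.
# Deliberately lower than the stated default of 15 (illustrative value).
KeepAliveTimeout 5
```

A short KeepAliveTimeout frees child processes sooner on a busy server, at the cost of more reconnects for slow clients.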
<IfModule> and </IfModule> tags create a conditional container whose directives
are only activated if the specified module is loaded. Directives within
the IfModule container are processed under one of two conditions: if the module
named in the starting <IfModule> tag is loaded or, when an exclamation
point [!] appears before the module name, only if the module specified in
the <IfModule> tag is not loaded.
In Apache HTTP Server 2.0, the responsibility for managing characteristics of
the server-pool falls to a module group called MPMs. The characteristics of the
server-pool differ depending upon which MPM is used. For this reason,
an IfModule container is necessary to define the server-pool for the MPM in use.
By default, Apache HTTP Server 2.0 defines the server-pool for both
the prefork and worker MPMs.
The following is a list of directives found within the MPM-specific server-pool
containers.
StartServers sets
how many server processes are created upon startup. Since the Web server
dynamically kills and creates server processes based on traffic load, it is not
necessary to change this parameter. The Web server is set to start 8 server
processes at startup for the prefork MPM and 2 for
the worker MPM.
MaxRequestsPerChild sets the total number of requests each child server process
serves before the child dies. The main reason for setting MaxRequestsPerChild is
to avoid memory leaks induced by long-lived processes. The
default MaxRequestsPerChild for the prefork MPM is 1000 and for
the worker MPM is 0.
MaxClients sets
a limit on the total number of server processes, or simultaneously connected
clients that can run at one time. The main purpose of this directive is to keep
a runaway Apache HTTP Server from crashing the operating system. For busy
servers this value should be set to a high value. The server's default is set
to 150 regardless of the MPM in use. However, it is not recommended that the
value for MaxClients exceeds 256 when using
the prefork MPM.
MinSpareServers and MaxSpareServers
These values are only used with the prefork MPM. They adjust how the Apache
HTTP Server dynamically adapts to the perceived load by maintaining an
appropriate number of spare server processes based on the number of incoming
requests. The server checks the number of servers waiting for a request and
kills some if there are more than MaxSpareServers or creates some if
the number of servers is less than MinSpareServers.
The
default MinSpareServers value is 5; the
default MaxSpareServers value is 20. These default settings
should be appropriate for most situations. Be careful not to increase
the MinSpareServers to a large number as doing so creates a heavy
processing load on the server even when traffic is light.
MinSpareThreads and MaxSpareThreads
These values are only used with the worker MPM. They adjust how the Apache
HTTP Server dynamically adapts to the perceived load by maintaining an
appropriate number of spare server threads based on the number of incoming
requests. The server checks the number of server threads waiting for a request
and kills some if there are more than MaxSpareThreads or creates some
if the number of server threads is less than MinSpareThreads.
The default MinSpareThreads value is 25; the default MaxSpareThreads value
is 75. These default settings should be appropriate for most situations. The
value for MaxSpareThreads must be greater than or equal to the sum
of MinSpareThreads and ThreadsPerChild, or the Apache HTTP Server
automatically corrects it.
ThreadsPerChild
This value is only used with the worker MPM. It sets the number of threads
within each child process. The default value for this directive is 25.
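Put together, the server-pool settings described above could be written as two conditional containers, one per MPM. This is a sketch that reuses the default values quoted in this section. The module identifiers prefork.c and worker.c follow the Apache 2.0 naming and are an assumption about the build in use.

```apache
# Applied only when the prefork MPM is compiled in or loaded.
<IfModule prefork.c>
    StartServers           8
    MinSpareServers        5
    MaxSpareServers       20
    MaxClients           150
    MaxRequestsPerChild 1000
</IfModule>

# Applied only when the worker MPM is compiled in or loaded.
<IfModule worker.c>
    StartServers           2
    MaxClients           150
    MinSpareThreads       25
    MaxSpareThreads       75
    ThreadsPerChild       25
    MaxRequestsPerChild    0
</IfModule>
```

Only one of the two containers takes effect on a given server, so the same httpd.conf can be shipped regardless of which MPM is in use.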
The Listen directive identifies the ports on which the Web server accepts
incoming requests. By default, the Apache HTTP Server is set to listen to
port 80 for non-secure Web communications and (in
the /etc/httpd/extra/conf/ssl.conf file which defines any secure servers) to
port 443 for secure Web communications.
If the Apache HTTP Server is configured to listen to a port under 1024, only
the root user can start it. For port 1024 and above, httpd can be started as a
regular user.
The Listen directive can also be used to specify particular IP addresses over
which the server accepts connections.
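For illustration, both forms of the directive are sketched below. The address 192.0.2.10 and port 8080 are placeholders, not values from any default configuration.

```apache
# Accept connections on port 80 on every network interface.
Listen 80

# Accept connections on port 8080, but only on one specific address.
Listen 192.0.2.10:8080
```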
Include allows
other configuration files to be included at runtime.
The path
to these configuration files can be absolute or relative to
the ServerRoot.
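A short sketch of both path styles follows. The conf.d directory and the file name are assumptions made for the example.

```apache
# Relative path: resolved against ServerRoot (for example /etc/httpd).
# The wildcard pulls in every matching file in that directory.
Include conf.d/*.conf

# Absolute path to a single file (hypothetical name).
Include /etc/httpd/conf/extra-settings.conf
```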
ExtendedStatus
The ExtendedStatus directive controls whether Apache generates basic (off) or
detailed server status information (on) when the server-status handler is
called. The server-status handler is called using Location tags.
IfDefine
The IfDefine tags
surround configuration directives that are applied if the "test"
stated in the IfDefine tag is true. The directives are ignored if the
test is false.
The test in the IfDefine tags is a parameter name (for example, HAVE_PERL). If
the parameter is defined, meaning that it is provided as an argument to the
server's start-up command, then the test is true and, when the Web server is
started, the directives contained in the IfDefine tags are applied.
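A minimal sketch of such a container is shown below. It assumes mod_perl is installed under modules/ and that the server is started with the parameter defined on the command line, for example with httpd -DHAVE_PERL.

```apache
# These directives are applied only when HAVE_PERL is defined at start-up
# (for example: httpd -DHAVE_PERL). Otherwise the block is skipped.
<IfDefine HAVE_PERL>
    LoadModule perl_module modules/mod_perl.so
</IfDefine>
```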
The User directive
sets the user name of the server process and determines what files the server
is allowed to access. Any files inaccessible to this user are also inaccessible
to clients connecting to the Apache HTTP Server.
By default, User is set to apache.
Group
Specifies the group name of the Apache HTTP Server processes.
By default, Group is set to apache.
Sets
the ServerAdmin directive to the email address of the Web server
administrator. This email address shows up in error messages on
server-generated Web pages, so users can report a problem by sending email to
the server administrator.
By
default, ServerAdmin is set to root@localhost.
A common
way to set up ServerAdmin is to set it to webmaster@example.com.
Then alias webmaster to the person responsible for the Web server
in /etc/aliases and run /usr/bin/newaliases.
ServerName specifies a hostname and port number (matching the Listen directive)
for the server. The ServerName does not need to match the machine's actual
hostname. For example, the Web server may be www.example.com, but the
server's hostname is actually foo.example.com. The value specified
in ServerName must be a valid Domain Name Service (DNS) name that can be
resolved by the system; do not make something up.
The following is a sample ServerName directive:
ServerName www.example.com:80
When specifying a ServerName, be sure the IP address and server name pair are
included in the /etc/hosts file.
UseCanonicalName
When set to on, this directive configures the Apache HTTP Server to reference
itself using the value specified in the ServerName and Port directives.
When UseCanonicalName is set to off, the server instead uses the value used by
the requesting client when referring to itself.
UseCanonicalName is set to off by default.
The DocumentRoot is
the directory which contains most of the HTML files which are served in
response to requests. The default DocumentRoot for both the non-secure and
secure Web servers is the /var/www/html directory. For example, the
server might receive a request for the following document:
http://example.com/foo.html
The
server looks for the following file in the default directory:
/var/www/html/foo.html
<Directory /path/to/directory> and </Directory> tags create a container used to
enclose a group of configuration directives which apply only to a specific
directory and its subdirectories. Any directive which is applicable to a
directory may be used within Directory tags.
By default,
very restrictive parameters are applied to the root directory (/), using
the Options and AllowOverride directives. Under this
configuration, any directory on the system which needs more permissive settings
has to be explicitly given those settings.
In the
default configuration, another Directory container is configured for
the DocumentRoot which assigns less rigid parameters to the directory
tree so that the Apache HTTP Server can access the files residing there.
The Directory container can also be used to configure additional cgi-bin
directories for server-side applications outside of the directory specified in
the ScriptAlias directive.
To accomplish this, the Directory container must set the ExecCGI option for
that directory.
For
example, if CGI scripts are located in /home/my_cgi_directory, add the
following Directory container to the httpd.conf file:
<Directory /home/my_cgi_directory>
    Options +ExecCGI
</Directory>
Next,
the AddHandler directive must be uncommented to identify files with the .cgi extension
as CGI scripts.
For this
to work, permissions for CGI scripts, and the entire path to the scripts, must
be set to 0755.
The Options directive
controls which server features are available in a particular directory. For
example, under the restrictive parameters specified for the root
directory, Options is set to only FollowSymLinks. No features
are enabled, except that the server is allowed to follow symbolic links in the
root directory.
By
default, in the DocumentRoot directory, Options is set to
include Indexes and FollowSymLinks. Indexes permits
the server to generate a directory listing for a directory if
no DirectoryIndex (for example, index.html) is
specified. FollowSymLinks allows the server to follow symbolic links
in that directory.
The AllowOverride directive
sets whether any Options can be overridden by the declarations in
an .htaccess file. By default, both the root directory and
the DocumentRoot are set to allow no .htaccess overrides.
The Order directive controls the order in which Allow and Deny directives are
evaluated. The server is configured to evaluate the Allow directives before
the Deny directives for the DocumentRoot directory.
Allow specifies
which client can access a given directory. The client can be all, a domain
name, an IP address, a partial IP address, a network/netmask pair, and so on.
The DocumentRoot directory is configured to Allow requests
from all, meaning everyone has access.
Deny works
similar to Allow, except it specifies who is denied access.
The DocumentRoot is not configured to Deny requests from
anyone by default.
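The defaults described for the root directory and the DocumentRoot can be sketched as two Directory containers. This follows the Apache 2.0/2.2 access-control syntax used in this article. On Apache 2.4 the Order, Allow, and Deny directives are superseded by Require, so treat this as an illustration for the older releases.

```apache
# Restrictive baseline for the whole file system.
<Directory />
    Options FollowSymLinks
    AllowOverride None
</Directory>

# More permissive settings for the published document tree.
<Directory "/var/www/html">
    Options Indexes FollowSymLinks
    AllowOverride None
    # Evaluate Allow rules first, then Deny rules.
    Order allow,deny
    # Everyone may access this directory.
    Allow from all
</Directory>
```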
UserDir is
the subdirectory within each user's home directory where they should place
personal HTML files which are served by the Web server. This directive is set
to disable by default.
The name
for the subdirectory is set to public_html in the default
configuration. For example, the server might receive the following request:
http://example.com/~username/foo.html
The
server would look for the file:
/home/username/public_html/foo.html
In the
above example, /home/username/ is the user's home directory (note
that the default path to users' home directories may vary).
Make sure that the permissions on the users' home directories are set
correctly. Users' home directories must be set to 0711. The read (r) and
execute (x) bits must be set on the users' public_html directories (0755 also
works). Files that are served from a user's public_html directory must be set
to at least 0644.
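As a sketch, enabling per-user directories with the subdirectory name used above might look like this. It assumes mod_userdir is available. The commented line shows the disabled default for comparison.

```apache
<IfModule mod_userdir.c>
    # Default described above: per-user directories are switched off.
    # UserDir disable

    # Serve requests for /~username/ from /home/username/public_html.
    UserDir public_html
</IfModule>
```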
The DirectoryIndex is
the default page served by the server when a user requests an index of a
directory by specifying a forward slash (/) at the end of the directory name.
When a
user requests the page http://example/this_directory/, they get either
the DirectoryIndex page if it exists or a server-generated directory
list. The default for DirectoryIndex is index.html and
the index.html.var type map. The server tries to find either of these
files and returns the first one it finds. If it does not find one of these
files and Options Indexes is set for that directory, the server
generates and returns a listing, in HTML format, of the subdirectories and
files within the directory, unless the directory listing feature is turned off.
AccessFileName names the file which the server should use for access control
information in each directory. The default is .htaccess.
Immediately after the AccessFileName directive, a set of Files tags applies
access control to any file beginning with .ht. These directives deny Web access
to any .htaccess files (or other files which begin with .ht) for security
reasons.
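The pairing described here is commonly written as follows. This is a sketch in the Apache 2.0/2.2 syntax used elsewhere in this article (Apache 2.4 would express the denial with Require).

```apache
# Name of the per-directory access control file.
AccessFileName .htaccess

# Refuse to serve any file whose name starts with ".ht".
<Files ~ "^\.ht">
    Order allow,deny
    Deny from all
</Files>
```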
CacheNegotiatedDocs
By default, the Web server asks proxy servers not to cache any documents which
were negotiated on the basis of content (that is, they may change over time or
because of the input from the requester). If CacheNegotiatedDocs is set to on,
this function is disabled and proxy servers are allowed to cache such
documents.
TypesConfig names the file which sets the default list of MIME type mappings
(file name extensions to content types). The default TypesConfig file
is /etc/mime.types. Instead of editing /etc/mime.types, the recommended way to
add MIME type mappings is to use the AddType directive.
DefaultType sets
a default content type for the Web server to use for documents whose MIME types
cannot be determined. The default is text/plain.
HostnameLookups can
be set to on, off or double. If HostnameLookups is
set to on, the server automatically resolves the IP address for each
connection. Resolving the IP address means that the server makes one or more
connections to a DNS server, adding processing overhead. If HostnameLookups is
set to double, the server performs a double-reverse DNS look up adding
even more processing overhead.
To
conserve resources on the server, HostnameLookups is set
to off by default.
If
hostnames are required in server log files, consider running one of the many
log analyzer tools that perform the DNS lookups more efficiently and in bulk
when rotating the Web server log files.
ErrorLog specifies
the file where server errors are logged. By default, this directive is set
to /var/log/httpd/error_log.
LogLevel sets how verbose the error messages in the error logs are. LogLevel can
be set (from least verbose to most verbose) to emerg, alert, crit, error, warn,
notice, info or debug. The default LogLevel is warn.
The LogFormat directive configures the format of the various Web server log
files. The actual LogFormat used depends on the settings given in
the CustomLog directive.
The following are the format options if the CustomLog directive is set
to combined:
%h (remote host's IP address or hostname)
Lists the remote IP address of the requesting client. If HostnameLookups is set
to on, the client hostname is recorded unless it is not available from DNS.
%l (rfc931)
Not used. A hyphen [-] appears in the log file for this field.
%u (authenticated user)
If authentication was required, the user name of the user is recorded. Usually,
this is not used, so a hyphen [-] appears in the log file for this field.
%t (date)
Lists the date and time of the request.
%r (request string)
Lists the request string exactly as it came from the browser or client.
%s (status)
Lists the HTTP status code which was returned to the client host.
%b (bytes)
Lists the size of the document.
\"%{Referer}i\" (referrer)
Lists the URL of the webpage which referred the client host to the Web server.
\"%{User-Agent}i\" (user-agent)
Lists the type of Web browser making the request.
CustomLog identifies
the log file and the log file format. By default, the log is recorded to
the /var/log/httpd/access_log file.
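The relationship between the two directives can be sketched as below: LogFormat defines a named format and CustomLog points a log file at it. The format string is the widely used combined definition. It uses %>s (the final status) where the field list above shows %s, and the relative log path is an assumption resolved against ServerRoot.

```apache
# Define a format nicknamed "combined" from the fields listed above.
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined

# Write the access log using that format.
CustomLog logs/access_log combined
```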
The default CustomLog format is combined. The following illustrates
the combined log file format:
remotehost rfc931 user date "request" status bytes referrer user-agent
The ServerSignature directive adds a line containing the Apache HTTP Server
version and the ServerName to any server-generated documents, such as error
messages sent back to clients. ServerSignature is set to on by default.
It can also be set to off or to EMail. EMail adds a mailto:ServerAdmin HTML tag
to the signature line of auto-generated responses.
The Alias setting
allows directories outside the DocumentRoot directory to be
accessible. Any URL ending in the alias automatically resolves to the alias'
path. By default, one alias for an icons/ directory is already set
up. An icons/ directory can be accessed by the Web server, but the
directory is not in the DocumentRoot.
The ScriptAlias directive defines where CGI scripts are located. Generally, it
is not good practice to leave CGI scripts within the DocumentRoot, where they
can potentially be viewed as text documents. For this reason, a special
directory outside of the DocumentRoot directory containing server-side
executables and scripts is designated by the ScriptAlias directive. This
directory is known as a cgi-bin and set to /var/www/cgi-bin/ by default.
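As an illustration of both mapping directives, the fragment below is a minimal sketch. The /var/www/icons/ path is an assumption. The cgi-bin path is the default mentioned above.

```apache
# Map the URL prefix /icons/ to a directory outside the DocumentRoot.
Alias /icons/ "/var/www/icons/"

# Map /cgi-bin/ to the script directory. Files requested through this
# prefix are executed as CGI programs rather than sent as documents.
ScriptAlias /cgi-bin/ "/var/www/cgi-bin/"
```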
Redirect
When a webpage is moved, Redirect can be used to map the old file location to a
new URL. The format is as follows:
Redirect /<old-path>/<file-name> http://<current-domain>/<current-path>/<file-name>
In this example, replace <old-path> with the old path information
for <file-name>, and <current-domain> and <current-path> with the current
domain and path information for <file-name>.
Any requests for <file-name> at the old location are automatically redirected
to the new location.
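With the placeholders filled in, a redirect might look like the sketch below. All paths and host names here are hypothetical.

```apache
# Send requests for the old page to its new URL.
Redirect /docs/old-page.html http://www.example.com/manual/new-page.html

# An optional status keyword marks the move as permanent (HTTP 301).
Redirect permanent /news/archive.html http://www.example.com/archive/index.html
```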
IndexOptions controls the appearance of server generated directory listings, by
adding icons, file descriptions, and so on. If Options Indexes is set, the Web
server generates a directory listing when the Web server receives an HTTP
request for a directory without an index.
First, the Web server looks in the requested directory for a file matching the
names listed in the DirectoryIndex directive (usually, index.html). If
an index.html file is not found, Apache HTTP Server creates an HTML directory
listing of the requested directory. The appearance of this directory listing is
controlled, in part, by the IndexOptions directive.
The
default configuration turns on FancyIndexing. This means that a user can
re-sort a directory listing by clicking on column headers. Another click on the
same header switches from ascending to descending
order. FancyIndexing also shows different icons for different files,
based upon file extensions.
The AddDescription option,
when used in conjunction with FancyIndexing, presents a short description
for the file in server generated directory listings.
IndexOptions has
a number of other parameters which can be set to control the appearance of
server generated directories. Parameters
include IconHeight and IconWidth, to make the server include
HTML HEIGHT and WIDTH tags for the icons in server
generated webpages; IconsAreLinks, for making the icons act as part of the
HTML link anchor along with the filename and others.
AddIconByEncoding
This directive names icons which are displayed by files with MIME encoding in
server generated directory listings. For example, by default, the Web server
shows the compressed.gif icon next to MIME encoded x-compress and x-gzip files
in server generated directory listings.
AddIconByType
This directive names icons which are displayed next to files with MIME types in
server generated directory listings. For example, the server shows the
icon text.gif next to files with a mime-type of text, in server generated
directory listings.
AddIcon specifies
which icon to show in server generated directory listings for files with
certain extensions. For example, the Web server is set to show the
icon binary.gif for files
with .bin or .exe extensions.
DefaultIcon specifies the icon displayed in server generated directory listings
for files which have no other icon specified. The unknown.gif image file is the
default.
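The icon directives described above might be combined as in the following sketch. The /icons/ URL prefix and the alternate text labels (CMP, TXT) are assumptions modelled on common default configurations.

```apache
# Turn on the sortable, icon-decorated listing style.
IndexOptions FancyIndexing

# Icon chosen by MIME encoding (compressed files).
AddIconByEncoding (CMP,/icons/compressed.gif) x-compress x-gzip

# Icon chosen by MIME type (any text type).
AddIconByType (TXT,/icons/text.gif) text/*

# Icon chosen by file extension.
AddIcon /icons/binary.gif .bin .exe

# Fallback icon when nothing else matches.
DefaultIcon /icons/unknown.gif
```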
AddDescription
When using FancyIndexing as an IndexOptions parameter,
the AddDescription directive can be used to display user-specified descriptions
for certain files or file types in a server generated directory listing.
The AddDescription directive supports listing specific files, wildcard
expressions, or file extensions.
ReadmeName names the file which, if it exists in the directory, is appended to
the end of server generated directory listings. The Web server first tries to
include the file as an HTML document and then tries to include it as plain
text. By default, ReadmeName is set to README.html.
HeaderName names the file which, if it exists in the directory, is prepended to
the start of server generated directory listings. Like ReadmeName, the server
tries to include it as an HTML document if possible or in plain text if not.
IndexIgnore lists
file extensions, partial file names, wildcard expressions or full filenames.
The Web server does not include any files which match any of those parameters
in server generated directory listings.
AddEncoding names
filename extensions which should specify a particular encoding
type. AddEncoding can also be used to instruct some browsers to
uncompress certain files as they are downloaded.
AddLanguage associates
file name extensions with specific languages. This directive is useful for
Apache HTTP Servers which serve content in multiple languages based on the
client Web browser's language settings.
LanguagePriority sets
precedence for different languages in case the client Web browser has no
language preference set.
AddType
Use the AddType directive to define or override a default MIME type and file
extension pair. The following example directive tells the Apache HTTP Server to
recognize the .tgz file extension:
AddType application/x-tar .tgz
AddHandler maps file extensions to specific handlers. For example,
the cgi-script handler can be matched with the extension .cgi to automatically
treat a file ending with .cgi as a CGI script. The following is a
sample AddHandler directive for the .cgi extension.
AddHandler cgi-script .cgi
This directive enables CGIs outside of the cgi-bin to function in any directory
on the server which has the ExecCGI option within the directory's container.
Action specifies
a MIME content type and CGI script pair, so that whenever a file of that media
type is requested, a particular CGI script is executed.
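For illustration, an Action directive could look like the sketch below. The script paths, the my-file-type handler name and the .xyz extension are hypothetical placeholders.

```apache
# Run a CGI script whenever a GIF image is requested (script path is hypothetical)
Action image/gif /cgi-bin/images.cgi

# Alternatively, tie a custom handler name to an extension, then to a script
AddHandler my-file-type .xyz
Action my-file-type /cgi-bin/program.cgi
```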
The ErrorDocument directive
associates an HTTP response code with a message or a URL to be sent back to the
client. By default, the Web server outputs a simple and usually cryptic error
message when an error occurs. The ErrorDocument directive forces the
Web server to instead output a customized message or page.
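A short sketch of the forms ErrorDocument accepts: a local URL-path, a literal message, or an external URL. The page names and the example.com host are placeholders.

```apache
# Serve a local page for "not found" errors
ErrorDocument 404 /missing.html

# Send a short literal message for forbidden requests
ErrorDocument 403 "Sorry, access to this resource is not permitted."

# Redirect the client to an external URL for server errors
ErrorDocument 500 http://www.example.com/server_error.html
```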
The BrowserMatch directive
allows the server to define environment variables and take appropriate actions
based on the User-Agent HTTP header field — which identifies the client's Web
browser type. By default, the Web server uses BrowserMatch to deny
connections to specific browsers with known problems and also to disable
keepalives and HTTP header flushes for browsers that are known to have problems
with those actions.
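The lines below are typical of the BrowserMatch entries found in stock configuration files. They are shown only as an example of the syntax (a User-Agent pattern followed by the environment variables to set).

```apache
# Disable keepalives for an old browser with a known keepalive bug
BrowserMatch "Mozilla/2" nokeepalive

# Also force HTTP/1.0 behaviour for a browser with a broken HTTP/1.1 implementation
BrowserMatch "MSIE 4\.0b2;" nokeepalive downgrade-1.0 force-response-1.0
```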
The <Location> and </Location> tags
create a container in which access control based on URL can be specified.
For
instance, to allow people connecting from within the server's domain to see
status reports, use the following directives:
<Location /server-status>
    SetHandler server-status
    Order deny,allow
    Deny from all
    Allow from <.example.com>
</Location>
Replace <.example.com> with
the second-level domain name for the Web server.
To
provide server configuration reports (including installed modules and
configuration directives) to requests from inside the domain, use the following
directives:
<Location /server-info>
    SetHandler server-info
    Order deny,allow
    Deny from all
    Allow from <.example.com>
</Location>
Again,
replace <.example.com> with the second-level domain name for
the Web server.
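The Order, Deny and Allow directives used above belong to Apache 2.2 and earlier. On Apache 2.4 the same restriction is normally written with Require. A minimal sketch, assuming mod_status is loaded and using example.com as a placeholder domain:

```apache
<Location /server-status>
    SetHandler server-status
    # Apache 2.4 style: only hosts within example.com may view the report
    Require host example.com
</Location>
```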
To
configure the Apache HTTP Server to function as a proxy server, remove the hash
mark (#) from the beginning of the <IfModule mod_proxy.c> line,
the ProxyRequests line, and each line in the <Proxy> stanza. Set
the ProxyRequests directive to On, and set which domains are
allowed access to the server in the Allow from directive of
the <Proxy> stanza.
<Proxy *> and </Proxy> tags create a container which encloses
a group of configuration directives meant to apply only to the proxy server.
Many directives which are allowed within
a <Directory> container may also be used
within a <Proxy> container.
The ProxyVia directive
controls whether or not an HTTP Via: header line is sent along with requests or
replies which go through the Apache proxy server. The Via: header shows the
hostname if ProxyVia is set to On, shows the hostname and the
Apache HTTP Server version for Full, passes along any Via: lines unchanged
for Off, and Via: lines are removed for Block.
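Once uncommented, the proxy section might look like the sketch below. It uses the Apache 2.2 access-control syntax found elsewhere in this document, and .example.com is a placeholder for the domain allowed to use the proxy.

```apache
<IfModule mod_proxy.c>
    # Act as a forward proxy
    ProxyRequests On

    <Proxy *>
        Order deny,allow
        Deny from all
        # Placeholder: only clients in this domain may use the proxy
        Allow from .example.com
    </Proxy>

    # Add a Via: header naming this host to proxied requests and replies
    ProxyVia On
</IfModule>
```

Leaving ProxyRequests On without a restrictive <Proxy> block turns the server into an open proxy, so the Deny/Allow lines matter as much as the directive itself.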
A number
of commented cache directives are supplied by the default Apache HTTP Server
configuration file. In most cases, uncommenting these lines by removing the
hash mark (#) from the beginning of the line is sufficient. The following,
however, is a list of some of the more important cache-related directives.
CacheEnable —
Specifies whether the cache is a disk, memory, or file descriptor cache. By
default CacheEnable configures a disk cache for URLs at or
below /.
CacheRoot —
Specifies the name of the directory containing cached files. The
default CacheRoot is the /var/httpd/proxy/ directory.
CacheSize —
Specifies how much space the cache can use in kilobytes. The
default CacheSize is 5 KB.
The
following is a list of some of the other common cache-related directives.
CacheMaxExpire —
Specifies how long HTML documents are retained (without a reload from the
originating Web server) in the cache. The default is 24 hours
(86400 seconds).
CacheLastModifiedFactor —
Specifies the creation of an expiry (expiration) date for a document which did
not come from its originating server with its own expiry set. The
default CacheLastModifiedFactor is set to 0.1, meaning that the
expiry date for such documents equals one-tenth of the amount of time since the
document was last modified.
CacheDefaultExpire —
Specifies the expiry time in hours for a document that was received using a
protocol that does not support expiry times. The default is set
to 1 hour (3600 seconds).
NoProxy —
Specifies a space-separated list of subnets, IP addresses, domains, or hosts
whose content is not cached. This setting is most useful for Intranet sites.
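A hedged sketch of how several of these cache directives could be set explicitly. It assumes Apache 2.2 with mod_cache and mod_disk_cache loaded, where the expiry values are given in seconds. Older releases and Apache 2.4 (which names the module mod_cache_disk) differ in detail, so check the documentation for your version.

```apache
<IfModule mod_disk_cache.c>
    # Cache everything at or below / on disk
    CacheEnable disk /
    # Directory holding the cached files (path taken from the text above)
    CacheRoot /var/httpd/proxy/
    # Keep documents for at most 24 hours without checking the origin server
    CacheMaxExpire 86400
    # Fallback expiry: one-tenth of the time since the last modification
    CacheLastModifiedFactor 0.1
    # Expiry for documents that arrive with no expiry information at all
    CacheDefaultExpire 3600
</IfModule>
```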
The NameVirtualHost directive
associates an IP address and port number, if necessary, for any name-based
virtual hosts. Name-based virtual hosting allows one Apache HTTP Server to
serve different domains without using multiple IP addresses.
To enable
name-based virtual hosting, uncomment the NameVirtualHost configuration
directive and add the correct IP address. Then add
more VirtualHost containers for each virtual host.
<VirtualHost> and </VirtualHost> tags
create a container outlining the characteristics of a virtual host.
The VirtualHost container accepts most configuration directives.
A
commented VirtualHost container is provided in httpd.conf, which
illustrates the minimum set of configuration directives necessary for each
virtual host.
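A minimal sketch of two name-based virtual hosts sharing one address. The host names and document roots are placeholders. NameVirtualHost is needed on Apache 2.2 and earlier; Apache 2.4 no longer requires it.

```apache
# Accept name-based virtual hosts on port 80 of every address
NameVirtualHost *:80

<VirtualHost *:80>
    ServerName www.example.com
    DocumentRoot /var/www/example-com
</VirtualHost>

<VirtualHost *:80>
    ServerName www.example.org
    DocumentRoot /var/www/example-org
</VirtualHost>
```

The first VirtualHost listed also acts as the default for requests whose Host header matches none of the ServerName values.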
SetEnvIf sets
environment variables based on the headers of incoming connections. It
is not solely an SSL directive, though it is present in the
supplied /etc/httpd/extra/conf/ssl.conf file. Its purpose in this
context is to disable HTTP keepalive and to allow SSL to close the connection
without a close notify alert from the client browser. This setting is necessary
for certain browsers that do not reliably shut down the SSL connection.
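The SetEnvIf entry shipped in stock SSL configuration files usually resembles the following. It is shown as an example of the syntax (header name, pattern, then the variables to set) rather than as a recommended setting.

```apache
# For matching browsers: no keepalive, tolerate an unclean SSL shutdown,
# and fall back to HTTP/1.0 behaviour
SetEnvIf User-Agent ".*MSIE.*" \
    nokeepalive ssl-unclean-shutdown \
    downgrade-1.0 force-response-1.0
```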
Apachectl Commands
apachectl: This is short for Apache server control interface, a utility that
helps administrators manage the httpd daemon.
This
utility includes a variety of commands for starting, stopping, checking httpd
status and running syntax tests.
start:
The start command starts the httpd daemon. An error message displays if httpd is already
running.
restart:
If httpd is running, the restart command restarts the daemon, automatically checking the
configuration files as in configtest to make sure the daemon doesn't die. If
the daemon is not running, this command will start it.
graceful:
This command will start the httpd daemon if it is not running. If it is running, it allows current
connections to complete before restarting the httpd daemon.
configtest:
This command carries out a configuration syntax test. It parses the configuration files and
returns either Syntax OK or detailed information about the syntax error.
While this command can't check whether the configuration files do what you expect them
to do, it does make sure all configuration syntax is correct.
fullstatus:
This command provides a full status report from mod_status. You need both a text-based
browser such as lynx and mod_status installed on the server if you want
to use this command to report on the web server's status.
status:
This command provides a brief status report similar to fullstatus.
Apache Redirects
What you are trying to accomplish here is to have one resource (either a
page or an entire site) redirect a visitor to a completely different page or
site, and while doing so tell the visitor's browser that the redirect is either
permanent (301) or temporary (302).
Therefore you need to do three things:
Have 2 resources - one source page or website, and one destination page
or website.
When an attempt to access the source resource is made, the webserver
transfers the visitor to the destination instead.
During the transfer, the webserver reports to the visitor that a
redirect is happening and it's either temporary or permanent.
The ability to control the "status" argument in the redirect
directive (which sets whether it's a 301 or 302) within Apache is only
available in version 1.2 and above. You are best off using version 2 or above
for maximum stability, security and usefulness.
301 Redirect
A function of a web server that redirects the visitor from the current
page or site to another page or site, while returning a response code that says
that the original page or site has been permanently moved to the new
location. Search engines like this information and will readily transfer link popularity
(and PageRank) to the new site quickly and with few issues. They are also not
as likely to cause issues with duplication filters. SEOs like 301 redirects,
and they are usually the preferred way to deal with multiple domains pointing
at one website.
302 Redirect
A function of a web server that redirects the visitor from the current
page or site to another page or site, while returning a response code that says
that the original page or site has been temporarily moved to the new
location. Search engines will often interpret these as a park, and take their
time figuring out how to handle the setup. Try to avoid a 302 redirect on your
site if you can (unless it truly is only a temporary redirect), and never use
them as some form of click tracking for your outgoing links, as they can result
in a "website hijacking" under some circumstances.
mod_rewrite
mod_rewrite is an Apache extension module which allows URLs to be
rewritten on the fly. Often this is used by SEOs to convert dynamic URLs with
multiple query strings into static URLs. An example of this would be to
convert the dynamic URL
domain.com/search.php?day=31&month=may&year=2005 to
domain.com/search-31-may-2005.htm
htaccess
htaccess (Hypertext Access) is the default name of Apache's
directory-level configuration file. It provides the ability to customize
configuration directives defined in the main configuration file. You can
execute a mod_rewrite script using the .htaccess file.
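As a sketch of what such a mod_rewrite script might contain, the rule below maps the static-looking search URL from the mod_rewrite entry back onto the dynamic script. It assumes the file sits in the site's root .htaccess and that search.php accepts the day, month and year parameters shown in that entry.

```apache
RewriteEngine On
# search-31-may-2005.htm  ->  search.php?day=31&month=may&year=2005
RewriteRule ^search-([0-9]+)-([a-z]+)-([0-9]+)\.htm$ search.php?day=$1&month=$2&year=$3 [NC,L]
```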
httpd.conf
Apache is configured by placing directives in plain text configuration
files. The main configuration file is usually called httpd.conf. The location
of this file is set at compile-time, but may be overridden with the -f command
line flag. In addition, other configuration files may be added using the
Include directive, and wildcards can be used to include many configuration
files. Any directive may be placed in any of these configuration files. Changes
to the main configuration files are only recognized by Apache when it is
started or restarted.
Redirection (302)
A default redirection function of IIS that redirects the visitor from
the current page or site to another page or site, while returning a response
code that says that the original page or site has been temporarily moved
to the new location. Search engines will often interpret these as a park, and
take their time figuring out how to handle the setup. Try to avoid a 302
redirect on your site if you can (unless it truly is only a temporary redirect),
and never use them as some form of click tracking for your outgoing links, as
they can result in a "website hijacking" under some circumstances.
Permanent Redirection (301)
An optional function of IIS that redirects the visitor from the current
page or site to another page or site, while returning a response code that says
that the original page or site has been permanently moved to the new
location. Search engines like this information and will readily transfer link
popularity (and PageRank) to the new site quickly and with few issues. They are
also not as likely to cause issues with duplication filters. SEOs like 301
redirects, and they are usually the preferred way to deal with multiple domains
pointing at one website.
Mod_Rewrite and the Apache Redirect
If you have the mod_rewrite extension installed (it comes with most
Apache installs as a default) you can use it to dynamically change URLs using
arguments on the fly - this is NOT a 301 redirect, but rather it's related
behavior. For example, if you wanted to redirect .htm files from an
old server to their equivalent .php files on a new one using a 301
redirect, you would use a combination of mod_rewrite and the redirect directive
to do the redirection + URL change.
You could do it on a file by file basis by making a really long list of
possible redirects in the .htaccess file by hand without mod_rewrite, but that
would be a real pain on a server with a lot of files, or a completely dynamic
system. Therefore these 2 functions are often used together.
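A minimal sketch of the .htm to .php case described above, assuming the rule is placed in the root .htaccess of the old server and that www.newdomain.com stands in for the new one:

```apache
RewriteEngine On
# /about.htm on the old server -> http://www.newdomain.com/about.php (301)
RewriteRule ^(.*)\.htm$ http://www.newdomain.com/$1.php [R=301,L]
```

One pattern covers every .htm file, which is the point of pairing mod_rewrite with a 301 instead of listing each file by hand.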
Syntax for a 301 Redirect
The syntax for the redirect directive is:
Redirect /yourdirectory http://www.newdomain.com/newdirectory
If the client requests http://www.olddomain.com/yourdirectory/foo.txt, it will be
told to access http://www.newdomain.com/newdirectory/foo.txt instead.
Note: Redirect directives take precedence over Alias and
ScriptAlias directives, irrespective of their ordering in the configuration
file. Also, URL-path must be an absolute path beginning with a slash, not a relative
path, even when used with .htaccess files or inside of <Directory>
sections.
If you use the redirect without the status argument, it will return a
status code of 302 by default. This default behaviour has given me problems
over the years as an SEO, so it's important to remember to use it, like this:
Redirect permanent /one http://www.newdomain.com/two
or
Redirect 301 /two http://www.newdomain.com/other
Both of which will return the 301 status code. If you wanted to return a
302 you could either not specify anything, or use "302" or
"temp" as the status argument above.
You can also use 2 other directives - RedirectPermanent URL-path
URL (returns a 301 and works the same as Redirect permanent URL-path URL)
and RedirectTemp URL-path URL (same, but for a 302 status).
For more global changes, you would use RedirectMatch, with the same
syntax:
RedirectMatch 301 ^(.*)$ http://www.newdomain.com$1
or
RedirectMatch permanent ^(.*)$ http://www.newdomain.com$1
These arguments will match any file requested at the old account, change
the domain, and redirect it to the file of the same name at the new account.
The $1 in the target is what carries the requested path over; without it,
every request would land on the new home page.
You would use these directives in either the .htaccess file or the httpd.conf
file. It's most common to do it in the .htaccess file because it's the easiest
and doesn't require a restart, but the httpd.conf method has less overhead and works
fine, as well.
Simple Domain 301 Redirect Checklist
This assumes you just have a new domain (with no working pages under it)
and want it to redirect properly to your main domain.
1. Ensure that you have 2 accounts - the old site and the new site
(they do not have to be on different IP's or different machines).
2. Your main (proper or canonical) site should be pointed at the
new site using DNS. All your other domains should be pointed at the old site
using DNS. Parking them there is fine at this point.
3. Find the .htaccess file at the root of your old account. Yes, it
starts with a "." We will be working with this file. The new site does
not need any changes made to it - the old site does all the redirection work.
4. Download the .htaccess file and open it in a text only editor.
5. Add this code:
Redirect 301 / http://www.newdomain.com/
6. Then upload the file to your root folder and test your new
redirect. Make sure you also check it using an HTTP header viewer just to be sure
it shows as a 301.
Control Panel Method
cPanel redirect
Log into your cPanel, and look for "Redirects" under Site
Management
Put the current directory into the first box
Put the new directory in the second box
Choose the type (temporary or permanent) temporary=302 and permanent=301
Click "Add" and you're done
You can only do 302 redirects (or frame forwarding - bad!) using
the Plesk control panel - use .htaccess for 301's instead.
If you use Ensim, the only way to redirect is by using the
.htaccess file (no control panel option at this time).
Basic Old Website to New Website Redirection
This is used when you have an existing website (with pages) and want to
move it to a new domain, while keeping all your page names and the links to
them.
1. Ensure that you have 2 websites - the old site and the new site,
and that they are on different accounts (they do not have to be on different
IP's or different machines).
2. Your main (proper or canonical) site should be pointed at the
new site using DNS. All your old domains should be pointed at the old site
using DNS.
3. Find the .htaccess file at the root of your old account. Yes, it
starts with a "." We will be working with this file. The new
site does not need any changes made to it - the old site does all the
redirection work.
4. Download the .htaccess file and open it in a text only editor.
5a. If you have mod_rewrite installed, add this code:
Options +FollowSymLinks
RewriteEngine on
RewriteCond %{HTTP_HOST} !^newdomain\.com
RewriteRule ^(.*)$ http://www.newdomain.com/$1 [R=301,L]
5b. If you don't have mod_rewrite installed, you really should. If
you can't install it, then you can use this code instead:
RedirectMatch 301 ^(.*)$ http://www.newdomain.com$1
6. Then upload the file to your root folder and test your new
redirect. Make sure you also check it using an HTTP header viewer just to be sure
it shows as a 301.
FrontPage on Apache
After you've done the basic Apache 301 redirection described in this
article, you will also need to change the .htaccess files in:
_vti_bin
_vti_bin/_vti_adm
_vti_bin/_vti_aut
Replace "Options None" with "Options +FollowSymLinks"
Those folders are part of your FrontPage extensions on the server, so
you will have to use FTP to get to them, since FrontPage hides these folders by
default to prevent them from accidentally being messed with by novice users.
More Complicated Redirects
You can't use a control panel in Apache currently for these - .htaccess
only.
Redirecting everything to a single page
This is common when you are totally changing the new website from the
old and you just want all your links and requests from the old site to be
directed to a spot on your new site (usually the home page). You actually need
to do it on a page by page basis.
Redirect 301 /oldfile1.htm http://www.newdomain.com
Redirect 301 /oldfile2.htm http://www.newdomain.com
Redirect 301 /oldfile3.htm http://www.newdomain.com
Redirection while changing the filename
This example will redirect all the files on the old account that end in
html to the same file on the new account, but with a php extension. You can
also use this technique within the same account if you want to change
all your extensions but don't want to lose your incoming links to the old
pages. This is common when people switch from static htm files to
dynamic ones while keeping the same domain name, for example.
Just change the "html" and "php" parts of the below
example to your specific situation, if needed.
RedirectMatch 301 (.*)\.html$ http://www.newdomain.com$1.php
Redirection while changing the filename, but keeping the GET arguments
Sometimes, you will want to change to a different CMS, but keep your
database the same, or you want to switch everything but you like the arguments
and don't want to change them.
RedirectMatch 301 /oldcart.php(.*) http://www.newdomain.com/newcart.php$1
This will result in
"http://www.olddomain.com/oldcart.php?Cat_ID=Blue" being redirected
to "http://www.newdomain.com/newcart.php?Cat_ID=Blue"
URL Rewriting
Most dynamic sites include variables in their URLs that tell the site
what information to show the user. Typically, this gives URLs like the
following, telling the relevant script on a site to load product number 7.
http://www.pets.com/show_a_product.php?product_id=7
The problems with this kind of URL structure are that the URL is not at
all memorable. It's difficult to read out over the phone (you'd be surprised
how many people pass URLs this way). Search engines and users alike get no
useful information about the content of a page from that URL. You can't tell
from that URL that that page allows you to buy a Norwegian Blue Parrot (lovely
plumage). It's a fairly standard URL - the sort you'd get by default from most
CMSes. Compare that to this URL:
http://www.pets.com/products/7/
Clearly a much cleaner and shorter URL. It's much easier to remember,
and vastly easier to read out. That said, it doesn't exactly tell anyone what
it refers to. But we can do more:
http://www.pets.com/parrots/norwegian-blue/
Now we're getting somewhere. You can tell from the URL, even when it's
taken out of context, what you're likely to find on that page. Search engines
can split that URL into words (hyphens in URLs are treated as spaces by search
engines, whereas underscores are not), and they can use that information to
better determine the content of the page. It's an easy URL to remember and to
pass to another person.
Unfortunately, the last URL cannot be easily understood by a server
without some work on our part. When a request is made for that URL, the server
needs to work out how to process that URL so that it knows what to send back to
the user. URL rewriting is the technique used to "translate" a URL
like the last one into something the server can understand.
Platforms and Tools
Depending on the software your server is running, you may already have
access to URL rewriting modules. If not, most hosts will enable or install the
relevant modules for you if you ask them very nicely.
Apache is the easiest system to get URL rewriting running on. It usually
comes with its own built-in URL rewriting module, mod_rewrite, enabled, and
working with mod_rewrite is as simple as uploading correctly formatted and
named text files.
IIS, Microsoft's server software, doesn't include URL rewriting
capability as standard, but there are add-ons out there that can provide this
functionality. ISAPI_Rewrite is the one I recommend working with, as I've so
far found it to be the closest to mod_rewrite's functionality. Instructions for
installing and configuring ISAPI_Rewrite can be found at the end of this
article.
The code that follows is based on URL rewriting using mod_rewrite.
Basic URL Rewriting
To begin with, let's consider a simple example. We have a website, and
we have a single PHP script that serves a single page. Its URL is:
http://www.pets.com/pet_care_info_07_07_2008.php
We want to clean up the URL, and our ideal URL would be:
http://www.pets.com/pet-care/
In order for this to work, we need to tell the server to internally
redirect all requests for the URL "pet-care" to
"pet_care_info_07_07_2008.php". We want this to happen internally,
because we don't want the URL in the browser's address bar to change.
To accomplish this, we need to first create a text document called
".htaccess" to contain our rules. It must be named exactly that (not
".htaccess.txt" or "rules.htaccess"). This would be placed
in the root directory of the server (the same folder as
"pet_care_info_07_07_2008.php" in our example). There may already be
an .htaccess file there, in which case we should edit that rather than
overwrite it.
The .htaccess file is a configuration file for the server. If there are
errors in the file, the server will display an error message (usually with an
error code of "500"). If you are transferring the file to the server
using FTP, you must make sure it is transferred using the ASCII mode, rather
than BINARY. We use this file to perform 2 simple tasks in this instance -
first, to tell Apache to turn on the rewrite engine, and second, to tell Apache
what rewriting rule we want it to use. We need to add the following to the
file:
# Turn on the rewriting engine
RewriteEngine On
# Handle requests for "pet-care"
RewriteRule ^pet-care/?$ pet_care_info_07_07_2008.php [NC,L]
A couple of quick items to note - a line that starts with a hash symbol in
an .htaccess file is ignored as a comment, and I'd recommend you use comments
liberally (Apache does not allow a comment on the same line as a directive, so
each comment goes on a line of its own); and the "RewriteEngine" line should
only be used once per .htaccess file (please note that I've not included this
line from here onwards in code examples).
The "RewriteRule" line is where the magic happens. The line
can be broken down into 4 parts:
RewriteRule - Tells Apache that this line refers to a
single RewriteRule.
^pet-care/?$ - The "pattern". The server
will check the URL of every request to the site to see if this pattern matches.
If it does, then Apache will swap the URL of the request for the "substitution"
section that follows.
pet_care_info_07_07_2008.php - The
"substitution". If the pattern above matches the request, Apache uses
this URL instead of the requested URL.
[NC,L] - "Flags", that tell Apache how to
apply the rule. In this case, we're using two flags. "NC", tells
Apache that this rule should be case-insensitive, and "L" tells
Apache not to process any more rules if this one is used.
The comment line above the rule (# Handle requests for "pet-care") explains
what the rule does (optional but recommended).
The rule above is a simple method for rewriting a single URL, and is the
basis for almost all URL rewriting rules.
Patterns and Replacements
The rule above allows you to redirect requests for a single URL, but the
real power of mod_rewrite comes when you start to identify and rewrite groups
of URLs based on patterns they contain.
Let's say you want to change all of your site URLs as described in the
first pair of examples above. Your existing URLs look like this:
http://www.pets.com/show_a_product.php?product_id=7
And you want to change them to look like this:
http://www.pets.com/products/7/
Rather than write a rule for every single product ID, you of course
would rather write one rule to manage all product IDs. Effectively you want to
change URLs of this format:
http://www.pets.com/show_a_product.php?product_id={a number}
And you want to change them to look like this:
http://www.pets.com/products/{a number}/
In order to do so, you will need to use "regular expressions".
These are patterns, defined in a specific format that the server can understand
and handle appropriately. A typical pattern to identify a number would look
like this:
[0-9]+
The square brackets contain a range of characters, and "0-9"
indicates all the digits. The plus symbol indicates that the pattern will
identify one or more of whatever precedes the plus - so this pattern effectively
means "one or more digits" - exactly what we're looking to find in
our URL.
The entire "pattern" part of the rule is treated as a regular
expression by default - you don't need to turn this on or activate it at all.
# Handle product requests
RewriteRule ^products/([0-9]+)/?$ show_a_product.php?product_id=$1 [NC,L]
The first thing I hope you'll notice is that we've wrapped our pattern
in brackets. This allows us to "back-reference" (refer back to) that
section of the URL in the following "substitution" section. The
"$1" in the substitution tells Apache to put whatever matched the
earlier bracketed pattern into the URL at this point. You can have lots of
backreferences, and they are numbered in the order they appear.
And so, this RewriteRule will now mean that Apache internally rewrites all requests
for domain.com/products/{number}/ to show_a_product.php?product_id={same
number}.
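To illustrate numbered backreferences, here is a sketch with two bracketed groups. The category parameter and the /products/{category}/{number}/ URL layout are hypothetical and only serve to show $1 and $2 being filled in order.

```apache
# /products/parrots/7/ -> show_a_product.php?category=parrots&product_id=7
RewriteRule ^products/([a-z-]+)/([0-9]+)/?$ show_a_product.php?category=$1&product_id=$2 [NC,L]
```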
Regular Expressions
A complete guide to regular expressions is rather beyond the scope of
this article. However, important points to remember are that the entire pattern
is treated as a regular expression, so always be careful of characters that are
"special" characters in regular expressions.
The most common instance of this is when people use a period in their pattern.
In a pattern, this actually means "any character" rather than a
literal period, and so if you want to match a period (and only a period) you
will need to "escape" the character - precede it with another special
character, a backslash, that tells Apache to take the next character to be
literal.
For example, this RewriteRule will not just match the URL
"rss.xml" as intended - it will also match "rss1xml",
"rss-xml" and so on.
# Change feed URL
RewriteRule ^rss.xml$ rss.php [NC,L]
This does not usually present a serious problem, but escaping characters
properly is a very good habit to get into early. Here's how it should look:
# Change feed URL
RewriteRule ^rss\.xml$ rss.php [NC,L]
This only applies to the pattern, not to the substitution. Other
characters that require escaping (referred to as "metacharacters")
follow, with their meaning in brackets afterwards:
. (any character)
* (zero or more of the preceding)
+ (one or more of the preceding)
{} (minimum to maximum quantifier)
? (zero or one of the preceding, or ungreedy modifier after a quantifier)
! (at start of string means "negative pattern")
^ (start of string, or "negative" if at the start of a range)
$ (end of string)
[] (match any of contents)
- (range if used between square brackets)
() (group, backreferenced group)
| (alternative, or)
\ (the escape character itself)
Using regular expressions, it is possible to search for all sorts of
patterns in URLs and rewrite them when they match. Time for another example -
we wanted earlier to be able to identify this URL and rewrite it:
http://www.pets.com/parrots/norwegian-blue/
And we want to be able to tell the server to interpret this as the
following, but for all products:
http://www.pets.com/get_product_by_name.php?product_name=norwegian-blue
And we can do that relatively simply, with the following rule:
# Process parrots
RewriteRule ^parrots/([A-Za-z0-9-]+)/?$ get_product_by_name.php?product_name=$1 [NC,L]
With this rule, any URL that starts with "parrots" followed by
a slash (parrots/), then one or more (+) of any combination of letters, numbers
and hyphens ([A-Za-z0-9-]), is matched (note the hyphen at the end of the selection of
characters within square brackets - it must be added there to be treated
literally rather than as a range separator). We reference the product name in
brackets with $1 in the substitution.
We can make it even more generic, if we want, so that it doesn't matter
what directory a product appears to be in, it is still sent to the same script,
like so:
# Process all products
RewriteRule ^[A-Za-z-]+/([A-Za-z0-9-]+)/?$ get_product_by_name.php?product_name=$1 [NC,L]
As you can see, we've replaced "parrots" with a pattern that
matches letters and hyphens. That rule will now match anything in the parrots
directory or any other directory whose name is made up of one or
more letters and hyphens.
Flags
Flags are added to the end of a rewrite rule to tell Apache how to
interpret and handle the rule. They can be used to tell Apache to treat the
rule as case-insensitive, to stop processing rules if the current one matches,
or a variety of other options. They are comma-separated, and contained in
square brackets. Here's a list of the flags, with their meanings (this
information is included on the cheat sheet, so no need to try to learn them
all).
C (chained with next rule)
CO=cookie (set specified cookie)
E=var:value (set environment variable var to value)
F (forbidden - sends a 403 header to the user)
G (gone - no longer exists)
H=handler (set handler)
L (last - stop processing rules)
N (next - restart rule processing from the first rule)
NC (case insensitive)
NE (do not escape special URL characters in output)
NS (ignore this rule if the request is a subrequest)
P (proxy - i.e., Apache should grab the remote
content specified in the substitution section and return it)
PT (pass through - use when processing URLs with
additional handlers, e.g., mod_alias)
R (temporary redirect to new URL)
R=301 (permanent redirect to new URL)
QSA (append query string from request to substituted
URL)
S=x (skip next x rules)
T=mime-type (force specified mime type)
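A few of these flags in use, as a rough sketch. The /private/ and /old-catalogue/ paths are hypothetical; the products rule reuses the example from earlier in this article.

```apache
# QSA keeps the query string of the original request:
# /products/7/?colour=blue -> show_a_product.php?product_id=7&colour=blue
RewriteRule ^products/([0-9]+)/?$ show_a_product.php?product_id=$1 [NC,QSA,L]

# F refuses the request with a 403; "-" means "no substitution"
RewriteRule ^private/ - [F]

# G reports a retired page as gone (410)
RewriteRule ^old-catalogue/?$ - [G,NC]
```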
Moving Content
# Temporary Move
RewriteRule ^article/?$ http://www.new-domain.com/article/ [R,NC,L]
Adding an "R" flag to the flags section changes how a
RewriteRule works. Instead of rewriting the URL internally, Apache will send a
message back to the browser (an HTTP header) to tell it that the document has
moved temporarily to the URL given in the "substitution" section.
Either an absolute or a relative URL can be given in the substitution section.
The header sent back includes a code - 302 - that indicates the move is
temporary.
# Permanent Move
RewriteRule ^article/?$ http://www.new-domain.com/article/ [R=301,NC,L]
If the move is permanent, append "=301" to the "R"
flag to have Apache tell the browser the move is considered permanent. Both
forms make the browser request the new URL and show it in the address bar; the
difference is that a 301 tells browsers and search engines they may remember
the new address and stop asking for the old one.
This is one of the most common methods of rewriting URLs of items that
have moved to a new URL (for example, it is in use extensively on this site to
forward users to new post URLs whenever they are changed).
Conditions
Rewrite rules can be preceded by one or more rewrite conditions, and
these can be strung together. This can allow you to only apply certain rules to
a subset of requests. Personally, I use this most often when applying rules to
a subdomain or alternative domain as rewrite conditions can be run against a
variety of criteria, not just the URL. Here's an example:
RewriteCond %{HTTP_HOST} ^addedbytes\.com [NC]
RewriteRule ^(.*)$ http://www.addedbytes.com/$1 [L,R=301]
The rewrite rule above redirects all requests, no matter what for, to
the same URL at "www.addedbytes.com". Without the condition, this
rule would create a loop, with every request matching that rule and being sent
back to itself. The rule is intended to only redirect requests missing the
"www" URL portion, though, and the condition preceding the rule
ensures that this happens.
The condition operates in a similar way to the rule. It starts with
"RewriteCond" to tell mod_rewrite this line refers to a condition.
Following that is what should actually be tested, and then the pattern to test.
Finally, the flags in square brackets, the same as with a RewriteRule.
The string to test (the second part of the condition) can be a variety
of different things. You can test the domain being requested, as with the above
example, or you could test the browser being used, the referring URL (commonly
used to prevent hotlinking), the user's IP address, or a variety of other
things (see the "server variables" section for an outline of how
these work).
The pattern is almost exactly the same as that used in a RewriteRule,
with a couple of small exceptions. The pattern may not be interpreted as a
pattern if it starts with specific characters as described in the following
"exceptions" section. This means that if you wish to use a regular
expression pattern starting with <, >, = or a hyphen, you should escape
that character with a backslash.
Rewrite conditions can, like rewrite rules, be followed by flags, and
two of them cover almost every case. "NC", as with rules, tells Apache to treat
the condition as case-insensitive. The other is "OR". If
you only want to apply a rule if one of two conditions match, rather than
repeat the rule, add the "OR" flag to the first condition, and if
either match then the following rule will be applied. The default behaviour, if
a rule is preceded by multiple conditions, is that it is only applied if all
conditions match.
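A minimal sketch of the "OR" flag, assuming a site that answers on two bare domains (example.com and example.net are placeholders) and wants both sent to the "www" address:

```apache
RewriteEngine On

# [OR] joins the two conditions, so only one of them has to match.
# Without it, both would have to match at once and the rule would
# never fire, because a host cannot be both names.
RewriteCond %{HTTP_HOST} ^example\.com$ [NC,OR]
RewriteCond %{HTTP_HOST} ^example\.net$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```

The last condition in the group carries no "OR", since there is nothing after it to join with.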
Exceptions and Special Cases
Rewrite conditions can be tested in a few different ways - they do not
need to be treated as regular expression patterns, although this is the most
common way they are used. Here are the various ways rewrite conditions can be
processed:
<Pattern (is test string lower than pattern)
>Pattern (is test string greater than pattern)
=Pattern (is test string equal to pattern)
-d (is test string a valid directory)
-f (is test string a valid file)
-s (is test string a valid file with size greater than zero)
-l (is test string a symbolic link)
-F (is test string a valid file, and accessible (via subrequest))
-U (is test string a valid URL, and accessible (via subrequest))
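Two of these tests turn up far more often than the rest, so here is a hedged sketch of them in use. The host names and index.php are placeholders for the example. A leading "!" negates any test, which is how "is not a file" is written.

```apache
RewriteEngine On

# "=" compares strings literally, so the dots need no escaping.
RewriteCond %{HTTP_HOST} =old.example.com
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

# Only rewrite when the request does not map to a real file or
# directory; the leading "!" negates the -f and -d tests.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php?path=$1 [QSA,L]
```

The second block is the usual way to let images, stylesheets and other real files through untouched while one script handles everything else.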
Server Variables
Server variables are a selection of items you can test when writing
rewrite conditions. This allows you to apply rules based on all sorts of
request parameters, including browser identifiers, referring URL or a multitude
of other strings. Variables are of the following format:
%{VARIABLE_NAME}
And "VARIABLE_NAME" can be replaced with any one of the
following items:
HTTP Headers
HTTP_USER_AGENT
HTTP_REFERER
HTTP_COOKIE
HTTP_FORWARDED
HTTP_HOST
HTTP_PROXY_CONNECTION
HTTP_ACCEPT
Connection Variables
REMOTE_ADDR
REMOTE_HOST
REMOTE_USER
REMOTE_IDENT
REQUEST_METHOD
SCRIPT_FILENAME
PATH_INFO
QUERY_STRING
AUTH_TYPE
Server Variables
DOCUMENT_ROOT
SERVER_ADMIN
SERVER_NAME
SERVER_ADDR
SERVER_PORT
SERVER_PROTOCOL
SERVER_SOFTWARE
Dates and Times
TIME_YEAR
TIME_MON
TIME_DAY
TIME_HOUR
TIME_MIN
TIME_SEC
TIME_WDAY
TIME
Special Items
API_VERSION
THE_REQUEST
REQUEST_URI
REQUEST_FILENAME
IS_SUBREQ
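Hotlink prevention is probably the best-known use of these variables. The sketch below is one common way to write it, assuming the site lives at example.com (a placeholder) and that only image files need protecting:

```apache
RewriteEngine On

# Let through requests with no Referer header at all (direct visits;
# some proxies and privacy tools strip the header).
RewriteCond %{HTTP_REFERER} !^$
# Let through requests referred from our own pages.
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?example\.com/ [NC]
# Anything else asking for an image gets 403 Forbidden.
RewriteRule \.(gif|jpe?g|png)$ - [F,NC]
```

Both conditions must hold before the rule applies, because no "OR" flag is present.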
Working With Multiple Rules
The more complicated a site, the more complicated the set of rules
governing it can be. This can be problematic when it comes to resolving
conflicts between rules. You will find this issue rears its ugly head most
often when you add a new rule to a file, and it doesn't work. What you may
find, if the rule itself is not at fault, is that an earlier rule in the file
is matching the URL and so the URL is not being tested against the new rule
you've just added.
# Process product requests
RewriteRule ^([A-Za-z0-9-]+)/([A-Za-z0-9-]+)/?$ get_product_by_name.php?category_name=$1&product_name=$2 [NC,L]
# Process blog posts
RewriteRule ^([A-Za-z0-9-]+)/([A-Za-z0-9-]+)/?$ get_blog_post_by_title.php?category_name=$1&post_title=$2 [NC,L]
In the example above, the product pages of a site and the blog post
pages have identical patterns. The second rule will never match a URL, because
anything that would match that pattern will have already been matched by the
first rule.
There are a few ways to work around this. Several CMSes (including
WordPress) handle this by adding an extra portion to the URL to denote the type
of request, like so:
# Process product requests
RewriteRule ^products/([A-Za-z0-9-]+)/([A-Za-z0-9-]+)/?$ get_product_by_name.php?category_name=$1&product_name=$2 [NC,L]
# Process blog posts
RewriteRule ^blog/([A-Za-z0-9-]+)/([A-Za-z0-9-]+)/?$ get_blog_post_by_title.php?category_name=$1&post_title=$2 [NC,L]
You could also write a single PHP script to process all requests, which
checks whether the second part of the URL matches a blog post or a product.
I usually go for this option, as while it may increase the load on the server
slightly, it gives much cleaner URLs.
# Process product and blog requests
RewriteRule ^([A-Za-z0-9-]+)/([A-Za-z0-9-]+)/?$ get_product_or_blog_post.php?category_name=$1&item_name=$2 [NC,L]
There are certain situations where you can work around this issue by
writing more precise rules and ordering your rules intelligently. Imagine a
blog where there were two archives - one by topic and one by year.
# Get archive by topic
RewriteRule ^([A-Za-z0-9-]+)/?$ get_archives_by_topic.php?topic_name=$1 [NC,L]
# Get archive by year
RewriteRule ^([A-Za-z0-9-]+)/?$ get_archives_by_year.php?year=$1 [NC,L]
The above rules will conflict. Of course, years are numeric and only 4
digits, so you can make that rule more precise, and by running it first the
only type of conflict you could encounter would be if you had a topic with a
4-digit number for a name.
# Get archive by year
RewriteRule ^([0-9]{4})/?$ get_archives_by_year.php?year=$1 [NC,L]
# Get archive by topic
RewriteRule ^([A-Za-z0-9-]+)/?$ get_archives_by_topic.php?topic_name=$1 [NC,L]
mod_rewrite
Apache's mod_rewrite comes as standard with most Apache hosting
accounts, so if you're on shared hosting, you are unlikely to have to do
anything. If you're managing your own box, then you most likely just have to
turn on mod_rewrite. If you are using Apache1, you will need to edit your
httpd.conf file and remove the leading '#' from the following lines:
#LoadModule rewrite_module modules/mod_rewrite.so
#AddModule mod_rewrite.c
If you are using Apache2 on a Debian-based distribution, you need to run
the following command and then restart Apache:
sudo a2enmod rewrite
Other distributions and platforms differ. If the above instructions are
not suitable for your system, then Google is your friend. You may need to edit
your apache2 configuration file and add "rewrite" to the
"APACHE_MODULES" list, or edit httpd.conf, or even download and
compile mod_rewrite yourself. For the majority, however, installation should be
simple.
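One quick way to check that the module is actually live is a throwaway rule in a .htaccess file. This is only a sketch: ping and ping.html are made-up names, and it assumes the host allows rewrite directives in .htaccess (AllowOverride set to FileInfo or All in the main configuration).

```apache
# The <IfModule> wrapper keeps the site from returning a 500 error
# if mod_rewrite turns out not to be loaded.
<IfModule mod_rewrite.c>
    RewriteEngine On
    # Requesting /ping should serve ping.html when rewriting works.
    RewriteRule ^ping/?$ ping.html [L]
</IfModule>
```

If /ping returns a 404 while /ping.html loads fine, either the module is not loaded or overrides are disabled for that directory.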
ISAPI_Rewrite
ISAPI_Rewrite is a URL rewriting plugin for IIS based on mod_rewrite and
is not free. It performs most of the same functionality as mod_rewrite, and
there is a good quality ISAPI_Rewrite forum where most common questions are
answered. As ISAPI_Rewrite works with IIS, installation is relatively simple -
there are installation instructions available.
ISAPI_Rewrite rules go into a file named httpd.ini. Errors will go into
a file named httpd.parse.errors by default.
Leading Slashes
I have found myself tripped up numerous times by leading slashes in URL
rewriting systems. Whether they should be used in the pattern or in the
substitution section of a RewriteRule or used in a RewriteCond statement is a
constant source of frustration to me. This may be in part because I work with
different URL rewriting engines, but I would advise being careful of leading
slashes - if a rule is not working, that's often a good place to start looking.
I never include leading slashes in mod_rewrite rules and always include them in
ISAPI_Rewrite.
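For mod_rewrite, much of the confusion comes from where the rule lives. The sketch below shows the same hypothetical rule (article.php is a placeholder) written for the two contexts; the second form is commented out because it belongs in the main configuration rather than in the same file.

```apache
# In a .htaccess file (per-directory context) the directory prefix is
# stripped before matching, so the pattern has no leading slash:
RewriteRule ^article/?$ article.php [L]

# In httpd.conf or a <VirtualHost> block (server context) the pattern
# is matched against the full URL-path, which does start with a slash:
# RewriteRule ^/article/?$ /article.php [PT]
```

A rule copied from one context to the other without adjusting the slash will simply never match, which is why it is a good first thing to check.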
Sample Rules
To redirect an old domain to a new domain:
RewriteCond %{HTTP_HOST} old_domain\.com [NC]
RewriteRule ^(.*)$ http://www.new_domain.com/$1 [L,R=301]
To redirect all requests missing "www" (yes www):
RewriteCond %{HTTP_HOST} ^domain\.com [NC]
RewriteRule ^(.*)$ http://www.domain.com/$1 [L,R=301]
To redirect all requests with "www" (no www):
RewriteCond %{HTTP_HOST} ^www\.domain\.com [NC]
RewriteRule ^(.*)$ http://domain.com/$1 [L,R=301]
Redirect old page to new page:
RewriteRule ^old-url\.htm$ http://www.domain.com/new-url.htm [NC,R=301,L]
.htaccess
Error Documents
In Apache, you can set up each directory on your server individually, giving them different properties or requirements for access. And while you can do this through normal Apache configuration, some hosts may wish to give users the ability to set up their own virtual server how they like. And so we have .htaccess files, a way to set Apache directives on a directory by directory basis without the need for direct server access, and without being able to affect other directories on the same server.
One up-side of this (amongst many) is that with a few short lines in an .htaccess file, you can tell your server that, for example, when a user asks for a page that doesn't exist, they are shown a customized error page instead of the bog-standard error page they've seen a million times before. If you visit http://www.addedbytes.com/random_made_up_address then you'll see this in action - instead of your browser's default error page, you see an error page sent by my server to you, telling you that the page you asked for doesn't exist.
This has a fair few uses. For example, my 404 (page not found) error page also sends me an email whenever somebody ends up there, telling me which page they were trying to find, and where they came from to find it - hopefully, this will help me to fix broken links without needing to trawl through mind-numbing error logs.
[Aside: If you set up your custom error page to email you whenever a page isn't found, remember that "/favicon.ico" requests failing doesn't mean that a page is missing. Internet Explorer 5 assumes everyone has a "favicon" and so asks the server for it. It's best to filter error messages about missing "/favicon.ico" files from your error logging, if you plan to do any.]
Setting up your htaccess file is a piece of cake. First things first, open Notepad (or better yet, EditPlus2, http://www.editplus.com/), and add the following to a new document:
ErrorDocument 404 /404.html
Next you need to save the file. You need to save it as ".htaccess". Not ".htaccess.txt", or "mysite.htaccess" - just ".htaccess". I know it sounds strange, but that is what these files are - just .htaccess files. Nothing else. Happy? If not, take a look at this .htaccess guide (http://wsabstract.com/howto/htaccess.shtml), which also explains the naming convention of .htaccess in a little more depth. If you do use Notepad, you may need to rename the file after saving it, and you can do this before or after uploading the file to your server.
Now, create a page called 404.html, containing whatever you want a visitor to your site to see when they try to visit a page that doesn't exist. Now, upload both to your website, and type in a random, made-up address. You should, with any luck, see your custom error page instead of the traditional "Page Not Found" error message. If you do not see that, then there is a good chance your server does not support .htaccess, or it has been disabled. I suggest the next thing you do is check quickly with your server administrator that you are allowed to use .htaccess to serve custom error pages.
If all went well, and you are now viewing a custom 404 (page not found) error page, then you are well on your way to a complete set of error documents to match your web site. There are more errors out there, you know, not just missing pages. Of course, you can also use PHP, ASP or CFML pages as error documents - very useful for keeping track of errors.
You can customize these directives a great deal. For example, you can add directives for any of the status codes below, to show custom pages for any error the server may report. You can also, if you want, specify a full URL instead of a relative one. And if you are truly adventurous, you could even use pure HTML in the .htaccess file to be displayed in case of an error, as below. Note that the quoting rules depend on the Apache version. In Apache 1.3 you must start the HTML with a quotation mark but should not put one at the other end (you can include quotation marks within the HTML itself as normal), which is the form shown below. Apache 2.0 and later expect the message to be enclosed in a matching pair of quotation marks, so any quotation marks inside the HTML need escaping with a backslash.
ErrorDocument 404 "Ooops, that page was <b>not found</b>. Please try a different one or <a href="mailto:owner@site.com">email the site owner</a> for assistance.
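The other two forms mentioned above, a script and a full URL, might look like the sketch below. The paths and the example.com address are placeholders.

```apache
# A local script as the error document. The visitor still receives
# the original 404 status along with the script's output.
ErrorDocument 404 /errors/not-found.php

# A full URL makes Apache send a redirect to that address, so the
# client sees a redirect status rather than the original 403.
ErrorDocument 403 http://www.example.com/forbidden.html
```

Because of that redirect, a local path is usually the safer choice when the status code matters, and a full URL is best avoided for 401 in particular, since the browser would then never be prompted for a password.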
Server response codes
A server response code is a three-digit number sent by a server to a user in response to a request for a web page or document. They tell the user whether the request can be completed, or if the server needs more information, or if the server cannot complete the request. Usually, these codes are sent 'silently' - so you never see them, as a user - however, there are some common ones that you may wish to set up error pages for, and they are listed below. Most people will only ever need to set up error pages for server codes 400, 401, 403, 404 and 500, and you would be wise to always have an error document for 404 errors at the very least.
It is also relatively important to ensure that any error page is over 512 bytes in size. Internet Explorer 5, when sent an error page of less than 512 bytes, will display its own default error document instead of your one. Feel free to use padding if this is an issue - personally, I'm not going to increase the size of a page because Internet Explorer 5 doesn't behave well.
In order to set up an error page for any other error codes, you simply add more lines to your .htaccess file. If you wanted to have error pages for the above five errors, your .htaccess file might look something like this:
ErrorDocument 400 /400.html
ErrorDocument 401 /401.html
ErrorDocument 403 /403.html
ErrorDocument 404 /404.html
ErrorDocument 500 /500.html