1) General Security
(i) Choose your sources carefully
One of the great aspects of PHP is the sheer volume of articles, tutorials and code snippets available on the internet. The problem with the internet as an informative resource is it's simply not reliable. Anyone can publish anything and there's no opportunity for other readers to highlight any issues the article has missed or, more crucially, correct any mistakes made by the author. Fortunately this is less of a problem nowadays, as most websites allow readers to post comments and voice any criticisms. However this is still an important factor to remember when using a search engine to find resources - always verify your source, read any user posted comments and if appropriate, (e.g. if you find conflicting sources), stay with reputable websites. Additionally, with the fast evolving nature of PHP, ensure the article is not outdated.
(ii) Keep updated
No matter how secure your own code is, if you are using any third party software, make sure you keep up to date. There's nothing wrong with integrating other scripts with your own but unless you ensure that code is also secure by fixing any vulnerabilities, you are still at risk. This is even more important once a bug fix is officially documented and released, for obvious reasons. This applies to everything you're running on your server - including Apache and even PHP itself.
(iii) Protect your server
Dedicated and VPS servers are becoming increasingly popular. Unfortunately, knowledge and understanding of server administration is not. If you are unsure how to secure your server, seek guidance from your hosting provider and research your options or better yet, hire someone to handle that for you.
(iv) Password complexity
And of course, always use a complex password - no matter how secure you make your code, if someone finds out your password, they can do everything you can do. For instance, if you've coded yourself a wonderfully powerful and secure backend to manage your entire collection of websites and then you protect it with a password of "apple", the chances are it will not last very long.
2) Coding Secure PHP
Never, ever assume anything. If you follow good coding standards anyway, this should not be a new concept to you. There are two different types of assumptions commonly and mistakenly made when coding - firstly, that PHP will behave in a particular manner (in terms of the configuration, extensions enabled and functions available) and secondly that the contents of a particular variable will always be what you are expecting.
(i) Cross platform compatibility
Another way of describing the first assumption is the environment. If you are coding a particular script for your own personal use only, this becomes almost irrelevant. Problems arise when you create scripts to be distributed and you do not know how the script will behave in other environments, i.e. on other servers. As you write your code, you need to be aware of the functions you are using. Always check the manual (see php.net) and read the documentation carefully. Most crucially, make a note of the requirements which are listed in parenthesis below the function name. If you use functions that require a particular version of PHP, make sure you list that version or higher as a requirement to run your script when releasing it. You should also check the return values and ensure your script can handle anything that can be returned, not merely what you are expecting to be returned.
Now you may be wondering where this explanation of simply good coding practices becomes a security issue and it is in error reporting. Every error message contains a full path to the problematic script, which is useful for debugging but not so helpful when it is displayed to all your visitors.
Other environmental factors include the handling of particular file extensions - if you use a different extension for particular parts of your application, for example .inc for your configuration, you need to ensure that the server the script is running on will not display .inc files as plain text files and reveal sensitive data. The easiest method around this is to use always use .php and if you really want to keep a separate extension, use .inc.php.
(i) Incoming data
Regardless of your environment, the other assumption will always be an issue. Every time you use data from an external source (POST, GET, COOKIE, SERVER, ENV), you need to sanitize it. Never think "what will this variable contain"; always think "what might this variable contain and how could that be used maliciously"; (if you want to paraphrase that thought process into something a little less awkward, feel free to do so). Let's start with the most obvious:
$page = $_GET['page'];
include 'pages/' . $page;
Going back to the overlap with coding standards, we do not know if the page= variable will be set in the query string, nor do we know if the requested file exists. A better method to avoid any error messages would be:
$page = empty($_GET['page']) ? 'index.php' : $_GET['page'];
if ( file_exists('pages/' . $page) )
include 'pages/' . $page;
else
include 'pages/404.php';
You may think that will allow any page in the 'pages' directory to be loaded, using the value of page from the URL. And you would be right to think so, but it will also allow absolutely any file to be loaded. By using "..", you can move up a directory and access any page you want. This has serious implications if the user is able to go up beyond the webroot and execute scripts or read files that are normally inaccessible - such as your password files for http authentication! You may never intend to link to any page that is not in the format myscript.php?page=somepage.php but do not think that that is all $_GET['page'] can contain. Instead you should be aware that a malicious user could call the script as myscript.php?page=../../somewhere/ishouldnot.be and so you must code accordingly. The next improvement would be to run a str_replace() on the incoming GET value and strip out any instances of ".." but let's go one step further for total peace of mind and say the only files it should be including are .php ones, and the only valid values for the page variable are alphanumeric strings:
if ( empty($_GET['page']) ) {
$page = 'index';
} else {
$page = preg_replace('/[^a-zA-Z0-9]/', '', $_GET['page']);
}
if ( file_exists('pages/' . $page . '.php') )
include 'pages/' . $page . '.php';
else
include 'pages/404.php';
Note: I have written out this example in full for readability but this entire code snippet could be condensed to two lines using the ternary operator. I mention this because it shows how simple it is to convert a huge vulnerability into a piece of secure, functional code.
The principle shown here applies to every GET, POST or COOKIE value you use in your scripts. Also be aware that the request headers sent by the browser will be available in the SERVER array - this includes the user agent and referrer. Always assume the worst case and create your code to cope with any unexpected input either by throwing an error or resorting to a default value. Do not be tempted to think that using POST data is more secure than GET data. It's not. All (cookies included) can easily be manipulated by a user to contain anything the user wants.
There are two common types of attack that rely on manipulating the data sent to a script. The first of which is effectively the same thing as I've already described, except rather than relating to the filesystem it exploits databases.
(a) SQL injection
SQL injection refers to the method of inserting malicious code into a query. This occurs when you use input data directly in a query and the resulting statement is not as intended. For instance, consider the following query:
SELECT * FROM table WHERE id = $id
Assume $id contains the value of id=42 from the URL. The query that is executed becomes:
SELECT * FROM table WHERE id = 42
Now let's say we have a malicious user wanting to cause trouble. If they call the script with id=42; DROP DATABAS.E dbname, the query becomes two queries:
SELECT * FROM table WHERE id = 42; DROP DATABAS.E dbname
The implications of this are obvious: by using unsanitized input in your queries, anyone can execute any query possible. This includes both read and write operations, which could lead to your entire database being deleted as illustrated. However, deletion is not the worst of it - you can have your most sensitive data read, overwritten or even additional false data inserted. For example, if you run a membership site that requires payment, you could have users simply creating their own accounts by directly inserting rows into your database. The problems need no further explanation but what can you do to prevent it?
As you may have noticed, I opted for a generic example that only illustrated the queries themselves. The reason I did not show you the corresponding PHP code is because this major problem is not so significant when working with PHP and mySQL. According to the manual, the function mysql_query() does not support multiple queries - the reasoning behind this is, I assume, to help prevent SQL injection attacks of this nature. By only permitting one query per mysql_query() call, the worst attacks can be prevented. As this also hinders developers wishing to send multiple queries, this behaviour may go the same way as magic quotes (which was another safeguard against malicious inputs but has since been removed). Either way, even with this protection you are by no means safe from SQL injections.
Let us now look at a basic member system. There is a members table in the database that stores the username and password for each account. Our login page looks up the submitted username and password in the database - if found, the user is logged in. Here is the code we could use (this code is to illustrate the vulnerability only, it is not intended to be a good example of how to code a login system):
// Assume the script has already connected to the database
// Check submitted data exists
if ( empty($_POST['username']) || empty($_POST['password']) ) {
die('Please fill in your username and password.');
}
// Convert to lowercase (username is not case sensitive)
$username = strtolower(trim($_POST['username']));
// Hash the password
$password = md5(trim($_POST['password']));
// Lookup in database
if ( ! $result = mysql_query("SELECT * FROM members WHERE LOWER(username) = '$username' AND password = '$password' LIMIT 1") ) {
die('MySQL error.');
}
// Process login or display failure message
if ( mysql_num_rows($result) == 1 ) {
// Successful login
} else {
// Failed login
}
You should be able to spot the problem with this code immediately. The password is fine, we know the return value for the md5() function is always a 32 character hexadecimal string. Regardless of what is submitted as the password, there can be no problems with that part. However, we do not validate the username at all - simply convert it to lowercase and remove unnecessary spaces. Under normal circumstances, I am sure you can imagine what the query would look like but what if I enter the following as my login credentials?
Username: admin' #
Password: apple
The resulting query that is executed is:
SELECT * FROM members WHERE LOWER(username) = 'admin' #' AND password = '1f3870be274f6c49b3e31a0c6728957f' LIMIT 1
And as # symbol marks a comment, everything after it is ignored. In effect, the query we are now running is simply:
SELECT * FROM members WHERE LOWER(username) = 'admin'
A malicious user could therefore log in to any account they desire simply by knowing the username! Another variant is injecting a OR 1=1 clause into the query, and since 1 always equals 1, the where conditions are satisfied.
Most crucially, the character causing all this fuss is the single quote '. We could protect ourselves from SQL injection by simply replacing the ' with it's HTML entity equivalent, or more commonly, escaping it with a backslash. The mySQL extension actually contains a function to do all this for us - we only need to pass our data through the mysql_real_escape_string() function to ensure it is safe to use in a query. Going back to our example, if we pass $username through this mysql_real_escape_string(), the resulting query is:
SELECT * FROM members WHERE LOWER(username) = 'admin\' #' AND password = '1f3870be274f6c49b3e31a0c6728957f' LIMIT 1
It now must look through the database for an account with username admin' # and the correct matching password. There is one other useful function for making data safe for use with queries and that is intval(). Obviously it is only applicable to numeric data but this function will always return a number - if the value passed in is numeric (or begins with a number), it returns that number and otherwise returns 0. If we were to use this on our first example, whatever we added after our "42" will be ignored. As you often will be relying on numeric values, intval() is a fast and easy way of keeping your script secure.
(b) Session hijacking
Sessions rely on a unique session ID for each visitor. The ID usually corresponds to a temporary file on the webserver used to store all the session data, although the underlying technology is irrelevant as PHP handles that for us. Every time a request is made, the session ID is sent to the server (either in a cookie or as a GET parameter in the URL query string) and the server looks up the corresponding session data for that particular user. With PHP, this data is then available in the $_SESSION array. All this does is allow us to maintain data across multiple page requests for a particular user. Every website that allows you to login will be using sessions.
A problem occurs if someone else gets hold of your session ID. All a malicious user needs to become you is your session ID - if they then send it to the server as if it were their own, the server will assign all your session data to the hacker. Effectively, they will be logged in as you. You want to ensure that the session ID you receive is the same ID as you sent out to the same user, and no one else.
The bad news is there is not a quick fix for this problem. The entire point of sessions is that they allow you to maintain state across a stateless protocol - that is, each request to a server occurs separately and has no relation to any previous or future request. Since we have no way of identifying the user, we have no way of determining if a session is hijacked.
Luckily there are factors that do, for the most part, remain constant. It is highly likely that the IP address of the user and the browser being used (i.e. user agent) will remain constant throughout the session. Therefore if we add to our session code a quick check that compares the values of the current request with previously stored values, we can ensure the user does not have their session stolen. The downside to this method is that the IP address and user agent can legitimately (albeit rarely) change throughout a session and any users experiencing this would then become logged out, or whatever else you choose to do with suspected hijacks. As with all security precautions involving user actions, you need to find the right balance between security and functionality. If there is nothing to be lost if a session does get hijacked, there is no need to add this to your applications. However if you're handling sensitive data you will want to be as safe as possible. The following example illustrates how to protect against session hijacking:
// Activate PHPs session handling
session_start();
// Check for stored IP address / user agent
if ( isset($_SESSION['authenticate']) ) {
if ( md5($_SERVER['REMOTE_ADDR'] . $_SERVER['USER_AGENT']) != $_SESSION['authenticate'] ) {
die('Suspected session hijack.');
unset($_SESSION);
session_destroy();
}
} else {
// Create a store a md5 hash of identifying data
$_SESSION['authenticate'] = md5($_SERVER['REMOTE_ADDR'] . $_SERVER['USER_AGENT']);
}
Despite any additional security you put in place, you can never guarantee it will be enough. For the most important actions, the only way to ensure the current user is a valid user is to require them to input their password again. (Even this is not foolproof but if an account is illegally accessed in this way, it will not be the fault of your coding). Each time you hassle them to re-enter their password, you are negatively affecting their browsing experience. Once again, it is up to you to strike an appropriate balance between security and functionality.