What are the mechanics behind storing/checking website passwords?

Assume that I’m going to be building my own website that has users that need to log in and that security is paramount (like maybe storing CC#s). I have the basic coding down with HTML5, CSS3 and JavaScript, and I’m not looking for the quick and dirty overview on how passwords work or how it’s better to use a SHA-256 hash, but rather the in-depth mechanics of what happens when a user registers/logs in on my website.

How does the site store the data on my server? Does it use SQL? Do I have to write my own database or is it part of the package? Do I have to code the lookup/retrieve functions or again is it part of a package? How do I lockout someone without the proper password or maybe the better question is how does a person gain access to a page after login?

Another question is how standard is all of this? Are there one or two techniques/packages that everyone uses or a ton of different ones or everyone writes their own?

Another is if data is stored with the security data or if that is a separate database. Yet another is are there questions I should be asking?

From the sound of it you’re pretty clear on the client-side technologies (HTML5, CSS3) but are unclear about how the server side fits together. Is that a fair assessment?

No, you never need to write your own database. Almost everyone uses one of the standard ones like MySQL or MongoDB or something of the like if they need a database. Almost no one who needs a database needs to roll their own.

Also, if you are doing password security right, I don’t see any major reason why the hashes couldn’t be stored in the same database as everything else.

There are “standard configurations” for servers, like LAMP (Linux/Apache/MySQL/PHP), which I think is a little dated. More modern are web frameworks of varying complexity like Play or Scalatra. But there’s a wide variety of possible things.

I guess one conceptual thing is when you go to “http://www.example.com/cow/moo/”, the request does not have to be fulfilled by returning the contents of an html file in some “/cow/moo/” directory on the server. The server can be set up so “/cow/moo/” can mean “read in some template and fill in the values from some DB before sending the result out”, or even “run some function that generates the returned data from scratch”.
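To make that concrete, here is a rough Python sketch of the idea, not how any particular server actually does it. The names (FAKE_DB, ROUTES, handle_request) and the data are made up for illustration; a real framework provides its own routing and templating.

```python
from string import Template

# Hypothetical stand-in for a database table.
FAKE_DB = {"moo": {"name": "Bessie", "sound": "Moo!"}}

# A template with placeholders that get filled in per request.
PAGE = Template("<h1>$name</h1><p>$name says $sound</p>")


def cow_page(cow_id):
    """Read values from the 'database' and fill in the template."""
    row = FAKE_DB[cow_id]
    return PAGE.substitute(name=row["name"], sound=row["sound"])


def scratch_page():
    """Generate the returned data from scratch, no file or template involved."""
    return "<p>2 + 2 = %d</p>" % (2 + 2)


# The URL path is just a key that selects a function; no /cow/moo/ directory exists.
ROUTES = {
    "/cow/moo/": lambda: cow_page("moo"),
    "/scratch/": scratch_page,
}


def handle_request(path):
    """Return (status code, body) for a requested path."""
    handler = ROUTES.get(path)
    if handler is None:
        return 404, "<p>Not found</p>"
    return 200, handler()
```

Calling handle_request("/cow/moo/") runs cow_page and returns generated HTML, even though nothing on disk is named "cow" or "moo".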

I’m not a web app developer, but I am an IT security guy, so I can talk in more general terms.

The proper way to store passwords is by using a hash function, otherwise known as a one-way function. So you take the plaintext password input, put it through a secure hash function, and you get back a “hash” of the password.

Hash functions are known as one-way functions because plaintext can pass through the function into a hash value, but there’s no good way to take that hash value and use it to get back to the original plaintext. But a hash function will generate the same hash value for the same plaintext value every time.

Hash functions will always produce the same length hash for any cleartext value. So whether my password is 1 character long or 1,000,000 characters long, the hash function will generate a fixed length value. Since you are taking an infinite number of input values and turning it into a hash that’s a finite number of characters long, you can see why it would be impossible to take that finite hash and turn it back into one of an infinite number of values.
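A quick way to see the fixed-length and same-input-same-output properties is with Python's standard hashlib. SHA-256 is used here only as an example hash; the helper name sha256_hex is made up for this sketch.

```python
import hashlib


def sha256_hex(text):
    """Return the SHA-256 hash of a string as hexadecimal text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


short_hash = sha256_hex("a")               # 1-character input
long_hash = sha256_hex("a" * 1_000_000)    # 1,000,000-character input

# Both hashes are 64 hex characters long, regardless of input size.
print(len(short_hash), len(long_hash))

# Hashing the same input again gives exactly the same value.
print(sha256_hex("a") == short_hash)
```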

So, from a web site perspective, the first time a user enters a password for the site, the web app takes that password, hashes it, and stores it in a database. The next time the user logs in, the system hashes the newly entered password and compares it to the stored hash value. If the stored hash matches the newly computed hash, you let the user in because they gave you the same password.

There are some deeper concepts in hashing. One is the idea of hash collisions. Since you are taking infinite input and producing finite output, there are going to be many input values that would produce the same hash. Understand that this doesn’t make the hash function insecure by any means. It is like any branch of crypto in that you are hiding secrets in plain sight, knowing that the math needed to turn the hash into something useful is extreme.

Another hash concept is the idea of adding a salt value to the hashing mechanism. To salt the hash, you add a value to the hash algorithm that is known. Why? Hashes can be precalculated, and in many cases, have been precalculated. Example: my password is “password” and hashed it becomes “1q2w3e4r5t”. An attacker can take a list of commonly used passwords (called a dictionary) and hash all of those words. If I break into your system and steal your password hashes, I already know that “1q2w3e4r5t” is the output of the hashing mechanism for the word “password”, so if I find that string in any password hash fields, I know that the password is “password”.
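Here is a small Python sketch of that precalculation idea, assuming plain unsalted SHA-256 as the hashing mechanism. The word list, the user names and the "stolen" table are all invented for illustration.

```python
import hashlib


def unsalted_hash(word):
    """Plain, unsalted SHA-256: identical passwords give identical hashes."""
    return hashlib.sha256(word.encode("utf-8")).hexdigest()


# The attacker hashes a dictionary of common passwords once, ahead of time.
DICTIONARY = ["123456", "password", "qwerty", "letmein"]
PRECOMPUTED = {unsalted_hash(word): word for word in DICTIONARY}

# Hypothetical contents of a stolen password table.
stolen = {
    "alice": unsalted_hash("password"),
    "bob": unsalted_hash("correct horse battery staple"),
}

# Recovering a password is now just a table lookup per stolen hash.
cracked = {
    user: PRECOMPUTED[hash_value]
    for user, hash_value in stolen.items()
    if hash_value in PRECOMPUTED
}
print(cracked)
```

Only the account whose password appears in the dictionary falls out of the lookup, which is why common passwords are the first to go.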

To make those calculations more difficult, the hashing mechanism may add a salt, one of X values. This value is stored in the clear with the hash, but it makes the precalculation of the hashes much more difficult. My password of “password” may become “1q2w3e4r5t” with a salt value of one, but it will become something different with a salt of 4096. So now an attacker must precalculate multiple values for each plaintext, which makes this “dictionary attack” much more costly from a processing standpoint.
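A minimal sketch of salting in Python, under a few assumptions of mine: the salt is 16 random bytes per user rather than one of a small set of values, and the hash is PBKDF2 from the standard library, which is deliberately slow. The function names and the iteration count are illustrative, not a recommendation for any particular system.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor for this sketch


def hash_password(password, salt=None):
    """Return (salt, digest). A fresh random salt is made if none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password, salt, stored_digest):
    """Re-hash the attempt with the stored salt and compare the results."""
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, stored_digest)


# Two users pick the same password but get different salts...
salt_a, hash_a = hash_password("password")
salt_b, hash_b = hash_password("password")

# ...so their stored hashes differ, and one precomputed table no longer fits both.
print(hash_a != hash_b)
print(verify_password("password", salt_a, hash_a))
```

The salt is stored in the clear next to the hash, as described above; its job is only to force the attacker to redo the work for every salt value.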

To answer your questions:

How does the site store the data on my server? It can be stored in any fashion (UNIX passwords are still, in many cases, stored in a plain text file), but a database is the right tool for this.

Does it use SQL? It can. Most modern web apps use some back-end database to store all important information.

Do I have to write my own database or is it part of the package? Grab MySQL or something similar and use that. Writing a new database is significantly non-trivial.

Do I have to code the lookup/retrieve functions or again is it part of a package? You’ll pull the user authentication information out of the database using a standard query and use it to compare hashes, set privileges, etc.
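As a rough illustration of that "standard query" step, here is a Python sketch using the built-in sqlite3 module as a stand-in for MySQL. The table layout, column names and function names are assumptions made up for this example; a framework would normally hide all of this from you.

```python
import hashlib
import hmac
import os
import sqlite3


def _digest(password, salt):
    """Salted, slow hash of the password (PBKDF2 chosen for this sketch)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    "CREATE TABLE users ("
    "username TEXT PRIMARY KEY, salt BLOB, pw_hash BLOB, role TEXT)"
)


def register(username, password, role="student"):
    """Store the salt and the hash, never the password itself."""
    salt = os.urandom(16)
    conn.execute(
        "INSERT INTO users VALUES (?, ?, ?, ?)",
        (username, salt, _digest(password, salt), role),
    )
    conn.commit()


def login(username, password):
    """Return the user's role if the password matches, otherwise None."""
    row = conn.execute(
        "SELECT salt, pw_hash, role FROM users WHERE username = ?",
        (username,),
    ).fetchone()
    if row is None:
        return None
    salt, pw_hash, role = row
    if hmac.compare_digest(_digest(password, salt), pw_hash):
        return role
    return None


register("stcad", "hunter2", role="teacher")
print(login("stcad", "hunter2"))
print(login("stcad", "wrong"))
```

The `?` placeholders let the database driver handle quoting, so user input is never pasted directly into the SQL text.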

How do I lockout someone without the proper password or maybe the better question is how does a person gain access to a page after login? A lockout value should be stored in your database, so every time a user gets a password wrong, that lockout value increments until either the user gets in, or you lock them out because that value got too large. The user gains access when the new password hash value matches the stored hash value.
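The lockout counter could look something like this in Python. A dictionary stands in for the database column here, and the threshold of 5 and the names are assumptions for the sketch.

```python
MAX_FAILURES = 5  # assumed threshold; pick whatever your policy requires

# Stand-in for a "failed attempts" column in the users table.
failed_attempts = {}


def attempt_login(username, password_ok):
    """password_ok is the result of comparing the new hash to the stored hash."""
    count = failed_attempts.get(username, 0)
    if count >= MAX_FAILURES:
        return "locked"          # too many failures; refuse even a correct password
    if password_ok:
        failed_attempts[username] = 0   # success resets the counter
        return "ok"
    failed_attempts[username] = count + 1
    return "denied"
```

A real system would also need some way to unlock the account again, such as a timeout or an administrator reset, which this sketch leaves out.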

Another question is how standard is all of this? Very standard. There are lots of frameworks out there that have all of this stuff already written and tested.

Are there one or two techniques/packages that everyone uses or a ton of different ones or everyone writes their own? For password hashing, everyone uses one of just a few studied and vetted hash algorithms. Most of these have already been coded in just about any language you’d care to use, so you can just drop a pre-made chunk of code into your program. NEVER NEVER attempt to write your own encryption methods.

Another is if data is stored with the security data or if that is a separate database. This depends on how you design your system. Generally, it would be the same database.

Yet another is are there questions I should be asking? If you are looking to process credit cards, the project will get a lot more complex unless you use a third party (like PayPal) to process for you. As soon as you touch credit card information, you are required to go through the PCI-DSS process to ensure you are doing the right things to protect the data, and that can get to be a HUGE burden.

So it looks like in addition to jQuery and AJAX, I should add SQL to my reading list. Oh and maybe I can convince Mrs Cad that this summer I have to start my M.Eng in CS with a specialization in IT security (she vetoed it until I turn in my dissertation and pay off my loans).

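It’s also worth noting that storing the full PAN (the primary account number, i.e. the credit card number) in unprotected form would violate the PCI-DSS, which is the industry standard established by the major card brands. Some card data, such as the security code, may not be stored after authorization at all.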

You can hold minimized portions of the card number, or you can tokenize the number so that it can’t be easily compromised, but holding the data in the “raw” is a very serious no-no.
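Roughly, in Python, the two options look like this. The TokenVault class is a made-up in-memory stand-in; in practice the vault would live with your payment processor, and your own database would only ever see the token and the masked number.

```python
import secrets


def mask_pan(pan):
    """Keep only the last four digits of a card number for display."""
    digits = "".join(ch for ch in pan if ch.isdigit())
    return "*" * (len(digits) - 4) + digits[-4:]


class TokenVault:
    """Hypothetical stand-in for a processor-side vault mapping tokens to cards."""

    def __init__(self):
        self._by_token = {}

    def tokenize(self, pan):
        # The token is random, so it reveals nothing about the card number.
        token = secrets.token_urlsafe(16)
        self._by_token[token] = pan
        return token

    def detokenize(self, token):
        return self._by_token[token]


vault = TokenVault()
token = vault.tokenize("4111 1111 1111 1111")
print(mask_pan("4111 1111 1111 1111"))
```

Your site stores the token and the masked digits; someone who steals that table gets nothing they can charge.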

Not necessarily. MongoDB doesn’t use SQL. If your planned website is small enough, you can put your passwords (by which I mean, your password hashes and salt info) into a flat file even.

Also you’re missing out on the server-side application layer stuff, like PHP, JSP, Ruby, Scalatra, &c.

What are you actually planning to do with this knowledge? Web stacks are deep and widely varied with lots of different technologies at each layer, so it probably pays to specialize a bit, at least at first.

Don’t store the password file in the web data directory. You don’t want someone to be able to download it with a url like http://stcad.com/info/secure/passwd.txt. Make sure whatever webcontainer you use doesn’t allow people to retrieve files outside the web data dir by creating urls like http://stcad.com/../../../etc/passwd
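Most web containers do this check for you, but the idea behind it can be sketched in a few lines of Python. The web root path and the function name are assumptions for illustration, and the example assumes a Unix-style file system.

```python
from pathlib import Path

# Assumed location of the web data directory.
WEB_ROOT = Path("/var/www/site").resolve()


def safe_resolve(url_path):
    """Map a URL path to a file path, or return None if it escapes the web root."""
    # resolve() collapses any ".." segments before we check where we ended up.
    candidate = (WEB_ROOT / url_path.lstrip("/")).resolve()
    if candidate != WEB_ROOT and WEB_ROOT not in candidate.parents:
        return None
    return candidate


print(safe_resolve("/info/page.html"))
print(safe_resolve("/../../../etc/passwd"))
```

The second call comes back as None because the resolved path lands outside the web data directory.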

Look into servlets. They allow you to limit access to pages if authentication is not done. Basically, there is a web.xml that has security constraints on the paths for your application. If someone tries to access a path, the webcontainer will verify that the authentication has been done. If not, it will redirect the user to the login page.

For example, the user will enter ‘https://stcad.com/home.html’. If the user already was authenticated, the home.html page will come up. If not, then the user will be redirected to login.html. Once they enter a valid user/password, they will be redirected to home.html.

The webcontainer will typically have a way for you to plugin an authentication mechanism. This means that you can tell the webcontainer to call your code to validate the user/password. It gives you the user/password, you validate it, and return true or false.
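This is not the servlet API, just a Python analogy of the behavior described above: protected paths, a redirect to the login page, and a pluggable function that validates the user/password. The class, the path list and the session handling are all invented for the sketch.

```python
PROTECTED_PREFIXES = ("/home.html", "/grades/")  # hypothetical protected paths
LOGIN_PAGE = "/login.html"


class Container:
    """Toy stand-in for a web container with a pluggable authenticator."""

    def __init__(self, authenticator):
        # authenticator is your code: callable(user, password) -> True or False
        self.authenticator = authenticator
        self.sessions = set()  # session ids that have logged in

    def login(self, session_id, user, password):
        """Ask the plugged-in code to validate; remember the session on success."""
        if self.authenticator(user, password):
            self.sessions.add(session_id)
            return True
        return False

    def handle(self, session_id, path):
        """Serve the page, or redirect to the login page if not authenticated."""
        if path.startswith(PROTECTED_PREFIXES) and session_id not in self.sessions:
            return ("redirect", LOGIN_PAGE)
        return ("serve", path)
```

In a real container the session id would come from a cookie, and the authenticator would do the hash comparison against the database.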

Website security is very easy to get wrong. There are many things that need to be done correctly. It’s very easy for a beginner to make mistakes and open up security holes. You may want to look into one of online hosting providers that has templates for websites. Often, they will have support for password-protected areas and they will manage all of that for you.

How does Amazon do it? When I buy on Amazon, I pick which stored card details I want it to use and it charges the transaction.

They store symmetrically-encrypted credit card details in a manner which is regularly audited by the credit card industry. PCI compliance is not trivial. There’s a reason why everybody that’s not Amazon-sized outsources credit card processing to third parties.

This. If you’re doing it just to learn, fine. But if you’re the sole programmer working on an actual commercial project and hoping to just sort of wing it, it opens you up to a lot of liability when the next LAMP exploit is discovered. Outsourcing it seems far safer… and also potentially opens up more customers who wouldn’t give out their credit cards to some small independent site. PayPal, Amazon, Google, and Square can all process payments for you for a small charge and you don’t ever have to store customer payment details.

Agreed. Don’t do it yourself - the hassles are severe, the exposure dramatic, and the cost of doing it right is not cheap. Use a service, never store CC information on your own servers. You won’t do it right and you’ll be in a never ending battle with hackers.

We can ignore CC#s for now, although the discussion on it is interesting. Let’s just say I would be storing data that needs to be private like test results (assuming FERPA or HIPAA), SSN or ethnicity.

Storing data securely is sufficiently complicated that you should be looking for a framework that already does the work (assuming just you, not a team), not doing it yourself. Which is to say, the correct answer is, decide which framework you are going to use and then figure out how that framework deals with it. I.e., on the free side, something like Drupal, CakePHP, CodeIgniter, Ruby on Rails, etc. can all be used to build an ecommerce site that does password hashing correctly, and at various levels of work and customization available to you.

Maybe we are focusing too much on the commerce end of it.
Let’s say I want to have a database with my students’ grades in it. It needs to be private under FERPA and I don’t want to have to use “I’ve been hacked” as an excuse if the information gets out. That is closer to the scenario I am looking for.

That doesn’t really change the answer. It doesn’t matter why the data is private (be it HIPAA, CC#, FERPA, etc), the way of securing it is pretty much the same.

If you’re doing it on your own, I definitely agree with the other posters here: find a framework that does what you need, and follow its method of doing it as closely as possible. There are at least a couple good frameworks out there for every language/server.

Otherwise, you can roll your own, but you’ll need to do a lot of reading, or hit the job market to find someone else who’s done it before.

I’m not a security professional, but I study it a little bit for fun: The hardest part isn’t necessarily the basic theory of it: Encrypt, hash, salt, pepper, authenticate, log, audit, backup, etc. You can learn to do all that.

The hard part is keeping up with all the incessant little details of cryptographic science in an ever-quickening arms race. Today this algorithm was found to be weak under X conditions, tomorrow OpenSSL will have this vulnerability under Y setup, or the day after that module Z is found to have an issue under a certain version of Apache, etc.

The privacy standards (CC processing, HIPAA, etc.) usually have specific technical guidelines that give you a good idea of what you need to do (use this firewall, train your staff, don’t let outsiders near your computers, use different user accounts, etc.). They’re either outlined in statute (as in HIPAA) or with thick merchant guidelines whitepapers or online certification quizzes (for PCI compliance) that ask you step-by-step in very detailed questions like “Did you close all unnecessary ports?”. These you can read and supplement with a book or two on general security practices.

But there’s really no way for a single programmer to stay on top of all the daily changes to every security, server, and database-related module out there and patch all your code in time, every time. Each of those projects is maintained by dozens if not hundreds of people, and vulnerabilities are still found with alarming regularity.

It’s not that you yourself can’t do it right, it’s that what’s right today probably won’t be in three months’ time, and it takes a full-time staff to keep up with all that and then patch and test updates. Finding a web host that is already FERPA/HIPAA compliant takes a lot of that grunt work off your shoulders.

As for packages, most of the Web is built on modular components. You can decide how much of it you want to deal with. If your host provides a FERPA-compliant LAMP stack, they will presumably take care of updating all the server and database software for you, and then you only have to worry about your own application following best practices, and maybe updating to new major PHP versions as they become available. If you add another CMS on top of that, your job then becomes to make sure that CMS and any plug-ins are regularly updated – and in that circumstance it’s a double-edged sword because there are both more attackers and more defenders, so new exploits are found quickly and patched quickly. If you roll your own code, only dedicated hackers might try to get through it, but then once they do, you’re on your own in terms of tracking down their impact and fixing the holes. It really just depends…

For frameworks, you can look at student information systems like OpenSIS (available either free for your own use or as paid cloud hosting). They have many competitors. Our university uses the dreadful Oracle PeopleSoft, which is so detestable I refuse to link to it. Or you can look for generic CRM software and determine their FERPA/HIPAA compliance – but again, it also depends on everything else (the entire software stack) that it runs on top of.

It’s really up to you how you want to store your information, but yeah, SQL-like databases are the standard way. Non-database approaches (text files, for example) will run into performance issues pretty quickly, if not file-system problems and such. Typically the frameworks will just ask for SQL permissions and do their own database calls. It’s up to you whether you want to keep confidential information on separate, more hardened servers, from your everyday content.

So, in summary: You can code as much or as little of it as you’d like. It’s probably safer to leave the security to the pros because in the end it’s you vs the entire world of hackers.

OK, so since my server-side knowledge is lacking, can I get a quick crash course on frameworks and how they enter into the picture?

I use an automated pharmacy system, run by my HMO, to refill my meds. When I order a refill, one of the automated questions is:
To use the same credit card that you used the last time, with the last four digits of <XXXX>, press 1 now.
So they are certainly storing everybody’s full credit card number and other necessary details.

TLDR: It’s just a pre-written application that can do some portion of what you need, so you don’t have to code it all yourself. You know how WordPress lets you easily set up a blog without much coding? These aim to be the equivalent of that, but for managing students instead of blogging.

Long version:
So let’s say you want to have a website where you can look up student grades and such. It’s usually built in several layers, like so:

  1. Network (how it connects to the Internet)
  2. Server hardware (varies)
  3. Operating system (*nix, Windows, whatever)
  4. Web server (Apache, IIS, etc)
  5. A database (MySQL, etc.)
  6. A serverside processing language (PHP, Perl, Ruby, MS .Net)
  7. (Optionally) Front-end programming (JavaScript, Flash, Java if you’re really behind)
  8. Your web application that ties it all together with HTML and CSS

A good webhost will usually provide an environment where parts 1 to 6 are available and kept updated for you (by them).

It’s up to you to put steps 5 to 8 together into some sort of usable web app. You can do it from scratch yourself, programming every line of code, every raw SQL lookup, every single submit button, etc.

But the thing is, what you’re doing isn’t terribly unique. Bazillions of schools want to do the same thing. So a cottage industry popped up where companies said “Hey, you don’t have to do all that yourself. Pay us money and use our framework, and we’ll take care of the nitty gritty for you. Then you can just log in and look up the grades that you need to. We’ll take care of the behind the scenes stuff.”

That’s a fully-managed environment, where they program and maintain everything. If you want to retain some control, they can also usually work with you to make custom code changes. Who maintains what, exactly, is up to you guys to figure out and put into a contract.

In the meantime, the open-source world also decided to make similar frameworks available (OpenSIS is apparently one, or perhaps other open CRMs will do). They’ll give you something that runs on top of some sort of LAMP-ish stack (meaning Linux+Apache+MySQL+PHP, the environment that your webhost should provide). Basically a barebones version of steps 5-8 for you to tinker with and build upon, taking care of the basics like how to store all that information in a database, the basic UI forms, login methods and user account storage and all that. Stuff that’s usually left up to you is just tailoring to your school: how your student IDs work, what your school colors are, how your schedule periods are set up, how the departments are set up, who has access to various parts of the system, etc.

Look up “student information system” and “customer relationship management software” and evaluate some yourself to see what they can do. Add “open” or “free” if you don’t want to pay too much for them.