Web Developers Conference: Bringing together great minds and great students

This is the transcript of my talk on Web standards pragmatism: from validation to the real world, presented at the Web Developers Conference, Bristol, 12 November 2008.

Web standards pragmatism: from validation to the real world

Patrick H. Lauke: Going to make up for a bit of lost time. But, I'm famous for rambling on way over the normal timing, so I'll probably be forcibly removed.

Who am I? The formatting's a bit screwy because we had to change laptops, but who am I? I'm Web Editor for the University of Salford, since 2001. So, I'm responsible for anything you see on the outside about the University of Salford. For those who don't know where it is — which I wouldn't blame you for — it's just outside of Manchester.

I'm also co-lead of the Web Standards Project Accessibility Task Force. For those who haven't heard about it, it's kind of a grassroots coalition of people around the world that want to further the use of web standards, et cetera.

I'm an occasional author. I've written a chapter in a book that I think…yep, we're going to give out. Also, write for .net magazine every now and again. I sit on the advisory board there as well, just to make sure that the magazine follows the latest standards and covers the right things.

So, what are web standards? I think — I hope, at least — that most of you in this room know what web standards are. Just raise your hand if you do, or you think you do.

Perfect. I'm still going to go through just a few slides, just as a remedial, just in case we're not quite sure about it.

So, for a lot of people, web standards means "OK, my document validates. Now, I've run it through the W3C checker. It validates." But, there is a little bit more than that. The validator is pretty much like a spell-checker. So, you might be writing something in Word. You run it through the spell-check. It might come up saying, "Yeah, it's perfect," but it still doesn't mean that it's actually great prose or great literature that you've written. So, it's only one of the many steps.

So, the web standards tenets, really, what it boils down to…three things. Yes, your code should really be valid, so, to published grammar, and that's where the validator comes in — it goes through, makes sure that you've just followed all the right conventions, you've closed the tags that you've opened, et cetera, et cetera. But also, it's about separation of content and presentation. And it's about semantic markup and code.

So, beautiful websites, or whatever you want to define as beautiful websites, can really be created in any kind of technology. And there are a lot of very nice websites from back in the days before web standards was kind of all the rage. And, yeah, of course, you can use tables, you can use fonts and giant images, you can slap loads of Flash into pages. But, why do we bother with web standards?

Mainly because using web standards offers a few advantages. You've got lighter code in the end, once it starts separating things out, remove all the kind of presentational guff from your HTML. It's easier to maintain, particularly if you've got a large number of pages, and maybe even multiple authors. If you just hand a page over to somebody to edit the content, it's a lot more human-readable because, really, all it says inside the HTML is "This is a heading," "This is a paragraph," et cetera. They don't have to really understand all the kind of triple-nested table layouts and which font size you've set for this particular thing and what color they need to define. It's fairly straightforward. And it's also easier to change the look and feel of a page, or an entire site. Once you've split out your CSS, you can quickly change that and change the look and feel of your entire site.

Doing that, as well, you can do multiple outputs. So, you don't have to create printer-friendly pages separately. You can just create a print style sheet and you're done. Or, if you want to make a mobile version, you can quickly do that as well, if you want to present the same content, but in a slightly kind of linearized way, for instance.
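
A minimal sketch of the print style sheet idea. The file name screen.css, the nav id and the sidebar class are assumptions made up for illustration:

```html
<!-- The same HTML serves every output; only the styling differs -->
<link rel="stylesheet" type="text/css" media="screen" href="screen.css">

<style type="text/css" media="print">
  /* On paper, drop the parts that only make sense on screen */
  #nav, .sidebar { display: none; }

  /* Plain black text on white, in a serif face for reading */
  body { color: #000; background: #fff; font-family: Georgia, serif; }
</style>
```

No separate "printer-friendly" page is needed: the browser picks up the print rules when the user prints.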

Accessibility-wise, if you're using semantic markup and code, you're defining what the content actually is rather than what it looks like, so it can be reinterpreted, so a user — say, the extreme case, a blind user with a screen reader — they can get to the page, and the screen reader can actually extract that semantic information, the structural information, so it can actually say to the user "There are four headings, there are three subheadings, here's a list, there are 20 bullets in this bullet-point list, et cetera, et cetera."

And search-engine optimization. If you're doing semantic markup, Google and other search engines seem to weight things a lot more if you're sticking things, say, in a heading one, a heading two, et cetera…if it finds keywords, it adds a little bit more weight to those. So, if you structure your document properly — obviously, you shouldn't abuse it — but if you structure it properly, your pages will have a higher chance of actually getting higher up in Google.

Maybe not so much nowadays, but there used to be this stigma of web standards pages — they all were very boxy, very simple, it's all just really horrible-looking stuff. The problem with that was, mainly, when we started with web standards, it was mostly the more techie kind of people that started getting into web standards, learning the CSS kind of stuff, and learning how to separate things, how to create layouts with CSS. It was very much the domain of kind of the coders rather than the designers.

But, it has changed. In recent years, a lot of nice websites have emerged. They all use web standards. So, the big ones — Adobe, even Quark. Nice, kind of complex layouts for newspapers. I wouldn't personally say that that's adequate to read on the screen, emulating the same kind of layout that they've got on paper, but you can do quite a lot of stuff. And really, now that designers, or hybrids between designers and coders, have started kind of working in the industry, you're really starting to see a lot of really beautiful sites, and under the hood, they actually use really nice, clean code, CSS, that sort of thing. So, that stigma has kind of, hopefully, gone away nowadays.

So, for those who were lucky enough never to see this before, the traditional, old-school way, before web standards, of doing web pages, really was down to choosing markup purely because of the way the tags kind of look once they're put in the browser. And that brought out a lot of problems, like designers would shy away from using heading ones, because they just said, "Ah, they just look too big in the page. I'm just going to go for a heading four. So, I'll start all my pages with heading fours. Or even just stick a paragraph in there. Nobody cares, really."

Using block quotes to indent stuff. I want a bit of text. I want it slightly off from the margin. Stick it in a block quote, because the browser traditionally shows that slightly indented, use it for that purpose. Or you've got a few paragraphs, you want a bit more space between them…oh, just stick a few more empty paragraphs in between, or maybe a few line breaks. It'll be fine.

And then, you'd sprinkle kind of presentational stuff on top. So, you want to make a heading. You decided, "OK, I'm just going to put a paragraph in there. And I want it big, so I'm going to put a <font size="+3">. And I want it a certain color, so I'll just also add the color itself. And then, later on, I've got a bit of text, there's a bit of an important thing, I'll just make that important bit bold, because all I care, in this case, is about how will it look in the browser."

But, with web standards, really, what we want to do is define what each piece of content is in the HTML, and then what it looks like in the CSS. So, these are two kind of separate processes. We're not mixing and matching anymore in the HTML, how big should text be, should it be bold, et cetera. We just define: what is the meaning of the particular piece of content that I'm marking up, and then how do I present it?

So, these are really distinct tasks, which can be done, also, by different people. So, you could have a designer looking at the actual styling of stuff, overriding the browser defaults for things like heading one, and somebody else who might be a content specialist just looking at the content. They don't have to mess about with the really complex kind of HTML. They can just go in and just say, "This is a heading," "This is a paragraph," et cetera.

So, yeah. Moving away from the example before. If it's the first heading, it is a heading one. If something is important, we just use strong for that one.
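
A rough before-and-after sketch of that example. The heading text, colour and sizes are invented for illustration:

```html
<!-- Old school: tags chosen for how they look in the browser -->
<p><font size="+3" color="red">Latest news</font></p>
<p>Registration closes <b>this Friday</b>.</p>

<!-- Web standards: tags chosen for what the content is -->
<h1>Latest news</h1>
<p>Registration closes <strong>this Friday</strong>.</p>

<!-- The look is defined separately (normally in an external style sheet) -->
<style type="text/css">
  h1     { font-size: 150%; color: #c00; }  /* overrides the browser default */
  strong { font-weight: bold; }
</style>
```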

So, after that…I know kung fu. It's very easy to say, "OK. I think I know web standards now." But, there's a few common pitfalls that I just want to make you aware of.

So, it's quite easy to say, "OK. I'm doing it the web standards way now. I'm not using font anymore. When, traditionally, I would have used <font size="+3"> and set the color and everything else, now I'm doing web standards and it validates…and all I'm doing instead is I'm sticking spans where before I used to put fonts in." That is not kind of the idea behind it.

Or, I'm trying to be a little bit better. I've got my paragraph. I don't put any kind of empty spans in there anymore. But, I'm still defining the actual styling with CSS, but I'm defining it in the HTML. Well, that kind of goes against the separation of content and presentation. So, here, you might as well go back to using fonts. You're not really doing what you're supposed to do with the separation of content and presentation.

And also, meaningless content. Again, you're thinking, "OK. I'm now using CSS. I'm not defining things, the way they look, with font and stuff." But, all you end up doing is, instead of actually putting a heading in an h1 or h2 or whatever it is, you're just still sticking with paragraphs, but you're now adding classes everywhere. You're saying, "OK. That's a paragraph, but with class="headingbig"."

And again, instead of using bold, I'm not using bold anymore because it doesn't validate, I'll just stick a span in with a class of bold or something, and you define in the CSS that that actually shows it bold, that sort of thing…where, really, again, you want to use the proper semantic markup — h1, strong, et cetera.
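
The three pitfalls side by side, as a sketch. The class names headingbig and bold follow the talk; the text is made up. All of these can validate, yet none of them says what the content is:

```html
<!-- Pitfall 1: a span standing in for the old font tag -->
<p><span class="headingbig">Latest news</span></p>

<!-- Pitfall 2: CSS, but written straight into the HTML -->
<p style="font-size: 150%; color: #c00;">Latest news</p>

<!-- Pitfall 3: classes doing the job of elements -->
<p class="headingbig">Latest news</p>
<p>Registration closes <span class="bold">this Friday</span>.</p>

<!-- What was wanted: proper semantic markup, styled from the CSS -->
<h1>Latest news</h1>
<p>Registration closes <strong>this Friday</strong>.</p>
```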

Also, it's quite easy to fall in the trap of, yeah, using CSS and defining your classes in your CSS, but you're actually defining them in a very visual way. You're actually calling the class based on the kind of visual effect it has.

So, say we've got a site, we've got a few external links, and you decide you want all your external links to be red, for whatever reason. So, you're linking out to wherever — say, here, the CSS specification, for instance. And you're marking it all up, and then you add a class of red, and in your CSS you actually say, "Anything that has got a class of red, you add color red." And it's great. And it works fine.

And when I started back at Salford as Web Editor, I inherited loads of code that pretty much used kind of this approach of, "Hey, I know CSS. I know how to do it."

The problem happens later when, for, say a redesign, you decide, "OK. Well, sod the red. We're now going to make all our external links green," for instance. But, you don't want to touch your HTML. You just go into your CSS and you just start messing about with things. And all of a sudden, you've got a class that says red, which, in effect, when you apply it, makes things green. So, it starts getting very complicated.

And to solve that, if you want to keep that approach, you'd have to then go back through all your HTML, and through your CSS, rename that class, and then rename all the HTML class="red" into class="green". That's not the idea. You don't want to mess about with the HTML if all you're doing is just changing the styling.

So, really, instead of using class names, or ids — the same applies to ids — based on what things look like, you want to use semantic class or id names. You want to be a bit more descriptive about what that class does or what does it apply to, rather than what visual effect it actually has.

So, in this case before, if we're saying all external links should be red, in our initial design, let's just call it a class of external. So, later on, even if we change the way it looks in the browser, we don't have any conflicts. You'll just know that, "OK. It's a class of external."
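
As a sketch, with an illustrative link target:

```html
<style type="text/css">
  /* Named for what the link is, not for what it looks like.
     A redesign changes only this value (say, to green);
     the HTML stays untouched. */
  a.external { color: red; }
</style>

<!-- Brittle version: class="red", which would lie after the redesign -->
<a class="external" href="http://www.w3.org/TR/CSS21/">CSS specification</a>
```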

A few examples, now, also, of common kind of things that I see when people start with putting CSS into things.

So, here we've got a little navigation bar, a heading, and then a navigation, and then a "more" link at the end. And for those who kind of follow the web standards way, navigation bars and navigation lists usually are marked up as unordered lists.

So, one of the things I see very often is, when you start with CSS, you start adding classes to everything. So, here you say, "OK. All the links, I want to make sure that they're not underlined." So, you mark it up as an unordered list, you put loads of list items in there, and then every single link in there, we've added a class of nav, which we then use to suppress the underline. Well, that's valid code, but it's not really the cleanest way of doing it. This is like you're going around and adding labels to every single little thing.

But, with CSS, you can actually be a lot more clever, and you use the structure to your advantage. And instead of littering your markup, in the end, with loads of classes, which, then again, you have to maintain, just be a little bit more smart with how you do things. So, in this case, instead of putting classes on every link, I've, in this case, added an id to the list, and then in the CSS I just say, "Anything inside the ul with an id of nav, every link, text-decoration: none."

So, you're kind of, again, slimming down your HTML. You're not kind of overcharging it with loads and loads of little class names and things.
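
A minimal sketch of the two versions. The link text and URLs are made up:

```html
<!-- Labelling every single link -->
<ul>
  <li><a class="nav" href="/">Home</a></li>
  <li><a class="nav" href="/news/">News</a></li>
  <li><a class="nav" href="/contact/">Contact</a></li>
</ul>

<!-- Using the structure instead: one id on the list -->
<ul id="nav">
  <li><a href="/">Home</a></li>
  <li><a href="/news/">News</a></li>
  <li><a href="/contact/">Contact</a></li>
</ul>

<style type="text/css">
  /* A descendant selector: every link inside the ul with an id of nav */
  ul#nav a { text-decoration: none; }
</style>
```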

Again, you start off with learning web standards, and all of a sudden, you think, "OK. Well, tables are out. That's what web standards are all about. You don't use tables for layout anymore. I shouldn't use tables ever anymore."

I'm going to take a little example. That's my personal site, and here, with the news items I've got on my site here. And I've used a table for this particular thing, because I think it is tabular data: you've got the titles in one column, and you've got the data in another column. But, if you start talking to people who've just started with web standards, they think, "Tables are evil," and they start really thinking about bizarre ways of not having to use a table, even down to actually replicating a structure of a table, but just using divs, spans, or definition lists, that sort of thing…they kind of take it to an extreme…or like double, triple-nested, unordered lists, just for the sake of saying, "OK. I'm not using a table anymore. I'm actually doing it the web standards way."

Well, that's not particularly true. As I said, tabular data, as in that case, is best served being marked up as a table. It's the best way to define a relationship where you've got your rows and your columns. And also, for accessibility reasons. If, say, again, in the extreme case of a screen-reader user, they have very special keystrokes to navigate their way through tables so they can jump to the cell above, cell below, to the right, to the left. Once you start using very bizarre constructs and style them with CSS to look like a table, but they're not really a table, they haven't got that ability to directly jump, say, one row up, one row down, while staying in the same column.

So, again, tables aren't evil if you're using them to mark up tabular data. Think of an Excel spreadsheet. Anything you'd put in there, it's pretty much a safe bet that you'd want to mark it up as a table.
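
A sketch of a small data table along those lines. The column names and rows are invented, not taken from the site shown in the talk:

```html
<table summary="Recent news items with their publication dates">
  <tr>
    <!-- th plus scope tells assistive technology what each column is -->
    <th scope="col">Title</th>
    <th scope="col">Date</th>
  </tr>
  <tr>
    <td>New article published</td>
    <td>3 November 2008</td>
  </tr>
  <tr>
    <td>Conference talk announced</td>
    <td>12 October 2008</td>
  </tr>
</table>
```

Because the rows and columns are real table cells, a screen reader can move up, down, left and right through them, which a pile of divs styled to look like a table would not offer.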

I call them fluff images. The traditional way: say you've got a little error message, and you want to have a little warning triangle in front of it. Again, a lot of people would end up doing something like just put an image in there, maybe float it to the left or however you want to do it, and then you just put the normal text in there. I've got an empty alt there, because it's just a visual image. And that's a valid enough approach.

Then, you'd start thinking, "OK. Well, I can actually do all that stuff with CSS as well. Doing it the web standards way, I have to remove any kind of images from my pages." And they start kind of looking at ways of doing that.

It is a valid approach, again. But, if it's a decorative image, you could say it's presentation. You can add images into your pages in that way, as non-repeating CSS background images.

And all of a sudden, you can basically remove the image, and you do a little bit of styling in your CSS and you say, "I'm leaving a little bit of padding on the left, and then I'm adding the warning triangle GIF as a non-repeating background, so that it only shows it once, and I'm putting it, basically, where I left that padding." And that's a nice, clean way of getting rid of kind of those fluff images in your markup.
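
Roughly, the two versions look like this. The file name warning.gif, the class name and the 24px padding are assumptions:

```html
<!-- Image in the markup, with an empty alt because it is decorative -->
<p><img src="warning.gif" alt=""> Please enter a valid email address.</p>

<!-- Image moved into the presentation layer -->
<p class="warning">Please enter a valid email address.</p>

<style type="text/css">
  p.warning {
    padding-left: 24px;                                  /* leave room for the icon */
    background: url(warning.gif) no-repeat left center;  /* show it once, in that gap */
  }
</style>
```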

But, sometimes the reality of it is you have to stick with images. It's not always feasible to do this sort of thing. It's a nice approach. It's a clean approach. But, what happens if, say, you've set up a CMS for somebody and they have to use a WYSIWYG editor behind the scenes to put their pages in? You might have very unskilled kind of authors. They don't know anything about HTML…they shouldn't! All they get is a little WYSIWYG editor. You can't expect them to now start thinking about, especially for one-off images, "I need to give this thing a class or an id. I then need to jump into the CSS. I need to work out what the padding is, apply a kind of non-repeating background." That's just not feasible.

So, sometimes you have to live with the fact that, yeah, you will still have some images, and it's not completely against the web standards way of having images in your pages.

Similar kind of tactic with image replacement…how many of you have heard of image replacement techniques? A few, yeah.

So, basically, HTML is quite limited when it comes to fonts, although they're working on it now — in CSS 3, you can embed fonts and send them down the wire to the browser. But, traditionally, you can only use the fonts that are actually installed on the system of the user that's looking at your pages.

But, what if you want something, a very specific typeface, for instance? Because of branding issues, because your company says, "Our logo or our headings always need to be set in, I don't know, Helvetica Bold or Helvetica Condensed, or something like that," or using a fancy font, something that is very unlikely to be available on users' machines, or you want to do a very heavily styled kind of graphical button.

So, something like that, you want to kind of use loads and loads of fonts that aren't available on the user's machine.

Well, image replacement techniques — there are very, very varied kind of techniques out there. I think, at the last count, there was about 12 or 13 that I've seen recently, all with their pros and cons.

But, most of them come down to: you mark up your text in the HTML the proper way — so, say if you're replacing a heading, you still wrap things up as a h1 — and then, using some kind of trickery, you hide the text that the browser would normally show and, instead of that, you show an image, usually in the background.

So, one of the approaches, for instance, is using a heading and a span — you usually end up wanting two kinds of elements. That is still semantic. h1, there's a span around it, and then the actual text. You put the actual fancy styled ready-made image of the heading, using that particular font…you put that in the background, you give it a width and height that matches that image, and then move your span. In this case, we moved it off-screen. Again, there are different techniques of doing that.
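
One possible sketch of this heading-and-span variant. The id, image file name and dimensions are made up, and this is only one of the many techniques, each with its own drawbacks:

```html
<h1 id="pagetitle"><span>University news</span></h1>

<style type="text/css">
  h1#pagetitle {
    width: 300px;                            /* match the image dimensions */
    height: 60px;
    background: url(heading.png) no-repeat;  /* the ready-made styled heading */
  }
  h1#pagetitle span {
    position: absolute;                      /* take the real text out of view */
    left: -9999px;                           /* off-screen, but still in the markup */
  }
</style>
```

The text stays in the HTML for screen readers and search engines. One known drawback of this particular variant: a user with CSS on but images off sees nothing at all.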

What you have done is you have moved the text away. All you are seeing is the background image, which is the fancy styled image. It is a very nice technique. It is a valid technique. It makes your markup very clean. But, what do you do when you have unskilled web authors who need to just edit their content? You can't expect them to work out the specifics of "How do I do this again? How do I wrap things up with a heading and span? Do I have to mess about with a CSS to do things?"

Sometimes, a humble image there will also do the job. So, it's being realistic. If you are creating your little web pages, it's fine. But, if you are doing something for an audience that might not be technically skilled and has to maintain it, sometimes you have to live with the fact that they will stick an image in, at some point, anyway. It is not a major sin to do that.

Dogmatism and standards. There is a lot going on when you start going on, say, forums about web standards, on mailing lists, when you're asking for advice.

A lot of the time, the first thing people — those arrogant people on those lists — do is they go to your website. They run it through the validator. As soon as they see that there is one little error on your page…OK. You have failed. Nobody wants to talk to you any more. You are not doing web standards. The dreaded "Your page isn't valid."

Being realistic, validation is not an end in and of itself. It is a quality assurance issue of your code. There are situations where, sometimes, you have to break validation. You know, for instance, that you have a particular user group that is still using browsers that do not completely follow web standards.

Sometimes you have to make slight changes to your markup, and you have to live with that. Or, if you are opening up comment functions on your site, for instance, you cannot guarantee that…say somebody posts something in there — if you are not cleaning the markup that they're posting, there might arise certain issues in terms of validation of the comments themselves.

Or, you are using a third party service, like an ad server that puts ad words into your pages. Sometimes it is outside of your control to say if it is actually outputting clean markup or not.

Also, for instance, if you just miss out an ampersand and you haven't encoded an ampersand, that would fail your validation. Does that really matter? Yes, in an ideal world, your pages should all be valid. But, it is not the end of the world.
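
For illustration, with a made-up URL, this is the kind of slip in question:

```html
<!-- Fails validation: a raw ampersand in the attribute value -->
<a href="search.php?q=standards&page=2">Next page</a>

<!-- Valid: the ampersand is encoded as an entity -->
<a href="search.php?q=standards&amp;page=2">Next page</a>
```

Browsers generally follow both links the same way, which is why this counts as a quality assurance issue rather than a disaster.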

So, it is kind of moving away from the "everything has to always validate." It is quality assurance. If you can work your way through your pages, especially if you're maintaining a large site with multiple authors, if you can guarantee that all of your pages are valid — good on you. But, if every now and again, an ampersand or something slips through, it is not the end of the world. It really depends on your particular situation.

When you start talking on lists and with other people who have done web standards for years, you get a lot of pedants who really like to go into things like "What is the most semantic way of marking things up?" They will argue for hours and hours about "This needs to really be that. If you look at the HTML specification, it clearly says that in this particular edge case you should use that particular element, et cetera, et cetera."

HTML is really…It does not offer you a lot of elements. It is very much like the traditional Lego bricks. There are a few shapes, few and far between. Yes, you can create whatever you want. But, sometimes it is more of an approximation of what you intended to build. Just because you only get a certain type of shape, a certain number of elements really to work with. So, you have to be realistic.

One of the big arguments that I remember from a few years ago was a simple question about "What is the most semantic way to mark up bread crumb navigation on a site?"

Looking at that, you start thinking "OK, the sensible way is…well, it's a list. It's a list with a few links." It could be an unordered list. It could be an ordered list. My argument is that it's an ordered list, because it's a number of ordered steps from your home page to the page that you're on at the moment.

But then, you started getting a lot of people really getting into the pedantic "Well, no, not really. It needs to really show the hierarchical relationship." Some people ended up proposing that what you actually want is a nested list of nested lists of nested lists of nested lists. They argued the toss over this, saying that it is really the most semantic way, because it really represents what the data is all about. I say rubbish to that. So, sometimes, you have to go with the simple solution.
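
A sketch of the simple solution, with made-up page names. The id and the inline styling are assumptions:

```html
<!-- Ordered steps from the home page down to the current page -->
<ol id="breadcrumb">
  <li><a href="/">Home</a></li>
  <li><a href="/news/">News</a></li>
  <li>Conference talk</li>
</ol>

<style type="text/css">
  /* Show the steps on one line, without numbers */
  ol#breadcrumb    { list-style: none; margin: 0; padding: 0; }
  ol#breadcrumb li { display: inline; }
</style>
```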

A quick one here. I have already touched a few times on…if you are using a CMS and you have WYSIWYG authors that need to do stuff behind the scenes. I wrote an article recently in .net magazine called The Artisan and the Mass Producer. In that one I argue that yes, ideally, if you are making your pages…If you're the only person who makes your pages, you're like an artisan. You can craft your little pages. You can be as semantic as you want. You can move all of your images out of your pages. You can add ids and stuff and do all of the stuff with CSS to put those images in as non-repeating backgrounds. You can spend hours and hours working out what is the most semantic way of marking this particular little piece of content up. That's great. More power to you.

But the problem with that is that it is not a scalable kind of format. If you are maintaining a large site with thousands of pages…If it is all down to you and you are spending hours or days pondering carefully over what kind of particular markup construct would be best in this situation, you're not going to be able to produce a lot.

In a similar way to mass production, when it first started, a lot of stuff has to be simplified a lot. You can't spend a lot of time carving out little details and everything else. If it has to be produced in large numbers, some kinds of compromises need to be made.

That is the case, most of the time now, with CMSs. They have these WYSIWYG editing tools behind the scenes. They are very generic and they don't let you do the really refined kind of stuff at the moment. You can't easily in a WYSIWYG way say "Right, this needs an image, but it's only a presentation image, so please add a class here, add an equivalent bit of CSS that positions this in the background, et cetera."

That's when your authors will end up sticking in images. They will end up marking things up…that should really be a citation, so it should be a cite, they will end up just styling it so that it looks italic and stuff.

And sometimes, if you're churning out thousands of pages a day, you really have to live with that. There is not much you can do by going to all of your authors and saying "No, here…learn HTML. This is how you need to do it, et cetera." You have to live with it. As long as they do the basics and they mark up things in paragraphs, ordered lists, et cetera, you're halfway there. Sometimes, doing that is good enough for the time being.

So, a little bit of take away advice…I kind of rushed through this one, very quickly.

Take away advice, hopefully…this is very opinionated stuff.

Web standards, obviously, are not just about validation. The whole chapter about what is the most semantic markup that you can use, et cetera. All of that is far more than running it through a validator and making sure that you have dotted your 'i's and crossed your 't's. Just because your pages validate, still have a look that you are actually doing the proper thing and separating content from presentation, using the most appropriate markup. And it is easy to create pages that actually validate, but if you start looking under the hood, they are actually a complete mess. You are still doing very presentational stuff. Yes, using CSS maybe. But, you are still not doing it the way it was intended.

Tables…they're not all evil. You might start off, when you learn web standards, thinking "OK, I can ditch all of the tables stuff." No, they're still valid when you're doing tabular data.

And sometimes you have to compromise. As I said, depending on your situation, if you are maintaining a CMS based system or you have to churn out a lot of pages…yes, you could sit there and carefully craft away at every individual page, really making sure that every little bit of "micro markup" is right…but, sometimes, you have to really settle for good enough.

And that is me. Thank you.

Initial transcript provided by CastingWords, with subsequent editing by Patrick H. Lauke. Released under Creative Commons Attribution-NonCommercial-ShareAlike 3.0.