HTML Tidy
Documentation Introduction
This page covers nearly everything you need to know about HTML Tidy. If you’re on macOS, Linux, or UNIX you can also use man tidy and read the purpose-built documentation for the version of Tidy that you have installed.
You can find configuration quick references in the API and Quick Reference Site.
If you’re a developer using libtidy, please consult the API and Quick Reference Site and the libtidy Introduction page. If you simply want to use Tidy, then please read on.
What Tidy does
Tidy corrects and cleans up HTML content by fixing markup errors. Here are a few examples:
-
Mismatched end tags:
<h2>subheading</h3>
…is converted to:
<h2>subheading</h2>
-
Misnested tags:
<p>here is a para <b>bold <i>bold italic</b> bold?</i> normal?
…is converted to:
<p>here is a para <b>bold <i>bold italic</i> bold?</b> normal?
-
Missing end tags:
<h1>heading <h2>subheading</h2>
…is converted to:
<h1>heading</h1> <h2>subheading</h2>
…and
<h1><i>italic heading</h1>
…is converted to:
<h1><i>italic heading</i></h1>
-
Mixed-up tags:
<i><h1>heading</h1></i> <p>new paragraph <b>bold text <p>some more bold text
…is converted to:
<h1><i>heading</i></h1> <p>new paragraph <b>bold text</b> <p><b>some more bold text</b>
-
Tag in the wrong place:
<h1><hr>heading</h1> <h2>sub<hr>heading</h2>
…is converted to:
<hr> <h1>heading</h1> <h2>sub</h2> <hr> <h2>heading</h2>
-
Missing “/” in end tags:
<a href="#refs">References<a>
…is converted to:
<a href="#refs">References</a>
-
List markup with missing tags:
<body> <ul> <li>1st list item <li>2nd list item
…is converted to:
<body> <ul> <li>1st list item</li> <li>2nd list item</li> </ul>
Note that Tidy will warn about the missing ul end tag, but not about the optional li end tags.
-
Missing quotation marks around attribute values
Tidy inserts quotation marks around all attribute values for you. It can also detect when you have forgotten the closing quotation mark, although this is something you will have to fix yourself.
-
Unknown/proprietary attributes
Tidy has a comprehensive knowledge of the attributes defined in HTML5. That often allows you to spot where you have mis-typed an attribute.
-
Tags lacking a terminating >
This is something you then have to fix yourself, as Tidy cannot determine where the > was meant to be inserted.
Running Tidy
Running Tidy in a Terminal (Console)
This is the syntax for invoking Tidy from the command line:
tidy [[options] filename]
Tidy defaults to reading from standard input, so if you run Tidy without specifying the filename argument, it will just sit there waiting for input to read.
Tidy defaults to writing to standard output. So you can pipe output from Tidy to other programs, as well as pipe output from other programs to Tidy. You can page through the output from Tidy by piping it to a pager, e.g.:
tidy file.html | less
To have Tidy write its output to a file instead, either use the -o filename or -output filename option, or redirect standard output to the file. For example:
tidy -o output.html index.html
tidy index.html > output.html
Both of those run Tidy on the file index.html and write the output to the file output.html, while writing any error messages to standard error.
Tidy defaults to writing its error messages to standard error (that is, to the console where you’re running Tidy). To page through the error messages along with the output, redirect standard error to standard output, and pipe it to your pager:
tidy index.html 2>&1 | less
To have Tidy write the errors to a file instead, either use the -f filename or -file filename option, or redirect standard error to a file:
tidy -o output.html -f errs.txt index.html
tidy index.html > output.html 2> errs.txt
Both of those run Tidy on the file index.html and write the output to the file output.html, while writing any error messages to the file errs.txt.
Writing the error messages to a file is especially useful if the file you are checking has many errors; reading them from a file instead of the console or pager can make it easier to review them.
You can use the -m or -modify option to modify (in-place) the contents of the input file you are checking; that is, to overwrite those contents with the output from Tidy. For example:
tidy -f errs.txt -m index.html
That runs Tidy on the file index.html, modifying it in place and writing the error messages to the file errs.txt.
Caution: If you use the -m option, you should first ensure that you have a backup copy of your file.
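One cautious way to follow that advice in a script is to copy the file before letting Tidy overwrite it. The following is only a sketch for a POSIX shell: the function name tidy_in_place, the .bak suffix, and the return value 3 for a failed backup are illustrative assumptions, not part of Tidy.

```shell
# Hypothetical helper: back up a file, then let Tidy modify it in place.
# The function name, the ".bak" suffix, and the status 3 are illustrative.
tidy_in_place() {
    file=$1
    # Refuse to continue if the backup copy cannot be made.
    cp -p "$file" "$file.bak" || return 3
    # -f writes the messages to errs.txt; -m overwrites the input file.
    # The function returns Tidy's own exit status.
    tidy -f errs.txt -m "$file"
}
```

Calling tidy_in_place index.html leaves the untouched original in index.html.bak, so the change can be reviewed or undone.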
Running Tidy in Scripts
If you want to run Tidy from a Perl, bash, or other script, you may find it useful to inspect the exit status returned by Tidy: 0 if everything is fine, 1 if there were warnings, and 2 if there were errors. This is an example using Perl:
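# Run Tidy and read its output through a pipe (index.html is an example file).
open(TIDY, "-|", "tidy", "-q", "index.html") or die "cannot run tidy: $!\n";
my @output = <TIDY>;
# close() returns false when Tidy exits with a non-zero status.
if (close(TIDY) == 0) {
    my $exitcode = $? >> 8;
    if ($exitcode == 1) {
        printf STDERR "tidy issued warning messages\n";
    } elsif ($exitcode == 2) {
        printf STDERR "tidy issued error messages\n";
    } else {
        die "tidy exited with code: $exitcode\n";
    }
} else {
    printf STDERR "tidy detected no errors\n";
}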
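The same check can be sketched for a POSIX shell script. The function names describe_tidy_status and check_with_tidy are hypothetical; the exit codes are the ones described above, and the -q and -e options are used here to keep Tidy's output down to its messages.

```shell
# Hypothetical helper: translate Tidy's exit status into a short message.
describe_tidy_status() {
    case "$1" in
        0) echo "tidy detected no errors" ;;
        1) echo "tidy issued warning messages" ;;
        2) echo "tidy issued error messages" ;;
        *) echo "tidy exited with code: $1" ;;
    esac
}

# Hypothetical wrapper: check one file and report the result on stderr.
# -q suppresses non-essential output; -e shows only errors and warnings.
check_with_tidy() {
    status=0
    tidy -q -e "$1" || status=$?
    describe_tidy_status "$status" >&2
    return "$status"
}
```

Because check_with_tidy returns Tidy's own status, a calling script can still branch on it, for example to treat warnings as acceptable but stop on errors.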
Featured Options and Solutions
Indenting output for readability
Indenting the source markup of an HTML document makes the markup easier to read. Tidy can indent the markup for an HTML document while recognizing elements whose contents should not be indented. In the example below, Tidy indents the output while preserving the formatting of the <pre> element:
Input:
<html>
<head>
<title>Test document</title>
</head>
<body>
<p>This example shows how Tidy can indent output while preserving
formatting of particular elements.</p>
<pre>This is
<em>genuine
preformatted</em>
text
</pre>
</body>
</html>
Output:
<html>
  <head>
    <title>Test document</title>
  </head>
  <body>
    <p>This example shows how Tidy can indent output while preserving
    formatting of particular elements.</p>
    <pre>
This is
<em>genuine
preformatted</em>
text
</pre>
  </body>
</html>
Tidy’s indenting behavior is not perfect and can sometimes cause browsers to render your output differently from the input. You can avoid unexpected indenting-related rendering problems by setting indent: no or indent: auto in a config file.
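As a minimal sketch, a config file with that setting could be created from a shell script as shown below. The helper name write_indent_config and the default file name tidy.conf are assumptions made for this example.

```shell
# Hypothetical helper: write a minimal Tidy config file that selects
# automatic indenting. The default file name "tidy.conf" is illustrative.
write_indent_config() {
    config=${1:-tidy.conf}
    cat > "$config" <<'EOF'
indent: auto
EOF
}
```

After calling write_indent_config, Tidy can be pointed at the file with the -config option, e.g. tidy -config tidy.conf index.html.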
Preserving original indenting not possible
Tidy is not capable of preserving the original indenting of the markup from the input it receives. That’s because Tidy starts by building a clean parse tree from the input, and that parse tree doesn’t contain any information about the original indenting. Tidy then pretty-prints the parse tree using the current config settings. Trying to preserve the original indenting from the input would interact badly with the repair operations needed to build a clean parse tree, and would considerably complicate the code.
Encodings and character references
Tidy defaults to assuming you want output to be encoded in UTF-8. But Tidy offers you a choice of other character encodings: US ASCII, ISO Latin-1, and the ISO 2022 family of 7-bit encodings.
Tidy doesn’t yet recognize the use of the HTML <meta> element for specifying the character encoding.
The full set of HTML character references is defined. Cleaned-up output uses named character references for characters when appropriate. Otherwise, characters outside the normal range are output as numeric character references.
Accessibility
Tidy offers advice on potential accessibility problems for people using non-graphical browsers. Have a look at our rescued HTML Tidy Accessibility Checker page.
Cleaning up presentational markup
Some tools generate HTML with presentational elements such as <font>, <nobr>, and <center>. Tidy’s -clean option will replace those elements with <style> elements and CSS.
Some HTML documents rely on the presentational effects of <p> start tags that are not followed by any content. Tidy deletes such <p> tags (as well as any headings that don’t have content). So do not use <p> tags simply for adding vertical whitespace; instead use CSS, or the <br> element. However, note that Tidy won’t discard <p> tags that are followed by a non-breaking space (that is, the &nbsp; named character reference).
Teaching Tidy about new tags
You can teach Tidy about new tags by declaring them in the configuration file. The syntax is:
new-inline-tags: tag1, tag2, tag3
new-empty-tags: tag1, tag2, tag3
new-blocklevel-tags: tag1, tag2, tag3
new-pre-tags: tag1, tag2, tag3
The same tag can be declared as both empty and inline, or as both empty and block-level, to define a new empty inline or empty block element. You are not advised to declare a tag as being both inline and block-level.
Note that the new tags can only appear where Tidy expects inline or block-level tags respectively. That means you can’t place new tags within the document head or other contexts with restricted content models.
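To make the syntax concrete, the sketch below writes such declarations into a config file from a shell script. The element names x-note and x-icon are invented for illustration, as are the helper name write_new_tags_config and the default file name tidy.conf.

```shell
# Hypothetical helper: declare two made-up custom elements for Tidy.
# "x-note" and "x-icon" are invented names used only for illustration.
write_new_tags_config() {
    config=${1:-tidy.conf}
    cat > "$config" <<'EOF'
new-blocklevel-tags: x-note
new-inline-tags: x-icon
new-empty-tags: x-icon
EOF
}
```

Here x-icon is declared as both inline and empty, which is one of the combinations described above, while x-note is a plain block-level element.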
Ignoring PHP, ASP, and JSTE instructions
Tidy will gracefully ignore many cases of PHP, ASP, and JSTE instructions within element content and as replacements for attributes, and preserve them as-is in output; for example:
<option <% if rsSchool.Fields("ID").Value
= session("sessSchoolID")
then Response.Write("selected") %>
value='<%=rsSchool.Fields("ID").Value%>'>
<%=rsSchool.Fields("Name").Value%>
(<%=rsSchool.Fields("ID").Value%>)
</option>
But note that Tidy may report missing attributes when those are “hidden” within the PHP, ASP, or JSTE code. If you use PHP, ASP, or JSTE code to create a start tag, but place the end tag explicitly in the HTML markup, Tidy won’t be able to match them up, and will delete the end tag. In that case you are advised to make the start tag explicit and to use PHP, ASP, or JSTE code for just the attributes; for example:
<a href="<%=random.site()%>">do you feel lucky?</a>
Tidy can also get things wrong if the PHP, ASP, or JSTE code includes quotation marks; for example:
value="<%=rsSchool.Fields("ID").Value%>"
Tidy will see the quotation mark preceding ID as ending the attribute value, and proceed to complain about what follows.
Tidy allows you to control whether line wrapping on spaces within PHP, ASP, and JSTE instructions is enabled; see the wrap-php, wrap-asp, and wrap-jste config options.
Correcting well-formedness errors in XML markup
Tidy can help you to correct well-formedness errors in XML markup. Tidy doesn’t yet recognize all XML features, though; for example, it doesn’t understand CDATA sections or DTD subsets.
Building Tidy
Source code
Tidy’s source code can be found at https://github.com/htacg/tidy-html5. There are sometimes several branches, but in general Master is the most recently updated. Because it is “cutting edge,” it may have bugs or other unstable behavior. If you prefer a stable, officially released version, have a look at Releases on the GitHub page.
In general you can use the Download ZIP button on the GitHub page to download the most recent version of a branch. If you prefer Git, you can use, e.g.:
git clone git@github.com:htacg/tidy-html5.git
…to clone the repository to your working machine.
Build the tidy command-line tool and libtidy library
For Linux/BSD/Mac platforms, you can build and install the tidy command-line tool from the source code using the following steps:
-
cd {your-tidy-html5-directory}/build/cmake
-
cmake ../.. [-DCMAKE_INSTALL_PREFIX=/path/for/install]
-
Windows: cmake --build . --config Release
Unix/OS X: make
-
Install, if desired:
Windows: cmake --build . --config Release --target INSTALL
Unix/OS X: [sudo] make install
Note that you will need to run make install either as root or with sudo make install.
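For Unix/OS X, the configure and build steps can be strung together in a small script. This is only a sketch: the function name build_tidy and its two arguments (the source directory and an optional install prefix) are assumptions of this example, and the install step is left to be run by hand.

```shell
# Hypothetical helper that strings the Unix/OS X steps together.
# $1: path to the tidy-html5 source tree; $2: optional install prefix.
# The body runs in a subshell, so the caller's working directory is kept.
build_tidy() (
    src_dir=$1
    prefix=${2:-}
    cd "$src_dir/build/cmake" || exit 1
    if [ -n "$prefix" ]; then
        cmake ../.. -DCMAKE_INSTALL_PREFIX="$prefix" || exit 1
    else
        cmake ../.. || exit 1
    fi
    make
)
```

A call such as build_tidy ~/tidy-html5 /opt/tidy (both paths are examples) would then be followed by make install in the build/cmake directory.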
FAQs
- What now?
-
If a window pops up showing text similar to the following:
HTML Tidy for Windows <vers 1st August 2002; built on Aug 8 2002, at 15:41:13>
Parsing Console input <stdin>
…and do not know what to do next, read on.
Tidy is waiting for your HTML to come in so that it can parse it. Tidy is fundamentally a tool that reads in HTML, cleans it up, and then writes it out again. It was developed as a program you run from the console prompt, but there are GUI encapsulations available, e.g. HTML-Kit, which you might prefer.
From the console prompt you can run Tidy like this:
C> tidy -m mywebpage.html
In this case, the -m option requests Tidy to write the tidied file back to the same filename as it read from (mywebpage.html). Tidy will give you a breakdown of the problems it found and the version of HTML the file appears to be using.
To get a listing of Tidy command line options, just type tidy -?. To see a listing of configuration options, try tidy -help-config. To get more info on the config options, see the applicable Quick Reference.
- How to get support and/or file a bug report and/or feature request
-
For support and/or to file a bug report for HTACG’s HTML Tidy, please use our bug tracker. For general Tidy support, including for different versions of Tidy and for products that use libtidy, a good location is the original W3C mailing list html-tidy@w3.org.
- Best practice to submit a bug report
-
Prior to submitting a bug report, please check that the bug is not already known. Many are. If you are not sure, just ask. If it is a new bug, make sure to include at least the following information in your report:
- A description of what you think went wrong.
- The HTML Tidy version (find it out by running tidy -v), and the operating system you are running.
- The input that exposes the bug. A small HTML document that reproduces the problem is best.
- The configuration options you’ve used. Command line options like -asxml, configuration files, etc. You may use tidy -show-config to get an overview of the active Tidy settings.
- Your e-mail address for further questions and comments.
This information is necessary to reproduce whatever is failing; without it we cannot help you.
Please include only one bug per report. Reports with multiple bugs are less easy to track and some bugs may get missed.
- Best practice to submit a feature request
-
If you want Tidy to do something new that it doesn’t do today (or to stop doing something), then it is probably a feature request.
As with bugs, please be sure that the feature has not already been requested. If the feature has already been requested, you can add your comments to the issue tracker. If the feature has not already been requested, send the same information as for a bug report, but place special emphasis on the desired output for a given input, desired options, etc. Please be as specific as possible about what you want Tidy to do.
- How do I control the output layout?
-
There are three primary options that control how Tidy formats your markup:
indent
indent-attributes
vertical-space
Briefly, indent sets the level of left-to-right indenting and, somewhat, how often elements are put onto a new line. The options are yes, no, and auto. indent-attributes is a flag that, when set, tells Tidy to put each attribute on a new line. vertical-space is a flag that, when set, tells Tidy to add some empty lines for readability.
The default for all three is no. These options may be used in any combination to control how you want your markup to look. The best thing is to experiment a bit to see what you like. Be aware that indent: yes is deprecated for production use as it will cause visual changes in most browsers.
To get Tidy Classic --indent auto layout, use the following options:
indent: auto
indent-attributes: no
vertical-space: yes
You can read about more pretty print options in the applicable Quick Reference.
- What version of Tidy should I use?
-
The current HTACG builds are recommended. You can find these on the GitHub repository or from our website.
Please continue to report examples where Tidy does not catch some ill-formed HTML, or (worse) generates ill-formed HTML. These cases have been significantly reduced. That said, be sure to test Tidy with some representative files from your environment.
For building a front end (e.g. GUI or language binding), the simplest approach is to use libtidy. For more information about building and coding with libtidy, see the Introduction to libtidy.
- How do I run a regression test?
-
You might ask, “Why should I run a regression test?” If you are a Tidy user, you might want to compare a new version of Tidy to the version you are currently running. This is a good idea if you are using Tidy in production applications such as web publishing. If you are a Tidy developer, it is a good idea to run the regression test suite to make sure your fix or enhancement doesn’t add new bugs.
Detecting new bugs is easier said than done because sometimes they are subtle and can only be seen in browsers (or one particular browser you don’t even have). You can catch most crashes and many layout problems by running the test suite as described here.
The basic process is simple: run the test suite before and after making changes to libtidy and compare the output markup and messages. Be aware that the test scripts for Windows (alltest.cmd) and Linux/Unix (testall.sh) place the output files in tidy/test/tmp. If you forget to run the before test, you can always download a binary or check out the previous version of the branch you are testing.
Here are the steps to evaluate the impact of a libtidy change.
Note: these steps may or may not be accurate as of 2015-October-16. Please submit a bug report if you verify whether or not these instructions still work before we do.
- Regression test for Windows
-
Before making changes:
C:\tidy\test> alltest.cmd
C:\tidy\test> ren tmp baseline
After making changes and building Tidy:
C:\tidy\test> alltest.cmd
C:\tidy\test> windiff tmp baseline
- Regression test for Mac/Linux/Unix
-
Before making changes:
~/tidy/test$ ./testall.sh
~/tidy/test$ mv tmp baseline
After making changes and building Tidy:
~/tidy/test$ ./testall.sh
~/tidy/test$ diff -u tmp baseline > diff.txt
License
HTML parser and pretty printer
Copyright © 1998-2003 World Wide Web Consortium (Massachusetts Institute of Technology, European Research Consortium for Informatics and Mathematics, Keio University). All Rights Reserved.
Copyright © 2003-2015 by additional contributors.
This software and documentation is provided “as is,” and the copyright holders and contributing author(s) make no representations or warranties, express or implied, including but not limited to, warranties of merchantability or fitness for any particular purpose or that the use of the software or documentation will not infringe any third party patents, copyrights, trademarks or other rights.
The copyright holders and contributing author(s) will not be held liable for any direct, indirect, special or consequential damages arising out of any use of the software or documentation, even if advised of the possibility of such damage.
Permission is hereby granted to use, copy, modify, and distribute this source code, or portions hereof, documentation and executables, for any purpose, without fee, subject to the following restrictions:
- The origin of this source code must not be misrepresented.
- Altered versions must be plainly marked as such and must not be misrepresented as being the original source.
- This Copyright notice may not be removed or altered from any source or altered source distribution.
The copyright holders and contributing author(s) specifically permit, without fee, and encourage the use of this source code as a component for supporting the Hypertext Markup Language in commercial products. If you use this source code in a product, acknowledgment is not required but would be appreciated.