CLVIII. Tidy Functions

Introduction

Tidy is a binding for the Tidy HTML clean and repair utility which allows you to not only clean and otherwise manipulate HTML documents, but also traverse the document tree.

Requirements

To use Tidy, you will need libtidy installed, available on the tidy homepage http://tidy.sourceforge.net/.

Installation

Tidy is currently available for PHP 4.3.x and PHP 5 as a PECL extension from http://pecl.php.net/package/tidy.

Note: Tidy 1.0 is just for PHP 4.3.x, while Tidy 2.0 is just for PHP 5.

If PEAR is available on your *nix-like system you can use the pear installer to install the tidy extension, by the following command: pecl install tidy.

You can always download the tar.gz package and install tidy by hand:

Example 1. tidy install by hand in PHP 4.3.x

gunzip tidy-xxx.tgz
tar -xvf tidy-xxx.tar
cd tidy-xxx
phpize
./configure && make && make install

Windows users can download the extension dll from http://pecl4win.php.net/ext.php/php_tidy.dll.

In PHP 5 you need only to compile using the --with-tidy option.

Runtime Configuration

The behaviour of these functions is affected by settings in php.ini.

Table 1. Tidy Configuration Options

NameDefaultChangeableChangelog
tidy.default_config""PHP_INI_SYSTEMAvailable since PHP 5.0.0.
tidy.clean_output"0"PHP_INI_PERDIRAvailable since PHP 5.0.0.
For further details and definitions of the PHP_INI_* constants, see the Appendix G.

Here's a short explanation of the configuration directives.

tidy.default_config string

Default path for tidy config file.

tidy.clean_output boolean

Turns on/off the output repairing by Tidy.

Warning

Do not turn on tidy.clean_output if you are generating non-html content such as dynamic images.

Resource Types

This extension has no resource types defined.

Predefined Classes

tidyNode

Methods

Properties

  • value - the value of the node (e.g. the html text)

  • name - the name of the tag (e.g. html, a, etc..)

  • type - the type of the node (one of the constants above, e.g. TIDY_NODETYPE_PHP)

  • line* - the line where the node starts

  • column* - the column where the node starts

  • proprietary* - TRUE if the node refers to a proprietary tag

  • id - the ID of the tag (one of the constants above, e.g. TIDY_TAG_FRAME)

  • attribute - an array with the attributes of the current node, or NULL if there aren't any

  • child - an array with the child tidyNodes, or NULL if there aren't any

Note: The properties marked with * are just available since PHP 5.1.0.

Predefined Constants

The constants below are defined by this extension, and will only be available when the extension has either been compiled into PHP or dynamically loaded at runtime.

Each TIDY_TAG_XXX represents a HTML tag. For example, TIDY_TAG_A represents a <a href="XX">link</a> tag. Each TIDY_ATTR_XXX represents a HTML atribute. For example TIDY_ATTR_HREF would represent the href atribute in the previous example.

The following constants are defined:

Table 2. tidy tag constants

constant
TIDY_TAG_UNKNOWN
TIDY_TAG_A
TIDY_TAG_ABBR
TIDY_TAG_ACRONYM
TIDY_TAG_ALIGN
TIDY_TAG_APPLET
TIDY_TAG_AREA
TIDY_TAG_B
TIDY_TAG_BASE
TIDY_TAG_BASEFONT
TIDY_TAG_BDO
TIDY_TAG_BGSOUND
TIDY_TAG_BIG
TIDY_TAG_BLINK
TIDY_TAG_BLOCKQUOTE
TIDY_TAG_BODY
TIDY_TAG_BR
TIDY_TAG_BUTTON
TIDY_TAG_CAPTION
TIDY_TAG_CENTER
TIDY_TAG_CITE
TIDY_TAG_CODE
TIDY_TAG_COL
TIDY_TAG_COLGROUP
TIDY_TAG_COMMENT
TIDY_TAG_DD
TIDY_TAG_DEL
TIDY_TAG_DFN
TIDY_TAG_DIR
TIDY_TAG_DIV
TIDY_TAG_DL
TIDY_TAG_DT
TIDY_TAG_EM
TIDY_TAG_EMBED
TIDY_TAG_FIELDSET
TIDY_TAG_FONT
TIDY_TAG_FORM
TIDY_TAG_FRAME
TIDY_TAG_FRAMESET
TIDY_TAG_H1
TIDY_TAG_H2
TIDY_TAG_H3
TIDY_TAG_H4
TIDY_TAG_H5
TIDY_TAG_H6
TIDY_TAG_HEAD
TIDY_TAG_HR
TIDY_TAG_HTML
TIDY_TAG_I
TIDY_TAG_IFRAME
TIDY_TAG_ILAYER
TIDY_TAG_IMG
TIDY_TAG_INPUT
TIDY_TAG_INS
TIDY_TAG_ISINDEX
TIDY_TAG_KBD
TIDY_TAG_KEYGEN
TIDY_TAG_LABEL
TIDY_TAG_LAYER
TIDY_TAG_LEGEND
TIDY_TAG_LI
TIDY_TAG_LINK
TIDY_TAG_LISTING
TIDY_TAG_MAP
TIDY_TAG_MARQUEE
TIDY_TAG_MENU
TIDY_TAG_META
TIDY_TAG_MULTICOL
TIDY_TAG_NOBR
TIDY_TAG_NOEMBED
TIDY_TAG_NOFRAMES
TIDY_TAG_NOLAYER
TIDY_TAG_NOSAVE
TIDY_TAG_NOSCRIPT
TIDY_TAG_OBJECT
TIDY_TAG_OL
TIDY_TAG_OPTGROUP
TIDY_TAG_OPTION
TIDY_TAG_P
TIDY_TAG_PARAM
TIDY_TAG_PLAINTEXT
TIDY_TAG_PRE
TIDY_TAG_Q
TIDY_TAG_RP
TIDY_TAG_RT
TIDY_TAG_RTC
TIDY_TAG_RUBY
TIDY_TAG_S
TIDY_TAG_SAMP
TIDY_TAG_SCRIPT
TIDY_TAG_SELECT
TIDY_TAG_SERVER
TIDY_TAG_SERVLET
TIDY_TAG_SMALL
TIDY_TAG_SPACER
TIDY_TAG_SPAN
TIDY_TAG_STRIKE
TIDY_TAG_STRONG
TIDY_TAG_STYLE
TIDY_TAG_SUB
TIDY_TAG_TABLE
TIDY_TAG_TBODY
TIDY_TAG_TD
TIDY_TAG_TEXTAREA
TIDY_TAG_TFOOT
TIDY_TAG_TH
TIDY_TAG_THEAD
TIDY_TAG_TITLE
TIDY_TAG_TR
TIDY_TAG_TR
TIDY_TAG_TT
TIDY_TAG_U
TIDY_TAG_UL
TIDY_TAG_VAR
TIDY_TAG_WBR
TIDY_TAG_XMP

Table 3. tidy attribute constants

constant
TIDY_ATTR_UNKNOWN
TIDY_ATTR_ABBR
TIDY_ATTR_ACCEPT
TIDY_ATTR_ACCEPT_CHARSET
TIDY_ATTR_ACCESSKEY
TIDY_ATTR_ACTION
TIDY_ATTR_ADD_DATE
TIDY_ATTR_ALIGN
TIDY_ATTR_ALINK
TIDY_ATTR_ALT
TIDY_ATTR_ARCHIVE
TIDY_ATTR_AXIS
TIDY_ATTR_BACKGROUND
TIDY_ATTR_BGCOLOR
TIDY_ATTR_BGPROPERTIES
TIDY_ATTR_BORDER
TIDY_ATTR_BORDERCOLOR
TIDY_ATTR_BOTTOMMARGIN
TIDY_ATTR_CELLPADDING
TIDY_ATTR_CELLSPACING
TIDY_ATTR_CHAR
TIDY_ATTR_CHAROFF
TIDY_ATTR_CHARSET
TIDY_ATTR_CHECKED
TIDY_ATTR_CITE
TIDY_ATTR_CLASS
TIDY_ATTR_CLASSID
TIDY_ATTR_CLEAR
TIDY_ATTR_CODE
TIDY_ATTR_CODEBASE
TIDY_ATTR_CODETYPE
TIDY_ATTR_COLOR
TIDY_ATTR_COLS
TIDY_ATTR_COLSPAN
TIDY_ATTR_COMPACT
TIDY_ATTR_CONTENT
TIDY_ATTR_COORDS
TIDY_ATTR_DATA
TIDY_ATTR_DATAFLD
TIDY_ATTR_DATAPAGESIZE
TIDY_ATTR_DATASRC
TIDY_ATTR_DATETIME
TIDY_ATTR_DECLARE
TIDY_ATTR_DEFER
TIDY_ATTR_DIR
TIDY_ATTR_DISABLED
TIDY_ATTR_ENCODING
TIDY_ATTR_ENCTYPE
TIDY_ATTR_FACE
TIDY_ATTR_FOR
TIDY_ATTR_FRAME
TIDY_ATTR_FRAMEBORDER
TIDY_ATTR_FRAMESPACING
TIDY_ATTR_GRIDX
TIDY_ATTR_GRIDY
TIDY_ATTR_HEADERS
TIDY_ATTR_HEIGHT
TIDY_ATTR_HREF
TIDY_ATTR_HREFLANG
TIDY_ATTR_HSPACE
TIDY_ATTR_HTTP_EQUIV
TIDY_ATTR_ID
TIDY_ATTR_ISMAP
TIDY_ATTR_LABEL
TIDY_ATTR_LANG
TIDY_ATTR_LANGUAGE
TIDY_ATTR_LAST_MODIFIED
TIDY_ATTR_LAST_VISIT
TIDY_ATTR_LEFTMARGIN
TIDY_ATTR_LINK
TIDY_ATTR_LONGDESC
TIDY_ATTR_LOWSRC
TIDY_ATTR_MARGINHEIGHT
TIDY_ATTR_MARGINWIDTH
TIDY_ATTR_MAXLENGTH
TIDY_ATTR_MEDIA
TIDY_ATTR_METHOD
TIDY_ATTR_MULTIPLE
TIDY_ATTR_NAME
TIDY_ATTR_NOHREF
TIDY_ATTR_NORESIZE
TIDY_ATTR_NOSHADE
TIDY_ATTR_NOWRAP
TIDY_ATTR_OBJECT
TIDY_ATTR_OnAFTERUPDATE
TIDY_ATTR_OnBEFOREUNLOAD
TIDY_ATTR_OnBEFOREUPDATE
TIDY_ATTR_OnBLUR
TIDY_ATTR_OnCHANGE
TIDY_ATTR_OnCLICK
TIDY_ATTR_OnDATAAVAILABLE
TIDY_ATTR_OnDATASETCHANGED
TIDY_ATTR_OnDATASETCOMPLETE
TIDY_ATTR_OnDBLCLICK
TIDY_ATTR_OnERRORUPDATE
TIDY_ATTR_OnFOCUS
TIDY_ATTR_OnKEYDOWN
TIDY_ATTR_OnKEYPRESS
TIDY_ATTR_OnKEYUP
TIDY_ATTR_OnLOAD
TIDY_ATTR_OnMOUSEDOWN
TIDY_ATTR_OnMOUSEMOVE
TIDY_ATTR_OnMOUSEOUT
TIDY_ATTR_OnMOUSEOVER
TIDY_ATTR_OnMOUSEUP
TIDY_ATTR_OnRESET
TIDY_ATTR_OnROWENTER
TIDY_ATTR_OnROWEXIT
TIDY_ATTR_OnSELECT
TIDY_ATTR_OnSUBMIT
TIDY_ATTR_OnUNLOAD
TIDY_ATTR_PROFILE
TIDY_ATTR_PROMPT
TIDY_ATTR_RBSPAN
TIDY_ATTR_READONLY
TIDY_ATTR_REL
TIDY_ATTR_REV
TIDY_ATTR_RIGHTMARGIN
TIDY_ATTR_ROWS
TIDY_ATTR_ROWSPAN
TIDY_ATTR_RULES
TIDY_ATTR_SCHEME
TIDY_ATTR_SCOPE
TIDY_ATTR_SCROLLING
TIDY_ATTR_SELECTED
TIDY_ATTR_SHAPE
TIDY_ATTR_SHOWGRID
TIDY_ATTR_SHOWGRIDX
TIDY_ATTR_SHOWGRIDY
TIDY_ATTR_SIZE
TIDY_ATTR_SPAN
TIDY_ATTR_SRC
TIDY_ATTR_STANDBY
TIDY_ATTR_START
TIDY_ATTR_STYLE
TIDY_ATTR_SUMMARY
TIDY_ATTR_TABINDEX
TIDY_ATTR_TARGET
TIDY_ATTR_TEXT
TIDY_ATTR_TITLE
TIDY_ATTR_TOPMARGIN
TIDY_ATTR_TYPE
TIDY_ATTR_USEMAP
TIDY_ATTR_VALIGN
TIDY_ATTR_VALUE
TIDY_ATTR_VALUETYPE
TIDY_ATTR_VERSION
TIDY_ATTR_VLINK
TIDY_ATTR_VSPACE
TIDY_ATTR_WIDTH
TIDY_ATTR_WRAP
TIDY_ATTR_XML_LANG
TIDY_ATTR_XML_SPACE
TIDY_ATTR_XMLNS

Table 4. tidy nodetype constants

constantdescription
TIDY_NODETYPE_ROOTroot node
TIDY_NODETYPE_DOCTYPEdoctype
TIDY_NODETYPE_COMMENTHTML comment
TIDY_NODETYPE_PROCINSProcessing Instruction
TIDY_NODETYPE_TEXTText
TIDY_NODETYPE_STARTstart tag
TIDY_NODETYPE_ENDend tag
TIDY_NODETYPE_STARTENDempty tag
TIDY_NODETYPE_CDATACDATA
TIDY_NODETYPE_SECTIONXML section
TIDY_NODETYPE_ASPASP code
TIDY_NODETYPE_JSTEJSTE code
TIDY_NODETYPE_PHPPHP code
TIDY_NODETYPE_XMLDECLXML declaration

Examples

This simple example shows basic Tidy usage.

Example 2. Basic Tidy usage

<?php
ob_start
();
?>
<html>a html document</html>
<?
$html 
ob_get_clean();

// Specify configuration
$config = array(
           
'indent'         => true,
           
'output-xhtml'   => true,
           
'wrap'           => 200);

// Tidy
$tidy = new tidy;
$tidy->parseString($html$config'utf8');
$tidy->cleanRepair();

// Output
echo $tidy;
?>

Table of Contents
ob_tidyhandler --  ob_start callback function to repair the buffer
tidy_access_count --  Returns the Number of Tidy accessibility warnings encountered for specified document
tidy_clean_repair --  Execute configured cleanup and repair operations on parsed markup
tidy_config_count --  Returns the Number of Tidy configuration errors encountered for specified document
tidy::__construct --  Constructs a new tidy object
tidy_diagnose --  Run configured diagnostics on parsed and repaired markup
tidy_error_count --  Returns the Number of Tidy errors encountered for specified document
tidy_get_body --  Returns a tidyNode Object starting from the <body> tag of the tidy parse tree
tidy_get_config --  Get current Tidy configuration
tidy_get_error_buffer --  Return warnings and errors which occurred parsing the specified document
tidy_get_head --  Returns a tidyNode Object starting from the <head> tag of the tidy parse tree
tidy_get_html_ver --  Get the Detected HTML version for the specified document
tidy_get_html --  Returns a tidyNode Object starting from the <html> tag of the tidy parse tree
tidy_get_opt_doc --  Returns the documentation for the given option name
tidy_get_output --  Return a string representing the parsed tidy markup
tidy_get_release --  Get release date (version) for Tidy library
tidy_get_root --  Returns a tidyNode object representing the root of the tidy parse tree
tidy_get_status --  Get status of specified document
tidy_getopt --  Returns the value of the specified configuration option for the tidy document
tidy_is_xhtml --  Indicates if the document is a XHTML document
tidy_is_xml --  Indicates if the document is a generic (non HTML/XHTML) XML document
tidy_load_config --  Load an ASCII Tidy configuration file with the specified encoding
tidy_node->get_attr --  Return the attribute with the provided attribute id
tidy_node->get_nodes --  Return an array of nodes under this node with the specified id
tidy_node->next --  Returns the next sibling to this node
tidy_node->prev --  Returns the previous sibling to this node
tidy_parse_file --  Parse markup in file or URI
tidy_parse_string --  Parse a document stored in a string
tidy_repair_file --  Repair a file and return it as a string
tidy_repair_string --  Repair a string using an optionally provided configuration file
tidy_reset_config --  Restore Tidy configuration to default values
tidy_save_config --  Save current settings to named file
tidy_set_encoding --  Set the input/output character encoding for parsing markup
tidy_setopt --  Updates the configuration settings for the specified tidy document
tidy_warning_count --  Returns the Number of Tidy warnings encountered for specified document
tidyNode->hasChildren --  Returns true if this node has children
tidyNode->hasSiblings --  Returns true if this node has siblings
tidyNode->isAsp --  Returns true if this node is ASP
tidyNode->isComment --  Returns true if this node represents a comment
tidyNode->isHtml --  Returns true if this node is part of a HTML document
tidyNode->isJste --  Returns true if this node is JSTE
tidyNode->isPhp --  Returns true if this node is PHP
tidyNode->isText --  Returns true if this node represents text (no markup)