bs4.formatter

Module Contents

Classes

Formatter

Describes a strategy to use when outputting a parse tree to a string.

HTMLFormatter

A generic Formatter for HTML.

XMLFormatter

A generic Formatter for XML.

class bs4.formatter.Formatter(language=None, entity_substitution=None, void_element_close_prefix='/', cdata_containing_tags=None, empty_attributes_are_booleans=False, indent=1)

Bases: bs4.dammit.EntitySubstitution

Describes a strategy to use when outputting a parse tree to a string.

Some parts of this strategy come from the distinction between HTML4, HTML5, and XML. Others are configurable by the user.

Formatters are passed in as the formatter argument to methods like PageElement.encode. Most people won’t need to think about formatters, and most people who need to think about them can pass in one of these predefined strings as formatter rather than making a new Formatter object:

For HTML documents:
  • 'html' - HTML entity substitution for generic HTML documents. (default)
  • 'html5' - HTML entity substitution for HTML5 documents, as well as some optimizations in the way tags are rendered.
  • 'minimal' - Only make the substitutions necessary to guarantee valid HTML.
  • None - Do not perform any substitution. This will be faster but may result in invalid markup.

For XML documents:
  • 'html' - Entity substitution for XHTML documents.
  • 'minimal' - Only make the substitutions necessary to guarantee valid XML. (default)
  • None - Do not perform any substitution. This will be faster but may result in invalid markup.
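
As a rough illustration of the predefined strings, the sketch below renders one fragment with each HTML formatter name. It assumes Beautiful Soup 4 is installed and uses the built-in html.parser; the sample markup and variable names are invented for this example.

```python
from bs4 import BeautifulSoup

# Illustrative markup: a non-ASCII character, an ampersand, and a void tag.
soup = BeautifulSoup("<p>caf\u00e9 &amp; bar<br/></p>", "html.parser")
p = soup.p

# 'minimal' escapes only what is needed for valid markup (here, the "&").
as_minimal = p.decode(formatter="minimal")

# 'html' additionally replaces characters that have named HTML entities.
as_html = p.decode(formatter="html")

# 'html5' also changes how void elements such as <br> are written.
as_html5 = p.decode(formatter="html5")

# None performs no substitution, so the bare "&" is emitted unescaped.
as_none = p.decode(formatter=None)

for name, rendered in [
    ("minimal", as_minimal),
    ("html", as_html),
    ("html5", as_html5),
    (None, as_none),
]:
    print(repr(name), rendered)
```

Comparing the printed lines side by side shows which characters each strategy rewrites and how the void tag is closed.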

XML_FORMATTERS
HTML_FORMATTERS
HTML = 'html'
XML = 'xml'
HTML_DEFAULTS

_default(language, value, kwarg)

substitute(ns)

Process a string that needs to undergo entity substitution. This may be a string encountered in an attribute value or as text.

Parameters:

ns – A string.

Returns:

A string with certain characters replaced by named or numeric entities.
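
The substitution step can also be swapped for a custom callable through the entity_substitution constructor argument. The following is a minimal sketch, assuming Beautiful Soup 4 with html.parser; the shout function and the sample link are hypothetical and exist only to make the effect visible.

```python
from bs4 import BeautifulSoup
from bs4.formatter import HTMLFormatter


def shout(text):
    """Hypothetical substitution function: upper-case every string."""
    return text.upper()


# The callable is applied wherever substitute() would be applied,
# that is, to text nodes and to attribute values.
formatter = HTMLFormatter(entity_substitution=shout)

soup = BeautifulSoup('<a href="http://example.com/?x=1">link</a>', "html.parser")
rendered = soup.a.decode(formatter=formatter)
print(rendered)

# substitute() can also be called directly on a plain string.
direct = formatter.substitute("abc")
```

Tag names are left alone; only strings that pass through substitute() are affected.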

attribute_value(value)

Process the value of an attribute.

Parameters:

value – A string.

Returns:

A string with certain characters replaced by named or numeric entities.

attributes(tag)

Reorder a tag’s attributes however you want.

By default, attributes are sorted alphabetically. This makes behavior consistent between Python 2 and Python 3, and preserves backwards compatibility with older versions of Beautiful Soup.

If empty_attributes_are_booleans is True, then attributes whose values are set to the empty string will be treated as boolean attributes and rendered without a value.
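
Overriding attributes() in a subclass is one way to change the order. The sketch below assumes Beautiful Soup 4 with html.parser; SourceOrderAttributes is a hypothetical subclass written for this example, not part of the library. It also shows the effect of empty_attributes_are_booleans on an empty-valued attribute.

```python
from bs4 import BeautifulSoup
from bs4.formatter import HTMLFormatter


class SourceOrderAttributes(HTMLFormatter):
    """Hypothetical subclass: yield attributes in the order they were parsed."""

    def attributes(self, tag):
        for key, value in tag.attrs.items():
            yield key, value


soup = BeautifulSoup('<p z="1" m="2" a="3"></p>', "html.parser")

# The default formatter sorts attributes alphabetically.
sorted_markup = soup.p.decode()

# The subclass keeps the original order instead.
unsorted_markup = soup.p.decode(formatter=SourceOrderAttributes())

# An empty-valued attribute rendered as a boolean attribute.
option = BeautifulSoup('<option selected="">x</option>', "html.parser").option
boolean_markup = option.decode(
    formatter=HTMLFormatter(empty_attributes_are_booleans=True)
)

print(sorted_markup)
print(unsorted_markup)
print(boolean_markup)
```

The same hook can filter attributes as well as reorder them, since the output contains only the pairs that attributes() yields.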

class bs4.formatter.HTMLFormatter(*args, **kwargs)

Bases: Formatter

A generic Formatter for HTML.

REGISTRY

class bs4.formatter.XMLFormatter(*args, **kwargs)

Bases: Formatter

A generic Formatter for XML.

REGISTRY
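
As far as can be told from the library's behavior, each REGISTRY maps the predefined formatter names described earlier to ready-made formatter instances. The sketch below inspects both registries under that assumption; it needs only Beautiful Soup 4 and no XML parser.

```python
from bs4.formatter import HTMLFormatter, XMLFormatter

# Names registered for each language (None is a valid key).
html_names = set(HTMLFormatter.REGISTRY)
xml_names = set(XMLFormatter.REGISTRY)
print(html_names)
print(xml_names)

# A registered formatter is an ordinary instance and can be used directly.
xml_minimal = XMLFormatter.REGISTRY["minimal"]
escaped = xml_minimal.substitute("a < b")
print(escaped)
```

Looking up a name this way is roughly what happens when a string is passed as the formatter argument to an output method.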