bs4.builder._htmlparser

Use Python's built-in HTMLParser library (html.parser) to parse HTML files that aren’t too badly malformed.
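The parser copes with mildly broken HTML because of how the standard library's html.parser.HTMLParser works. It is an event-driven tokenizer that calls one method for each start tag, end tag, and run of text, and it does not check that tags nest properly. The sketch below logs those callbacks for a list whose `<li>` tags are never closed. The EventLogger class is illustrative only and is not part of Beautiful Soup. Turning such an event stream into a sensible tree is left to the tree builder.

```python
from html.parser import HTMLParser


class EventLogger(HTMLParser):
    """Illustrative subclass: record each parser callback in order."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &amp; etc. in text
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        self.events.append(("start", tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


logger = EventLogger()
# Neither <li> is closed; HTMLParser raises no error and simply
# never reports an end tag for them.
logger.feed("<UL><li class=item>one<li>two</ul>")
logger.close()
for event in logger.events:
    print(event)
```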

Module Contents

Classes

HTMLParserTreeBuilder

A Beautiful Soup TreeBuilder that uses the HTMLParser parser, found in the Python standard library.

class bs4.builder._htmlparser.HTMLParserTreeBuilder(parser_args=None, parser_kwargs=None, **kwargs)

Bases: bs4.builder.HTMLTreeBuilder

A Beautiful Soup TreeBuilder that uses the HTMLParser parser, found in the Python standard library.

is_xml = False
picklable = True
NAME
features
TRACKS_LINE_NUMBERS = True
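TRACKS_LINE_NUMBERS = True reflects that the standard-library parser always knows its position in the input. HTMLParser.getpos() returns a (line, column) pair, with lines counted from 1 and columns from 0. Inside handle_starttag it points at the `<` that opens the tag. In Beautiful Soup this information surfaces on tags as `sourceline` and `sourcepos`. The PositionRecorder class below is a hypothetical illustration of the underlying stdlib mechanism only, not the builder's own code.

```python
from html.parser import HTMLParser


class PositionRecorder(HTMLParser):
    """Hypothetical example: note where each start tag begins."""

    def __init__(self):
        super().__init__()
        self.positions = []

    def handle_starttag(self, tag, attrs):
        # getpos() -> (line, column); line is 1-based, column is 0-based,
        # and here it points at the "<" that opens this tag.
        self.positions.append((tag, self.getpos()))


recorder = PositionRecorder()
recorder.feed("<html>\n  <body>\n    <p>Hi</p>\n  </body>\n</html>")
recorder.close()
print(recorder.positions)
```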
prepare_markup(markup, user_specified_encoding=None, document_declared_encoding=None, exclude_encodings=None)

Run any preliminary steps necessary to make incoming markup acceptable to the parser.

Parameters:
  • markup – Some markup – probably a bytestring.

  • user_specified_encoding – The user asked to try this encoding.

  • document_declared_encoding – The markup itself claims to be in this encoding.

  • exclude_encodings – The user asked _not_ to try any of these encodings.

Yields:

A series of 4-tuples: (markup, encoding, declared encoding, has undergone character replacement)

Each 4-tuple represents a strategy for converting the document to Unicode and parsing it. Each strategy will be tried in turn.

feed(markup)

Run the incoming markup through Python's HTMLParser, populating the BeautifulSoup object in self.soup.
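A toy builder can sketch the division of labor behind feed(). The markup goes into an HTMLParser subclass, and that subclass's callbacks grow a tree rooted at self.soup. Everything here is hypothetical: Node, _Forwarder, ToyTreeBuilder, and the VOID set of tag names. The real builder creates Beautiful Soup's own objects and handles details this sketch ignores, such as attributes and its own rules for repairing bad nesting.

```python
from html.parser import HTMLParser
from typing import List, Optional


class Node:
    """Hypothetical minimal tree node standing in for a Beautiful Soup tag."""

    def __init__(self, name: str, parent: Optional["Node"] = None):
        self.name = name
        self.parent = parent
        self.children: List[object] = []  # Node objects or text strings


class _Forwarder(HTMLParser):
    """Passes HTMLParser callbacks on to the builder that owns it."""

    def __init__(self, builder: "ToyTreeBuilder"):
        super().__init__(convert_charrefs=True)
        self.builder = builder

    def handle_starttag(self, tag, attrs):
        self.builder.open_tag(tag)  # attrs are ignored in this sketch

    def handle_endtag(self, tag):
        self.builder.close_tag(tag)

    def handle_data(self, data):
        self.builder.current.children.append(data)


class ToyTreeBuilder:
    """Hypothetical builder: feed() populates self.soup."""

    VOID = {"br", "hr", "img", "input", "meta", "link"}

    def __init__(self):
        self.soup = Node("[document]")
        self.current = self.soup

    def open_tag(self, name: str):
        node = Node(name, self.current)
        self.current.children.append(node)
        if name not in self.VOID:
            self.current = node  # descend; void elements never get children

    def close_tag(self, name: str):
        # Climb to the nearest open element with this name; ignore strays.
        node = self.current
        while node is not None and node.name != name:
            node = node.parent
        if node is not None:
            self.current = node.parent

    def feed(self, markup: str):
        parser = _Forwarder(self)
        parser.feed(markup)
        parser.close()  # flush any text HTMLParser is still holding back


builder = ToyTreeBuilder()
builder.feed("<p>Hello<br>world</p><p>again")
print([child.name for child in builder.soup.children])
```

Calling close() after feed() matters because HTMLParser may hold back trailing text until it knows no more input is coming.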