`bs4.builder._htmlparser`

Use the HTMLParser library to parse HTML files that aren’t too bad.

Module Contents

HTMLParserTreeBuilder – A Beautiful Soup TreeBuilder that uses the HTMLParser parser, found in the Python standard library.

class bs4.builder._htmlparser.HTMLParserTreeBuilder(parser_args=None, parser_kwargs=None, **kwargs)

A Beautiful Soup TreeBuilder that uses the HTMLParser parser, found in the Python standard library.

prepare_markup(markup, user_specified_encoding=None, document_declared_encoding=None, exclude_encodings=None)

Run any preliminary steps necessary to make incoming markup acceptable to the parser.

Parameters:

markup – Some markup – probably a bytestring.
user_specified_encoding – The user asked to try this encoding.
document_declared_encoding – The markup itself claims to be in this encoding.
exclude_encodings – The user asked _not_ to try any of these encodings.

Yields:

A series of 4-tuples: (markup, encoding, declared encoding, has undergone character replacement)

Each 4-tuple represents a strategy for converting the document to Unicode and parsing it. Each strategy will be tried in turn.
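The "series of strategies" idea can be illustrated with a small generator. This is a minimal sketch using only the standard library, not Beautiful Soup's actual implementation: the function name `candidate_strategies`, the order in which encodings are tried, and the UTF-8 fallback are all assumptions made for illustration.

```python
from typing import Iterator, Optional, Sequence, Tuple, Union

# (markup, encoding, declared encoding, has undergone character replacement)
Strategy = Tuple[str, Optional[str], Optional[str], bool]


def candidate_strategies(
    markup: Union[bytes, str],
    user_specified_encoding: Optional[str] = None,
    document_declared_encoding: Optional[str] = None,
    exclude_encodings: Optional[Sequence[str]] = None,
) -> Iterator[Strategy]:
    """Hypothetical stand-in for prepare_markup(): yield 4-tuples in turn."""
    if isinstance(markup, str):
        # Already Unicode: nothing to decode, so there is a single strategy.
        yield (markup, None, None, False)
        return

    excluded = {name.lower() for name in (exclude_encodings or [])}
    tried = set()
    # Assumed priority: the user's choice, then the document's claim, then UTF-8.
    for encoding in (user_specified_encoding, document_declared_encoding, "utf-8"):
        if encoding is None:
            continue
        key = encoding.lower()
        if key in excluded or key in tried:
            continue
        tried.add(key)
        try:
            text = markup.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            continue  # this strategy does not apply; move on to the next one
        yield (text, encoding, document_declared_encoding, False)

    # Assumed last resort: decode lossily and flag the character replacement.
    if "utf-8" not in excluded:
        yield (markup.decode("utf-8", errors="replace"), "utf-8",
               document_declared_encoding, True)
```

A caller would loop over the generator, try to parse each candidate, and stop at the first one that succeeds. The final element of each tuple tells the caller whether the text was decoded lossily.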

feed(markup)

Run some incoming markup through some parsing process, populating the BeautifulSoup object in self.soup.
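To show what "running markup through a parsing process" to populate a tree can look like, here is a minimal sketch built directly on the standard library's `html.parser.HTMLParser`. It is not the Beautiful Soup implementation: `SketchTreeBuilder`, `feed_sketch`, and the nested-dict tree are hypothetical names and structures chosen for illustration, and void elements such as `<br>` are not handled specially.

```python
from html.parser import HTMLParser


class SketchTreeBuilder(HTMLParser):
    """Hypothetical, simplified builder that collects tags into nested dicts."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = {"name": "document", "attrs": {}, "children": []}
        self._stack = [self.root]  # currently open elements, innermost last

    def handle_starttag(self, tag, attrs):
        # Attach the new element to the innermost open element, then open it.
        node = {"name": tag, "attrs": dict(attrs), "children": []}
        self._stack[-1]["children"].append(node)
        self._stack.append(node)

    def handle_endtag(self, tag):
        # Close back to the nearest matching open tag; ignore stray end tags.
        for i in range(len(self._stack) - 1, 0, -1):
            if self._stack[i]["name"] == tag:
                del self._stack[i:]
                break

    def handle_data(self, data):
        # Text becomes a plain string child of the innermost open element.
        self._stack[-1]["children"].append(data)


def feed_sketch(markup):
    """Parse markup and return the populated tree (an illustrative analogue of feed())."""
    builder = SketchTreeBuilder()
    builder.feed(markup)
    builder.close()  # flush any text still buffered by the parser
    return builder.root
```

Calling `feed_sketch('<p class="x">Hi <b>there</b></p>')` returns a root node whose only child is the `p` element, which in turn holds the text `"Hi "` and a nested `b` element.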