bs4.builder._htmlparser
Use the HTMLParser library from the Python standard library to parse HTML that is not too badly malformed.
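A minimal usage sketch, assuming Beautiful Soup 4 is installed: this builder is normally selected by passing the parser name "html.parser" to the BeautifulSoup constructor rather than by instantiating it directly. The sample markup and variable names are illustrative.

```python
from bs4 import BeautifulSoup
from bs4.builder._htmlparser import HTMLParserTreeBuilder

markup = "<p>Hello <b>world</b></p>"  # illustrative sample markup

# "html.parser" is the feature name that selects the standard-library parser.
soup = BeautifulSoup(markup, "html.parser")

# The soup keeps a reference to the tree builder that parsed it.
builder_name = type(soup.builder).__name__
bold_text = soup.b.string

print(builder_name, bold_text)
```

No third-party parser such as lxml or html5lib is needed here, which is the main reason to pick this builder.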
Module Contents
Classes
HTMLParserTreeBuilder | A Beautiful Soup TreeBuilder that uses the HTMLParser parser, found in the Python standard library.
- class bs4.builder._htmlparser.HTMLParserTreeBuilder(parser_args=None, parser_kwargs=None, **kwargs)
Bases: bs4.builder.HTMLTreeBuilder
A Beautiful Soup TreeBuilder that uses the HTMLParser parser, found in the Python standard library.
- is_xml = False
- picklable = True
- NAME
- features
- TRACKS_LINE_NUMBERS = True
- prepare_markup(markup, user_specified_encoding=None, document_declared_encoding=None, exclude_encodings=None)
Run any preliminary steps necessary to make incoming markup acceptable to the parser.
- Parameters:
markup – Some markup – probably a bytestring.
user_specified_encoding – The user asked to try this encoding.
document_declared_encoding – The markup itself claims to be in this encoding.
exclude_encodings – The user asked not to try any of these encodings.
- Yield:
A series of 4-tuples: (markup, encoding, declared encoding, has undergone character replacement)
Each 4-tuple represents a strategy for converting the document to Unicode and parsing it. Each strategy will be tried in turn.
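The following sketch shows one way to inspect the strategies that prepare_markup yields. It assumes the builder can be instantiated with no arguments and that the method can be called directly, outside the BeautifulSoup constructor. The sample bytestring is illustrative.

```python
from bs4.builder._htmlparser import HTMLParserTreeBuilder

builder = HTMLParserTreeBuilder()

# Illustrative input: UTF-8 encoded bytes containing a non-ASCII character.
raw = "<p>caf\u00e9</p>".encode("utf-8")

# prepare_markup is a generator; each item is one strategy to try.
strategies = list(builder.prepare_markup(raw, user_specified_encoding="utf-8"))

for converted, encoding, declared_encoding, was_replaced in strategies:
    print(type(converted).__name__, encoding, declared_encoding, was_replaced)
```

Each tuple unpacks in the order given in the description above: the converted markup, the encoding used, the encoding the document declared, and whether characters were replaced during conversion.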
- feed(markup)
Run the incoming markup through the HTMLParser-based parser, populating the BeautifulSoup object in self.soup.
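Application code rarely needs to call feed directly, since the BeautifulSoup constructor drives the builder. Because TRACKS_LINE_NUMBERS is True for this builder, the resulting tags should carry position information. The sketch below assumes a Beautiful Soup version that exposes the sourceline and sourcepos attributes on tags. The markup is illustrative.

```python
from bs4 import BeautifulSoup

# Two paragraphs on separate lines, so each should report a different line.
markup = "<p>first</p>\n<p>second</p>"
soup = BeautifulSoup(markup, "html.parser")

# Collect (line number, position within the line, text) for every <p> tag.
positions = [
    (tag.sourceline, tag.sourcepos, tag.string)
    for tag in soup.find_all("p")
]

for line, column, text in positions:
    print(line, column, text)
```

This information is handy for error messages that point back to the location of a tag in the original document.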