`bs4.diagnose`#

Diagnostic functions, mainly for use when doing tech support.

Module Contents#

Classes#

AnnouncingParser

Subclass of HTMLParser that announces parse events, without doing anything else.

Functions#

`diagnose`(data)	Diagnostic suite for isolating common problems.
`lxml_trace`(data[, html])	Print out the lxml events that occur during parsing.
`htmlparser_trace`(data)	Print out the HTMLParser events that occur during parsing.
`rword`([length])	Generate a random word-like string.
`rsentence`([length])	Generate a random sentence-like string.
`rdoc`([num_elements])	Randomly generate an invalid HTML document.
`benchmark_parsers`([num_elements])	Very basic head-to-head performance benchmark.
`profile`([num_elements, parser])	Use Python's profiler on a randomly generated document.

Attributes#

`__license__`
`_vowels`
`_consonants`

bs4.diagnose.__license__ = 'MIT'#

bs4.diagnose.diagnose(data)#

Diagnostic suite for isolating common problems.

Parameters: data – A string containing markup that needs to be explained.
Returns: None; diagnostics are printed to standard output.
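Because `diagnose` prints its report rather than returning it, capturing standard output is one way to keep the report for a bug ticket. The following is a minimal sketch: the helper name `capture_diagnosis` is hypothetical, and the import guard is only there so the snippet still runs when Beautiful Soup is not installed.

```python
import contextlib
import io


def capture_diagnosis(markup):
    """Run bs4.diagnose.diagnose() on markup and return what it printed.

    Hypothetical helper, not part of bs4. Returns an empty string when
    Beautiful Soup is not installed.
    """
    try:
        from bs4.diagnose import diagnose
    except ImportError:
        return ""
    buffer = io.StringIO()
    # diagnose() writes its report to standard output, so redirect it.
    with contextlib.redirect_stdout(buffer):
        diagnose(markup)
    return buffer.getvalue()


report = capture_diagnosis("<p>Unclosed paragraph<b>bold")
print(report)
```

The exact text of the report depends on which parsers are installed, so it is best treated as free-form diagnostic output rather than something to parse.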

bs4.diagnose.lxml_trace(data, html=True, **kwargs)#

Print out the lxml events that occur during parsing.

This lets you see how lxml parses a document when no Beautiful Soup code is running. You can use this to determine whether an lxml-specific problem is in Beautiful Soup’s lxml tree builders or in lxml itself.

Parameters:

data – Some markup.
html – If True, markup will be parsed with lxml’s HTML parser. If False, lxml’s XML parser will be used.

class bs4.diagnose.AnnouncingParser(*, convert_charrefs=True)#

Bases: html.parser.HTMLParser

Subclass of HTMLParser that announces parse events, without doing anything else.

You can use this to get a picture of how html.parser sees a given document. The easiest way to do this is to call htmlparser_trace.

_p(s)#

handle_starttag(name, attrs)#

handle_endtag(name)#

handle_data(data)#

handle_charref(name)#

handle_entityref(name)#

handle_comment(data)#

handle_decl(data)#

unknown_decl(data)#

handle_pi(data)#
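The idea behind the handler methods listed above can be illustrated with the standard library alone. The sketch below is not the bs4 implementation: `EventRecorder` is a hypothetical class that overrides a few of the same `HTMLParser` hooks and stores each event in a list instead of printing it.

```python
from html.parser import HTMLParser


class EventRecorder(HTMLParser):
    """Hypothetical stand-in for AnnouncingParser that records events."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.events = []

    def handle_starttag(self, name, attrs):
        self.events.append(("START", name))

    def handle_endtag(self, name):
        self.events.append(("END", name))

    def handle_data(self, data):
        self.events.append(("DATA", data))

    def handle_comment(self, data):
        self.events.append(("COMMENT", data))


recorder = EventRecorder()
recorder.feed("<p class='x'>Hi<!-- note --></p>")
recorder.close()  # flush any buffered text
for kind, value in recorder.events:
    print(kind, repr(value))
```

With `convert_charrefs=True`, character and entity references are folded into the text passed to `handle_data`, so `handle_charref` and `handle_entityref` are not called.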

bs4.diagnose.htmlparser_trace(data)#

Print out the HTMLParser events that occur during parsing.

This lets you see how HTMLParser parses a document when no Beautiful Soup code is running.

Parameters: data – Some markup.

bs4.diagnose._vowels = 'aeiou'#

bs4.diagnose._consonants = 'bcdfghjklmnpqrstvwxyz'#

bs4.diagnose.rword(length=5)#

Generate a random word-like string.

bs4.diagnose.rsentence(length=4)#

Generate a random sentence-like string.
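One plausible way to build word-like and sentence-like strings from the `_vowels` and `_consonants` attributes is to alternate between the two alphabets. The sketch below assumes that approach; the names `random_word` and `random_sentence`, the `rng` parameter, and the 4 to 9 letter word length are illustrative choices, and the functions in `bs4.diagnose` may differ in detail.

```python
import random

VOWELS = "aeiou"
CONSONANTS = "bcdfghjklmnpqrstvwxyz"


def random_word(length=5, rng=random):
    """Alternate consonants and vowels so the result looks pronounceable."""
    return "".join(
        rng.choice(CONSONANTS if i % 2 == 0 else VOWELS)
        for i in range(length)
    )


def random_sentence(length=4, rng=random):
    """Join `length` random words of 4 to 9 letters with spaces."""
    return " ".join(
        random_word(rng.randint(4, 9), rng) for _ in range(length)
    )


seeded = random.Random(0)  # seed for reproducible output
word = random_word(5, seeded)
sentence = random_sentence(4, seeded)
print(word)
print(sentence)
```

Passing a seeded `random.Random` instance makes the generated text repeatable, which helps when a randomly generated document triggers a parser problem that needs to be reproduced.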

bs4.diagnose.rdoc(num_elements=1000)#

Randomly generate an invalid HTML document.

bs4.diagnose.benchmark_parsers(num_elements=100000)#

Very basic head-to-head performance benchmark.

bs4.diagnose.profile(num_elements=100000, parser='lxml')#

Use Python’s profiler on a randomly generated document.
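The general technique of profiling a parse can be sketched with the standard library's `cProfile` and `pstats`. This example is an assumption-laden stand-in, not the bs4 implementation: it profiles `html.parser.HTMLParser` directly on a small repeated fragment instead of running Beautiful Soup on a randomly generated document, and `profile_parse` is a hypothetical helper.

```python
import cProfile
import io
import pstats
from html.parser import HTMLParser


def profile_parse(markup, top=5):
    """Profile one HTMLParser pass over markup and return the report text."""
    profiler = cProfile.Profile()
    parser = HTMLParser()
    profiler.enable()
    parser.feed(markup)
    parser.close()
    profiler.disable()

    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Show the `top` entries with the highest cumulative time.
    stats.sort_stats("cumulative").print_stats(top)
    return out.getvalue()


markup = "<div><p>text</p></div>" * 1000
profile_report = profile_parse(markup)
print(profile_report)
```

Sorting by cumulative time puts the highest-level calls first, which is usually the quickest way to see where a parser spends its time.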
