bs4.diagnose#
Diagnostic functions, mainly for use when doing tech support.
Module Contents#
Classes#
AnnouncingParser | Subclass of HTMLParser that announces parse events, without doing anything else.
Functions#
diagnose(data) | Diagnostic suite for isolating common problems.
lxml_trace(data, html=True, **kwargs) | Print out the lxml events that occur during parsing.
htmlparser_trace(data) | Print out the HTMLParser events that occur during parsing.
rword(length=5) | Generate a random word-like string.
rsentence(length=4) | Generate a random sentence-like string.
rdoc(num_elements=1000) | Randomly generate an invalid HTML document.
benchmark_parsers(num_elements=100000) | Very basic head-to-head performance benchmark.
profile(num_elements=100000, parser='lxml') | Use Python's profiler on a randomly generated document.
Attributes#
- bs4.diagnose.__license__ = 'MIT'#
- bs4.diagnose.diagnose(data)#
Diagnostic suite for isolating common problems.
- Parameters:
data – A string containing markup that needs to be explained.
- Returns:
None; diagnostics are printed to standard output.
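Because diagnose prints its findings instead of returning them, saving the report (for example, to attach to a support request) means capturing standard output. The following is a minimal sketch of that idea, not an official recipe: the sample markup and the variable names are illustrative assumptions, and the import guard only exists so the snippet degrades quietly when Beautiful Soup is not installed.

```python
import contextlib
import io

try:
    from bs4.diagnose import diagnose
except ImportError:
    diagnose = None  # Beautiful Soup is not installed; nothing to run

# Illustrative markup with a mismatched tag (an assumption for this demo).
markup = "<p>Unclosed paragraph<b>bold text</p>"
report = ""

if diagnose is not None:
    buffer = io.StringIO()
    # Everything diagnose() prints is redirected into the buffer.
    with contextlib.redirect_stdout(buffer):
        result = diagnose(markup)  # documented to return None
    report = buffer.getvalue()
    print(report)
```

The captured text can then be written to a file or pasted into a bug report.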
- bs4.diagnose.lxml_trace(data, html=True, **kwargs)#
Print out the lxml events that occur during parsing.
This lets you see how lxml parses a document when no Beautiful Soup code is running. You can use this to determine whether an lxml-specific problem is in Beautiful Soup’s lxml tree builders or in lxml itself.
- Parameters:
data – Some markup.
html – If True, markup will be parsed with lxml’s HTML parser. If False, lxml’s XML parser will be used.
- class bs4.diagnose.AnnouncingParser(*, convert_charrefs=True)#
Bases: html.parser.HTMLParser
Subclass of HTMLParser that announces parse events, without doing anything else.
You can use this to get a picture of how html.parser sees a given document. The easiest way to do this is to call htmlparser_trace.
- _p(s)#
- handle_starttag(name, attrs)#
- handle_endtag(name)#
- handle_data(data)#
- handle_charref(name)#
- handle_entityref(name)#
- handle_comment(data)#
- handle_decl(data)#
- unknown_decl(data)#
- handle_pi(data)#
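To illustrate the idea of a parser that only announces events, here is a standard-library sketch. It is not the bs4 implementation: the class name EventLogger, the exact message format, and the events list are assumptions made for this example, and only a few of the handler methods listed above are overridden.

```python
from html.parser import HTMLParser


class EventLogger(HTMLParser):
    """Hypothetical stand-in for AnnouncingParser: reports each parse event."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.events = []  # kept only so the events can be inspected later

    def _p(self, s):
        # Record and print one event line.
        self.events.append(s)
        print(s)

    def handle_starttag(self, name, attrs):
        self._p(f"{name} START")

    def handle_endtag(self, name):
        self._p(f"{name} END")

    def handle_data(self, data):
        self._p(f"{data!r} DATA")

    def handle_comment(self, data):
        self._p(f"{data!r} COMMENT")

    def handle_decl(self, data):
        self._p(f"{data!r} DECL")


parser = EventLogger()
parser.feed("<!DOCTYPE html><p class='x'>Hello<!-- note --></p>")
parser.close()
```

Each handler does nothing except report that it was called, which is what makes the sequence of events easy to read.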
- bs4.diagnose.htmlparser_trace(data)#
Print out the HTMLParser events that occur during parsing.
This lets you see how HTMLParser parses a document when no Beautiful Soup code is running.
- Parameters:
data – Some markup.
- bs4.diagnose._vowels = 'aeiou'#
- bs4.diagnose._consonants = 'bcdfghjklmnpqrstvwxyz'#
- bs4.diagnose.rword(length=5)#
Generate a random word-like string.
- bs4.diagnose.rsentence(length=4)#
Generate a random sentence-like string.
- bs4.diagnose.rdoc(num_elements=1000)#
Randomly generate an invalid HTML document.
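The three generators build on one another: words feed sentences, and sentences feed documents. The sketch below shows one plausible way to do this with the standard library. It is an assumption-laden illustration, not the bs4 source: the function names, the consonant/vowel alternation, the tag list, and the word-length range are all choices made for this example.

```python
import random

VOWELS = "aeiou"
CONSONANTS = "bcdfghjklmnpqrstvwxyz"


def random_word(length=5):
    """Alternate consonants and vowels so the result looks pronounceable."""
    return "".join(
        random.choice(CONSONANTS if i % 2 == 0 else VOWELS)
        for i in range(length)
    )


def random_sentence(length=4):
    """Join `length` random words of varying size with spaces."""
    return " ".join(random_word(random.randint(4, 9)) for _ in range(length))


def random_document(num_elements=1000):
    """Mix unpaired open tags, close tags, and text; usually invalid HTML."""
    tag_names = ["p", "div", "span", "i", "b", "table", "td"]
    parts = []
    for _ in range(num_elements):
        choice = random.randint(0, 2)
        if choice == 0:
            parts.append(f"<{random.choice(tag_names)}>")   # may never be closed
        elif choice == 1:
            parts.append(random_sentence())                  # bare text
        else:
            parts.append(f"</{random.choice(tag_names)}>")  # may never have been opened
    return "<html>" + "\n".join(parts) + "</html>"


random.seed(0)  # fixed seed so repeated runs produce the same text
print(random_word())
print(random_sentence())
print(random_document(num_elements=5))
```

Markup produced this way is useful as stress-test input because open and close tags are emitted independently of each other.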
- bs4.diagnose.benchmark_parsers(num_elements=100000)#
Very basic head-to-head performance benchmark.
- bs4.diagnose.profile(num_elements=100000, parser='lxml')#
Use Python’s profiler on a randomly generated document.
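For readers unfamiliar with Python's profiler, the general pattern looks like the sketch below. It uses only the standard library (cProfile, pstats, and html.parser) and a small hand-made document, so it is an illustration of the technique under those assumptions rather than what bs4.diagnose.profile itself does; the function name parse and the sample markup are hypothetical.

```python
import cProfile
import io
import pstats
from html.parser import HTMLParser


def parse(markup):
    """Work to be profiled: feed markup through the standard-library parser."""
    parser = HTMLParser()
    parser.feed(markup)
    parser.close()


# Small, deliberately unbalanced document (an assumption for this demo).
markup = "<div><p>text</b>" * 2000

profiler = cProfile.Profile()
profiler.enable()
parse(markup)
profiler.disable()

# Sort by cumulative time and show the ten most expensive calls.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(10)
report = stream.getvalue()
print(report)
```

Sorting by cumulative time puts the calls that dominate the total run at the top of the report.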