bs4.diagnose

Diagnostic functions, mainly for use when doing tech support.

Module Contents

Classes

AnnouncingParser

Subclass of HTMLParser that announces parse events, without doing anything else.

Functions

diagnose(data)

Diagnostic suite for isolating common problems.

lxml_trace(data[, html])

Print out the lxml events that occur during parsing.

htmlparser_trace(data)

Print out the HTMLParser events that occur during parsing.

rword([length])

Generate a random word-like string.

rsentence([length])

Generate a random sentence-like string.

rdoc([num_elements])

Randomly generate an invalid HTML document.

benchmark_parsers([num_elements])

Very basic head-to-head performance benchmark.

profile([num_elements, parser])

Use Python's profiler on a randomly generated document.

Attributes

bs4.diagnose.__license__ = 'MIT'

bs4.diagnose.diagnose(data)

Diagnostic suite for isolating common problems.

Parameters:

data – A string containing markup that needs to be explained.

Returns:

None; diagnostics are printed to standard output.
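A minimal usage sketch, assuming Beautiful Soup is installed. Because diagnose() writes its report to standard output and returns None, you can capture the output with contextlib.redirect_stdout if you want to keep the report (for example, to paste into a bug report). The markup string is only an illustrative malformed fragment.

```python
import contextlib
import io

from bs4.diagnose import diagnose

# Illustrative malformed markup: tags are closed in the wrong order.
markup = "<p>Unclosed paragraph<b>bold</p></b>"

# diagnose() prints its report and returns None, so capture stdout
# to keep the text around.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    result = diagnose(markup)
report = buffer.getvalue()
print(report)
```

The report lists, for each parser it can find, what that parser did with the markup. Comparing these sections is usually the quickest way to tell whether a surprise comes from the parser or from Beautiful Soup.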

bs4.diagnose.lxml_trace(data, html=True, **kwargs)

Print out the lxml events that occur during parsing.

This lets you see how lxml parses a document when no Beautiful Soup code is running. You can use this to determine whether an lxml-specific problem is in Beautiful Soup’s lxml tree builders or in lxml itself.

Parameters:
  • data – Some markup.

  • html – If True, markup will be parsed with lxml’s HTML parser. If False, lxml’s XML parser will be used.
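lxml_trace requires lxml. To show the kind of parse-event trace it prints without that dependency, here is a hypothetical standard-library analogue, etree_trace, built on xml.etree.ElementTree.iterparse. This is a swapped-in parser, not lxml. It only handles XML, so it corresponds roughly to lxml_trace(data, html=False). The "event, tag, text" line format was chosen for this sketch and may not match lxml_trace's exact output.

```python
import io
import xml.etree.ElementTree as ET


def etree_trace(data, events=("start", "end")):
    """Print one line per parse event and return (event, tag) pairs.

    Hypothetical stdlib stand-in for lxml_trace: it uses ElementTree's
    XML parser, so there is no HTML mode and no error recovery.
    """
    if isinstance(data, str):
        data = data.encode("utf8")
    trace = []
    for event, element in ET.iterparse(io.BytesIO(data), events=events):
        # element.text is only guaranteed to be complete on "end" events.
        print("%s, %4s, %s" % (event, element.tag, element.text))
        trace.append((event, element.tag))
    return trace


trace = etree_trace("<root><a>one</a><b>two</b></root>")
```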

class bs4.diagnose.AnnouncingParser(*, convert_charrefs=True)

Bases: html.parser.HTMLParser

Subclass of HTMLParser that announces parse events, without doing anything else.

You can use this to get a picture of how html.parser sees a given document. The easiest way to do this is to call htmlparser_trace.

_p(s)
handle_starttag(name, attrs)
handle_endtag(name)
handle_data(data)
handle_charref(name)
handle_entityref(name)
handle_comment(data)
handle_decl(data)
unknown_decl(data)
handle_pi(data)

bs4.diagnose.htmlparser_trace(data)

Print out the HTMLParser events that occur during parsing.

This lets you see how HTMLParser parses a document when no Beautiful Soup code is running.

Parameters:

data – Some markup.
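The AnnouncingParser / htmlparser_trace pattern can be sketched with only the standard library. EventAnnouncer and trace_html are hypothetical names, not part of bs4. They override the same HTMLParser callbacks listed for AnnouncingParser and send everything through a single _p-style helper. They also record each event in a list so the trace can be inspected afterwards. The printed format of the real AnnouncingParser may differ.

```python
from html.parser import HTMLParser


class EventAnnouncer(HTMLParser):
    """Hypothetical stand-in for AnnouncingParser: report every parse event."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.events = []

    def _p(self, event, payload):
        # Single output point, analogous to AnnouncingParser._p(s).
        self.events.append((event, payload))
        print(f"{event}: {payload!r}")

    def handle_starttag(self, name, attrs):
        self._p("starttag", (name, attrs))

    def handle_endtag(self, name):
        self._p("endtag", name)

    def handle_data(self, data):
        self._p("data", data)

    def handle_charref(self, name):
        self._p("charref", name)

    def handle_entityref(self, name):
        self._p("entityref", name)

    def handle_comment(self, data):
        self._p("comment", data)

    def handle_decl(self, data):
        self._p("decl", data)

    def unknown_decl(self, data):
        self._p("unknown_decl", data)

    def handle_pi(self, data):
        self._p("pi", data)


def trace_html(data, **kwargs):
    """Hypothetical equivalent of htmlparser_trace: feed markup, report events."""
    parser = EventAnnouncer(**kwargs)
    parser.feed(data)
    parser.close()  # flush any buffered trailing text
    return parser.events


events = trace_html("<!DOCTYPE html><p class='x'>Hi &amp; bye</p><!-- note -->")
```

With the default convert_charrefs=True, "&amp;" arrives already decoded inside handle_data and handle_entityref is never called. Pass convert_charrefs=False to trace_html to see entity references reported as separate events.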

bs4.diagnose._vowels = 'aeiou'
bs4.diagnose._consonants = 'bcdfghjklmnpqrstvwxyz'

bs4.diagnose.rword(length=5)

Generate a random word-like string.

bs4.diagnose.rsentence(length=4)

Generate a random sentence-like string.

bs4.diagnose.rdoc(num_elements=1000)

Randomly generate an invalid HTML document.

bs4.diagnose.benchmark_parsers(num_elements=100000)

Very basic head-to-head performance benchmark that times each parser on the same randomly generated document.
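benchmark_parsers times Beautiful Soup with its different tree builders. The head-to-head idea itself can be sketched with standard-library parsers alone: run each parser on identical input and keep the best of several timings. benchmark, parse_with_html_parser, and parse_with_elementtree are hypothetical helpers. xml.etree is swapped in as a second parser only because it ships with Python. Unlike rdoc output, the input here must be well-formed, because ElementTree rejects invalid markup.

```python
import time
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


def parse_with_html_parser(markup):
    # A bare HTMLParser discards every event, so this times tokenizing only.
    parser = HTMLParser()
    parser.feed(markup)
    parser.close()


def parse_with_elementtree(markup):
    # ElementTree builds a full tree but only accepts well-formed XML.
    return ET.fromstring(markup)


def benchmark(parsers, markup, repeat=3):
    """Return the best wall-clock time in seconds for each named parser."""
    results = {}
    for name, parse in parsers.items():
        best = float("inf")
        for _ in range(repeat):
            start = time.perf_counter()
            parse(markup)
            best = min(best, time.perf_counter() - start)
        results[name] = best
    return results


# Well-formed markup, so both the HTML and the XML parser accept it.
markup = "<html>" + "<p><b>bold</b> text</p>" * 5000 + "</html>"
timings = benchmark(
    {"html.parser": parse_with_html_parser, "xml.etree": parse_with_elementtree},
    markup,
)
for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name} parsed {len(markup)} characters in {seconds:.3f}s")
```

Taking the minimum of several runs reduces noise from other processes. Because the two parsers do different amounts of work (one builds a tree, the other does not), treat the numbers as a rough comparison, as benchmark_parsers' own description suggests.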

bs4.diagnose.profile(num_elements=100000, parser='lxml')

Use Python’s profiler on a randomly generated document.
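A comparable sketch with the standard cProfile and pstats modules: profile a single html.parser run and print the functions with the highest cumulative time. profile_parse is a hypothetical name. bs4's own profile() profiles a Beautiful Soup parse with the parser named by its parser argument, not a bare HTMLParser as here.

```python
import cProfile
import io
import pstats
from html.parser import HTMLParser


def profile_parse(markup, limit=10):
    """Profile one html.parser run and return the top `limit` entries as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    parser = HTMLParser()
    parser.feed(markup)
    parser.close()
    profiler.disable()

    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # "cumulative" ranks functions by time spent including their callees.
    stats.sort_stats("cumulative").print_stats(limit)
    return out.getvalue()


report = profile_parse("<div><p>text <b>bold</b></p></div>" * 2000)
print(report)
```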