Parses HTML until the first displayable body character and provides methods for accessing head and body contents.
The current body contents. The <body> tag is guaranteed to be present. If a <body> was included in the input, it’s preserved with original attributes; otherwise, a <body> tag is inserted. The inject argument can be used to insert a string as the immediate descendant of the <body> tag.
# File lib/bcat/html.rb, line 49
def body(inject=nil)
  if @body =~ /\A\s*(<body.*?>)(.*)/i
    [$1, inject, $2].compact.join("\n")
  else
    ["<body>", inject, @body].compact.join("\n")
  end
end
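To illustrate how the inject argument ends up directly inside the <body> tag, here is a minimal standalone sketch of the same wrapping logic. The helper name `wrap_body` is hypothetical and not part of bcat; it takes the body string as a parameter instead of reading @body.

```ruby
# Standalone sketch of the body-wrapping logic. `wrap_body` is a hypothetical
# helper for illustration only; it is not part of bcat.
# Note: without the /m flag, `.` stops at a newline, so this sketch sticks to
# single-line input.
def wrap_body(body, inject = nil)
  if body =~ /\A\s*(<body.*?>)(.*)/i
    # An existing <body ...> tag is kept with its attributes;
    # the injected string goes right after it.
    [$1, inject, $2].compact.join("\n")
  else
    # No <body> tag in the input: insert one.
    ["<body>", inject, body].compact.join("\n")
  end
end

puts wrap_body("<body class='x'>hello", "<script></script>")
puts wrap_body("hello", "<script></script>")
puts wrap_body("hello")
```

Because of `compact`, a nil inject simply disappears from the joined output rather than producing an empty line.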
Truthy once the first displayed character of the body has arrived.
# File lib/bcat/html.rb, line 27
def complete?
  !@body.nil?
end
Called to parse new data as it arrives.
# File lib/bcat/html.rb, line 16
def feed(data)
  if complete?
    @body << data
  else
    @buf << data
    parse(@buf)
  end
  complete?
end
The head contents without any DOCTYPE, <html>, or <head> tags. This should consist of only <style>, <script>, <link>, <meta>, and <title> tags.
# File lib/bcat/html.rb, line 41
def head
  @head.join.gsub(/<\/?(?:html|head|!DOCTYPE).*?>/i, '')
end
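As a rough illustration of what the substitution removes, the following sketch applies the same regular expression to a small list of collected head chunks. The helper name `strip_head_wrappers` and the sample chunks are made up for this example.

```ruby
# Sketch of the head clean-up step. `strip_head_wrappers` is a hypothetical
# helper; it receives the collected chunks instead of reading @head.
def strip_head_wrappers(chunks)
  # Drops opening and closing <html>/<head> tags and any DOCTYPE declaration,
  # leaving tags such as <title>, <meta>, <link>, <style>, and <script> alone.
  chunks.join.gsub(/<\/?(?:html|head|!DOCTYPE).*?>/i, '')
end

chunks = ["<!DOCTYPE html>", "<html>", "<head>", "<title>Log</title>", "</head>"]
puts strip_head_wrappers(chunks)
```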
Determines whether the input is HTML. This is nil before the first non-whitespace character is received, true if the first non-whitespace character is a ‘<’, and false if it is anything other than ‘<’.
# File lib/bcat/html.rb, line 35
def html?
  @html
end
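The three states described above can be sketched as a small standalone function. `detect_html` is a hypothetical name, not a bcat method. One deliberate difference from the listed source: the second pattern here is `[^<\s]` rather than `[^<]`, because `[^<]` can also match a whitespace character after backtracking, which would make whitespace-only input report false instead of nil. Treat this as an assumption about the intended behavior, based on the description.

```ruby
# Tri-state HTML detection sketch. `detect_html` is a hypothetical helper.
# Returns nil while only whitespace (or nothing) has been seen, true if the
# first non-whitespace character is '<', and false otherwise.
def detect_html(buf)
  if buf =~ /\A\s*</
    true
  elsif buf =~ /\A\s*[^<\s]/ # assumption: excludes whitespace, unlike the listed source
    false
  end
end

p detect_html("")        # nothing received yet
p detect_html("  \n")    # whitespace only
p detect_html("  <p>hi") # markup
p detect_html("plain")   # plain text
```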
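Parses buf into head and body parts. The basic approach is to consume leading whitespace and anything that may belong to the head until we hit text or a body element. Returns nil once the body has started; otherwise returns the unconsumed remainder of buf.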
# File lib/bcat/html.rb, line 74
def parse(buf=@buf)
  if @html.nil?
    if buf =~ /\A\s*[<]/
      @html = true
    elsif buf =~ /\A\s*[^<]/
      @html = false
    end
  end

  while !buf.empty?
    buf.sub!(/\A(\s+)/) { @head << $1 ; '' }
    matched =
      HEAD_TOKS.any? do |tok|
        buf.sub!(tok) do
          @head << $1
          ''
        end
      end
    break unless matched
  end

  if buf.empty?
    buf
  elsif BODY_TOKS.any? { |tok| buf =~ tok }
    @body = buf
    nil
  else
    buf
  end
end
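The consume-until-body loop can be tried in isolation with the sketch below. HEAD_TOKS and BODY_TOKS are not shown on this page, so the token lists here (`SKETCH_HEAD_TOKS`, `SKETCH_BODY_TOKS`) are simplified guesses for illustration and may differ from bcat's own. `split_head_body` is likewise a hypothetical helper that returns its results instead of setting instance variables.

```ruby
# Hypothetical, simplified token lists; bcat's real HEAD_TOKS and BODY_TOKS
# are not shown here and may differ.
SKETCH_HEAD_TOKS = [
  /\A(<!DOCTYPE.*?>)/mi,
  /\A(<title.*?>.*?<\/title>)/mi,
  /\A(<(?:html|head|meta|link).*?>)/mi,
  /\A(<\/(?:html|head)>)/mi
]
SKETCH_BODY_TOKS = [
  /\A[^<]/,                                         # plain text
  /\A<(?!html|head|meta|link|title|!DOCTYPE).*?>/mi # any other complete tag
]

# Returns [head_string, body_or_nil]. body is nil while the start of the
# body has not been seen yet (for example, inside an unfinished <title>).
def split_head_body(input)
  buf = input.dup
  head = []
  until buf.empty?
    # Leading whitespace is moved into the head.
    buf.sub!(/\A(\s+)/) { head << $1; '' }
    # Try each head token; stop at the first one that consumes input.
    matched = SKETCH_HEAD_TOKS.any? do |tok|
      buf.sub!(tok) { head << $1; '' }
    end
    break unless matched
  end
  body = buf if !buf.empty? && SKETCH_BODY_TOKS.any? { |tok| buf =~ tok }
  [head.join, body]
end

p split_head_body("<html><head><title>Log</title></head>\n<body><p>hi")
p split_head_body("<html><head><title>Lo")
p split_head_body("hello")
```

The second call shows why the parser buffers input: an unfinished <title> matches neither list, so the caller has to wait for more data before deciding where the body starts.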