Bcat::HeadParser

Parses HTML until the first displayable body character and provides methods for accessing head and body contents.

Constants

BODY_TOKS
HEAD_TOKS

Attributes

buf[RW]

Public Class Methods

new()
# File lib/bcat/html.rb, line 8
def initialize
  @buf = ''
  @head = []
  @body = nil
  @html = nil
end

Public Instance Methods

body(inject=nil)

The current body contents. The <body> tag is guaranteed to be present. If a <body> was included in the input, it’s preserved with its original attributes; otherwise, a <body> tag is inserted. The inject argument can be used to insert a string immediately after the opening <body> tag.

# File lib/bcat/html.rb, line 49
def body(inject=nil)
  if @body =~ /\A\s*(<body.*?>)(.*)/i
    [$1, inject, $2].compact.join("\n")
  else
    ["<body>", inject, @body].compact.join("\n")
  end
end
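
The two branches of body can be illustrated with a standalone sketch. It copies the logic shown above into a free function so it runs without bcat installed; body_with_inject and raw_body are hypothetical names standing in for the method and its @body instance variable.

```ruby
# Standalone sketch of the logic in HeadParser#body.
# `raw_body` plays the role of @body (hypothetical name).
def body_with_inject(raw_body, inject = nil)
  if raw_body =~ /\A\s*(<body.*?>)(.*)/i
    # The input carried its own <body ...> tag: keep it, then inject.
    [$1, inject, $2].compact.join("\n")
  else
    # No <body> tag in the input: synthesize a bare one.
    ["<body>", inject, raw_body].compact.join("\n")
  end
end

puts body_with_inject('<body class="x"><p>hi</p>', "<script>go()</script>")
puts body_with_inject("<p>hi</p>", "<script>go()</script>")
puts body_with_inject("<p>hi</p>")
```

Because nil entries are dropped by compact, omitting inject simply yields the opening tag followed by the body content.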

complete?()

Truthy once the first displayed character of the body has arrived.

# File lib/bcat/html.rb, line 27
def complete?
  !@body.nil?
end

feed(data)

Called to parse new data as it arrives. Returns the value of complete? after the data has been processed.

# File lib/bcat/html.rb, line 16
def feed(data)
  if complete?
    @body << data
  else
    @buf << data
    parse(@buf)
  end
  complete?
end

head()

The head contents without any DOCTYPE, <html>, or <head> tags. This should consist of only <style>, <script>, <link>, <meta>, and <title> tags.

# File lib/bcat/html.rb, line 41
def head
  @head.join.gsub(/<\/?(?:html|head|!DOCTYPE).*?>/i, '')
end
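
As a rough illustration of what the substitution in head removes, the sketch below applies the same pattern to a hand-written list of chunks. strip_wrappers and the sample chunks are hypothetical; only the regular expression is taken from the method above.

```ruby
# Applies the pattern from HeadParser#head to sample head chunks.
# `strip_wrappers` is a hypothetical helper name used for illustration.
def strip_wrappers(chunks)
  chunks.join.gsub(/<\/?(?:html|head|!DOCTYPE).*?>/i, '')
end

sample = [
  "<!DOCTYPE html>", "\n",
  '<html lang="en">', "<head>",
  "<title>Demo</title>", '<meta charset="utf-8">',
  "</head>"
]

# The DOCTYPE, <html ...>, <head> and </head> tags are dropped;
# whitespace, <title> and <meta> survive.
puts strip_wrappers(sample).inspect
```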

html?()

Determines whether the input is HTML. This is nil before the first non-whitespace character is received, true if the first non-whitespace character is a ‘<’, and false if the first non-whitespace character is something other than ‘<’.

# File lib/bcat/html.rb, line 35
def html?
  @html
end
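
The three possible values described for html? can be sketched as a small standalone function. This is an illustration of the documented behavior, not the library's code: sniff_html is a hypothetical name, and the second pattern here excludes whitespace explicitly so that whitespace-only input stays undecided.

```ruby
# Sketch of the documented three-state detection (nil / true / false).
# `sniff_html` is a hypothetical helper, not part of bcat.
def sniff_html(buf)
  if buf =~ /\A\s*</
    true                      # first non-whitespace character is '<'
  elsif buf =~ /\A\s*[^<\s]/
    false                     # first non-whitespace character is not '<'
  end                         # otherwise nil: nothing decisive seen yet
end

p sniff_html("")             # undecided
p sniff_html("  \n")         # still undecided, whitespace only
p sniff_html("\n <html>")    # HTML
p sniff_html("  plain text") # not HTML
```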

parse(buf=@buf)

Parses buf into head and body parts. The basic approach is to consume anything that could belong to the head until text or a body element is reached.

# File lib/bcat/html.rb, line 74
def parse(buf=@buf)
  if @html.nil?
    if buf =~ /\A\s*[<]/
      @html = true
    # Exclude whitespace so a whitespace-only buffer leaves @html as nil.
    elsif buf =~ /\A\s*[^<\s]/
      @html = false
    end
  end

  while !buf.empty?
    buf.sub!(/\A(\s+)/) { @head << $1 ; '' }
    matched =
      HEAD_TOKS.any? do |tok|
        buf.sub!(tok) do
          @head << $1
          ''
        end
      end
    break unless matched
  end


  if buf.empty?
    buf
  elsif BODY_TOKS.any? { |tok| buf =~ tok }
    @body = buf
    nil
  else
    buf
  end
end
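
To show how feed and parse cooperate on chunked input, here is a minimal self-contained sketch. MiniHeadParser and its token lists are hypothetical and heavily simplified; the real HEAD_TOKS and BODY_TOKS are not listed on this page, so the patterns below are assumptions chosen only to make the example run.

```ruby
# Hypothetical, simplified stand-in for Bcat::HeadParser.
class MiniHeadParser
  # Assumed token lists; the real constants are likely more thorough.
  HEAD_TOKS = [
    /\A(<!DOCTYPE.*?>)/mi,
    /\A(<title.*?>.*?<\/title>)/mi,
    /\A(<\/?(?:html|head)(?:\s[^>]*)?>)/i,
    /\A(<meta.*?>)/mi
  ]
  BODY_TOKS = [
    /\A[^<]/,                  # plain text
    /\A<(?:body|p|div|h1)\b/i  # a few body-level tags
  ]

  attr_reader :body

  def initialize
    @buf  = String.new
    @head = []
    @body = nil
  end

  def complete?
    !@body.nil?
  end

  def head
    @head.join
  end

  # Buffer data until the body starts, then append straight to the body.
  def feed(data)
    if complete?
      @body << data
    else
      @buf << data
      parse(@buf)
    end
    complete?
  end

  private

  # Move whitespace and head tokens from buf into @head, then check
  # whether what remains begins with body content.
  def parse(buf)
    until buf.empty?
      buf.sub!(/\A(\s+)/) { @head << $1; '' }
      matched = HEAD_TOKS.any? { |tok| buf.sub!(tok) { @head << $1; '' } }
      break unless matched
    end
    @body = buf if !buf.empty? && BODY_TOKS.any? { |tok| buf =~ tok }
  end
end

parser = MiniHeadParser.new
p parser.feed("<!DOCTYPE html>\n<html><head><ti")        # tag split mid-chunk
p parser.feed("tle>Demo</title></head><body><p>hi")      # body has started
p parser.feed(" there</p>")
puts parser.body
```

In this sketch the incomplete "<ti" fragment stays in the buffer because it matches neither list, and is only consumed once the next chunk completes the <title> element.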
