Skip to content

Commit

Permalink
Modernize and tighten up HTMLDocumentParser
Browse files Browse the repository at this point in the history
https://bugs.webkit.org/show_bug.cgi?id=140041

Reviewed by Sam Weinig.

* dom/DocumentFragment.cpp:
(WebCore::DocumentFragment::parseHTML): Pass a reference instead of
a pointer for the context element.

* html/FTPDirectoryDocument.cpp: Removed unneeded includes, made more
things in FTPDirectoryDocumentParser private. Use Ref instead of RefPtr
in a could places. Initialize in class instead of in constructor.
(WebCore::FTPDirectoryDocumentParser::FTPDirectoryDocumentParser):
Less initialization here.
(WebCore::FTPDirectoryDocumentParser::createTDForFilename): More Ref here.
(WebCore::createTemplateDocumentData): Removed unneeded initialization
of RefPtr, which is initialized without explicitly asking for it.
(WebCore::FTPDirectoryDocumentParser::loadDocumentTemplate): Reworded
comment slightly.

* html/parser/HTMLDocumentParser.cpp: Cut down on includes.
(WebCore::tokenizerStateForContextElement): Fixed URL. Changed argument
to be a reference rather than a pointer.
(WebCore::HTMLDocumentParser::inPumpSession):
(WebCore::HTMLDocumentParser::shouldDelayEnd):
(WebCore::HTMLDocumentParser::HTMLDocumentParser): Marked constructors
inline. Updated for data members that are now objects instead of pointers.
Removed explicit initialization for scalars that are now initialized in
the class definition.
(WebCore::HTMLDocumentParser::create): Moved the private creation
functions in here, out of the header file.
(WebCore::HTMLDocumentParser::~HTMLDocumentParser): Removed unused
m_haveBackgroundParser.
(WebCore::HTMLDocumentParser::prepareToStopParsing): Updated URL and
removed m_haveBackgroundParser reference.
(WebCore::HTMLDocumentParser::processingData): Removed a check of
m_haveBackgroundParser.
(WebCore::HTMLDocumentParser::resumeParsingAfterYield): Tweak comment.
(WebCore::HTMLDocumentParser::runScriptsForPausedTreeBuilder): Added
a null check of the result of takeScriptToProcess, since there really
is no guarantee it's non-null.
(WebCore::HTMLDocumentParser::canTakeNextToken): Removed assertion
that was for m_haveBackgroundParser cases only. Rewrapped comment.
(WebCore::HTMLDocumentParser::contextForParsingSession): Use nullptr.
(WebCore::HTMLDocumentParser::pumpTokenizer): Rework comments,
remove assertions that no longer make sense, use auto instead of
repeating a long type name, update to use m_token and m_tokenizer.
(WebCore::HTMLDocumentParser::hasInsertionPoint): Rewrapped comment.
(WebCore::HTMLDocumentParser::insert): Got rid of braces around a
single-line if body.
(WebCore::HTMLDocumentParser::attemptToRunDeferredScriptsAndEnd):
Removed comment about incorrect m_haveBackgroundParser assertion.
(WebCore::HTMLDocumentParser::isExecutingScript): Use && style instead
of early exit for a null check.
(WebCore::HTMLDocumentParser::textPosition): Tightened up code a little.
(WebCore::HTMLDocumentParser::resumeParsingAfterScriptExecution): Added
a Ref to protect the parser, as is already done in every other function
that calls pumpTokenizerIfPossible.
(WebCore::HTMLDocumentParser::parseDocumentFragment): Take a reference
instead of a pointer. Also use auto so we get a Ref instead of a RefPtr.

* html/parser/HTMLDocumentParser.h: Removed unneeded includes.
Made private inheritance explicit instead of just omitting public.
Moved function bodies out of the class, and in some cases, out of the
header entirely. Return a reference from tokenizer(). Marked most
virtual functions final. Made DocumentFragment version of the
constructor private rather than protected. Made the functions
suspendScheduledTasks() and resumeScheduledTasks() private, since
they are always called through a base class. Removed the private
token function since it is better to get at m_token directly.
Removed m_haveBackgroundParser, since we don't have that any more
and it's always false. Also removed forcePlaintextForTextDocument
since the tokenizer is exposed and can be used directly to do that.

* html/parser/HTMLTreeBuilder.cpp:
(WebCore::HTMLTreeBuilder::HTMLTreeBuilder): Made the parser non-const.
It could only be const before because HTMLDocumentParser::tokenizer
took a const parser and returned a non-const tokenizer, but that doesn't
really make sense.
(WebCore::HTMLTreeBuilder::constructTree): Removed null check for
tokenizer, which was never null. Updated since tokenizer is a reference.
(WebCore::HTMLTreeBuilder::processStartTagForInBody): Ditto.
(WebCore::HTMLTreeBuilder::processStartTagForInTable): Ditto.
(WebCore::HTMLTreeBuilder::processEndTag): Ditto. Also fixed and removed
some assertions like the ones I did recently in the rest of this file.
(WebCore::HTMLTreeBuilder::processGenericRCDATAStartTag): Ditto.
(WebCore::HTMLTreeBuilder::processGenericRawTextStartTag): Ditto.
(WebCore::HTMLTreeBuilder::processScriptStartTag): Ditto.

* html/parser/HTMLTreeBuilder.h: Made HTMLDocumentParser& non-const.

* html/parser/TextDocumentParser.cpp: Removed unneeded include and
unneeded explicit destructor.
(WebCore::TextDocumentParser::TextDocumentParser): Updated since
treeBuilder() returns a reference now, and set the tokenizer state
directly since tokenizer() is exposed.

* html/parser/TextDocumentParser.h: Moved initialization of the
data member here instead of the constructor. Also removed unneeded
explicitly defined destructor.

git-svn-id: http://svn.webkit.org/repository/webkit/trunk@177883 268f45cc-cd09-0410-ab3c-d52691b4dbfc
  • Loading branch information
[email protected] committed Jan 5, 2015
1 parent 9b40929 commit dbefd32
Show file tree
Hide file tree
Showing 9 changed files with 281 additions and 222 deletions.
103 changes: 103 additions & 0 deletions Source/WebCore/ChangeLog
Original file line number Diff line number Diff line change
@@ -1,3 +1,106 @@
2015-01-04 Darin Adler <[email protected]>

Modernize and tighten up HTMLDocumentParser
https://bugs.webkit.org/show_bug.cgi?id=140041

Reviewed by Sam Weinig.

* dom/DocumentFragment.cpp:
(WebCore::DocumentFragment::parseHTML): Pass a reference instead of
a pointer for the context element.

* html/FTPDirectoryDocument.cpp: Removed unneeded includes, made more
things in FTPDirectoryDocumentParser private. Use Ref instead of RefPtr
in a could places. Initialize in class instead of in constructor.
(WebCore::FTPDirectoryDocumentParser::FTPDirectoryDocumentParser):
Less initialization here.
(WebCore::FTPDirectoryDocumentParser::createTDForFilename): More Ref here.
(WebCore::createTemplateDocumentData): Removed unneeded initialization
of RefPtr, which is initialized without explicitly asking for it.
(WebCore::FTPDirectoryDocumentParser::loadDocumentTemplate): Reworded
comment slightly.

* html/parser/HTMLDocumentParser.cpp: Cut down on includes.
(WebCore::tokenizerStateForContextElement): Fixed URL. Changed argument
to be a reference rather than a pointer.
(WebCore::HTMLDocumentParser::inPumpSession):
(WebCore::HTMLDocumentParser::shouldDelayEnd):
(WebCore::HTMLDocumentParser::HTMLDocumentParser): Marked constructors
inline. Updated for data members that are now objects instead of pointers.
Removed explicit initialization for scalars that are now initialized in
the class definition.
(WebCore::HTMLDocumentParser::create): Moved the private creation
functions in here, out of the header file.
(WebCore::HTMLDocumentParser::~HTMLDocumentParser): Removed unused
m_haveBackgroundParser.
(WebCore::HTMLDocumentParser::prepareToStopParsing): Updated URL and
removed m_haveBackgroundParser reference.
(WebCore::HTMLDocumentParser::processingData): Removed a check of
m_haveBackgroundParser.
(WebCore::HTMLDocumentParser::resumeParsingAfterYield): Tweak comment.
(WebCore::HTMLDocumentParser::runScriptsForPausedTreeBuilder): Added
a null check of the result of takeScriptToProcess, since there really
is no guarantee it's non-null.
(WebCore::HTMLDocumentParser::canTakeNextToken): Removed assertion
that was for m_haveBackgroundParser cases only. Rewrapped comment.
(WebCore::HTMLDocumentParser::contextForParsingSession): Use nullptr.
(WebCore::HTMLDocumentParser::pumpTokenizer): Rework comments,
remove assertions that no longer make sense, use auto instead of
repeating a long type name, update to use m_token and m_tokenizer.
(WebCore::HTMLDocumentParser::hasInsertionPoint): Rewrapped comment.
(WebCore::HTMLDocumentParser::insert): Got rid of braces around a
single-line if body.
(WebCore::HTMLDocumentParser::attemptToRunDeferredScriptsAndEnd):
Removed comment about incorrect m_haveBackgroundParser assertion.
(WebCore::HTMLDocumentParser::isExecutingScript): Use && style instead
of early exit for a null check.
(WebCore::HTMLDocumentParser::textPosition): Tightened up code a little.
(WebCore::HTMLDocumentParser::resumeParsingAfterScriptExecution): Added
a Ref to protect the parser, as is already done in every other function
that calls pumpTokenizerIfPossible.
(WebCore::HTMLDocumentParser::parseDocumentFragment): Take a reference
instead of a pointer. Also use auto so we get a Ref instead of a RefPtr.

* html/parser/HTMLDocumentParser.h: Removed unneeded includes.
Made private inheritance explicit instead of just omitting public.
Moved function bodies out of the class, and in some cases, out of the
header entirely. Return a reference from tokenizer(). Marked most
virtual functions final. Made DocumentFragment version of the
constructor private rather than protected. Made the functions
suspendScheduledTasks() and resumeScheduledTasks() private, since
they are always called through a base class. Removed the private
token function since it is better to get at m_token directly.
Removed m_haveBackgroundParser, since we don't have that any more
and it's always false. Also removed forcePlaintextForTextDocument
since the tokenizer is exposed and can be used directly to do that.

* html/parser/HTMLTreeBuilder.cpp:
(WebCore::HTMLTreeBuilder::HTMLTreeBuilder): Made the parser non-const.
It could only be const before because HTMLDocumentParser::tokenizer
took a const parser and returned a non-const tokenizer, but that doesn't
really make sense.
(WebCore::HTMLTreeBuilder::constructTree): Removed null check for
tokenizer, which was never null. Updated since tokenizer is a reference.
(WebCore::HTMLTreeBuilder::processStartTagForInBody): Ditto.
(WebCore::HTMLTreeBuilder::processStartTagForInTable): Ditto.
(WebCore::HTMLTreeBuilder::processEndTag): Ditto. Also fixed and removed
some assertions like the ones I did recently in the rest of this file.
(WebCore::HTMLTreeBuilder::processGenericRCDATAStartTag): Ditto.
(WebCore::HTMLTreeBuilder::processGenericRawTextStartTag): Ditto.
(WebCore::HTMLTreeBuilder::processScriptStartTag): Ditto.

* html/parser/HTMLTreeBuilder.h: Made HTMLDocumentParser& non-const.

* html/parser/TextDocumentParser.cpp: Removed unneeded include and
unneeded explicit destructor.
(WebCore::TextDocumentParser::TextDocumentParser): Updated since
treeBuilder() returns a reference now, and set the tokenizer state
directly since tokenizer() is exposed.

* html/parser/TextDocumentParser.h: Moved initialization of the
data member here instead of the constructor. Also removed unneeded
explicitly defined destructor.

2015-01-04 Antti Koivisto <[email protected]>

Remove GlyphPageTree
Expand Down
4 changes: 3 additions & 1 deletion Source/WebCore/dom/DocumentFragment.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -82,11 +82,13 @@ RefPtr<Node> DocumentFragment::cloneNodeInternal(Document& targetDocument, Cloni

void DocumentFragment::parseHTML(const String& source, Element* contextElement, ParserContentPolicy parserContentPolicy)
{
HTMLDocumentParser::parseDocumentFragment(source, *this, contextElement, parserContentPolicy);
ASSERT(contextElement);
HTMLDocumentParser::parseDocumentFragment(source, *this, *contextElement, parserContentPolicy);
}

bool DocumentFragment::parseXML(const String& source, Element* contextElement, ParserContentPolicy parserContentPolicy)
{
ASSERT(contextElement);
return XMLDocumentParser::parseDocumentFragment(source, *this, contextElement, parserContentPolicy);
}

Expand Down
35 changes: 14 additions & 21 deletions Source/WebCore/html/FTPDirectoryDocument.cpp
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
/*
* Copyright (C) 2007, 2008, 2014 Apple Inc. All rights reserved.
* Copyright (C) 2007-2008, 2014-2015 Apple Inc. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
Expand All @@ -26,20 +26,15 @@
#if ENABLE(FTPDIR)
#include "FTPDirectoryDocument.h"

#include "ExceptionCodePlaceholder.h"
#include "HTMLDocumentParser.h"
#include "HTMLNames.h"
#include "HTMLTableElement.h"
#include "LocalizedStrings.h"
#include "Logging.h"
#include "FTPDirectoryParser.h"
#include "Settings.h"
#include "SharedBuffer.h"
#include "Text.h"
#include <wtf/CurrentTime.h>
#include <wtf/GregorianDateTime.h>
#include <wtf/StdLibExtras.h>
#include <wtf/text/CString.h>
#include <wtf/unicode/CharacterNames.h>

namespace WebCore {
Expand All @@ -53,12 +48,14 @@ class FTPDirectoryDocumentParser final : public HTMLDocumentParser {
return adoptRef(*new FTPDirectoryDocumentParser(document));
}

private:
virtual void append(PassRefPtr<StringImpl>) override;
virtual void finish() override;

// FIXME: Why do we need this?
virtual bool isWaitingForScripts() const override { return false; }

inline void checkBuffer(int len = 10)
void checkBuffer(int len = 10)
{
if ((m_dest - m_buffer) > m_size - len) {
// Enlarge buffer
Expand All @@ -69,8 +66,7 @@ class FTPDirectoryDocumentParser final : public HTMLDocumentParser {
m_size = newSize;
}
}

private:

FTPDirectoryDocumentParser(HTMLDocument&);

// The parser will attempt to load the document template specified via the preference
Expand All @@ -81,13 +77,13 @@ class FTPDirectoryDocumentParser final : public HTMLDocumentParser {

void parseAndAppendOneLine(const String&);
void appendEntry(const String& name, const String& size, const String& date, bool isDirectory);
RefPtr<Element> createTDForFilename(const String&);
Ref<Element> createTDForFilename(const String&);

RefPtr<HTMLTableElement> m_tableElement;

bool m_skipLF;
bool m_skipLF { false };

int m_size;
int m_size { 254 };
UChar* m_buffer;
UChar* m_dest;
String m_carryOver;
Expand All @@ -97,8 +93,6 @@ class FTPDirectoryDocumentParser final : public HTMLDocumentParser {

FTPDirectoryDocumentParser::FTPDirectoryDocumentParser(HTMLDocument& document)
: HTMLDocumentParser(document)
, m_skipLF(false)
, m_size(254)
, m_buffer(static_cast<UChar*>(fastMalloc(sizeof(UChar) * m_size)))
, m_dest(m_buffer)
{
Expand Down Expand Up @@ -132,7 +126,7 @@ void FTPDirectoryDocumentParser::appendEntry(const String& filename, const Strin
rowElement->appendChild(element, IGNORE_EXCEPTION);
}

RefPtr<Element> FTPDirectoryDocumentParser::createTDForFilename(const String& filename)
Ref<Element> FTPDirectoryDocumentParser::createTDForFilename(const String& filename)
{
String fullURL = document()->baseURL().string();
if (fullURL.endsWith('/'))
Expand All @@ -144,10 +138,10 @@ RefPtr<Element> FTPDirectoryDocumentParser::createTDForFilename(const String& fi
anchorElement->setAttribute(HTMLNames::hrefAttr, fullURL);
anchorElement->appendChild(Text::create(*document(), filename), IGNORE_EXCEPTION);

RefPtr<Element> tdElement = document()->createElement(tdTag, false);
Ref<Element> tdElement = document()->createElement(tdTag, false);
tdElement->appendChild(anchorElement, IGNORE_EXCEPTION);

return tdElement.release();
return tdElement;
}

static String processFilesizeString(const String& size, bool isDirectory)
Expand Down Expand Up @@ -275,7 +269,7 @@ void FTPDirectoryDocumentParser::parseAndAppendOneLine(const String& inputLine)

static inline RefPtr<SharedBuffer> createTemplateDocumentData(Settings* settings)
{
RefPtr<SharedBuffer> buffer = 0;
RefPtr<SharedBuffer> buffer;
if (settings)
buffer = SharedBuffer::createWithContentsOfFile(settings->ftpDirectoryTemplatePath());
if (buffer)
Expand All @@ -286,9 +280,8 @@ static inline RefPtr<SharedBuffer> createTemplateDocumentData(Settings* settings
bool FTPDirectoryDocumentParser::loadDocumentTemplate()
{
static SharedBuffer* templateDocumentData = createTemplateDocumentData(document()->settings()).release().leakRef();
// FIXME: Instead of storing the data, we'd rather actually parse the template data into the template Document once,
// store that document, then "copy" it whenever we get an FTP directory listing. There are complexities with this
// approach that make it worth putting this off.
// FIXME: Instead of storing the data, it would be more efficient if we could parse the template data into the
// template Document once, store that document, then "copy" it whenever we get an FTP directory listing.

if (!templateDocumentData) {
LOG_ERROR("Could not load templateData");
Expand Down
Loading

0 comments on commit dbefd32

Please sign in to comment.