Convert HTML to JSON with this .NET package. Update to Jsonize project.
From version 3.1.0, you can pass a CancellationToken to the parsing methods so that they can be cancelled.
It is an optional parameter, so existing code will not break.
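As a minimal sketch of the cancellation support described above, the following parses HTML with a timeout. It assumes the token is accepted as the optional last argument of ParseToStringAsync; the exact parameter position is not confirmed here. The names CancellationExample and ParseWithTimeoutAsync are hypothetical.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Jsonize;
using Jsonize.Parser;
using Jsonize.Serializer;

public static class CancellationExample
{
    // Hypothetical helper: parses HTML, requesting cancellation after the timeout.
    public static async Task<string> ParseWithTimeoutAsync(string html, TimeSpan timeout)
    {
        var jsonizer = new Jsonizer(new JsonizeParser(), new JsonizeSerializer());

        // The source cancels its token automatically once the timeout elapses.
        using var cts = new CancellationTokenSource(timeout);

        // Assumption: the token is the optional trailing argument.
        return await jsonizer.ParseToStringAsync(html, cts.Token);
    }
}
```

If the parse is cancelled, the caller should expect the usual .NET behaviour of an OperationCanceledException, assuming the library surfaces cancellation that way.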
From version 3.1.0, you can also pass a Stream object directly as the HTML to be parsed.
The methods accepting a Stream are overloads of the existing method names.
There is no real performance gain, but you no longer have to read the Stream into a string yourself first.
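A short sketch of the Stream overload, reading HTML from a file on disk. It assumes ParseToStringAsync is one of the method names that gained a Stream overload, as the paragraph above suggests; StreamExample and ParseFileAsync are hypothetical names.

```csharp
using System.IO;
using System.Threading.Tasks;
using Jsonize;
using Jsonize.Parser;
using Jsonize.Serializer;

public static class StreamExample
{
    // Hypothetical helper: converts an HTML file to a JSON string.
    public static async Task<string> ParseFileAsync(string path)
    {
        var jsonizer = new Jsonizer(new JsonizeParser(), new JsonizeSerializer());

        // Open the file as a Stream; it is disposed when the method returns.
        using var stream = File.OpenRead(path);

        // Assumption: a Stream overload exists under the same method name.
        return await jsonizer.ParseToStringAsync(stream);
    }
}
```

The same pattern should apply to any other Stream source, such as the result of HttpContent.ReadAsStreamAsync.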
Version 3.0.0 introduced a major performance regression that could make run time hundreds of times slower than version 1.0.9.
Please upgrade to version 3.1.0, as 3.0.0 will be deprecated once the new package is pushed to NuGet.
You will also get some extra methods for working with Stream objects.
Get the NuGet packages:
Package | Build Status | NuGet Version |
---|---|---|
Jsonize | ||
Jsonize.Abstractions | ||
Jsonize.Parser | ||
Jsonize.Serializer | ||
Jsonize.Serializer.Json.Net | ||
An example to get the site "https://jackfinlay.com" as a JSON string:
using System;
using System.Net.Http;
using System.Threading.Tasks;
using Jsonize;
using Jsonize.Parser;
using Jsonize.Serializer;

var url = @"https://jackfinlay.com";
Console.WriteLine(await JsonizeTest(url));

static async Task<string> JsonizeTest(string url)
{
    // Download the page as an HTML string.
    using var client = new HttpClient();
    var response = await client.GetAsync(url);
    var html = await response.Content.ReadAsStringAsync();

    // Wire up the default parser and serializer, then convert the HTML to JSON.
    var parser = new JsonizeParser();
    var serializer = new JsonizeSerializer();
    var jsonizer = new Jsonizer(parser, serializer);
    return await jsonizer.ParseToStringAsync(html);
}
Alternatively, get the response as a JsonizeNode:
return await jsonizer.ParseToJsonizeNodeAsync(html);
You can control the output with a JsonizeParserConfiguration object, which is passed as a parameter to the constructor of the IJsonizeParser of choice:
JsonizeParserConfiguration parserConfiguration = new JsonizeParserConfiguration()
{
    NullValueHandling = NullValueHandling.Ignore,
    EmptyTextNodeHandling = EmptyTextNodeHandling.Ignore,
    TextTrimHandling = TextTrimHandling.Trim,
    ClassAttributeHandling = ClassAttributeHandling.Array
};

JsonizeConfiguration jsonizeConfiguration = new JsonizeConfiguration
{
    Parser = new JsonizeParser(parserConfiguration),
    Serializer = new JsonizeSerializer()
};

Jsonizer jsonizer = new Jsonizer(jsonizeConfiguration);
Results are in the form:
{
    "nodeType": "Node type e.g. Document, Element, or Comment",
    "tag": "If node is Element this will display the tag e.g. p, h1, div etc.",
    "text": "If node is of type Text, this will display the text in that node.",
    "attr": {
        "name": "value",
        "class": []
    },
    "children": [
        {
            "nodeType": "Node type e.g. Document, Element, or Comment",
            "tag": "If node is Element this will display the tag e.g. p, h1, div etc.",
            "text": "If node is of type Text, this will display the text in that node.",
            "children": []
        }
    ]
}
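To illustrate how a result of this shape might be consumed, here is a rough sketch that walks the JSON with System.Text.Json and collects every non-empty text value. It is not part of Jsonize; JsonizeResultWalker and CollectText are hypothetical names, and it assumes nested nodes live under the children key, as in the top-level object above.

```csharp
using System.Collections.Generic;
using System.Text.Json;

public static class JsonizeResultWalker
{
    // Collects every non-empty "text" value, parents before their children.
    public static List<string> CollectText(string json)
    {
        using var doc = JsonDocument.Parse(json);
        var texts = new List<string>();
        Visit(doc.RootElement, texts);
        return texts;
    }

    private static void Visit(JsonElement node, List<string> texts)
    {
        // Only objects can carry "text" or "children" properties.
        if (node.ValueKind != JsonValueKind.Object) return;

        if (node.TryGetProperty("text", out var text)
            && text.ValueKind == JsonValueKind.String)
        {
            var value = text.GetString();
            if (!string.IsNullOrWhiteSpace(value)) texts.Add(value);
        }

        // Assumption: nested nodes are stored in an array named "children".
        if (node.TryGetProperty("children", out var children)
            && children.ValueKind == JsonValueKind.Array)
        {
            foreach (var child in children.EnumerateArray())
            {
                Visit(child, texts);
            }
        }
    }
}
```

For example, calling CollectText on an Element node whose children array holds a single Text node with the text Hello returns a list containing only that string.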
MIT
See license.md for details.