In my experience, the need to parse and manipulate HTML comes up surprisingly often. You may need to clean up an HTML file created by tools like Word or FrontPage (these tools are great for end users, but they inject a lot of unnecessary markup), parse a web page, or construct an HTML page programmatically.
In all these cases, HtmlAgilityPack can be a handy tool. It lets you load, parse, and modify “real-world” HTML, that is, HTML files that are not necessarily clean and well formed. Even better, it builds an XML-like DOM for each parsed file, and that DOM supports XPath and LINQ.
It is easy to learn. A simple example looks like this:
    using System.Linq;
    using HtmlAgilityPack;

    var doc = new HtmlDocument();
    doc.LoadHtml(html); // html is a string holding the markup
    var docNode = doc.DocumentNode;
    var content = docNode.Descendants()
        .First(x => x.GetAttributeValue("class", "").Equals("icon"))
        .InnerText;
This sample returns the inner text of the first element whose class attribute is exactly “icon”. Note that First throws an InvalidOperationException when no element matches.
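Since the parsed DOM also supports XPath, the same lookup can be sketched as an XPath query. This is only a minimal illustration: the class name IconExample and the method name FirstIconText are made up for this example, and the query matches elements whose class attribute is exactly "icon", just like the LINQ version above.

```csharp
using HtmlAgilityPack;

public static class IconExample
{
    // Hypothetical helper (not part of the library): returns the inner text of
    // the first element whose class attribute is exactly "icon", or null when
    // there is no such element.
    public static string FirstIconText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectSingleNode returns null when the XPath expression matches nothing,
        // so the null-conditional operator avoids an exception in that case.
        var node = doc.DocumentNode.SelectSingleNode("//*[@class='icon']");
        return node?.InnerText;
    }
}
```

Unlike First, this variant returns null instead of throwing when nothing matches, which may be more convenient when the input HTML is not under your control.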
This is a simple but very useful library, so check it out at htmlagilitypack.codeplex.com.