Extracting Inner Text From HTML BODY Node With Html Agility Pack

Need a bit of help with HTML Agility Pack! Basically, I want to grab the plain text within the body node of the HTML. So far I have tried this in VB.NET and it fails to return the inn…

Solution 1:

How about:

Return htmldoc.DocumentNode.SelectSingleNode("//body").InnerText

Solution 2:

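Jeff's solution (Solution 1) is fine if your page has no tables. Otherwise, the text of adjacent table cells runs together, like cell1cell2cell3, because InnerText concatenates the text without any separator. To prevent this, select the individual text nodes and join them with spaces (C# example):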

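// Collect every text node under <body> and join them with spaces
// (SelectNodes returns null when nothing matches; .Select needs using System.Linq)
var words = doc.DocumentNode?.SelectNodes("//body//text()")?.Select(x => x.InnerText);
return words != null ? string.Join(" ", words) : string.Empty;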
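
A fuller sketch of the same text-node approach, assuming the HtmlAgilityPack NuGet package is referenced; the class name BodyText and method Extract are hypothetical. Beyond joining text nodes, it skips text inside <script> and <style>, drops whitespace-only nodes (which otherwise show up as runs of spaces and line breaks), and decodes entities with HtmlEntity.DeEntitize, since depending on the library version InnerText may leave entities such as &amp; encoded.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack; // assumption: HtmlAgilityPack NuGet package is referenced

static class BodyText
{
    // Hypothetical helper: returns the text under <body>, one space between text nodes.
    public static string Extract(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Text nodes under <body>, excluding script and style contents.
        var nodes = doc.DocumentNode.SelectNodes(
            "//body//text()[not(ancestor::script) and not(ancestor::style)]");

        // SelectNodes may return null (rather than an empty collection) when nothing matches.
        if (nodes == null)
            return string.Empty;

        var words = nodes
            .Select(n => HtmlEntity.DeEntitize(n.InnerText).Trim()) // decode &amp; etc., trim
            .Where(s => s.Length > 0);                              // drop whitespace-only nodes

        return string.Join(" ", words);
    }

    static void Main()
    {
        const string html =
            "<html><body><table><tr><td>cell1</td><td>cell2</td><td>cell3</td></tr></table>" +
            "<script>var x = 1;</script><p>Fish &amp; chips</p></body></html>";

        Console.WriteLine(Extract(html));
    }
}
```

With the sample markup in Main, the cell texts should come out separated by spaces and the script body should be left out. Filtering in the XPath predicate keeps the LINQ part simple; if you also need line breaks between block elements, you would have to inspect each text node's parent instead of joining everything with a single space.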
