C# And HtmlAgilityPack Encoding Problem
WebClient GodLikeClient = new WebClient(); HtmlAgilityPack.HtmlDocument GodLikeHTML = new HtmlAgilityPack.HtmlDocument(); GodLikeHTML.Load(GodLikeClient.OpenRead('www.alfa.lt');
Solution 1:
Actually the page is encoded with UTF-8.
GodLikeHTML.Load(GodLikeClient.OpenRead("http://www.alfa.lt"), Encoding.UTF8);
will work.
Or you could use the code in my SO answer which detects encoding from http headers or meta tags, en re-encodes properly. (It also supports gzip to minimize your download).
With the download class your code would look like:
HttpDownloader downloader = new HttpDownloader("http://www.alfa.lt",null,null);
GodLikeHTML.LoadHtml(downloader.GetPage());
Solution 2:
I had a similar encoding problems. I fixed it, in the most current version of HtmlAgilityPack, by adding the following to my WebClient initialization.
var htmlWeb = new HtmlWeb();
htmlWeb.OverrideEncoding = Encoding.UTF8;
var doc = htmlWeb.Load("www.alfa.lt");
Solution 3:
UTF8 didn't work for me, but after setting the encoding like this, most pages i was trying to scrape worked just wel:
web.OverrideEncoding = Encoding.GetEncoding("ISO-8859-1");
Perhaps it might help someone.
Solution 4:
HtmlAgilityPack.HtmlDocument doc = new HtmlDocument();
StreamReader reader = new StreamReader(WebRequest.Create(YourUrl).GetResponse().GetResponseStream(), Encoding.Default); //put your encoding
doc.Load(reader);
hope it helps :)
Solution 5:
try to change that to GodLikeHTML.Load(GodLikeClient.OpenRead("www.alfa.lt"), Encoding.GetEncoding(1257));
Post a Comment for "C# And HtmlAgilityPack Encoding Problem"