
charset windows-1251

description

To reproduce, create a new story that refers to a page served with charset windows-1251
and then retrieve its content.
The retrieved title and description contain unknown symbols instead of the original characters.

comments

Shedon wrote Dec 13, 2010 at 1:51 PM

I faced the same issue last weekend.
I changed the ReadResponse function in Source\Core\Infrastructure\Http\HttpForm.cs, which is responsible for retrieving content, so that it detects the character set in the HTTP header and reads the body using the right code page.

Here's my code:
internal static HttpFormResponse ReadResponse(WebRequest request)
    {
        const int maxTry = 3;

        int tryCount = 0;
        var httpFormResponse = new HttpFormResponse();

        // Sometimes the external site can throw an exception, so we might
        // have to retry a few more times
        while (string.IsNullOrEmpty(httpFormResponse.Response) && (tryCount < maxTry))
        {
            try
            {
                using (WebResponse response = request.GetResponse())
                {
                    PopulateHeadersAndCookies(response, httpFormResponse);

                    // Use the charset from the Content-Type header and fall
                    // back to UTF-8 when it is missing or not recognized
                    Encoding original = Encoding.UTF8;
                    var httpResponse = response as HttpWebResponse;
                    string codePage = (httpResponse != null) ? httpResponse.CharacterSet : null;
                    if (!string.IsNullOrEmpty(codePage))
                    {
                        try
                        {
                            original = Encoding.GetEncoding(codePage);
                        }
                        catch (ArgumentException)
                        {
                            original = Encoding.UTF8;
                        }
                    }

                    using (var sr = new StreamReader(response.GetResponseStream(), original))
                    {
                        // The reader already decodes the bytes into a .NET string,
                        // so no further conversion to UTF-8 is needed
                        httpFormResponse.Response = sr.ReadToEnd();
                    }
                }
            }
            catch (WebException)
            {
                tryCount += 1;
                Thread.Sleep(200);
            }
        }

        return httpFormResponse;
    }
I guess that the ResponseCallback function should be changed the same way, but I'm not sure when it's used or how to verify it.
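
One possible way to share the charset handling between ReadResponse and ResponseCallback, and to check it without calling an external site, is sketched below. This is only an illustration under assumptions: the class CharsetSketch and its methods ResolveEncoding, ReadBody and RoundTrips are hypothetical names that do not exist in the project, and the RegisterProvider call is only needed on .NET Core / .NET 5+, where windows-1251 is not available by default.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of the project source.
internal static class CharsetSketch
{
    static CharsetSketch()
    {
        // Needed on .NET Core / .NET 5+ only, where code pages such as
        // windows-1251 are not registered by default. On the classic
        // .NET Framework this line can be dropped.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Maps a charset name taken from the Content-Type header to an Encoding,
    // falling back to UTF-8 when the name is missing or not recognized.
    internal static Encoding ResolveEncoding(string characterSet)
    {
        if (string.IsNullOrEmpty(characterSet))
        {
            return Encoding.UTF8;
        }

        try
        {
            // Some servers send the charset in quotes, e.g. "windows-1251".
            return Encoding.GetEncoding(characterSet.Trim('"', '\'', ' '));
        }
        catch (ArgumentException)
        {
            return Encoding.UTF8;
        }
    }

    // Reads the whole stream as text using the resolved encoding.
    internal static string ReadBody(Stream body, string characterSet)
    {
        using (var reader = new StreamReader(body, ResolveEncoding(characterSet)))
        {
            return reader.ReadToEnd();
        }
    }

    // Encodes the text with the given charset, reads it back through
    // ReadBody and reports whether the text survived unchanged.
    internal static bool RoundTrips(string text, string characterSet)
    {
        byte[] bytes = ResolveEncoding(characterSet).GetBytes(text);
        using (var stream = new MemoryStream(bytes))
        {
            return ReadBody(stream, characterSet) == text;
        }
    }
}
```

With a helper like this, a check such as CharsetSketch.RoundTrips("Привет", "windows-1251") exercises the decoding path on an in-memory stream, so the same test could cover ResponseCallback if it were changed to read the body through ReadBody. Reading the same windows-1251 bytes as UTF-8 instead yields replacement characters, which matches the unknown symbols described in the report.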