I was reading HttpWebResponse data and needed to extract the plain text from the HTML. That meant removing all HTML tags, including script and style tags, from the page. At first I used a simple regular expression to strip the tags. It looked like this:
mainData = Regex.Replace(mainData, @"<script\s[^>]*>.*?</script>|<\s*(?!/?(?:br?|i|p|u)\b[^>]*>)[^>]*>", "", RegexOptions.IgnoreCase | RegexOptions.Singleline);
It was working fine, but I found that it caused problems on some pages. In particular, it did not reliably remove JavaScript and styles from the HTML, and the resulting text was not well formatted.
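To make the failure concrete, here is a small sketch. It assumes the pattern above originally contained the `\s` and `\b` escapes, which appear to have been lost in formatting. The class name RegexStripDemo and the sample HTML are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class, not part of any library.
public static class RegexStripDemo
{
    // Assumed reconstruction of the pattern shown above.
    private const string Pattern =
        @"<script\s[^>]*>.*?</script>|<\s*(?!/?(?:br?|i|p|u)\b[^>]*>)[^>]*>";

    public static string Strip(string html)
    {
        return Regex.Replace(html, Pattern, "",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
    }

    public static void Main()
    {
        // A <script> tag with no attributes and a <style> block.
        string html = "<style>p { color: red; }</style>" +
                      "<script>var a = 1;</script><p>Hello</p>";

        // The first alternative needs whitespace after "script", so only the
        // tags themselves are removed here. The CSS and JavaScript bodies
        // stay behind in the output, mixed in with the page text.
        Console.WriteLine(Strip(html));
    }
}
```

Note that the pattern deliberately keeps br, b, i, p and u tags, so the output is still not pure text even when the script removal works.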
Finally I found a very nice tool for this. The tool is called Html Agility Pack.
One very nice feature of Html Agility Pack is that it has libraries to convert HTML to XML, HTML to RSS, and HTML to text. I used HTML to text to convert my HttpWebResponse data to text.
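As a rough idea of what an HTML to text conversion with this library can look like, here is a minimal sketch. It is not the converter that ships with the library. It assumes the HtmlAgilityPack NuGet package is referenced and that your build includes XPath support (SelectNodes). The names HtmlToTextSketch and ToPlainText are made up for illustration.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack package is referenced

// Hypothetical helper, not part of Html Agility Pack itself.
public static class HtmlToTextSketch
{
    public static string ToPlainText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Drop script, style and comment nodes together with their contents.
        // SelectNodes returns null when nothing matches.
        var unwanted = doc.DocumentNode.SelectNodes("//script|//style|//comment()");
        if (unwanted != null)
        {
            // Copy to a list first so nodes are not removed while enumerating.
            foreach (var node in unwanted.ToList())
            {
                node.Remove();
            }
        }

        // InnerText concatenates the remaining text nodes.
        // DeEntitize turns entities such as &amp; back into characters.
        return HtmlEntity.DeEntitize(doc.DocumentNode.InnerText).Trim();
    }

    public static void Main()
    {
        string html = "<html><head><style>p{color:red}</style>" +
                      "<script>var a=1;</script></head>" +
                      "<body><p>Hello &amp; welcome</p></body></html>";
        Console.WriteLine(ToPlainText(html));
    }
}
```

Because the document is parsed into nodes first, the script and style bodies go away with their elements, which is exactly what the regular expression could not do reliably.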
In Html Agility Pack you will find a class file, HtmlDocument.cs. If you want to convert your web responses, just add the code below to the Load method. By default the Load method reads an HTML file from a local drive, so use this code to read from a web response instead.
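// Request the page and read the response stream instead of a local file.
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
// OptionDefaultStreamEncoding is the HtmlDocument default stream encoding setting.
StreamReader sr = new StreamReader(response.GetResponseStream(), OptionDefaultStreamEncoding);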
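If you would rather not edit the library source, a possible alternative is to pass the response stream to the public Load(Stream, Encoding) overload of HtmlDocument. The sketch below assumes that overload is available in your version and that the page is UTF-8 encoded. ResponseLoaderSketch, LoadFromStream and LoadFromUri are hypothetical names used only for this example.

```csharp
using System.IO;
using System.Net;
using System.Text;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack package is referenced

// Hypothetical helper, not part of Html Agility Pack itself.
public static class ResponseLoaderSketch
{
    // Parses HTML from any stream, such as a web response stream.
    public static HtmlDocument LoadFromStream(Stream stream, Encoding encoding)
    {
        var doc = new HtmlDocument();
        doc.Load(stream, encoding);
        return doc;
    }

    // Fetches a page and parses it. UTF-8 is an assumption here; a real
    // page may declare a different encoding.
    public static HtmlDocument LoadFromUri(string uri)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);
        // Dispose the response and its stream once parsing is done.
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (Stream stream = response.GetResponseStream())
        {
            return LoadFromStream(stream, Encoding.UTF8);
        }
    }
}
```

Calling ResponseLoaderSketch.LoadFromUri with a page address returns a parsed document whose DocumentNode can then be converted to text.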

