September 22nd, 2007
Requesting Web URL Status From .net Code

Posted under .net & Code & Software Design

There are any number of reasons to want to interact programmatically with resources on the web. The best-known programs that do so are browsers, like Internet Explorer, Firefox, and Safari; search engine spiders like GoogleBot; and RSS readers. Less famous are automated testing tools that check for 404 (Page Not Found) and other errors. As people who maintain sites delve into IIS, cPanel, .htaccess, or custom authentication code, it becomes important to verify that pages return an HTTP status code of 200 (OK) when you expect them to, or the right type of redirect (temporary vs. permanent). Testing software aims to uncover unknown problems in a larger site that has become difficult to manage by hand.

While some of these tools are worth the price, they can easily become overkill, especially considering how easy it is to gather this information yourself. Using the built-in functionality of the Microsoft .NET Framework, we can gather a wealth of information about any reachable URL. From this point, testing an entire site involves little more than writing a loop. The code provided below doesn’t check whether the host machine is connected to the internet; if it isn’t, a suitable error is returned to the caller. For an example of how to test a machine’s connectivity, see this ASP Emporium article.
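
As a rough illustration of that loop, here is a minimal sketch. It is not taken from the PageStatusTester class: `SiteChecker`, `GetStatus`, `Report`, and the timeout value are hypothetical names and choices made up for this example. It assumes every URL uses http or https, and it returns null when no HTTP response arrives at all. Note that recent .NET versions mark `WebRequest.Create` as obsolete, although it still works.

```csharp
using System;
using System.Collections.Generic;
using System.Net;

// Hypothetical helper for illustration; not part of PageStatusTester.
public static class SiteChecker
{
    // Returns the HTTP status code for a URL, or null when no HTTP response
    // was received (DNS failure, refused connection, timeout).
    public static HttpStatusCode? GetStatus(string url)
    {
        HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
        req.AllowAutoRedirect = false; // report redirects instead of following them
        req.Timeout = 10000;           // milliseconds; an arbitrary choice
        try
        {
            using (HttpWebResponse resp = (HttpWebResponse)req.GetResponse())
                return resp.StatusCode;
        }
        catch (WebException ex)
        {
            // 4xx and 5xx responses arrive as exceptions; when a response
            // is attached, it still carries the status code.
            using (HttpWebResponse errorResp = ex.Response as HttpWebResponse)
                return errorResp == null ? (HttpStatusCode?)null : errorResp.StatusCode;
        }
    }

    // The "loop": check each URL in turn and print what came back.
    public static void Report(IEnumerable<string> urls)
    {
        foreach (string url in urls)
        {
            HttpStatusCode? status = GetStatus(url);
            string text = status.HasValue ? ((int)status.Value).ToString() : "no response";
            Console.WriteLine(url + " -> " + text);
        }
    }
}
```

Calling `SiteChecker.Report(new[] { "http://example.com/", "http://example.com/missing-page" })` would print one line per URL; the addresses are placeholders for the pages of the site under test.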

The PageStatusTester object defined below is reusable: once an instance of the class has been created, you can either use the object to cache state information until it falls out of scope, or use it again and again to query new URLs. This has nothing to do with object-oriented programming, but it reduces pressure on the managed heap, letting your application use less memory and trigger fewer garbage collections.

The Code

Below is the heavy lifting, excerpted from a C# class called PageStatusTester; it shows how to interact directly with resources on the web from .NET code of any flavor.



// Excerpt from the PageStatusTester class; fields such as requestUrl, status,
// and headers are declared elsewhere in the class.
// Requires: using System; using System.IO; using System.Net;
protected void TestStatus() {
    Clear(requestUrl);
    try {
        HttpWebRequest req = (HttpWebRequest)WebRequest.Create(requestUrl);
        req.AllowAutoRedirect = autoRedirect;
        if (!string.IsNullOrEmpty(userAgent))
            req.UserAgent = userAgent;
        // Dispose the response when finished so the connection is released.
        using (HttpWebResponse resp = (HttpWebResponse)req.GetResponse()) {
            status = resp.StatusCode;
            headers = resp.Headers;
            // Cache the headers as tab-separated text and as HTML table rows.
            for (int i = 0; i < resp.Headers.Count; i++) {
                string name = resp.Headers.GetKey(i);
                string value = resp.Headers[i];
                cachedHeaderList += name + "\t=\t" + value + "\n";
                cachedHeaderHtmlTable += "<tr><td class='httpHeaderName'>" + name + "</td><td class='httpHeaderValue'>" + value + "</td></tr>";
            }
            // Put a heading row in front of the cached rows and close the table.
            cachedHeaderHtmlTable = "<table><tr><td class='httpHeaderName'>Name</td><td class='httpHeaderValue'>Value</td></tr>" + cachedHeaderHtmlTable + "</table>";
            server = resp.Server;
            responseUrl = resp.ResponseUri.ToString();
            IPAddress[] ips = System.Net.Dns.GetHostAddresses(resp.ResponseUri.DnsSafeHost);
            if (ips.Length > 0)
                ipAddress = ips[0];
            contentType = resp.ContentType;
            using (StreamReader tempReader = new StreamReader(resp.GetResponseStream())) {
                content = tempReader.ReadToEnd();
            }
        }
    } catch (WebException ex) {
        // 4xx and 5xx responses surface as exceptions; read the status code
        // from the attached response instead of searching the message text.
        HttpWebResponse errorResp = ex.Response as HttpWebResponse;
        if (errorResp != null) {
            status = errorResp.StatusCode;
            errorResp.Close();
        }
        error = ex.Message;
    }
    isReady = true;
}

public string Url {
    get { return requestUrl; }
    set {
        requestUrl = value;
        uri = new Uri(value);
        isReady = false;
        if (!deferTesting)
            TestStatus();
    }
}

Using the Code

The PageStatusTester object is pretty straightforward. Create a new instance and set the Url property; the object then exposes the response headers, the HTTP status code, server information, and the content itself (the page sent down). If the page is well-formed XHTML, an XmlDocument is provided. Better still, you can turn lazy-loading on or off: when testing is deferred (the deferTesting flag in the code above), setting Url does not send the request right away.
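
How the class builds its XmlDocument isn't shown here. One possible approach, sketched below under that assumption, is to attempt a parse and fall back to null when the markup is not well-formed. `XhtmlParser` and `TryParse` are hypothetical names; the sketch ignores any DOCTYPE rather than downloading the DTD, so pages that rely on DTD-defined entities such as `&nbsp;` will come back as null.

```csharp
using System.IO;
using System.Xml;

// Hypothetical helper for illustration; not part of PageStatusTester.
public static class XhtmlParser
{
    // Returns a parsed document, or null when the content is missing
    // or is not well-formed XML.
    public static XmlDocument TryParse(string content)
    {
        if (string.IsNullOrEmpty(content))
            return null;

        XmlReaderSettings settings = new XmlReaderSettings();
        settings.DtdProcessing = DtdProcessing.Ignore; // skip the DOCTYPE instead of fetching the DTD
        settings.XmlResolver = null;                   // never resolve external resources
        try
        {
            using (XmlReader reader = XmlReader.Create(new StringReader(content), settings))
            {
                XmlDocument doc = new XmlDocument();
                doc.Load(reader);
                return doc;
            }
        }
        catch (XmlException)
        {
            // Unclosed tags, undefined entities, more than one root element, etc.
            return null;
        }
    }
}
```

A page fragment with two top-level elements, such as a heading followed by a table, is not well-formed XML and yields null here; the same markup wrapped in a single root element parses.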

Following is an example of how calling code uses the PageStatusTester class:

PageStatusTester page = new PageStatusTester();
page.AutoRedirect = false;
page.Url = "http://Google.com";
System.Diagnostics.Debug.WriteLine("http://Google.com returns a " + page.HttpStatusCode + " - " + page.HttpStatusDescription);
System.Diagnostics.Debug.WriteLine("New address: " + page.Headers["Location"]);

Output:

http://Google.com returns a 301 - MovedPermanently
New address: http://www.google.com/
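
To tell a permanent redirect from a temporary one, as in the 301 above, calling code could group status codes along these lines. This is only a sketch: `StatusClassifier`, `Describe`, and the label strings are made up for this example, and treating 303 as a temporary redirect is a simplification.

```csharp
using System.Net;

// Hypothetical helper for illustration; not part of PageStatusTester.
public static class StatusClassifier
{
    // Maps an HTTP status code to a coarse label for reporting.
    public static string Describe(HttpStatusCode status)
    {
        int code = (int)status;
        switch (code)
        {
            case 200:
                return "OK";
            case 301:
            case 308:
                return "permanent redirect";
            case 302:
            case 303:
            case 307:
                return "temporary redirect";
            case 404:
                return "not found";
            default:
                // Any other 4xx or 5xx code is reported as a generic error.
                return code >= 400 ? "error" : "other";
        }
    }
}
```

With the example above, `StatusClassifier.Describe(page.HttpStatusCode)` would report a permanent redirect, assuming the HttpStatusCode property returns a System.Net.HttpStatusCode value.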

Implementation

Web Status Tester is a site testing tool, a Windows application built on the PageStatusTester class. Given a list of URLs, which can be discovered from a valid XHTML page with navigation, the tool tests and reports the status code of all the pages, images, JavaScript, and CSS files in your site. Future versions will let you change the User Agent string, so you can test browser sniffing that sends down IE-specific stylesheets. Finally, test results can be saved to XML.
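
The tool's own discovery code isn't shown here. As a sketch of how URLs might be collected from a parsed XHTML page, the following gathers every href and src attribute and resolves it against the page address. `LinkFinder` and `FindUrls` are hypothetical names, and keeping only http and https links is an assumption made for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper for illustration; not part of Web Status Tester.
public static class LinkFinder
{
    // Collects the distinct http/https URLs referenced by href and src
    // attributes, resolved against the address of the page itself.
    public static List<Uri> FindUrls(XmlDocument page, Uri baseUri)
    {
        List<Uri> found = new List<Uri>();
        // Unprefixed attributes are in no namespace, so this XPath works
        // whether or not the elements use the XHTML namespace.
        foreach (XmlNode attr in page.SelectNodes("//@href | //@src"))
        {
            Uri resolved;
            if (Uri.TryCreate(baseUri, attr.Value, out resolved)
                && (resolved.Scheme == Uri.UriSchemeHttp || resolved.Scheme == Uri.UriSchemeHttps)
                && !found.Contains(resolved))
            {
                found.Add(resolved);
            }
        }
        return found;
    }
}
```

Each URL in the returned list could then be handed to a PageStatusTester instance in turn.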

One Response to “Requesting Web URL Status From .net Code”

  1. John on 27 Sep 2007 at 1:33 am #

    That’s great!! But what if I have a web page that takes query strings and returns information but it has a header section and then a table. It’s not XML because there are two parent nodes.

    How do I change the code to fix that? I can put and around it by just using a string builder, but in all that code you spelled out, where’s that go??
