Screen Scraping Application Progress
Sunday, August 14th, 2005
So I’ve worked on my application a little more. First, I added an extra tab so that I could see a text-only version of the web page. This also gives me an idea of how search engines see page content.
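For anyone curious what a text-only view involves, here is a minimal sketch. It is written in Python with the standard library's html.parser rather than the .NET code this post is about, and the names TextOnlyParser and text_only are made up for illustration. It keeps the visible text and drops anything inside script and style tags, which is roughly what a simple crawler would be left with.

```python
from html.parser import HTMLParser


class TextOnlyParser(HTMLParser):
    """Collects visible text and ignores script/style content.

    Hypothetical helper for illustration; not the application's real code.
    """

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0  # > 0 while inside a script or style element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def text_only(html):
    """Return the page's visible text as one space-separated string."""
    parser = TextOnlyParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)
```

Calling text_only on a page's markup gives back a single string of its readable text. Whitespace handling here is deliberately crude; a real text view would probably want to preserve paragraph breaks.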
One major change was behind the scenes. Rather than just returning a NameValueCollection of links, I decided to refactor the code to return an array of HtmlAnchor controls instead. Since the .NET Framework already has objects to represent links, I figure this should work better in case I need to add any other information later on. It also lines up closely with parsing forms.
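The idea of returning link objects instead of bare name/value pairs can be sketched like this. This is Python, not the HtmlAnchor-based .NET code described above, and the Anchor class, AnchorParser, and parse_anchors are all hypothetical names. The point is only that an object has room for extra attributes later, where a name/value pair does not.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser


@dataclass
class Anchor:
    """A link with room for more than just a name and a value."""
    href: str
    text: str = ""
    attrs: dict = field(default_factory=dict)  # title, rel, target, ...


class AnchorParser(HTMLParser):
    """Collects <a href="..."> elements as Anchor objects (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.anchors = []
        self._current = None  # the anchor currently being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attr_map = dict(attrs)
            href = attr_map.pop("href", None)
            if href is not None:  # skip named anchors without an href
                self._current = Anchor(href=href, attrs=attr_map)

    def handle_data(self, data):
        if self._current is not None:
            self._current.text += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self._current.text = self._current.text.strip()
            self.anchors.append(self._current)
            self._current = None


def parse_anchors(html):
    """Return a list of Anchor objects found in the markup."""
    parser = AnchorParser()
    parser.feed(html)
    parser.close()
    return parser.anchors
```

Each result carries its href, its link text, and whatever other attributes the tag had, so adding something like a title or rel check later does not require changing the return type.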
Forms carry a good deal more information than just a name and a value. Rather than making my own objects from scratch, I turned to the HtmlForm object that comes in the System.Web.UI.HtmlControls namespace. Forms are very important in screen scraping, because you must sometimes go through authentication before you are permitted access to the content.
At the moment, I have gotten as far as returning an array of form elements, and I even got the relative addresses translated correctly. The next step is to start parsing the different input fields, text areas, and select boxes. Parsing alone is going to be a big problem. The toughest challenge that I see is rendering a dynamic form so the end user can modify the data.
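To give a feel for the form-parsing step, here is a rough sketch in Python (again, not the HtmlForm-based .NET code from this post). The Form class, FormParser, and parse_forms are hypothetical names. It resolves a relative action against the page address with urllib.parse.urljoin and collects input, textarea, and select values. It is simplified on purpose: it ignores input types, so checkboxes and radio buttons are recorded whether or not they are checked, and an option without a value attribute is stored as an empty string.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser
from urllib.parse import urljoin


@dataclass
class Form:
    action: str                 # absolute URL the form submits to
    method: str = "get"
    fields: dict = field(default_factory=dict)  # field name -> default value


class FormParser(HTMLParser):
    """Collects forms and their default field values (illustrative only)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.forms = []
        self._form = None      # form currently being read
        self._textarea = None  # name of the open textarea, if any
        self._select = None    # name of the open select, if any

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "form":
            # Translate a relative action into an absolute address.
            action = urljoin(self.base_url, attr_map.get("action") or "")
            method = (attr_map.get("method") or "get").lower()
            self._form = Form(action=action, method=method)
            return
        if self._form is None:
            return  # ignore fields that are not inside a form
        name = attr_map.get("name")
        if tag == "input" and name:
            self._form.fields[name] = attr_map.get("value") or ""
        elif tag == "textarea" and name:
            self._textarea = name
            self._form.fields[name] = ""
        elif tag == "select" and name:
            self._select = name
        elif tag == "option" and self._select:
            value = attr_map.get("value") or ""
            # Use the selected option, else fall back to the first one.
            if "selected" in attr_map or self._select not in self._form.fields:
                self._form.fields[self._select] = value

    def handle_data(self, data):
        if self._form is not None and self._textarea:
            self._form.fields[self._textarea] += data

    def handle_endtag(self, tag):
        if tag == "textarea":
            self._textarea = None
        elif tag == "select":
            self._select = None
        elif tag == "form" and self._form is not None:
            self.forms.append(self._form)
            self._form = None


def parse_forms(html, base_url):
    """Return the forms found in the markup, with actions made absolute."""
    parser = FormParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.forms
```

With a page at http://example.com/app/page.html (a made-up address), a form whose action is "login.aspx" comes back with the action http://example.com/app/login.aspx, plus a dictionary of default values that could be edited and posted back.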
Tags: Programming, Programmers Browser, Web Browser, Browser, Http, Parsing, Links, Forms, Blogware, Progress








