July 22, 2008
While we all favour using an API in production applications, there are plenty of situations, such as in ‘Mashups’ (I love how that term has been reappropriated from Jungle music), where you need to do some page scraping.
It’s occurred to me that these very easy techniques seem inaccessible to many people, so I thought I’d post a few bits and bobs about some basic scraping methods.
Here’s a bit of code I wrote that uses PHP’s DOMDocument class to treat an HTML page as XML and fetch, in this case, the incredibly useful current world population… fantastic!
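As a minimal sketch of the technique, and only a sketch: the inline HTML below is a stand-in for a fetched page, and both the markup and the "population" id are assumptions made for illustration, not the layout of any real site. In practice you would load the page from a URL of your choosing (for example with file_get_contents()) and adjust the XPath expression to match its structure.

```php
<?php
// Stand-in for a fetched page. The markup and the "population" id are
// hypothetical; a real page will need its own XPath expression.
$html = <<<HTML
<html>
  <body>
    <h1>World Population</h1>
    <p>Current estimate: <span id="population">6,738,610,278</span></p>
  </body>
</html>
HTML;

// Real-world HTML is rarely well formed, so collect libxml's parse
// warnings internally instead of letting them spill into the output.
libxml_use_internal_errors(true);

// Parse the HTML into a DOM tree that can be queried like XML.
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Use XPath to pick out the element holding the figure.
$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//span[@id="population"]');

// Guard against the element being missing, e.g. if the page layout changes.
$population = $nodes->length > 0 ? trim($nodes->item(0)->textContent) : null;

echo $population, "\n";
```

The XPath query is the part most likely to break: scraping depends on the page's markup staying put, which is why an API is preferable whenever one exists.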
Sample output: 6,738,610,278