Just over a year ago I created something I called the Random DannyChoo.com Link Generator. It’s like a Japanese gashapon that drops a shiny surprise into your hand at the push of a button (and by surprise I mean yummy blog content, and by hand I mean web browser ^^;). You can interact with it directly via DannyChooFan.com, or just watch for unique and random goodness by following DannyChooFan on Twitter or the DannyChooFan Facebook page.
That two-second interaction was (and is) the result of a database dip into a sea of nearly 6,000 unique post links, each gathered by a script that looks through the archive of DannyChoo.com and keeps the important bits. The script was thrown together quickly and isn’t very efficient, but it got the job done. I’ve learned a lot since it was first written and have been looking for the right time to recreate it (and make it public). Now is that time: enter DannyChoo.com Post Scraper v0.1.0.
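To give a feel for what that "database dip" might look like, here is a minimal sketch in Python. It is not the generator's actual code: the SQLite database, the `posts` table, its `url` column, and the example.com links are all assumptions made up for illustration.

```python
import sqlite3


def random_post_link(conn):
    """Return one randomly chosen post URL, or None if the table is empty."""
    row = conn.execute(
        "SELECT url FROM posts ORDER BY RANDOM() LIMIT 1"
    ).fetchone()
    return row[0] if row else None


# Hypothetical stand-in for the real link database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (url TEXT PRIMARY KEY)")

# Placeholder links; the real table would hold thousands of post URLs.
urls = [
    "https://example.com/post/1",
    "https://example.com/post/2",
    "https://example.com/post/3",
]
conn.executemany("INSERT INTO posts (url) VALUES (?)", [(u,) for u in urls])

print(random_post_link(conn))
```

`ORDER BY RANDOM()` scans the whole table, which is fine for a few thousand rows but would be worth revisiting for a much larger archive.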
DannyChoo.com Post Scraper (we’ll call it DCPS) parses the HTML of a DannyChoo.com post and currently gathers 7 attributes, including URL, Title, Description, Publish Date, Category, and Category URL; an 8th (Tags) will follow in the future. These attributes are then stored in a database that you can use for whatever you wish, such as the Random DannyChoo.com Link Generator.
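The parse-then-store idea can be sketched roughly as follows, again in Python. This is not DCPS itself. It assumes, purely for illustration, that a post's title sits in the `<title>` tag and its description in a `<meta name="description">` tag, and it only handles three of the attributes. The `PostParser` class, the `scrape_post` and `save_post` helpers, the `posts` table, and the sample HTML are all hypothetical.

```python
import sqlite3
from html.parser import HTMLParser


class PostParser(HTMLParser):
    """Collects the <title> text and the meta description from a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") == "description":
            self.description = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only text that appears inside <title> is kept.
        if self._in_title:
            self.title += data


def scrape_post(url, html):
    """Parse one post's HTML into a dict of attributes."""
    parser = PostParser()
    parser.feed(html)
    return {
        "url": url,
        "title": parser.title.strip(),
        "description": parser.description.strip(),
    }


def save_post(conn, post):
    """Insert the post, or overwrite the row if the URL already exists."""
    conn.execute(
        "INSERT OR REPLACE INTO posts (url, title, description) "
        "VALUES (:url, :title, :description)",
        post,
    )


# Made-up page standing in for a real post.
SAMPLE_HTML = """<html><head>
<title>An Example Post</title>
<meta name="description" content="A made-up description.">
</head><body></body></html>"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (url TEXT PRIMARY KEY, title TEXT, description TEXT)"
)

post = scrape_post("https://example.com/post/1", SAMPLE_HTML)
save_post(conn, post)
print(conn.execute("SELECT url, title, description FROM posts").fetchall())
```

Keying the table on the URL means re-scraping the same post updates its row instead of creating a duplicate.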
The DCPS library is actually part of a bigger initiative I’m working on that will let you gather the same data via a RESTful API. The entire project will be open source, and I’ll do my best to document each piece along the way.
You can visit GitHub to read more about the DCPS library or grab the code.