OK, I finally got fed up with all of the spam in my historical dasBlog postings. It’s really embarrassing to send a link to a colleague, only to have them snicker at all of the spam comments and trackbacks.
For those of you who don’t know what a trackback is, it’s an acknowledgement that lets authors keep track of who is linking to or referring to their articles. When used properly, trackbacks form a communication link between two blogs, so that new comments on one blog can ping the other, allowing readers to easily follow discussions on both. The problem is that spammers have abused this mechanism, and bloggers end up with trackbacks and pingbacks to various gambling, herbal medication, and adult sites.
The big effort was figuring out how to clean up the <Comment> and <Trackback> elements that were spam, so, like others before me, I built a tool to help with this.
- Download ScrubDasBlog.zip or ScrubDasBlogSource.zip to your hard drive
- Edit the blacklist.txt to include your own blacklisted URLs *
- Back up your existing feedback files: content\*.dayentry.xml
- Run the ScrubDasBlog utility, specifying the path to your content folder and the path to your blacklist.txt file, for example:
scrubdasblog c:\inetpub\wwwroot\mydasblog\content c:\scrubdasblog\blacklist.txt
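If you just want a feel for what a scrubbing pass involves before opening the source, here is a rough Python sketch of the idea. It is not the ScrubDasBlog code. The function names (`scrub`, `is_spam`, `local_name`) are made up for illustration, and it assumes each blacklist entry is a plain string that counts as a match if it appears anywhere inside a <Comment> or <Trackback> element.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def is_spam(element, entries):
    """True if any text or attribute value under element contains a blacklist entry."""
    for node in element.iter():
        values = [node.text or ""] + list(node.attrib.values())
        for value in values:
            lowered = value.lower()
            if any(entry in lowered for entry in entries):
                return True
    return False


def scrub(xml_text, blacklist, spam_tags=("Comment", "Trackback")):
    """Return (cleaned_xml, removed_count) for the contents of one feedback file.

    Assumption: spam lives in elements whose local name is in spam_tags,
    and a blacklist hit anywhere inside such an element marks it as spam.
    """
    entries = [b.strip().lower() for b in blacklist if b.strip()]
    root = ET.fromstring(xml_text)
    removed = 0
    # Snapshot the elements first so the tree is never modified mid-iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            if local_name(child.tag) in spam_tags and is_spam(child, entries):
                parent.remove(child)
                removed += 1
    return ET.tostring(root, encoding="unicode"), removed
```

Calling `scrub(xml_text, blacklist_lines)` on the text of a single file hands back the cleaned XML and a count of what was removed. One caveat: ElementTree may rewrite namespace prefixes when it serializes, so treat this as a way to understand the approach rather than a drop-in replacement for the utility.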
* If most of the comments and trackbacks in your dasBlog history are spam, you can generate a starter blacklist by going into your content sub-folder and typing the following:
type *.xml | find "AuthorHomepage" > blacklist.txt
After you generate the blacklist.txt file, you should remove any good sites and remove any duplicates, before running the ScrubDasBlog utility.
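Removing the duplicates by hand gets tedious, so here is a small Python sketch that could take the raw lines from that command and boil them down to a sorted, de-duplicated list of URLs. The function name `starter_blacklist` is my own invention, and it assumes each homepage appears on a single line as `<AuthorHomepage>...</AuthorHomepage>`. You still need to remove the good sites yourself.

```python
import re

# Matches the URL between the opening and closing AuthorHomepage tags.
HOMEPAGE = re.compile(
    r"<AuthorHomepage>\s*(.*?)\s*</AuthorHomepage>", re.IGNORECASE
)


def starter_blacklist(lines):
    """Return sorted, de-duplicated homepage URLs found in the given lines."""
    urls = set()
    for line in lines:
        for url in HOMEPAGE.findall(line):
            if url:  # skip empty elements
                urls.add(url.lower())
    return sorted(urls)
```

For example, reading the generated file with `open("blacklist.txt").readlines()` and passing the result to `starter_blacklist` gives you one lower-cased URL per entry, ready to be reviewed and written back out.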
I would recommend downloading the Source code version and reading through my code. Please comment on any improvements you might make.