How many SEO people woke up one morning to discover that a multitude of red down-pointing arrows had taken over their keyword ranking charts? How many posts on SEO blogs have started with the exact same words as this one?
Whether these are your sites or a client's, it's a nasty feeling. You forget about all your plans for the day, forget to brush your teeth, and find yourself sitting for hours trying to analyze what the hell happened.
If you were responsible for the site's link building, then it's easier to know where you did well and where you cut a few corners. If you received a site with a history of links that you were not in charge of, it can be a bit of a headache trying to figure out what your predecessors did.
Before starting, a quick note. There are a few good tools out there for monitoring backlink details; some of them are very reputable and some are less known.
Still, they are only partially accurate. Google regularly updates the list of links it “counts” as pointing to a certain site. Links are dropped if the source site is kicked out of the index or simply expires.
Other tools are not synced with Google on a daily basis. It takes time for them to update, and even then the lists of links they supply may include many that no longer exist.
Following the Penguin 2.0 update released on May 22nd, I had to review the link profiles of a few clients, with a clear understanding that disavow lists needed to be submitted.
Yes, it's true that approaching the low-quality sites and asking them to remove the links is a good start, but my experience (just mine) showed that days of work resulted in very few responses. Therefore I turned to the Disavow tool offered by Google.
In this cleaning procedure I used 4 tools to prepare the list: Google Webmaster Tools (GWT), Majestic SEO Site Explorer, the Detox tool by http://linkresearchtools.com and a simple Excel sheet.
[Screenshot: Detox Tool example]
Step 1 - Find Linking Domains: I downloaded the list of linking domains from GWT (over 1000), copied them to my Excel worksheet (together with the number of links from each domain), marked them in one color (red) and added a third column next to them where I wrote “GWT” in each row.
Step 2 - Download Majestic SEO list: I downloaded from Majestic Site Explorer the list of linking domains with a trust flow of less than 30, together with the citation flow and number of links from each domain. (The threshold is an arbitrary decision; it is more or less accepted that sites with a trust flow of less than 30 are not that good.)
I added those to my worksheet under the GWT domains and gave them a different color (blue). Here too I wrote “Maj” in every row, in the same column I had used for “GWT”.
By now I had over 2000 domains on the worksheet.
Step 3 - Download Detox Tool List: I downloaded the list of toxic domains (the ones identified by the tool as potentially dangerous) and added them under the other domains in my worksheet. These got a third color (black) and an additional column entry saying “detox”. This tool exports the full links in CSV format, not just the domains, so there's some editing work to do: eliminating duplicate lines and removing the http://www prefixes. By now I had almost 4000 domains on the list.
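The editing work in Step 3 (stripping the http://www prefixes and removing duplicates) can also be done outside the spreadsheet. Here is a minimal Python sketch of that cleanup. The function names `to_domain` and `unique_domains` are my own, and the sample links are placeholders, not real export data.

```python
from urllib.parse import urlparse


def to_domain(link: str) -> str:
    """Reduce a full link to a bare host name without the www. prefix."""
    link = link.strip().lower()
    # urlparse only recognizes the host when a scheme is present.
    if "://" not in link:
        link = "http://" + link
    host = urlparse(link).hostname or ""
    if host.startswith("www."):
        host = host[len("www."):]
    return host


def unique_domains(links):
    """Return the distinct, non-empty domains found in links, sorted A-Z."""
    domains = {to_domain(link) for link in links}
    domains.discard("")
    return sorted(domains)


# Hypothetical sample rows standing in for a CSV export of full links.
sample_links = [
    "http://www.example.com/page-1",
    "https://example.com/page-2",
    "www.Example.org",
]
cleaned = unique_domains(sample_links)
```

With the sample above, `cleaned` holds two domains: the two example.com links collapse into one entry.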
Step 4 - The Secret Procedure: I sorted the list using the A-Z filter. Now came the slower part: identifying the domains that show up on all three lists, or at least two of them. In a new column inserted before the domain column, I marked “d” (for disavow) next to each domain that met that criterion.
After finishing (about 20 minutes) I sorted again using the A-Z filter, this time using the new “d” column as the sort key. I then filtered once more to show only the domains not yet marked for disavow. Now the domains supplied by Majestic and the Detox tool that were NOT on the GWT list could easily be deleted in bulk, leaving the list of domains marked for disavow plus the remaining GWT domains that didn't show up in the other two tools.
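For anyone who prefers a script to 20 minutes of eyeballing a sorted sheet, the same "appears on at least two of the three lists" rule can be sketched in Python. This is only an illustration of the rule as I applied it by hand. The function name `split_by_overlap` and the sample domains are made up, and the input lists are assumed to already contain bare domains.

```python
from collections import Counter


def split_by_overlap(gwt, majestic_low, detox_toxic):
    """Split domains into (disavow, to_review).

    disavow:   domains that appear on at least two of the three lists.
    to_review: GWT domains that appear on no other list.
    Majestic-only and Detox-only domains are dropped.
    """
    sources = [set(gwt), set(majestic_low), set(detox_toxic)]
    # Each list counts a domain once, however many links it has.
    counts = Counter(domain for source in sources for domain in source)
    disavow = sorted(d for d, n in counts.items() if n >= 2)
    to_review = sorted(d for d in sources[0] if counts[d] == 1)
    return disavow, to_review


# Hypothetical sample data.
gwt = ["a.example", "b.example", "c.example"]
majestic_low = ["a.example", "d.example", "e.example"]
detox_toxic = ["a.example", "b.example", "d.example"]

disavow, to_review = split_by_overlap(gwt, majestic_low, detox_toxic)
```

In this sample, a.example, b.example and d.example are marked for disavow, c.example is left for review, and e.example (Majestic only) is dropped.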
Checkpoint 1: By now I had marked about 300 domains for disavowing and was left with 700 to check. I now downloaded from the Detox tool the list of GOOD domains, and from Majestic the domains with a trust flow above 30. I followed the same procedure and compared these two lists to the remaining 700 GWT domains.
Checkpoint 2: After 10 minutes I had marked about 500 domains as “g” (good). Using the A-Z filter I separated the good domains from the rest, then cut and pasted them into another sheet (for later reference). One more use of the A-Z filter and I got rid of both the Majestic and Detox tool lists.
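The good-domain pass in Checkpoints 1 and 2 can be sketched the same way. One assumption to flag loudly: here a remaining GWT domain counts as good when it shows up on at least one of the two good lists (GWT plus one other list makes two), which is my reading of "the same procedure". The `min_hits` parameter lets you tighten that. All names and sample domains are hypothetical.

```python
def mark_good(remaining, detox_good, majestic_high, min_hits=1):
    """Split the remaining GWT domains into (good, unresolved).

    ASSUMPTION: a domain is treated as good when it appears on at least
    min_hits of the two "good" lists.
    """
    good_lists = [set(detox_good), set(majestic_high)]
    good, unresolved = [], []
    for domain in sorted(set(remaining)):
        hits = sum(domain in good_list for good_list in good_lists)
        if hits >= min_hits:
            good.append(domain)
        else:
            unresolved.append(domain)
    return good, unresolved


# Hypothetical sample data.
remaining = ["c.example", "f.example", "g.example"]
detox_good = ["c.example", "x.example"]
majestic_high = ["c.example", "f.example"]

good, unresolved = mark_good(remaining, detox_good, majestic_high)
strict_good, strict_unresolved = mark_good(
    remaining, detox_good, majestic_high, min_hits=2
)
```

Setting `min_hits=2` requires both tools to agree before a domain is put aside as good, which leaves more domains for the manual pass.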
Checkpoint 3: By now only 200 domains or so were left from the original GWT list. This is where a little bit of SEO experience came in. I swept through them, marking with “d” (disavow) the obvious SEO sites (immediateseoresults.com), link farms (bestfreetechofferswestcoastusa.com), low-quality directories (333superlikdir.com) and junk article sites (freearticlehubspot951.com), as well as the simply dodgy ones.
Checkpoint 4: 100 domains were left after the last sweep (and 400 on my disavow list). I turned to Open Site Explorer by Moz (the free version), where five domains can be loaded at once for SEO factor analysis. All the ones with a Domain Authority score of less than 30 were respectfully marked for disavow.
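Checking domains five at a time and applying the "Domain Authority below 30" cut-off is simple bookkeeping, sketched below. The scores in this example are invented and are assumed to have been copied by hand from the tool. Nothing here calls Moz or any other service, and the function names are my own.

```python
def batches(items, size=5):
    """Yield consecutive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def below_threshold(scores, threshold=30):
    """Return the domains whose score is strictly below the threshold."""
    return sorted(d for d, score in scores.items() if score < threshold)


# Hypothetical domains still waiting for a manual check.
pending = [f"site{n}.example" for n in range(1, 8)]
groups = list(batches(pending))  # groups of five to paste into the tool

# Hypothetical Domain Authority scores noted down by hand.
scores = {"site1.example": 12, "site2.example": 30, "site3.example": 45}
low_authority = below_threshold(scores)
```

Note that a score of exactly 30 is kept, matching the "less than 30" rule.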
Bottom line: one hour, more than 1000 domains reviewed, 430 bad domains, about 600 verified good linking domains. I estimate the process to be about 90% accurate if you don't use any additional filtering or considerations. To make sure I wasn't disavowing top domains or keeping bad ones, I quickly scanned the lists generated by the Detox tool while I was editing them, using my experience and a good eye.
Now all that is left is to prepare the list for uploading to GWT (in this format: domain:baddomain.com) and wait a few weeks for Google to update. Past experience (a few months after the first Penguin update) showed considerable improvement, and a site whose links I cleaned last time was not affected by the current Penguin update.
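Turning the final list into the upload format is a one-liner per domain. The sketch below builds the text with one domain:baddomain.com entry per line. The function name `build_disavow_text` is my own and the domains are placeholders.

```python
def build_disavow_text(domains):
    """Build the disavow file body: one 'domain:<name>' entry per line."""
    lines = [f"domain:{domain}" for domain in sorted(set(domains))]
    return "\n".join(lines) + "\n"


# Hypothetical final list; duplicates are removed and entries sorted A-Z.
final_list = ["baddomain2.example", "baddomain1.example", "baddomain2.example"]
disavow_text = build_disavow_text(final_list)
```

Saving `disavow_text` to a plain .txt file gives you something ready to upload.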