I think every SEO can agree that one of the most frustrating parts about Google Search Console is that it doesn’t provide enough data.
Whether it’s limiting Search Analytics data to the past 90 days or providing only high-level numbers for certain reports (especially when additional granularity would be more helpful), SEOs certainly have a love/hate relationship with the tool.
One report that could use a refresh is the “Links to Your Site” section.
In this section of Google Search Console, you can see two very important pieces of data that are incredibly useful together but sadly very ineffective when split apart: the domains linking to your site, and the most linked pages on your site.
For some reason, Google decided not to combine this data, so we can’t see at a granular level which external pages link to which pages on our site. We can, of course, see at a high level which domains link to us the most and which content on our site gets linked the most, but again, this data is not connected.
Now don’t get me wrong, these reports can be very helpful in some cases. But if you’re conducting an in-depth backlink audit, high-level data alone won’t help you; you need that data connected at a low level for it to be of any use.
This means many SEOs must completely forgo the linking data provided directly by Google and instead rely on helpful, yet very expensive and flawed, tools such as Majestic and Ahrefs.
Google should just give us the data we need in a report that’s easy to analyze, but they don’t. That’s why I created a helpful little script that pulls all the backlink data we need and arranges it in an easy-to-use format.
So how does this script work? Well, it’s pretty easy!
It takes a list of URLs, parses the content on each page, and looks for all the links pointing to a specific domain. If it finds a link to the specified domain, it pulls the full link URL, the anchor text, and any rel attributes such as nofollow.
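In outline, that matching logic might look like this. This is a minimal sketch using requests and BeautifulSoup, not the script itself, and the function names (`extract_backlinks`, `find_backlinks`) are mine:

```python
import requests
from bs4 import BeautifulSoup

def extract_backlinks(html, target_domain):
    """Return every link in the HTML that points at target_domain,
    along with its anchor text and any rel attributes."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for a in soup.find_all("a", href=True):
        if target_domain in a["href"]:
            links.append({
                "url": a["href"],
                "anchor_text": a.get_text(strip=True),
                # BeautifulSoup returns rel as a list of values
                "rel": " ".join(a.get("rel", [])),  # e.g. "nofollow"
            })
    return links

def find_backlinks(page_url, target_domain):
    """Fetch a page and extract its links to target_domain."""
    response = requests.get(page_url, timeout=10)
    return extract_backlinks(response.text, target_domain)
```

Running `extract_backlinks` over each page in your URL list gives you exactly the connected data Search Console withholds: which external page links to which page on your site, with what anchor text.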
Next let’s talk about how to get the script working.
But first, there are several things you need…
I’m assuming you already have Python 2.7 installed. If not, there are plenty of tutorials covering how to do this.
If you need to install BeautifulSoup, you can just run the following command:
pip install beautifulsoup4
Next you need to grab backlink data for your website from Google Search Console.
Within the "Links to Your Site" section, you’ll need to select the "More >>" button under "Who links the most".
This will take you to a page showing the domains that link to your site, along with the total number of links from each domain. What we need are the exact pages on these domains that link to our site. To get all of these URLs, download a CSV by clicking the "Download more sample links" button.
This gives us all the pages that have links pointing back to our site.
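If you want to feed that export into a script of your own, reading it back is straightforward. The sketch below assumes the CSV contains one URL per row beneath a single header row (the shape of the "Download more sample links" export at the time of writing); `load_sample_links` is an illustrative name, not the script's:

```python
import csv

def load_sample_links(csv_path):
    """Read the Search Console export and return a list of page URLs.
    Assumes a single header row followed by one URL per row."""
    urls = []
    with open(csv_path) as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row
        for row in reader:
            if row and row[0].startswith("http"):
                urls.append(row[0].strip())
    return urls
```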
Next, you’ll need to download the script provided below.
After you've downloaded the script, you'll need to make a few changes to it:
Once you’ve made those changes, you’re ready to run the script.
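The exact edits depend on the script, but they typically amount to pointing a few variables at your own data. Something along these lines, where every name and filename is hypothetical:

```python
# Illustrative configuration; the actual variable names in the script may differ.
TARGET_DOMAIN = "yourdomain.com"            # the domain whose backlinks you want to find
INPUT_CSV = "sample_links.csv"              # the CSV exported from Search Console
OUTPUT_CSV = "backlink_report.csv"          # where the results get written
```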
It’s not perfect, but it works pretty well. There may be cases where the script throws errors, but I’ve noticed the pages causing them are typically low quality, and I usually just add some exceptions to handle the errors anyway.
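If you’d rather not add exceptions one by one, a simple approach is to wrap each fetch so a failing page is skipped instead of crashing the whole run. This is a sketch using requests; `fetch_page` is an illustrative name:

```python
import requests

def fetch_page(url):
    """Fetch a page, returning None instead of raising when a site
    times out, refuses the connection, or returns an error status."""
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        return response.text
    except requests.RequestException as e:
        print("Skipping %s: %s" % (url, e))
        return None
```

In the main loop, you would then skip any URL for which `fetch_page` returns None and move on to the next one.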
And that’s it! I hope other SEOs can get as much use out of this script as I have.
If you have any questions or suggestions to improve the script, please leave them in the comments below.