Examining Internet Explorer Tracking Protection Lists (with PowerShell)

I know that Firefox and Chrome are still the only browsers most people will even consider using, but ever since I started using Windows 8, I haven't really felt a compelling reason to stop using the built-in IE10. It's fast, it passes the Acid tests, and it has a nice set of built-in developer tools (press F12 on your keyboard) that let you switch the browser mode and document mode on the fly. Then there's Compatibility View, which I think is really underrated in corporate environments, where employees are forced to use antiquated internal websites that were built for IE5.

About the only thing I miss from Chrome is my sweet, sweet AdBlock. Does IE have anything that can compare? Well, it has Tracking Protection Lists, which is a good start:

IE Tracking Protection Lists


The Personalized List kinda sucks because you only get two choices: either let IE automatically block everything it detects as a third-party script (one that pulls content from other domains and shows up frequently across the pages you visit), or wait until IE has seen the script 10 times or so, then go back in and manually choose to block it.

Personalized Tracking List

Of course, Google is going to be the number one offender here, since the way they make money is by shoving ads in your face around every corner and making you want to buy stuff. Honestly, I sympathize with online advertisers up to a point, because I believe the internet would not have as much rich, free content as it does if advertisers were unable to make money by providing free services. I mean, Google doesn't keep YouTube up just because they love paying those bandwidth bills so much. But ads can quickly get too obnoxious, and we users need a way to filter out all the junk that gets flashed before our eyes. Especially the Google ones that break Internet Explorer's Back button: Google's code opens 4 instances of some doubleclick.net resource in a way that forces you to hit Back 5 times to get back to where you were.

From legalinsurrection.com

The flip side of the coin is that if you block too much, certain websites will stop working properly, so you need to find a happy medium. Cue Tracking Protection Lists. Just like AdBlock, these lists are designed to block only the web content that the list maintainers have decided is more harmful or annoying than useful. That way we can keep using Gmail, YouTube, StackExchange, etc., without interruption, while still cutting out a good chunk of the ads.

You can download new Tracking Protection Lists right from within the Tracking Protection dialog box: just click the "Get a Tracking Protection List online..." link, which takes you to a web page of available lists. Notice in the screenshot above that I have already added the "EasyList" TPL, from the same people who do AdBlock. Also notice that the Tracking Protection List is simply an HTTP URI pointing to a *.tpl text file. And you know what that means... we can play with it from PowerShell!

Say you wanted to examine a TPL and see whether it contains a certain domain name. First, let's download the TPL and store it in a variable:

PS C:\> $list = Invoke-WebRequest http://easylist-msie.adblockplus.org/easylist.tpl

Now the content of the tracking list is in $list.Content. These are the rules the list applies: mostly web resources to block, plus some to allow. The syntax of TPL files is documented online. Warning: there will be distasteful words in this list... as the list is designed to block distasteful content.
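To get a feel for how the rules break down, here is a small parser sketch. It assumes the rule prefixes used in the entries we look at later in this post (-d/+d for domain block/allow rules, a bare - or + for generic rules), that # starts a comment, that lines beginning with a colon carry list metadata, and that the first line is an msFilterList header; check those assumptions against the syntax documentation before relying on it. ConvertFrom-TplLine is a hypothetical name, not a built-in cmdlet.

```powershell
# Hypothetical helper (not part of IE or PowerShell): parse one TPL line.
# Assumes "-d"/"+d" mark domain block/allow rules and a bare "-"/"+" marks
# generic block/allow rules. Returns nothing for lines that are not rules.
function ConvertFrom-TplLine {
    param([string]$Line)

    $trimmed = $Line.Trim()

    # Skip blank lines, the header line, comments (#) and metadata lines (:)
    if ($trimmed -eq '' -or $trimmed -eq 'msFilterList') { return }
    if ($trimmed.StartsWith('#') -or $trimmed.StartsWith(':')) { return }

    # The first character says whether the rule blocks (-) or allows (+)
    if ($trimmed.StartsWith('-')) { $action = 'Block' }
    elseif ($trimmed.StartsWith('+')) { $action = 'Allow' }
    else { return }

    $rest = $trimmed.Substring(1)

    if ($rest.StartsWith('d ')) {
        # Domain rule: "-d <domain> [path]" or "+d <domain> [path]"
        $parts = $rest.Substring(2).Trim() -split '\s+', 2
        $path = $null
        if ($parts.Count -gt 1) { $path = $parts[1] }
        [pscustomobject]@{ Action = $action; Kind = 'Domain'; Domain = $parts[0]; Path = $path }
    }
    else {
        # Generic rule: the remaining text is a string to look for in the URI
        [pscustomobject]@{ Action = $action; Kind = 'Generic'; Domain = $null; Path = $rest.Trim() }
    }
}
```

For example, ConvertFrom-TplLine '-d streamcloud.eu /deliver.php' returns an object with Action Block, Domain streamcloud.eu and Path /deliver.php, while a comment line returns nothing. Piping every line of the list through it (ForEach-Object { ConvertFrom-TplLine $_ }) gives you objects you can filter with Where-Object.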

Alright, so what if we want to know whether this TPL will block content from the domain streamcloud.eu? First, let's break $list.Content up into lines by splitting it on newlines:

PS C:\> $list.Content.Count
1
PS C:\> $Content = $list.Content.Split("`r`n")
PS C:\> $Content.Count
10203
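One caveat about the transcript above: Windows PowerShell runs on the .NET Framework, where String.Split has no overload that takes a single string, so PowerShell converts "`r`n" to a character array and splits on either character. If the file uses CRLF line endings, that leaves an empty entry between every pair of real lines, which inflates .Count (the search still works, since an empty string never matches). In PowerShell 7, which runs on a newer .NET that does have a string overload, the behavior can differ. Here is a minimal sketch of a split that doesn't depend on which overload gets picked; Split-TplContent is a hypothetical name.

```powershell
# Hypothetical helper: split raw TPL text into lines regardless of line endings.
# -split takes a regular expression, so '\r?\n' matches both CRLF and bare LF,
# and it behaves the same in Windows PowerShell and PowerShell 7.
function Split-TplContent {
    param([string]$Text)

    # Drop empty entries (e.g. the one produced by a trailing newline)
    $Text -split '\r?\n' | Where-Object { $_ -ne '' }
}
```

Usage would be $Content = @(Split-TplContent $list.Content); the @() makes sure you always get an array back, even for a one-line input.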

After splitting the Content property of the web request on newlines, we can see that $Content holds 10,203 entries. I first thought to split on [String]::NewLine, but that did not yield correct results, because System.String has no NewLine property: the expression gives you $null rather than a line break. The property I was thinking of is [Environment]::NewLine. Now, keeping in mind that lines starting with # are comments, let's see if we can find entries that contain streamcloud.eu:

PS C:\> foreach ($line in $Content) { if (-not $line.StartsWith('#') -and $line.Contains('streamcloud.eu')) { $line } }
-d streamcloud.eu /deliver.php
+d streamcloud.eu

So from this output, it appears that the list allows streamcloud.eu in general (the +d rule), but specifically blocks requests for /deliver.php on streamcloud.eu (the -d rule). This could also have been written as:

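PS C:\> foreach ($line in $Content) { if (-not $line.StartsWith('#') -and $line -match 'streamcloud\.eu') { $line } }

Keep in mind that -match takes a regular expression, which is why the dot is escaped here; an unescaped . matches any character. Still, a lot of the time I just naturally prefer the C#-style method calls like .Contains(). The good thing about PowerShell is that you're free to mix and match.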
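If you want to repeat this search for other domains, the loop can be wrapped in a small function. This is a minimal sketch: Find-TplEntry is a hypothetical name, and it assumes you pass in the already-split lines (such as $Content). It uses String.IndexOf with an ordinal, case-insensitive comparison, so the domain is matched literally (no regex escaping needed) and case doesn't matter, since domain names are case-insensitive anyway.

```powershell
# Hypothetical helper: return the non-comment TPL lines that mention a domain.
# Uses a literal, case-insensitive substring test, so the dots in the domain
# name are not treated as regex wildcards.
function Find-TplEntry {
    param(
        [string[]]$Lines,
        [string]$Domain
    )

    foreach ($line in $Lines) {
        # Skip comment lines, just like the one-liners above
        if ($line.StartsWith('#')) { continue }

        if ($line.IndexOf($Domain, [StringComparison]::OrdinalIgnoreCase) -ge 0) {
            $line
        }
    }
}
```

Usage would be Find-TplEntry -Lines $Content -Domain 'streamcloud.eu', which should return the same two rules as the one-liner above.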

You can certainly build on the concept I've started above, and I hope you will. Until next time!
