Merge branch 'release/v0.4.0'
andreaskoch committed May 23, 2016
2 parents 97dbdb6 + a95a463 commit c803584
Showing 43 changed files with 1,328 additions and 888 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -1,3 +1,4 @@
bin
coverage.out
coverage.html
.DS_Store
23 changes: 23 additions & 0 deletions CHANGELOG.md
@@ -8,6 +8,29 @@ This project adheres to [Semantic Versioning](http://semver.org/).
### Added
-

## v0.4.0

### Added
- Add logging for the update command
- Add package documentation
- Add documentation for public functions
- Introduce a new list-spam-domains action

### Changed
- Combine static spam domains with dynamic ones
- Combine multiple referrer spam domain sources
- Display status as a percentage instead of a text-based status
- Allow specifying the number of days for the find-spam action
- Change the filter names prefix to "Referrer Spam Block Segment"

### Removed
- Remove the global status ... it didn't make much sense

### Fixed
- Fixed the update command. Updates did not work before.
- Fix template newline handling
- Fix the filesystem token store. Create the directory if it does not exist.

## v0.3.0

### Added
90 changes: 62 additions & 28 deletions README.md
@@ -1,24 +1,59 @@
# Google Analytics Spam Control

Command-line utility for blocking referrer spam from your Google Analytics accounts
Command-line utility for blocking referrer spam from your Google Analytics accounts automatically using the power of community-maintained lists and machine learning.

Google Analytics [referrer spam](https://en.wikipedia.org/wiki/Referer_spam) is a pain.
**ga-spam-control** is a small command-line utility that helps you to keep your Google Analytics spam filters up-to-date.
There are hundreds of known referrer spam domains and every other day a new one pops up. And the only way to keep the spammers from skewing your web analytics reports is to block these spam domain names one by one.

## Features
**ga-spam-control** is a small command-line utility that keeps your Google Analytics spam filters up-to-date, automatically.

**ga-spam-control** fetches the latest list of referrer spam domains from [github.com/ddofborg/analytics-ghost-spam-list](https://github.com/ddofborg/analytics-ghost-spam-list) and creates or updates filters in your Google Analytics accounts that prevent any of these spam domains from reaching your analytics reports.
## How does ga-spam-control work?

The command line utility provides the following actions:
**ga-spam-control** creates filters for your Google Analytics accounts that block known referrer spam domains from your analytics reports and keeps these filters up-to-date.

1. Action: **status**
To always protect your analytics reports from annoying false entries ga-spam-control **combines multiple community-maintained lists** of known spam domains:

- [ddofborg's Analytics Ghost Spam List](https://github.com/ddofborg/analytics-ghost-spam-list)
- [Stevie Ray's apache-nginx-referral-spam-blacklist](https://github.com/Stevie-Ray/apache-nginx-referral-spam-blacklist)
- [Piwik Referrer spam blacklist](https://github.com/piwik/referrer-spam-blacklist)

with the **power of machine learning**. ga-spam-control analyzes your analytics data and identifies spam which went past the existing filters.
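
As a rough illustration of what combining several lists involves, the sketch below merges domain lists into one sorted, duplicate-free list. It is not taken from the ga-spam-control sources: the function name `mergeDomains`, the normalization (trimming and lower-casing), and the sample domains are all assumptions made for this example.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// mergeDomains combines several domain lists into one sorted list
// without duplicates. Entries are trimmed and lower-cased first
// (an assumption of this sketch, not necessarily what the tool does).
func mergeDomains(lists ...[]string) []string {
	seen := make(map[string]struct{})
	for _, list := range lists {
		for _, domain := range list {
			domain = strings.ToLower(strings.TrimSpace(domain))
			if domain == "" {
				continue // skip blank entries
			}
			seen[domain] = struct{}{}
		}
	}

	merged := make([]string, 0, len(seen))
	for domain := range seen {
		merged = append(merged, domain)
	}
	sort.Strings(merged)
	return merged
}

func main() {
	// Hypothetical sample data standing in for two community lists.
	listA := []string{"spam-one.example", "Spam-Two.example"}
	listB := []string{"spam-two.example", " spam-three.example "}

	for _, domain := range mergeDomains(listA, listB) {
		fmt.Println(domain)
	}
}
```

Using a map as a set keeps the merge linear in the total number of entries; sorting afterwards makes the result stable between runs.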

**Screenshot of the Azure Machine Learning Web Service that ga-spam-control uses**

![Screenshot of the Azure Machine Learning Model Training Experiment](files/azure-ml-spam-detection/screenshots/Azure-Machine-Learning-Spam-Detection-Screenshot-00003.png)

This gives you the ability to completely automate your spam protection process. Just let ga-spam-control check your Google Analytics accounts daily for new spam and update your filters when it detects any.

This gives you an additional level of protection, just in case the community spam lists are not updated fast enough.

## Available Commands

The command line utility provides the following actions.

**Spam Control Filter Actions**

In order to protect your Google Analytics account from spam, **ga-spam-control** creates filters that block known referrer spam domains from your analytics reports. These are the commands that help you to review and update your spam filters:

1. Action: **show-status**
Display the spam-control status of all your accounts or for a specific account
2. Action: **update**
Create or update spam-control filters for an accounts
3. Action: **remove**
Remove spam-control filters from an account
4. Action: **detect-spam**
Check a given account for referrer spam
2. Action: **update-filters**
Create or update the spam-control filters for a specific account
3. Action: **remove-filters**
Remove all spam-control filters from an account

**Referrer Spam Domains Actions**

The basis for the spam filters is an up-to-date list of known referrer spam domains. And with these commands you can review and update the spam-domain lists:

1. Action: **list-spam-domains**
Print a list of all currently known referrer spam domains
2. Action: **update-spam-domains**
Update the list of referrer spam domain names.
3. Action: **find-spam-domains**
Use a machine learning service to analyze the last `n` days of analytics data for new referrer spam.

The current list of referrer spam domains is stored at this path: `~/.ga-spam-control/domains`
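
A minimal sketch of how such a file could be located and read follows. The path comes from the README text above; everything else is an assumption for illustration, in particular the file format (one domain per line, blank lines ignored) and the helper names `domainsPath` and `parseDomains`.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// domainsPath returns the location of the domain list below the
// given home directory.
func domainsPath(homeDir string) string {
	return filepath.Join(homeDir, ".ga-spam-control", "domains")
}

// parseDomains splits file content into domain names.
// Assumption: one domain per line; blank lines are skipped.
func parseDomains(content string) []string {
	var domains []string
	scanner := bufio.NewScanner(strings.NewReader(content))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line != "" {
			domains = append(domains, line)
		}
	}
	return domains
}

func main() {
	homeDir, err := os.UserHomeDir()
	if err != nil {
		fmt.Println("cannot determine home directory:", err)
		return
	}

	path := domainsPath(homeDir)
	content, err := os.ReadFile(path)
	if err != nil {
		fmt.Println("cannot read domain list:", err)
		return
	}

	fmt.Printf("%d domains in %s\n", len(parseDomains(string(content))), path)
}
```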

## Usage

@@ -34,73 +69,72 @@ ga-spam-control --help

### Display spam-control status

Display the current spam-control **status** for all accounts that you have access to:
Use **show-status** to display the current spam-control status for all accounts that you have access to:

```bash
ga-spam-control status
ga-spam-control show-status
```

Display the spam-control status in a parseable format:

```bash
ga-spam-control status -q
ga-spam-control status --quiet
ga-spam-control show-status --quiet
```

Print account IDs of accounts that have the spam-control status of "not-installed"

```bash
ga-spam-control status -q | grep "not-installed" | awk '{print $1}'
ga-spam-control show-status -q | grep "not-installed" | awk '{print $1}'
```

Display the current spam-control **status** for a specific Google Analytics account:

```bash
ga-spam-control status <accountID>
ga-spam-control show-status <accountID>
```

### Install or update spam-control filters

**update** the spam-control filters for a specific Google Analytics account:

```bash
ga-spam-control update <accountID>
ga-spam-control update-filters <accountID>
```

### Uninstall spam-control filters

**remove** the spam-control filters for a specific Google Analytics account:

```bash
ga-spam-control remove <accountID>
ga-spam-control remove-filters <accountID>
```

### Detect spam
### Find new referrer spam with machine learning

**detect-spam** check the given Google Analytics account for referrer spam:
The **find-spam-domains** action analyzes the last `n` days of analytics data from the given account for new referrer spam:

```bash
ga-spam-control detect-spam <accountID>
ga-spam-control find-spam-domains <accountID>
```

**Authentication**

The first time you perform an action, you will be shown an OAuth authorization dialog.
If you permit the requested rights the authentication token will be stored in your home directory (`~/.ga-spam-control`).
If you permit the requested rights the authentication token will be stored in your home directory (`~/.ga-spam-control/credentials`).

To sign out you can either delete the file or de-authorize the "Google Analytics Spam Control" app in your Google App Permissions at https://security.google.com/settings/security/permissions.

## Installation

The command-line package is [github.com/andreaskoch/ga-spam-control/cli](cli/main.go). You can clone the repository or install it with `go get github.com/andreaskoch/ga-spam-control` and then run the make script:
The command-line package is [github.com/andreaskoch/ga-spam-control/cli](cli/main.go). You can clone the repository or install it with `go get github.com/andreaskoch/ga-spam-control` and then run the [make.go](make.go) script:

```bash
go run make.go -test
go run make.go -install
go run make.go -crosscompile
```

or
Or with **make**:

```
make test
@@ -123,10 +157,10 @@ See [LICENSE](LICENSE) for the full license text.

There are multiple curated lists of referrer spam domains out there that you can use to create filters for your analytics accounts.

- [Analytics ghost spam list](https://github.com/ddofborg/analytics-ghost-spam-list)
- [Analytics Ghost Spam List](https://github.com/ddofborg/analytics-ghost-spam-list)
- [Stevie Ray: apache-nginx-referral-spam-blacklist](https://github.com/Stevie-Ray/apache-nginx-referral-spam-blacklist)
- [Piwik Referrer spam blacklist](https://github.com/piwik/referrer-spam-blacklist)
- [Referrer Spam Blocker Blacklist](https://referrerspamblocker.com/blacklist)
- [Stevie Ray: apache-nginx-referral-spam-blacklist](https://github.com/Stevie-Ray/apache-nginx-referral-spam-blacklist)
- [My own list of referral spam domains](spam-domains/referrer-spam-domains.txt)

### Other Spam Blocker Tools
5 changes: 4 additions & 1 deletion api/account.go
@@ -11,7 +11,7 @@ import (
// toModelAccounts converts []apiservice.Account to []Account.
func toModelAccounts(sources []apiservice.Account) []Account {

accounts := make([]Account, 0)
var accounts []Account
for _, source := range sources {
accounts = append(accounts, toModelAccount(source))
}
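
The change from `make([]Account, 0)` to `var accounts []Account` swaps an empty slice for a nil slice. `append` works on both, but the two are not identical: a nil slice compares equal to `nil` and is encoded as `null` by `encoding/json`, while an empty slice is encoded as `[]`. The sketch below demonstrates this with a plain `[]string`; the helper `encode` is made up for the example, and whether the difference matters for this project depends on how the result is consumed.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// encode returns the JSON representation of a string slice.
func encode(values []string) string {
	data, err := json.Marshal(values)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(data)
}

func main() {
	var nilSlice []string           // nil until the first append
	emptySlice := make([]string, 0) // non-nil, length 0

	// Both have length 0, but only one of them is nil.
	fmt.Println(nilSlice == nil, emptySlice == nil)
	fmt.Println(len(nilSlice), len(emptySlice))

	// The JSON encoding differs between the two.
	fmt.Println(encode(nilSlice), encode(emptySlice))

	// append allocates as needed, so it works on a nil slice too.
	nilSlice = append(nilSlice, "a")
	fmt.Println(nilSlice)
}
```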
@@ -30,6 +30,7 @@ func toModelAccount(source apiservice.Account) Account {
}
}

// An Account contains all parameters of an analytics account.
type Account struct {
ID string
Kind string
@@ -53,8 +54,10 @@ func accountsByID(account1, account2 Account) bool {
return fmt.Sprintf("%012d", int(account1ID)) < fmt.Sprintf("%012d", int(account2ID))
}

// SortAccountsBy is a comparison function that reports whether account1 sorts before account2.
type SortAccountsBy func(account1, account2 Account) bool

// Sort sorts a slice of Account models.
func (by SortAccountsBy) Sort(accounts []Account) {
sorter := &accountSorter{
accounts: accounts,
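
The hunk is cut off here. For readers unfamiliar with the pattern, the following self-contained sketch shows how a "sort by" function type is commonly wired up with `sort.Interface` (the approach shown in the Go `sort` package documentation). It is not the project's code: the reduced `Account` struct, the `by` field name, and the comparison function `byNumericID` are assumptions for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// Account is a reduced stand-in for the Account model in the diff.
type Account struct {
	ID   string
	Name string
}

// SortAccountsBy reports whether account1 sorts before account2.
type SortAccountsBy func(account1, account2 Account) bool

// Sort orders the given accounts in place using the comparison function.
func (by SortAccountsBy) Sort(accounts []Account) {
	sort.Sort(&accountSorter{accounts: accounts, by: by})
}

// accountSorter joins a slice and a comparison function so that
// together they satisfy sort.Interface.
type accountSorter struct {
	accounts []Account
	by       SortAccountsBy
}

func (s *accountSorter) Len() int { return len(s.accounts) }

func (s *accountSorter) Swap(i, j int) {
	s.accounts[i], s.accounts[j] = s.accounts[j], s.accounts[i]
}

func (s *accountSorter) Less(i, j int) bool {
	return s.by(s.accounts[i], s.accounts[j])
}

// byNumericID compares account IDs as numbers.
// IDs that are not numeric are treated as 0 in this sketch.
func byNumericID(a, b Account) bool {
	idA, _ := strconv.Atoi(a.ID)
	idB, _ := strconv.Atoi(b.ID)
	return idA < idB
}

func main() {
	accounts := []Account{{ID: "200", Name: "B"}, {ID: "31", Name: "A"}}
	SortAccountsBy(byNumericID).Sort(accounts)
	fmt.Println(accounts)
}
```

Comparing the parsed integers directly avoids the zero-padded string formatting used in `accountsByID`, at the cost of having to decide what happens with non-numeric IDs.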
2 changes: 2 additions & 0 deletions api/analytics.go
@@ -7,8 +7,10 @@ import (
"github.com/andreaskoch/ga-spam-control/api/apiservice"
)

// AnalyticsData is a set of analytics data entries.
type AnalyticsData []AnalyticsDataRow

// AnalyticsDataRow contains a single analytics data record.
type AnalyticsDataRow struct {
// Dimensions
UserType string
7 changes: 4 additions & 3 deletions api/api.go
@@ -1,3 +1,5 @@
// Package api implements an analytics API interface for interacting
// with the Google Analytics Account and Filter APIs.
package api

import (
@@ -14,7 +16,7 @@ type AnalyticsAPI interface {
GetAccounts() ([]Account, error)

// GetAnalyticsData returns analytics data for the given account ID.
GetAnalyticsData(accountID string) (AnalyticsData, error)
GetAnalyticsData(accountID string, numberOfDays int) (AnalyticsData, error)

// CreateFilter creates a new Filter for the given account ID.
CreateFilter(accountID string, filter Filter) (Filter, error)
@@ -64,7 +66,7 @@ func (api *API) GetAccounts() ([]Account, error) {
}

// GetAnalyticsData returns analytics data for the given account ID.
func (api *API) GetAnalyticsData(accountID string) (AnalyticsData, error) {
func (api *API) GetAnalyticsData(accountID string, numberOfDays int) (AnalyticsData, error) {

// get the profiles to which the filter will be assigned
profiles, profilesError := api.GetProfiles(accountID)
@@ -77,7 +79,6 @@ func (api *API) GetAnalyticsData(accountID string) (AnalyticsData, error) {
}

profile := profiles[0]
numberOfDays := 30
serviceData, analyticsDataErr := api.service.GetAnalyticsData(profile.ID, numberOfDays)
if analyticsDataErr != nil {
return AnalyticsData{}, analyticsDataErr
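
This hunk replaces the hard-coded `numberOfDays := 30` with a parameter. How the service layer turns that number into a query is not visible in the diff. As a hedged sketch, one plausible approach is to derive a start and end date in `YYYY-MM-DD` form, a format the Google Analytics Core Reporting API accepts for its date parameters. The function `dateRange` and the fixed reference time `exampleNow` are hypothetical names introduced for this example.

```go
package main

import (
	"fmt"
	"time"
)

// exampleNow is a fixed reference time so the example is reproducible.
var exampleNow = time.Date(2016, time.May, 23, 12, 0, 0, 0, time.UTC)

// dateRange returns the start and end date (YYYY-MM-DD) of a window
// that ends at now and reaches numberOfDays into the past.
// Negative values are treated as 0.
func dateRange(now time.Time, numberOfDays int) (start, end string) {
	const layout = "2006-01-02"
	if numberOfDays < 0 {
		numberOfDays = 0
	}
	return now.AddDate(0, 0, -numberOfDays).Format(layout), now.Format(layout)
}

func main() {
	start, end := dateRange(exampleNow, 30)
	fmt.Println(start, end)
}
```

Passing the current time in as an argument, rather than calling `time.Now()` inside the function, keeps the calculation easy to test.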
2 changes: 2 additions & 0 deletions api/apicredentials/token.go
@@ -1,3 +1,5 @@
// Package apicredentials contains the TokenStorer interface that
// is used as the oAuth token provider for the Google Analytics API.
package apicredentials

import "golang.org/x/oauth2"
4 changes: 4 additions & 0 deletions api/apiservice/account.go
@@ -25,15 +25,19 @@ func (accountResultsSerializer) Deserialize(reader io.Reader) (*AccountResults,
return accountResults, err
}

// AccountResults is the response model for Google Analytics Account API requests.
type AccountResults struct {
Results
Items []Account `json:"items"`
}

// AccountPermissions contains the effective permissions for an account.
type AccountPermissions struct {
Effective []string `json:"effective"`
}

// The Account model contains account details
// such as the account ID, name and type.
type Account struct {
Item
Name string `json:"name"`
5 changes: 5 additions & 0 deletions api/apiservice/analytics.go
@@ -43,26 +43,31 @@ func (analyticsDataSerializer) Deserialize(reader io.Reader) (*AnalyticsData, er
return analyticsData, err
}

// AnalyticsDataResults is the response model for Google Analytics data API requests.
type AnalyticsDataResults struct {
Results
Data AnalyticsData `json:"dataTable"`
}

// AnalyticsData represents analytics reports data in columns and rows.
type AnalyticsData struct {
Cols []TableColumn `json:"cols"`
Rows []TableRow `json:"rows"`
}

// TableColumn defines analytics data table columns.
type TableColumn struct {
ID string `json:"id"`
Label string `json:"label"`
Type string `json:"type"`
}

// TableRow defines analytics data table rows.
type TableRow struct {
Cell []TableCell `json:"c"`
}

// TableCell defines analytics data table cell/value.
type TableCell struct {
Value string `json:"v"`
}
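
To make the shape of these types concrete, the sketch below decodes a small JSON document into them with `encoding/json`. The struct definitions mirror the ones in the diff, except that the embedded `Results` type is left out because its definition is not shown. The sample JSON and the helper `decodeDataTable` are invented for this example; the JSON is shaped only after the struct tags, not after a captured API response.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// AnalyticsDataResults wraps the data table of a response.
// The embedded Results type from the diff is omitted here.
type AnalyticsDataResults struct {
	Data AnalyticsData `json:"dataTable"`
}

// AnalyticsData represents analytics reports data in columns and rows.
type AnalyticsData struct {
	Cols []TableColumn `json:"cols"`
	Rows []TableRow    `json:"rows"`
}

// TableColumn defines an analytics data table column.
type TableColumn struct {
	ID    string `json:"id"`
	Label string `json:"label"`
	Type  string `json:"type"`
}

// TableRow defines an analytics data table row.
type TableRow struct {
	Cell []TableCell `json:"c"`
}

// TableCell defines a single analytics data table value.
type TableCell struct {
	Value string `json:"v"`
}

// sampleResponse is made-up data shaped after the struct tags above.
const sampleResponse = `{
  "dataTable": {
    "cols": [
      {"id": "ga:fullReferrer", "label": "ga:fullReferrer", "type": "string"},
      {"id": "ga:sessions", "label": "ga:sessions", "type": "number"}
    ],
    "rows": [
      {"c": [{"v": "spam.example/"}, {"v": "12"}]}
    ]
  }
}`

// decodeDataTable parses a JSON document and returns its data table.
func decodeDataTable(input string) (AnalyticsData, error) {
	var results AnalyticsDataResults
	if err := json.NewDecoder(strings.NewReader(input)).Decode(&results); err != nil {
		return AnalyticsData{}, err
	}
	return results.Data, nil
}

func main() {
	data, err := decodeDataTable(sampleResponse)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	for _, row := range data.Rows {
		for i, cell := range row.Cell {
			fmt.Printf("%s=%s\n", data.Cols[i].Label, cell.Value)
		}
	}
}
```

Because `TableCell.Value` is a `string`, numeric values have to arrive as JSON strings (as in the sample) for decoding to succeed.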
39 changes: 2 additions & 37 deletions api/apiservice/client.go
@@ -12,9 +12,9 @@ import (
)

// getAnalyticsClientConfig returns the oAuth client configuration for the Google Analytics API.
func getAnalyticsClientConfig(clientId, clientSecret, redirectURL string) *oauth2.Config {
func getAnalyticsClientConfig(clientID, clientSecret, redirectURL string) *oauth2.Config {
return &oauth2.Config{
ClientID: clientId,
ClientID: clientID,
ClientSecret: clientSecret,
RedirectURL: redirectURL,
Scopes: []string{
@@ -105,38 +105,3 @@ func getAnalyticsClient(store apicredentials.TokenStorer, oAuthClientConfig *oau
client := oAuthClientConfig.Client(oauth2.NoContext, exchangeToken)
return client, nil
}

// getAccounts returns all accessible accounts.
func getAccounts(apiClient *http.Client) error {

uri := fmt.Sprintf("https://%s/analytics/v3/management/accounts", GoogleAnalyticsHostname)
response, apiError := apiClient.Get(uri)
if apiError != nil {
return apiError
}

serializer := &accountResultsSerializer{}
results, deserializeError := serializer.Deserialize(response.Body)
if deserializeError != nil {
return deserializeError
}

for _, account := range results.Items {
log.Println("Account ID: ", account.ID)
}

return nil
}

// getFilters returns all filters for the account with the given account ID.
func getFilters(apiClient *http.Client, accountId string) error {

uri := fmt.Sprintf("https://%s/analytics/v3/management/accounts/%s/filters", GoogleAnalyticsHostname, accountId)
response, err := apiClient.Get(uri)
if err != nil {
return err
}

fmt.Println(response)
return nil
}
1 change: 1 addition & 0 deletions api/apiservice/error.go
@@ -16,6 +16,7 @@ func decodeResponse(response io.Reader) (ErrorResponse, error) {
return errorResponse, nil
}

// ErrorResponse contains the error details of a Google Analytics API response.
type ErrorResponse struct {
Error struct {
Errors []struct {