Skip to content

PatrickChodowski/stats

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

46 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Stats

Small personal library for sending quick queries to GBQ and getting back aggregates for pandas plots

how to install:

pip:

pip3 install git+https://github.com/PatrickChodowski/stats.git

poetry:

poetry add "git+https://github.com/PatrickChodowski/stats.git#main"

Usage:

Setup

from stats import GBQData

g = GBQData(gbq_path='project.dataset.table',
            sa_path='credentials/sa.json')

Aggregated bar chart

g.set(dimensions=['team_abbreviation', 'player_name'],
      metrics=['pts'],
      aggregations=['sum'],
      sort=('sum_pts', 'desc'),
      filters=[('team_abbreviation', 'eq', 'GUA')],
      limit=30)
g.get()
g.plot("barh")

Histogram

g.set(dimensions=None,
      metrics=['pts'],
      aggregations=['none'],
      filters=[('team_abbreviation', 'eq', 'GUA')])
g.get()
g.plot('hist')

Modules:

  • gbq_data.py - Main interface to work with GBQ connection and sending queries
  • query_builder.py - Which takes the input, validates it and builds query
  • plots.py - Which takes the data from .set() method of validated query and produces optional plot

The goal of this setup is to split query building and validation from plotting, and have already correct data before actually visualizing the report, instead of learning it already when plot is created.

Aggregation options:

  • avg
  • sum
  • min
  • max
  • count
  • count_distinct
  • count_nulls
  • any
  • string_agg
  • array_agg
  • string_agg_distinct
  • array_agg_distinct
  • median
  • q1
  • q3
  • percentiles
  • stdev
  • var
  • none (no aggregation)

Plot options:

  • bar chart
  • horizontal bar chart
  • box plot
  • histogram
  • scatter plot

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages