Manhattan plot updated for fst values
reneshbedre committed Mar 7, 2021
1 parent 9f2ef2e commit e7cf807
Showing 5 changed files with 34 additions and 19 deletions.
19 changes: 10 additions & 9 deletions README.md
@@ -439,20 +439,21 @@ TPM normalized Pandas dataframe as class attributes (tpm_norm)

## Variant analysis

### Manhatten plot
### Manhattan plot

`latest update v0.9.2`
`latest update v1.0.9`

`bioinfokit.visuz.marker.mhat(df, chr, pv, color, dim, r, ar, gwas_sign_line, gwasp, dotsize, markeridcol, markernames,
gfont, valpha, show, figtype, axxlabel, axylabel, axlabelfontsize, ylm, gstyle, figname)`
`bioinfokit.visuz.marker.mhat(df, chr, pv, log_scale, color, dim, r, ar, gwas_sign_line, gwasp, dotsize, markeridcol,
markernames, gfont, valpha, show, figtype, axxlabel, axylabel, axlabelfontsize, ylm, gstyle, figname)`

Parameters | Description
------------ | -------------
`df` | Pandas dataframe object with at least SNP, chromosome, and P-value columns
`chr` | Name of a column having chromosome numbers [string][default:None]
`pv` | Name of a column having P-values. Must be numeric column [string][default:None]
`log_scale` | Transform the values in the `pv` column to the minus log10 scale. If set to `False`, the original values in `pv` are plotted as-is, which is useful for Fst values [Boolean (True or False)][default:True]
`color` | List of color names to plot. It accepts either two alternating colors or as many colors as there are chromosomes. If None, a color is randomly assigned to each chromosome [list][default:None]
`gwas_sign_line` |Plot statistical significant threshold line defined by option `gwasp` [bool (True or False)][default: False]
`gwas_sign_line` | Plot the statistical significance threshold line defined by option `gwasp` [Boolean (True or False)][default: False]
`gwasp` | Statistical significance threshold used to identify significant SNPs [float][default: 5E-08]
`dotsize`| The size of the dots in the plot [float][default: 8]
`markeridcol` | Name of the column containing SNP IDs. Required for plotting SNP names on the plot [string][default: None]
@@ -463,19 +464,19 @@ Parameters | Description
`r` | Figure resolution in dpi [int][default: 300]
`ar` | Rotation of X-axis labels [float][default: 90]
`figtype` | Format of the figure to save. Supported formats are eps, pdf, pgf, png, ps, raw, rgba, svg, svgz [string][default:'png']
`show` | Show the figure on console instead of saving in current folder [True or False][default:False]
`show` | Show the figure in the console instead of saving it in the current folder [Boolean (True or False)][default:False]
`axxlabel` | Label for X-axis. If provided, it replaces the default label [string][default: None]
`axylabel` | Label for Y-axis. If provided, it replaces the default label [string][default: None]
`axlabelfontsize` | Font size for axis labels [float][default: 9]
`ylm` | Range of ticks to plot on Y-axis [float [Tuple](https://www.reneshbedre.com/blog/python-tuples.html) (bottom, top, interval)][default: None]
`gstyle` | Style of the text for markernames. 1 for default text and 2 for box text [int][default: 1]
`figname` | name of figure [string][default:"manhatten"]
`figname` | Name of the figure [string][default:"manhattan"]

Returns:

Manhatten plot image in same directory (manhatten.png)
Manhattan plot image in the current directory (manhattan.png)

<a href="https://reneshbedre.github.io/blog/manhat.html" target="_blank">Working example</a>
<a href="https://www.reneshbedre.com/blog/manhattan-plot.html" target="_blank">Working example</a>
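
With `log_scale=True` the Y-axis shows minus log10 P-values, so a P-value threshold such as the default `gwasp` of 5E-08 corresponds to a height of roughly 7.3 on the plot. The minimal sketch below only does that arithmetic in plain Python. It assumes the significance line is drawn at the minus log10 of `gwasp`, and the helper name `threshold_height` is hypothetical, not part of bioinfokit.

```python
import math

def threshold_height(gwasp: float = 5e-08) -> float:
    """Hypothetical helper: height of a P-value threshold on a -log10(P) axis."""
    if gwasp <= 0 or gwasp > 1:
        raise ValueError("gwasp must be a P-value in (0, 1]")
    # Smaller thresholds give taller lines on the -log10 scale
    return -math.log10(gwasp)

# The default threshold sits at about 7.30 on the -log10(P) scale
print(round(threshold_height(), 2))  # 7.3
```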

### Variant annotation

8 changes: 7 additions & 1 deletion VERSIONLOG.md
@@ -1,4 +1,10 @@
v1.0.8 has the following updates and changes (February 14, 2020)
v1.0.9 has the following updates and changes (March 07, 2021)
- `bioinfokit.visuz.marker.mhat` function updated to handle Fst values in the Manhattan plot
- Boolean `log_scale` parameter added to choose whether <i>p</i> values are converted to the minus log10 scale
- New marker dataset with Fst values added. This dataset was provided by Vincent Appiah and downloaded from
the Pf3K Project (pilot data release 5). It can be accessed using `analys.get_data('fst').data`.

v1.0.8 has the following updates and changes (February 14, 2021)
- Function for regression metrics added (`bioinfokit.analys.stat.reg_metric`)
- It calculates Root Mean Square Error (RMSE), Mean squared error (MSE), Mean absolute error (MAE),
and Mean absolute percent error (MAPE) from regression fit
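
For reference, the four metrics named in the v1.0.8 entry have standard definitions: MSE is the mean of squared residuals, RMSE its square root, MAE the mean of absolute residuals, and MAPE the mean of absolute residuals relative to the observed values, expressed as a percentage. The plain-Python sketch below follows those common definitions; it is not taken from `bioinfokit.analys.stat.reg_metric`, whose implementation and output format may differ, and the function name `regression_metrics` is hypothetical.

```python
import math

def regression_metrics(y_true, y_pred):
    """Hypothetical helper: RMSE, MSE, MAE and MAPE using common definitions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]  # residuals
    mse = sum(e ** 2 for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # MAPE is undefined when an observed value is zero (ZeroDivisionError here)
    mape = 100 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    return {"RMSE": math.sqrt(mse), "MSE": mse, "MAE": mae, "MAPE": mape}

print(regression_metrics([2.0, 4.0], [1.0, 5.0]))
# {'RMSE': 1.0, 'MSE': 1.0, 'MAE': 1.0, 'MAPE': 37.5}
```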
2 changes: 1 addition & 1 deletion bioinfokit/__init__.py
@@ -1,5 +1,5 @@
name = "bioinfokit"
__version__ = "1.0.8"
__version__ = "1.0.9"
__author__ = "Renesh Bedre"


8 changes: 6 additions & 2 deletions bioinfokit/analys.py
@@ -2675,9 +2675,13 @@ def __init__(self, data=None):
elif data=='wdbc_test':
self.data = pd.read_csv("https://reneshbedre.github.io/assets/posts/logit/wdbc_test.csv")
elif data=='plant_richness':
self.data = pd.read_csv('https://reneshbedre.github.io/assets/posts/reg/plant_richness_data_mlr.txt', sep='\t')
self.data = pd.read_csv('https://reneshbedre.github.io/assets/posts/reg/plant_richness_data_mlr.txt',
sep='\t')
elif data=='plant_richness_lr':
self.data = pd.read_csv('https://reneshbedre.github.io/assets/posts/reg/plant_richness_data_lr.txt', sep='\t')
self.data = pd.read_csv('https://reneshbedre.github.io/assets/posts/reg/plant_richness_data_lr.txt',
sep='\t')
elif data=='fst':
self.data = pd.read_csv('https://reneshbedre.github.io/assets/posts/mhat/fst.csv')
else:
print("Error: Provide correct parameter for data\n")
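
The `elif` chain in `get_data` maps a dataset name to a URL, plus any `read_csv` options such as `sep='\t'`. As an illustrative alternative, not the bioinfokit code, the same lookup can be held in a dictionary, so that adding a dataset like `'fst'` is a single entry and an unknown name raises an error instead of printing one. `DATASETS` and `dataset_source` are hypothetical names, only the entries visible in this diff are included, and the sketch stops short of downloading anything.

```python
# Hypothetical registry: dataset name -> (URL, extra keyword arguments for pandas.read_csv)
DATASETS = {
    "wdbc_test": ("https://reneshbedre.github.io/assets/posts/logit/wdbc_test.csv", {}),
    "plant_richness": ("https://reneshbedre.github.io/assets/posts/reg/plant_richness_data_mlr.txt", {"sep": "\t"}),
    "plant_richness_lr": ("https://reneshbedre.github.io/assets/posts/reg/plant_richness_data_lr.txt", {"sep": "\t"}),
    "fst": ("https://reneshbedre.github.io/assets/posts/mhat/fst.csv", {}),
}

def dataset_source(name):
    """Return (url, read_csv keyword arguments) for a known dataset name."""
    try:
        url, kwargs = DATASETS[name]
    except KeyError:
        raise ValueError(f"unknown dataset {name!r}; choose from {sorted(DATASETS)}") from None
    return url, dict(kwargs)  # copy so callers cannot modify the registry

url, kwargs = dataset_source("fst")
print(url)  # https://reneshbedre.github.io/assets/posts/mhat/fst.csv
# With pandas installed, the table could then be loaded with pd.read_csv(url, **kwargs)
```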

16 changes: 10 additions & 6 deletions bioinfokit/visuz.py
@@ -424,20 +424,23 @@ def geneplot_mhat(df, markeridcol, chr, pv, gwasp, markernames, gfont, gstyle, a
else:
raise Exception("provide 'markeridcol' parameter")

def mhat(df="dataframe", chr=None, pv=None, color=None, dim=(6,4), r=300, ar=90, gwas_sign_line=False,
def mhat(df="dataframe", chr=None, pv=None, log_scale=True, color=None, dim=(6,4), r=300, ar=90, gwas_sign_line=False,
gwasp=5E-08, dotsize=8, markeridcol=None, markernames=None, gfont=8, valpha=1, show=False, figtype='png',
axxlabel=None, axylabel=None, axlabelfontsize=9, axlabelfontname="Arial", axtickfontsize=9,
axtickfontname="Arial", ylm=None, gstyle=1, figname='manhatten'):
axtickfontname="Arial", ylm=None, gstyle=1, figname='manhattan'):

_x, _y = 'Chromosomes', r'$ -log_{10}(P)$'
rand_colors = ('#a7414a', '#282726', '#6a8a82', '#a37c27', '#563838', '#0584f2', '#f28a30', '#f05837',
'#6465a5', '#00743f', '#be9063', '#de8cf0', '#888c46', '#c0334d', '#270101', '#8d2f23',
'#ee6c81', '#65734b', '#14325c', '#704307', '#b5b3be', '#f67280', '#ffd082', '#ffd800',
'#ad62aa', '#21bf73', '#a0855b', '#5edfff', '#08ffc8', '#ca3e47', '#c9753d', '#6c5ce7',
'#a997df', '#513b56', '#590925', '#007fff', '#bf1363', '#f39237', '#0a3200', '#8c271e')

# minus log10 of P-value
df['tpval'] = -np.log10(df[pv])
if log_scale:
# minus log10 of P-value
df['tpval'] = -np.log10(df[pv])
else:
# for Fst values
df['tpval'] = df[pv]
# df = df.sort_values(chr)
# if the column contains numeric strings
df = df.loc[pd.to_numeric(df[chr], errors='coerce').sort_values().index]
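
The hunk above makes the minus log10 transform optional: with `log_scale=True` each P-value p is plotted as -log10(p), and with `log_scale=False` the values (for example Fst) are plotted unchanged. A plain-Python sketch of that branch follows; `plot_values` is a hypothetical name, and the sketch works on a list rather than the `df['tpval']` column. Note that in the code shown, the default Y-axis label `_y` remains `-log10(P)` even when `log_scale=False`, so passing `axylabel` (for example `axylabel='Fst'`) would keep the label consistent with the plotted values.

```python
import math

def plot_values(values, log_scale=True):
    """Hypothetical sketch of the tpval column: -log10 for P-values, unchanged otherwise."""
    if log_scale:
        # P-values must be in (0, 1]; -log10 maps small P-values to tall points
        return [-math.log10(v) for v in values]
    # e.g. Fst values, which are plotted as-is
    return list(values)

print([round(y, 6) for y in plot_values([0.1, 0.001])])  # [1.0, 3.0]
print(plot_values([0.12, 0.85], log_scale=False))         # [0.12, 0.85]
```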
@@ -481,7 +484,8 @@ def mhat(df="dataframe", chr=None, pv=None, color=None, dim=(6,4), r=300, ar=90,
ax.margins(x=0)
ax.margins(y=0)
ax.set_xticks(xticks)
ax.set_ylim([0, max(df['tpval'] + 1)])
if log_scale:
ax.set_ylim([0, max(df['tpval'] + 1)])
if ylm:
ylm = np.arange(ylm[0], ylm[1], ylm[2])
else:
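
In the second hunk of `visuz.py`, the explicit upper Y limit (largest plotted value plus 1) is now applied only when `log_scale=True`; presumably this is because untransformed values such as Fst occupy a much smaller range, where a fixed offset of 1 would leave most of the axis empty. The existing `ylm` option builds tick positions with `np.arange(bottom, top, interval)`, which excludes `top` itself. The sketch below restates both pieces in plain Python with hypothetical helper names; it does not call Matplotlib, and `None` stands in for leaving the limit to autoscaling.

```python
import math

def y_upper_limit(tpval, log_scale=True):
    """Hypothetical: explicit upper Y limit, or None to leave it to autoscaling."""
    # Mirrors ax.set_ylim([0, max(df['tpval'] + 1)]), which now runs only when log_scale is True
    return max(tpval) + 1 if log_scale else None

def ylm_ticks(ylm):
    """Hypothetical stand-in for np.arange(bottom, top, interval); `top` is excluded."""
    bottom, top, interval = ylm
    if interval <= 0:
        raise ValueError("interval must be positive")
    count = max(0, math.ceil((top - bottom) / interval))
    return [bottom + i * interval for i in range(count)]

print(y_upper_limit([2.5, 7.5]))                   # 8.5
print(y_upper_limit([0.1, 0.4], log_scale=False))  # None
print(ylm_ticks((0, 10, 2)))                       # [0, 2, 4, 6, 8]
```

If the top value should appear as a tick, `top` in the `ylm` tuple has to be set one interval higher.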
