update references, docs, defaults
david-cortes committed Feb 25, 2022
1 parent a83363c commit 0ba7f28
Showing 26 changed files with 457 additions and 635 deletions.
13 changes: 7 additions & 6 deletions DESCRIPTION
@@ -1,7 +1,7 @@
Package: poismf
Type: Package
Title: Factorization of Sparse Counts Matrices Through Poisson Likelihood
Version: 0.3.1-6
Version: 0.4.0
Authors@R: c(
person(given="David", family="Cortes", role=c("aut", "cre", "cph"),
email="[email protected]"),
@@ -13,12 +13,13 @@ Authors@R: c(
Maintainer: David Cortes <[email protected]>
URL: https://github.com/david-cortes/poismf
BugReports: https://github.com/david-cortes/poismf/issues
Description: Creates a low-rank factorization of a sparse counts matrix by maximizing Poisson likelihood with l1/l2 regularization
with all non-negative latent factors (e.g. for recommender systems or topic modeling) (Cortes, (2018) <arXiv:1811.01908>).
Similar to hierarchical Poisson factorization, but follows an optimization-based approach with regularization instead of a
hierarchical structure, and is fit through gradient-based methods instead of variational inference.
Description: Creates a non-negative low-rank approximate factorization of a sparse counts matrix by maximizing Poisson
likelihood with L1/L2 regularization (e.g. for implicit-feedback recommender systems or bag-of-words-based topic modeling)
(Cortes, (2018) <arXiv:1811.01908>), which usually leads to very sparse user and item factors (over 90% zero-valued).
Similar to hierarchical Poisson factorization (HPF), but follows an optimization-based approach with regularization
instead of a hierarchical prior, and is fit through gradient-based methods instead of variational inference.
License: BSD_2_clause + file LICENSE
Imports: Matrix (>= 1.3), methods
RoxygenNote: 7.1.1
RoxygenNote: 7.1.2
NeedsCompilation: yes
Encoding: UTF-8
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,6 +1,6 @@
BSD 2-Clause License

Copyright (c) 2018-2021, David Cortes
Copyright (c) 2018-2022, David Cortes
All rights reserved.

Redistribution and use in source and binary forms, with or without
1 change: 0 additions & 1 deletion NAMESPACE
@@ -18,7 +18,6 @@ export(topN)
export(topN.new)
export(get.factor.matrices)
export(get.model.mappings)
export(poisson.llk)
export(predict.poismf)
export(print.poismf)
export(summary.poismf)
225 changes: 82 additions & 143 deletions R/poismf.R

Large diffs are not rendered by default.

37 changes: 20 additions & 17 deletions README.md
@@ -1,28 +1,31 @@
# Poisson Factorization

Fast and memory-efficient non-negative matrix factorization for sparse counts data, based on Poisson likelihood with regularization. The algorithm is described in ["Fast Non-Bayesian Poisson Factorization for Implicit-Feedback Recommendations"](http://arxiv.org/abs/1811.01908) [*To be updated after last additions*].
Fast and memory-efficient non-negative matrix factorization for sparse counts data which produces sparse factor matrices, based on Poisson likelihood with regularization. The method is described in ["Fast Non-Bayesian Poisson Factorization for Implicit-Feedback Recommendations"](http://arxiv.org/abs/1811.01908).

The model is similar to [Hierarchical Poisson Factorization](https://arxiv.org/abs/1311.1704), but uses regularization instead of a bayesian hierarchical structure, and is fit through gradient-based methods instead of coordinate ascent.
The model is similar to [Hierarchical Poisson Factorization](https://arxiv.org/abs/1311.1704), but uses regularization instead of a bayesian hierarchical structure, and is fit through gradient-based methods instead of coordinate ascent. It tries to approximate a sparse matrix of counts as a product of two lower-dimensional matrices in a way that maximizes Poisson likelihood - i.e.:
```
X ~ Poisson(A * t(B))
```

The implementation is in C with interfaces for Python and R.

# Update 2020-23-05

The conjugate gradient method in this package has been reworked, and it's no longer susceptible to failed optimizations. Using the conjugate gradient method coupled with large numbers of iterations is now very competitive in terms of both quality and speed against HPF (Bayesian version).

Additionally, the package has now added a truncated Newton - Conjugate Gradient solver option (it bundles a modified version of Jean-Sebastien Roy's C version, which was taken from SciPy). This is slower than the other methods, and slower than HPF, but tends to result in better quality solutions and relateively sparse latent factors (i.e. many of them are zero-valued), which is oftentimes desirable.
****************************
[(Example Python notebook here)](http://nbviewer.jupyter.org/github/david-cortes/poismf/blob/master/example/example_poismf_lastfm.ipynb)

# Model description

This library tries to fit a low-rank factorization model in which some sparse `X` matrix of counts data is assumed to be Poisson-distributed, with parameters given by the product of two non-negative and lower-dimensional matrices - that is:
```X ~ Poisson(A * t(B))```.
This library tries to fit a low-rank approximate factorization model in which some sparse `X` matrix of counts data is assumed to be Poisson-distributed, with parameters given by the product of two non-negative and lower-dimensional matrices - that is:
```
X ~ Poisson(A * t(B))
```

The model is fit through maximum likelihood estimation (adding a regularization term on the factor matrices) by alternating between updates to the `A` and `B` matrices, exploiting a shortcut for fast evaluation and optimization of Poisson likelihood when the `A` and `B` matrices are constrained to be non-negative and with no link function.

The model is fit through maximum likelihood estimation (adding a regularization term) by alternating between updates to the `A` and `B` matrices, exploiting a shortcut for fast evaluation and optimization of Poisson likelihood when the `A` and `B` matrices are constrained to be non-negative and with no link function:
![image](formula/pois_llk.png "poisson")
The intended primary use is for recommender systems, in which users are the rows of the `X` matrices, items the columns, and the non-zero values indicate interactions (e.g. clicks, views, plays, etc.) - the idea being that the items with the highest-predicted value for a given user are the best candidates to recommend.

The intended use is for recommender systems, in which users are the rows of the `X` matrices, items the columns, and the non-zero values indicate interactions (e.g. clicks, views, plays, etc.) - the idea being that the items with the highest-predicted value for a given user are the best candidates to recommend. Nevertheless, can also be used for other domains such as topic modeling or as a general dimensionality reduction model - just take any mention of users as rows or documents and any mention of items as columns or words.
Nevertheless, can also be used for other domains such as topic modeling or as a general dimensionality reduction model - just take any mention of users as rows or documents and any mention of items as columns or words.

Compared to other models, and depending on the optimization method used, this model has the advantage of producing sparse user and item factor matrices (i.e. most of the entries are exactly zero), which can be desirable in some situations.
Compared to other models, and depending on the optimization method used, this model has the advantage of producing very sparse user and item factor matrices (e.g. over 90% of the entries being exactly zero), which can be desirable in some situations.
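
The likelihood shortcut mentioned in the model description can be illustrated with a small sketch. It assumes the shortcut is the usual identity for non-negative factors with no link function: the sum of all predicted values equals the dot product of the column sums of `A` and `B`, so only the non-zero entries of `X` need to be visited. The function names and the triplet input format below are hypothetical and are not part of the poismf API; the constant `log(x!)` term and the regularization term are left out.

```python
import math


def poisson_llk_sparse(triplets, A, B):
    """Poisson log-likelihood (without the constant log(x!) term) for
    X ~ Poisson(A * t(B)), with X given as (row, col, count) triplets.
    Hypothetical helper for illustration, not a poismf function."""
    k = len(A[0])
    llk = 0.0
    # First term: only the non-zero entries of X contribute x * log(pred).
    # Assumes pred > 0 wherever a non-zero count was observed.
    for i, j, x in triplets:
        pred = sum(A[i][f] * B[j][f] for f in range(k))
        llk += x * math.log(pred)
    # Second term: sum of all predictions, computed without forming A * t(B):
    # sum_ij (a_i . b_j) = (sum_i a_i) . (sum_j b_j)
    sum_A = [sum(row[f] for row in A) for f in range(k)]
    sum_B = [sum(row[f] for row in B) for f in range(k)]
    llk -= sum(sa * sb for sa, sb in zip(sum_A, sum_B))
    return llk


def poisson_llk_dense(triplets, A, B):
    """Brute-force version that loops over every (row, col) pair,
    used only to check the shortcut above."""
    counts = {(i, j): x for i, j, x in triplets}
    k = len(A[0])
    llk = 0.0
    for i in range(len(A)):
        for j in range(len(B)):
            pred = sum(A[i][f] * B[j][f] for f in range(k))
            x = counts.get((i, j), 0)
            llk += (x * math.log(pred) if x else 0.0) - pred
    return llk


# Toy data: 2 users, 3 items, 2 latent factors, 2 observed interactions.
A = [[0.5, 1.0], [2.0, 0.1]]
B = [[1.0, 0.3], [0.2, 0.7], [0.4, 0.4]]
X = [(0, 0, 2), (1, 2, 1)]
print(poisson_llk_sparse(X, A, B), poisson_llk_dense(X, A, B))
```

The sparse version touches each non-zero entry once plus one pass over the rows of `A` and `B`, whereas the dense check loops over every user-item pair.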

# Installation

@@ -45,13 +48,13 @@ And then reinstall this package: `pip install --force-reinstall poismf`.
**IMPORTANT:** the setup script will try to add compilation flag `-march=native`. This instructs the compiler to tune the package for the CPU in which it is being installed, but the result might not be usable in other computers. If building a binary wheel of this package or putting it into a docker image which will be used in different machines, this can be overriden by manually supplying compilation `CFLAGS` as an environment variable with something related to architecture. For maximum compatibility (but slowest speed), assuming `x86-64` computers, it's possible to do something like this:

```
export CFLAGS="-msse2"
export CFLAGS="-march=x86-64"
pip install poismf
```

or for creating wheels:
```
export CFLAGS="-msse2"
export CFLAGS="-march=x86-64"
python setup.py bwheel
```
** *
@@ -215,6 +218,6 @@ int run_poismf(
* Cortes, David. "Fast Non-Bayesian Poisson Factorization for Implicit-Feedback Recommendations." arXiv preprint arXiv:1811.01908 (2018).
* Li, Can. "A conjugate gradient type method for the nonnegative constraints optimization problems." Journal of Applied Mathematics 2013 (2013).
* Nash, Stephen G. "Newton-type minimization via the Lanczos method." SIAM Journal on Numerical Analysis 21.4 (1984): 770-788.
* Carlsson, Christer, et al. "User’s guide for TN/TNBC: Fortran routines for nonlinear optimization." Mathematical Sciences Dept. Tech. Rep. 307, The Johns Hopkins University. 1984.
* Li, Can. "A conjugate gradient type method for the nonnegative constraints optimization problems." Journal of Applied Mathematics 2013 (2013).
18 changes: 9 additions & 9 deletions configure
@@ -1,6 +1,6 @@
#! /bin/sh
# Guess values for system-dependent variables and create Makefiles.
# Generated by GNU Autoconf 2.71 for poismf 0.3.1.
# Generated by GNU Autoconf 2.71 for poismf 0.4.0.
#
#
# Copyright (C) 1992-1996, 1998-2017, 2020-2021 Free Software Foundation,
@@ -607,8 +607,8 @@ MAKEFLAGS=
# Identity of this package.
PACKAGE_NAME='poismf'
PACKAGE_TARNAME='poismf'
PACKAGE_VERSION='0.3.1'
PACKAGE_STRING='poismf 0.3.1'
PACKAGE_VERSION='0.4.0'
PACKAGE_STRING='poismf 0.4.0'
PACKAGE_BUGREPORT=''
PACKAGE_URL=''

@@ -1224,7 +1224,7 @@ if test "$ac_init_help" = "long"; then
# Omit some internal or obsolete options to make the list less imposing.
# This message is too long to be a string in the A/UX 3.1 sh.
cat <<_ACEOF
\`configure' configures poismf 0.3.1 to adapt to many kinds of systems.
\`configure' configures poismf 0.4.0 to adapt to many kinds of systems.
Usage: $0 [OPTION]... [VAR=VALUE]...
@@ -1286,7 +1286,7 @@ fi

if test -n "$ac_init_help"; then
case $ac_init_help in
short | recursive ) echo "Configuration of poismf 0.3.1:";;
short | recursive ) echo "Configuration of poismf 0.4.0:";;
esac
cat <<\_ACEOF
@@ -1366,7 +1366,7 @@ fi
test -n "$ac_init_help" && exit $ac_status
if $ac_init_version; then
cat <<\_ACEOF
poismf configure 0.3.1
poismf configure 0.4.0
generated by GNU Autoconf 2.71
Copyright (C) 2021 Free Software Foundation, Inc.
@@ -1489,7 +1489,7 @@ cat >config.log <<_ACEOF
This file contains any messages produced by compilers while
running configure, to aid debugging if configure makes a mistake.
It was created by poismf $as_me 0.3.1, which was
It was created by poismf $as_me 0.4.0, which was
generated by GNU Autoconf 2.71. Invocation command line was
$ $0$ac_configure_args_raw
@@ -3777,7 +3777,7 @@ cat >>$CONFIG_STATUS <<\_ACEOF || ac_write_fail=1
# report actual input values of CONFIG_FILES etc. instead of their
# values after options handling.
ac_log="
This file was extended by poismf $as_me 0.3.1, which was
This file was extended by poismf $as_me 0.4.0, which was
generated by GNU Autoconf 2.71. Invocation command line was
CONFIG_FILES = $CONFIG_FILES
@@ -3832,7 +3832,7 @@ ac_cs_config_escaped=`printf "%s\n" "$ac_cs_config" | sed "s/^ //; s/'/'\\\\\\\\
cat >>$CONFIG_STATUS <<_ACEOF || ac_write_fail=1
ac_cs_config='$ac_cs_config_escaped'
ac_cs_version="\\
poismf config.status 0.3.1
poismf config.status 0.4.0
configured by $0, generated by GNU Autoconf 2.71,
with options \\"\$ac_cs_config\\"
2 changes: 1 addition & 1 deletion configure.ac
@@ -1,4 +1,4 @@
AC_INIT([poismf], 0.3.1)
AC_INIT([poismf], 0.4.0)

AC_CHECK_LIB(atomic, __atomic_exchange, [HAS_ATOMIC=TRUE], [HAS_ATOMIC=FALSE])
if test "$HAS_ATOMIC" = "TRUE"; then
7 changes: 7 additions & 0 deletions docs/requirements.txt
@@ -0,0 +1,7 @@
pandas>=0.24.0
numpy
cython
scipy
Sphinx>=4.2.0
docutils==0.16
sphinx-rtd-theme>=1.0.0
59 changes: 43 additions & 16 deletions docs/source/index.rst
@@ -1,26 +1,53 @@
.. poismf documentation master file, created by
sphinx-quickstart on Wed Jul 31 23:11:33 2019.
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
Poisson factorization for sparse count matrices
===============================================
For more information, visit the project's homepage
Poisson Factorization
=====================
This is the documentation page for the Python package *poismf*, which produces approximate
non-negative low-rank matrix factorizations of sparse counts matrices by maximizing Poisson
likelihood minus a regularization term, the result of which can be used for e.g. implicit-feedback
recommender systems or bag-of-words-based topic modeling.


For more information, visit the project's GitHub page:

`<https://www.github.com/david-cortes/poismf>`_

For the R version, see the CRAN page:

.. toctree::
:maxdepth: 2
:caption: Contents:
`<https://cran.r-project.org/package=poismf>`_

Installation
============
The Python version of this package can be easily installed from PyPI
::

.. automodule:: poismf
:members:
:undoc-members:
:show-inheritance:
:inherited-members:
pip install poismf

(See the GitHub page for more details)

Quick Example
=============

* `Poisson Factorization on the LastFM dataset <http://nbviewer.jupyter.org/github/david-cortes/poismf/blob/master/example/example_poismf_lastfm.ipynb>`_.

Methods
=======

* `PoisMF <#poismf.PoisMF>`_
* `fit <#poismf.PoisMF.fit>`_
* `fit_unsafe <#poismf.PoisMF.fit_unsafe>`_
* `predict <#poismf.PoisMF.predict>`_
* `predict_factors <#poismf.PoisMF.predict_factors>`_
* `topN <#poismf.PoisMF.topN>`_
* `topN_new <#poismf.PoisMF.topN_new>`_
* `transform <#poismf.PoisMF.transform>`_

PoisMF
======

.. automodule:: poismf
:members:
:undoc-members:
:show-inheritance:
:inherited-members:

Indices and tables
==================
Binary file removed formula/pois_llk.png
Binary file not shown.
9 changes: 3 additions & 6 deletions man/factors.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

8 changes: 4 additions & 4 deletions man/factors.single.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
