Approximate profile maximum likelihood estimation (in Julia, Matlab, and Python)
Suppose we have n samples with empirical distribution (histogram) p̂ = (p̂[1], p̂[2], ...). A relabeling σp̂ = (p̂[σ[1]], p̂[σ[2]], ...) permutes the components of p̂ according to permutation σ. The profile maximum likelihood (PML) distribution maximizes the probability of observing any relabeling of the empirical distribution p̂, computed by:

ℙ(observe σp̂ for some permutation σ) ≈ (1/𝓕₀!) Σ_σ exp(−n D(σp̂ ‖ p))

where the sum is over all permutations σ of the support set of distribution p, 𝓕₀ is the number of symbols seen 0 times empirically, and D(·‖·) is the Kullback-Leibler divergence. The support set of p is generally not assumed known.
The PML distribution can be used as a plug-in estimator for symmetric functionals of a distribution (that is, functionals that are invariant under relabeling, like entropy, Rényi entropy, distance to uniformity, support set size, support set coverage, and others) or functionals of multiple distributions (like L₁ distance, Kullback-Leibler divergence, and others). Acharya, Das, Orlitsky, and Suresh 2016 show that the PML approach yields a competitive estimator for symmetric functionals.
The PML distribution is hard to compute exactly, but we can efficiently compute an approximation. This package implements the approximations presented in [Pavlichin, Jiao, and Weissman 2017].
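To make the definition above concrete, here is a minimal brute-force sketch (not part of the package; the helper name exact_pml_objective is made up) that evaluates the probability of observing some relabeling of a histogram under a candidate distribution p. It enumerates all distinct relabelings, so it is feasible only for tiny alphabets, which is exactly why the approximations implemented here matter:

import itertools
from scipy.stats import multinomial

def exact_pml_objective(p, counts):
    # probability that samples from candidate distribution p produce *some*
    # relabeling of the observed count vector `counts` (zero-padded to len(p));
    # brute force over distinct relabelings, so only usable for tiny alphabets
    n = sum(counts)
    padded = tuple(counts) + (0,) * (len(p) - len(counts))
    return sum(multinomial.pmf(c, n=n, p=p) for c in set(itertools.permutations(padded)))

exact_pml_objective([0.2] * 5, [2, 1, 1])          # ≈ 0.576 for the uniform distribution on 5 symbols
exact_pml_objective([0.5, 0.25, 0.25], [2, 1, 1])  # ≈ 0.375 for the empirical distribution itself

The uniform distribution on 5 symbols scores higher than the empirical distribution itself, consistent with the approximate PML output shown in the Python examples below.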
Julia, Matlab, and Python implementations share the same interface. See language-specific examples below.
We first compute the approximate PML distribution "under the hood" and then return the function(al) evaluated on the approximate PML distribution:
F_est = estimate_fun_from_histogram(F, empirical_distribution, [optional] K)
where F is a function(al) to be estimated and empirical_distribution is a collection of non-negative integers. K is an optional argument setting the assumed support set size (it must be at least as large as the number of positive entries in empirical_distribution). If K is not provided, then we optimize over the support set size. Zero-valued entries of the empirical distribution are ignored during estimation.
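For example, using the Python interface described below (the numeric values follow from evaluating entropy on the approximate PML distributions shown in the Python examples at the end):

>>> pml.estimate_fun_from_histogram(pml.entropy_of_distribution, [2, 1, 1])     # ≈ 2.32, i.e. log₂(5)
>>> pml.estimate_fun_from_histogram(pml.entropy_of_distribution, [2, 1, 1], 4)  # ≈ 2.0, i.e. log₂(4)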
If F is a function(al) of D distributions (like L₁ distance for D=2), then we need D empirical distributions of the same length to estimate it:
F_est = estimate_fun_from_multiple_histograms(F, [empirical_distribution_1, empirical_distribution_2])
This can be used even for a single empirical distribution with D=1 (e.g. to estimate entropy), but you should expect worse performance than with estimate_fun_from_histogram above: for multiple histograms, the PML approximation relies on a heuristic that a special-purpose D=1 implementation avoids. For now there is no option to set the assumed support set size.
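For example, using the Python interface described below, estimating L₁ distance from two histograms of the same length might look like (output omitted):

>>> pml.estimate_fun_from_multiple_histograms(pml.L1_distance, [[4, 2, 1], [1, 2, 4]])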
p = approximate_PML_from_histogram(empirical_distribution, [optional] K)
where empirical_distribution is a collection of non-negative integers and K is an optional argument setting the assumed support set size (it must be at least as large as the number of positive entries in empirical_distribution). If K is not provided, then we optimize over the support set size. Zero-valued entries of the empirical distribution are ignored, so the inferred support size (the length of the output p) might be smaller than the length of empirical_distribution.
For some inputs, the output p has sum less than 1 (for example, if each symbol occurs exactly once, so empirical_distribution is a vector of ones). The missing probability mass is the "continuous part," distributed over infinitely many unobserved symbols, and the output p is the "discrete part."
Given D empirical distributions of the same length:
p_list = approximate_PML_from_multiple_histograms([empirical_distribution_1, empirical_distribution_2, ...])
The output p_list is a list of the jointly approximated PML distributions. For now there is no option to set the assumed support set size.
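For example, using the Python interface described below (output omitted since it depends on the joint approximation):

>>> p_list = pml.approximate_PML_from_multiple_histograms([[3, 1], [1, 3]])
>>> len(p_list)  # 2, one approximate PML distribution per input histogram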
Requires numpy and scipy. Empirical histogram can be a list or numpy array.
>>> import pml as pml
>>> pml.approximate_PML_from_histogram([2, 1, 1]) # array([ 0.2, 0.2, 0.2, 0.2, 0.2])
>>> pml.approximate_PML_from_histogram([2, 1, 1], 4) # array([ 0.25, 0.25, 0.25, 0.25])
Some functions of distributions are provided for convenience; others we can define on the fly:
>>> H = pml.entropy_of_distribution # Shannon entropy, log base 2
>>> Renyi = lambda p: pml.renyi_entropy_of_distribution(p, alpha=1.5) # Rényi entropy with α = 1.5, log base 2
>>> support_set_size = lambda p: sum(x > 0 for x in p) # number of symbols with positive probability
>>> L1 = pml.L1_distance # L₁ distance
>>> D_KL = lambda p,q: pml.KL_divergence(p, q, min_ratio=1e-6) # KL divergence with assumed min_x p[x]/q[x] = 1e-6 to avoid infinities
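These functionals can then be plugged into the estimation functions above, for example (values depend on the approximate PML distributions; the support set size estimate of 5 matches the first example above):

>>> pml.estimate_fun_from_histogram(Renyi, [2, 1, 1])
>>> pml.estimate_fun_from_histogram(support_set_size, [2, 1, 1])  # 5, the support of the approximate PML distribution above
>>> pml.estimate_fun_from_multiple_histograms(D_KL, [[4, 2, 1], [1, 2, 4]])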