Approximate profile maximum likelihood estimation (in Julia, Matlab, and Python)
Suppose we have n samples with empirical distribution (histogram) p̂ = (p̂[1], p̂[2], ...). A relabeling σp̂ = (p̂[σ[1]], p̂[σ[2]], ...) permutes the components of p̂ according to permutation σ. The profile maximum likelihood (PML) distribution maximizes the probability of observing any relabeling of the empirical distribution p̂, computed by:

ℙ(observe σp̂ for some permutation σ) ≈ (1/𝓕₀!) Σ_σ exp(−n D(σp̂ ‖ p))

where the sum is over all permutations σ of the support set of distribution p, 𝓕₀ is the number of symbols seen 0 times empirically, and D(·‖·) is the Kullback-Leibler divergence. The support set of p is generally not assumed known.
The PML distribution can be used as a plug-in estimator for symmetric functionals of a distribution (that is, functionals that are invariant under relabeling, like entropy, Rényi entropy, distance to uniformity, support set size, support set coverage, and others) or functionals of multiple distributions (like L₁ distance, Kullback-Leibler divergence, and others). Acharya, Das, Orlitsky, and Suresh 2016 show that the PML approach yields a competitive estimator for symmetric functionals.
The PML distribution is hard to compute exactly, but we can efficiently compute an approximation. This package implements the approximations presented in [Pavlichin, Jiao, and Weissman 2017].
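To make the definition above concrete, here is a minimal brute-force sketch (not part of the package; the helper name exact_pml_objective is made up) that evaluates the probability of observing some relabeling of a histogram under a candidate distribution p. It enumerates all distinct relabelings, so it is feasible only for tiny alphabets, which is exactly why the approximations implemented here matter:

import itertools
from scipy.stats import multinomial

def exact_pml_objective(p, counts):
    # probability that samples from candidate distribution p produce *some*
    # relabeling of the observed count vector `counts` (zero-padded to len(p));
    # brute force over distinct relabelings, so only usable for tiny alphabets
    n = sum(counts)
    padded = tuple(counts) + (0,) * (len(p) - len(counts))
    return sum(multinomial.pmf(c, n=n, p=p) for c in set(itertools.permutations(padded)))

exact_pml_objective([0.2] * 5, [2, 1, 1])          # ≈ 0.576 for the uniform distribution on 5 symbols
exact_pml_objective([0.5, 0.25, 0.25], [2, 1, 1])  # ≈ 0.375 for the empirical distribution itself

The uniform distribution on 5 symbols scores higher than the empirical distribution itself, consistent with the approximate PML output shown in the Python examples below.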
Julia, Matlab, and Python implementations share the same interface. See language-specific examples below.
We first compute the approximate PML distribution "under the hood" and then return the function(al) evaluated on the approximate PML distribution:
F_est = estimate_fun_from_histogram(F, empirical_distribution, [optional] K)
where F is a function(al) to be estimated and empirical_distribution is a collection of non-negative integers. K is an optional argument setting the assumed support set size (it must be at least as large as the number of positive entries in empirical_distribution). If K is not provided, then we optimize over the support set size. Zero-valued entries of the empirical distribution are ignored during estimation.
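For example, using the Python interface described below (the numeric values follow from evaluating entropy on the approximate PML distributions shown in the Python examples at the end):

>>> pml.estimate_fun_from_histogram(pml.entropy_of_distribution, [2, 1, 1])     # ≈ 2.32, i.e. log₂(5)
>>> pml.estimate_fun_from_histogram(pml.entropy_of_distribution, [2, 1, 1], 4)  # ≈ 2.0, i.e. log₂(4)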
If F is a function(al) of D distributions (like L₁ distance for D=2), then we need D empirical distributions of the same length to estimate it:
F_est = estimate_fun_from_multiple_histograms(F, [empirical_distribution_1, empirical_distribution_2])
This can be used even for a single empirical distribution with D=1 (e.g. to estimate entropy), but you should expect worse performance than with estimate_fun_from_histogram above: for multiple histograms, the PML approximation relies on a heuristic that a special-purpose D=1 implementation avoids. For now there is no option to set the assumed support set size.
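For example, using the Python interface described below, estimating L₁ distance from two histograms of the same length might look like (output omitted):

>>> pml.estimate_fun_from_multiple_histograms(pml.L1_distance, [[4, 2, 1], [1, 2, 4]])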
p = approximate_PML_from_histogram(empirical_distribution, [optional] K)
where empirical_distribution is a collection of non-negative integers and K is an optional argument setting the assumed support set size (it must be at least as large as the number of positive entries in empirical_distribution). If K is not provided, then we optimize over the support set size. Zero-valued entries of the empirical distribution are ignored, so the inferred support size (the length of the output p) might be smaller than the length of empirical_distribution.
For some inputs, the output p has sum less than 1 (for example, if each symbol occurs exactly once, so empirical_distribution is a vector of ones). The missing probability mass is the "continuous part," distributed over infinitely many unobserved symbols, and the output p is the "discrete part."
Given D empirical distributions of the same length:
p_list = approximate_PML_from_multiple_histograms([empirical_distribution_1, empirical_distribution_2, ...])
The output p_list is a list of the jointly approximated PML distributions. For now there is no option to set the assumed support set size.
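For example, using the Python interface described below (output omitted since it depends on the joint approximation):

>>> p_list = pml.approximate_PML_from_multiple_histograms([[3, 1], [1, 3]])
>>> len(p_list)  # 2, one approximate PML distribution per input histogram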
Requires numpy and scipy. Empirical histogram can be a list or numpy array.
>>> import pml as pml
>>> pml.approximate_PML_from_histogram([2, 1, 1]) # array([ 0.2, 0.2, 0.2, 0.2, 0.2])
>>> pml.approximate_PML_from_histogram([2, 1, 1], 4) # array([ 0.25, 0.25, 0.25, 0.25])
Some functions of distributions are provided for convenience; others we can define on the fly:
>>> H = pml.entropy_of_distribution # Shannon entropy, log base 2
>>> Renyi = lambda p: pml.renyi_entropy_of_distribution(p, alpha=1.5) # Rényi entropy with α = 1.5, log base 2
>>> support_set_size = lambda p: sum(x > 0 for x in p) # number of symbols with positive probability
>>> L1 = pml.L1_distance # L₁ distance
>>> D_KL = lambda p,q: pml.KL_divergence(p, q, min_ratio=1e-6) # KL divergence with assumed min_x p[x]/q[x] = 1e-6 to avoid infinities
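These functionals can then be plugged into the estimation functions above, for example (values depend on the approximate PML distributions; the support set size estimate of 5 matches the first example above):

>>> pml.estimate_fun_from_histogram(Renyi, [2, 1, 1])
>>> pml.estimate_fun_from_histogram(support_set_size, [2, 1, 1])  # 5, the support of the approximate PML distribution above
>>> pml.estimate_fun_from_multiple_histograms(D_KL, [[4, 2, 1], [1, 2, 4]])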