
Merge branch 'master' of https://github.com/mimno/Mallet
mimno committed Jun 13, 2021
2 parents 001beaf + d9fd566 commit a25dbb1
Showing 2 changed files with 5 additions and 1 deletion.
2 changes: 1 addition & 1 deletion src/cc/mallet/topics/tui/TopicTrainer.java
@@ -147,7 +147,7 @@ public class TopicTrainer {
"The number of iterations to run before first estimating dirichlet hyperparameters.", null);

static CommandOption.Boolean useSymmetricAlpha = new CommandOption.Boolean(TopicTrainer.class, "use-symmetric-alpha", "true|false", false, false,
-"Only optimize the concentration parameter of the prior over document-topic distributions. This may reduce the number of very small, poorly estimated topics, but may disperse common words over several topics.", null);
+"Optimize the concentration parameter (SumAlpha) of the prior over document-topic distributions while keeping it symmetric. This may reduce the number of very small, poorly estimated topics, but may disperse common words over several topics.", null);

static CommandOption.Double alpha = new CommandOption.Double(TopicTrainer.class, "alpha", "DECIMAL", true, 5.0,
"SumAlpha parameter: sum over topics of smoothing over doc-topic distributions. alpha_k = [this value] / [num topics]",null);
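The help text for alpha defines each per-topic value as alpha_k = SumAlpha / numTopics, and use-symmetric-alpha learns only SumAlpha while keeping the prior symmetric. A minimal sketch of both pieces follows; the class SymmetricAlphaSketch, its methods, and the histogram arguments are hypothetical, not MALLET's API. The update is Minka's fixed point for a symmetric Dirichlet, s ← s · (1/K) Σ_d Σ_k [ψ(n_dk + s/K) − ψ(s/K)] / Σ_d [ψ(n_d + s) − ψ(s)], with each digamma difference accumulated through the recurrence ψ(n + a) − ψ(a) = Σ_{i=0}^{n−1} 1/(a + i). It assumes topicCountHistogram[n] counts (document, topic) pairs with exactly n tokens and documentLengthHistogram[n] counts documents with exactly n tokens.

```java
import java.util.Arrays;

/** Hypothetical sketch of a symmetric document-topic prior; not MALLET's implementation. */
class SymmetricAlphaSketch {

    /** Split SumAlpha evenly: alpha_k = sumAlpha / numTopics for every topic k. */
    static double[] perTopicAlpha(double sumAlpha, int numTopics) {
        double[] alpha = new double[numTopics];
        Arrays.fill(alpha, sumAlpha / numTopics);
        return alpha;
    }

    /**
     * One fixed-point update of the total concentration s, where alpha_k = s / K.
     * Assumed inputs:
     *   topicCountHistogram[n]     = number of (document, topic) pairs with exactly n tokens
     *   documentLengthHistogram[n] = number of documents with exactly n tokens
     * Index 0 is skipped because zero counts add nothing to either sum.
     */
    static double updateSumAlpha(double sumAlpha, int numTopics,
                                 int[] topicCountHistogram, int[] documentLengthHistogram) {
        double a = sumAlpha / numTopics;

        // Numerator: sum over (doc, topic) of psi(n + a) - psi(a),
        // built incrementally as sum_{i<n} 1/(a + i).
        double numerator = 0.0, digammaDiff = 0.0;
        for (int n = 1; n < topicCountHistogram.length; n++) {
            digammaDiff += 1.0 / (a + n - 1);
            numerator += topicCountHistogram[n] * digammaDiff;
        }

        // Denominator: sum over documents of psi(n_d + s) - psi(s).
        double denominator = 0.0;
        digammaDiff = 0.0;
        for (int n = 1; n < documentLengthHistogram.length; n++) {
            digammaDiff += 1.0 / (sumAlpha + n - 1);
            denominator += documentLengthHistogram[n] * digammaDiff;
        }

        // s_new = s * (numerator / K) / denominator
        return sumAlpha * (numerator / numTopics) / denominator;
    }
}
```

For example, with two topics, a single two-token document split one token per topic, and SumAlpha = 2.0, one update returns 2.4.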
4 changes: 4 additions & 0 deletions src/cc/mallet/types/Dirichlet.java
@@ -555,6 +555,8 @@ public static void testSymmetricConcentration(int numDimensions, int numObservat

/**
* Learn Dirichlet parameters using frequency histograms
+* described by Hanna Wallach in "Structured Topic Models for Language" (2008), section 2.4
+* Method 1: Using the Digamma Recurrence Relation (pp. 27-28)
*
* @param parameters A reference to the current values of the parameters, which will be updated in place
* @param observations An array of count histograms. <code>observations[10][3]</code> could be the number of documents that contain exactly 3 tokens of word type 10.
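The first javadoc cites Wallach's Method 1: Minka's fixed-point update for each parameter, with digamma differences computed from count histograms through the recurrence relation, so once the histograms are built an iteration's cost depends on the histogram sizes rather than on the number of documents. The sketch below assumes the layout described for observations (observations[k][n] = number of documents in which dimension k has exactly n counts) plus observationLengths[n] = number of documents with exactly n total counts; the class name and signature are illustrative, not MALLET's actual method. The update applied is alpha_k ← alpha_k · Σ_n C_k(n) Σ_{f=1}^{n} 1/(f − 1 + alpha_k) / Σ_n C(n) Σ_{f=1}^{n} 1/(f − 1 + Σ_j alpha_j).

```java
/** Hypothetical sketch of Minka's fixed point with the digamma recurrence; not MALLET's code. */
class DirichletHistogramSketch {

    /**
     * Updates parameters in place for numIterations rounds and returns their sum.
     * observations[k][n]:    number of documents where dimension k has exactly n counts.
     * observationLengths[n]: number of documents whose counts total exactly n.
     */
    static double learnParameters(double[] parameters, int[][] observations,
                                  int[] observationLengths, int numIterations) {
        int numDims = parameters.length;
        double sum = 0.0;
        for (double p : parameters) sum += p;

        for (int iter = 0; iter < numIterations; iter++) {
            // Denominator: sum_n C(n) * [psi(n + sum) - psi(sum)], via the recurrence.
            double denominator = 0.0, digammaDiff = 0.0;
            for (int n = 1; n < observationLengths.length; n++) {
                digammaDiff += 1.0 / (sum + n - 1);
                denominator += observationLengths[n] * digammaDiff;
            }

            double newSum = 0.0;
            for (int k = 0; k < numDims; k++) {
                int[] histogram = observations[k];
                // Stop at the largest nonzero count; an unused dimension then skips the loop.
                int limit = histogram.length - 1;
                while (limit > 0 && histogram[limit] == 0) limit--;

                // Numerator: sum_n C_k(n) * [psi(n + alpha_k) - psi(alpha_k)].
                double numerator = 0.0;
                digammaDiff = 0.0;
                for (int n = 1; n <= limit; n++) {
                    digammaDiff += 1.0 / (parameters[k] + n - 1);
                    numerator += histogram[n] * digammaDiff;
                }
                parameters[k] *= numerator / denominator;
                newSum += parameters[k];
            }
            sum = newSum;
        }
        return sum;
    }
}
```

Looping only up to the largest nonzero count keeps a dimension that never occurs at exactly zero without evaluating 1/0 on later iterations.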
@@ -571,6 +573,8 @@ public static double learnParameters(double[] parameters,

/**
* Learn Dirichlet parameters using frequency histograms
+* described by Hanna Wallach in "Structured Topic Models for Language", section 2.4
+* Method 1: Using the Digamma Recurrence Relation (pp. 27-28) and gamma hyperpriors (section 2.5, pp. 37-39)
*
* @param parameters A reference to the current values of the parameters, which will be updated in place
* @param observations An array of count histograms. <code>observations[10][3]</code> could be the number of documents that contain exactly 3 tokens of word type 10.
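The second javadoc adds gamma hyperpriors. The exact update from the thesis and from MALLET is not shown in this diff, so the sketch below assumes a MAP update derived directly: adding a Gamma(shape, scale) log-density, (shape − 1) log alpha_k − alpha_k / scale, to Minka's lower bound and maximizing gives alpha_k ← (alpha_k · N_k + shape − 1) / (D + 1/scale), where N_k and D are the histogram sums from Method 1. It assumes shape ≥ 1 so the update cannot go negative; with shape = 1 and a very large scale it reduces to the plain fixed point. The class and method names are hypothetical.

```java
/** Hypothetical sketch: MAP fixed point with a Gamma(shape, scale) prior on each parameter. */
class DirichletGammaPriorSketch {

    /**
     * Updates parameters in place and returns their sum. Histogram layout as in Method 1:
     * observations[k][n] and observationLengths[n] count documents with exactly n counts.
     */
    static double learnParameters(double[] parameters, int[][] observations,
                                  int[] observationLengths, double shape, double scale,
                                  int numIterations) {
        if (shape < 1.0 || scale <= 0.0) {
            throw new IllegalArgumentException("sketch assumes shape >= 1 and scale > 0");
        }
        double sum = 0.0;
        for (double p : parameters) sum += p;

        for (int iter = 0; iter < numIterations; iter++) {
            // D: sum_n C(n) * [psi(n + sum) - psi(sum)], via the digamma recurrence.
            double denominator = 0.0, digammaDiff = 0.0;
            for (int n = 1; n < observationLengths.length; n++) {
                digammaDiff += 1.0 / (sum + n - 1);
                denominator += observationLengths[n] * digammaDiff;
            }
            // Derivative of the -alpha_k / scale prior term.
            denominator += 1.0 / scale;

            double newSum = 0.0;
            for (int k = 0; k < parameters.length; k++) {
                int[] histogram = observations[k];
                int limit = histogram.length - 1;
                while (limit > 0 && histogram[limit] == 0) limit--;

                // N_k: sum_n C_k(n) * [psi(n + alpha_k) - psi(alpha_k)].
                double numerator = 0.0;
                digammaDiff = 0.0;
                for (int n = 1; n <= limit; n++) {
                    digammaDiff += 1.0 / (parameters[k] + n - 1);
                    numerator += histogram[n] * digammaDiff;
                }
                // alpha_k <- (alpha_k * N_k + shape - 1) / (D + 1/scale)
                parameters[k] = (parameters[k] * numerator + shape - 1.0) / denominator;
                newSum += parameters[k];
            }
            sum = newSum;
        }
        return sum;
    }
}
```

With shape > 1, a dimension that never occurs settles at (shape − 1) / (D + 1/scale) instead of collapsing to zero, which is one way a hyperprior keeps rare dimensions from vanishing.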
