Add xmatch draft (as a nb) and swift-by-lamassa table

wangli6666 · Dec 20, 2017 · 6149fcc
1 parent f21c8a1

Showing 2 changed files with 1,336 additions and 0 deletions.
diff --git a/docs/XMatch.ipynb b/docs/XMatch.ipynb
@@ -0,0 +1,256 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Cross-matching astronomical catalogs\n",
+    "\n",
+    "Cross-matching is the process of finding the entries (objects) in two or more tables (catalogs) that correspond to the same real object.\n",
+    "\n",
+    "For a longer explanation, consider two astronomical catalogs, in which rows contain the properties of astronomical objects, columns organize those properties, and each (real) astronomical object appears at most once in each catalog. To picture a very simple situation without loss of generality, we can think of these catalogs as the products of optical and X-ray observations of a certain region of the sky. It is expected (say, the *null hypothesis*) that some, but not all, of the objects are present in both catalogs. Notice, though, that the catalogs, as data structures, are not required to share any other structural property, such as the number or order of rows and columns. This is the typical scenario astronomers handle when cross-matching.\n",
+    "\n",
+    "The process of finding the objects shared by both catalogs is what we call cross-matching. In extragalactic astronomy, the objects (galaxies, QSOs) do not move in practice, so their position in the sky can be used as an identifier. The most basic parameters for cross-matching are then the Right Ascension and Declination: objects in each catalog that lie at the same position in the sky are said to be the same. The result of this process is a *cross-matched* catalog, containing the matched objects and the merged columns (*i.e.*, properties) from both catalogs.\n",
+    "\n",
+    "Although it looks like a simple subject, cross-matching is a long-standing issue in astronomy. This becomes clear once we realize how observational effects (*e.g.*, astronomical seeing) affect, for instance, the measurement of an object's position. The measurement uncertainties make the same object show up at slightly different positions in each catalog, so we have to match not exact values but coordinates that agree within a tolerance |egr|, also called the *error radius* or *search radius*. Intrinsic astrophysical effects can also prevent the same astronomical object from matching between catalogs of different wavebands; for example, radio lobes generated by an Active Galactic Nucleus (AGN) can cause a mismatch between a radio and an optical catalog.\n",
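+    "\n",
+    "The following is a minimal sketch of such a tolerance test. It is an illustration, not the code implemented in this work. It computes the great-circle separation between two sky positions with the haversine formula and accepts the pair when the separation falls within the tolerance, given here in arcseconds. The function names ``angular_separation`` and ``is_match`` and the example coordinates are hypothetical.\n",
+    "\n",
+    "```python\n",
+    "import math\n",
+    "\n",
+    "\n",
+    "def angular_separation(ra1, dec1, ra2, dec2):\n",
+    "    # Great-circle separation (haversine formula); all angles in degrees.\n",
+    "    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))\n",
+    "    h = (math.sin((dec2 - dec1) / 2) ** 2\n",
+    "         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)\n",
+    "    # min() guards against h slightly above 1 from rounding errors.\n",
+    "    return math.degrees(2 * math.asin(math.sqrt(min(1.0, h))))\n",
+    "\n",
+    "\n",
+    "def is_match(ra1, dec1, ra2, dec2, eps_arcsec):\n",
+    "    # True when the two positions agree within the tolerance eps (arcseconds).\n",
+    "    return angular_separation(ra1, dec1, ra2, dec2) * 3600.0 <= eps_arcsec\n",
+    "\n",
+    "\n",
+    "# Hypothetical optical and X-ray positions of one source, about 1.3 arcsec apart.\n",
+    "print(is_match(150.0000, 2.2000, 150.0003, 2.2002, eps_arcsec=2.0))  # True\n",
+    "```\n",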
+    "\n",
+    "In practice, the basic idea of cross-matching is to associate the closest sources within a search radius, which is called a *pure positional* method. Two other association methods can be found in the literature: *maximum likelihood* [SeS92]_ and *Bayesian statistics* [BeS08]_.\n",
+    "\n",
+    "In this work, the maximum likelihood estimator (MLE), great circle (GC) and nearest neighbor (NN)\n",
+    "methods were implemented.\n",
+    "\n",
+    ".. [SeS92] \"*On the likelihood ratio for source identification*\", W. Sutherland and W. Saunders, MNRAS, 1992\n",
+    ".. [BeS08] \"*Probabilistic Cross Identification of Astronomical Sources*\", T. Budavari and A. Szalay, ApJ, 2008\n",
+    "\n",
+    ".. |egr|  unicode:: U+003B5 .. GREEK SMALL LETTER EPSILON\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Nearest Neighbor algorithm\n",
+    "\n",
+    "The cross-matching algorithm implemented here is based on a *nearest neighbor* search.\n",
+    "\n",
+    "### Two mock catalogs\n",
+    "\n",
+    "Consider two catalogs, *A* and *B*. The algorithm does a two-way search. First, it takes catalog *A* as the *reference catalog* and searches for the nearest neighbor (NN) in *B* (*B->A*). Then it searches for the NN of each *B* object in *A* (*A->B*). The same object can be the NN of several sources; for example, in *B->A* different sources in *A* can have the same object in *B* as NN. A cleaning step is therefore necessary to remove this multiplicity. The cleaning keeps only the pairs of objects (*i.e.*, matches) that appear in both matching lists.\n",
+    "\n",
+    "`Figure 1`_ illustrates the spatial distribution of objects from two different catalogs, *A* and *B*, and should help in visualizing the matching algorithm. The circles represent the *error radius* of each object's position; all of them have a radius of ``20 pixels``.\n",
+    "\n",
+    ".. figure:: images/xmatch/TOY_artificial_obj_distro_2_explain_matchAB_labeled.png\n",
+    "   :name:     Figure 1\n",
+    "   :scale:    75%\n",
+    "   :align:    center\n",
+    "\n",
+    "   Mock spatial distribution of objects. *Blue* and *red* circles mimic catalogs \"A\" and \"B\".\n",
+    "\n",
+    "\n",
+    "The search for neighbors\n",
+    "========================\n",
+    "\n",
+    "Searching for the nearest neighbor of each object in catalog *A*, we find the objects in *B* listed in the table `Matching (NN) B->A`_ below. Looking in the other direction, with *B* as reference and searching for neighbors in *A*, we get the matches shown in `Matching (NN) A->B`_. As might be expected, in both directions some objects of the second (\"neighborhood\") catalog are the nearest neighbor of more than one reference object.\n",
+    "\n",
+    "The next step is to clean those multiplicities. Going through the *multiple matches* in both tables and keeping, for each of them, only the closest pair, the real *nearest neighbors* are:\n",
+    "\n",
+    "* **B->A**: [A:3,B:1], [A:7,B:4], [A:4,B:2], [A:12,B:5]\n",
+    "* **A->B**: [B:2,A:4]\n",
+    "\n",
+    "All other multiple matches are taken out. Doing so, we end up with `Table 3`_, which shows the separations (in pixels) and where indeed only those 4 matching pairs are kept for the final result.\n",
+    "\n",
+    "Notice, though, that we are not yet addressing whether the matches are right or wrong. The current discussion concerns only the search for a unique set of *nearest neighbor* pairs, a basic idea to establish before going further and qualifying those matches.\n",
+    "\n",
+    ".. table:: Matching (NN) B->A\n",
+    "   :name:  Matching (NN) B->A\n",
+    "\n",
+    "   ==== ==== ==== =======\n",
+    "   A_ID   x    y  NN_in_B\n",
+    "   ==== ==== ==== =======\n",
+    "     1  301   34        1\n",
+    "     2   51   74        1\n",
+    "     3  230  145        1\n",
+    "     4  404  232        2\n",
+    "     5  229  265        4\n",
+    "     6   52  286        4\n",
+    "     7   83  288        4\n",
+    "     8  346  289        2\n",
+    "     9  317  339        2\n",
+    "    10  214  376        4\n",
+    "    11  465  388        5\n",
+    "    12  401  455        5\n",
+    "   ==== ==== ==== =======\n",
+    "\n",
+    "\n",
+    ".. table:: Matching (NN) A->B\n",
+    "   :name:  Matching (NN) A->B\n",
+    "\n",
+    "   ==== ==== ==== =======\n",
+    "   B_ID   x    y  NN_in_A\n",
+    "   ==== ==== ==== =======\n",
+    "     1  299  126        3\n",
+    "     2  391  246        4\n",
+    "     3  445  249        4\n",
+    "     4  120  359        7\n",
+    "     5  428  441       12\n",
+    "     6  125  446       10\n",
+    "   ==== ==== ==== =======\n",
+    "\n",
+    "\n",
+    ".. table:: NN matching-table\n",
+    "   :name:  Table 3\n",
+    "\n",
+    "   ===== ==== ==== ==== ==== ==== =====\n",
+    "    A_ID  x    y   B_ID  x    y   dist\n",
+    "   ===== ==== ==== ==== ==== ==== =====\n",
+    "      1   301   34  nan  nan  nan   nan\n",
+    "      2    51   74  nan  nan  nan   nan\n",
+    "      3   230  145   1   299  126    71\n",
+    "      4   404  232   2   391  246    19\n",
+    "      5   229  265  nan  nan  nan   nan\n",
+    "      6    52  286  nan  nan  nan   nan\n",
+    "      7    83  288   4   120  359    80\n",
+    "      8   346  289  nan  nan  nan   nan\n",
+    "      9   317  339  nan  nan  nan   nan\n",
+    "     10   214  376  nan  nan  nan   nan\n",
+    "     11   465  388  nan  nan  nan   nan\n",
+    "     12   401  455   5   428  441    30\n",
+    "   ===== ==== ==== ==== ==== ==== =====\n",
+    "\n",
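+    "The two one-way searches and the cleaning step can be checked on the mock data with a short brute-force sketch. This is an illustration, not the pipeline code. The pixel coordinates are copied from the tables above, distances are Euclidean, and the helpers ``dist`` and ``nearest`` are hypothetical names.\n",
+    "\n",
+    "```python\n",
+    "import math\n",
+    "\n",
+    "# Mock catalogs: ID -> (x, y) in pixels, copied from the tables above.\n",
+    "A = {1: (301, 34), 2: (51, 74), 3: (230, 145), 4: (404, 232),\n",
+    "     5: (229, 265), 6: (52, 286), 7: (83, 288), 8: (346, 289),\n",
+    "     9: (317, 339), 10: (214, 376), 11: (465, 388), 12: (401, 455)}\n",
+    "B = {1: (299, 126), 2: (391, 246), 3: (445, 249),\n",
+    "     4: (120, 359), 5: (428, 441), 6: (125, 446)}\n",
+    "\n",
+    "\n",
+    "def dist(p, q):\n",
+    "    return math.hypot(p[0] - q[0], p[1] - q[1])\n",
+    "\n",
+    "\n",
+    "def nearest(ref, other):\n",
+    "    # For each object in ref, the ID of its nearest neighbor in other (brute force).\n",
+    "    return {i: min(other, key=lambda j: dist(p, other[j])) for i, p in ref.items()}\n",
+    "\n",
+    "\n",
+    "nn_in_B = nearest(A, B)  # 'B->A' in the text: NN in B of each A object\n",
+    "nn_in_A = nearest(B, A)  # 'A->B' in the text: NN in A of each B object\n",
+    "\n",
+    "# Cleaning step: keep only the pairs found in both directions (mutual NN).\n",
+    "pairs = sorted((a, b) for a, b in nn_in_B.items() if nn_in_A[b] == a)\n",
+    "for a, b in pairs:\n",
+    "    print('A:{} B:{} dist={:.1f}'.format(a, b, dist(A[a], B[b])))\n",
+    "```\n",
+    "\n",
+    "The dictionaries ``nn_in_B`` and ``nn_in_A`` reproduce the ``NN_in_B`` and ``NN_in_A`` columns above, and ``pairs`` holds the four pairs of `Table 3`_ (the table shows their separations as integers).\n",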
+    "\n",
+    "Pseudocode\n",
+    "----------\n",
+    "\n",
+    "The (pseudocode) inputs are ``catalog-A`` and ``catalog-B``, tables (*e.g.*, FITS files) in which at least the columns ``RA`` (Right Ascension), ``DEC`` (Declination) and ``OBJID`` (object ID, a unique identifier) are present [*]_.\n",
+    "\n",
+    "```\n",
+    "SkyCoord := ~astropy.coordinates.SkyCoord\n",
+    "\n",
+    "A <- read columns 'RA','DEC','OBJID' from catalog-A\n",
+    "B <- read columns 'RA','DEC','OBJID' from catalog-B\n",
+    "\n",
+    "coords_A <- build SkyCoord objects from A['RA'], A['DEC']\n",
+    "coords_B <- build SkyCoord objects from B['RA'], B['DEC']\n",
+    "\n",
+    "match_A <- for each object in A, find its nearest neighbor in B\n",
+    "match_B <- for each object in B, find its nearest neighbor in A\n",
+    "\n",
+    "match_AB <- keep only the pairs present in both match_A and match_B\n",
+    "\n",
+    "matched_catalog <- join A,B following match_AB\n",
+    "```\n",
+    "\n",
+    "The output, ``matched_catalog``, provides all columns of ``A`` and ``B`` (for instance, ``RA``, ``DEC``, ``OBJID``) plus the angular separation between the matches, ``distance_AB``. It has the same number of rows as catalog ``A``.\n",
+    "In *relational database* jargon, ``matched_catalog`` is the result of a *left join* between ``A`` and ``B``.\n",
+    "\n",
+    "+--------+-------+----+-----+-------+----+-----+------+\n",
+    "|        | A                | B                |  AB  |\n",
+    "+--------+-------+----+-----+-------+----+-----+------+\n",
+    "| index  | OBJID | RA | DEC | OBJID | RA | DEC | dist |\n",
+    "+========+=======+====+=====+=======+====+=====+======+\n",
+    "| 1      |       |    |     |       |    |     |      |\n",
+    "+--------+-------+----+-----+-------+----+-----+------+\n",
+    "| 2      |       |    |     |       |    |     |      |\n",
+    "+--------+-------+----+-----+-------+----+-----+------+\n",
+    "| ...    |       |    |     |       |    |     |      |\n",
+    "+--------+-------+----+-----+-------+----+-----+------+\n",
+    "|length A|       |    |     |       |    |     |      |\n",
+    "+--------+-------+----+-----+-------+----+-----+------+\n",
+    "\n",
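+    "The pseudocode can also be sketched in plain Python. This is a hedged illustration rather than the pipeline's implementation. It replaces ``astropy`` with the haversine formula and a brute-force O(N*M) search; in practice, ``SkyCoord.match_to_catalog_sky`` performs the nearest-neighbor step efficiently. Catalogs are assumed to be lists of dicts with ``OBJID``, ``RA`` and ``DEC`` (in degrees). ``xmatch_nn`` and the example rows are hypothetical, and ``distance_AB`` is given in arcseconds.\n",
+    "\n",
+    "```python\n",
+    "import math\n",
+    "\n",
+    "\n",
+    "def sep_arcsec(ra1, dec1, ra2, dec2):\n",
+    "    # Haversine great-circle separation: degrees in, arcseconds out.\n",
+    "    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))\n",
+    "    h = (math.sin((dec2 - dec1) / 2) ** 2\n",
+    "         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)\n",
+    "    return math.degrees(2 * math.asin(math.sqrt(min(1.0, h)))) * 3600.0\n",
+    "\n",
+    "\n",
+    "def nearest(src, dst):\n",
+    "    # Index in dst of the nearest neighbor of each row of src (brute force).\n",
+    "    return [min(range(len(dst)),\n",
+    "                key=lambda j: sep_arcsec(s['RA'], s['DEC'], dst[j]['RA'], dst[j]['DEC']))\n",
+    "            for s in src]\n",
+    "\n",
+    "\n",
+    "def xmatch_nn(A, B):\n",
+    "    match_A = nearest(A, B)  # NN in B of each row of A\n",
+    "    match_B = nearest(B, A)  # NN in A of each row of B\n",
+    "    matched_catalog = []\n",
+    "    for i, a in enumerate(A):  # left join: one output row per row of A\n",
+    "        j = match_A[i]\n",
+    "        row = {'A_' + k: v for k, v in a.items()}\n",
+    "        if match_B[j] == i:  # mutual nearest neighbors: keep the match\n",
+    "            row.update({'B_' + k: v for k, v in B[j].items()})\n",
+    "            row['distance_AB'] = sep_arcsec(a['RA'], a['DEC'], B[j]['RA'], B[j]['DEC'])\n",
+    "        else:  # no match: B columns and distance are left empty\n",
+    "            row.update({'B_' + k: None for k in B[j]})\n",
+    "            row['distance_AB'] = None\n",
+    "        matched_catalog.append(row)\n",
+    "    return matched_catalog\n",
+    "\n",
+    "\n",
+    "A = [{'OBJID': 1, 'RA': 10.0000, 'DEC': -5.0000},\n",
+    "     {'OBJID': 2, 'RA': 10.0100, 'DEC': -5.0000},\n",
+    "     {'OBJID': 3, 'RA': 10.0200, 'DEC': -5.0100}]\n",
+    "B = [{'OBJID': 101, 'RA': 10.0001, 'DEC': -5.0001},\n",
+    "     {'OBJID': 102, 'RA': 10.0210, 'DEC': -5.0100}]\n",
+    "for row in xmatch_nn(A, B):\n",
+    "    print(row['A_OBJID'], row['B_OBJID'], row['distance_AB'])\n",
+    "```\n",
+    "\n",
+    "Object 2 has object 101 as its nearest neighbor, but 101 is closer to object 1. Object 2 is therefore left unmatched, and its ``B`` columns are empty, as in a left join.\n",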
+    "\n",
+    ".. [*] The pipeline does not take positional errors into account so far.\n",
+    "\n",
+    "\n",
+    "\n",
+    "Defining a search-radius\n",
+    "========================\n",
+    "\n",
+    "So far, the \"tolerance radius\" we `previously talked about <xmatch_introduction>`_ has not been considered.\n",
+    "Without it, we might end up with matches between (\"nearest-neighbor\") objects that are not actually close to each other. In our mock example, it can be argued that among the matches in `Table 3`_ only *[A:4,B:2]* and *[A:12,B:5]* represent real matches, if we consider ``30 pixels`` to be sufficiently close.\n",
+    "\n",
+    "Although not used so far, the objects in our mock catalogs have an \"error radius\" of ``20 pixels``. Considering the errors of \"A\" and \"B\" to be independent, their combined uncertainty is given by their sum in quadrature,\n",
+    "\n",
+    ".. math::\n",
+    "\n",
+    "   \\epsilon &= \\sqrt{\\mathrm{err}_A^2 + \\mathrm{err}_B^2} \\\\\n",
+    "            &= \\sqrt{20^2 + 20^2} \\\\\n",
+    "            &\\approx 28.3\n",
+    "\n",
+    "which gives a tolerance radius of about :math:`28` pixels. Using this value to cut out \"false matches\" leaves only the pair *[A:4,B:2]*, since the separation of *[A:12,B:5]* (about 30 pixels) exceeds it. Our final matching table is therefore `Table 4`_ below.\n",
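+    "\n",
+    "This cut can be checked numerically with a small sketch. It recomputes the separations of the four pairs of `Table 3`_ from their pixel coordinates, since the table shows them only as integers. It then keeps the pairs within the quadrature tolerance. The variable names are illustrative only.\n",
+    "\n",
+    "```python\n",
+    "import math\n",
+    "\n",
+    "err_A = err_B = 20.0  # error radius of every mock object, in pixels\n",
+    "eps = math.sqrt(err_A ** 2 + err_B ** 2)  # independent errors: sum in quadrature\n",
+    "\n",
+    "# Pairs from Table 3 as ((A_ID, (x, y)), (B_ID, (x, y))).\n",
+    "pairs = [((3, (230, 145)), (1, (299, 126))),\n",
+    "         ((4, (404, 232)), (2, (391, 246))),\n",
+    "         ((7, (83, 288)), (4, (120, 359))),\n",
+    "         ((12, (401, 455)), (5, (428, 441)))]\n",
+    "\n",
+    "kept = [(a, b) for (a, pa), (b, pb) in pairs\n",
+    "        if math.hypot(pa[0] - pb[0], pa[1] - pb[1]) <= eps]\n",
+    "print('eps = {:.2f} px, kept pairs: {}'.format(eps, kept))  # eps = 28.28 px, kept pairs: [(4, 2)]\n",
+    "```\n",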
+    "\n",
+    ".. table:: Final matching-table\n",
+    "   :name:  Table 4\n",
+    "\n",
+    "   ===== ==== ==== ==== ==== ==== =====\n",
+    "    A_ID  x    y   B_ID  x    y   dist\n",
+    "   ===== ==== ==== ==== ==== ==== =====\n",
+    "      1   301   34  nan  nan  nan  nan\n",
+    "      2    51   74  nan  nan  nan  nan\n",
+    "      3   230  145  nan  nan  nan  nan\n",
+    "      4   404  232   2   391  246   19\n",
+    "      5   229  265  nan  nan  nan  nan\n",
+    "      6    52  286  nan  nan  nan  nan\n",
+    "      7    83  288  nan  nan  nan  nan\n",
+    "      8   346  289  nan  nan  nan  nan\n",
+    "      9   317  339  nan  nan  nan  nan\n",
+    "     10   214  376  nan  nan  nan  nan\n",
+    "     11   465  388  nan  nan  nan  nan\n",
+    "     12   401  455  nan  nan  nan  nan\n",
+    "   ===== ==== ==== ==== ==== ==== =====\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Great circle algorithm\n",
+    "\n",
+    "The fundamental characteristic of a great-circle (gc) cross-match is that the\n",
+    "search for counterparts is restricted to a limited region, defined by a *radius*\n",
+    "parameter around each target source.\n",
+    "\n",
+    "A *nn* algorithm will *always* find a counterpart for each and every target,\n",
+    "but the matching pair may be separated by an unreasonable distance.\n",
+    "The *gc* algorithm, on the other hand, restricts the search region, so it may\n",
+    "not find a counterpart for every target. Depending on the size of the search\n",
+    "region, it may also find multiple counterparts, in which case deciding which\n",
+    "one to take is a second step.\n",
+    "\n",
+    "Typically, in the simplest scenario, the *nearest neighbour* is taken among the\n",
+    "counterpart candidates found by the *great-circle* algorithm.\n",
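+    "\n",
+    "Below is a minimal sketch of this two-step procedure. It assumes positions in degrees and a brute-force search; the function names and coordinates are hypothetical. The cone search returns every candidate within the radius (possibly none, possibly several), and the nearest candidate is kept.\n",
+    "\n",
+    "```python\n",
+    "import math\n",
+    "\n",
+    "\n",
+    "def gc_sep_deg(ra1, dec1, ra2, dec2):\n",
+    "    # Great-circle separation (haversine formula), degrees in and out.\n",
+    "    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))\n",
+    "    h = (math.sin((dec2 - dec1) / 2) ** 2\n",
+    "         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)\n",
+    "    return math.degrees(2 * math.asin(math.sqrt(min(1.0, h))))\n",
+    "\n",
+    "\n",
+    "def cone_candidates(target, catalog, radius_deg):\n",
+    "    # All (separation, index) pairs of catalog sources within radius_deg of target.\n",
+    "    ra0, dec0 = target\n",
+    "    cands = [(gc_sep_deg(ra0, dec0, ra, dec), k) for k, (ra, dec) in enumerate(catalog)]\n",
+    "    return [c for c in cands if c[0] <= radius_deg]\n",
+    "\n",
+    "\n",
+    "def gc_match(targets, catalog, radius_deg):\n",
+    "    # Index of the nearest candidate for each target, or None when the cone is empty.\n",
+    "    result = []\n",
+    "    for t in targets:\n",
+    "        cands = cone_candidates(t, catalog, radius_deg)\n",
+    "        result.append(min(cands)[1] if cands else None)\n",
+    "    return result\n",
+    "\n",
+    "\n",
+    "catalog = [(150.0, 2.0), (150.0005, 2.0), (150.1, 2.1)]\n",
+    "targets = [(150.0002, 2.0), (151.0, 3.0)]\n",
+    "radius = 2.0 / 3600  # 2 arcsec\n",
+    "print(len(cone_candidates(targets[0], catalog, radius)))  # 2 candidates\n",
+    "print(gc_match(targets, catalog, radius))  # [0, None]\n",
+    "```\n",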
+    "\n",
+    "The great-circle approach yields a more efficient algorithm, O(N log N),\n",
+    "and provides a natural mechanism to sub-sample the datasets.\n",
+    "Such a sampling mechanism may then be used to dynamically define the best match.\n"
+   ]
+  },
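+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "One possible way to obtain the O(N log N) behavior mentioned above is sketched below. It rests on our own assumptions and is not necessarily how this work implements it. The catalog is sorted by declination once. For each target, a binary search then restricts the candidates to a declination band whose half-width equals the search radius. The restriction is exact because the great-circle separation between two sources is never smaller than their declination difference. Sorting costs O(M log M) and each band lookup costs O(log M); the scan inside each band stays cheap only while the bands are sparsely populated. ``gc_match_sorted`` is a hypothetical name.\n",
+    "\n",
+    "```python\n",
+    "import bisect\n",
+    "import math\n",
+    "\n",
+    "\n",
+    "def gc_sep_deg(ra1, dec1, ra2, dec2):\n",
+    "    # Great-circle separation (haversine formula), degrees in and out.\n",
+    "    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))\n",
+    "    h = (math.sin((dec2 - dec1) / 2) ** 2\n",
+    "         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)\n",
+    "    return math.degrees(2 * math.asin(math.sqrt(min(1.0, h))))\n",
+    "\n",
+    "\n",
+    "def gc_match_sorted(targets, catalog, radius_deg):\n",
+    "    # Sort catalog indices by declination once: O(M log M).\n",
+    "    order = sorted(range(len(catalog)), key=lambda k: catalog[k][1])\n",
+    "    decs = [catalog[k][1] for k in order]\n",
+    "    result = []\n",
+    "    for ra0, dec0 in targets:\n",
+    "        # Binary search for the band |dec - dec0| <= radius_deg: O(log M).\n",
+    "        lo = bisect.bisect_left(decs, dec0 - radius_deg)\n",
+    "        hi = bisect.bisect_right(decs, dec0 + radius_deg)\n",
+    "        best = None  # (separation, index) of the nearest candidate so far\n",
+    "        for pos in range(lo, hi):\n",
+    "            k = order[pos]\n",
+    "            d = gc_sep_deg(ra0, dec0, catalog[k][0], catalog[k][1])\n",
+    "            if d <= radius_deg and (best is None or d < best[0]):\n",
+    "                best = (d, k)\n",
+    "        result.append(None if best is None else best[1])\n",
+    "    return result\n",
+    "\n",
+    "\n",
+    "catalog = [(150.0, 2.0), (150.0005, 2.0), (150.1, 2.1)]\n",
+    "targets = [(150.0002, 2.0), (151.0, 3.0)]\n",
+    "print(gc_match_sorted(targets, catalog, 2.0 / 3600))  # [0, None]\n",
+    "```\n"
+   ]
+  },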
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.5.4"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}