forked from chbrandt/xmatch
Commit
Add xmatch draft (as a nb) and swift-by-lamassa table
Carlos Brandt committed Dec 20, 2017
1 parent f21c8a1, commit 6149fcc
Showing 2 changed files with 1,336 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,256 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Cross-matching astronomical catalogs\n", | ||
"\n", | ||
"Cross-matching is the process of finding entries (objects) in two or more tables (catalogs) considered, in reality, to be the same.\n", | ||
"\n", | ||
"Now, for a longer explanation, consider two astronomical catalogs -- where rows contain properties of astronomical objects, columns organize those properties and each (real) astronomical object can be present no more than once in each catalog. To picture a very simple situation, without loosing in generality, ee can think of such catalogs as being the products of optical and x-ray observations of a certain region of the sky. It it is expected -- say the *null-hypothesis* -- that not all but some of the objects are in both catalogs. Notice, though, that the catalogs -- as data structures -- are not required to share any other structural property like number or order of rows and columns. This is a typical scenario astronomers handle a cross-matching.\n", | ||
"\n", | ||
"The process of finding the objects shared by both catalogs is called cross-matching. In extra-galactic astronomy, in practice the objects -- galaxies, QSOs -- do not move; which brings their position in the sky to be used as an identifier. The very basic parameters to be used for the cross-matching is then Right Ascension and Declination: at each of those catalogs, the objects that are in the same position of the sky are said to be the same. The result of this process is a *cross-matched* catalog, containing the matched objects and the merge of columns (*i.e.*, properties) from both catalogs.\n", | ||
"\n", | ||
"Although it looks like a simple subject, cross-matching is a long-standing issue in astronomy. And it is quite easy to see that once we realize how observational effects (*e.g.*, astronomical seeing) affect, for instance, an object's position measurement. The uncertainties added to the measurements cause the same object to show up slightly different positions in each catalog; given that we have to match not exact values, but coordinates that should match within a tolerance value |egr| -- also called *error radius* or *search radius*. Intrinsic astrophysical effects can also cause the same astronomical object not to match between catalogs of different wavebands, for example, Radio Lobes generated by an Active Galactic Nuclei (AGN) can cause a mismatch between a radio and an optical catalogs.\n", | ||
"\n", | ||
"In practice, the basic idea on cross-matching is to associate the closest sources within a search radius. Which is called a *pure positional* method. Other two association methods can be found in the literature, *maximum likelihood* [SeS92]_ and *bayesian statistics* [BeS08]_.\n", | ||
"\n", | ||
"In this work the maximum likelihood estimator (MLE), great circle (GC) and nearest neighbout (NN)\n", | ||
"were implemented.\n", | ||
"\n", | ||
".. [SeS92] \"*On the likelihood ratio for source identification*\", W. Sutherland and W. Saunders, MNRAS, 1992\n", | ||
".. [BeS08] \"*Probabilistic Cross Identification of Astronomical Sources*\", T. Budavari and A. Szalay, ApJ, 2008\n", | ||
"\n", | ||
".. |egr| unicode:: U+003B5 .. GREEK SMALL LETTER EPSILON\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Nearest Neighbor algorithm\n", | ||
"\n", | ||
"The algorithm implement for cross-matching catalogs is based on a *nearest neighbor* search.\n", | ||
"\n", | ||
"### Two mock catalogs\n", | ||
"\n", | ||
"Consider two catalogs **A**, **B**. The algorithm does a two-way search, first taking catalog *A* as the *reference catalog* and searching for the nearest neighbor(NN) in *B* (*B->A*), then for the NN of *B* in *A* (*A->B*). Because there can be multiple matches with the same NN (different sources in *A* can have the same object in *B* as NN, when *B->A* for example) a cleaning step is necessary to remove this multiplicity. Such cleaning is done by keeping only the pair of objects (*i.e.*, match) that exist on both matching lists.\n", | ||
"\n", | ||
"`Figure 1`_ illustrates a spatial distribution of objects from two different catalogs -- *A*,*B*. This image should help us in visualizing the matching algorithm. The size of the circles symbolizes the *error radius* on object positions, they all measure ``20 pixels``.\n", | ||
"\n", | ||
".. figure:: images/xmatch/TOY_artificial_obj_distro_2_explain_matchAB_labeled.png\n", | ||
" :name: Figure 1\n", | ||
" :scale: 75%\n", | ||
" :align: center\n", | ||
"\n", | ||
" Mock spatial distribution of objects. *Blue* and *red* to mimic catalogs \"A\" and \"B\".\n", | ||
"\n", | ||
"\n", | ||
"The search for neighbors\n", | ||
"========================\n", | ||
"\n", | ||
"Searching for the nearest neighbor to the objects in catalog *A* we find the objects in *B* as listed on the table `Matching (NN) B->A`_ below. On the other hand, looking for the NN in the other direction, *B* as reference and looking for neighbors in *A*, we have the matches shown in `Matching (NN) A->B`_. As we might already been expecting, multiple objects were found from the second (\"neighborhood\") catalog to match.\n", | ||
"\n", | ||
"Next step is to clean those multiplicities. Going through the *multiple matches* given from both tables we should agree that the real *nearest neighbors* are:\n", | ||
"\n", | ||
"* **B->A**: [A:3,B:1], [A:7,B:4], [A:4,B:2], [A:12,B:5]\n", | ||
"* **A->B**: [B:2,A:4]\n", | ||
"\n", | ||
"That is to say that all other multiple matches should be taken out. Doing such, we end up with `Table 3`_, where separation (in pixels) are shown, where indeed only those 4 matchings-pairs are kept for the final result.\n", | ||
"\n", | ||
"Notice, though, that we are not get into the question whether the matches are right or wrong; not yet. The current discussion regards only the a search for a unique set of *nearest neighbors* pairs as a basic idea to establish before going further on qualifying those matches.\n", | ||
"\n", | ||
".. table:: Matching (NN) B->A\n", | ||
" :name: Matching (NN) B->A\n", | ||
"\n", | ||
" ==== ==== ==== =======\n", | ||
" A_ID x y NN_in_B\n", | ||
" ==== ==== ==== =======\n", | ||
" 1 301 34 1\n", | ||
" 2 51 74 1\n", | ||
" 3 230 145 1\n", | ||
" 4 404 232 2\n", | ||
" 5 229 265 4\n", | ||
" 6 52 286 4\n", | ||
" 7 83 288 4\n", | ||
" 8 346 289 2\n", | ||
" 9 317 339 2\n", | ||
" 10 214 376 4\n", | ||
" 11 465 388 5\n", | ||
" 12 401 455 5\n", | ||
" ==== ==== ==== =======\n", | ||
"\n", | ||
"\n", | ||
".. table:: Matching (NN) A->B\n", | ||
" :name: Matching (NN) A->B\n", | ||
"\n", | ||
" ==== ==== ==== =======\n", | ||
" B_ID x y NN_in_A\n", | ||
" ==== ==== ==== =======\n", | ||
" 1 299 126 3\n", | ||
" 2 391 246 4\n", | ||
" 3 445 249 4\n", | ||
" 4 120 359 7\n", | ||
" 5 428 441 12\n", | ||
" 6 125 446 10\n", | ||
" ==== ==== ==== =======\n", | ||
"\n", | ||
"\n", | ||
".. table:: NN matching-table\n", | ||
" :name: Table 3\n", | ||
"\n", | ||
" ===== ==== ==== ==== ==== ==== =====\n", | ||
" A_ID x y B_ID x y dist\n", | ||
" ===== ==== ==== ==== ==== ==== =====\n", | ||
" 1 301 34 nan nan nan nan\n", | ||
" 2 51 74 nan nan nan nan\n", | ||
" 3 230 145 1 299 126 71\n", | ||
" 4 404 232 2 391 246 19\n", | ||
" 5 229 265 nan nan nan nan\n", | ||
" 6 52 286 nan nan nan nan\n", | ||
" 7 83 288 4 120 359 80\n", | ||
" 8 346 289 nan nan nan nan\n", | ||
" 9 317 339 nan nan nan nan\n", | ||
" 10 214 376 nan nan nan nan\n", | ||
" 11 465 388 nan nan nan nan\n", | ||
" 12 401 455 5 428 441 30\n", | ||
" ===== ==== ==== ==== ==== ==== =====\n", | ||
"\n", | ||
"\n", | ||
"Pseudocode\n", | ||
"----------\n", | ||
"\n", | ||
"The (pseudocode) inputs are ``catalog-A`` and ``catalog-B``, tables (e.g, FITS files) where at least the columns ``RA`` (Right Ascension), ``DEC`` (Declination) and ``OBJID`` (Object ID, unique identifier) are present [*]_.\n", | ||
"\n", | ||
"```\n", | ||
"SkyCoord := ~astropy.coordinates.SkyCoord\n", | ||
"\n", | ||
"A <- read columns 'RA','DEC','OBJID' from catalog-A\n", | ||
"B <- read columns 'RA','DEC','OBJID' from catalog-B\n", | ||
"\n", | ||
"coords_A <- build SkyCoord objects from 'RA','DEC'\n", | ||
"coords_B <- build SkyCoord objects from 'RA','DEC'\n", | ||
"\n", | ||
"match_A <- find nearest-neighbor from A in B\n", | ||
"match_B <- find nearest-neighbor from B in A\n", | ||
"\n", | ||
"match_AB <- filter intersection matches between match_A & match_B\n", | ||
"\n", | ||
"matched_catalog <- join A,B following match_AB\n", | ||
"```\n", | ||
"\n", | ||
"The output, ``matched_catalog``, provides all columns in ``A`` and ``B`` -- for instance, ``RA``, ``DEC``, ``OBJID`` -- plus the angular separation -- ``distance_AB`` -- between the matches. The number of rows is the same as catalog A.\n", | ||
"Using *relational database* jargon, ``matched_catalog`` is the result of a *left join* between ``A`` and ``B``.\n", | ||
"\n", | ||
"+--------+-------+----+-----+-------+----+-----+------+\n", | ||
"| | A | B | AB |\n", | ||
"+--------+-------+----+-----+-------+----+-----+------+\n", | ||
"| index | OBJID | RA | DEC | OBJID | RA | DEC | dist |\n", | ||
"+========+=======+====+=====+=======+====+=====+======+\n", | ||
"| 1 | | | | | | | |\n", | ||
"+--------+-------+----+-----+-------+----+-----+------+\n", | ||
"| 2 | | | | | | | |\n", | ||
"+--------+-------+----+-----+-------+----+-----+------+\n", | ||
"| ... | | | | | | | |\n", | ||
"+--------+-------+----+-----+-------+----+-----+------+\n", | ||
"|length A| | | | | | | |\n", | ||
"+--------+-------+----+-----+-------+----+-----+------+\n", | ||
"\n", | ||
"\n", | ||
".. [*] The pipeline does not consider the use of positional errors so far.\n", | ||
"\n", | ||
"\n", | ||
"\n", | ||
"Defining a search-radius\n", | ||
"========================\n", | ||
"\n", | ||
"So far, the \"tolerance radius\" we `previously talked <xmatch_introduction>`_ about has not been considered.\n", | ||
"On not considering this aspect, we might end up with matches between (\"nearest-neighbors\") objects that are not actually close to each other. In our current mock example, it can be argued that among all the possible matches `Table 3`_ only *[A:4,B:2]* and *[A:12,B:5]* do represent real matches -- if we consider ``30 pixels`` to be sufficiently close.\n", | ||
"\n", | ||
"Although not used ever so far, the objects in our mock catalogs have an \"error radius\" of ``20 pixels``. Considering the errors between \"A\" and \"B\" to be independent, their intersection is given by their sum in quadrature,\n", | ||
"\n", | ||
".. math::\n", | ||
"\n", | ||
" \\epsilon &= \\sqrt{err_A^2 + err_B^2} \\\\\n", | ||
" &= \\sqrt{20^2 + 20^2} \\\\\n", | ||
" &= 28.2\n", | ||
"\n", | ||
", which gives us a tolerance radius of :math:`28` pixels. Using this value to cut out \"false matchings\" leaves us with the pair *[A:4,B:2]*. As such, our final matching table is given by `Table 4`_ below.\n", | ||
"\n", | ||
".. table:: Final matching-table\n", | ||
" :name: Table 4\n", | ||
"\n", | ||
" ===== ==== ==== ==== ==== ==== =====\n", | ||
" A_ID x y B_ID x y dist\n", | ||
" ===== ==== ==== ==== ==== ==== =====\n", | ||
" 1 301 34 nan nan nan nan\n", | ||
" 2 51 74 nan nan nan nan\n", | ||
" 3 230 145 nan nan nan nan\n", | ||
" 4 404 232 2 391 246 19\n", | ||
" 5 229 265 nan nan nan nan\n", | ||
" 6 52 286 nan nan nan nan\n", | ||
" 7 83 288 nan nan nan nan\n", | ||
" 8 346 289 nan nan nan nan\n", | ||
" 9 317 339 nan nan nan nan\n", | ||
" 10 214 376 nan nan nan nan\n", | ||
" 11 465 388 nan nan nan nan\n", | ||
" 12 401 455 nan nan nan nan\n", | ||
" ===== ==== ==== ==== ==== ==== =====\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Great circle algorithm\n", | ||
"\n", | ||
"The fundamental characteristic of a great-circle (gc) cross-matching is the\n", | ||
"restriction on the region to search for counterparts, and that limiting region\n", | ||
"is defined by a *radius* parameter around each target source.\n", | ||
"\n", | ||
"Whereas a *nn* algorithm will *always* find a counterpart for each and every\n", | ||
"target, the matching pair may be separated by a unreasonable distance.\n", | ||
"The *gc* algorithm on the other hand, by restricting the search region, may not\n", | ||
"find a counterpart for each target; and depending on the size of the search\n", | ||
"region, it may find multiple counterparts, to decide which one to take is a\n", | ||
"second step.\n", | ||
"\n", | ||
"Typically, in the simplest scenario, the *nearest neighbour* is taken among the\n", | ||
"counterpart candidates found by the *great-circle* algorithm.\n", | ||
"\n", | ||
"The great circle approach provides a better performant algorithm (O(N*logN))\n", | ||
"and provide a natural mechanism to sub-sample the datasets.\n", | ||
"Such sampling mechanism may then be used to dynamically define the best match.\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.5.4" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 2 | ||
} |