The TARA package includes a tool (match_xy.pro) to match astronomical catalogs using positional uncertainties for individual sources plus a match significance threshold, rather than using a fixed matching radius as is commonly employed. This tool can be used to match catalogs from different observatories (e.g. to find infrared counterparts to Chandra sources), or can be used to find the union of Chandra source lists obtained from multiple source detection runs.
), and a unique identifier (e.g. sequence number) for the entry.
If a known reference frame offset between the catalogs is provided to match_xy (see examples) then that offset is applied to one of the catalogs at the start of the algorithm.
The second stage of the algorithm seeks to pare down the large set of significant matches produced by the first stage, which can include many-to-one and one-to-many relationships, to a reasonable one-to-one set of matches destined to be accepted by the observer as ``true'' associations. This stage is asymmetric; the matches are examined from the point of view of one of the catalogs, the ``master catalog''. We refer to the most significant match of each master source as its Primary Match (PM), and refer to any other significant matches as ``Secondary Matches'' (SM); if no significant match exists we refer to the source as ``isolated''. This stage seeks to label the PMs as either ``successful'', meaning that the observer will accept them as true associations, or ``failed'', meaning that the PM should be examined carefully. The labeling is constructed so that the successful PMs form one-to-one relationships. Imagine how a human with the two catalogs plotted together on paper might construct such a labeling by circling (accepting) matches in order of their likelihood while avoiding circling any source twice. The algorithm uses the same ``greedy'' approach, processing the matches from the first stage in order of their significance, and disallowing many-to-one and one-to-many relationships. When a PM is accepted as ``successful'' any remaining PMs which involve either participating source are labeled as ``failed''. Thus for example if two master sources are significantly close to a single slave source, the closer master will be labeled as a successful PM and the other will be labeled as a failed PM. The term ``failed'' is meant to convey that a match would have been accepted as a true association if another more likely association had not interfered.
The third stage of the algorithm produces ds9 region files, plots, and statistics which help the observer understand and examine the matching results.
The process also produces a pair of ds9 region files (described below) containing graphics which help the observer visualize the matching results. The meaning of the symbols and colors used are recorded in the headers of the region files. The coordinate system used in the region files is the (X,Y) coordinate system used in the catalogs (see below).
A final output is a ``union'' (merged) catalog containing the union of the source lists, i.e. with matches (duplicates) represented only once. The position of a duplicate source is taken from the more accurate of the two matching positions. For example if you have multiple catalogs for the same field produced by different source detection runs this union catalog output could be adopted as the final source list for the field.
): 1-sigma position error estimates, possibly different for each source.
If your goal is to form the union of (i.e. merge) multiple catalogs assumed to lie in the same reference frame then an example set of IDL calls would be:
.run match_xy
; Read in three catalogs.
cat_A = build_mycat('catA.txt')
cat_B = build_mycat('catB.txt')
cat_C = build_mycat('catC.txt')
; Pick one of the catalogs to be the master.
match_xy, the_match, cat_A, 'A', /INIT
; Choose a match significance.
sig = 0.99
; Match to the second catalog.
match_xy, the_match, cat_B, 'B', sig, UNION_CAT=cat_AB
; Match the AB union to the third catalog.
match_xy, the_match, cat_AB, 'AB', /INIT
match_xy, the_match, cat_C, 'C', sig, UNION_CAT=cat_ABC
The array of structures cat_ABC is the union of the three catalogs.
If your goal is to find 2MASS and GLIMPSE counterparts to an ACIS catalog then an example set of IDL calls would be:
.run match_xy
; Read in the catalogs.
acis_cat = build_acis_cat('catACIS.fits')
twomass_cat = build_2mass_cat('cat2MASS.txt')
glimpse_cat = build_glimpse_cat('catGLIMPSE.txt')
; Let the ACIS catalog be the master.
match_xy, the_match, acis_cat, 'ACIS', /INIT
; Choose a match significance.
sig = 0.99
; Match to the slaves and analyze the reference frame offsets.
match_xy, the_match, twomass_cat, 'TWOMASS', sig
match_xy, the_match, glimpse_cat, 'GLIMPSE', sig
match_xy_analyze, the_match, composite_cat
; From the plots and statistics presented, choose reference frame offsets and rematch.
match_xy, the_match, twomass_cat, 'TWOMASS', sig, XSHIFT=-0.125, YSHIFT= 0.574
match_xy, the_match, glimpse_cat, 'GLIMPSE', sig, XSHIFT= 0.074, YSHIFT=-0.066
; Confirm the catalogs are now well aligned.
; Construct & save a ds9 region file and a composite catalog
match_xy_analyze, the_match, composite_cat
save, the_match, composite_cat, FILE='composite_cat.sav'
The composite catalog composite_cat is an array of structures carrying all the information from the master catalog plus information from all the the matching slave catalogs.
Two ds9 region files (described above) named ``composite_cat.reg'' and ``composite_cat_pm.reg'' are produced.
If you wish to use only a subset of the sources for estimating reference frame offsets you can
supply a mask vector to the match_xy_analyze routine via the keyword ANALYSIS_MASK. For example:
; Use only on-axis ACIS sources with more than 10 counts. acis_mask = (acis_cat.THETA LE 2) AND (acis_cat.SRC_CNTS GE 10) match_xy_analyze, the_match, composite_cat, ANALYSIS_MASK=acis_maskA more careful approach would also mask out inaccurate entries in the slave catalog.
The XSHIFT/YSHIFT offsets you supply affect the coordinates of the objects in the region files that match_xy produces, and affect the X and Y columns found in the catalogs that match_xy produces.
World coordinates found in the original catalogs (e.g. columns RA and DEC) are NOT altered.
For example, suppose we match a Chandra-ACIS catalog, serving as master, and a GLIMPSE catalog, serving as slave.
Suppose the coordinate system we use in the catalogs given to match_xy is the standard Chandra tangent plane (SKY) system, in which X is aligned to RA but has the opposite sign, and Y is aligned to DEC with the same sign.
Suppose we found that the GLIMPSE catalog had to be shifted to align to ACIS via XSHIFT/YSHIFT inputs to match_xy.
You will find that the celestial coordinates of GLIMPSE sources in the region files are related to the published GLIMPSE coordinates via:
We strongly recommend that you visualize the matching results using the region files described above, considering whether you're comfortable with the results. Such visual review can often reveal defects or omissions in the catalogs you're using. You should carefully examine all sources marked as ``isolated'' to make sure you do not see obvious matches that were missed due to uncorrected reference frame offsets or unreasonably small position error estimates. Various categories of regions in the ds9 region files are identified using the ds9 tag property. The categories (tag values) can be seen and selected via the ds9 menu selection Region:Groups, making it easy to remove groups you don't want to see (e.g. unused slave sources) or to change the appearance of all the members of a group.
Included in match_xy.pro is a program called match_xy_simulate which can be used to statistically characterize the performance of the matching process, e.g. estimating the number of false matches identified. Such simulations and analysis are described in an appendix to a paper titled ``The Young Stellar Population in M17 Revealed by Chandra'' by Broos et al., 2006.
The program adaptive_density_2d.pro implements a simple adaptive kernel image smoothing algorithm that accounts for variations in the exposure of the observation. Note that this simple algorithm spreads out point sources more than the CIAO tool csmooth because this algorithm lacks the csmooth feature whereby bright pixels are smoothed early on, and then removed from the image so they don't spread to neighboring regions. We have typically used adaptive_density_2d to study diffuse emission in images in which the point sources have been masked.
:
For a particular kernel size,
, the flux at position
is estimated as
is the set of pixels in the ``observed'' region (positive exposure) of the scene,
is the integer-valued image of observed counts, and
is the exposure map.
To derive an estimate of the error on the flux value, we first note that each pixel in our observed counts image,
, is of course a sample from an unknown underlying Poisson distribution associated with that pixel. We make the simplifying assumption that all the pixels under the kernel have the same true flux (which we just estimated). By multiplying that flux by the pixel's exposure value, we then compute the mean number of counts that should be observed in each pixel if the image was repeatedly observed
The flux and flux error are used to define a simple ``significance'' value,
![\begin{eqnarray*}
significance & = & \frac{flux}{error} \\
& = & \frac{\sum_{x...
... \in OB} \sum_{y \in OB} [k(R,x-X,y-Y)]^2 flux(X,Y) emap(x,y)}}
\end{eqnarray*}](img45.png)
increases since more counts are encompassed under the kernel. The observer supplies a significance threshold, and at each position
the flux is estimated using the smallest kernel size,
, that produces a significance exceeding the threshold. Thus, regions of low flux are more heavily smoothed than regions of higher flux. A map of the kernel sizes chosen is returned by the smoothing program.
Application of the kernel to the exposure map in equation (1) is required for accurate smoothing of any wide-field ACIS image because the exposure varies significantly due to chip gaps and bad regions of the detector, as shown in the example exposure map in Figure 13. The algorithm naturally handles unobserved regions, such as areas explicitly masked by the observer and areas outside the field of view of ACIS, because those zero-count and zero-exposure pixels do not affect either the flux or flux error computations. As you would expect, an unobserved region tends to cause neighboring observed regions to be more heavily smoothed since a larger kernel must be used to find an adequate number of counts. Note that the algorithm will estimate a flux for unobserved pixels (zero exposure) using the nearby observed regions. This is of course a form of interpolation.
The smoothing program recognizes a negative value in the exposure map as a flag specifying that the position is ``off-field'', meaning that no flux estimate should be computed there. We flag unobserved pixels in this way for two purposes:
dmcopy "foo.img [exclude sky=region(mask.reg)][opt full]" foo_masked.img dmcopy "foo.emap[exclude sky=region(mask.reg)][opt full]" foo_masked.emap
The next step would be to consider what resolution (image size) is desirable for the final smoothed flux image. In this example we're studying diffuse emission, so 1024x1024 resolution is excessive. In any case only the largest monitors can display such an image at full resolution, and a figure in a paper could never render such high resolution. Thus, in this example, we will rebin (in IDL) to 512x512:
img = readfits('foo_masked.img')
emap = readfits('foo_masked.emap')
full_emap = readfits('foo.emap')
img = fix(frebin(img, 512,512,/TOTAL))
emap = frebin(emap, 512,512,/TOTAL)
full_emap = frebin(full_emap,512,512,/TOTAL)
If the image scene includes the field edges, where the exposure tapers off to zero due to dither, we would typically declare pixels with low exposure, say
nominal, to be "off-field" to prevent pixels with very small relative exposure from producing artifacts in the flux image.
off_field = where(full_emap LT (max(full_emap)/10.)) img [off_field] = 0 emap[off_field] = -1
The call to adaptive_density_2d.pro, for a Gaussian kernel, would then look like this:
adaptive_density_2d, img, min_significance, EMAP=emap, /GAUSS, $
flux_map, error_map, radius_map
The three output images (flux_map, error_map, radius_map) contain the flux, flux error, and kernel sizes. The minimum significance parameter controls the smoothness of the flux map produced. The optional keyword MAX_RADIUS can be used if the default maximum kernel size parameter (40) is not suitable. If the Epanechnikov kernel is desired, supply /EPAN instead of /GAUSS. If the tophat kernel is desired omit both /EPAN & /GAUSS.
In the example above the exposure map in the masked regions contained the value zero. As described above, flux values are computed for those pixels using whatever kernel is required to find enough counts. If this sort of interpolation over the holes is not desired we can flag the holes as ``off-field'' so they will be retained in the smoothed image:
off_field = where(emap EQ 0) img [off_field] = 0 emap[off_field] = -1
Alternatively, we might want to smooth over the edges of the holes, leaving the centers of the larger holes off-field:
on_field = (emap GT 0) off_field = where(smooth(float(on_field),3, /EDGE) EQ 0) img [off_field] = 0 emap[off_field] = -1