Correct extraction of a diffuse source differs from extraction of a point source in a number of important ways.
Observers commonly need to express the observed strength of a diffuse source in terms of surface brightness, for example by normalizing a luminosity calculated via XSPEC by some measure of the size of the source on the sky. If the response of the observatory were constant within the selected extraction region, then the appropriate size normalization would simply be the geometric area of the region. However, in a typical ACIS observation the response varies strongly across the extraction region in several ways.
When a diffuse source is extracted in CIAO, this spatially varying response is averaged into a single set of response files (ARF and RMF).
Obviously, the appropriate region size normalization depends on how this average response is calculated, since the response of the observatory and the size normalization are degenerate (i.e. are multiplied together) in the denominator of the final surface brightness expression.
Now, if one averaged the observatory response over the region in the SKY coordinate system (including the effects of bad columns, chip gaps, detector edges, and point source masks), then that multi-ObsId response would account for everything, and the appropriate size normalization would simply be the geometric area of the region.
However, it is important to understand that this is not the algorithm employed by the tool mkwarf.
Instead, mkwarf (through its WMAP input) forms a weighted average of the response of the observatory within a set of cells on the detector.
In this process there is no concept of reduced exposure time arising from dithering over unobserved parts of the focal plane, and there is no concept of point source masking.
The good news is that we have on hand a data product that does represent the ``depth'' of the observation (exposure time $\times$ effective area) everywhere, namely the exposure map.
If the exposure map and event data have had the same point source masking applied, then the integral of the exposure map over the extraction region, $\int\!\!\int_{\rm region} \mathrm{emap}(x,y)\,dx\,dy$, represents precisely the denominator of the final surface brightness expression that we seek for the specific mono-energy, $E_0$, at which the exposure map was computed.
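The integral of the exposure map over the extraction region can be approximated as a sum over exposure-map pixels. The following is a minimal sketch under stated assumptions, not AE's implementation: the function name `integrate_exposure_map`, the boolean region mask, and the `pixel_area` argument are hypothetical, and the exposure map is assumed to be already masked in the same way as the event data.

```python
import numpy as np

def integrate_exposure_map(emap, region_mask, pixel_area):
    """Approximate the integral of an exposure map over a region.

    emap        : 2-D array, mono-energy exposure map
                  (exposure time x effective area per pixel)
    region_mask : 2-D boolean array, True inside the extraction region
    pixel_area  : sky area of one exposure-map pixel (any area unit)

    All names here are illustrative assumptions, not AE or CIAO APIs.
    """
    emap = np.asarray(emap, dtype=float)
    region_mask = np.asarray(region_mask, dtype=bool)
    if emap.shape != region_mask.shape:
        raise ValueError("emap and region_mask must have the same shape")
    # Pixels with zero exposure (chip gaps, masked point sources)
    # contribute nothing, so they reduce the integral automatically.
    return float(emap[region_mask].sum() * pixel_area)

# Toy data: uniform exposure map with one fully masked column.
emap = np.full((4, 4), 100.0)
emap[:, 2] = 0.0
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:4] = True            # 6 pixels, 2 of which have zero exposure

emap_integral = integrate_exposure_map(emap, mask, pixel_area=0.25)
geometric_area = float(mask.sum() * 0.25)
```

In this toy case the region has a geometric area of 1.5 area units, but only four of its six pixels carry exposure, so the integral reflects the reduced depth that a purely geometric normalization would miss.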
Given that the ARF produced by mkwarf is the only convenient representation we have for the energy-dependence of the response, a reasonable approach would seem to be to choose any scaling for that ARF and/or an EXPOSURE time and/or a size normalization value such that in the end our extracted spectrum is normalized by the integral of the exposure map over the extraction region, $\int\!\!\int_{\rm region} \mathrm{emap}(x,y)\,dx\,dy$, at the mono-energy of the exposure map, $E_0$.
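Thus, AE chooses to compute a ``surface brightness ARF'' (designated by the subscript SB) by scaling the ARF produced by mkwarf as follows:
\[
  \mathrm{ARF}_{SB}(E) = \mathrm{ARF}(E) \,
  \frac{\int\!\!\int_{\rm region} \mathrm{emap}(x,y)\,dx\,dy}{\mathrm{ARF}(E_0) \times \mathrm{EXPOSURE}} ,
\]
where $E_0$ is the mono-energy of the exposure map and the integral is taken over the extraction region.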
The units of the ARF thus acquire a factor of sky area; we have in effect multiplied the ARF by a region size quantity that is appropriate for the algorithm mkwarf used to compute that ARF.
All flux and luminosity quantities derived from XSPEC should then be understood to be per unit of sky area, i.e. surface brightness quantities.
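The scaling described above can be sketched numerically. This is an illustration under an explicit assumption, not AE's code: the scale factor is chosen so that the scaled ARF times the EXPOSURE time equals the exposure map integral at the exposure map's mono-energy. The function name `surface_brightness_arf` and its arguments are hypothetical, and the ARF value at that energy is obtained here by simple linear interpolation.

```python
import numpy as np

def surface_brightness_arf(energy, arf, emap_integral, exposure, emap_energy):
    """Scale an ARF so that ARF_SB(E0) * exposure == emap_integral.

    energy        : 1-D array of ARF energies (increasing)
    arf           : 1-D array of ARF values at those energies
    emap_integral : integral of the exposure map over the region
    exposure      : EXPOSURE time of the extraction
    emap_energy   : mono-energy E0 at which the exposure map was made

    Hypothetical helper for illustration; not part of AE or CIAO.
    """
    energy = np.asarray(energy, dtype=float)
    arf = np.asarray(arf, dtype=float)
    arf_at_e0 = float(np.interp(emap_energy, energy, arf))
    if arf_at_e0 <= 0.0 or exposure <= 0.0:
        raise ValueError("ARF(E0) and exposure must be positive")
    # The scale factor carries units of sky area.
    scale = emap_integral / (arf_at_e0 * exposure)
    return arf * scale, scale

energy = np.array([0.5, 1.0, 2.0, 4.0])
arf = np.array([200.0, 400.0, 300.0, 100.0])
arf_sb, scale = surface_brightness_arf(
    energy, arf, emap_integral=6.0e6, exposure=1.0e4, emap_energy=1.0
)
```

With these toy numbers the scale factor is 1.5 area units, and the scaled ARF at the exposure map energy times the exposure time recovers the exposure map integral, while the energy dependence of the original ARF is preserved.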
The conversion of each extraction's calibration to surface brightness units, described above, is essential when multiple observations are to be combined. Since the observatory's spatial variation in response (the exposure map) in the extraction region can be quite different for each observation, it would obviously be inappropriate (or at least very confusing) to try to merge multiple extractions that lack any size normalization, and then to try to define a single extraction region ``size'' that normalizes the spectral model.
If instead one incorporates the region size concept into each extraction's calibration, then when multiple observations are merged one is adding ``apples to apples''. There is a clear analogy between this practice and the way we handle PSF fractions when multiple point source extractions are merged; in that case since each extraction can have a different PSF fraction, AE chooses to scale each observation's ARF by its PSF fraction.
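To illustrate why per-extraction size normalization makes merging straightforward, the sketch below combines several surface brightness ARFs into one. The merging rule is an assumption made for illustration (an exposure-weighted average of the ARFs, with the exposure times summed); the text above does not specify AE's merging algorithm, and the function name `merge_sb_arfs` is hypothetical.

```python
import numpy as np

def merge_sb_arfs(arfs_sb, exposures):
    """Exposure-weighted average of per-observation surface brightness ARFs.

    arfs_sb   : array of shape (n_obs, n_energy), each row already scaled
                to include that observation's region size normalization
    exposures : array of shape (n_obs,), EXPOSURE time of each extraction

    Assumed merging rule for illustration only; not taken from AE.
    """
    arfs_sb = np.asarray(arfs_sb, dtype=float)
    exposures = np.asarray(exposures, dtype=float)
    if arfs_sb.ndim != 2 or exposures.ndim != 1:
        raise ValueError("expected a 2-D ARF array and 1-D exposures")
    if arfs_sb.shape[0] != exposures.shape[0]:
        raise ValueError("one exposure time is required per ARF")
    total_exposure = float(exposures.sum())
    if total_exposure <= 0.0:
        raise ValueError("total exposure must be positive")
    # Weight each ARF by its exposure so that merged_arf * total_exposure
    # equals the sum of the individual ARF_SB * EXPOSURE products.
    merged_arf = (arfs_sb * exposures[:, None]).sum(axis=0) / total_exposure
    return merged_arf, total_exposure

arfs = [[300.0, 600.0], [100.0, 200.0]]
exps = [1.0e4, 3.0e4]
merged_arf, total_exposure = merge_sb_arfs(arfs, exps)
```

Because each input ARF already carries its own region size factor, no single region ``size'' has to be defined for the merged spectrum.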