Noise Threshold for HMI Full-disk Disambiguation

Goal: Construct a mask array which includes only pixels with a well-measured transverse field, while excluding only pixels with a noisy transverse field and isolated pixels with a well-measured transverse field.

All the examples shown here were provided by Yang for 2011.05.09_07:00:00_TAI. I believe the inversion code used to generate these does not include Rebecca's fix from roughly 2012.02.13. According to Yang, the median and rms were computed excluding pixels which were in active region patches, and the fits he provided were made with Chebyshev polynomials.

Before considering the threshold, first consider some properties of the median, the fits to the median, and the rms.

The median (left) and the 15th order fit to the median (middle), scaled between 40 and 180, and the rms (right), scaled between 20 and 80.

Slice across the disk at y=1024 (top) and y=2048 (bottom), showing the median (black), the 15th order fit to the median (blue), the 5th order fit to the median (green) and the rms (purple).

All of these show variations which are not simply a function of the distance from disk center. Subjectively, the 15th order fit appears to capture the variations quite well, whereas the 5th order fit misses some of the variations.
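To make the comparison between fit orders concrete, here is a minimal sketch of a two-dimensional Chebyshev least-squares fit in NumPy. It is not the code Yang used: the function name `cheb_fit_2d`, the `valid` argument (standing in for the exclusion of active region pixels) and the choice of a full tensor-product basis of the given order in each direction are all assumptions made for illustration.

```python
import numpy as np
from numpy.polynomial import chebyshev as C


def cheb_fit_2d(image, order, valid=None):
    """Least-squares 2D Chebyshev fit to `image` (hypothetical helper).

    Only pixels where `valid` is True constrain the fit; the fitted
    surface is evaluated on the full grid and returned.
    """
    ny, nx = image.shape
    # Map pixel coordinates onto [-1, 1], the natural Chebyshev domain.
    y, x = np.meshgrid(np.linspace(-1, 1, ny), np.linspace(-1, 1, nx),
                       indexing="ij")
    if valid is None:
        valid = np.isfinite(image)
    # Design matrix with one column per product T_i(x) * T_j(y).
    # Assumption: tensor-product basis, degree `order` in each direction.
    design = C.chebvander2d(x[valid], y[valid], [order, order])
    coeffs, *_ = np.linalg.lstsq(design, image[valid], rcond=None)
    coeffs = coeffs.reshape(order + 1, order + 1)
    return C.chebval2d(x, y, coeffs)
```

Calling `cheb_fit_2d(median, 15, valid)` and `cheb_fit_2d(median, 5, valid)` on the same image and differencing the results would show where the lower order fit misses structure. On a full 4096x4096 image the design matrix is large, so fitting on a rebinned image is probably more practical.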

In both the images and the slices, it can be seen that the median is generally smaller immediately to the left of disk center than immediately to the right of disk center, while the opposite is true closer to the limb. It can also be seen (note the different scale for the rms image) that the rms is smaller than the median, usually by at least a factor of two, although by somewhat less near x=1500. The amplitude of the large scale variations in the rms is also substantially smaller than that of the large scale variations in the median.

Scatter plot of the median versus the rms. The correlation coefficient is +0.24. My guess is that this mainly reflects the center to limb variation, where both typically increase, at least close to the limb. There is also a tail to large values of the rms for median values around 100, but it occurs at a relatively constant value of the median.
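One way to check the guess that the correlation is mostly a center to limb effect is to compute the correlation coefficient separately in annuli of fractional disk radius: if the trend is mainly radial, the correlation within a narrow annulus should be weaker than over the whole disk. The sketch below assumes the disk center and radius in pixels are known; the function name and the annulus edges are hypothetical choices, not values taken from the analysis above.

```python
import numpy as np


def radial_correlations(median, rms, center, radius,
                        edges=(0.0, 0.5, 0.8, 0.95)):
    """Pearson correlation of median vs. rms, on the disk and per annulus.

    `center` is (x0, y0) in pixels and `radius` is the disk radius in
    pixels; both are assumed inputs. `edges` are fractional radii.
    """
    ny, nx = median.shape
    y, x = np.mgrid[0:ny, 0:nx]
    r = np.hypot(x - center[0], y - center[1]) / radius

    result = {}
    on_disk = r < edges[-1]
    result["disk"] = float(np.corrcoef(median[on_disk], rms[on_disk])[0, 1])
    for lo, hi in zip(edges[:-1], edges[1:]):
        ring = (r >= lo) & (r < hi)
        result[(lo, hi)] = float(np.corrcoef(median[ring], rms[ring])[0, 1])
    return result
```

The returned dictionary has one entry keyed `"disk"` and one per annulus keyed by its `(lo, hi)` edges.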

Now consider using the median, the fit to the median and the rms to determine a threshold for the disambiguation module. All the images which follow show the line of sight component of the field, saturated at +/-500G. Contours indicate possible mask arrays defining which pixels to disambiguate using simulated annealing. The left column shows blue contours of the chosen threshold itself, while the middle and right columns show red contours of the mask array after eroding by 1 pixel, which eliminates isolated pixels in the mask. The annealing algorithm cannot properly treat isolated pixels, as it uses finite differences to calculate derivatives. The right column shows the subarea covering pixels 2000:3000 and 1000:2000.
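The thresholding and 1 pixel erosion step can be sketched as follows. This is an illustration in plain NumPy, not the disambiguation module's own code: the function names are hypothetical, and the 4-connected (cross-shaped) structuring element is an assumption, since the notes do not say which neighborhood the erosion uses.

```python
import numpy as np


def erode_one_pixel(mask):
    """Binary erosion by one pixel with a 4-connected (cross) element.

    A pixel survives only if it and its four edge neighbors are all set.
    Pixels outside the array are treated as unset.
    """
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    return (padded[1:-1, 1:-1]
            & padded[:-2, 1:-1] & padded[2:, 1:-1]
            & padded[1:-1, :-2] & padded[1:-1, 2:])


def build_mask(field, threshold):
    """Mask of pixels above `threshold`, with isolated pixels removed.

    `threshold` may be a scalar or an image of the same shape as `field`.
    """
    return erode_one_pixel(field > threshold)
```

Erosion removes isolated pixels but also strips the outer ring of every larger patch, which is why some well-measured pixels at patch edges drop out of the eroded masks.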

Multiples of the median field strength.

Using the median itself as the threshold (top row, left) includes many noise-dominated pixels in the mask. After eroding (top row, middle), there is still a (weak) large-scale variation present, possibly due to the variations in the RMS, as pointed out by Jesper. In detail (top row, right), many pixels likely dominated by noise can still be seen within the mask.

Using twice the median as the threshold (middle row, left) shows the large-scale variation, but after eroding (middle row, middle), there is very little indication of it remaining. However, there appear to be some pixels excluded by the eroded mask which may be well-measured (middle row, right).

Using three times the median as the threshold (bottom row, left) shows a weaker large-scale variation. After eroding (bottom row, middle), it is no longer evident. However, there appear to be quite a few pixels excluded by the eroded mask which may be well-measured (bottom row, right).

Multiples of the fit to the median.

Repeat the above analysis using the 15th order fit to the median. For a given multiple of the fit to the median, the large scale variation is less pronounced than when using the median itself.

Sum of fit to the median and RMS.

Repeat using the sum of the fit to the median and the rms. Before eroding, this is similar to using just the median, which is consistent with the typical amplitude of the rms being smaller than the typical amplitude of the median. After eroding, this is similar to eroding twice the median.

Fit to the median plus a constant.

To determine whether the variations in the rms are important, repeat using the sum of the fit to the median and the constant value 50. Before and after eroding, the results are similar to those obtained using the sum of the fit to the median and the rms.
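The candidate thresholds considered so far can be collected in one place for a side-by-side comparison. The sketch below is only a bookkeeping aid under assumed names: `candidate_thresholds` and `fraction_selected` are hypothetical helpers, and the fraction of pixels above each threshold (before any erosion) is a crude summary, not a replacement for inspecting the contours.

```python
import numpy as np


def candidate_thresholds(fit, rms, constant=50.0):
    """Threshold images for the variants discussed in these notes.

    `fit` is the fit to the median and `rms` the rms image; `constant`
    defaults to the value 50 used above.
    """
    return {
        "1 x fit": fit,
        "2 x fit": 2.0 * fit,
        "3 x fit": 3.0 * fit,
        "fit + rms": fit + rms,
        "fit + const": fit + constant,
    }


def fraction_selected(field, thresholds):
    """Fraction of pixels whose field strength exceeds each threshold."""
    return {name: float(np.mean(field > thr))
            for name, thr in thresholds.items()}
```

Each threshold image from `candidate_thresholds` can then be passed, together with the field, to whatever masking and erosion step is in use.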

Summary:

Using a fit to the median is better than using the median itself. (Why is there so much variation from one pixel to the next in the median? What was the maximum sample size for computing the median?)
Using the median plus a random noise estimate (either the rms or a simple constant) works better than using a multiple of the median: the result does not obviously show the large scale variations, and, after eroding, it appears to include small areas of well-measured field better than a multiple of the fit to the median alone. (Would using the variance of the field returned by the inversion work for the noise estimate?)
I think this should be repeated for a fit to the field at a single time, rather than a fit to the median, but the fit needs to properly account for strong field areas. If this works, we would not have to rely on long time intervals to compute the median, with an implicit assumption that there are no slow changes in the large scale variations. See below for some results of this.
This needs to be repeated for many more times (values of VR), using the output from the finalized inversion code.
I need to look more carefully at different areas on the disk, to get a handle on how well (or poorly) the threshold is capturing small areas of well-measured field.
I need to look at whether the results for the transverse field are significantly different than the results for the total field shown here.

Comparison of Fits

The following plots compare the 15th order fit to the median (blue) with the 15th order fit to the field itself, excluding active region patches (red) and excluding active region "blobs" (purple), overlaid on the median (first three images) or the field itself (fourth image). The first four plots are slices at y=1024, y=1400, y=2048 and y=1400; the last image is the difference between the fits, scaled to +/-20, overlaid with contours of the field at 500 and 1000.

For the most part, the single time fit is within 10 of the fit to the median, but there are some noticeable differences close to the strong field. Perhaps we need to adjust how the strong field areas are removed?

There are also some clear differences very close to the limb, with the single time fit approaching larger values at the limb. This is probably not important for the disambiguation, as I think these pixels are too close to the limb to be disambiguated using simulated annealing.

Single time fit plus constant

The following images are the results of using the fit to the field at a single time plus 50 to construct the mask array. The results are generally similar to using the fit to the median.