Addressing Variability in Derived Parameters
By Elijah Bernstein-Cooper, July 20, 2015.

Variability Between Code Edits

Within a week of editing the masking code, the derived parameters have varied wildly between versions of the code. Specifically, during an iteration with the Lee+12 data (results shown here), the derived width for Perseus was about km/s, whereas in the most recent post the derived width is on the order of km/s.

Explanation of Variability

The difference between these two calculations of the parameters stems from a subtle change in the progression of masking: the order in which pixels are masked can strongly affect the calculated parameters. After some consideration, this is the most intuitive masking progression I could think of:

  1. Mask all but the faintest 10% of pixels.

  2. Calculate a best-fit DGR and intercept for the unmasked pixels.

  3. Derive the residuals (data - model) and fit a Gaussian to the residual distribution (in mag). Mask pixels whose residuals exceed a multiple of the Gaussian's standard deviation.

  4. Increase the fraction of pixels to be unmasked by 1%. Any pixels masked in step 3 remain masked; only pixels not masked in step 3 can be unmasked in this step.

  5. Repeat steps 1 through 4 until the DGR changes by only a few percent between iterations.
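The progression above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the cloudpy implementation: the function name `iterative_mask`, its parameters, the 3-sigma threshold, and the 3% convergence tolerance are hypothetical placeholders. The Gaussian is estimated from the mean and standard deviation of the unmasked residuals (its maximum-likelihood fit) rather than fitted to a histogram, and only positive residuals (dust in excess of the model) are masked, which is also an assumption.

```python
import numpy as np


def iterative_mask(av, nhi, init_frac=0.10, frac_step=0.01,
                   n_sigma=3.0, tol=0.03, max_iter=200):
    """Hypothetical sketch of masking steps 1-5.

    av  : 1D array of dust values per pixel (e.g. A_V in mag)
    nhi : 1D array of the gas tracer per pixel (e.g. N(HI)), same length
    Returns (dgr, intercept, use); `use` marks the pixels in the final fit.
    """
    n_pix = av.size
    order = np.argsort(av)                   # pixel indices, faintest first
    rejected = np.zeros(n_pix, dtype=bool)   # step-3 rejections; never unmasked
    frac, dgr_prev = init_frac, None

    for _ in range(max_iter):
        # Steps 1 and 4: unmask the faintest `frac` of pixels, minus rejections
        faint = np.zeros(n_pix, dtype=bool)
        faint[order[:int(round(frac * n_pix))]] = True
        use = faint & ~rejected

        # Step 2: least-squares fit of av = dgr * nhi + intercept
        dgr, intercept = np.polyfit(nhi[use], av[use], 1)

        # Step 3: residuals (data - model) and their Gaussian mean and width
        resid = av - (dgr * nhi + intercept)
        mu, sigma = resid[use].mean(), resid[use].std()
        # Assumption: reject only positive outliers (excess dust)
        rejected |= use & (resid - mu > n_sigma * sigma)

        # Step 5: stop once the fractional change in DGR is below `tol`
        if dgr_prev is not None and abs(dgr - dgr_prev) < tol * abs(dgr_prev):
            break
        dgr_prev = dgr
        frac = min(frac + frac_step, 1.0)

    return dgr, intercept, use
```

Because rejected pixels are tracked in a separate array that is only ever added to, they stay masked as the faint fraction grows, which is the behavior step 4 calls for.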

The difference in the code between the two versions of cloudpy can be found here.

Likelihood Results

We should compare these results with the parameters derived from the same Lee+12 data in the earlier post. The following likelihoods are from a run using the new code and the same data. The larger width is found again.


Figure 1. - Perseus likelihoods.



Figure 2. - California likelihoods. The tiny width is worrying. In an attempt to address this strange behavior, I performed a 2D background fit on California, outlined in this post. The narrow width arises because the model finds that a high intercept fits best, while only a small component of the emission correlates with the dust. In other words, the model infers that little of the dust is associated with the clouds.
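To make the intercept argument concrete, assume the linear model behind step 2 of the masking procedure, writing the dust map as A_V and the H I column density integrated over a velocity range of width Δ_V (this notation is introduced here for illustration, not taken from cloudpy):

```latex
A_V = \mathrm{DGR}\, N(\mathrm{H\,I};\, \Delta_V) + I
```

For a fixed A_V map, a large intercept I absorbs most of the dust as a constant, so only the term DGR · N(H I; Δ_V) has to follow the spatial structure of the dust. A narrow Δ_V, which keeps only the small component of the emission that correlates with the dust, can then fit best.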

The unmasked region used to fit the model is the diffuse, low- south-east region of California. See later in the post for the progression of masks.



Figure 3. - California likelihoods with a background subtraction. This looks pretty bad, and I am not sure what is going on here.



Figure 4. - Taurus likelihoods. The model favors including the entire line of sight toward Taurus. This seems reasonable, though it is completely different from what we were finding earlier.


Masking Results

Below is a progression of masked residual maps and residual histograms for each iteration. ‘Mask iter’ refers to each iteration of the masking, and ‘parent iter’ refers to an entire run through the masking and the MLE calculation. The first mask, ‘mask iter = 0’, should be the same for every parent iteration, since it contains only the faintest 10% of the pixels without any other masking applied.


Figure 5. - Perseus masks.



Figure 6. - California masks.



Figure 7. - Taurus masks.


Mask Convergence

We are testing the convergence of the DGR during the masking outlined in the last post.


Figure 8. - Perseus parameter convergences.



Figure 9. - California parameter convergences. The DGRs are leveling off, as we would expect.



Figure 10. - Taurus parameter convergences. Something is obviously wrong here. The width in the second iteration, km/s, should yield the same results as the first iteration, since there is no difference between them. This is a bug.


Multiprocessing

The likelihood calculation now uses a multiprocessing framework, which speeds up the calculation by around 50% per additional CPU. See this version of cloudpy. Bip has 12 CPUs, so the speed increase is notable.
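A minimal sketch of how a likelihood grid can be spread across CPUs with Python's standard `multiprocessing.Pool`. The `log_likelihood` function here is a hypothetical toy stand-in (a Gaussian peaked at an arbitrary 20 km/s), not the cloudpy likelihood, and the grid covers only the width for brevity.

```python
from multiprocessing import Pool

import numpy as np


def log_likelihood(width):
    """Hypothetical stand-in for one grid point of the likelihood.

    A real version would build the model for this width, compare it with
    the unmasked data, and return the log-likelihood. This toy Gaussian,
    peaked at 20 km/s, only lets the sketch run on its own.
    """
    return -0.5 * ((width - 20.0) / 5.0) ** 2


def likelihood_grid(widths, processes=None):
    """Evaluate log_likelihood at every width in the grid.

    processes=None lets Pool start os.cpu_count() workers; processes=1
    runs serially, which is handy for debugging and timing comparisons.
    """
    widths = list(widths)
    if processes == 1:
        return np.array([log_likelihood(w) for w in widths])
    with Pool(processes=processes) as pool:
        # Pool.map returns results in the same order as its inputs
        return np.array(pool.map(log_likelihood, widths))


if __name__ == "__main__":
    widths = np.arange(1.0, 51.0, 1.0)   # illustrative grid in km/s
    serial = likelihood_grid(widths, processes=1)
    parallel = likelihood_grid(widths)
    assert np.allclose(serial, parallel)
    print("best width:", widths[np.argmax(parallel)])
```

Each grid point is independent, so the work splits cleanly across processes, and because `Pool.map` preserves input order the parallel result can be reshaped exactly like the serial one. Keeping the `processes=1` path makes it easy to confirm that both versions agree.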