Refactor/rewrite of extract1d #8802

stscijgbot-jp · 2024-09-18T18:28:03Z

Issue JP-3753 was created on JIRA by Timothy Brandt:

The extract1d software is slow due to a reliance on lists and explicit for loops, it can be difficult to read, and it does not perform optimal extraction. The refactoring described in this ticket will use efficient numpy arrays and linear algebra to preserve the current behavior, improve background fitting and error estimates, and provide functionality for optimal and PSF-based extraction if the appropriate templates are supplied as arrays.

Extract1d will, by default, operate on the same reference file information as it does currently and will produce similar, though not identical, behavior (mostly in the treatment of backgrounds). In the future PSF-based and optimal extraction approaches are planned to be offered (see https://jira.stsci.edu/browse/JP-251).

stscijgbot-jp · 2024-09-24T19:23:04Z

Comment by Timothy Brandt on JIRA:

I made progress on this during my weekend travel. I am attaching a notebook with an initial implementation of a new extract1d approach with most functionality implemented. I would like anyone interested to take a look at this for general comments. There are a couple of things I would like to discuss:

-With a weighted or optimal-type extraction, it is no longer straightforward to keep read noise and photon noise separate. I would like to discuss what to do.- [edit] T{}his is incorrect: it is relatively straightforward to do so.{}

There is a bias in optimal or weighted extractions that it would be very nice to avoid. The notebook includes a demonstration of this bias at the end with a sketch of a suggested approach to avoid it.

More generally, I would like comments on whether this includes all of the basic desired functionality, and whether the code structure is generally satisfactory.[^extract1d_framework.ipynb]

stscijgbot-jp · 2024-09-25T20:38:05Z

Comment by Melanie Clarke on JIRA:

Thanks for the update Timothy Brandt - I'll start taking a look at what you have so far.

stscijgbot-jp · 2024-09-27T19:18:04Z

Comment by Timothy Brandt on JIRA:

Updated notebook to reflect distinction between read noise, photon noise, flat field noise. Updated background noise calculations.[^extract1d_framework 9-27.ipynb]

stscijgbot-jp · 2024-09-30T14:58:05Z

Comment by Timothy Brandt on JIRA:

One more update to implement de-biasing approach to inverse variance weighting.

[^extract1d_framework 9-29.ipynb]

stscijgbot-jp · 2024-09-30T20:38:06Z

Comment by Melanie Clarke on JIRA:

Thanks again, Timothy Brandt . I think I now mostly understand your approach.

It looks like what would be needed, in addition to this engine, is a method to make the 2D spatial profiles, either from a model for PSF-based fitting, or from input parameters/references specifying the box/background location(s). I think Jane is working on the PSF end of that in JP-251, but we may need some additional effort to support the current use case, as well as integration with the step's front end. The format you’re expecting for the input profiles looks fine to me, anyway.

A few comments and questions...

For data handling and formatting:

I think we should default to ignoring NaNs in box extractions, to replicate the current behavior.
Missing/unfittable data in optimal extraction should probably be set to NaN instead of zero in the output.
We will probably still want to output separate variances for read noise, poisson noise, and flat variance components, to replicate current behavior.
For box extractions, it might be useful to update the model returned with a uniform flux in the extraction region, so it’s not all zeros.

Questions about your assumptions:

Why not allow multiple source extractions with the box method? If they don’t overlap, it seems a reasonable thing to do.
What happens with the matrix inversions in the fitted case if the spectral image is large, e.g. NIRSpec ALLSLITS or MOS used as long slit? Is that solution still workable, performance-wise, or might we need a different fitting strategy?

For the debiasing approach for inverse variance weighting… I’m a little skeptical about how well your proposal will work. It seems error prone, given the multiple levels of fitting to the same input data, and given that read noise variance is not necessarily independent of the input flux either. But I agree we should discuss more – it might be worth trying.

stscijgbot-jp · 2024-09-30T21:23:05Z

Comment by Timothy Brandt on JIRA:

Thank you, Melanie Clarke. Your suggestions all seem reasonable and straightforward to implement. And yes, the (difficult) task is to actually derive these spatial profiles. I can do this for the current case of polynomial interval boundaries once we are happy with this end of the implementation, since that it what should play nicely with future use cases. For your questions:

Multiple sources with box extraction will be fine and unproblematic as long as they are well-separated. It will be straightforward to generalize that step. We could add a check that they are indeed non-overlapping (np.prod(np.array(profiles_2d), axis=0) == 0), with either a warning or an error if they do overlap.

The matrix inversions will be fine, since the matrices to invert have a size corresponding to the number of parameters to fit. For a given column, that should never be more than a few. The more expensive part is in the matrix multiplications to create the design matrix. Even here, it scales linearly with the size of the region to extract. If you are fitting a 2048x2048 matrix row-by-row, the total cost should be no more than a couple of seconds. You can check this cost by increasing npixels_crossdisperse; it won't be problematic.

And for the debiasing, I agree that my approach is a bit too simplistic. I think it's probably much better than just using inverse-variance weighting (as is often implied in a PSF-based approach), but is far from perfect. I think that the best approach, which would be very nearly correct, would probably rely on the ramp fitting itself, finding the photon and read noise that would have been reported if the count rate matched the model count rate. I would favor leaving in something like I have now as a placeholder for doing things properly.

stscijgbot-jp · 2024-10-01T14:33:04Z

Comment by Melanie Clarke on JIRA:

Great, thanks - that all sounds good! I can help with the integration when we are ready to put all the pieces together.

stscijgbot-jp · 2024-10-31T20:13:05Z

Comment by Timothy Brandt on JIRA:

Ok, after some delays, I think this is basically ready to go. The 10/31 version of the notebook includes the extract1d functions that produce output of a form very close to the current extract1d. The input to these routines will need to change: the list of polynomial coefficients should be replaced with one or more 2D profiles, one for each source to be extracted.[^extract1d_framework 10-31.ipynb]

stscijgbot-jp · 2024-10-31T20:23:04Z

Comment by Melanie Clarke on JIRA:

Thanks Timothy Brandt - I'll start work on integration.

stscijgbot-jp · 2024-11-05T19:03:07Z

Comment by Melanie Clarke on JIRA:

David Law, Tyler Pauly - there is currently some support in extract_1d for a reference file in FITS format, that specifies an extraction region as an image, where values > 0 indicate a source region and values < 0 indicate a background region. Test coverage for this extraction mode is pretty minimal, and there are no current reference files in CRDS in FITS format. This style of reference file is not documented in the pipeline readthedocs.

The implementation for extraction from image reference files is separate and parallel to the more standard extraction from JSON reference files, and since it is minimally covered by tests and used rarely, I suspect it may be be buggy and out of sync.

In the current refactor, do we need to port forward support for FITS images for extraction aperture definition, or can we retire it?

stscijgbot-jp · 2024-11-05T19:23:06Z

Comment by Tyler Pauly on JIRA:

My vote would be to retire it; it can be re-implemented later if desired, but I have no indication that the current version sees use.

stscijgbot-jp added the Software Affected: extract_1d label Sep 18, 2024

stscijgbot-jp added the team-chartreuse label Sep 30, 2024

stscijgbot-jp added the performance-improvements label Oct 17, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor/rewrite of extract1d #8802

Refactor/rewrite of extract1d #8802

stscijgbot-jp commented Sep 18, 2024

stscijgbot-jp commented Sep 24, 2024 •

edited

Loading

stscijgbot-jp commented Sep 25, 2024

stscijgbot-jp commented Sep 27, 2024

stscijgbot-jp commented Sep 30, 2024

stscijgbot-jp commented Sep 30, 2024

stscijgbot-jp commented Sep 30, 2024

stscijgbot-jp commented Oct 1, 2024

stscijgbot-jp commented Oct 31, 2024

stscijgbot-jp commented Oct 31, 2024

stscijgbot-jp commented Nov 5, 2024

stscijgbot-jp commented Nov 5, 2024

Refactor/rewrite of extract1d #8802

Refactor/rewrite of extract1d #8802

Comments

stscijgbot-jp commented Sep 18, 2024

stscijgbot-jp commented Sep 24, 2024 • edited Loading

-With a weighted or optimal-type extraction, it is no longer straightforward to keep read noise and photon noise separate. I would like to discuss what to do.- [edit] T{}his is incorrect: it is relatively straightforward to do so.{}

There is a bias in optimal or weighted extractions that it would be very nice to avoid. The notebook includes a demonstration of this bias at the end with a sketch of a suggested approach to avoid it.

stscijgbot-jp commented Sep 25, 2024

stscijgbot-jp commented Sep 27, 2024

stscijgbot-jp commented Sep 30, 2024

stscijgbot-jp commented Sep 30, 2024

stscijgbot-jp commented Sep 30, 2024

stscijgbot-jp commented Oct 1, 2024

stscijgbot-jp commented Oct 31, 2024

stscijgbot-jp commented Oct 31, 2024

stscijgbot-jp commented Nov 5, 2024

stscijgbot-jp commented Nov 5, 2024

stscijgbot-jp commented Sep 24, 2024 •

edited

Loading