-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add projections schema to start doc #179
Conversation
@tacaswell and @danielballan, I debated adding to #130 vs. a whole new PR, especially since what we came up with was so similar. But if it was better in Tom's PR, then I suppose it's just one commit away. |
event_model/tests/test_em.py
Outdated
"version": "42.0.0", | ||
"configuration": {}, | ||
"projection": { | ||
'Entry/instrument/detector/data': { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure we want to inherit the "full path" from nexus, but should be looking at what the keys in the NXData
groups are.
My understanding is that the "insturment/..."
paths in nexus are directly analogous to the bodies of these projections and are still instrument / beam line / facility specific where as we want to map to the NXData definitions which re-map the devices specific mappings to a technique specific mapping.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. But I was assuming that the job of this schema was to be able to project from places in the document model (stream/field/etc) to a uniquely-identified spot in a vocabulary, and though NeXuS is what is probably desired, we needed to be open. My example in the test was a bit of a trial balloon on how the mapping might look for NeXuS, but once projections are adopted in event_model, there will still have to be some sort of coordination about string-to-NeXuS field conventions. With NeXuS being XML-ish, I happened to slip into an XPath-ish syntax.
If we want to be more explicit about NeXuS pathing here, that would be more limiting but save some effort in coordination.
event_model/tests/test_em.py
Outdated
"version": "42.0.0", | ||
"configuration": {}, | ||
"projection": { | ||
'Entry/instrument/detector/data': { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/entry/instrument/detector/data
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@prjemian ....does this reflect a consensus about how serialize pointers into fields in a NeXuS document? Are there other documented rules? I would love to make this unit test more comprehensive if there are.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I know how to answer that. So, my answer has become verbose.
First off, the key of the projection dictionary is, I assume, the HDF5 address of the specific item in the NeXus file. The NeXus naming convention is to use lower case. I prepended a /
since this should be an absolute address in the file. Following NeXus convention for picking default names of groups [1], the path to a dataset /entry/instrument/detector/data
names the data
dataset of the NXdetector group (child of the NXinstrument group which is child of the NXentry group which is child of the file root).
[1] default group names in NeXus: For a hypothetical base class of NXzzzz
, the default group would be named zzzz
or zzzz1
, zzzz2
if there are several such. In some NeXus classes, a specific name is required. Mostly in the base classes, the name is more flexible. Use the default naming unless you have a good reason to choose otherwise.
An example is shown in the documentation for a NeXus filewriter callback I've written for the APS. The example shows the raw data stored under /entry/instrument/bluesky
. NeXus-style links position this data at various places in the NeXus structure satisfying the expectations of the NeXus base class structure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So our projection keys would point into the conventional NeXus base class structures. A good test would be:
- ingest from a NeXus file into databroker with declared projections
nxwriter = apstools.filewriters.NXWriter()
for key, doc in db[-1]: nxwriter.receiver(key, doc)
- compare the new NeXus file with the original
The |
I changed compose_run to actually allow a projection to be written to the run_start document. @jklynch While adding a test, moved stuff out of test_em.py and into new test_projections.py |
event_model/__init__.py
Outdated
@@ -1625,7 +1625,7 @@ def compose_descriptor(*, start, streams, event_counter, | |||
partial(compose_event_page, descriptor=doc, event_counter=event_counter)) | |||
|
|||
|
|||
def compose_run(*, uid=None, time=None, metadata=None, validate=True): | |||
def compose_run(*, uid=None, time=None, metadata=None, validate=True, projections=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All other non-required keys enter in through metadata
. I think we should either add all of them to the signature or leave projections
out of the signature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll back out that change, let you add a projection to metadata.
I added calculation fields. There is a corresponding PR for a projector in databroker that uses these projections: bluesky/databroker#585 |
Andi points out a concern that the the start document is already getting big at some beamlines, and that adding this to it might get in the way of searching. |
@dylanmcreynolds Please add mention of this to the documentation, probably in this section, https://blueskyproject.io/event-model/data-model.html#run-start-document and note clearly that this is experimental. Note: On today's pilot call, we agree to merge this (marked "experimental") by a week from today. |
Committed a brief note there. |
We agreed to merge this in order to lower the barrier for people to play with it, on the understanding that we will very likely make backward-incompatible changes as we gain more experience with it. Just waiting flake8 fixes, then good to go! 🚀 |
I bumped the version of jsonschema from 2 to 3 in the travis run that pegs jsonschema version and it runs. I don't know if this is appropriate or if it's better to remove that test run. @tacaswell , @danielballan ? |
50be570
to
2ca0aec
Compare
[Already covered this on Slack but re-summarizing for the public record:] The build that pins |
Description
This is a take on another PR related to "techniques", which turned into "projections". #130. I considered trying to commit to that PR, but this seemed cleaner, even though 130 has a lot of critical discussion in it. This really isn't very different from that.
Motivation and Context
I won't got into too much detail here, but the basic thought was to address the need for mapping from a beamline's ontology to one ore more ontologies. I had in mind a case where a beamline is doing a scattering experiment and wants to provide the map to downstream users/application about how to map fields in NXsas fields. with that in mind, I had a few things in mind in the design:
I took no position on the names of Projections. For this all to work in, say, the NeXuS case, there will still need to be some sort of agreed format for specifying the fields to do this mapping. But I felt that that "unique string" was all we need for this (uniqueness being enforced by being property names in the Projection object.
EDIT: I also updated compose_run to allow for projections to be written to the root of a run_start doc.
How Has This Been Tested?
Added new test_projecitonpy including at least one failing case (which exposed a nice bug in my initial schema attempt).