Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add projections schema to start doc #179

Merged
merged 13 commits into from
Aug 11, 2020

Conversation

dylanmcreynolds
Copy link
Contributor

@dylanmcreynolds dylanmcreynolds commented Jul 3, 2020

Description

This is a take on another PR related to "techniques", which turned into "projections". #130. I considered trying to commit to that PR, but this seemed cleaner, even though 130 has a lot of critical discussion in it. This really isn't very different from that.

Motivation and Context

I won't got into too much detail here, but the basic thought was to address the need for mapping from a beamline's ontology to one ore more ontologies. I had in mind a case where a beamline is doing a scattering experiment and wants to provide the map to downstream users/application about how to map fields in NXsas fields. with that in mind, I had a few things in mind in the design:

  • A Projection is a mapping to a vocabulary or ontologies.
  • The Projections object is collection of potential projections. No inherent limit to how many a beamline decides to project to.
  • Each Projection is an object with a unique name and a value with fields that map to event_model objects.

I took no position on the names of Projections. For this all to work in, say, the NeXuS case, there will still need to be some sort of agreed format for specifying the fields to do this mapping. But I felt that that "unique string" was all we need for this (uniqueness being enforced by being property names in the Projection object.

EDIT: I also updated compose_run to allow for projections to be written to the root of a run_start doc.

How Has This Been Tested?

Added new test_projecitonpy including at least one failing case (which exposed a nice bug in my initial schema attempt).

@dylanmcreynolds
Copy link
Contributor Author

dylanmcreynolds commented Jul 6, 2020

@tacaswell and @danielballan, I debated adding to #130 vs. a whole new PR, especially since what we came up with was so similar. But if it was better in Tom's PR, then I suppose it's just one commit away.

"version": "42.0.0",
"configuration": {},
"projection": {
'Entry/instrument/detector/data': {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure we want to inherit the "full path" from nexus, but should be looking at what the keys in the NXData groups are.

My understanding is that the "insturment/..." paths in nexus are directly analogous to the bodies of these projections and are still instrument / beam line / facility specific where as we want to map to the NXData definitions which re-map the devices specific mappings to a technique specific mapping.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. But I was assuming that the job of this schema was to be able to project from places in the document model (stream/field/etc) to a uniquely-identified spot in a vocabulary, and though NeXuS is what is probably desired, we needed to be open. My example in the test was a bit of a trial balloon on how the mapping might look for NeXuS, but once projections are adopted in event_model, there will still have to be some sort of coordination about string-to-NeXuS field conventions. With NeXuS being XML-ish, I happened to slip into an XPath-ish syntax.

If we want to be more explicit about NeXuS pathing here, that would be more limiting but save some effort in coordination.

"version": "42.0.0",
"configuration": {},
"projection": {
'Entry/instrument/detector/data': {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/entry/instrument/detector/data

Copy link
Contributor Author

@dylanmcreynolds dylanmcreynolds Jul 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@prjemian ....does this reflect a consensus about how serialize pointers into fields in a NeXuS document? Are there other documented rules? I would love to make this unit test more comprehensive if there are.

Copy link
Contributor

@prjemian prjemian Jul 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I know how to answer that. So, my answer has become verbose.

First off, the key of the projection dictionary is, I assume, the HDF5 address of the specific item in the NeXus file. The NeXus naming convention is to use lower case. I prepended a / since this should be an absolute address in the file. Following NeXus convention for picking default names of groups [1], the path to a dataset /entry/instrument/detector/data names the data dataset of the NXdetector group (child of the NXinstrument group which is child of the NXentry group which is child of the file root).

[1] default group names in NeXus: For a hypothetical base class of NXzzzz, the default group would be named zzzz or zzzz1, zzzz2 if there are several such. In some NeXus classes, a specific name is required. Mostly in the base classes, the name is more flexible. Use the default naming unless you have a good reason to choose otherwise.

An example is shown in the documentation for a NeXus filewriter callback I've written for the APS. The example shows the raw data stored under /entry/instrument/bluesky. NeXus-style links position this data at various places in the NeXus structure satisfying the expectations of the NeXus base class structure.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So our projection keys would point into the conventional NeXus base class structures. A good test would be:

  1. ingest from a NeXus file into databroker with declared projections
  2. nxwriter = apstools.filewriters.NXWriter()
  3. for key, doc in db[-1]: nxwriter.receiver(key, doc)
  4. compare the new NeXus file with the original

@jklynch
Copy link
Contributor

jklynch commented Jul 7, 2020

The test_em.py file is very long. I think projections tests should get their very own test file test_projections.py.

@dylanmcreynolds
Copy link
Contributor Author

I changed compose_run to actually allow a projection to be written to the run_start document.

@jklynch While adding a test, moved stuff out of test_em.py and into new test_projections.py

@@ -1625,7 +1625,7 @@ def compose_descriptor(*, start, streams, event_counter,
partial(compose_event_page, descriptor=doc, event_counter=event_counter))


def compose_run(*, uid=None, time=None, metadata=None, validate=True):
def compose_run(*, uid=None, time=None, metadata=None, validate=True, projections=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All other non-required keys enter in through metadata. I think we should either add all of them to the signature or leave projections out of the signature.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll back out that change, let you add a projection to metadata.

@dylanmcreynolds
Copy link
Contributor Author

I added calculation fields.

There is a corresponding PR for a projector in databroker that uses these projections: bluesky/databroker#585

@dylanmcreynolds
Copy link
Contributor Author

Andi points out a concern that the the start document is already getting big at some beamlines, and that adding this to it might get in the way of searching.

@danielballan
Copy link
Member

@dylanmcreynolds Please add mention of this to the documentation, probably in this section, https://blueskyproject.io/event-model/data-model.html#run-start-document and note clearly that this is experimental.

Note: On today's pilot call, we agree to merge this (marked "experimental") by a week from today.

@dylanmcreynolds
Copy link
Contributor Author

@dylanmcreynolds Please add mention of this to the documentation, probably in this section, https://blueskyproject.io/event-model/data-model.html#run-start-document and note clearly that this is experimental.

Note: On today's pilot call, we agree to merge this (marked "experimental") by a week from today.

Committed a brief note there.

@danielballan
Copy link
Member

We agreed to merge this in order to lower the barrier for people to play with it, on the understanding that we will very likely make backward-incompatible changes as we gain more experience with it.

Just waiting flake8 fixes, then good to go! 🚀

@dylanmcreynolds
Copy link
Contributor Author

I bumped the version of jsonschema from 2 to 3 in the travis run that pegs jsonschema version and it runs. I don't know if this is appropriate or if it's better to remove that test run. @tacaswell , @danielballan ?

@danielballan
Copy link
Member

[Already covered this on Slack but re-summarizing for the public record:]

The build that pins jsonschema==2 is meant to protect against the problem described in #145, and it should stay until we need some feature in a newer version of jsonschema. I force-pushed to remove the commit that altered the CI configuration. It turns out that jsconschema==2 was finding a minor mistake in the schema that was for whatever reason ignored by jsonschema 3.x. Fixed in 2ca0aec.

@danielballan danielballan merged commit 0c29664 into bluesky:master Aug 11, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants