Add projections schema to start doc #179

dylanmcreynolds · 2020-07-03T20:05:51Z

Description

This is a take on another PR related to "techniques", which turned into "projections". #130. I considered trying to commit to that PR, but this seemed cleaner, even though 130 has a lot of critical discussion in it. This really isn't very different from that.

Motivation and Context

I won't got into too much detail here, but the basic thought was to address the need for mapping from a beamline's ontology to one ore more ontologies. I had in mind a case where a beamline is doing a scattering experiment and wants to provide the map to downstream users/application about how to map fields in NXsas fields. with that in mind, I had a few things in mind in the design:

A Projection is a mapping to a vocabulary or ontologies.
The Projections object is collection of potential projections. No inherent limit to how many a beamline decides to project to.
Each Projection is an object with a unique name and a value with fields that map to event_model objects.

I took no position on the names of Projections. For this all to work in, say, the NeXuS case, there will still need to be some sort of agreed format for specifying the fields to do this mapping. But I felt that that "unique string" was all we need for this (uniqueness being enforced by being property names in the Projection object.

EDIT: I also updated compose_run to allow for projections to be written to the root of a run_start doc.

How Has This Been Tested?

Added new test_projecitonpy including at least one failing case (which exposed a nice bug in my initial schema attempt).

dylanmcreynolds · 2020-07-06T16:12:33Z

@tacaswell and @danielballan, I debated adding to #130 vs. a whole new PR, especially since what we came up with was so similar. But if it was better in Tom's PR, then I suppose it's just one commit away.

tacaswell · 2020-07-06T16:38:34Z

event_model/tests/test_em.py

+                "version": "42.0.0",
+                "configuration": {},
+                "projection": {
+                    'Entry/instrument/detector/data': {


I am not sure we want to inherit the "full path" from nexus, but should be looking at what the keys in the NXData groups are.

My understanding is that the "insturment/..." paths in nexus are directly analogous to the bodies of these projections and are still instrument / beam line / facility specific where as we want to map to the NXData definitions which re-map the devices specific mappings to a technique specific mapping.

Right. But I was assuming that the job of this schema was to be able to project from places in the document model (stream/field/etc) to a uniquely-identified spot in a vocabulary, and though NeXuS is what is probably desired, we needed to be open. My example in the test was a bit of a trial balloon on how the mapping might look for NeXuS, but once projections are adopted in event_model, there will still have to be some sort of coordination about string-to-NeXuS field conventions. With NeXuS being XML-ish, I happened to slip into an XPath-ish syntax.

If we want to be more explicit about NeXuS pathing here, that would be more limiting but save some effort in coordination.

prjemian · 2020-07-06T16:51:44Z

event_model/tests/test_em.py

+                "version": "42.0.0",
+                "configuration": {},
+                "projection": {
+                    'Entry/instrument/detector/data': {


/entry/instrument/detector/data

@prjemian ....does this reflect a consensus about how serialize pointers into fields in a NeXuS document? Are there other documented rules? I would love to make this unit test more comprehensive if there are.

Not sure I know how to answer that. So, my answer has become verbose.

First off, the key of the projection dictionary is, I assume, the HDF5 address of the specific item in the NeXus file. The NeXus naming convention is to use lower case. I prepended a / since this should be an absolute address in the file. Following NeXus convention for picking default names of groups [1], the path to a dataset /entry/instrument/detector/data names the data dataset of the NXdetector group (child of the NXinstrument group which is child of the NXentry group which is child of the file root).

[1] default group names in NeXus: For a hypothetical base class of NXzzzz, the default group would be named zzzz or zzzz1, zzzz2 if there are several such. In some NeXus classes, a specific name is required. Mostly in the base classes, the name is more flexible. Use the default naming unless you have a good reason to choose otherwise.

An example is shown in the documentation for a NeXus filewriter callback I've written for the APS. The example shows the raw data stored under /entry/instrument/bluesky. NeXus-style links position this data at various places in the NeXus structure satisfying the expectations of the NeXus base class structure.

So our projection keys would point into the conventional NeXus base class structures. A good test would be:

ingest from a NeXus file into databroker with declared projections

nxwriter = apstools.filewriters.NXWriter()

for key, doc in db[-1]: nxwriter.receiver(key, doc)

compare the new NeXus file with the original

jklynch · 2020-07-07T18:24:39Z

The test_em.py file is very long. I think projections tests should get their very own test file test_projections.py.

dylanmcreynolds · 2020-07-23T18:35:08Z

I changed compose_run to actually allow a projection to be written to the run_start document.

@jklynch While adding a test, moved stuff out of test_em.py and into new test_projections.py

danielballan · 2020-07-28T14:28:18Z

event_model/__init__.py

@@ -1625,7 +1625,7 @@ def compose_descriptor(*, start, streams, event_counter,
        partial(compose_event_page, descriptor=doc, event_counter=event_counter))


-def compose_run(*, uid=None, time=None, metadata=None, validate=True):
+def compose_run(*, uid=None, time=None, metadata=None, validate=True, projections=None):


All other non-required keys enter in through metadata. I think we should either add all of them to the signature or leave projections out of the signature.

I'll back out that change, let you add a projection to metadata.

dylanmcreynolds · 2020-07-30T15:35:10Z

I added calculation fields.

There is a corresponding PR for a projector in databroker that uses these projections: bluesky/databroker#585

dylanmcreynolds · 2020-07-31T16:41:07Z

Andi points out a concern that the the start document is already getting big at some beamlines, and that adding this to it might get in the way of searching.

danielballan · 2020-07-31T17:42:43Z

@dylanmcreynolds Please add mention of this to the documentation, probably in this section, https://blueskyproject.io/event-model/data-model.html#run-start-document and note clearly that this is experimental.

Note: On today's pilot call, we agree to merge this (marked "experimental") by a week from today.

dylanmcreynolds · 2020-07-31T23:46:23Z

@dylanmcreynolds Please add mention of this to the documentation, probably in this section, https://blueskyproject.io/event-model/data-model.html#run-start-document and note clearly that this is experimental.

Note: On today's pilot call, we agree to merge this (marked "experimental") by a week from today.

Committed a brief note there.

danielballan · 2020-08-11T18:56:52Z

We agreed to merge this in order to lower the barrier for people to play with it, on the understanding that we will very likely make backward-incompatible changes as we gain more experience with it.

Just waiting flake8 fixes, then good to go! 🚀

dylanmcreynolds · 2020-08-11T19:55:09Z

I bumped the version of jsonschema from 2 to 3 in the travis run that pegs jsonschema version and it runs. I don't know if this is appropriate or if it's better to remove that test run. @tacaswell , @danielballan ?

danielballan · 2020-08-11T20:26:22Z

[Already covered this on Slack but re-summarizing for the public record:]

The build that pins jsonschema==2 is meant to protect against the problem described in #145, and it should stay until we need some feature in a newer version of jsonschema. I force-pushed to remove the commit that altered the CI configuration. It turns out that jsconschema==2 was finding a minor mistake in the schema that was for whatever reason ignored by jsonschema 3.x. Fixed in 2ca0aec.

Add projections schema to start doc

5d840b5

tacaswell reviewed Jul 6, 2020

View reviewed changes

prjemian reviewed Jul 6, 2020

View reviewed changes

dylanmcreynolds mentioned this pull request Jul 10, 2020

create NeXus file from specific dataset aps-8id-dys/ipython-8idiuser#176

Closed

dylanmcreynolds added 2 commits July 23, 2020 11:30

cleaned linbreaks nad added a description

12e9551

Change start_doc creation, moved and added tests

a3b7b99

add callable

d374efc

danielballan reviewed Jul 28, 2020

View reviewed changes

dylanmcreynolds added 3 commits July 28, 2020 13:38

backed out change to compose_run

7171fe7

simplified test

01ff096

Removed static option for projection

50efe13

dylanmcreynolds mentioned this pull request Jul 30, 2020

Projector implementation bluesky/databroker#585

Merged

removed chagned " back to "configuration".

3fb6a56

Added a note about experimental projections

e71851d

danielballan approved these changes Aug 11, 2020

View reviewed changes

dylanmcreynolds added 3 commits August 11, 2020 12:07

flake8 fixes

41af2e0

flake8 fixes

5f56320

flake8 fixes

2701b09

Do not specify 'projection' as required twice.

2ca0aec

danielballan force-pushed the projections branch from 50be570 to 2ca0aec Compare August 11, 2020 20:18

danielballan merged commit 0c29664 into bluesky:master Aug 11, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add projections schema to start doc #179

Add projections schema to start doc #179

dylanmcreynolds commented Jul 3, 2020 •

edited

Loading

dylanmcreynolds commented Jul 6, 2020 •

edited

Loading

tacaswell Jul 6, 2020

dylanmcreynolds Jul 6, 2020

prjemian Jul 6, 2020

dylanmcreynolds Jul 6, 2020 •

edited

Loading

prjemian Jul 6, 2020 •

edited

Loading

prjemian Jul 6, 2020

jklynch commented Jul 7, 2020

dylanmcreynolds commented Jul 23, 2020

danielballan Jul 28, 2020

dylanmcreynolds Jul 28, 2020

dylanmcreynolds commented Jul 30, 2020

dylanmcreynolds commented Jul 31, 2020

danielballan commented Jul 31, 2020

dylanmcreynolds commented Jul 31, 2020

danielballan commented Aug 11, 2020

dylanmcreynolds commented Aug 11, 2020

danielballan commented Aug 11, 2020

Add projections schema to start doc #179

Add projections schema to start doc #179

Conversation

dylanmcreynolds commented Jul 3, 2020 • edited Loading

Description

Motivation and Context

How Has This Been Tested?

dylanmcreynolds commented Jul 6, 2020 • edited Loading

tacaswell Jul 6, 2020

Choose a reason for hiding this comment

dylanmcreynolds Jul 6, 2020

Choose a reason for hiding this comment

prjemian Jul 6, 2020

Choose a reason for hiding this comment

dylanmcreynolds Jul 6, 2020 • edited Loading

Choose a reason for hiding this comment

prjemian Jul 6, 2020 • edited Loading

Choose a reason for hiding this comment

prjemian Jul 6, 2020

Choose a reason for hiding this comment

jklynch commented Jul 7, 2020

dylanmcreynolds commented Jul 23, 2020

danielballan Jul 28, 2020

Choose a reason for hiding this comment

dylanmcreynolds Jul 28, 2020

Choose a reason for hiding this comment

dylanmcreynolds commented Jul 30, 2020

dylanmcreynolds commented Jul 31, 2020

danielballan commented Jul 31, 2020

dylanmcreynolds commented Jul 31, 2020

danielballan commented Aug 11, 2020

dylanmcreynolds commented Aug 11, 2020

danielballan commented Aug 11, 2020

dylanmcreynolds commented Jul 3, 2020 •

edited

Loading

dylanmcreynolds commented Jul 6, 2020 •

edited

Loading

dylanmcreynolds Jul 6, 2020 •

edited

Loading

prjemian Jul 6, 2020 •

edited

Loading