derived columns and merge tables #25

Closed
nsheff opened this issue Jan 25, 2017 · 0 comments

nsheff commented Jan 25, 2017

The release candidate version (v0.4-rc1) has a small issue that can crop up if you have derived columns that require multiple column variables, plus a merge table. It was first merging the columns and then doing the derived column variable expansion. This created bad file paths, because the merged columns were already space-delimited strings, so spaces ended up in the middle of the paths.

When a sample has a derived column and is also present in the merge table, there are a couple of different ways looper could proceed. If the derived column uses only a single column variable, there is no problem, but if it uses two sample attributes, then the order of operations becomes relevant. Should the derived column first be derived for each row in the merge table individually, and then merged into a space-delimited string? Or should the columns be merged first, and the derived column then be constructed from the merged columns?

The way that makes the most sense to me is that the derived column should be populated for each row in the merge table independently, and then these columns should be merged into a space-delimited string. This way, file paths are constructed for each entry in the merge table, which usually corresponds to a file. Then the list of files is concatenated into a single string at the end of the merge step. I can't think of a situation where it makes sense to first merge the columns and then derive new columns.
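To make the difference between the two orders concrete, here is a minimal sketch in plain Python. It is not looper's code: the template, the column names (`flowcell`, `lane`), and both helper functions are hypothetical and exist only for illustration.

```python
# Hypothetical template and merge-table rows; none of these names come from looper.
TEMPLATE = "/data/{flowcell}/{lane}.fastq"

merge_rows = [
    {"flowcell": "fc1", "lane": "L1"},
    {"flowcell": "fc2", "lane": "L2"},
]


def derive_then_merge(rows, template):
    """Expand the template once per merge-table row, then join with spaces."""
    return " ".join(template.format(**row) for row in rows)


def merge_then_derive(rows, template):
    """Join each column across rows first, then expand the template once."""
    merged = {key: " ".join(row[key] for row in rows) for key in rows[0]}
    return template.format(**merged)


good = derive_then_merge(merge_rows, TEMPLATE)
bad = merge_then_derive(merge_rows, TEMPLATE)
print(good)  # /data/fc1/L1.fastq /data/fc2/L2.fastq
print(bad)   # /data/fc1 fc2/L1 L2.fastq
```

Deriving first yields one valid path per row. Merging first pushes the space-delimited values into a single template expansion, so the result is one path with spaces inside it.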

The way around this error in v0.4-rc1 is to include, in the merge table, a column for any derived column that you want populated at the individual row level. As long as the column is included in the merge table, it will be derived individually for each row. If it is not included in the merge table, and is only included in the main sample table, then it is populated from the already merged columns of the merge table, which is what led to the errors.

I have now made a change that will solve the problem in both scenarios. Now derived columns that are not present in the merge table will still be derived individually for each row in the merge table, before being merged. Unit test added.
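As a rough illustration of the intended behavior (not the actual change in looper), the sketch below derives a column per merge-table row even when that column appears only in the main sample table. The `sources` mapping, the `data_source` column, and the `derive_per_row` helper are all assumed names.

```python
# Hypothetical data; the field names and the "sources" mapping are illustrative only.
sources = {"local": "/data/{flowcell}/{lane}.fastq"}

# The derived column (data_source) is set only in the main sample table.
sample = {"sample_name": "s1", "data_source": "local"}

# The merge-table rows carry no data_source column.
merge_rows = [
    {"sample_name": "s1", "flowcell": "fc1", "lane": "L1"},
    {"sample_name": "s1", "flowcell": "fc2", "lane": "L2"},
]


def derive_per_row(sample, rows, derived_col, sources):
    """Derive `derived_col` for each merge row, falling back to the sample's value."""
    paths = []
    for row in rows:
        combined = {**sample, **row}  # row values override sample values
        template = sources[combined[derived_col]]
        paths.append(template.format(**combined))
    # Merge only after every row has its own fully expanded path.
    return " ".join(paths)


result = derive_per_row(sample, merge_rows, "data_source", sources)
print(result)  # /data/fc1/L1.fastq /data/fc2/L2.fastq
```

Each row inherits the sample-level value for the derived column, so the template is expanded with that row's own attributes before anything is joined.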

@nsheff nsheff added the bug label Jan 25, 2017
@nsheff nsheff added this to the 0.4 milestone Jan 25, 2017
@nsheff nsheff closed this as completed in 1121372 Feb 13, 2017