
Commit

Merge pull request awaregroup#9 from bartbroere/tail-datamashup-docs
Add some research notes about tail of DataMashup file
kodonnell authored Aug 20, 2020
2 parents 3304490 + 54d26c0 commit e988ac5
Showing 1 changed file with 13 additions and 1 deletion.
14 changes: 13 additions & 1 deletion converters.py
@@ -190,7 +190,19 @@ class DataMashupConverter(Converter):
- 4 null bytes
- 4 bytes representing little-endian int for length of next xml
- xml of this length
- not sure what the remainder is ...
- the four bytes 16 00 00 00
- a zip End (!) Of Central Directory record (indicated by the bytes 50 4b 05 06)
https://en.wikipedia.org/wiki/Zip_(file_format)#End_of_central_directory_record_(EOCD)
which is a bit surprising in this location, since there's no associated start of the zip file.
In experiments, Power BI refuses to open the file and reports it as corrupted if everything after
16 00 00 00 is omitted, and likewise if everything after 50 4b 05 06 is omitted.
If the tail of the file is replaced with that of a different .pbix file, the modified .pbix file
opens without noticeable errors.
- Some bytes further along in this file, the sequence
01 00 00 00 D0 8C 9D DF 01 15 D1 11 8C 7A 00 C0 4F C2 97 EB 01 00 00 00 matches across
several different .pbix files. Even longer matches can be found across revisions of the
same .pbix file. This may be metadata about the version of Power BI that was used, among other
things, since transplanting everything after the previously mentioned 16 00 00 00 seems harmless.
"""

CONVERTERS = {
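The layout described in the docstring (4 null bytes, a little-endian length, the xml, the bytes 16 00 00 00, then the EOCD signature) can be walked with a few lines of Python. The sketch below is not part of the commit: `parse_tail`, `transplant_tail`, and the `offset` argument are hypothetical names, and the caller is assumed to already know where the "4 null bytes" field starts. One hedged observation: 0x16 is 22, the size of a ZIP End Of Central Directory record with no comment, so 16 00 00 00 may itself be a length prefix like the one before the xml. The notes above do not confirm this.

```python
import struct

EOCD_SIGNATURE = b"PK\x05\x06"  # the bytes 50 4b 05 06
LENGTH_MARKER = 0x16            # 16 00 00 00 read as a little-endian int (22)


def parse_tail(data: bytes, offset: int) -> dict:
    """Parse the fields from the docstring, starting at `offset`.

    `offset` is a hypothetical input: the position of the "4 null bytes"
    field, which must be known from parsing the earlier fields.
    """
    if data[offset:offset + 4] != b"\x00" * 4:
        raise ValueError("expected 4 null bytes at offset")
    # 4 bytes: little-endian length of the xml that follows
    (xml_len,) = struct.unpack_from("<I", data, offset + 4)
    xml_start = offset + 8
    xml_end = xml_start + xml_len
    if xml_end + 4 > len(data):
        raise ValueError("xml length runs past the end of the data")
    # 4 bytes after the xml: observed as 16 00 00 00 in the notes
    (marker,) = struct.unpack_from("<I", data, xml_end)
    tail_start = xml_end + 4
    return {
        "xml": data[xml_start:xml_end],
        "marker": marker,
        "tail_start": tail_start,  # first byte after 16 00 00 00
        "has_eocd": data.startswith(EOCD_SIGNATURE, tail_start),
    }


def transplant_tail(target: bytes, target_offset: int,
                    donor: bytes, donor_offset: int) -> bytes:
    """Replace everything after 16 00 00 00 in `target` with the donor's tail."""
    t = parse_tail(target, target_offset)
    d = parse_tail(donor, donor_offset)
    return target[:t["tail_start"]] + donor[d["tail_start"]:]
```

`transplant_tail` mirrors the experiment described in the notes, where the tail of one .pbix file was swapped for that of another. Whether Power BI accepts the result for any given pair of files remains an empirical question.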
