
Commit

Merge branch 'master' of https://github.com/Azure/usql
MikeRys committed Jan 11, 2018
2 parents 2329b88 + 70ab327 commit 8b6a38d
Showing 50 changed files with 1,035 additions and 1,185 deletions.
6 changes: 6 additions & 0 deletions Debugging/U-SQL Error - Error in number of columns.md
@@ -52,6 +52,12 @@ Even if you required only a subset of columns in the file for your processing, y

*When not to use this*: When you have a much larger number of columns, or when you don’t know the full structure of your data and are only interested in a small subset.

### Help with Option 1: Let Visual Studio Generate Your EXTRACT Statement

If you'd like help generating an EXTRACT statement (this is very useful when you have many columns in your input), use the "Cloud Explorer" view in Visual Studio to explore your files (View->Cloud Explorer). Drill down in the explorer until you see your input file. Double-click its name to open the "File Preview." In that window you will find a button called "Create EXTRACT Script." Click it. A new window opens showing both the file preview and a script starting with "@input =" for you to use. Check the "File Has Header Row" box (assuming your file has one) and the generic "Column_n" field names are replaced with the header names, which are ideally more meaningful. Copy the EXTRACT statement into the script you are writing. You may need to change the "FROM" clause to remove the "adl://<server name>" part of the path.

Note that if you checked the "File Has Header Row" box, the very important "skipFirstNRows:1" parameter is added to your "Extractors.Csv()" clause, so it becomes "Extractors.Csv(skipFirstNRows:1)". This matters because header values are strings: if any of your columns is typed as anything but string and the header row is not skipped, you'll see conversion errors when you run the script.
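
For reference, the generated statement typically has this shape. The following is a minimal sketch only: the column names, types, and file path are hypothetical, not output copied from the tool.

```
// Minimal sketch of a generated EXTRACT statement (hypothetical schema and path).
// skipFirstNRows:1 skips the header row, so its string values are never
// converted into the typed columns declared below.
@input =
    EXTRACT CustomerId int,
            CustomerName string,
            OrderDate DateTime
    FROM "/Samples/Data/orders.csv"
    USING Extractors.Csv(skipFirstNRows:1);
```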

### Option 2: Use the silent option in your extractor to skip mismatched columns

You can specify a silent parameter to your extractor that will skip rows in your file that have a mismatched number of columns. This is ideal for scenarios where you know how many columns are supposed to be in the file, but you don’t want one corrupt row to block the extraction of the rest of the data. Use this with caution: while you will get past the error, you might run into interesting semantic errors later. The sample with the silent flag included looks like this:
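
A minimal sketch of the idea, with hypothetical column names and file path (only the silent flag itself matters here):

```
// Hypothetical sketch: silent:true makes the built-in extractor skip rows whose
// column count does not match the EXTRACT schema instead of failing the job.
@rows =
    EXTRACT UserId int,
            SessionStart DateTime,
            Query string
    FROM "/Samples/Data/input.csv"
    USING Extractors.Csv(silent:true);
```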
8 changes: 5 additions & 3 deletions Examples/AvroExamples/AvroExamples/2-RegisterAssemblies.usql
@@ -1,6 +1,8 @@
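// Drop each assembly first (if it exists) so this registration script can be re-run safely.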
DROP ASSEMBLY IF EXISTS [Microsoft.Hadoop.Avro];
CREATE ASSEMBLY [Microsoft.Hadoop.Avro] FROM @"/Assemblies/Avro/Microsoft.Hadoop.Avro.dll";
DROP ASSEMBLY IF EXISTS [Avro];
CREATE ASSEMBLY [Avro] FROM @"/Assemblies/Avro/Avro.dll";
DROP ASSEMBLY IF EXISTS [Microsoft.Analytics.Samples.Formats];
CREATE ASSEMBLY [Microsoft.Analytics.Samples.Formats] FROM @"/Assemblies/Avro/Microsoft.Analytics.Samples.Formats.dll";
DROP ASSEMBLY IF EXISTS [Newtonsoft.Json];
CREATE ASSEMBLY [Newtonsoft.Json] FROM @"/Assemblies/Avro/Newtonsoft.Json.dll";
DROP ASSEMBLY IF EXISTS [log4net];
CREATE ASSEMBLY [log4net] FROM @"/Assemblies/Avro/log4net.dll";
7 changes: 4 additions & 3 deletions Examples/AvroExamples/AvroExamples/3-SimpleAvro.usql
@@ -1,6 +1,7 @@
REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Hadoop.Avro];
REFERENCE ASSEMBLY [log4net];
REFERENCE ASSEMBLY [Avro];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

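// The {*} wildcards form a file set: every .avro file in the nested TwitterStream folders is extracted.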
DECLARE @input_file string = @"\TwitterStream\{*}\{*}\{*}.avro";
DECLARE @output_file string = @"\output\twitter.csv";
@@ -14,7 +15,7 @@ DECLARE @output_file string = @"\output\twitter.csv";
partitionid long,
eventenqueuedutctime string
FROM @input_file
USING new Microsoft.Analytics.Samples.Formats.Avro.AvroExtractor(@"
USING new Microsoft.Analytics.Samples.Formats.ApacheAvro.AvroExtractor(@"
{
""type"" : ""record"",
""name"" : ""GenericFromIRecord0"",
16 changes: 6 additions & 10 deletions Examples/AvroExamples/README.md
@@ -1,19 +1,15 @@
# U-SQL Avro Example
This example demonstrates how you can use U-SQL to analyze data stored in Avro files.

## Deploying
The Avro Extractor requires Microsoft.Analytics.Samples.Formats and an updated version of the Microsoft.Hadoop.Avro library which can be found [here](https://github.com/flomader/hadoopsdk).

1. Download the latest version of Microsoft.Hadoop.Avro.zip from [here]( https://github.com/flomader/hadoopsdk/releases).
2. Extract Microsoft.Hadoop.Avro.dll from Microsoft.Hadoop.Avro.zip
3. Clone and open the Microsoft.Analytics.Samples.Formats solution in Visual Studio.
4. Update the reference of the file Microsoft.Hadoop.Avro.dll
5. Build the Microsoft.Analytics.Samples.Formats solution
## Build
1. Open Microsoft.Analytics.Samples.sln in Visual Studio 2017
2. Build the Microsoft.Analytics.Samples solution

### Register assemblies
1. Copy the following files to a directory in Azure Data Lake Store (e.g. \Assemblies\Avro):
1. Copy the following files from your build directory to a directory in Azure Data Lake Store (e.g. \Assemblies\Avro):
* Microsoft.Analytics.Samples.Formats.dll
* Microsoft.Hadoop.Avro.dll
* Avro.dll
* log4net.dll
* Newtonsoft.Json.dll
2. Create a database (e.g. run 1-CreateDB.usql.cs), switch to the new database
3. Check the file paths in 2-RegisterAssemblies.usql and update them if necessary (a minimal sketch of this registration follows below)
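
To make steps 2 and 3 concrete, here is a minimal sketch of switching to a database and registering one of the copied assemblies. The database name is an assumption; the path matches the folder suggested above.

```
// Hypothetical sketch of steps 2-3: switch to the example database, then
// register one of the DLLs copied to the Azure Data Lake Store folder.
USE DATABASE AvroExamples;   // database name is an assumption
DROP ASSEMBLY IF EXISTS [Microsoft.Analytics.Samples.Formats];
CREATE ASSEMBLY [Microsoft.Analytics.Samples.Formats] FROM @"/Assemblies/Avro/Microsoft.Analytics.Samples.Formats.dll";
```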
Binary file added Examples/DataFormats/Lib/Avro.dll
Binary file added Examples/DataFormats/Lib/Newtonsoft.Json.dll
Binary file added Examples/DataFormats/Lib/log4net.dll