
Commit

Merge branch 'master' of https://github.com/Azure/usql
MikeRys committed Jan 11, 2018
2 parents 2329b88 + 70ab327 commit 8b6a38d
Showing 50 changed files with 1,035 additions and 1,185 deletions.
6 changes: 6 additions & 0 deletions Debugging/U-SQL Error - Error in number of columns.md
@@ -52,6 +52,12 @@ Even if you required only a subset of columns in the file for your processing, y

*When not to use this*: When you have a much larger number of columns, or when you don’t know the full structure of your data and are only interested in a small subset.

### Help with Option 1: Let Visual Studio Generate Your EXTRACT Statement

If you'd like help generating an EXTRACT statement (this is very useful when you have many columns in your input), use the "Cloud Explorer" view in Visual Studio to explore your files (View->Cloud Explorer). Drill down in the explorer until you see your input file. Double-click its name to open the "File Preview." In that window you will find a button called "Create EXTRACT Script." Click it. A new window opens showing both the file preview and a script starting with "@input =" for you to use. Check the "File Has Header Row" box (assuming your file has one) and the generic "Column_n" field names are replaced with the header names, which are ideally more meaningful. Copy the EXTRACT statement into the script you are writing. You may need to change the "FROM" clause to remove the "adl://<server name>" part of the path.

Note that if you checked the "File Has Header Row" box, the very important "skipFirstNRows:1" parameter is added to your "Extractors.Csv()" clause, so it becomes "Extractors.Csv(skipFirstNRows:1)". This matters because header values are strings: if any of your columns is typed as anything but string and the header row is not skipped, you'll see conversion errors when you run the script.
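
For reference, the generated statement typically has this shape. The following is a minimal sketch only: the column names, types, and file path are hypothetical, not output copied from the tool.

```
// Minimal sketch of a generated EXTRACT statement (hypothetical schema and path).
// skipFirstNRows:1 skips the header row, so its string values are never
// converted into the typed columns declared below.
@input =
    EXTRACT CustomerId int,
            CustomerName string,
            OrderDate DateTime
    FROM "/Samples/Data/orders.csv"
    USING Extractors.Csv(skipFirstNRows:1);
```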

### Option 2: Use the silent option in your extractor to skip mismatched columns

You can specify a silent parameter to your extractor that will skip rows in your file that have a mismatched number of columns. This is ideal for scenarios where you know how many columns are supposed to be in the file, but you don’t want one corrupt row to block the extraction of the rest of the data. Use this with caution: while you will get past the error, you might run into interesting semantic errors later. The sample with the silent flag included looks like this:
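
A minimal sketch of the idea, with hypothetical column names and file path (only the silent flag itself matters here):

```
// Hypothetical sketch: silent:true makes the built-in extractor skip rows whose
// column count does not match the EXTRACT schema instead of failing the job.
@rows =
    EXTRACT UserId int,
            SessionStart DateTime,
            Query string
    FROM "/Samples/Data/input.csv"
    USING Extractors.Csv(silent:true);
```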
8 changes: 5 additions & 3 deletions Examples/AvroExamples/AvroExamples/2-RegisterAssemblies.usql
@@ -1,6 +1,8 @@
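// Drop each assembly first (if it exists) so this registration script can be re-run safely.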
DROP ASSEMBLY IF EXISTS [Microsoft.Hadoop.Avro];
CREATE ASSEMBLY [Microsoft.Hadoop.Avro] FROM @"/Assemblies/Avro/Microsoft.Hadoop.Avro.dll";
DROP ASSEMBLY IF EXISTS [Avro];
CREATE ASSEMBLY [Avro] FROM @"/Assemblies/Avro/Avro.dll";
DROP ASSEMBLY IF EXISTS [Microsoft.Analytics.Samples.Formats];
CREATE ASSEMBLY [Microsoft.Analytics.Samples.Formats] FROM @"/Assemblies/Avro/Microsoft.Analytics.Samples.Formats.dll";
DROP ASSEMBLY IF EXISTS [Newtonsoft.Json];
CREATE ASSEMBLY [Newtonsoft.Json] FROM @"/Assemblies/Avro/Newtonsoft.Json.dll";
DROP ASSEMBLY IF EXISTS [log4net];
CREATE ASSEMBLY [log4net] FROM @"/Assemblies/Avro/log4net.dll";
7 changes: 4 additions & 3 deletions Examples/AvroExamples/AvroExamples/3-SimpleAvro.usql
@@ -1,6 +1,7 @@
REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Hadoop.Avro];
REFERENCE ASSEMBLY [log4net];
REFERENCE ASSEMBLY [Avro];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

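// The {*} wildcards form a file set: every .avro file in the nested TwitterStream folders is extracted.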
DECLARE @input_file string = @"\TwitterStream\{*}\{*}\{*}.avro";
DECLARE @output_file string = @"\output\twitter.csv";
@@ -14,7 +15,7 @@ DECLARE @output_file string = @"\output\twitter.csv";
partitionid long,
eventenqueuedutctime string
FROM @input_file
USING new Microsoft.Analytics.Samples.Formats.Avro.AvroExtractor(@"
USING new Microsoft.Analytics.Samples.Formats.ApacheAvro.AvroExtractor(@"
{
""type"" : ""record"",
""name"" : ""GenericFromIRecord0"",
16 changes: 6 additions & 10 deletions Examples/AvroExamples/README.md
@@ -1,19 +1,15 @@
# U-SQL Avro Example
This example demonstrates how you can use U-SQL to analyze data stored in Avro files.

## Deploying
The Avro Extractor requires Microsoft.Analytics.Samples.Formats and an updated version of the Microsoft.Hadoop.Avro library which can be found [here](https://github.com/flomader/hadoopsdk).

1. Download the latest version of Microsoft.Hadoop.Avro.zip from [here]( https://github.com/flomader/hadoopsdk/releases).
2. Extract Microsoft.Hadoop.Avro.dll from Microsoft.Hadoop.Avro.zip
3. Clone and open the Microsoft.Analytics.Samples.Formats solution in Visual Studio.
4. Update the reference of the file Microsoft.Hadoop.Avro.dll
5. Build the Microsoft.Analytics.Samples.Formats solution
## Build
1. Open Microsoft.Analytics.Samples.sln in Visual Studio 2017
2. Build the Microsoft.Analytics.Samples solution

### Register assemblies
1. Copy the following files to a directory in Azure Data Lake Store (e.g. \Assemblies\Avro):
1. Copy the following files from your build directory to a directory in Azure Data Lake Store (e.g. \Assemblies\Avro):
* Microsoft.Analytics.Samples.Formats.dll
* Microsoft.Hadoop.Avro.dll
* Avro.dll
* log4net.dll
* Newtonsoft.Json.dll
2. Create a database (e.g. run 1-CreateDB.usql.cs), switch to the new database
3. Check the file paths in 2-RegisterAssemblies.usql and update them if necessary (a minimal sketch of this registration follows below)
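
To make steps 2 and 3 concrete, here is a minimal sketch of switching to a database and registering one of the copied assemblies. The database name is an assumption; the path matches the folder suggested above.

```
// Hypothetical sketch of steps 2-3: switch to the example database, then
// register one of the DLLs copied to the Azure Data Lake Store folder.
USE DATABASE AvroExamples;   // database name is an assumption
DROP ASSEMBLY IF EXISTS [Microsoft.Analytics.Samples.Formats];
CREATE ASSEMBLY [Microsoft.Analytics.Samples.Formats] FROM @"/Assemblies/Avro/Microsoft.Analytics.Samples.Formats.dll";
```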
Binary file added Examples/DataFormats/Lib/Avro.dll
Binary file added Examples/DataFormats/Lib/Newtonsoft.Json.dll
Binary file added Examples/DataFormats/Lib/log4net.dll