This example demonstrates how you can use U-SQL to analyze data stored in Avro files.
The Avro extractor requires the Microsoft.Analytics.Samples.Formats assembly and an updated version of the Microsoft.Hadoop.Avro library, which can be found here. To set up and run the example:
- Download the latest version of Microsoft.Hadoop.Avro.zip from here.
- Extract Microsoft.Hadoop.Avro.dll from Microsoft.Hadoop.Avro.zip.
- Clone the Microsoft.Analytics.Samples.Formats solution and open it in Visual Studio.
- Update the project's reference to Microsoft.Hadoop.Avro.dll so that it points to the extracted file.
- Build the Microsoft.Analytics.Samples.Formats solution.
- Copy the following files to a directory in Azure Data Lake Store (ADLS), e.g. /Assemblies/Avro:
  - Microsoft.Analytics.Samples.Formats.dll
  - Microsoft.Hadoop.Avro.dll
  - Newtonsoft.Json.dll
- Create a database (e.g. by submitting 1-CreateDB.usql) and switch to the new database.
- Check the file paths in 2-RegisterAssemblies.usql and update them if necessary.
- Register the assemblies you uploaded to ADLS by submitting 2-RegisterAssemblies.usql.
- Download a sample Avro file containing Twitter data from here.
- Upload twitter.avro to a directory in Azure Data Lake Store (e.g. /TwitterStream/2016/12/twitter.avro) using the Azure Data Lake Explorer (in Visual Studio or the Azure Portal) or any other ADLS client.
- Check the file paths in 3-SimpleAvro.usql and update them if necessary.
- Submit 3-SimpleAvro.usql and wait for the U-SQL job to finish.
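For orientation, the database creation and assembly registration steps might look roughly like the U-SQL below. This is a minimal sketch, not the actual contents of 1-CreateDB.usql or 2-RegisterAssemblies.usql: the database name AvroDemo is hypothetical, and the paths assume the three DLLs were copied to /Assemblies/Avro in the account's default ADLS store.

```usql
// Hypothetical database name; the sample scripts may use a different one.
CREATE DATABASE IF NOT EXISTS AvroDemo;
USE DATABASE AvroDemo;

// Assumed upload location of the DLLs (adjust to where you copied them).
// Dropping first lets the script be rerun after rebuilding the solution.
DROP ASSEMBLY IF EXISTS [Newtonsoft.Json];
CREATE ASSEMBLY [Newtonsoft.Json] FROM @"/Assemblies/Avro/Newtonsoft.Json.dll";

DROP ASSEMBLY IF EXISTS [Microsoft.Hadoop.Avro];
CREATE ASSEMBLY [Microsoft.Hadoop.Avro] FROM @"/Assemblies/Avro/Microsoft.Hadoop.Avro.dll";

DROP ASSEMBLY IF EXISTS [Microsoft.Analytics.Samples.Formats];
CREATE ASSEMBLY [Microsoft.Analytics.Samples.Formats] FROM @"/Assemblies/Avro/Microsoft.Analytics.Samples.Formats.dll";
```

Registering the assemblies in a database, rather than referencing the DLL files directly, means later scripts only need `REFERENCE ASSEMBLY` statements against that database.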
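A query in the spirit of 3-SimpleAvro.usql could read the uploaded file roughly as sketched below. Several details are assumptions rather than facts taken from the sample: the hypothetical AvroDemo database holds the registered assemblies, the extractor class is `Microsoft.Analytics.Samples.Formats.ApacheAvro.AvroExtractor` and takes the Avro schema as a JSON string, and the records in twitter.avro have `username`, `tweet` and `timestamp` fields. Check the extractor's constructor in the solution you built and the schema embedded in your file before relying on it; the output path is also illustrative.

```usql
// Hypothetical database name; use the one holding the registered assemblies.
USE DATABASE AvroDemo;

REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Hadoop.Avro];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

DECLARE @input string = @"/TwitterStream/2016/12/twitter.avro";
// Illustrative output location.
DECLARE @output string = @"/output/twitter.csv";

// Assumed schema: a record with username, tweet and timestamp fields.
// Replace it with the schema actually stored in your twitter.avro file.
@tweets =
    EXTRACT username string,
            tweet string,
            timestamp long
    FROM @input
    USING new Microsoft.Analytics.Samples.Formats.ApacheAvro.AvroExtractor(
        @"{""type"":""record"",""name"":""twitter_schema"",""fields"":[{""name"":""username"",""type"":""string""},{""name"":""tweet"",""type"":""string""},{""name"":""timestamp"",""type"":""long""}]}");

// Write the extracted rows as CSV with a header row.
OUTPUT @tweets
TO @output
USING Outputters.Csv(outputHeader : true);
```

The column names in the `EXTRACT` clause are assumed to match the Avro field names, and doubled quotes (`""`) are how a literal quote is written inside a U-SQL verbatim string.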