Refactor metadata tool to use deps.dev and libraries.io (microsoft#338)
scovetta authored Aug 15, 2022
1 parent 071a442 commit dc63339
Showing 14 changed files with 448 additions and 68 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -351,3 +351,6 @@ MigrationBackup/

# Ionide (cross platform F# VS Code tools) working folder
.ionide/

+# Rider
+src/.idea
11 changes: 5 additions & 6 deletions PRIVACY.md
@@ -9,12 +9,11 @@ https://go.microsoft.com/fwlink/?LinkID=824704.

However, OSS Gadget does make outbound network connections to various locations,
including NPM (registry.npmjs.org), NuGet (api.nuget.org), and RubyGems (rubygems.org).
-These connections are unauthenticated and made directly to that service. As such,
+These connections are generally unauthenticated and made directly to that service. As such,
their privacy policy would apply.

-In addition, the health calculator (oss-health.exe) tool queries GitHub using the
-public API. In order to do so, it needs an API key, which you would need to
-include. This connection is therefore authenticated, between the tool and GitHub, and
-as such, the [GitHub privacy policy](https://help.github.com/en/github/site-policy/github-privacy-statement)
-would apply.
+In addition, the health calculator (oss-health.exe) and metadata (oss-metadata) tools use API keys
+to query GitHub and libraries.io. Since this connection is authenticated, the
+[GitHub privacy policy](https://help.github.com/en/github/site-policy/github-privacy-statement) and
+[Libraries.io privacy policy](https://libraries.io/privacy) would apply.

4 changes: 2 additions & 2 deletions README.md
@@ -20,7 +20,7 @@ A list of tools included is below. Click on the name of a tool to go to the wik
* [oss-find-source](https://github.com/microsoft/OSSGadget/wiki/OSS-Find-Source): Attempts to locate the source code (on GitHub, currently) of a given package.
* [oss-find-squats](https://github.com/microsoft/OSSGadget/wiki/OSS-Find-Squats): Identifies potential typo-squatting for a given package.
* [oss-health](https://github.com/microsoft/OSSGadget/wiki/OSS-Health): Calculates health metrics for a given package.
-* [oss-metadata](https://github.com/microsoft/OSSGadget/wiki/OSS-Metadata): Normalizes metadata about a package into a common schema.
+* [oss-metadata](https://github.com/microsoft/OSSGadget/wiki/OSS-Metadata): Retrieves metadata from deps.dev or libraries.io for a given package.
* [oss-risk-calculator](https://github.com/microsoft/OSSGadget/wiki/OSS-Risk-Calculator): Calculates a metric for risk of using a package.

All OSS Gadget tools accept one or more [Package URLs](https://github.com/package-url/purl-spec) as a way to uniquely identify a package. Package URLs look like `pkg:npm/express` or `pkg:gem/[email protected]`. If you leave the version number off, it implicitly means, "attempt to find the latest version". Using an asterisk (`pkg:npm/express@*`) means "perform the action on all available versions".
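The version conventions in the paragraph above (no version means "latest", `*` means "all versions") can be sketched as a small classifier. This is an illustration only, not OSS Gadget's parser: `DescribeVersion` and its return strings are hypothetical, and it does plain string handling rather than full purl-spec parsing.

```csharp
using System;

// Hypothetical helper, not part of OSS Gadget: reports how the version part
// of a Package URL string would be interpreted under the README's conventions.
string DescribeVersion(string purl)
{
    // Ignore qualifiers (?key=value) and subpath (#path), which follow the version.
    int cut = purl.IndexOfAny(new[] { '?', '#' });
    string core = cut >= 0 ? purl.Substring(0, cut) : purl;

    // A version separator '@' must come after the last '/'; an earlier '@'
    // belongs to a namespace such as an npm scope.
    int slash = core.LastIndexOf('/');
    int at = core.LastIndexOf('@');
    if (at <= slash || at == core.Length - 1)
    {
        return "latest";
    }

    string version = core.Substring(at + 1);
    return version == "*" ? "all" : "exact:" + version;
}

Console.WriteLine(DescribeVersion("pkg:npm/express"));    // latest
Console.WriteLine(DescribeVersion("pkg:npm/express@*"));  // all
```

A real tool would more likely rely on a purl library (the commit itself uses the `PackageUrl` namespace) instead of hand-written string slicing.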
@@ -60,7 +60,7 @@ This will download left-pad into a newly-created directory named `npm-left-pad@1
Each of the programs self-documents information on command line options (`--help`).

### Building from Source
-OSS Gadget builds with standard `dotnet build` commands.
+OSS Gadget builds with standard `dotnet build` commands and includes tests via `dotnet test`.

See [Building from Source](https://github.com/microsoft/OSSGadget/wiki/Building-from-Source) in the wiki for information on building from source.

41 changes: 20 additions & 21 deletions src/Shared.CLI/OSSGadget.cs
@@ -49,7 +49,7 @@ protected void DisplayHelp<T>(ParserResult<T> result, IEnumerable<Error> errs)
h.AddPostOptionsLines(GetCommonSupportedHelpTextLines());
return HelpText.DefaultParsingErrorsHandler(result, h);
});
-Console.Write(helpText);
+Console.Error.Write(helpText);
}

/// <summary>
@@ -128,10 +128,10 @@ protected void SelectOutput(string outputFile)

public static void ShowToolBanner()
{
-Console.WriteLine(OSSGadget.GetBanner());
+Console.Error.WriteLine(OSSGadget.GetBanner());
string? toolName = GetToolName();
string? toolVersion = GetToolVersion();
-Console.WriteLine($"OSS Gadget - {toolName} {toolVersion} - github.com/Microsoft/OSSGadget");
+Console.Error.WriteLine($"OSS Gadget - {toolName} {toolVersion} - github.com/Microsoft/OSSGadget");
}

/// <summary>
@@ -178,24 +178,23 @@ ____ _____ _____ _____ _ _
protected static string GetCommonSupportedHelpText()
{
string supportedHelpText = @"
-The package-url specifier is described at https://github.com/package-url/purl-spec:
-pkg:cargo/rand The latest version of Rand (via crates.io)
-pkg:cocoapods/AFNetworking The latest version of AFNetworking (via cocoapods.org)
-pkg:composer/Smarty/Smarty The latest version of Smarty (via Composer/ Packagist)
-pkg:cpan/Apache-ACEProxy The latest version of Apache::ACEProxy (via cpan.org)
-pkg:cran/ACNE@0.8.0 Version 0.8.0 of ACNE (via cran.r-project.org)
-pkg:gem/rubytree@* All versions of RubyTree (via rubygems.org)
-pkg:golang/sigs.k8s.io/yaml The latest version of sigs.k8s.io/yaml (via proxy.golang.org)
-pkg:github/Microsoft/DevSkim The latest release of DevSkim (via GitHub)
-pkg:hackage/a50@* All versions of a50 (via hackage.haskell.org)
-pkg:maven/org.apdplat/deep-qa The latest version of org.apdplat.deep-qa (via repo1.maven.org)
-pkg:npm/express The latest version of Express (via npm.org)
-pkg:nuget/Newtonsoft.JSON The latest version of Newtonsoft.JSON (via nuget.org)
-pkg:pypi/django@1.11.1 Version 1.11.1 of Django (via pypi.org)
-pkg:ubuntu/zerofree The latest version of zerofree from Ubuntu (via packages.ubuntu.com)
-pkg:vsm/MLNET/07 The latest version of MLNET.07 (from marketplace.visualstudio.com)
-pkg:url/[email protected]?url=<URL> The direct URL <URL>
-";
+The package-url specifier is described at https://github.com/package-url/purl-spec:
+pkg:cargo/rand The latest version of Rand (via crates.io)
+pkg:cocoapods/AFNetworking The latest version of AFNetworking (via cocoapods.org)
+pkg:composer/Smarty/Smarty The latest version of Smarty (via Composer/ Packagist)
+pkg:cpan/Apache-ACEProxy The latest version of Apache::ACEProxy (via cpan.org)
+pkg:cran/ACNE@0.8.0 Version 0.8.0 of ACNE (via cran.r-project.org)
+pkg:gem/rubytree@* All versions of RubyTree (via rubygems.org)
+pkg:golang/sigs.k8s.io/yaml The latest version of sigs.k8s.io/yaml (via proxy.golang.org)
+pkg:github/Microsoft/DevSkim The latest release of DevSkim (via GitHub)
+pkg:hackage/a50@* All versions of a50 (via hackage.haskell.org)
+pkg:maven/org.apdplat/deep-qa The latest version of org.apdplat.deep-qa (via repo1.maven.org)
+pkg:npm/express The latest version of Express (via npm.org)
+pkg:nuget/Newtonsoft.JSON The latest version of Newtonsoft.JSON (via nuget.org)
+pkg:pypi/django@1.11.1 Version 1.11.1 of Django (via pypi.org)
+pkg:ubuntu/zerofree The latest version of zerofree from Ubuntu (via packages.ubuntu.com)
+pkg:vsm/MLNET/07 The latest version of MLNET.07 (from marketplace.visualstudio.com)
+pkg:url/[email protected]?url=<URL> The direct URL <URL>\n";
return supportedHelpText;
}
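The hunks above move the banner and help text from `Console` standard output to `Console.Error`. A plausible reading is that this keeps standard output free for the tool's actual results, so they can be piped or redirected cleanly. A minimal sketch of that separation follows; `Run`, the banner text, and the JSON result are all made up for illustration.

```csharp
using System;
using System.IO;

// Hypothetical tool body: diagnostics go to the error stream,
// machine-readable results go to the output stream.
void Run(TextWriter stdout, TextWriter stderr)
{
    stderr.WriteLine("OSS Gadget - example-tool 0.0.0");  // banner (illustrative text)
    stdout.WriteLine("{\"name\":\"express\"}");           // result (illustrative text)
}

Run(Console.Out, Console.Error);
```

With this split, redirecting standard output to a file captures only the result line, while the banner still appears on the terminal.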

54 changes: 54 additions & 0 deletions src/Shared/Metadata/BaseMetadataSource.cs
@@ -0,0 +1,54 @@
// Copyright (c) Microsoft Corporation. Licensed under the MIT License.

namespace Microsoft.CST.OpenSource;

using System.Threading.Tasks;
using System.Text.Json;
using PackageUrl;
using Microsoft.Extensions.DependencyInjection;
using System.Net.Http;
using System;
using Polly;
using System.Net;
using Polly.Retry;
using Polly.Contrib.WaitAndRetry;

public abstract class BaseMetadataSource
{
protected static readonly NLog.Logger Logger = NLog.LogManager.GetCurrentClassLogger();

protected HttpClient HttpClient;

public BaseMetadataSource()
{
ServiceProvider serviceProvider = new ServiceCollection()
.AddHttpClient()
.BuildServiceProvider();

var clientFactory = serviceProvider.GetService<IHttpClientFactory>() ?? throw new InvalidOperationException();
HttpClient = clientFactory.CreateClient();
}

public async Task<JsonDocument?> GetMetadataForPackageUrlAsync(PackageURL packageUrl, bool useCache = false)
{
return await GetMetadataAsync(packageUrl.Type, packageUrl.Namespace, packageUrl.Name, packageUrl.Version, useCache);
}
public abstract Task<JsonDocument?> GetMetadataAsync(string packageType, string packageNamespace, string packageName, string packageVersion, bool useCache = false);

/// <summary>
/// Loads a URL and returns the JSON document, using a retry policy.
/// </summary>
/// <param name="uri">The <see cref="Uri"/> to load.</param>
/// <param name="policy">An optional <see cref="AsyncRetryPolicy"/> to use with the http request.</param>
/// <returns>The resultant JsonDocument, or an exception on failure.</returns>
public async Task<JsonDocument> GetJsonWithRetry(string uri, AsyncRetryPolicy<HttpResponseMessage>? policy = null)
{
policy ??= Policy
.Handle<HttpRequestException>()
.OrResult<HttpResponseMessage>(r => (int)r.StatusCode >= 500 || r.StatusCode == HttpStatusCode.TooManyRequests)
.WaitAndRetryAsync(Backoff.DecorrelatedJitterBackoffV2(TimeSpan.FromSeconds(15), retryCount: 5));

var result = await policy.ExecuteAsync(() => HttpClient.GetAsync(uri, HttpCompletionOption.ResponseContentRead));
return await JsonDocument.ParseAsync(await result.Content.ReadAsStreamAsync());
}
}
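`GetJsonWithRetry` above retries on `HttpRequestException`, on any 5xx status, and on 429, using Polly's decorrelated-jitter backoff. The same retry condition can be sketched without Polly. Note the swap: the code below uses a hand-rolled exponential backoff with full jitter, which is a simpler technique than `Backoff.DecorrelatedJitterBackoffV2`, and the names `IsTransient`, `BackoffDelay`, and `GetWithRetryAsync` are hypothetical.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

// Same retry condition as the commit's policy: any 5xx status or 429.
bool IsTransient(HttpStatusCode status) =>
    (int)status >= 500 || status == HttpStatusCode.TooManyRequests;

// Exponential backoff with full jitter: a random delay in [0, baseDelay * 2^attempt).
TimeSpan BackoffDelay(int attempt, TimeSpan baseDelay, Random rng)
{
    double ceilingMs = baseDelay.TotalMilliseconds * Math.Pow(2, attempt);
    return TimeSpan.FromMilliseconds(rng.NextDouble() * ceilingMs);
}

// Calls send() once, then up to retryCount more times while the outcome is transient.
async Task<HttpResponseMessage> GetWithRetryAsync(
    Func<Task<HttpResponseMessage>> send, int retryCount, TimeSpan baseDelay)
{
    var rng = new Random();
    for (int attempt = 0; ; attempt++)
    {
        try
        {
            HttpResponseMessage response = await send();
            if (!IsTransient(response.StatusCode) || attempt >= retryCount)
            {
                return response; // success, non-retryable status, or retries exhausted
            }
            response.Dispose();
        }
        catch (HttpRequestException) when (attempt < retryCount)
        {
            // Network-level failure with retries left: fall through to the delay.
        }
        await Task.Delay(BackoffDelay(attempt, baseDelay, rng));
    }
}

Console.WriteLine(IsTransient(HttpStatusCode.TooManyRequests)); // True
```

A caller would pass something like `() => client.GetAsync(uri)` as `send`. Taking a delegate rather than a URL keeps the retry loop testable without network access.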
51 changes: 51 additions & 0 deletions src/Shared/Metadata/DepsDevMetadataSource.cs
@@ -0,0 +1,51 @@
// Copyright (c) Microsoft Corporation. Licensed under the MIT License.

namespace Microsoft.CST.OpenSource;

using System;
using System.Threading.Tasks;
using System.Text.Json;
using System.Linq;
using System.Collections.Generic;
using Microsoft.CST.OpenSource.PackageManagers;

public class DepsDevMetadataSource : BaseMetadataSource
{
[System.Diagnostics.CodeAnalysis.SuppressMessage("Style", "IDE0044:Add readonly modifier", Justification = "Modified through reflection.")]
public static string ENV_DEPS_DEV_ENDPOINT = "https://deps.dev/_";

public static readonly List<string> VALID_TYPES = new List<string>() {
"npm",
"go",
"maven",
"pypi",
"cargo"
};

public override async Task<JsonDocument?> GetMetadataAsync(string packageType, string packageNamespace, string packageName, string packageVersion, bool useCache = false)
{
var packageTypeEnc = string.Equals(packageType, "golang") ? "go" : packageType;
if (!VALID_TYPES.Contains(packageTypeEnc, StringComparer.InvariantCultureIgnoreCase))
{
Logger.Warn("Unable to get metadata for [{} {}]. Package type [{}] is not supported. Try another data provider.", packageNamespace, packageName, packageType);
}
var packageNamespaceEnc = packageNamespace?.Replace("@", "%40").Replace("/", "%2F");
var packageNameEnc = packageName.Replace("@", "%40").Replace("/", "%2F");

var fullPackageName = string.IsNullOrWhiteSpace(packageNamespaceEnc) ?
$"{packageNameEnc}" :
$"{packageNamespaceEnc}%2F{packageNameEnc}";

// The missing slash in the next line is not a bug.
var url = $"{ENV_DEPS_DEV_ENDPOINT}/s/{packageTypeEnc}/p/{fullPackageName}/v/{packageVersion}";
try
{
return await BaseProjectManager.GetJsonCache(HttpClient, url, useCache);
}
catch(Exception ex)
{
Logger.Warn("Error loading package: {0}", ex.Message);
return null;
}
}
}
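The URL construction in `GetMetadataAsync` above (map `golang` to `go`, percent-encode `@` and `/`, join namespace and name with `%2F`) can be shown in isolation. This is a sketch under assumptions: `BuildDepsDevUrl` is a hypothetical name, the inputs are assumed to be already-decoded purl components, and it uses `Uri.EscapeDataString`, which encodes more characters than the two `Replace` calls in the commit.

```csharp
using System;

// Hypothetical standalone version of the URL construction in DepsDevMetadataSource.
// 'ns' is the purl namespace, or an empty string when the package has none.
string BuildDepsDevUrl(string type, string ns, string name, string version)
{
    const string endpoint = "https://deps.dev/_"; // default value of ENV_DEPS_DEV_ENDPOINT

    // deps.dev calls the Go ecosystem "go", while purl uses "golang".
    string system = string.Equals(type, "golang", StringComparison.OrdinalIgnoreCase) ? "go" : type;

    // Escape '@' and '/' (among others) so the full name is a single path segment.
    string fullName = string.IsNullOrWhiteSpace(ns)
        ? Uri.EscapeDataString(name)
        : Uri.EscapeDataString(ns + "/" + name);

    return $"{endpoint}/s/{system}/p/{fullName}/v/{Uri.EscapeDataString(version)}";
}

Console.WriteLine(BuildDepsDevUrl("npm", "@angular", "core", "13.2.5"));
// https://deps.dev/_/s/npm/p/%40angular%2Fcore/v/13.2.5
```

The version string in the usage line is an arbitrary example, not taken from the commit.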
81 changes: 81 additions & 0 deletions src/Shared/Metadata/LibrariesIoMetadataSource.cs
@@ -0,0 +1,81 @@
// Copyright (c) Microsoft Corporation. Licensed under the MIT License.

namespace Microsoft.CST.OpenSource;

using System;
using System.Threading.Tasks;
using System.Text.Json;
using System.Linq;
using System.Collections.Generic;

public class LibrariesIoMetadataSource : BaseMetadataSource
{
[System.Diagnostics.CodeAnalysis.SuppressMessage("Style", "IDE0044:Add readonly modifier", Justification = "Modified through reflection.")]
public static string ENV_LIBRARIES_IO_ENDPOINT = "https://libraries.io/api";
public static string? ENV_LIBRARIES_IO_API_KEY = null;

// Reload periodically from https://libraries.io/api/platforms
// curl https://libraries.io/api/platforms | jq '.[].name' | sed 's/[A-Z]/\L&/g' | sed 's/$/,/g' | sort | sed '$ s/.$//'
public static readonly List<string> VALID_TYPES = new List<string>() {
"alcatraz",
"bower",
"cargo",
"carthage",
"clojars",
"cocoapods",
"conda",
"cpan",
"cran",
"dub",
"elm",
"go",
"hackage",
"haxelib",
"hex",
"homebrew",
"inqlude",
"julia",
"maven",
"meteor",
"nimble",
"npm",
"nuget",
"packagist",
"pub",
"puppet",
"purescript",
"pypi",
"racket",
"rubygems",
"swiftpm"
};

public override async Task<JsonDocument?> GetMetadataAsync(string packageType, string packageNamespace, string packageName, string packageVersion, bool useCache = false)
{
var packageTypeEnc = string.Equals(packageType, "golang") ? "go" : packageType;
if (!VALID_TYPES.Contains(packageTypeEnc, StringComparer.InvariantCultureIgnoreCase))
{
Logger.Warn("Unable to get metadata for [{} {}]. Package type [{}] is not supported. Try another data provider.", packageNamespace, packageName, packageType);
}

var apiKey = ENV_LIBRARIES_IO_API_KEY != null ? $"apiKey={ENV_LIBRARIES_IO_API_KEY}" : "";
var packageNamespaceEnc = packageNamespace?.Replace("@", "%40").Replace("/", "%2F");
var packageNameEnc = packageName.Replace("@", "%40").Replace("/", "%2F");

var fullPackageName = string.IsNullOrWhiteSpace(packageNamespaceEnc) ?
$"{packageNameEnc}" :
$"{packageNamespaceEnc}%2F{packageNameEnc}";

var url = $"{ENV_LIBRARIES_IO_ENDPOINT}/{packageTypeEnc}/{fullPackageName}?{apiKey}";

try
{
return await GetJsonWithRetry(url);
}
catch(Exception ex)
{
Logger.Warn("Error loading package: {0}", ex.Message);
}
return null;
}
}
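In the method above, the URL always ends in `?` followed by an optional `apiKey=...` pair, so a request without a key carries a trailing `?`. A minimal sketch of building the same URL with the query string added only when a key is present; `BuildLibrariesIoUrl` is hypothetical, and the query parameter name is passed in by the caller rather than assumed, since it should be checked against the libraries.io API documentation.

```csharp
using System;

// Hypothetical helper: builds a libraries.io project URL, appending the key
// only when one is supplied. 'keyParam' is the query parameter name; the
// commit uses "apiKey".
string BuildLibrariesIoUrl(string platform, string fullName, string apiKey, string keyParam)
{
    const string endpoint = "https://libraries.io/api"; // default value of ENV_LIBRARIES_IO_ENDPOINT
    string url = $"{endpoint}/{Uri.EscapeDataString(platform)}/{Uri.EscapeDataString(fullName)}";
    return string.IsNullOrEmpty(apiKey)
        ? url
        : $"{url}?{keyParam}={Uri.EscapeDataString(apiKey)}";
}

Console.WriteLine(BuildLibrariesIoUrl("npm", "express", "", "apiKey"));
// https://libraries.io/api/npm/express
```

Escaping the key as well guards against keys containing characters such as `&` or `=`, which would otherwise alter the query string.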
72 changes: 72 additions & 0 deletions src/Shared/Metadata/NativeMetadataSource.cs
@@ -0,0 +1,72 @@
// Copyright (c) Microsoft Corporation. Licensed under the MIT License.

namespace Microsoft.CST.OpenSource;

using System;
using System.Threading.Tasks;
using System.Text.Json;
using PackageUrl;
using System.Linq;
using System.Collections.Generic;
using Microsoft.CST.OpenSource.PackageManagers;
using System.Reflection;

public class NativeMetadataSource : BaseMetadataSource
{
public static readonly List<string> VALID_TYPES = new List<string>();

static NativeMetadataSource()
{
// Dynamically gather the list of valid types based on whether subtypes of
// BaseProjectManager implement GetMetadataAsync.
var projectManagers = typeof(BaseMetadataSource).Assembly.GetTypes()
.Where(t => t.IsSubclassOf(typeof(BaseProjectManager)));

foreach (var projectManager in projectManagers.Where(d => d != null))
{
MethodInfo? method = projectManager.GetMethod("GetMetadataAsync");
if (method != null)
{
var type = projectManager.GetField("Type", BindingFlags.Public | BindingFlags.Static)?.GetValue(null) as string;
if (type != null)
{
VALID_TYPES.Add(type);
}
}
}
VALID_TYPES.Sort();
}

public override async Task<JsonDocument?> GetMetadataAsync(string packageType, string packageNamespace, string packageName, string packageVersion, bool useCache = false)
{
var packageUrl = new PackageURL(packageType, packageNamespace, packageName, packageVersion, null, null);

var packageManager = ProjectManagerFactory.ConstructPackageManager(packageUrl);
if (packageManager != null)
{
try
{
var metadata = await packageManager.GetMetadataAsync(packageUrl);
if (metadata != null)
{
try
{
return JsonDocument.Parse(metadata);
}
catch(Exception ex)
{
Logger.Warn(ex, "Error parsing metadata: {0}", ex.Message);
return JsonSerializer.SerializeToDocument(new Dictionary<string, string>() {
{ "content", metadata }
});
}
}
}
catch(Exception ex)
{
Logger.Warn(ex, "Error retrieving metadata: {0}", ex.Message);
}
}
return null;
}
}
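`NativeMetadataSource.GetMetadataAsync` above tries to parse the package manager's metadata as JSON and, if that fails, wraps the raw text in a `{"content": ...}` document so callers always receive a `JsonDocument`. The fallback can be sketched on its own; `ParseOrWrap` is a hypothetical name, and unlike the commit it catches only `JsonException` rather than every exception.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper: returns the parsed document when 'metadata' is valid JSON,
// otherwise a document of the form {"content": "<raw text>"}.
JsonDocument ParseOrWrap(string metadata)
{
    try
    {
        return JsonDocument.Parse(metadata);
    }
    catch (JsonException)
    {
        // Not JSON (for example XML or plain text): preserve it verbatim under "content".
        return JsonSerializer.SerializeToDocument(
            new Dictionary<string, string> { { "content", metadata } });
    }
}

using JsonDocument wrapped = ParseOrWrap("<package/>");
Console.WriteLine(wrapped.RootElement.GetProperty("content").GetString()); // <package/>
```

The wrapper keeps the return type uniform at the cost of making callers check for the `content` property to tell parsed metadata from raw text.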
4 changes: 2 additions & 2 deletions src/Shared/nlog.config
@@ -24,7 +24,7 @@
</targets>

<rules>
-<logger name="Microsoft.CST.OpenSource.*" minlevel="Info" maxLevel="Error" writeTo="consoleLog" ruleName="console"/>
-<logger name="*" minlevel="Trace" writeTo="fileLog,detailedFileLog" ruleName="fileLog" />
+<logger name="Microsoft.CST.OpenSource.*" minlevel="Info" maxLevel="Error" writeTo="consoleLog" ruleName="consoleLog"/>
+<logger name="*" minlevel="Trace" writeTo="fileLog" ruleName="fileLog" />
</rules>
</nlog>