Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proto2schema 1/n: Schema generation #4727

Closed
wants to merge 12 commits into from

Conversation

alxmrs
Copy link
Contributor

@alxmrs alxmrs commented Feb 21, 2020

This is one of multiple PRs that will generate schemas from manifests via protobufs. So far, this creates a copy-and-past BUILD rule using a Kotlin CLI. Later, this will change to actually generate Kotlin jars.

This PR also introduces the first use of Kotlin Poet in the project. Schema generation via this library is exercised in a test. Once @piotrswigon 's efforts to serialize manifests (read: schemas) into protos are accomplished, we'll be able to adapt this CL to generate schemas from manifests end-to-end.

I anticipate that the names for the build macros don't quite fit our project, I'm happy to take suggestions (a nod to you, @csilvestrini).

commit 0339c06
Author: Alex Rosengarten <[email protected]>
Date:   Fri Feb 21 14:06:11 2020 -0800

    Add Kotlinpoet to project
commit 1e8df40
Author: Alex Rosengarten <[email protected]>
Date:   Fri Feb 21 14:44:59 2020 -0800

    updated to internal versions

commit 0339c06
Author: Alex Rosengarten <[email protected]>
Date:   Fri Feb 21 14:06:11 2020 -0800

    Add Kotlinpoet to project
echo("$protoFile --> $outFile")

val bytes = protoFile.readBytes()
val manifest = Manifest.parseFrom(bytes)
Copy link
Collaborator

@piotrswigon piotrswigon Feb 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll need to sync.

We should also sync about what it is you need from the serialization. Let's maybe all sync on Monday. Current proto is set up for a single recipe - I think it would be best if we all talked together with @csilvestrini.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that idea a lot. Let's all get on the same page.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll do an initial merge now.

class Proto2SchemaTest {

@Test
fun schemaGeneration_singleProperty() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't a test for proto2schema take a proto and produce a schema instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, totally. Here, I'm testing an internal method, since I was waining on a few things (e.g. the schema protos) I am going to have a discussion with Ray today to discuss how the output Kt file should be structured.

return name[0].toLowerCase() + name.substring(1)
}

fun generateSchemas(schemas: List<Schema>): Iterable<PropertySpec> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know why you generate schemas from the objects that are already schemas. It would seem more productive to generate them from the proto. WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The schema protos didn't exist at the time that I wrote this PR (I will sync now). I wanted to get started on learning how to do generation in KtPoet asap.
I'll revise this to use the protos that you landed.


return [DefaultInfo(files = depset([out]))]

proto2schema = rule(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why proto2schema? What is the ultimate output of this rule? As far as I understand ProdEx it should be Plans, not schemas. I may however have some wrong ideas. @csilvestrini ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a different story to recipe2plan. I'll chat you the ticket.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mmandlis can speak to this: the most immediate need with codegeneration is creating schemas (and Types) from manifests.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if consolidating the schema and plan generation into a single rule would be a good idea? Are we likely to want to do them separately?


package(default_visibility = ["//visibility:public"])

kt_jvm_library(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If arcs/core/tools is going to consist of command-line JVM-based tooling, it might make sense to create a README.md in this directory to explain that.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea.

import com.squareup.kotlinpoet.PropertySpec
import java.io.File

class Proto2Schema : CliktCommand(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

KDoc would be great here, including documentation on how the. tool is intended to be used.

help = """Generates Schemas and Types from a protobuf-serialized manifest.

This script reads schemas from a serialized manifest and generates Kotlin `Schema` and `Type`
classes.""",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use .trimIndent() unless you intend the second paragraph to be indented by 4 spaces when output.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, for readability:

    help = """
        Generates Schemas and Types from a protobuf-serialized manifest.

        This script reads schemas from a serialized manifest and generates Kotlin `Schema` and 
        `Type` implementations.
    """.trimIndent(),

Comment on lines +25 to +29
val outdir by option(help = "output directory; defaults to '.'").file(fileOkay = false)
val outfile by option(help = "output filename; if omitted")
val packageName by option(help = "scope to specified package; default: 'arcs'").default("arcs")
val protos by argument(help = "paths to protobuf-serialized manifests")
.file(exists = true).multiple()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can these be private?

Comment on lines +32 to +42
override fun run() {
protos.forEach { protoFile ->
val outFile = outputFile(protoFile)
echo("$protoFile --> $outFile")

val bytes = protoFile.readBytes()
val envelope = RecipeEnvelopeProto.parseFrom(bytes)

outFile.writeBytes(protoFile.readBytes())
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could make this a function-expression:

    override fun run() = protos.forEach { protoFile ->
        val outFile = outputFile(protoFile)
        echo("$protoFile --> $outFile")

        val bytes = protoFile.readBytes()
        val envelope = RecipeEnvelopeProto.parseFrom(bytes)

        outFile.writeBytes(protoFile.readBytes())
    }

}

/** Produces a File object per user specification, or with default values. */
fun outputFile(inputFile: File): File {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be private?

Comment on lines +57 to +59
val schemaClass = ClassName("arcs.core.data", "Schema")
val schemaNameClass = ClassName("arcs.core.data", "SchemaName")
val schemaFieldsClass = ClassName("arcs.core.data", "SchemaFields")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this tools package can depend on //java/arcs/core/data, prefer using Kotlin reflection (will need a new maven library import in the WORKSPACE/third_party for this) and KotlinPoet's .asTypeName:

val schemaClass = Schema::class.createType().asTypeName()

This will allow us to more easily and safely refactor the classes you're using here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you could probably make that shorter using an extension function like Schema::class.toTypeName()

val schemaClass = ClassName("arcs.core.data", "Schema")
val schemaNameClass = ClassName("arcs.core.data", "SchemaName")
val schemaFieldsClass = ClassName("arcs.core.data", "SchemaFields")
return schemas.map {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This lambda's body is complex enough to warrant its own dedicated function.

return schemas.map(::generateSchemaSpec)

// ...

private fun generateSchemaSpec(schema: Schema): PropertySpec {
    // ...
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would let you test generateSchemaSpec in isolation.

Comment on lines +64 to +132
.addStatement("%T(", schemaClass)
.indent()
.addStatement("listOf(")
.indent()
.apply {
it.names.forEachIndexed { index, name ->
if (index > 0) addStatement(",%T(%S)", schemaNameClass, name.name)
else addStatement("%T(%S)", schemaNameClass, name.name)
}
}
.unindent()
.addStatement("),")
.addStatement("%T(", schemaFieldsClass)
.indent()
.addStatement("singletons = mapOf(")
.indent()
.apply {
val entries = it.fields.singletons.entries
entries.forEachIndexed { index, entry ->
when (entry.value.tag) {
FieldType.Tag.EntityRef -> add(
"%S to %T(%S)",
entry.key,
FieldType.EntityRef::class,
(entry.value as FieldType.EntityRef).schemaHash
)
FieldType.Tag.Primitive -> add(
"%S to %T.%L",
entry.key,
FieldType::class,
(entry.value as FieldType.Primitive).primitiveType
)
}
if (index != entries.size - 1) add(",")
add("\n")
}
}
.unindent()
.addStatement("),")
.addStatement("collections = mapOf(")
.indent()
.apply {
val entries = it.fields.collections.entries
entries.forEachIndexed { index, entry ->
when (entry.value.tag) {
FieldType.Tag.EntityRef -> add(
"%S to %T(%S)",
entry.key,
FieldType.EntityRef::class,
(entry.value as FieldType.EntityRef).schemaHash
)
FieldType.Tag.Primitive -> add(
"%S to %T.%L",
entry.key,
FieldType::class,
(entry.value as FieldType.Primitive).primitiveType
)
}
if (index != entries.size - 1) add(",")
add("\n")
}
}
.unindent()
.addStatement(")")
.unindent()
.addStatement("),")
.addStatement("%S", it.hash)
.addStatement(")")
.build())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is way too much logic to exist in one monolith. Please break this up into smaller, more-teestable and more-readable functions.

Comment on lines +35 to +52
assertThat(schemaProperty.toString()).isEqualTo("""
|val sliceSchema: arcs.core.data.Schema = arcs.core.data.Schema(
| listOf(
| arcs.core.data.SchemaName("Slice")
| ),
| arcs.core.data.SchemaFields(
| singletons = mapOf(
| "num" to arcs.core.data.FieldType.Number,
| "flg" to arcs.core.data.FieldType.Boolean,
| "txt" to arcs.core.data.FieldType.Text
| ),
| collections = mapOf(
| )
| ),
| "f4907f97574693c81b5d62eb009d1f0f209000b8"
| )
|
|""".trimMargin())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be very brittle and hard to adjust as schema-generation evolves over time. Instead - prefer testing the actual contents of the generated specs.

Copy link
Collaborator

@csilvestrini csilvestrini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll wait for all of Jason's code style suggestions to be addressed before reviewing the Kotlin files properly


proto2schema(
name = "example_generation",
srcs = [":example"],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could consider renaming example to something like example_pb so that it's easier to tell what it generates.


package(default_visibility = ["//visibility:public"])

kt_jvm_library(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea.

/** Produces a File object per user specification, or with default values. */
fun outputFile(inputFile: File): File {
val outputName = outfile ?: inputFile.nameWithoutExtension + ".kt"
val outputPath = outdir ?: System.getProperty("user.dir")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional, but you could consider not providing default values here. It might be better to opt for slightly more cumbersome but definitely more correct command line args. We don't expect people will be running this tool themselves anyway, they'll use our bazel macros, which will do the right thing.


args.add_all("--outfile", [output_name])
args.add_all("--outdir", [out.dirname])
args.add_all("--package-name", [ctx.attr.package])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these three can use args.add instead of args.add_all


return [DefaultInfo(files = depset([out]))]

proto2schema = rule(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if consolidating the schema and plan generation into a single rule would be a good idea? Are we likely to want to do them separately?

@alxmrs
Copy link
Contributor Author

alxmrs commented Mar 2, 2020

After discussions last week, the strategy for codegeneration has changed. As a result, I'm going to close this PR.

The feedback here will be useful to future PRs, however.

@alxmrs alxmrs closed this Mar 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants