Hack Codegen is a library for generating Hack code and writing it into signed files, which guard against unwanted manual edits. Automated code generation lets programmers raise the level of abstraction: they can build declarative frameworks whose definitions are translated into high-quality Hack code. We’ve been using Hack Codegen at Facebook for a while, and after seeing its success internally, we open-sourced the library so that more people could take advantage of it.
Motivation
We’ll detail below what the code generation library can do and which problems we set out to solve. But first, a bit about our motivation for creating Hack Codegen. Before Hack Codegen existed, we mainly generated code by concatenating strings, plus a few helper functions. On the product infrastructure team at Facebook, we had been looking into how to improve one of our internal systems for reading and writing data.
For reading, we had a class for each type of object that defined getters for its fields, such as:
```hack
class UserNode extends Node {
  public function getName(): string {
    return $this->data['name'];
  }

  public function getBirthdate(): ?int {
    return $this->data['birthdate'] === 0 ? null : $this->data['birthdate'];
  }
}
```
Each class had a loader that was relatively easy to define and that took care of accessing the data storage and populating the data array.
The counterpart for writing was a mutator:
```hack
class UserMutator extends NodeMutator {
  public function getFields(): array {
    return array(
      'Name' => string_field('name'),
      'Birthdate' => timestamp_field('birthdate')->optional(),
    );
  }
}
```
Then, the mutator was used as follows:
```hack
UserMutator::create()
  ->setName('Jack Smith')
  ->setBirthdate($day) // setter derived from the 'Birthdate' field key
  ->save();
```
Notice that the mutator class didn’t define the setters. Instead, it overrode the magic __call method to implement their behavior. While we liked having the user declare the fields in the mutator, implementing the setters with __call had some disadvantages:
- It was not typed, and with the advent of Hack, we wanted types.
- The setter methods were never actually defined. Searching the codebase wouldn’t find them, and looking at the class didn’t show which methods you could call. This also made related variants, such as ifNotNullSetName, harder to discover.
- IDEs couldn’t autocomplete the calls.
Also, having UserNode and UserMutator as two completely independent classes that were nonetheless tightly coupled didn’t seem right. Moreover, adding a new type of object meant writing several more classes, along with tests. All that boilerplate created a high barrier to entry for new types of objects.
We decided that it needed to be improved. We liked the declarative style of the mutator, but we wanted to have explicit code. Also, there were a lot of nodes and mutators already defined, so we didn’t want to rewrite all of those.
The solution we came up with was a higher-level abstraction, a schema, that holds a detailed description of the object type. A script could then generate the node, mutator, loader, tests, etc. directly from the schema, and even set up the storage (e.g., a MySQL database). For example, this could be the user schema:
```hack
class UserSchema extends NodeSchema {
  protected function getFields(): Map<string, INodeField> {
    return Map{
      'Name' => string_field('name')
        ->description('Full name of the user')
        ->example('John Smith'),
      'Birthdate' => timestamp_field('birthdate')->optional(),
      'Gender' => int_enum_field('gender', UserGender::class),
    };
  }
}
```
The structure of the schema is similar to the mutator’s, but it offers more possibilities. For instance, you can write a description and an example for a field, and both are written automatically into the docblock of the getter method. More fine-grained field types, such as enums, URLs, and int/string IDs, are also provided, and the generated code can take care of validation and storage.
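The following sketch makes the schema-to-code step concrete. It shows how a generator could turn field descriptions into getters with docblocks. This is neither Hack Codegen nor our internal generator. The FieldSpec class and the render_getter and render_node_class functions are hypothetical, and the code assumes HHVM with the Hack Standard Library. For brevity it concatenates strings instead of using a builder.

```hack
use namespace HH\Lib\{Str, Vec};

// Hypothetical, simplified stand-in for a schema field such as
// string_field('name')->description(...)->example(...).
final class FieldSpec {
  public function __construct(
    public string $column, // storage key, e.g. 'name'
    public string $hackType, // getter return type, e.g. 'string'
    public bool $optional = false, // optional fields get a nullable type
    public ?string $description = null,
    public ?string $example = null,
  ) {}
}

// Renders one getter, copying the description and example into its docblock.
// (Simplified: it does not reproduce special cases such as mapping 0 to null.)
function render_getter(string $field, FieldSpec $spec): string {
  $lines = vec[];
  $description = $spec->description;
  $example = $spec->example;
  if ($description is nonnull || $example is nonnull) {
    $lines[] = '  /**';
    if ($description is nonnull) {
      $lines[] = '   * '.$description;
    }
    if ($example is nonnull) {
      $lines[] = '   * Example: '.$example;
    }
    $lines[] = '   */';
  }
  $type = $spec->optional ? '?'.$spec->hackType : $spec->hackType;
  $lines[] = '  public function get'.$field.'(): '.$type.' {';
  $lines[] = '    return $this->data[\''.$spec->column.'\'];';
  $lines[] = '  }';
  return Str\join($lines, "\n");
}

// Renders a node class with one getter per schema field.
function render_node_class(
  string $class,
  dict<string, FieldSpec> $fields,
): string {
  $getters = Vec\map_with_key(
    $fields,
    ($field, $spec) ==> render_getter($field, $spec),
  );
  return 'class '.$class.' extends Node {'."\n".Str\join($getters, "\n\n")."\n}\n";
}

<<__EntryPoint>>
function main(): void {
  $fields = dict[
    'Name' => new FieldSpec('name', 'string', false, 'Full name of the user', 'John Smith'),
    'Birthdate' => new FieldSpec('birthdate', 'int', true),
  ];
  echo render_node_class('UserNode', $fields);
}
```

Running it prints a UserNode class. The getName() docblock carries the description and example from the schema, and the optional Birthdate field gets a nullable ?int return type.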
Code generation library
We realized early on that we would need a good library to generate code, since concatenating strings doesn’t scale. At the time, we didn’t do much code generation at Facebook, mostly dumping values into arrays, so we had no good tools apart from the one for signing files.
This need motivated us to write the library. At its core is hack_builder, which handles concatenation, new lines, indentation, braces, Hack keywords, collections, and more.
For example, this:
```hack
hack_builder()
  ->startForeachLoop('$users', '$id', '$user')
  ->startIfBlock('$id === $search')
  ->addReturn('$id')
  ->endIfBlock()
  ->endForeachLoop();
```
. . . generates this code:
```hack
foreach ($users as $id => $user) {
  if ($id === $search) {
    return $id;
  }
}
```
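A toy builder illustrates the bookkeeping that hack_builder takes off your hands. The MiniBuilder class below is hypothetical and far simpler than the real one. It only tracks an indentation level and emits braces, yet chaining its methods reproduces the loop above.

```hack
use namespace HH\Lib\Str;

// Hypothetical toy builder, not Hack Codegen's implementation: it shows the
// indentation and brace bookkeeping that a code builder hides from callers.
final class MiniBuilder {
  private vec<string> $lines = vec[];
  private int $indent = 0;

  // Appends one line at the current indentation level (two spaces per level).
  public function addLine(string $code): this {
    $this->lines[] = Str\repeat('  ', $this->indent).$code;
    return $this;
  }

  // Emits "<header> {" and indents everything until the matching closeBlock().
  private function openBlock(string $header): this {
    $this->addLine($header.' {');
    $this->indent++;
    return $this;
  }

  public function closeBlock(): this {
    invariant($this->indent > 0, 'closeBlock() without a matching open block');
    $this->indent--;
    return $this->addLine('}');
  }

  public function startForeachLoop(string $collection, string $key, string $value): this {
    return $this->openBlock('foreach ('.$collection.' as '.$key.' => '.$value.')');
  }

  public function startIfBlock(string $condition): this {
    return $this->openBlock('if ('.$condition.')');
  }

  public function addReturn(string $expression): this {
    return $this->addLine('return '.$expression.';');
  }

  public function getCode(): string {
    invariant($this->indent === 0, 'unclosed block');
    return Str\join($this->lines, "\n")."\n";
  }
}

<<__EntryPoint>>
function main(): void {
  echo (new MiniBuilder())
    ->startForeachLoop('$users', '$id', '$user')
    ->startIfBlock('$id === $search')
    ->addReturn('$id')
    ->closeBlock()
    ->closeBlock()
    ->getCode();
}
```

Because every line goes through addLine(), callers never deal with indentation or closing braces directly. The invariants catch unbalanced blocks at generation time instead of producing malformed code.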
The builder can also break lines automatically at suggested points when a line exceeds the maximum length (see the addWithSuggestedLineBreaks method). On top of that, we defined ways of declaring other common Hack constructs, such as classes, methods, variables, functions, traits, interfaces, and files. For example, you can define a class with a method as follows:
```hack
codegen_class('HelloWorld')
  ->addMethod(
    codegen_method('sayHi')
      ->setBody('echo "hello world\n";')
  );
```
The generated class looks like this:
```hack
class HelloWorld {
  public function sayHi() {
    echo "hello world\n";
  }
}
```
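One way to implement the automatic line breaking mentioned above is a greedy pass over the pieces between suggested break points. The pass keeps appending pieces while the line fits and starts an indented continuation line when the next piece would exceed the limit. The wrap_at_suggested_breaks function is hypothetical. This is not a claim about how addWithSuggestedLineBreaks is implemented or how it marks break points.

```hack
use namespace HH\Lib\Str;

// Hypothetical helper: $pieces are the fragments between suggested break
// points. Pieces are joined greedily; when the next piece would push the
// line past $max_width, a new line is started with $continuation indentation.
// A single piece longer than $max_width is kept intact on its own line.
function wrap_at_suggested_breaks(
  vec<string> $pieces,
  int $max_width,
  string $continuation = '  ',
): string {
  $lines = vec[];
  $current = '';
  foreach ($pieces as $piece) {
    if ($current !== '' && Str\length($current.$piece) > $max_width) {
      $lines[] = Str\trim_right($current);
      $current = $continuation.Str\trim_left($piece);
    } else {
      $current .= $piece;
    }
  }
  if ($current !== '') {
    $lines[] = $current;
  }
  return Str\join($lines, "\n");
}

<<__EntryPoint>>
function main(): void {
  echo wrap_at_suggested_breaks(
    vec[
      '$result = some_function(',
      '$first_argument, ',
      '$second_argument, ',
      '$third_argument);',
    ],
    40,
  )."\n";
  // Prints:
  // $result = some_function(
  //   $first_argument, $second_argument,
  //   $third_argument);
}
```

A greedy pass is simple and predictable, but it does not look ahead. A smarter strategy could, for example, prefer to put every argument on its own line once any break is needed.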
Signed files
We wanted to make sure that engineers didn’t edit generated files by hand, so that we could regenerate the code automatically whenever a schema changed. We could have added comments saying the code must not be changed, but we were afraid they might be overlooked. Instead, we used a library already developed at Facebook to sign files. It hashes the contents of the file and writes the hash into the file’s header. Later, it can check whether the hash still matches the contents to tell whether the file was modified, and tools can then block such modifications, for example with a commit hook.
The header of a signed file looks something like this:
```hack
/**
 * This file is generated. Do not modify it manually!
 * Run php ./scripts/generate_code.php to regenerate
 *
 * @generated SignedSource<<d6168d52d82d350d4907c1e835f6f2f5>>
 */
```
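The sign-and-verify idea can be sketched in a few lines. The sketch assumes MD5 as the hash and a hypothetical SignedSource<<PLACEHOLDER>> token, which is hashed in place of the real signature. The token, the function names, and the choice of MD5 are illustrative assumptions, not the actual format of Facebook’s signing library.

```hack
use namespace HH\Lib\Str;

// Hypothetical token that stands in for the hash while the hash is computed.
const string PLACEHOLDER = 'SignedSource<<PLACEHOLDER>>';
const string SIGNATURE_START = 'SignedSource<<';

// Hashes the file with the placeholder in place, then swaps in the hash.
// Assumes the placeholder appears exactly once in the file.
function sign_file(string $content): string {
  invariant(Str\contains($content, PLACEHOLDER), 'missing signature placeholder');
  return Str\replace($content, PLACEHOLDER, SIGNATURE_START.\md5($content).'>>');
}

// Restores the placeholder, re-hashes, and compares with the recorded hash.
function is_signature_valid(string $signed): bool {
  $start = Str\search($signed, SIGNATURE_START);
  if ($start is null) {
    return false;
  }
  $hash_start = $start + Str\length(SIGNATURE_START);
  $end = Str\search($signed, '>>', $hash_start);
  if ($end is null) {
    return false;
  }
  $recorded = Str\slice($signed, $hash_start, $end - $hash_start);
  $unsigned = Str\slice($signed, 0, $start).PLACEHOLDER.Str\slice($signed, $end + 2);
  return \md5($unsigned) === $recorded;
}

<<__EntryPoint>>
function main(): void {
  $file = "/**\n * This file is generated. Do not modify it manually!\n *\n"
    ." * @generated ".PLACEHOLDER."\n */\nclass UserNode {}\n";
  $signed = sign_file($file);
  echo $signed;
  echo is_signature_valid($signed) ? "valid\n" : "modified\n"; // valid
  $edited = Str\replace($signed, 'class UserNode {}', 'class UserNode { }');
  echo is_signature_valid($edited) ? "valid\n" : "modified\n"; // modified
}
```

Hashing the file while the placeholder occupies the signature’s position avoids a chicken-and-egg problem: the signature cannot be part of the text it signs. A commit hook could then run a check like is_signature_valid() on every generated file.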
However, we wanted to offer more flexibility in some parts of the generated code. For example, a field may need some post-processing in its getter. One option was to keep the files completely auto-generated but allow classes to be extended, or provide hooks for inserting custom code from another file. The other was to allow sections of the file to be written manually. We opted for the latter because we thought it would make the code easier to read and write. We extended the file-signing library to support this by excluding manual sections from the signature, and we updated the code generation to preserve the contents of manual sections when it regenerates a file.
Here’s an example of what a manual section looks like:
```hack
public function getName(): string {
  /* BEGIN MANUAL SECTION User::getName */
  return $this->data['name'];
  /* END MANUAL SECTION */
}
```
The section is delimited by the BEGIN MANUAL SECTION and END MANUAL SECTION comments. Notice that each manual section has an ID. When the code is regenerated, the ID matches the section with its counterpart in the new output, so its contents end up in the same location.
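Manual sections require two extensions: preserving their contents on regeneration and leaving them out of the signature. Both can be sketched with a line-based parser. The sketch assumes that the markers sit on their own lines, exactly as in the example above, and that sections are not nested. The extract_manual_sections, merge_manual_sections, and signable_content functions are hypothetical helpers, not the library’s API.

```hack
use namespace HH\Lib\{C, Str};

// Marker text, matching the example above.
const string BEGIN_MARKER = '/* BEGIN MANUAL SECTION ';
const string END_MARKER = '/* END MANUAL SECTION */';

// '/* BEGIN MANUAL SECTION User::getName */' -> 'User::getName'
function section_id(string $trimmed_line): string {
  return Str\strip_suffix(Str\strip_prefix($trimmed_line, BEGIN_MARKER), ' */');
}

// Collects the hand-written lines of every manual section, keyed by ID.
function extract_manual_sections(string $code): dict<string, vec<string>> {
  $sections = dict[];
  $current_id = null;
  foreach (Str\split($code, "\n") as $line) {
    $trimmed = Str\trim($line);
    if ($current_id is nonnull) {
      if ($trimmed === END_MARKER) {
        $current_id = null;
      } else {
        $sections[$current_id][] = $line;
      }
    } else if (Str\starts_with($trimmed, BEGIN_MARKER)) {
      $current_id = section_id($trimmed);
      $sections[$current_id] = vec[];
    }
  }
  return $sections;
}

// Replaces the default body of each manual section in freshly generated code
// with the saved body that has the same ID, if there is one.
function merge_manual_sections(
  string $generated,
  dict<string, vec<string>> $saved,
): string {
  $out = vec[];
  $skipping = false;
  foreach (Str\split($generated, "\n") as $line) {
    $trimmed = Str\trim($line);
    if ($skipping) {
      if ($trimmed !== END_MARKER) {
        continue; // drop the generated default body
      }
      $skipping = false;
    } else if (Str\starts_with($trimmed, BEGIN_MARKER)) {
      $id = section_id($trimmed);
      if (C\contains_key($saved, $id)) {
        $out[] = $line;
        foreach ($saved[$id] as $kept) {
          $out[] = $kept;
        }
        $skipping = true;
        continue;
      }
    }
    $out[] = $line;
  }
  return Str\join($out, "\n");
}

// The text that would be hashed: manual-section bodies are left out, so
// editing them does not invalidate the signature.
function signable_content(string $code): string {
  $out = vec[];
  $inside = false;
  foreach (Str\split($code, "\n") as $line) {
    $trimmed = Str\trim($line);
    if ($trimmed === END_MARKER) {
      $inside = false;
    }
    if (!$inside) {
      $out[] = $line;
    }
    if (Str\starts_with($trimmed, BEGIN_MARKER)) {
      $inside = true;
    }
  }
  return Str\join($out, "\n");
}

<<__EntryPoint>>
function main(): void {
  $on_disk = "public function getName(): string {\n"
    ."  /* BEGIN MANUAL SECTION User::getName */\n"
    ."  return trim(\$this->data['name']);\n"
    ."  /* END MANUAL SECTION */\n}";
  $regenerated = "/** Full name of the user */\n"
    ."public function getName(): string {\n"
    ."  /* BEGIN MANUAL SECTION User::getName */\n"
    ."  return \$this->data['name'];\n"
    ."  /* END MANUAL SECTION */\n}";
  $merged = merge_manual_sections($regenerated, extract_manual_sections($on_disk));
  echo $merged."\n"; // new docblock, hand-written trim(...) line kept
  echo signable_content($merged) === signable_content($regenerated)
    ? "signature unaffected\n"
    : "signature changed\n";
}
```

Matching by ID rather than by position means a section’s contents survive even if regeneration adds, removes, or reorders the generated code around it.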
Try Hack Codegen
After seeing Hack Codegen used internally for so many diverse applications, we’re pleased to open-source the library for the external community. You’ll find it at https://github.com/facebook/hack-codegen.
The open-source version includes DORM as an example, a simplified version of the read/write system described above. With it, you can define your own schema, mapped to a database, and run a script to generate the code that reads from and writes to it.
The Hack Codegen framework was developed at Facebook by Drew Hoskins, Gaurav Kumar, Alejandro Marcu, and Matthieu Martin.