In the JavaScript community, engineers share hundreds of thousands of pieces of code so we can avoid rewriting basic components, libraries, or frameworks of our own. Each piece of code may in turn depend on other pieces of code, and these dependencies are managed by package managers. The most popular JavaScript package manager is the npm client, which provides access to more than 300,000 packages in the npm registry. More than 5 million engineers use the npm registry, which sees up to 5 billion downloads every month.

We've used the npm client successfully at Facebook for years, but as the size of our codebase and the number of engineers grew, we ran into problems with consistency, security, and performance. After trying to solve for each issue as it came up, we set out to build a new solution to help us manage our dependencies more reliably. The product of that work is called Yarn — a fast, reliable, and secure alternative npm client.

We're pleased to announce the open source release of Yarn, a collaboration with Exponent, Google, and Tilde. With Yarn, engineers still have access to the npm registry, but can install packages more quickly and manage dependencies consistently across machines or in secure offline environments. Yarn enables engineers to move faster and with confidence when using shared code so they can focus on what matters — building new products and features.

The evolution of JavaScript package management at Facebook

In the days before package managers, it was commonplace for JavaScript engineers to rely on a small number of dependencies stored directly in their projects or served by a CDN. The first major JavaScript package manager, npm, was built shortly after Node.js was introduced, and it quickly became one of the most popular package managers in the world. Thousands of new open source projects were created and engineers shared more code than ever before.

Many of our projects at Facebook, like React, depend on code in the npm registry. However, as we scaled internally, we ran into three problems: inconsistent results when installing dependencies across different machines and users, the amount of time it took to pull dependencies in, and security concerns about the way the npm client automatically executes code from some of those dependencies. We attempted to build solutions around these issues, but they often raised new issues themselves.

Attempts at scaling the npm client

Initially, following the prescribed best practices, we only checked in package.json and asked engineers to manually run npm install. This worked well enough for engineers, but broke down in our continuous integration environments, which need to be sandboxed and cut off from the internet for security and reliability reasons.

The next solution we implemented was to check all of node_modules into the repository. While this worked, it made some simple operations quite difficult. For example, updating a minor version of Babel generated an 800,000-line commit that was difficult to land and triggered lint rules for invalid UTF-8 byte sequences, Windows line endings, images that had not been png-crushed, and more. Merging changes to node_modules would often take engineers an entire day. Our source control team also pointed out that our checked-in node_modules folder was responsible for a tremendous amount of metadata. The React Native package.json currently lists just 68 dependencies, but after running npm install the node_modules directory contains 121,358 files.

We made one final attempt to scale the npm client to work with the number of engineers at Facebook and the amount of code that we need to install. We decided to zip the entire node_modules folder and upload it to an internal CDN so that both engineers and our continuous integration systems could download and extract the files consistently. This enabled us to remove hundreds of thousands of files from source control, but made it so engineers needed internet access not just to pull new code, but also to build it.

We also had to work around issues with npm's shrinkwrap feature, which we used to lock down dependency versions. Shrinkwrap files aren't generated by default and will fall out of sync if engineers forget to regenerate them, so we wrote a tool to verify that the contents of the shrinkwrap file match what's in node_modules. These files are huge JSON blobs with unsorted keys, though, so changes to them would generate massive, difficult-to-review commits. To mitigate this, we needed to add another script to sort all the entries.
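The sorting script is not reproduced in this post. As a rough sketch of the idea only, the hypothetical helpers below (sortKeys and stableStringify are names invented for this illustration) rewrite a JSON structure with its keys in sorted order, so that two logically identical files serialize to the same text and a diff shows only real changes:

```javascript
// Illustrative sketch, not the tool used at Facebook.
// Recursively rebuild a JSON value so that object keys appear in sorted order.
function sortKeys(value) {
  if (Array.isArray(value)) {
    // Array order is meaningful, so only the elements are normalized.
    return value.map(sortKeys);
  }
  if (value !== null && typeof value === 'object') {
    const sorted = {};
    for (const key of Object.keys(value).sort()) {
      sorted[key] = sortKeys(value[key]);
    }
    return sorted;
  }
  return value;
}

// Serialize with stable key order and fixed indentation.
function stableStringify(value) {
  return JSON.stringify(sortKeys(value), null, 2) + '\n';
}
```

With this in place, stableStringify({ b: 1, a: 2 }) and stableStringify({ a: 2, b: 1 }) produce identical output.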

Finally, updating a single dependency with npm also updates many unrelated ones based on semantic versioning rules. This makes every change much larger than anticipated, and having to do things like committing node_modules or uploading it to a CDN made the process less than ideal for engineers.

Building a new client

Rather than continue building infrastructure around the npm client, we decided to try looking at the problem more holistically. What if instead we attempted to build a new client that addressed the core issues we were experiencing? Sebastian McKenzie in our London office started hacking on this idea and we quickly became excited about its potential.

As we worked on this, we began speaking with engineers across the industry and found that they faced a similar set of problems and had attempted many of the same solutions, often focused on resolving a single issue at a time. It became obvious that by collaborating on the whole set of problems the community was facing, we could develop a solution that worked for everyone. With the help of engineers from Exponent, Google, and Tilde, we built out the Yarn client and tested and validated its performance on every major JS framework and for additional use cases outside of Facebook. Today, we're excited to share it with the community.

Introducing Yarn

Yarn is a new package manager that replaces the existing workflow for the npm client or other package managers while remaining compatible with the npm registry. It has the same feature set as existing workflows while operating faster, more securely, and more reliably.

The primary function of any package manager is to install some package — a piece of code that serves a particular purpose — from a global registry into an engineer's local environment. Each package may or may not depend on other packages. A typical project could have tens, hundreds, or even thousands of packages within its tree of dependencies.

These dependencies are versioned and installed based on semantic versioning (semver). Semver defines a versioning scheme that reflects the types of changes in each new version, whether a change breaks an API, adds a new feature, or fixes a bug. However, semver relies on package developers not making mistakes — breaking changes or new bugs may find their way into installed dependencies if the dependencies are not locked down.
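To make the risk concrete, here is a minimal sketch of how a caret range such as ^1.3.0 selects a version. It assumes plain MAJOR.MINOR.PATCH strings with a major version of 1 or higher and ignores pre-release tags; the function names are hypothetical, and this is not how any real client implements range matching:

```javascript
// Illustrative sketch of caret-range matching for versions >= 1.0.0.
// Parse "MAJOR.MINOR.PATCH" into numbers (no pre-release or build metadata).
function parse(version) {
  const [major, minor, patch] = version.split('.').map(Number);
  return { major, minor, patch };
}

// Compare two parsed versions: negative, zero, or positive.
function compare(a, b) {
  return a.major - b.major || a.minor - b.minor || a.patch - b.patch;
}

// True if `version` satisfies `^base`: same major version, and not older than base.
function satisfiesCaret(version, base) {
  const v = parse(version);
  const b = parse(base);
  return v.major === b.major && compare(v, b) >= 0;
}

// Pick the highest published version that satisfies `^base`, or null if none does.
function maxSatisfying(versions, base) {
  const matches = versions.filter((v) => satisfiesCaret(v, base));
  matches.sort((a, b) => compare(parse(a), parse(b)));
  return matches.length ? matches[matches.length - 1] : null;
}
```

Under these assumptions, a project that asks for ^1.3.0 picks up 1.10.0 as soon as it is published, without any change to package.json. If that release contains an accidental breaking change, a fresh install breaks even though the project itself did not change.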

Architecture

In the Node ecosystem, dependencies get placed within a node_modules directory in your project. However, this file structure can differ from the actual dependency tree as duplicate dependencies are merged together. The npm client installs dependencies into the node_modules directory non-deterministically. This means that based on the order dependencies are installed, the structure of a node_modules directory could be different from one person to another. These differences can cause “works on my machine” bugs that take a long time to hunt down.
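The toy model below illustrates how install order can change the resulting layout. It is a deliberately simplified assumption, not the npm client's actual algorithm: the first version of a shared dependency to be seen is placed at the top level, and any conflicting version is nested under the package that needs it. All package names are hypothetical.

```javascript
// Toy model of order-dependent placement (not npm's real algorithm).
// Each package: { name, version, requires: [{ name, version }] }.
// Returns a map from a path inside node_modules to the version installed there.
function layout(installOrder) {
  const tree = {};
  // Direct dependencies always live at the top level.
  for (const pkg of installOrder) {
    tree[pkg.name] = pkg.version;
  }
  for (const pkg of installOrder) {
    for (const dep of pkg.requires) {
      if (!(dep.name in tree)) {
        // The first version seen claims the top-level slot.
        tree[dep.name] = dep.version;
      } else if (tree[dep.name] !== dep.version) {
        // A conflicting version is nested under the package that requires it.
        tree[`${pkg.name}/node_modules/${dep.name}`] = dep.version;
      }
    }
  }
  return tree;
}

const pkgA = { name: 'pkg-a', version: '1.0.0', requires: [{ name: 'shared', version: '1.0.0' }] };
const pkgB = { name: 'pkg-b', version: '1.0.0', requires: [{ name: 'shared', version: '2.0.0' }] };

const first = layout([pkgA, pkgB]);  // top-level shared is 1.0.0
const second = layout([pkgB, pkgA]); // top-level shared is 2.0.0
```

Both layouts satisfy every package's requirements, but any code that reaches for the top-level shared package sees a different version depending on the order.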

Yarn resolves these issues around versioning and non-determinism by using lockfiles and an install algorithm that is deterministic and reliable. These lockfiles lock the installed dependencies to a specific version, and ensure that every install results in the exact same file structure in node_modules across all machines. The written lockfile uses a concise format with ordered keys to ensure that changes are minimal and review is simple.

The install process is broken down into three steps:

  1. Resolution: Yarn starts resolving dependencies by making requests to the registry and recursively looking up each dependency.
  2. Fetching: Next, Yarn looks in a global cache directory to see if the package needed has already been downloaded. If it hasn't, Yarn fetches the tarball for the package and places it in the global cache so it can work offline and won't need to download dependencies more than once. Dependencies can also be placed in source control as tarballs for full offline installs.
  3. Linking: Finally, Yarn links everything together by copying all the files needed from the global cache into the local node_modules directory.
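The three steps can be sketched as a small pipeline. This is an illustration under heavy assumptions, not Yarn's implementation: the registry is a hypothetical in-memory object, versions are exact rather than ranges, the cache is a Map instead of a directory on disk, and node_modules is a plain object.

```javascript
// Illustrative sketch of resolve -> fetch -> link with in-memory stand-ins.
// Hypothetical registry: "name@version" -> manifest with dependencies and tarball contents.
const registry = {
  'app-lib@1.0.0': { dependencies: { 'left-util': '1.1.0' }, tarball: 'app-lib-bytes' },
  'left-util@1.1.0': { dependencies: {}, tarball: 'left-util-bytes' },
};

// Step 1, resolution: recursively look up each dependency in the registry.
function resolve(dependencies, resolved = new Map()) {
  for (const [name, version] of Object.entries(dependencies)) {
    const key = `${name}@${version}`;
    if (resolved.has(key)) continue; // already visited
    const manifest = registry[key];
    if (!manifest) throw new Error(`Unknown package ${key}`);
    resolved.set(key, manifest);
    resolve(manifest.dependencies, resolved);
  }
  return resolved;
}

// Step 2, fetching: download only what is missing from the global cache.
function fetchAll(resolved, cache) {
  let downloads = 0;
  for (const [key, manifest] of resolved) {
    if (!cache.has(key)) {
      cache.set(key, manifest.tarball); // stands in for a network download
      downloads += 1;
    }
  }
  return downloads;
}

// Step 3, linking: copy from the cache into node_modules in a fixed (sorted) order.
function link(resolved, cache) {
  const nodeModules = {};
  for (const key of [...resolved.keys()].sort()) {
    const name = key.slice(0, key.lastIndexOf('@'));
    nodeModules[name] = cache.get(key);
  }
  return nodeModules;
}

function install(dependencies, cache) {
  const resolved = resolve(dependencies);
  const downloads = fetchAll(resolved, cache);
  return { downloads, nodeModules: link(resolved, cache) };
}
```

Calling install twice with the same cache shows the point of the fetching step: the first call downloads both packages, and the second call downloads nothing and produces the same node_modules contents.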

By breaking these steps down cleanly and having deterministic results, Yarn is able to parallelize operations, which maximizes resource utilization and makes the install process faster. On some Facebook projects, Yarn reduced install time by an order of magnitude, from several minutes to just seconds. Yarn also uses a mutex to ensure that multiple running CLI instances don't collide and pollute each other.

Throughout this entire process, Yarn imposes strict guarantees around package installation. You have control over which lifecycle scripts are executed for which packages. Package checksums are also stored in the lockfile to ensure that you get the same package every single time.
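A checksum check of this kind can be sketched with Node's built-in crypto module. The choice of SHA-1 and the function names below are assumptions made for illustration; this post does not specify which hash Yarn records or how it performs the comparison.

```javascript
// Illustrative sketch of verifying a downloaded tarball against a recorded checksum.
const crypto = require('crypto');

// Hex digest of a buffer. SHA-1 is an assumption for this example.
function sha1(buffer) {
  return crypto.createHash('sha1').update(buffer).digest('hex');
}

// Throw if the bytes we received do not match the checksum stored at lock time.
function verifyTarball(buffer, expectedChecksum) {
  const actual = sha1(buffer);
  if (actual !== expectedChecksum) {
    throw new Error(`Checksum mismatch: expected ${expectedChecksum}, got ${actual}`);
  }
  return true;
}
```

The expected checksum would be recorded once, when the dependency is first locked, and compared against every later download, so a package whose contents change under the same version number is rejected instead of silently installed.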

Features

In addition to making installs much faster and more reliable, Yarn has additional features to further simplify the dependency management workflow.

  • Compatibility with both the npm and Bower workflows, with support for mixing registries.
  • Ability to restrict licenses of installed modules and a means for outputting license information.
  • A stable public JS API with logging abstracted for consumption via build tools.
  • Readable, minimal, pretty CLI output.

Yarn in production

At Facebook we're already using Yarn in production, and it's been working really well for us. It powers the dependency and package management for many of our JavaScript projects. With each migration we've enabled engineers to build offline and helped speed up their workflow. A comparison of install times for Yarn and npm on React Native under different conditions can be found here.

Getting started

The easiest way to get started is to run:

npm install -g yarn

yarn

The yarn CLI replaces npm in your development workflow, either with a matching command or a new, similar command:

  • npm install → yarn

    With no arguments, the yarn command will read your package.json, fetch packages from the npm registry, and populate your node_modules folder. It is equivalent to running npm install.

  • npm install --save <name> → yarn add <name>

    We removed the “invisible dependency” behavior of npm install <name> and split the command. Running yarn add <name> is equivalent to running npm install --save <name>.

Future

Many of us came together to build Yarn to solve common problems, and we knew that we wanted Yarn to be a true community project that everyone can use. Yarn is now available on GitHub and we're ready for the Node community to do what it does best: Use Yarn, share ideas, write documentation, support each other, and help build a great community to care for it. We believe that Yarn is already off to a great start, and it can be even better with your help.
