Engineering Concerns of Build Systems

Many of the engineering concerns revolve around a few central areas:

  • Making it easier to write correct builds in the build system DSL

  • Making build systems more performant

  • Making build systems more responsive to users

  • Making build systems more ammenable to interactive use

  • Untracked dependencies

  • Non-determinism

  • Volatility (rules that change every build, eg phony rules)

  • key-dependent value types allow for more type safe representation of complicated tasks such as oracles

  • tasks that produce multiple outputs

  • self tracking: recomputing tasks if the task itself changes

  • build systems with file watch modes as optimization at scale.

    • some tasks may become obsolete and require canceling/restarting
    • concurrent modifications of build outputs can introduce errors
  • propagation of changes:

    • pull vs push model 1
  • query based architecture, ala 2

  • underlying file system

    • changing the file system can make certain operations much more expensive or cheaper to perform.

Additionally, how do you get really performant incremental builds in a CI system? Google bazel does this along with google’s internal CI/CD by basically never invalidating its cache, building only remotely, and having the latest code available. It’s not even close to “stateless”.

Specific to cloud implementations, additional concerns are:

  • communication: sending minimum amount of information. Optimize with respect to build system invariants.
  • offloading: building remotely with remote builders
  • eviction: how to purge old items from the store.
  • shallow builds: building an end target without materializing intermediate values.
    • Have a weaker correctness property: Instead of demanding all keys reachable from target match. Only require matches for input keys + target
    • buck allows for maximum shallowness, being deep constructive tracing (like nix), this has the advantage that you can get any build result from buck in one API call.

Remote execution and cloud shake (remote/shared caches) are different[^cddiff]:

A rule defines an action. An action can execute multiple commands. RE (remote execution) caches and runs at the level of commands. Cloud Shake caches at the level of rules. That means you can have an expensive Haskell computation cached in Cloud Shake, but if its run as an internal part of the build, it wouldn’t be suitable for RE.

Having RE might mean that you don’t care about cloud caches.

Deep constructive traces have incorrect results in the presence of non-determinism whereas other cloud build systems just get fewer cache hits.

further reading: