nixpkgs

mirrors/nixpkgs

Fork 1

mirror of https://github.com/NixOS/nixpkgs.git synced 2024-12-09 14:04:56 +00:00

Commit graph

Author	SHA1	Message	Date
Graham Christensen	09362bc3e8	references-by-popularity: cache computation to avoid memory bloat On very large graphs (14k+ paths), we'd end up with a massive in memory tree of mostly duplication. We can safely cache trees and point back to them later, saving memory.	2019-03-05 16:37:52 -05:00
Graham Christensen	54826e7471	references-by-popularity: create debug output	2019-03-05 16:32:06 -05:00
Graham Christensen	fd045173ce	referencesByPopularity: init to sort packages by a cachability heuristic Using a simple algorithm, convert the references to a path in to a sorted list of dependent paths based on how often they're referenced and how deep in the tree they live. Equally-"popular" paths are then sorted by name. The existing writeReferencesToFile prints the paths in a simple ascii-based sorting of the paths. Sorting the paths by graph improves the chances that the difference between two builds appear near the end of the list, instead of near the beginning. This makes a difference for Nix builds which export a closure for another program to consume, if that program implements its own level of binary diffing. For an example, Docker Images. If each store path is a separate layer then Docker Images can be very efficiently transfered between systems, and we get very good cache reuse between images built with the same version of Nixpkgs. However, since Docker only reliably supports a small number of layers (42) it is important to pick the individual layers carefully. By storing very popular store paths in the first 40 layers, we improve the chances that the next Docker image will share many of those layers.* Given the dependency tree: A - B - C - D -\ \ \ \ \ \ \ \ \ \ \ - E ---- F \- G Nodes which have multiple references are duplicated: A - B - C - D - F \ \ \ \ \ \- E - F \ \ \ \- E - F \ \- G Each leaf node is now replaced by a counter defaulted to 1: A - B - C - D - (F:1) \ \ \ \ \ \- E - (F:1) \ \ \ \- E - (F:1) \ \- (G:1) Then each leaf counter is merged with its parent node, replacing the parent node with a counter of 1, and each existing counter being incremented by 1. That is to say `- D - (F:1)` becomes `- (D:1, F:2)`: A - B - C - (D:1, F:2) \ \ \ \ \ \- (E:1, F:2) \ \ \ \- (E:1, F:2) \ \- (G:1) Then each leaf counter is merged with its parent node again, merging any counters, then incrementing each: A - B - (C:1, D:2, E:2, F:5) \ \ \ \- (E:1, F:2) \ \- (G:1) And again: A - (B:1, C:2, D:3, E:4, F:8) \ \- (G:1) And again: (A:1, B:2, C:3, D:4, E:5, F:9, G:2) and then paths have the following "popularity": A 1 B 2 C 3 D 4 E 5 F 9 G 2 and the popularity contest would result in the paths being printed as: F E D C B G A * Note: People who have used a Dockerfile before assume Docker's Layers are inherently ordered. However, this is not true -- Docker layers are content-addressable and are not explicitly layered until they are composed in to an Image.	2018-09-26 15:50:10 -04:00

Author

SHA1

Message

Date

Graham Christensen

09362bc3e8

references-by-popularity: cache computation to avoid memory bloat

On very large graphs (14k+ paths), we'd end up with a massive in
memory tree of mostly duplication.

We can safely cache trees and point back to them later, saving
memory.

2019-03-05 16:37:52 -05:00

Graham Christensen

54826e7471

references-by-popularity: create debug output

2019-03-05 16:32:06 -05:00

Graham Christensen

fd045173ce

referencesByPopularity: init to sort packages by a cachability heuristic

Using a simple algorithm, convert the references to a path in to a
sorted list of dependent paths based on how often they're referenced
and how deep in the tree they live. Equally-"popular" paths are then
sorted by name.

The existing writeReferencesToFile prints the paths in a simple
ascii-based sorting of the paths.

Sorting the paths by graph improves the chances that the difference
between two builds appear near the end of the list, instead of near
the beginning. This makes a difference for Nix builds which export a
closure for another program to consume, if that program implements its
own level of binary diffing.

For an example, Docker Images. If each store path is a separate layer
then Docker Images can be very efficiently transfered between systems,
and we get very good cache reuse between images built with the same
version of Nixpkgs. However, since Docker only reliably supports a
small number of layers (42) it is important to pick the individual
layers carefully. By storing very popular store paths in the first 40
layers, we improve the chances that the next Docker image will share
many of those layers.*

Given the dependency tree:

    A - B - C - D -\
     \   \   \      \
      \   \   \      \
       \   \ - E ---- F
        \- G

Nodes which have multiple references are duplicated:

    A - B - C - D - F
     \   \   \
      \   \   \- E - F
       \   \
        \   \- E - F
         \
          \- G

Each leaf node is now replaced by a counter defaulted to 1:

    A - B - C - D - (F:1)
     \   \   \
      \   \   \- E - (F:1)
       \   \
        \   \- E - (F:1)
         \
          \- (G:1)

Then each leaf counter is merged with its parent node, replacing the
parent node with a counter of 1, and each existing counter being
incremented by 1. That is to say `- D - (F:1)` becomes `- (D:1, F:2)`:

    A - B - C - (D:1, F:2)
     \   \   \
      \   \   \- (E:1, F:2)
       \   \
        \   \- (E:1, F:2)
         \
          \- (G:1)

Then each leaf counter is merged with its parent node again, merging
any counters, then incrementing each:

    A - B - (C:1, D:2, E:2, F:5)
     \   \
      \   \- (E:1, F:2)
       \
        \- (G:1)

And again:

    A - (B:1, C:2, D:3, E:4, F:8)
     \
      \- (G:1)

And again:

    (A:1, B:2, C:3, D:4, E:5, F:9, G:2)

and then paths have the following "popularity":

    A     1
    B     2
    C     3
    D     4
    E     5
    F     9
    G     2

and the popularity contest would result in the paths being printed as:

    F
    E
    D
    C
    B
    G
    A

* Note: People who have used a Dockerfile before assume Docker's
Layers are inherently ordered. However, this is not true -- Docker
layers are content-addressable and are not explicitly layered until
they are composed in to an Image.

2018-09-26 15:50:10 -04:00

3 commits