    Explanations

    References to layers in context, likely related to structure or hierarchy.

    Negative Logits
    Token       Logit
    " Goodwin"  -0.53
    "Com"       -0.50
    " OnInit"   -0.49
    "habitude"  -0.47
    "Credit"    -0.45
    " Phelps"   -0.45
    "GOOD"      -0.44
    "ußt"       -0.44
    "com"       -0.44
    " przys"    -0.43

    Positive Logits
    Token       Logit
    " Layer"    1.37
    " layer"    1.30
    "layer"     1.24
    "Layer"     1.23
    " LAYER"    1.22
    " Layers"   1.14
    " layers"   1.06
    "layers"    1.00
    "Layers"    0.99
    "LAYER"     0.95
    Activation Density: 0.019%

    No Known Activations