INDEX
    Explanations

    terms that describe size and magnitude

    Negative Logits
    2             -0.34
    1             -0.34
    3             -0.30
     explain      -0.29
     perbuatan    -0.28
    5             -0.28
    based         -0.28
    6             -0.28
     elkaar       -0.28
     jums         -0.28
    Positive Logits
    IBOutlet      0.78
     surla        0.74
     imagui       0.73
                  0.72
     zwiſchen     0.72
    ſſung         0.72
    <unused42>    0.71
    <unused41>    0.71
    <pad>         0.71
    [@BOS@]       0.71
    Act Density 0.046%

    No Known Activations