INDEX
    Explanations

    phrases indicating ongoing existence or duration

    Negative Logits
    ansen       -0.16
    sing        -0.16
    <!--↵       -0.15
    ickey       -0.14
    imson       -0.14
    ensis       -0.14
    gua         -0.14
    ookie       -0.14
    llib        -0.14
    sein        -0.14
    Positive Logits
    ÅĻez        0.15
    theless     0.15
    ATAL        0.14
    rier        0.13
    nin         0.13
     been       0.13
    ed          0.13
    orno        0.13
     marg       0.13
    ="../../../ 0.13
    Act Density 0.020%

    No Known Activations