    Explanations

    expressions indicating clarity or understanding

    Negative Logits
    Token        Logit
    "eph"        -0.15
    "uant"       -0.15
    "iros"       -0.15
    "iro"        -0.15
    "stad"       -0.14
    "OMIC"       -0.14
    "yc"         -0.14
    "πισ"        -0.14
    " gleich"    -0.14
    "etto"       -0.14
    Positive Logits
    Token        Logit
    "-cut"       0.44
    "cut"        0.38
    "ances"      0.29
    "-eyed"      0.28
    "headed"     0.25
    "Cut"        0.24
    "-headed"    0.23
    " cut"       0.23
    " Cut"       0.23
    " ràng"      0.23
    Activation Density 0.045%

    No Known Activations