    Explanations

    phrases indicating a generalization or simplification of complex ideas

    Negative Logits
        Token           Logit
        "272"           -0.18
        "orem"          -0.18
        "ore"           -0.17
        "orc"           -0.16
        "446"           -0.16
        " simples"      -0.15
        "сп"            -0.14
        "ween"          -0.14
        "();++"         -0.14
        "Stub"          -0.14

    Positive Logits
        Token           Logit
        " identical"    0.23
        "-таки"         0.17
        "lesh"          0.16
        " unchanged"    0.16
        "就是"          0.16
        " же"           0.15
        " imposs"       0.15
        " speaking"     0.15
        " ignored"      0.15
        "raison"        0.15
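This page does not say how its logit scores were produced. For sparse-autoencoder feature dashboards, a common approach is to project the feature's decoder direction through the model's unembedding matrix and list the tokens whose logits it raises or lowers the most. The sketch below assumes that approach; `top_logits`, its parameters, and the toy numbers are hypothetical, and plain Python lists stand in for any particular library's tensors.

```python
# Hypothetical sketch (assumption): a feature's direct effect on each token's
# logit is its decoder vector multiplied by the unembedding matrix.
from typing import List, Tuple

Ranked = List[Tuple[str, float]]


def top_logits(decoder_vec: List[float],
               unembed: List[List[float]],  # shape [d_model][vocab]
               vocab: List[str],
               k: int = 10) -> Tuple[Ranked, Ranked]:
    """Return (most negative, most positive) tokens by direct logit effect."""
    d_model = len(decoder_vec)
    # Effect on token j = sum over i of decoder_vec[i] * unembed[i][j].
    effects = [sum(decoder_vec[i] * unembed[i][j] for i in range(d_model))
               for j in range(len(vocab))]
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]        # ascending order: most negative first
    positive = ranked[::-1][:k]  # descending order: most positive first
    return negative, positive
```

With a toy two-dimensional model, `top_logits([1.0, -0.5], [[0.2, -0.1, 0.0, 0.3], [0.4, 0.2, -0.2, 0.0]], ["a", "b", "c", "d"], k=2)` ranks `"b"` (-0.2) and `"a"` (0.0) as the most negative and `"d"` (0.3) and `"c"` (0.1) as the most positive.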
    Activation Density 0.053%
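Assuming "Act Density" means the fraction of sampled tokens on which the feature's activation exceeds zero (the exact definition and sample size are not given here), 0.053% corresponds to about 53 firing tokens per 100,000. A minimal sketch of that calculation; `activation_density`, `as_percent`, and the zero threshold are illustrative assumptions.

```python
# Hypothetical sketch (assumption): density = share of tokens on which the
# feature's activation is strictly above a threshold (zero by default).
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    acts = list(activations)
    if not acts:
        return 0.0  # no tokens sampled, so nothing fired
    firing = sum(1 for a in acts if a > threshold)
    return firing / len(acts)


def as_percent(density: float) -> str:
    # Format a fraction as a percentage with three decimal places.
    return f"{density * 100:.3f}%"
```

For example, 53 nonzero activations among 100,000 tokens gives a density of 0.00053, which `as_percent` renders as `0.053%`.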

    No Known Activations