    Explanations

    Not plain English

    Negative Logits
     Substitute       -0.07
     Interested       -0.07
    .enemy            -0.07
     השת              -0.07
     indispensable    -0.06
    	Input            -0.06
     הש               -0.06
     eligible         -0.06
     Herb             -0.06
     dolore           -0.06
    Positive Logits
     breaking         0.07
    .“                0.06
                      0.06
    步伐              0.06
    心得              0.06
                      0.06
                      0.06
    lose              0.06
     formas           0.06
                      0.06
    Act Density 0.000%

    No Known Activations