    Negative Logits
    ܓ               0.84
    érg             0.80
    Staying         0.77
    Were            0.76
    Took            0.75
    Feeling         0.75
     longterm       0.75
     excellently    0.74
    long            0.74
     oblong         0.74
    Positive Logits
     precedent      0.71
    ;"><            0.66
     Learn          0.66
     রাখে           0.65
     уте            0.65
    mless           0.64
                    0.63
    вых             0.63
     смя            0.63
     нет            0.63
    Activation Density: 0.004%

    No Known Activations