    Negative Logits
    -0.07
     mám
    -0.06
     Towards
    -0.06
     gözlem
    -0.06
     눈을
    -0.06
     likelihood
    -0.06
    scriber
    -0.06
    014
    -0.06
     Yin
    -0.06
    outil
    -0.06
    Positive Logits
     buffalo
    0.16
     Buffalo
    0.15
     Buff
    0.11
    BUFF
    0.10
    buff
    0.10
    Buff
    0.09
     BUFF
    0.09
     buff
    0.08
     Bison
    0.08
    uffled
    0.08
    Activation Density: 0.002%

    No Known Activations