INDEX
    Explanations
    Negative Logits
    Token        Logit
    "3"          -0.07
    " abolish"   -0.07
    "aos"        -0.07
    " free"      -0.06
    " rtn"       -0.06
    " fit"       -0.06
    "_coin"      -0.06
    " rulers"    -0.06
    " did"       -0.06
    " уб"        -0.06
    Positive Logits
    Token        Logit
    " fifty"     0.08
    " sixty"     0.08
    " twenty"    0.08
    " ninety"    0.07
    " Floyd"     0.07
    "kový"       0.06
    " خانو"      0.06
    " Twenty"    0.06
    ".den"       0.06
    " ویژگی"     0.06
    Activation Density: 0.025%

    No Known Activations