    Negative Logits
    ができ          -0.99
    Én              -0.92
    Elő             -0.91
     faca           -0.90
    Zubereitung     -0.90
     Unidas         -0.89
    Descriere       -0.89
    Русский         -0.88
                    -0.87
    Алексей         -0.86
    Positive Logits
     "$             1.33
     -              1.16
                    1.08
    $               1.05
    !               1.02
     $              1.00
     ${             1.00
     "              1.00
     $-             0.99
    ${              0.97
    Activation Density: 0.012%

    No Known Activations