INDEX
    Explanations
    NEGATIVE LOGITS
    "foundland"       -1.00
    "gulls"           -0.93
    "ês"              -0.91
    " 있어"           -0.91
    " cualquier"      -0.90
    " любых"          -0.88
    " любые"          -0.88
    " և"              -0.87
    "良いです"        -0.87
    " mécanisme"      -0.86
    POSITIVE LOGITS
    " things"         1.56
    " thing"          1.43
    " ones"           1.24
    " stuff"          1.15
                      1.09
    "things"          0.94
    "ilibre"          0.93
    "weise"           0.93
    " şekilde"        0.89
    " form"           0.88
    Activation Density: 0.002%

    No Known Activations