    Explanations

    negative prefixes and descriptions

    Negative Logits
    "و"      0.74
    "на"     0.57
    "u"      0.52
    "ان"     0.52
    " THE"   0.51
    "ра"     0.49
    "ный"    0.48
             0.47
    "ку"     0.46
    "RawO"   0.46
    Positive Logits
    " at"          0.79
    " of"          0.54
    " on"          0.52
    " a"           0.46
    "2"            0.44
    "ética"        0.43
    " about"       0.43
    " Schönheit"   0.43
    "www"          0.42
    " was"         0.42
    Act Density 0.161%

    No Known Activations