INDEX
    Explanations

    words that express uniqueness or distinctiveness

    Negative Logits
    " censura"      -0.67
    "</em>"         -0.66
    "стма"          -0.60
    "<em>"          -0.60
    "<code>"        -0.57
    " devriez"      -0.56
    " fós"          -0.56
    "shi"           -0.53
    "on"            -0.53
    "ędzy"          -0.53

    Positive Logits
    " unique"        1.92
    "unique"         1.85
    " UNIQUE"        1.85
    " Unique"        1.84
    "Unique"         1.78
    "UNIQUE"         1.73
    " uniques"       1.73
    " uniqueness"    1.66
    " uniqu"         1.65
    " uniquely"      1.55
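    Lists like these are commonly produced by a logit-lens style projection: the feature's decoder direction is multiplied by the model's unembedding matrix, and the lowest- and highest-scoring tokens are reported. The sketch below is an assumption about how such lists can be computed, not the confirmed pipeline behind these numbers (real pipelines may apply layer normalization or other scaling first). The names `top_logit_effects`, `w_dec_row`, `w_unembed`, and `vocab` are hypothetical.

```python
import numpy as np


def top_logit_effects(w_dec_row, w_unembed, vocab, k=10):
    """Hypothetical helper: project one feature's decoder direction
    (shape: d_model) through an unembedding matrix (shape: d_model x vocab_size)
    and return the k most negative and k most positive token effects."""
    logits = w_dec_row @ w_unembed  # one score per vocabulary token
    order = np.argsort(logits)      # ascending: most negative first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive


if __name__ == "__main__":
    # Toy example: d_model = 2, vocabulary of 4 tokens (made-up numbers).
    w_dec_row = np.array([1.0, 0.0])
    w_unembed = np.array([[2.0, -1.0, 0.5, 0.0],
                          [0.0, 0.0, 0.0, 1.0]])
    vocab = [" unique", " censura", " on", "shi"]
    neg, pos = top_logit_effects(w_dec_row, w_unembed, vocab, k=2)
    print("negative:", neg)
    print("positive:", pos)
```

    Leading spaces are kept in the token strings because tokenizers often treat " unique" and "unique" as distinct tokens, which is why both appear in the positive list.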
    Act Density 0.041%
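    Activation density is usually read as the fraction of sampled tokens on which the feature fires (activation above zero). A minimal sketch, assuming a hypothetical array `acts` of per-token activations and a zero threshold; the sampling set behind the 0.041% figure is not stated here.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Hypothetical helper: fraction of tokens whose activation exceeds threshold."""
    acts = np.asarray(acts)
    return float(np.mean(acts > threshold))


if __name__ == "__main__":
    # Toy example: 41 active tokens out of 100,000 sampled tokens.
    acts = np.zeros(100_000)
    acts[:41] = 1.0
    print(f"{activation_density(acts) * 100:.3f}%")
```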

    No Known Activations