Explanations
words that express uniqueness or distinctiveness
Negative Logits
censura    -0.67
</em>      -0.66
стма       -0.60
<em>       -0.60
<code>     -0.57
devriez    -0.56
fós        -0.56
shi        -0.53
on         -0.53
ędzy       -0.53
Positive Logits
unique      1.92
unique      1.85
UNIQUE      1.85
Unique      1.84
Unique      1.78
UNIQUE      1.73
uniques     1.73
uniqueness  1.66
uniqu       1.65
uniquely    1.55
Activations Density 0.041%