INDEX
Explanations
words that indicate significance or value in various contexts
Negative Logits
deo       -0.15
ода       -0.15
urge      -0.15
$GLOBALS  -0.14
izen      -0.14
lush      -0.14
ughs      -0.14
owell     -0.14
ласти     -0.14
uw        -0.14
Positive Logits
enough  0.32
ly      0.23
Enough  0.22
ness    0.20
ities   0.17
ITY     0.17
mente   0.17
lier    0.17
NESS    0.17
Enough  0.16
Activations Density 0.716%