Explanations
quantitative descriptors indicating an increase or comparison
Negative Logits
azzo      -0.17
olle      -0.15
ñana      -0.15
ureau     -0.15
omaly     -0.15
Hlav      -0.14
ours      -0.14
urette    -0.14
ovna      -0.14
riel      -0.14
Positive Logits
than      0.21
dern      0.18
-than     0.18
cazzo     0.16
than      0.15
undry     0.15
arness    0.15
handful   0.14
æĸ¹éĿ¢    0.14
/e        0.14
Activations Density: 0.062%