Explanations
phrases indicating existence or state of being
Negative Logits
azzi      -0.17
hill      -0.16
ivr       -0.14
aux       -0.14
fir       -0.14
xon       -0.14
wouldn    -0.14
las       -0.13
uya       -0.13
Monroe    -0.13
Positive Logits
prung     0.17
OSH       0.15
Pazar     0.15
šit       0.15
.twig     0.14
_PD       0.14
orias     0.14
OMPI      0.14
htub      0.13
enth      0.13
Activations Density 0.008%