INDEX
Explanations
pronouns and indicators of quantity or presence
Negative Logits
Token     Logit
ogg       -0.15
Stan      -0.15
xfb       -0.14
reet      -0.14
",-       -0.14
Pou       -0.14
ریم       -0.14
пода      -0.14
isque     -0.14
quo       -0.14
Positive Logits
Token     Logit
848       0.18
ideo      0.16
imir      0.15
aset      0.15
acon      0.15
aths      0.14
Salv      0.14
ideos     0.14
yb        0.14
raid      0.14
Activations Density 0.001%