Explanations
a variety of function words and indicators of grammatical structure
Negative Logits
arrera      -0.17
ÙĨب         -0.15
æ°ı         -0.15
ccion       -0.14
horns       -0.14
esk         -0.14
antage      -0.14
arius       -0.14
ivative     -0.13
guide       -0.13
Positive Logits
Dün         0.17
tez         0.14
Dum         0.14
_lite       0.13
dum         0.13
Trinidad    0.13
ancestral   0.13
ØŃاد        0.13
icontrol    0.13
oller       0.13
Activations Density 0.026%
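
The density figure above is commonly read as the fraction of sampled tokens on which the feature fires. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical, and the dashboard's own computation may differ (for example in how it samples tokens or what counts as "active").

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as "active" when its activation is strictly
    greater than the threshold. This is an illustrative definition, not
    necessarily the one used by the dashboard.
    """
    if len(activations) == 0:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 100,000 tokens, 26 of which activate the feature.
sample = [0.0] * 99_974 + [1.5] * 26
density = activation_density(sample)
print(f"Activations Density {density:.3%}")
```

Under this reading, a density of 0.026% corresponds to roughly 26 active tokens per 100,000 sampled.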