INDEX
Explanations
specific phonetic or morphological patterns in words
Negative Logits
Token     Logit
ÌĢ        -0.21
Toy       -0.16
archy     -0.16
Ìģ        -0.16
Toy       -0.16
ória      -0.15
%+        -0.15
Ìģt       -0.15
odd       -0.15
_alias    -0.15
Positive Logits
Token     Logit
ulo       0.22
coles     0.21
icas      0.20
culos     0.20
culo      0.20
ULO       0.19
nicas     0.19
lico      0.19
rico      0.18
frica     0.17
Activations Density: 0.026%