Explanations
pronouns and relative clauses
Negative Logits
addContainerGap    -0.54
udan               -0.52
arische            -0.51
ocyclic            -0.48
kwal               -0.48
peoples            -0.46
hoods              -0.45
OGND               -0.45
یریت               -0.45
PDO                -0.44
Positive Logits
########.          1.03
rrggbb             0.79
enumi              0.76
]")]               0.72
autorytatywna      0.68
tdessen            0.68
++]                0.67
woordig            0.67
kaynağından        0.67
'\\;'              0.66
Activations Density: 0.186%