INDEX
Explanations
phrases indicating variability or the existence of exceptions
Negative Logits
heimer      -0.14
ritz        -0.14
@student    -0.14
Offensive   -0.14
hawk        -0.14
581         -0.14
&           -0.14
sk          -0.13
alon        -0.13
ulos        -0.13
Positive Logits
conti       0.16
ازل         0.15
-либо       0.15
Weaver      0.15
ayar        0.15
place       0.15
particular  0.14
bír         0.14
ardi        0.14
icont       0.14
Activations Density 0.058%