INDEX
Explanations
punctuation marks, particularly periods
Negative Logits
635  -0.17
ーツ  -0.15
si  -0.14
Č  -0.14
okus  -0.14
vert  -0.14
لمان  -0.14
бой  -0.14
rast  -0.14
λιά  -0.13
Positive Logits
com  0.27
org  0.21
IRO  0.17
org  0.17
edu  0.16
reta  0.16
Virgin  0.14
;  0.14
Org  0.14
usher  0.14
Activations Density 0.008%