INDEX
Explanations
high-frequency words and conjunctions in discourse
Negative Logits
è±            -0.16
Ĥ¹            -0.16
eway          -0.15
alog          -0.15
EGIN          -0.15
fony          -0.14
antee         -0.14
fout          -0.14
ãĤ¹ãĥĨãĤ£    -0.14
Wed           -0.14
Positive Logits
otts          0.18
oux           0.15
hausen        0.14
Roses         0.14
allet         0.14
ha            0.14
Moore         0.14
616           0.14
Martins       0.14
éħ            0.14
Activation Density: 0.004%