Explanations
instances of direct speech and personal statements
Negative Logits
Token     Logit
“         -0.30
âĢŀ       -0.26
(“        -0.23
“[        -0.23
``        -0.19
ãĢĮãģĤ    -0.18
“â̦       -0.18
”         -0.16
ãĢĮ       -0.16
ãĢĮãģĬ    -0.15
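Several of the tokens above (for example âĢŀ and ãĢĮ) look like raw byte-level BPE vocabulary strings rather than readable text. A minimal sketch of how such strings could be decoded, assuming the vocabulary uses the GPT-2 style byte-to-unicode mapping (the page itself does not state which tokenizer is in use). The helper names `byte_to_unicode` and `decode_token` are hypothetical and only serve this illustration.

```python
def byte_to_unicode() -> dict[int, str]:
    """Byte -> printable character table in the style of GPT-2's byte-level BPE.

    Assumption: the dashboard's tokenizer follows this scheme.
    """
    # Bytes that are already printable map to themselves.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    offset = 0
    # Every remaining byte is shifted to a code point at 256 and above.
    for b in range(256):
        if b not in printable:
            byte_values.append(b)
            code_points.append(256 + offset)
            offset += 1
    return dict(zip(byte_values, (chr(c) for c in code_points)))


def decode_token(token: str) -> str:
    """Turn a vocabulary string back into text.

    Raises KeyError if the token holds a character outside the table,
    which means it was not a raw vocabulary string to begin with.
    """
    inverse = {ch: b for b, ch in byte_to_unicode().items()}
    raw = bytes(inverse[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


# Escapes are used so the literals survive copy and paste unchanged.
print(decode_token("\u00e2\u0122\u0140"))                    # âĢŀ    -> „
print(decode_token("\u00e3\u0122\u012e"))                    # ãĢĮ    -> 「
print(decode_token("\u00e3\u0122\u012e\u00e3\u0123\u0124"))  # ãĢĮãģĤ -> 「あ
print(decode_token("\u00e3\u0122\u012e\u00e3\u0123\u012c"))  # ãĢĮãģĬ -> 「お
```

Under that assumption, the negative-logit tokens are all opening or closing quotation marks in different scripts (German low quote, Japanese corner bracket), which agrees with the "direct speech" explanation.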
Positive Logits
Token     Logit
bsite     0.31
apons     0.29
gnore     0.28
gether    0.26
crease    0.24
ir        0.23
ory       0.21
adays     0.21
tempts    0.20
ORY       0.19
Activations Density 0.123%