INDEX
Explanations
personal pronouns and verbs related to reasons or explanations
pronouns and references to individuals in the text
Negative Logits
sqor       -0.76
dds        -0.72
ĺħ         -0.68
ILCS       -0.66
oret       -0.63
courtesy   -0.62
FORMATION  -0.61
OSS        -0.61
¶ħ         -0.60
eatures    -0.59
Positive Logits
chose      1.23
decided    0.99
bother     0.96
bothered   0.93
opted      0.91
mattered   0.90
Matters    0.88
couldn     0.88
hesitated  0.87
bothers    0.85
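Logit lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking the vocabulary by the result. The sketch below illustrates that idea under stated assumptions. The dashboard does not say how its numbers were computed. Here the effect on token t is taken to be dot(decoder_dir, W_U[:, t]), ignoring the final layer norm and any bias. The names `logit_effects` and `top_and_bottom` and the toy values in the usage lines are hypothetical and do not come from this feature.

```python
# Hypothetical sketch: rank vocabulary tokens by how strongly a feature's
# decoder direction pushes their logits up or down.
# Assumption: effect on token t = dot(decoder_dir, unembed[:, t]);
# final layer norm and unembedding bias are ignored for simplicity.

def logit_effects(decoder_dir, unembed):
    """Return one score per vocab token; unembed has shape [d_model][vocab]."""
    d_model = len(decoder_dir)
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(vocab_size)
    ]

def top_and_bottom(scores, tokens, k):
    """Return the k most positive and k most negative (token, score) pairs."""
    ranked = sorted(zip(tokens, scores), key=lambda pair: pair[1])
    negative = ranked[:k]         # most negative first
    positive = ranked[::-1][:k]   # most positive first
    return positive, negative

# Toy usage with made-up numbers (d_model = 2, vocab of 4 tokens).
tokens = ["chose", "opted", "the", "sqor"]
decoder_dir = [1.0, 0.5]
unembed = [[1.0, 0.6, 0.0, -0.5],
           [0.4, 0.6, 0.1, -0.2]]
positive, negative = top_and_bottom(logit_effects(decoder_dir, unembed), tokens, 2)
print("positive:", positive)
print("negative:", negative)
```

Under this simplification, a token with a large positive score is one the feature makes more likely when it fires. A large negative score marks a token it suppresses.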
Activation Density 0.141%
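Activation density is usually read as the share of sampled token positions on which the feature fires. The minimal sketch below rests on two assumptions: "fires" means an activation strictly above a threshold of 0, and the result is expressed as a percentage. The dashboard does not state its threshold or sample size. The function name `activation_density` is hypothetical.

```python
# Hypothetical sketch: percentage of token positions where a feature is active.
# Assumption: "active" means activation > threshold (default 0.0).

def activation_density(activations, threshold=0.0):
    """Return 100 * (number of activations above threshold) / (total positions)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# Toy usage: one active position out of 1000 gives a density of 0.1%.
print(activation_density([0.0] * 999 + [2.5]))
```

On this reading, a density of 0.141% means the feature is active on roughly 1.4 of every 1000 tokens, which makes it a sparse feature.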