INDEX
Explanations
personal pronouns followed by verbs
references to personal and collective identity
Negative Logits
luster       -0.67
church       -0.66
aston        -0.61
whel         -0.60
Hague        -0.57
math         -0.57
vind         -0.56
Dram         -0.55
anytime      -0.55
guiActive    -0.54
Positive Logits
intend        1.05
're           1.04
mean          1.02
entail        1.01
meant         0.99
've           0.98
'd            0.92
ate           0.92
accomplished  0.92
perceive      0.91
Activations Density: 0.116%