Explanations
personal pronouns referring to individuals
references to specific individuals and their attributes
Negative Logits
CONCLUS             -0.69
ĵĺ                  -0.64
dependence          -0.64
guiActiveUnfocused  -0.62
weakening           -0.62
Reward              -0.61
Nationwide          -0.61
Ĥ¬                  -0.60
Improvement         -0.59
compromising        -0.59
Positive Logits
'd       1.11
zbollah  1.10
'll      1.08
pherd    1.05
joins    1.02
sits     0.99
resy     0.96
owns     0.94
wore     0.94
gemony   0.93
Activations Density: 0.250%