INDEX
Explanations
personal reasons or motivations behind actions or decisions
pronouns associated with a sense of victimization or blame
Negative Logits
Eleven        -0.66
Millennium    -0.65
ntil          -0.64
Towards       -0.64
jri           -0.61
Around        -0.61
Dayton        -0.60
Voy           -0.58
Opening       -0.58
Deg           -0.58
Positive Logits
lacked        1.06
knew          1.03
feared        1.00
dared         0.98
forgot        0.97
couldn        0.94
didn          0.94
lacks         0.91
believe       0.90
wanted        0.90
Activations Density: 0.172%