Explanations
mentions of actions performed or decisions made by someone other than the subject
references to accountability or actions taken by others
Negative Logits
olid        -0.73
undown      -0.72
tnc         -0.72
rar         -0.71
Accessory   -0.67
Increases   -0.66
veyard      -0.60
osterone    -0.60
taining     -0.60
irth        -0.60
Positive Logits
whom         0.84
who          0.83
worldly      0.81
's           0.75
chooses      0.73
ÃŃs          0.73
chose        0.72
zzi          0.68
fared        0.68
thinks       0.67
Activation Density: 0.152%