Explanations
events involving violence or significant actions
Negative Logits
� (unprintable token)   -0.69
REUTERS                 -0.65
recently                -0.63
uesday                  -0.62
Actress                 -0.61
bible                   -0.61
Presbyterian            -0.61
Valent                  -0.61
James                   -0.61
bestselling             -0.61
Positive Logits
acters                  0.82
elo                     0.75
tarians                 0.72
swer                    0.71
glers                   0.70
eters                   0.70
sylv                    0.68
aez                     0.66
rity                    0.65
eers                    0.64
Activation Density: 2.357%
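The dashboard does not say how these numbers are computed. The sketch below is a minimal illustration that rests on two assumptions. First, "activation density" is taken to mean the fraction of tokens on which the feature's activation is above zero, so 2.357% would be roughly one token in 42. Second, each logit list is taken to be the k lowest- and k highest-weighted vocabulary entries of the feature's effect on the output logits. The function names `activation_density` and `top_logits` are hypothetical, and the code does not use any dashboard API.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation is strictly above `threshold`.

    Assumption: density counts a token as "firing" when activation > 0.
    """
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


def top_logits(logit_weights, k=10):
    """Return (lowest k, highest k) (token, weight) pairs from a dict.

    Assumption: `logit_weights` maps each vocabulary token to the feature's
    contribution to that token's output logit.
    """
    ranked = sorted(logit_weights.items(), key=lambda kv: kv[1])
    lowest = ranked[:k]            # most negative weights first
    highest = ranked[::-1][:k]     # largest weights first
    return lowest, highest


if __name__ == "__main__":
    # Hypothetical data: 2,357 firing tokens out of 100,000.
    acts = [1.0] * 2357 + [0.0] * (100000 - 2357)
    print(f"{activation_density(acts):.3%}")

    weights = {"acters": 0.82, "elo": 0.75, "REUTERS": -0.65, "recently": -0.63}
    print(top_logits(weights, k=2))
```

If a vocabulary has fewer than k positive weights, `highest` will also include non-positive entries. A real dashboard would typically filter by sign before ranking.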