INDEX
Explanations
sentences with words related to causing fear or deception
references to public sentiments or perceptions influenced by fear-mongering
Negative Logits
played      -0.62
documented  -0.57
consulting  -0.57
confirmed   -0.56
ital        -0.54
roundup     -0.54
spelled     -0.53
tackle      -0.53
CLASSIFIED  -0.53
occupation  -0.53
Positive Logits
into        0.97
into        0.92
Into        0.80
allo        0.75
gull        0.75
selves      0.72
ritical     0.72
minds       0.70
uces        0.69
iets        0.68
Activations Density: 0.359%