Explanations
political and societal manipulation and control-related phrases
Negative Logits
ITNESS          -0.75
ifted           -0.58
VIDIA           -0.57
Canaver         -0.56
uana            -0.55
utz             -0.55
confir          -0.54
largeDownload   -0.51
omorphic        -0.50
ertodd          -0.50
Positive Logits
itch            0.67
onto            0.65
ebted           0.64
into            0.63
into            0.63
prematurely     0.62
til             0.60
alike           0.57
goodbye         0.57
onto            0.57
Activation density: 0.895%