INDEX
Explanations
phrases related to terrorism
references to terror-related events or groups
Negative Logits
bye          -0.69
resso        -0.68
Quartz       -0.66
galitarian   -0.64
rovers       -0.63
Ģ            -0.63
flush        -0.63
Salt         -0.63
lease        -0.61
Rhod         -0.59
Positive Logits
istic        1.04
izing        1.04
fully        1.02
ization      0.95
ising        0.88
fulness      0.87
ophobic      0.87
ously        0.87
otropic      0.87
icious       0.86
Activations Density 0.020%