INDEX
Explanations
references to specific terrorist organizations and leaders
Negative Logits
Token       Logit
Laur        -0.68
livious     -0.66
Poles       -0.65
Wilde       -0.65
Warp        -0.64
EVE         -0.64
crochet     -0.64
erection    -0.62
Victoria    -0.61
Polar       -0.60
Positive Logits
Token       Logit
azeera      1.27
aeda        1.11
aghd        1.00
abi         0.98
qqa         0.98
awi         0.96
adr         0.95
adi         0.90
aq          0.90
aida        0.89
Activations Density: 0.079%