INDEX
Explanations
references to a political entity called "junta"
references to specific political groups or entities
Negative Logits
Token         Logit
js            -0.77
chats         -0.74
chat          -0.68
wom           -0.66
BW            -0.66
compositions  -0.64
maiden        -0.64
forged        -0.64
Mith          -0.63
empath        -0.63
Positive Logits
Token         Logit
unta          4.74
addafi        1.40
abba          1.18
Lama          1.07
ihadi         1.00
ataka         0.99
hene          0.99
uana          0.96
azeera        0.96
apon          0.94
Activations Density: 0.033%