INDEX
Explanations
mentions of the specific phrase "Guantánamo"
occurrences of the term "nam," likely in reference to names or specific entities
Negative Logits
Blackwell        -0.64
Engels           -0.63
UID              -0.63
wings            -0.61
understatement   -0.60
Introduced       -0.59
inhib            -0.59
respect          -0.58
Dim              -0.57
cider            -0.57
Positive Logits
nam              1.18
ned              0.99
orously          0.97
essage           0.96
ilitary          0.94
borgh            0.94
pered            0.91
emon             0.89
brate            0.88
daq              0.88
Activations Density 0.017%