Explanations
references to political topics and controversies, particularly mentioning individuals and policies
Negative Logits
..................  -0.73
BT                  -0.69
Bomb                -0.68
ÙIJ                 -0.67
Spr                 -0.66
EEE                 -0.64
ð                   -0.63
Õ                   -0.63
Lauder              -0.63
SHA                 -0.62
Positive Logits
own                  1.01
biggest              1.00
youngest             0.99
inability            0.98
oldest               0.98
successor            0.98
eldest               0.95
newest               0.94
motto                0.93
reputation           0.91
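Dashboards of this kind typically build the two logit lists by projecting the feature's decoder direction onto the model's unembedding matrix, then sorting the per-token scores. The source does not say how these particular values were computed, so the following is only a minimal sketch under that assumption. The names `decoder_direction`, `unembedding`, and `vocab` are hypothetical, and the toy numbers are illustrative rather than real model weights.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(
    decoder_direction: Sequence[float],
    unembedding: Sequence[Sequence[float]],  # hypothetical, shape d_model x vocab_size
    vocab: Sequence[str],
) -> Dict[str, float]:
    """Score each token by the dot product of the (assumed) feature decoder
    direction with that token's unembedding column."""
    scores: Dict[str, float] = {}
    for j, token in enumerate(vocab):
        scores[token] = sum(
            d * unembedding[i][j] for i, d in enumerate(decoder_direction)
        )
    return scores


def top_and_bottom(
    scores: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, score) pairs."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1])  # ascending by score
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return positive, negative


# Toy example: d_model = 2, four-token vocabulary (made-up numbers).
vocab = ["own", "biggest", "Bomb", "BT"]
decoder_direction = [1.0, 0.5]
unembedding = [
    [0.8, 0.6, -0.5, -0.4],
    [0.4, 0.6, -0.2, -0.6],
]
scores = logit_effects(decoder_direction, unembedding, vocab)
positive, negative = top_and_bottom(scores, k=2)
```

In this toy setup, `positive` ranks "own" (about 1.0) ahead of "biggest" (about 0.9), and `negative` ranks "BT" (about -0.7) ahead of "Bomb" (about -0.6). That ordering mirrors how the tables above list the strongest effects first. Real pipelines may also fold in layer-norm scaling or other preprocessing. This sketch leaves those steps out.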
Activation Density: 0.134%
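Activation density is commonly reported as the share of token positions on which the feature fires. Under that reading, 0.134% corresponds to roughly 134 firing positions per 100,000 tokens. The source does not state the threshold or the dataset used, so this sketch assumes "fires" means an activation strictly above 0. The function name `activation_density` is hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions where the feature activation exceeds
    `threshold` (assumed to be 0.0 by default)."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


# Toy example: 134 nonzero activations out of 100,000 token positions.
acts = [0.0] * (100_000 - 134) + [1.5] * 134
density = activation_density(acts)  # about 0.134 (percent)
```

Raising `threshold` above zero would count only stronger activations and lower the reported density. Dashboards differ on this choice, so treat the default here as an assumption.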