INDEX
Explanations
locations related to political events
mentions of specific locations or landmarks
Negative Logits
raft          -0.92
betting       -0.75
runaway       -0.74
gambling      -0.71
erm           -0.67
blown         -0.66
withdrawals   -0.65
substance     -0.65
states        -0.65
orem          -0.64
Positive Logits
rir           0.91
Square        0.88
Aven          0.76
ãĥĭ           0.75
rio           0.72
thouse        0.70
rence         0.69
vard          0.69
itbart        0.69
Beir          0.68
Activations Density: 0.021%