INDEX
Explanations
mentions of potential scenarios with far-reaching effects
phrases related to potential legal or political actions
Negative Logits
Token       Logit
ĸļ          -0.74
orest       -0.70
@#&         -0.66
ALWAYS      -0.65
pection     -0.64
istent      -0.64
Detail      -0.63
fmt         -0.62
bey         -0.59
ateurs      -0.59
Positive Logits
Token        Logit
someday      1.27
anytime      0.93
jeopardy     0.91
blackmail    0.88
potentially  0.86
sooner       0.86
future       0.85
retaliation  0.83
damaging     0.83
triggering   0.82
Activations Density: 0.528%