INDEX
Explanations
phrases related to surprising or impactful events
instances of the word "shock" and its variations
Negative Logits
uties      -0.82
amins      -0.82
umbers     -0.75
allery     -0.71
ittees     -0.67
ende       -0.65
ccording   -0.64
ufact      -0.64
oug        -0.63
notor      -0.63
Positive Logits
waves      1.11
wave       1.10
absor      1.04
tro        0.97
ingly      0.93
imaru      0.91
mong       0.84
er         0.83
shock      0.81
fully      0.75
Activations Density 0.016%
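
For readers unfamiliar with the density figure, here is a minimal sketch of how such a number could be computed. It assumes density means the fraction of sampled token positions on which the feature's activation is above zero. The page does not state that definition, and the function name `activation_density`, the `threshold` parameter, and the toy data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" is the share of sampled tokens on which the
    feature fires at all; the dashboard itself does not define the term.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: the feature fires on 2 of 12,500 token positions.
toy_activations = [0.0] * 12498 + [0.7, 1.3]
density = activation_density(toy_activations)
print(f"{density:.3%}")  # 0.016%
```

Under this reading, a density of 0.016% corresponds to the feature firing on roughly 16 of every 100,000 tokens, which fits a feature tied to a single word family such as "shock".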