INDEX
Explanations
words related to surprising or shocking events
instances of the word "shock" and related concepts
Negative Logits
Token   Logit
uties   -0.68
iffin   -0.67
ickr    -0.65
vain    -0.65
allery  -0.63
Starg   -0.62
oreal   -0.62
ial     -0.61
amins   -0.61
umbers  -0.60
Positive Logits
Token   Logit
waves   1.36
wave    1.33
absor   1.08
ingly   0.94
ा       0.88
er      0.83
tro     0.83
imaru   0.81
crow    0.79
wave    0.77
Activations Density: 0.038%