Explanations
words related to surprising or shocking events
references to shock and its effects
Negative Logits
uties      -0.72
allery     -0.71
amins      -0.71
ickr       -0.71
umbers     -0.66
atively    -0.64
arrang     -0.63
misunder   -0.62
ately      -0.62
oreal      -0.60
Positive Logits
wave       1.22
waves      1.19
absor      1.09
ingly      0.93
tro        0.92
er         0.90
crow       0.86
imaru      0.81
pson       0.77
mong       0.76
Activations Density 0.033%