Explanations
phrases related to physical destruction or collapse
Negative Logits
uana          -0.67
pleasant      -0.64
entertained   -0.64
tains         -0.64
76561         -0.62
etch          -0.62
POL           -0.61
flattering    -0.60
alla          -0.58
rouse         -0.58
Positive Logits
abruptly      0.92
mysteriously  0.91
miser         0.91
prematurely   0.88
catast        0.88
tragically    0.86
amid          0.81
intest        0.80
unexpectedly  0.80
horribly      0.79
Activations Density: 0.267%