INDEX
Explanations
words related to imminent threats or important issues that are approaching
references to impending situations or threats
Negative Logits
Token       Logit
ive         -0.82
hib         -0.82
hibition    -0.80
ilic        -0.80
verts       -0.78
ives        -0.78
ves         -0.77
ilar        -0.74
uties       -0.73
expression  -0.72
Positive Logits
Token       Logit
omin         1.04
looms        1.02
looming      0.87
abouts       0.85
Pose         0.77
suspic       0.72
eclips       0.72
silhou       0.71
dusk         0.70
inev         0.70
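Logit lists like these are commonly produced by projecting a feature's decoder direction onto the model's unembedding and ranking the resulting per-token scores. The sketch below shows that ranking step under stated assumptions: the decoder direction, the unembedding matrix, and the token names are hypothetical toy values (not this feature's real weights), and the exact method this dashboard uses is not given in the source.

```python
# Minimal sketch, assuming logits = decoder_dir . W_U[:, token] for each token.
# All numbers and token names below are hypothetical toy data (d_model = 3, vocab = 5).
DECODER_DIR = [1.0, 0.5, -0.5]
VOCAB = ["t0", "t1", "t2", "t3", "t4"]
UNEMBED = [  # shape d_model x vocab; column j is token j's unembedding vector
    [0.9, 0.6, -0.7, -0.6, 0.0],
    [0.4, 0.2, -0.2, -0.4, 0.1],
    [-0.2, 0.0, 0.3, 0.2, 0.2],
]


def feature_logits(decoder_dir, unembed, vocab):
    """Return (token, score) pairs: dot product of the decoder direction with each token column."""
    scores = []
    for j, tok in enumerate(vocab):
        s = sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        scores.append((tok, round(s, 4)))
    return scores


def top_and_bottom(scores, k):
    """Return the k highest-scoring (positive) and k lowest-scoring (negative) tokens."""
    ranked = sorted(scores, key=lambda pair: pair[1], reverse=True)
    return ranked[:k], ranked[::-1][:k]


if __name__ == "__main__":
    top, bottom = top_and_bottom(feature_logits(DECODER_DIR, UNEMBED, VOCAB), 2)
    print("positive:", top)
    print("negative:", bottom)
```

With a real model, the same ranking would run over the full vocabulary, and the top and bottom k entries would populate the Positive and Negative Logits lists.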
Activations Density 0.020%
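Activation density is read here as the percentage of token positions on which the feature fires. The sketch below assumes "fires" means an activation strictly above a threshold of 0. That threshold and the helper name `activation_density` are assumptions, not taken from the source. Under this reading, 0.020% corresponds to about 2 firing positions per 10,000 tokens.

```python
# Minimal sketch, assuming density = 100 * (#positions with activation > threshold) / #positions.
# The threshold of 0.0 and the function name are hypothetical.
def activation_density(activations, threshold=0.0):
    """Percentage of token positions where the feature's activation exceeds the threshold."""
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


if __name__ == "__main__":
    # Hypothetical example: 10,000 positions, feature fires on 2 of them.
    acts = [0.0] * 10_000
    acts[17] = 3.1
    acts[4242] = 0.8
    print(f"Activations Density {activation_density(acts):.3f}%")
```

A density this low means the feature is highly sparse: it stays silent on nearly every token and fires only in narrow contexts.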