INDEX
Explanations
verbs indicating a threat or impending danger
phrases indicating impending threats or challenges
Negative Logits
verts    -0.86
ials     -0.83
rosse    -0.81
portion  -0.80
xy       -0.80
tes      -0.80
alion    -0.79
%]       -0.76
girl     -0.75
bits     -0.75
Positive Logits
doom         1.15
looming      1.00
omin         0.97
looms        0.85
threat       0.82
inev         0.82
deadlines    0.79
catastrophe  0.79
imminent     0.78
instability  0.77
Activations Density: 0.052%