INDEX
Explanations
words and phrases associated with fear or alarm
Negative Logits
iciency   -0.82
cellence  -0.76
cially    -0.71
ional     -0.70
edded     -0.69
inished   -0.68
arrang    -0.67
odore     -0.65
orem      -0.64
offic     -0.64
Positive Logits
crow     1.86
mong     1.49
warts    0.94
tactics  0.84
cock     0.81
escape   0.78
ware     0.74
vana     0.74
fu       0.74
tactic   0.74
Activations Density: 0.011%