Explanations
expressions related to fear
expressions of fear
Negative Logits
arb           -0.91
urgy          -0.84
available     -0.79
arbon         -0.76
authors       -0.75
properties    -0.74
arkable       -0.73
added         -0.71
sample        -0.70
options       -0.70
Positive Logits
lessly        1.22
fear          1.02
mong          0.98
lessness      0.96
fears         0.96
ingly         0.94
lest          0.93
fully         0.92
afraid        0.88
fulness       0.87
Activations Density: 0.014%