Explanations
The neuron is highly sensitive to the word “Escape,” especially when it appears as a standalone title or heading.
Negative Logits
Token       Logit
Id          -0.08
shows       -0.07
commodity   -0.06
Willow      -0.06
Bid         -0.06
show        -0.06
Hill        -0.06
bins        -0.06
subdir      -0.06
087         -0.06
Positive Logits
Token       Logit
escape      0.14
Escape      0.12
Escape      0.10
escaped     0.10
escapes     0.09
escape      0.09
CAPE        0.09
逃          0.08
escap       0.08
.escape     0.08
Activations Density 0.009%
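
To illustrate how a density figure like the one above could be read, here is a minimal sketch. It assumes that density means the share of sampled token positions on which the neuron's activation exceeds a threshold; the page itself does not define the metric. The function name, the threshold, and the sample data are hypothetical and chosen only so the result matches the reported value.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of positions whose activation exceeds `threshold`.

    Assumption: "density" is the fraction of sampled tokens on which the
    neuron fires at all. This definition is not stated on the page.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Hypothetical sample: the neuron fires on 9 of 100,000 token positions.
sample = [0.0] * 99_991 + [1.5] * 9
print(f"{activation_density(sample):.3f}%")  # prints "0.009%"
```

Under this reading, a density of 0.009% corresponds to roughly 9 activating tokens per 100,000, which fits a neuron tied to a single word family.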