Explanations
Stating an opinion
This neuron detects hedging or disclaimer phrases, especially first-person qualifiers such as “I’m not…” or “not suggesting…”.
Negative Logits
Token         Logit
Sir           -0.07
recycled      -0.07
.previous     -0.06
_requests     -0.06
Vish          -0.06
ebilir        -0.06
mh            -0.06
female        -0.06
Wells         -0.06
percentage    -0.06
Positive Logits
Token         Logit
�             0.07
bàn           0.06
.Route        0.06
_EXTRA        0.06
ノ            0.06
adjust        0.06
식            0.06
Snake         0.06
(Size         0.06
trainable     0.06
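Token/logit lists like the ones above can be produced roughly as follows. This is a minimal sketch, assuming the values come from projecting the neuron's output direction onto each token's unembedding vector and ranking the results; the page itself does not state how they were computed. The function name `top_logit_effects`, the toy vocabulary, and all weights are hypothetical placeholders, not values from this neuron.

```python
def top_logit_effects(output_direction, unembedding, vocab, k=3):
    """Return the k most negative and k most positive (token, effect) pairs.

    output_direction: list of floats, the neuron's (assumed) output vector.
    unembedding: one list of floats per token, same length as output_direction.
    vocab: token strings, in the same order as the unembedding rows.
    """
    effects = []
    for token, row in zip(vocab, unembedding):
        # Dot product: how much this neuron pushes the token's logit up or down.
        effect = sum(d * w for d, w in zip(output_direction, row))
        effects.append((token, effect))

    ranked = sorted(effects, key=lambda pair: pair[1])
    negative = ranked[:k]          # most suppressed tokens first
    positive = ranked[::-1][:k]    # most promoted tokens first
    return negative, positive


# Toy example with made-up tokens and weights (purely illustrative).
vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
unembedding = [
    [0.25, 0.5],
    [-0.25, 0.0],
    [0.5, 0.5],
    [0.0, 0.25],
]
direction = [1.0, -0.5]

negative, positive = top_logit_effects(direction, unembedding, vocab, k=1)
print("negative:", negative)
print("positive:", positive)
```

Effects as small and evenly spread as those listed here (about ±0.06 across unrelated tokens) would, under this assumption, suggest the neuron has little direct influence on any particular output token.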
Activations Density 0.029%
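
An activation density figure of this kind can be read as the share of sampled tokens on which the neuron fires. The sketch below assumes that definition (fraction of token positions with activation above a threshold); the page does not define the metric, and `activation_density`, the threshold, and the toy data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`."""
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation value")
    fired = sum(1 for a in acts if a > threshold)
    return fired / len(acts)


# Toy data: 10,000 token positions, of which 3 have a non-zero activation.
acts = [0.0] * 10_000
for position, value in [(5, 0.8), (42, 1.3), (977, 0.2)]:
    acts[position] = value

density = activation_density(acts)
print(f"{density:.3%}")
```

Under this reading, a density of 0.029% would mean the neuron is active on roughly 3 out of every 10,000 tokens.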