Explanations
The neuron activates on modal auxiliary verbs (especially “can” and “could”) that express ability or possibility.
Negative Logits
Token      Logit
.false     -0.08
POST       -0.08
omidou     -0.07
ヨ         -0.07
毕         -0.07
_First     -0.07
RTE        -0.07
_Post      -0.07
Nicht      -0.06
Trou       -0.06
Positive Logits
Token      Logit
can        0.24
can        0.16
Can        0.16
could      0.15
Can        0.14
CAN        0.14
couldn     0.13
-can       0.13
could      0.13
CAN        0.12
Activations Density 0.389%
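
An activation density figure like the one above is usually read as the share of sampled tokens on which the neuron fires at all. The sketch below shows that calculation under stated assumptions: the function name `activation_density`, the zero threshold, and the sample values are all hypothetical and are not taken from this page or from the tool that produced it.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "firing" means a strictly positive activation; the tool
    that produced the figure above may use a different cutoff.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 1000 tokens, 5 of which activate the neuron.
sample = [0.0] * 995 + [1.2, 0.7, 0.3, 2.1, 0.05]
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.500%"
```

On this reading, a density of 0.389% would mean the neuron fires on roughly 4 of every 1000 sampled tokens, which fits a neuron tied to a small set of words such as “can” and “could”.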