Explanations
The neuron activates on the modal verb “can” (and its immediate context indicating ability or possibility).
Negative Logits
Token      Logit
'./../     -0.06
igate      -0.06
폰         -0.06
actory     -0.06
уже        -0.06
-power     -0.06
doorstep   -0.06
Sleeve     -0.06
Coach      -0.06
_resolve   -0.05
Positive Logits
Token      Logit
so         0.12
an         0.08
SO         0.07
슈         0.07
so         0.07
BK         0.07
remarked   0.07
�          0.07
ar         0.07
revital    0.06
Activations Density: 0.073%