Explanations
Expressions or phrases related to language comprehension and communication.
The neuron strongly activates on occurrences of the modal verb “can” (and variants expressing ability), i.e. statements of what the assistant is able to do.
Negative Logits
advantages        -0.07
accommodations    -0.07
analyze           -0.07
observer          -0.07
vyu               -0.07
belongings        -0.06
Nach              -0.06
Bo                -0.06
perator           -0.06
izace             -0.06
Positive Logits
*****↵↵           0.06
][-               0.06
浩                0.06
Colum             0.06
могу              0.06
_MAXIMUM          0.06
Güney             0.06
gồm               0.06
으면              0.06
yoktu             0.06
Activations Density: 0.039%