INDEX
Explanations
Permissions and rights
The neuron is detecting words that express ability, permission, or desire—that is, modal and volitional verbs.
Negative Logits
lements      -0.07
izzie        -0.07
developer    -0.06
icap         -0.06
iseum        -0.06
root         -0.06
ENTER        -0.06
Suicide      -0.06
iger         -0.06
िट           -0.06
Positive Logits
issues         0.07
름             0.07
neden          0.06
_FWD           0.06
班             0.06
Filed          0.06
歲             0.06
Filed          0.06
participants   0.06
……。           0.06
Activations Density: 0.045%