INDEX
Explanations
The neuron activates on verbs and phrases that signal legal or regulatory obligations (e.g. “required to,” “requires,” “are required”).
Negative Logits
_cred        -0.07
tainted      -0.07
_inicio      -0.06
โลย          -0.06
gravy        -0.06
rigged       -0.06
емого        -0.06
ал           -0.06
declining    -0.06
लब           -0.06
Positive Logits
WindowSize   0.07
先           0.06
varlık       0.06
SAF          0.06
genitals     0.06
ORIGINAL     0.06
앞           0.06
!’           0.06
[__          0.06
maxY         0.06
Activations Density 0.024%
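
The page does not say how the density figure is defined. A minimal sketch, assuming it is the fraction of sampled token positions on which the neuron's activation is above zero; the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and chosen only to illustrate the arithmetic.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds threshold.

    Assumption: "density" means (active positions) / (total positions).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 3 active positions out of 12,500 tokens.
sample = [0.0] * 12_497 + [1.3, 0.8, 2.1]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.024% would mean the neuron fires on roughly 24 of every 100,000 tokens, which fits a neuron tied to a narrow phrase family such as "required to."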