Explanations
The neuron selectively activates on occurrences of the verb “take.”
Negative Logits
Token    Logit
will     -0.10
would    -0.09
must     -0.09
could    -0.08
doit     -0.08
WILL     -0.08
will     -0.08
might    -0.07
could    -0.07
may      -0.07
Positive Logits
Token    Logit
sob      0.07
DSP      0.07
,一      0.07
algum    0.07
bíl      0.06
TOM      0.06
|        0.06
cstring  0.06
-bl      0.06
แรง      0.06
Activations Density: 0.222%
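
As a rough illustration of what a density figure like this could mean, the sketch below assumes that activation density is the fraction of sampled tokens on which the neuron's activation is above zero. That definition, the function name activation_density, and the sample values are all assumptions made for illustration; they are not taken from this page or from the tool that produced it.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    neuron fires at all. The source page does not state its definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 1000 tokens, of which 2 activate the neuron.
sample = [0.0] * 998 + [1.3, 0.4]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.222% would correspond to the neuron firing on roughly 2 of every 1000 sampled tokens, which fits a neuron tied to a single verb.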