Explanations
The neuron detects the proper-name token sequence corresponding to "Morris" (i.e., the Mor+ris name tokens).
Negative Logits
onboard       -0.07
vasion        -0.07
eceğini       -0.07
бух           -0.06
Под           -0.06
inhibition    -0.06
.png          -0.06
ielding       -0.06
援            -0.06
seizing       -0.06
Positive Logits
Morris        0.15
Mor           0.12
Mor           0.11
Harris        0.10
mor           0.10
Ellis         0.09
Warren        0.08
Dor           0.08
oris          0.08
is            0.08
Activations Density 0.019%