Explanations
The neuron activates on occurrences of the literal string “dot” in the text, regardless of letter case.
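As a rough illustration of this explanation, the check below tests whether a string contains "dot" in any letter case. The function name `contains_dot` and the sample list are hypothetical, chosen for this example; this is a sketch of the described text pattern, not the neuron's actual computation.

```python
def contains_dot(text: str) -> bool:
    """Return True if `text` contains the string "dot" in any letter case."""
    # casefold() normalizes case so "Dot", "DOT", and "dot" all match.
    return "dot" in text.casefold()


# Hypothetical sample strings, loosely based on tokens listed on this page.
samples = ["dot", "Dot", "-dot", "_dot", ".dot", "dots", "Milwaukee"]
matches = [s for s in samples if contains_dot(s)]
```

Here `matches` keeps every sample except "Milwaukee", which does not contain the string.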
Negative Logits
Rei        -0.07
Refugee    -0.07
wives      -0.07
Reese      -0.07
Ware       -0.07
Weiner     -0.07
Age        -0.07
Milwaukee  -0.07
Schwe      -0.06
Wilhelm    -0.06
Positive Logits
dot    0.13
Dot    0.11
Dot    0.11
-dot   0.09
dot    0.09
_dot   0.09
ot     0.09
Ds     0.08
.dot   0.08
dots   0.08
Activations Density 0.008%