Explanations
The neuron fires on occurrences of the word “names” (as in “the names of…”).
Negative Logits
ойно            -0.06
冬              -0.06
isNew           -0.06
Тур             -0.05
Iteration       -0.05
fetish          -0.05
antro           -0.05
InkWell         -0.05
Expose          -0.05
precondition    -0.05
Positive Logits
$core           0.07
ANCELED         0.07
Certified       0.07
","             0.07
bombings        0.07
ิย              0.07
#[              0.07
fromDate        0.07
\",\            0.06
defe            0.06
Activations Density: 0.001%
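To give a sense of scale for the density figure, here is a minimal sketch. It assumes that "activations density" means the fraction of tokens on which the neuron's activation is above zero. The page does not state this definition, so treat it as an assumption. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumed definition; the dashboard does not specify how density is computed.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the neuron fires on 1 token out of 100,000.
sample = [0.0] * 99_999 + [2.5]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this assumed definition, a density of 0.001% corresponds to roughly one active token per 100,000. That would fit a neuron tied to a single word such as "names".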