Explanations
American
This neuron activates on occurrences of the nationality label “American” in the article’s category listings.
Negative Logits
Token       Logit
Abe         -0.07
PARTIC      -0.07
valid       -0.06
elucid      -0.06
erf         -0.06
tan         -0.06
With        -0.06
Trump       -0.06
Patri       -0.06
Lal         -0.06
Positive Logits
Token       Logit
tık         0.07
именно      0.07
знач        0.07
行動        0.06
Александ    0.06
肥          0.06
kiego       0.06
assertSame  0.06
子は        0.06
delight     0.06
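The Negative and Positive Logits lists appear to rank vocabulary tokens by the neuron's direct effect on their output logits. A minimal sketch of one common way to produce such lists, assuming the direct path only: take the dot product of the neuron's output weight vector with each token's column of the unembedding matrix, then sort. The names `direct_logit_effects`, `w_out` and `w_u` are hypothetical, and real tools may also fold in the final layer norm, so this is not necessarily how the values above were computed.

```python
from typing import List, Tuple


def direct_logit_effects(
    w_out: List[float],
    w_u: List[List[float]],
    vocab: List[str],
    k: int = 3,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive tokens by w_out @ W_U.

    Hypothetical helper. Assumes:
      w_out: the neuron's output weights, length d_model.
      w_u:   unembedding matrix with d_model rows and len(vocab) columns.
    Ignores layer norm and any downstream layers (direct path only).
    """
    # Score each token as the dot product of the neuron's output direction
    # with that token's unembedding column.
    scores = [
        sum(w_out[i] * w_u[i][j] for i in range(len(w_out)))
        for j in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]         # most negative first
    positive = ranked[::-1][:k]   # most positive first
    return negative, positive


if __name__ == "__main__":
    # Toy example with d_model = 2 and a four-token vocabulary.
    neg, pos = direct_logit_effects(
        [1.0, -0.5],
        [[0.2, -0.1, 0.0, 0.05], [0.0, 0.1, -0.1, 0.02]],
        ["a", "b", "c", "d"],
        k=2,
    )
    print("negative:", neg)
    print("positive:", pos)
```

The lists above are ordered the same way: most negative logit first in the negative list, most positive first in the positive list.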
Activation Density: 0.004%
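Activation density is presumably the share of token positions on which the neuron is active. Assuming "active" means an activation above zero, the percentage can be computed as in the sketch below; the function name `activation_density` and the default threshold are hypothetical. At 0.004%, the neuron would fire on roughly 4 of every 100,000 tokens, which fits a narrowly targeted label like the category-listing "American" described above.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    Hypothetical helper; assumes a position counts as active when its
    activation is strictly greater than `threshold`.
    """
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


if __name__ == "__main__":
    # 4 active positions out of 100,000 tokens.
    acts = [0.0] * 99_996 + [1.5] * 4
    print(f"{activation_density(acts):.3f}%")
```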