Explanations
attributes
This neuron activates on demonstrative references and generic placeholders, especially words like “These” (e.g. “These things,” “These factors”) that point to previously mentioned items.
Negative Logits
Token         Logit
άρχ           -0.06
radix         -0.06
light         -0.06
hacked        -0.06
memes         -0.06
entreprise    -0.06
breve         -0.06
Cars          -0.06
十一          -0.06
foods         -0.06
Positive Logits
Token         Logit
pal           0.07
refer         0.06
_IC           0.06
】↵           0.06
ummy          0.06
’s            0.06
coach         0.06
odigo         0.06
συ            0.06
па            0.06
Activations Density: 0.129%