INDEX
Explanations
The neuron activates on descriptive words that set the scope of content: nouns and adjectives that outline features or instructions (e.g. “survival,” “encounters,” “detailed,” “nuanced,” “content”).
Negative Logits
ंतर  -0.06
пять  -0.06
ativ  -0.06
Orwell  -0.06
uvwxyz  -0.06
specifier  -0.06
opis  -0.06
_lc  -0.06
pars  -0.06
روست  -0.06
Positive Logits
バス  0.07
ing  0.06
])->  0.06
.VisualStudio  0.06
',↵↵  0.06
Yun  0.06
призначення  0.06
Repos  0.06
เขา  0.06
tion  0.06
Activations Density: 0.041%
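The page does not say how the density figure is computed. A minimal sketch, assuming it is the share of sampled tokens on which the neuron's activation is above zero; the function name, the threshold, and the sample data are hypothetical and not taken from the dashboard:

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the fraction of tokens on which the
    neuron fires at all, expressed as a percentage.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 41 active tokens out of 100,000.
sample = [1.5] * 41 + [0.0] * 99_959
density = activation_density(sample)
print(f"{density:.3f}%")
```

Under this reading, a density of 0.041% would mean the neuron fires on roughly 4 of every 10,000 tokens, which fits a sparse feature.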