INDEX
Explanations
This neuron fires on prominent content nouns that label key entities or topics (e.g. “method,” “machine,” “City,” “College,” “dance”).
Negative Logits
-0.07
declared  -0.07
누  -0.06
certifications  -0.06
praying  -0.06
------------------------------------------------------------------------↵  -0.06
andWhere  -0.06
Й  -0.06
ARRY  -0.06
CLUSIVE  -0.06
Positive Logits
.slide  0.07
chlorine  0.07
ponsors  0.06
Leave  0.06
-Pack  0.06
vent  0.06
czas  0.06
-dimensional  0.06
حث  0.06
civilization  0.06
Activations Density 0.214%