INDEX
Explanations
This neuron appears to activate on seemingly unrelated words and phrases, possibly short function words or verb phrases; the activating text does not share a coherent meaning.
Negative Logits
Token      Logit
mó         -0.07
moid       -0.06
slightly   -0.06
ろう       -0.06
免         -0.06
invalid    -0.06
вполне     -0.06
irim       -0.06
Invalid    -0.06
有人       -0.06
Positive Logits
Token      Logit
limited    0.17
lack       0.17
absence    0.15
lacking    0.15
limited    0.14
lacks      0.14
lack       0.13
minimal    0.13
lacked     0.13
Lack       0.12
Activations Density 0.054%
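
As a rough illustration of how a density figure like the one above could be obtained, the sketch below assumes that "activations density" means the fraction of sampled token positions on which the neuron's activation exceeds zero. That definition, the function name `activation_density`, the `threshold` parameter, and the sample data are all assumptions made for illustration and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: density is the share of sampled tokens on which the neuron fires.
    """
    values = list(activations)
    if not values:
        return 0.0
    active = sum(1 for a in values if a > threshold)
    return active / len(values)


# Hypothetical sample: 100,000 token positions, 54 of which are active.
sample = [0.0] * 99_946 + [1.0] * 54
density = activation_density(sample)
print(f"{density:.3%}")  # 0.054%
```

Under this reading, a density of 0.054% would mean the neuron fires on roughly 54 of every 100,000 tokens, which marks it as a sparse unit.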