Explanations
computer layers
This neuron never activates—it does not respond to any token.
Negative Logits
urse  -0.07
ylko  -0.07
uação  -0.07
urses  -0.06
afi  -0.06
degrade  -0.06
уст  -0.06
τά  -0.06
ruins  -0.06
曜日  -0.06
Positive Logits
Sections  0.06
ση  0.06
Όμιλος  0.06
speeding  0.06
竞  0.06
_US  0.06
PositiveButton  0.06
참여  0.06
弟  0.06
ceasefire  0.05
Activations Density: 0.021%