Explanations
The neuron fires on title-style or heading-style words, i.e. capitalized or all-caps tokens that mark section headings, product names, or other headline-like labels.
Negative Logits
Token        Logit
DD           -0.07
starving     -0.07
Fe           -0.07
immature     -0.06
inventor     -0.06
launching    -0.06
GER          -0.06
DEN          -0.06
ิกายน        -0.06
ku           -0.06
Positive Logits
Token        Logit
Institution  0.07
inspect      0.07
Meteor       0.06
unar         0.06
Monster      0.06
STS          0.06
HG           0.06
Sem          0.06
usize        0.06
ську         0.06
Activations Density: 0.055%