INDEX
Explanations
The neuron is triggered by positive descriptive adjectives and qualifiers (e.g. “just,” “pretty,” “good”) that add praise or approval.
New Auto-Interp
Negative Logits
励
-0.07
alah
-0.07
.lr
-0.06
-brand
-0.06
org
-0.06
.act
-0.06
"},
-0.06
Text
-0.06
_lat
-0.06
SWT
-0.06
POSITIVE LOGITS
_PAGES
0.07
ÅŸ
0.07
esl
0.06
perman
0.06
어나
0.06
ऐ
0.06
억
0.06
Inset
0.06
ール
0.06
křes
0.06
Activations Density 0.001%