Explanations
This neuron responds to positive evaluative words that express praise or favorable opinion (e.g. “great,” “good,” “nice,” “alright”).
Negative Logits
bees                 -0.08
brib                 -0.07
fatty                -0.07
tn                   -0.07
AUTHORS              -0.06
suppliers            -0.06
ヴ                   -0.06
doma                 -0.06
qualidade            -0.06
smallest             -0.06
Positive Logits
_SOURCE              0.07
ційної               0.07
_POLL                0.06
ongan                0.06
omain                0.06
\',                  0.06
embros               0.06
(token not shown)    0.06
Sınıf                0.06
ArgsConstructor      0.06
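Logit lists like the two above are commonly produced by projecting a neuron's output weights onto the model's unembedding matrix and keeping the most negative and most positive tokens. The sketch below shows that ranking step under that assumption; the function name `logit_effects` and the toy vectors are hypothetical, and this page's actual method may differ (for example, by folding in layer normalization).

```python
from typing import Dict, List, Tuple

Vector = List[float]


def logit_effects(
    neuron_out: Vector,
    unembed: Dict[str, Vector],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Rank tokens by the neuron's direct effect on their logits.

    Returns (most_negative, most_positive), each a list of (token, score).
    """
    # Direct logit effect (assumed definition): dot product of the neuron's
    # output weight vector with each token's unembedding vector.
    scores = {
        token: sum(w * u for w, u in zip(neuron_out, vec))
        for token, vec in unembed.items()
    }
    ranked = sorted(scores.items(), key=lambda item: item[1])
    most_negative = ranked[:k]          # ascending: most suppressed first
    most_positive = ranked[::-1][:k]    # descending: most promoted first
    return most_negative, most_positive


# Toy 2-dimensional example; these vectors are invented for illustration.
toy_neuron_out = [1.0, -0.5]
toy_unembed = {
    "good": [0.3, -0.2],
    "great": [0.2, 0.0],
    "bees": [-0.1, 0.2],
    "smallest": [0.0, 0.1],
}
neg, pos = logit_effects(toy_neuron_out, toy_unembed, k=2)
print("negative:", [t for t, _ in neg])  # ['bees', 'smallest']
print("positive:", [t for t, _ in pos])  # ['good', 'great']
```

On a real model, `neuron_out` would be the neuron's row of the MLP output weights and `unembed` the model's unembedding vectors, so absolute values need not match the scores listed here.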
Activation density: 0.070%
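Activation density is typically the fraction of sampled token positions on which the neuron fires. The sketch below assumes "fires" means an activation strictly above a threshold of 0; the viewer's exact threshold and sample are not stated here, and the function name `activation_density` is hypothetical. A density of 0.070% corresponds to roughly 7 firing positions per 10,000 tokens.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of token positions where the activation exceeds `threshold`."""
    values = list(activations)
    if not values:
        return 0.0  # avoid division by zero on an empty sample
    firing = sum(1 for a in values if a > threshold)
    return firing / len(values)


# Synthetic sample: 7 firing positions out of 10,000 tokens.
sample = [0.0] * 9993 + [1.2] * 7
print(f"{activation_density(sample):.3%}")  # 0.070%
```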