INDEX
Explanations
This neuron fires on spans of direct speech or quoted statements (i.e., dialogue or “said”-style quotes).
Negative Logits
Αν               -0.06
hlavně           -0.06
.Localization    -0.06
sociale          -0.06
vad              -0.06
επίσης           -0.06
Quarter          -0.06
.Copy            -0.06
minions          -0.06
-services        -0.06
Positive Logits
outcome          0.07
POSIT            0.07
Parm             0.07
ें↵               0.07
SENT             0.06
getP             0.06
하는             0.06
apply            0.06
league           0.06
tous             0.06
Activations Density 0.034%