Explanations
This neuron mainly detects informal, enthusiastic, first-person social-media language (e.g. “I’m excited,” “love using,” “my thoughts”), along with exclamations and a conversational tone.
Negative Logits
zp            -0.07
镜            -0.07
unt           -0.07
karma         -0.07
forcing       -0.07
MF            -0.06
_port         -0.06
Or            -0.06
leth          -0.06
.orig         -0.06
Positive Logits
tạm            0.07
()↵            0.07
'__            0.07
°}             0.07
"\             0.07
']) ↵          0.06
etmeye         0.06
.addHandler    0.06
reprint        0.06
mockMvc        0.06
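The two logit lists above are ranked extremes of the neuron's direct effect on the output vocabulary. A common way to obtain such lists, assumed here rather than confirmed by this page, is to project the neuron's output direction through the unembedding matrix and keep the k largest and k smallest entries. The sketch below uses hypothetical names (`logit_effects`, `top_and_bottom`, `w_u`) and a toy 2-dimensional model; it is illustrative only.

```python
def logit_effects(direction, w_u, vocab):
    """Project a d_model-length direction through w_u (d_model rows x vocab cols).

    Returns a dict mapping each vocabulary token to its logit effect.
    """
    return {
        tok: sum(direction[i] * w_u[i][j] for i in range(len(direction)))
        for j, tok in enumerate(vocab)
    }


def top_and_bottom(effects, k=10):
    """Return the k most positive and k most negative (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    bottom = ranked[:k]          # most negative first
    top = ranked[::-1][:k]       # most positive first
    return top, bottom


# Toy example: hypothetical 2-dim model with a 3-token vocabulary.
vocab = ["a", "b", "c"]
direction = [1.0, -1.0]
w_u = [[0.5, 0.1, 0.0],
       [0.0, 0.4, 0.2]]
effects = logit_effects(direction, w_u, vocab)
top, bottom = top_and_bottom(effects, k=1)
print(top, bottom)
```

The very small magnitudes on this page (about ±0.07) suggest the neuron's direct effect on any single token is weak, so the lists are better read as a loose signature than as the neuron's main function.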
Activation Density: 0.066%
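Activation density is the share of sampled tokens on which the neuron fires. Assuming "fires" means an activation above zero (the threshold and the sampled dataset are assumptions, not stated here), 0.066% works out to about 66 tokens per 100,000. A minimal sketch with a hypothetical helper `activation_density`:

```python
def activation_density(activations, threshold=0.0):
    """Percentage of activations strictly above `threshold` (assumed firing rule)."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# 66 firing tokens out of 100,000 sampled tokens.
acts = [1.0] * 66 + [0.0] * (100_000 - 66)
print(f"{activation_density(acts):.3f}%")
```

A density this low means the neuron is very sparse, which fits a narrow feature such as enthusiastic first-person social-media phrasing.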