Explanations
This neuron activates on the English greeting “Hello.”
Negative Logits
infrastructure  -0.07
actor           -0.07
yahoo           -0.06
esper           -0.06
пти             -0.06
ROKE            -0.06
-An             -0.06
vf              -0.06
champions       -0.06
engulf          -0.06
Positive Logits
.");↵           0.07
Pose            0.07
vodka           0.06
scene           0.06
Mature          0.06
Screens         0.06
COOKIE          0.06
namespaces      0.06
體              0.06
Xen             0.06
Activations Density: 0.015%