Explanations
This neuron activates on occurrences of the word “joint” (or “joints”).
Negative Logits
.bitmap   -0.07
">$       -0.06
Burke     -0.06
ucle      -0.06
English   -0.06
Netflix   -0.06
Bitmap    -0.06
Buttons   -0.06
sizes     -0.06
Walk      -0.06
Positive Logits
joints   0.12
joint    0.12
Joint    0.11
joint    0.10
_joint   0.10
аст      0.08
合       0.07
Joint    0.07
oje      0.07
JOHN     0.07
Activations Density 0.004%
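
As a rough illustration of how a density figure like this could be computed, the sketch below assumes that density means the fraction of token positions where the neuron's activation is above zero. That definition, the function name activation_density, the threshold parameter, and the toy data are all assumptions made for illustration; the page itself does not state how the figure was derived.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" is the share of positions where the neuron fires
    at all. The source page does not confirm this definition.
    """
    values = list(activations)
    if not values:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in values if a > threshold)
    return fired / len(values)


# Hypothetical toy data (not from the page): 4 firing positions
# out of 100,000 tokens, chosen only to land on the same scale.
toy = [0.0] * 99_996 + [0.8, 1.2, 0.3, 2.1]
density = activation_density(toy)
print(f"{density:.3%}")  # prints 0.004%
```

Under this reading, a density of 0.004% would mean the neuron fires on roughly 4 of every 100,000 tokens, which fits a neuron tied to a single word family.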