Explanations
The neuron selectively activates on the word “cause” (including its inflected form “causes”).
Negative Logits
stepping      -0.07
<l            -0.07
Robert        -0.07
screenshots   -0.07
reimb         -0.07
popping       -0.07
Elliott       -0.06
membranes     -0.06
legt          -0.06
lazy          -0.06
Positive Logits
cause         0.11
causes        0.09
Cause         0.08
cause         0.08
Aw            0.08
Cause         0.08
_CUR          0.07
-half         0.07
callable      0.07
_CUSTOMER     0.07
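The page does not say how the logit lists are produced. A common approach, assumed here rather than confirmed by the source, is to project the neuron's output weight vector through the model's unembedding matrix and then rank tokens by the result. The sketch below does this in plain Python on a made-up 2-dimensional model with a 4-token vocabulary. The names `logit_effects` and `top_and_bottom` and all weights are hypothetical.

```python
# Toy sketch: rank vocabulary tokens by a neuron's direct effect on the logits.
# ASSUMPTION: "logits" = neuron output weights projected through the
# unembedding matrix. All weights below are hypothetical toy values.

def logit_effects(w_out, W_U, vocab):
    """Return {token: dot(w_out, unembedding column for that token)}."""
    effects = {}
    for t, token in enumerate(vocab):
        effects[token] = sum(w_out[d] * W_U[d][t] for d in range(len(w_out)))
    return effects

def top_and_bottom(effects, k):
    """Return the k most positive and the k most negative (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)
    positive = ranked[:k]
    negative = sorted(ranked[-k:], key=lambda kv: kv[1])  # most negative first
    return positive, negative

if __name__ == "__main__":
    vocab = ["cause", "causes", "lazy", "Robert"]
    w_out = [0.5, -0.2]                # hypothetical neuron output weights
    W_U = [[0.2, 0.1, -0.1, 0.0],      # hypothetical d_model x vocab unembedding
           [-0.05, -0.1, 0.1, 0.3]]
    pos, neg = top_and_bottom(logit_effects(w_out, W_U, vocab), k=2)
    print("positive:", [t for t, _ in pos])  # ['cause', 'causes']
    print("negative:", [t for t, _ in neg])  # ['lazy', 'Robert']
```

Ranking by a single dot product per token isolates the neuron's direct path to the output. It ignores any effect routed through later layers.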
Activation density: 0.016%
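Activation density is read here as the percentage of tokens on which the neuron fires. The sketch below assumes that definition and a firing threshold of 0.0; the page does not state its exact threshold or token sample. At 0.016%, the neuron fires on about 16 of every 100,000 tokens, which is consistent with a narrow "cause" detector. `activation_density` is a hypothetical helper.

```python
# Sketch: activation density as the percentage of tokens where a neuron fires.
# ASSUMPTION: a token counts as "firing" when its activation exceeds `threshold`
# (0.0 by default); the dashboard's actual threshold is not stated.

def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly above `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)

if __name__ == "__main__":
    # 16 firing tokens out of 100,000 gives a density of 0.016%.
    acts = [0.0] * 100_000
    for i in range(16):
        acts[i * 1000] = 1.0
    print(f"{activation_density(acts):.3f}%")  # 0.016%
```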