Explanations
The neuron activates specifically on the word “enhance.”
Negative Logits
Token        Logit
141          -0.08
traveled     -0.07
물           -0.07
Problem      -0.07
looping      -0.07
Sorted       -0.06
lives        -0.06
297          -0.06
_roll        -0.06
logically    -0.06
Positive Logits
Token        Logit
enhance      0.15
enhancing    0.13
enhanced     0.12
enhances     0.12
Enh          0.11
enhancement  0.10
Enhanced     0.10
Enh          0.10
-enh         0.09
Enhancement  0.09
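Lists like these are commonly produced by direct logit attribution: the neuron's output weights are projected through the model's unembedding matrix, and the tokens with the largest and smallest resulting scores are reported. The following is a minimal toy sketch under that assumption. The function names, the tiny vocabulary, and every weight are hypothetical and do not come from the model this page describes.

```python
# Toy sketch of direct logit attribution. All names, tokens, and weights
# are hypothetical illustrations, not values from the model on this page.


def logit_effects(w_out, w_u, vocab):
    """Project a neuron's output weights through the unembedding matrix.

    w_out: the neuron's output weights, a list of d_model floats.
    w_u: the unembedding matrix, d_model rows of len(vocab) floats each.
    Returns a dict mapping each token to its direct logit contribution.
    """
    scores = {}
    for j, tok in enumerate(vocab):
        # Dot product of the neuron's output direction with token j's column.
        scores[tok] = sum(w_out[i] * w_u[i][j] for i in range(len(w_out)))
    return scores


def top_and_bottom(scores, k):
    """Return the k highest-scoring and k lowest-scoring (token, score) pairs."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


# Hypothetical 2-dimensional model with a 4-token vocabulary.
vocab = ["enhance", "enhanced", "traveled", "looping"]
w_out = [0.5, -0.2]
w_u = [
    [0.3, 0.2, -0.1, 0.0],
    [-0.1, 0.0, 0.2, 0.1],
]
scores = logit_effects(w_out, w_u, vocab)
positive, negative = top_and_bottom(scores, 2)
print("positive:", positive)
print("negative:", negative)
```

Real vocabularies contain tens of thousands of tokens, so practical code would use a matrix library instead of nested loops; the loop form is kept here only to make the arithmetic explicit.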
Activation Density: 0.016%
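Activation density is usually read as the fraction of token positions on which the neuron fires. Assuming "fires" means an activation strictly above zero (the `threshold` parameter and the helper name below are illustrative assumptions, not taken from this page), a minimal sketch is:

```python
# Minimal sketch: activation density as the share of token positions where
# the activation exceeds a threshold. The default threshold of 0.0 is an
# assumption; the helper name is hypothetical.


def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly greater than threshold."""
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 16 firing positions out of 100,000 tokens.
acts = [0.0] * 99_984 + [1.0] * 16
density = activation_density(acts)
print(f"{density:.3%}")
```

Under this reading, a density of 0.016% corresponds to roughly 16 activating tokens per 100,000, consistent with a neuron that fires only on a narrow set of words.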