Explanations
The neuron activates on occurrences of “API” (and related API mentions) in the text.
Negative Logits
แห           -0.07
.include     -0.07
enclosed     -0.07
Grape        -0.06
erchant      -0.06
mercenaries  -0.06
allon        -0.06
헤           -0.06
engr         -0.06
Harding      -0.06
Positive Logits
<>↵          0.07
             0.07
افه          0.06
for          0.06
zusammen     0.06
JOHN         0.06
الأد         0.06
commenting   0.06
expecting    0.06
_API         0.06
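The page does not say how these two lists were computed. One common approach, assumed here and not confirmed for this page, is to project the neuron's output weight vector through the model's unembedding matrix and then sort the resulting per-token logit effects. Below is a minimal sketch in plain Python. `direct_logit_effects`, `top_logit_tokens`, and the toy weights and vocabulary are all hypothetical.

```python
# Hypothetical sketch: deriving "negative/positive logit" token lists for one neuron.
# Assumes the neuron writes a fixed output vector `w_out` (length d_model) into the
# residual stream, and that `unembed` is a d_model x vocab_size unembedding matrix.
# All numbers below are toy values, not taken from any real model.

def direct_logit_effects(w_out, unembed):
    """Project the neuron's output vector onto every vocabulary logit."""
    vocab_size = len(unembed[0])
    return [
        sum(w_out[d] * unembed[d][v] for d in range(len(w_out)))
        for v in range(vocab_size)
    ]


def top_logit_tokens(effects, vocab, k=10):
    """Return the k most negative and the k most positive (token, effect) pairs."""
    pairs = list(zip(vocab, effects))
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    return negative, positive


# Toy example: a 2-dimensional residual stream and a 4-token vocabulary.
vocab = ["API", "_API", "Grape", "for"]
w_out = [1.0, -0.5]
unembed = [
    [0.2, 0.1, -0.3, 0.0],
    [0.0, -0.1, 0.2, 0.1],
]

effects = direct_logit_effects(w_out, unembed)
negative, positive = top_logit_tokens(effects, vocab, k=2)
print("Negative:", [(t, round(e, 3)) for t, e in negative])
print("Positive:", [(t, round(e, 3)) for t, e in positive])
```

Sorting the whole vocabulary keeps the sketch short. For a real vocabulary of tens of thousands of tokens, `heapq.nsmallest` and `heapq.nlargest` with `key=` select the top k without a full sort.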
Activations Density 0.006%
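A density of 0.006% means the neuron fires on roughly 6 of every 100,000 tokens, which makes it extremely sparse. The sketch below reproduces that arithmetic under an assumed definition: a token counts as active when the activation is strictly greater than zero. The page does not state its threshold, and `activation_density` is a hypothetical helper.

```python
# Hypothetical sketch of an activation-density calculation.
# Assumes density = fraction of sampled token positions whose activation exceeds
# a threshold (0.0 here). The activation values below are synthetic.

def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic data: 6 active positions out of 100,000 sampled tokens.
activations = [0.0] * 99_994 + [1.3] * 6
density = activation_density(activations)
print(f"Activation density: {density:.3%}")
```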