Explanations
Analysis/Observation
This neuron fires on analytical framing words in hypothetical or conditional expressions (e.g. “we look,” “one takes,” “we consider”) that introduce examples or evidence.
Negative Logits
Token      Logit
твор       -0.08
yaşında    -0.07
866        -0.06
ским       -0.06
Dick       -0.06
מ          -0.06
าคา        -0.06
kova       -0.06
rette      -0.06
smart      -0.06
Positive Logits
Token      Logit
regarded   0.07
(se        0.06
doğrult    0.06
/connect   0.06
tr         0.06
="-        0.06
image      0.06
i          0.06
illustr    0.06
>v         0.06
Activations Density: 0.032%
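For readers unfamiliar with the density figure, the sketch below shows one common way such a number could be computed: the fraction of token positions on which the neuron's activation is above zero. This definition is an assumption, since the page does not state how the density was measured. The function name `activation_density`, the `threshold` parameter, and the toy data are all hypothetical and used for illustration only.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of positions with activation above
    zero; the dashboard itself does not define the term.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data (not taken from the dashboard): 10,000 token positions,
# of which only three have a nonzero activation.
toy_activations = [0.0] * 10_000
toy_activations[5] = 1.2
toy_activations[42] = 0.4
toy_activations[9000] = 2.0

density = activation_density(toy_activations)
print(f"{density:.3%}")  # prints 0.030%
```

Under this reading, a density of 0.032% would mean the neuron is active on roughly 3 of every 10,000 tokens, which fits a feature tied to a narrow class of framing expressions.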