Explanations
The neuron activates primarily on the word “simple” (as in “I have a simple …”) when the user is describing their scenario.
Negative Logits
วก            -0.07
Orchard       -0.06
)↵↵           -0.06
ewan          -0.06
ustain        -0.06
ultan         -0.06
()});↵        -0.06
pageInfo      -0.06
rack          -0.06
극            -0.06

Positive Logits
simple        0.07
fatalities    0.07
özel          0.07
.scal         0.07
boto          0.07
cerca         0.07
controlled    0.06
_comment      0.06
toLowerCase   0.06
เฮ            0.06
Activations Density 0.026%
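
The page does not say how the density figure is computed. One common reading, assumed here, is the fraction of sampled token positions on which the neuron's activation is above zero. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and not taken from this page.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of positions where the neuron fires.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 3 of which fire.
sample = [0.0] * 9997 + [1.2, 0.4, 2.5]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.026% would correspond to the neuron firing on roughly 26 out of every 100,000 token positions.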