Explanations
The neuron detects words expressing calls to support or urging action (e.g., Spanish imperatives like “apoye”).
Negative Logits
elden          -0.06
risk           -0.06
Warranty       -0.06
Trad           -0.06
Perf           -0.06
_loading       -0.06
etk            -0.06
reinterpret    -0.06
Funktion       -0.06
berk           -0.06
Positive Logits
support        0.10
SUPPORT        0.09
doubly         0.08
_UINT          0.08
Support        0.08
staunch        0.07
opposes        0.07
좀             0.07
援             0.07
ุม             0.07
Activations Density: 0.030%