Explanations
Negation
The neuron detects expressions of apology or uncertainty (e.g. “I’m sorry,” “I am not familiar”).
Negative Logits
demokrat      -0.06
mascul        -0.06
undergo       -0.06
审            -0.06
humility      -0.06
cortical      -0.06
ивать         -0.06
visceral      -0.06
název         -0.06
_inventory    -0.06
Positive Logits
window        0.07
element       0.06
جر            0.06
005           0.06
hora          0.06
torment       0.06
Donate        0.06
|:            0.06
Flexible      0.06
<t            0.06
Activations Density: 0.058%