Explanations
The neuron activates on occurrences of the word “reverse” (including “reversed”).
Negative Logits
Lua  -0.06
'|'  -0.06
Jain  -0.06
_Height  -0.06
sticky  -0.06
зуст  -0.06
vacations  -0.06
Ship  -0.06
宋  -0.06
"What  -0.06
Positive Logits
protester  0.08
devam  0.07
createState  0.06
ução  0.06
0.06
glog  0.06
مقابل  0.06
cepts  0.06
الناس  0.06
đoán  0.06
Activations Density 0.006%