Explanations
This neuron detects reassurance language that signals something is normal or common (e.g., “normal,” “common,” and similar words).
Negative Logits
.navigate     -0.07
callback      -0.06
Арх           -0.06
분야          -0.06
.loading      -0.06
injustice     -0.06
Masters       -0.06
слов          -0.06
кого          -0.06
Basically     -0.06
Positive Logits
¿             0.08
itized        0.07
/L            0.07
/R            0.06
인지          0.06
settling      0.06
jezd          0.06
theoretical   0.06
Arist         0.06
fd            0.06
Activations Density 0.010%