Explanations
This neuron detects language about breaking free from norms or constraints, especially phrases expressing liberation from “typical confines.”
Negative Logits
Token         Logit
categoryId    -0.07
Hastings      -0.07
γε            -0.06
_detalle      -0.06
bày           -0.06
=""↵          -0.06
.me           -0.06
rods          -0.06
مدير          -0.06
dikkate       -0.06
Positive Logits
Token         Logit
plot          0.07
OTOR          0.07
시간          0.07
λά            0.07
tape          0.06
》            0.06
ERA           0.06
plots         0.06
voting        0.06
Viewer        0.06
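The logit lists above rank vocabulary tokens by how much this neuron pushes their logits up or down. The sketch below shows one common way such lists are produced. It assumes that the neuron's output weight vector `w_out` is projected through an unembedding matrix `W_U`, with the final layer norm ignored. The names `w_out`, `W_U`, and `top_logit_effects` are hypothetical, and the weights and tokens are toy values rather than this neuron's real parameters.

```python
import numpy as np

def top_logit_effects(w_out, W_U, vocab, k=10):
    """Rank tokens by the neuron's direct effect on the output logits.

    Assumption (hypothetical setup): w_out has shape [d_model] and is what the
    neuron writes into the residual stream; W_U has shape [d_model, vocab_size].
    The final layer norm is ignored in this sketch.
    """
    effects = w_out @ W_U              # one logit contribution per vocab token
    order = np.argsort(effects)        # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: d_model = 2, vocabulary of 4 made-up tokens.
vocab = ["a", "b", "c", "d"]
w_out = np.array([1.0, 0.0])
W_U = np.array([[0.07, 0.06, -0.07, -0.06],
                [0.5, 0.5, 0.5, 0.5]])
neg, pos = top_logit_effects(w_out, W_U, vocab, k=2)
print("negative:", neg)
print("positive:", pos)
```

Effects this small (around ±0.07 here) are typical for a single neuron's direct contribution, which is why the listed values sit in a narrow band.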
Activation density: 0.002% (roughly 1 in every 50,000 tokens)
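A density of 0.002% means the neuron fires on only a tiny fraction of tokens. The sketch below computes this fraction from a hypothetical array of per-token activations. It assumes that "active" means an activation strictly above a threshold of 0, which is an assumption rather than the page's documented definition.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the neuron's activation exceeds threshold.

    Assumption: a token counts as "active" if its activation is strictly
    greater than `threshold` (0.0 by default).
    """
    acts = np.asarray(activations)
    return float(np.mean(acts > threshold))

# Toy data: 100,000 tokens, of which only 2 activate the neuron.
acts = np.zeros(100_000)
acts[:2] = 1.0
density = activation_density(acts)
print(f"{density * 100:.3f}%")
```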