Explanations
opposite
This neuron detects references to hormone-based gender transition toward the opposite sex.
Negative Logits
Token     Logit
ávají     -0.07
Acer      -0.07
hek       -0.07
fue       -0.07
Poh       -0.06
Benton    -0.06
Pew       -0.06
Natur     -0.06
олаг      -0.06
stř       -0.06
Positive Logits
Token       Logit
Aqu         0.06
urb         0.06
">'.        0.06
='')        0.06
MRI         0.06
06          0.06
Πολι        0.06
isor        0.05
(ws         0.05
pleasures   0.05
Activations Density: 0.268%