Explanations
This neuron detects mentions of mental and physical health descriptors (e.g. “mental,” “mentally,” “physically”).
Negative Logits
Token        Logit
pop          -0.06
извест       -0.06
„            -0.06
},{          -0.06
:"           -0.06
@"           -0.06
“All         -0.06
regularly    -0.05
pieces       -0.05
hammered     -0.05
Positive Logits
Token        Logit
umar         0.07
faithful     0.07
ruc          0.07
ADV          0.06
Ups          0.06
Smarty       0.06
�            0.06
νεφώσεις     0.06
дя           0.06
answered     0.06
Activations Density 0.016%