INDEX
Explanations
Not forcing/pressuring someone
advice related to courting or romantic interactions.
This neuron detects language about coercing or pressuring someone into doing something.
New Auto-Interp
Negative Logits
-sensitive
-0.06
�
-0.06
IID
-0.06
يير
-0.06
ertino
-0.06
Denn
-0.06
ürk
-0.06
оступ
-0.05
sorunu
-0.05
_typeDefinition
-0.05
POSITIVE LOGITS
anzi
0.08
NR
0.07
.numberOf
0.07
wcs
0.07
cardinal
0.06
TRUE
0.06
�
0.06
_cols
0.06
OUTPUT
0.06
ATURE
0.06
Activations Density 0.015%