INDEX
Explanations
words related to interpersonal relationships and communication
Negative Logits
Cheng      -0.20
Äijâu      -0.16
rie        -0.15
innacle    -0.15
ighted     -0.14
RC         -0.14
lie        -0.14
grim       -0.14
lesc       -0.14
_IE        -0.14
Positive Logits
itzer      0.18
uela       0.17
bat        0.17
bat        0.16
_frac      0.16
çķ         0.16
Bat        0.15
Bat        0.15
pro        0.14
smr        0.14
Activations Density 0.025%