Explanations
mentions of "Family" and related terms
Negative Logits
undi         -0.18
moz          -0.17
nem          -0.16
aries        -0.15
å®¶æĹı       -0.14
eton         -0.14
erotische    -0.14
夫人         -0.14
lush         -0.14
á»§          -0.14
Positive Logits
friendly     0.24
Friendly     0.23
friendly     0.22
Owned        0.21
-friendly    0.21
Friendly     0.20
owned        0.20
oriented     0.18
-owned       0.17
owned        0.17
Activations Density 0.021%