Explanations
elements related to personal identity and ownership
Negative Logits
ouve      -0.16
esting    -0.14
kontakte  -0.14
          -0.14
ubs       -0.14
lements   -0.14
าณ        -0.14
llu       -0.14
особ      -0.14
utral     -0.13
Positive Logits
names     0.79
Names     0.66
names     0.63
e         0.61
-names    0.60
Names     0.58
NAMES     0.55
_names    0.52
.names    0.48
(names    0.46
Activations Density 0.057%