INDEX
Explanations
references to personal experiences or feelings related to "me."
Negative Logits
Token     Logit
lain      -0.17
rous      -0.16
HING      -0.16
ashtra    -0.15
oard      -0.15
rente     -0.14
net       -0.14
ayd       -0.14
ستم       -0.14
maybe     -0.14
Positive Logits
Token     Logit
/us       0.19
inerary   0.16
itable    0.16
zelf      0.15
SELF      0.15
andering  0.15
-même     0.15
elf       0.15
zzo       0.14
athed     0.14
Activations Density 0.071%