INDEX
Explanations
words related to personal beliefs, actions, and interactions
phrases expressing personal insecurity and interpersonal dynamics
Negative Logits
Token      Logit
leck       -0.67
helps      -0.60
bur        -0.59
ONSORED    -0.59
geist      -0.59
herry      -0.59
eele       -0.59
ullah      -0.58
sidx       -0.58
itual      -0.58
Positive Logits
Token      Logit
myself     1.23
thee       0.94
THEM       0.84
yours      0.83
them       0.81
ourselves  0.78
ya         0.77
YOU        0.76
you        0.75
my         0.74
Activations Density 1.045%
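
To make the density figure concrete, here is a minimal sketch of how such a value could be computed. It assumes that "activation density" means the fraction of sampled tokens on which the feature's activation is above zero; the page itself does not state the definition. The function name `activation_density`, the `threshold` parameter, and the toy data are hypothetical and are not taken from the dashboard.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: density = (tokens where the feature fires) / (tokens sampled).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical toy data: the feature fires on 3 of 200 sampled tokens.
sample = [0.0] * 197 + [0.4, 1.2, 2.5]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 1.045% would mean the feature fires on roughly one in every 96 sampled tokens.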