INDEX
Explanations
personal feelings or desires expressed in the first person
phrases expressing personal desires or relationships
Negative Logits
jan       -0.68
darts     -0.63
sqor      -0.62
theless   -0.61
Roses     -0.59
heels     -0.59
Hero      -0.59
anytime   -0.58
jas       -0.58
Classes   -0.58
Positive Logits
oyer                0.77
idding              0.67
otos                0.67
erous               0.64
ynski               0.64
equation            0.64
guiActiveUnfocused  0.61
xual                0.61
Older               0.59
selage              0.58
Activations Density 0.231%