INDEX
Explanations
phrases related to possessions and ownership
references to music albums or songs
Negative Logits
)]            -0.73
aughs         -0.65
disag         -0.64
atri          -0.58
odan          -0.57
uously        -0.57
CRC           -0.55
license       -0.54
uay           -0.53
witz          -0.51
Positive Logits
your          1.81
Your          1.77
YOUR          1.74
Your          1.73
your          1.69
yours         1.48
yourself      1.43
yourselves    1.40
YOU           1.36
you           1.34
Activations Density 0.870%