Explanations
adjectives and nouns related to personal attributes and characteristics
expressions of emotional reactions and psychological states
Negative Logits
)]                -0.74
cedented          -0.63
aughs             -0.60
cart              -0.58
TPPStreamerBot    -0.56
CVE               -0.55
EStream           -0.55
olen              -0.54
disag             -0.54
audi              -0.54
Positive Logits
your              1.82
your              1.78
you               1.72
Your              1.63
YOUR              1.62
Your              1.59
yourself          1.58
you               1.55
YOU               1.48
You               1.45
Activations Density 0.791%