INDEX
Explanations
words related to expressing personal opinions and thoughts
phrases indicating honesty or directness in communication
Negative Logits
backdrop      -0.75
retreating    -0.73
ransom        -0.72
unden         -0.69
dissolve      -0.63
enshr         -0.63
coming        -0.58
§             -0.57
strand        -0.56
arise         -0.56
Positive Logits
NEVER         0.88
ROCK          0.75
amazed        0.70
ain           0.70
THANK         0.69
LOVE          0.68
ZI            0.65
impressed     0.64
neither       0.64
nobody        0.64
Activations Density: 0.236%