INDEX
Explanations
phrases related to personal opinions or reflections
expressions related to personal thoughts and opinions
Negative Logits
ALL      -0.80
Mamm     -0.70
ARDS     -0.69
Naz      -0.63
Claim    -0.60
horm     -0.60
LOT      -0.58
hetti    -0.57
Breach   -0.57
gm       -0.57
Positive Logits
fulness   0.88
thoughts  0.88
aloud     0.85
ileaks    0.82
Thoughts  0.82
cience    0.81
umar      0.77
cient     0.75
eteen     0.75
cape      0.73
Activations Density: 0.012%