INDEX
Explanations
phrases or words related to positive experiences or emotions
the word "Good" in various contexts
Negative Logits
âĹ¼        -0.93
ptin       -0.77
oths       -0.73
ople       -0.73
pent       -0.72
EStream    -0.70
concoct    -0.69
RAW        -0.68
CHAT       -0.67
udic       -0.67
Positive Logits
bye        1.20
enough     1.18
Enough     1.03
reads      1.02
bye        0.93
Luck       0.93
ness       0.91
sword      0.91
Smile      0.90
Samar      0.88
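The page does not state how the two logit lists are computed. A common convention in sparse-autoencoder feature dashboards is to project the feature's decoder direction through the model's unembedding matrix and report the k most negative and k most positive tokens (k = 10 here). The sketch below assumes that convention, ignores the final layer norm and any unembedding bias, and uses hypothetical names (`feature_logit_effects`, `decoder_direction`, `unembedding`) that are not from the source.

```python
from typing import List, Sequence, Tuple


def feature_logit_effects(
    decoder_direction: Sequence[float],
    unembedding: Sequence[Sequence[float]],  # assumed shape: [d_model][vocab_size]
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical sketch: direct logit effect of one feature direction.

    Assumes effect[token] = sum_i decoder_direction[i] * unembedding[i][token],
    i.e. no layer-norm folding and no unembedding bias.
    """
    d_model = len(decoder_direction)
    vocab_size = len(vocab)
    if len(unembedding) != d_model:
        raise ValueError("unembedding must have d_model rows")

    # Dot product of the decoder direction with each token's unembedding column.
    effects = [
        sum(decoder_direction[i] * unembedding[i][t] for i in range(d_model))
        for t in range(vocab_size)
    ]

    # Sort token ids by effect, most negative first.
    ascending = sorted(range(vocab_size), key=lambda t: effects[t])
    descending = ascending[::-1]

    negative = [(vocab[t], effects[t]) for t in ascending[:k]]
    positive = [(vocab[t], effects[t]) for t in descending[:k]]
    return negative, positive
```

A toy call with made-up values (not the dashboard's data) shows the ordering: with `decoder_direction=[1.0, 0.0]`, only the first row of the unembedding contributes, so the tokens are ranked by that row alone.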
Activation Density: 0.028%
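Activation density is usually read as the share of token positions on which the feature fires; at 0.028% that is roughly 28 positions per 100,000 tokens. The sketch below assumes "fires" means an activation strictly above a threshold of 0 and that density is reported as a percentage; the function name `activation_density` and the `threshold` parameter are hypothetical, not taken from the source.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical sketch: percentage of positions where the feature fires.

    Assumes a position counts as firing when its activation exceeds `threshold`.
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)
```

For example, a feature that is nonzero on 1 of 10,000 positions has a density of 0.01% under this definition.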