Explanations
positive expressions or sentiments
expressions of approval or positivity towards a subject
Negative Logits
Token     Logit
pper      -0.77
ople      -0.77
eters     -0.77
iper      -0.76
hip       -0.75
Hop       -0.73
istan     -0.71
hod       -0.70
ratom     -0.70
ptin      -0.66
Positive Logits
Token         Logit
enough        1.30
enough        1.03
reads         0.96
luck          0.96
sword         0.92
bye           0.90
Enough        0.87
intentions    0.84
karma         0.84
Samar         0.83
Activations Density 0.051%