INDEX
Explanations
praise or positive reactions in casual conversations
expressions that convey a sense of comparison or similarity
Negative Logits
ourn        -0.83
irements    -0.76
Published   -0.75
Returns     -0.71
ependence   -0.70
icators     -0.69
ourse       -0.68
isition     -0.67
acia        -0.67
ribution    -0.66
Positive Logits
liest       1.14
lihood      1.05
lier        0.93
wow         0.92
oh          0.81
crazy       0.81
hhh         0.77
ooo         0.75
idiots      0.74
crap        0.74
Activations Density: 0.059%