INDEX
Explanations
phrases indicating contrast or continuation
phrases that indicate agreement or similarity in sentiments
Negative Logits
]),             -0.81
arthed          -0.76
".[             -0.70
]).             -0.70
."[             -0.64
])              -0.63
).[             -0.62
è¦ļéĨĴ          -0.62
respectively    -0.61
âĨij            -0.60
Positive Logits
elia            0.79
uh              0.73
unny            0.71
uin             0.70
kidding         0.70
romeda          0.69
obin            0.68
Courier         0.67
funn            0.67
yeah            0.66
Activations Density 0.310%