Explanations
phrases that contrast different ideas
Negative Logits
obyl     -0.71
":       -0.70
reen     -0.69
Ñı       -0.68
esc      -0.67
ENG      -0.65
yn       -0.64
agan     -0.63
omore    -0.62
avour    -0.62
Positive Logits
nevertheless    1.16
hey             1.06
nonetheless     1.04
alas            0.98
tons            0.90
fortunately     0.82
surely          0.80
suffice         0.79
luckily         0.75
damn            0.74
Activations Density: 0.146%