INDEX
Explanations
phrases with contrasting language, often involving limitations or exceptions
Negative Logits
erenn   -0.63
tnc     -0.59
edu     -0.58
代      -0.57
¢       -0.56
ADA     -0.55
oire    -0.55
pet     -0.55
oufl    -0.54
abre    -0.53
Positive Logits
alas           1.08
nevertheless   0.94
secondly       0.93
nonetheless    0.91
unfortunately  0.90
luckily        0.90
beware         0.90
fortunately    0.88
tons           0.84
interestingly  0.82
Activations Density: 0.178%