Explanations
phrases expressing contrasts or conditions
Negative Logits
atern          -0.15
ppo            -0.14
ksam           -0.14
kvin           -0.14
ypad           -0.14
additional     -0.14
ieux           -0.14
azal           -0.13
escorte        -0.13
.additional    -0.13
Positive Logits
nor            0.20
plenty         0.19
nonetheless    0.18
Nevertheless   0.18
nevertheless   0.18
nor            0.16
却             0.15
enek           0.15
Plenty         0.15
sí             0.15
Activations Density 0.197%
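
The density figure above can be read as the share of sampled tokens on which the feature fires. The sketch below illustrates that reading under an assumption the page does not state: that "density" means the fraction of tokens whose activation is strictly above zero. The function name `activation_density`, the `threshold` parameter, and the toy data are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as "active" when its activation is strictly
    greater than the threshold (0.0 by default).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 1000 tokens, 2 of which activate the feature.
sample = [0.0] * 998 + [1.3, 0.4]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.200%
```

On this reading, a density of 0.197% would mean the feature is active on roughly 2 of every 1000 sampled tokens.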