Explanations
contrasts and concessions in arguments
Negative Logits
zwar        -0.16
/generated  -0.15
instein     -0.14
ulong       -0.14
.simps      -0.14
gren        -0.14
ardy        -0.14
okt         -0.14
ạ           -0.14
sogar       -0.14
Positive Logits
inton     0.18
ASP       0.15
IPS       0.14
sheer     0.14
ch        0.14
pure      0.14
ensuring  0.14
handful   0.14
ラク      0.13
standing  0.13
Activations Density: 0.087%