Explanations
expressions of comparison and contrasting circumstances
Negative Logits
het            -0.16
uster          -0.15
fatt           -0.15
agh            -0.15
Cov            -0.15
617            -0.15
soever         -0.14
herits         -0.14
ab             -0.14
institution    -0.14
Positive Logits
otel           0.18
Worse          0.17
enia           0.16
山市           0.15
worse          0.15
alore          0.14
Bits           0.14
家伙           0.14
aeda           0.14
cea            0.14
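The two lists above rank vocabulary tokens by how strongly this feature pushes their logits up or down. A minimal sketch of that ranking follows, assuming (this is not stated on the card) that each token's value is the feature's decoder direction projected through the model's unembedding. The function name `top_and_bottom_logits` and the toy inputs are hypothetical; the sketch only shows how the most positive and most negative entries would be selected from such a per-token vector.

```python
# Hypothetical sketch: rank vocabulary tokens by a feature's logit effect.
# Assumption: logit_effect[i] is the feature's decoder direction projected
# through the unembedding for token i; the dashboard does not specify this.

def top_and_bottom_logits(tokens, logit_effect, k=10):
    """Return the k most positive and k most negative (token, value) pairs."""
    pairs = list(zip(tokens, logit_effect))
    # Sort once in ascending order; both extremes sit at the two ends.
    pairs.sort(key=lambda p: p[1])
    negative = pairs[:k]        # most negative value first
    positive = pairs[::-1][:k]  # most positive value first
    return positive, negative


if __name__ == "__main__":
    toks = ["otel", "Worse", "het", "uster", "institution"]
    vals = [0.18, 0.17, -0.16, -0.15, -0.14]
    pos, neg = top_and_bottom_logits(toks, vals, k=2)
    print(pos)  # [('otel', 0.18), ('Worse', 0.17)]
    print(neg)  # [('het', -0.16), ('uster', -0.15)]
```

A single sort serves both lists; for a full vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` would avoid sorting everything.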
Activation Density: 0.172%
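If activation density is read as the percentage of token positions on which the feature is active (an assumption; the card gives only the number), then 0.172% means the feature fires on roughly 1 token in 580 (1 / 0.00172 ≈ 581). A minimal sketch of that computation is below. The function name `activation_density` and the `threshold` parameter are hypothetical, and "active" is taken to mean an activation strictly above zero.

```python
# Hypothetical sketch: activation density as the percentage of token
# positions where a feature's activation exceeds a threshold (default 0).

def activation_density(activations, threshold=0.0):
    """Percentage of positions whose activation is strictly above threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


if __name__ == "__main__":
    # Toy data: the feature fires on 2 of 1000 positions.
    acts = [0.0] * 998 + [1.3, 0.4]
    print(f"{activation_density(acts):.3f}%")  # 0.200%
```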