Explanations
phrases that indicate comparison or contrast in evaluations
NEGATIVE LOGITS
on       -0.25
على      -0.17
trên     -0.16
äge      -0.16
на       -0.16
onaut    -0.15
auf      -0.15
lok      -0.15
erset    -0.15
บน       -0.15
POSITIVE LOGITS
behalf      0.49
occasions   0.34
basis       0.33
occasion    0.32
basis       0.29
occasion    0.24
_basis      0.23
grounds     0.23
Basis       0.22
dime        0.19
Activations Density 0.802%