INDEX
Explanations
phrases indicating exclusion or differentiation from others
Negative Logits
RunWith      -0.48
toalha       -0.40
Kearns       -0.40
with         -0.40
PNC          -0.39
一出         -0.39
Wilkinson    -0.38
by           -0.36
a            -0.35
MMP          -0.35
Positive Logits
else         1.48
Else         1.45
else         1.41
ELSE         1.36
Else         1.31
ELSE         1.09
Elsewhere    0.88
others       0.86
elsewhere    0.84
Elsewhere    0.83
Activations Density 0.013%