Explanations
terms related to comparisons and contrasts
Negative Logits
ahl        -0.17
Verd       -0.15
ade        -0.15
codegen    -0.14
equally    -0.14
ç¶         -0.14
ari        -0.13
ily        -0.13
Slav       -0.13
Ment       -0.13
Positive Logits
unlike     0.82
Unlike     0.60
Unlike     0.60
like       0.48
compared   0.43
Like       0.38
whereas    0.37
rather     0.36
Whereas    0.35
rather     0.34
Activations Density: 0.001%