Explanations
comparative phrases that highlight differences between groups or categories
Negative Logits
earnest    -0.15
Beyond     -0.14
\:         -0.14
upo        -0.14
resco      -0.13
다양한     -0.13
erc        -0.13
erah       -0.13
ろう       -0.13
сам        -0.13
Positive Logits
counterparts    0.33
counterpart     0.29
comparable      0.29
other           0.24
equivalent      0.23
corresponding   0.23
compar          0.22
Comparable      0.20
elsewhere       0.20
mere            0.20
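Dashboards of this kind commonly produce the positive and negative logit lists by projecting the feature's decoder direction through the model's unembedding matrix and keeping the highest and lowest scores. The sketch below assumes that procedure and ignores the final layer norm; `logit_effects`, the two-dimensional toy direction, and the toy unembedding matrix are all hypothetical, not the values behind the lists above.

```python
def logit_effects(decoder_dir, W_U, vocab, k=3):
    """Return the k most promoted and k most suppressed tokens.

    decoder_dir: list of d_model floats (the feature's decoder direction)
    W_U: d_model rows, each a list of len(vocab) floats (unembedding matrix)
    """
    # Logit contribution for token j is the dot product of the
    # decoder direction with column j of the unembedding matrix.
    scores = [
        sum(decoder_dir[i] * W_U[i][j] for i in range(len(decoder_dir)))
        for j in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    top = ranked[-k:][::-1]   # largest scores first
    bottom = ranked[:k]       # most negative scores first
    return top, bottom


# Toy example with d_model = 2 and a four-token vocabulary.
direction = [1.0, -0.5]
W_U = [[0.3, 0.1, -0.2, 0.0],
       [0.2, -0.4, 0.1, 0.0]]
vocab = ["counterparts", "other", "earnest", "Beyond"]
top, bottom = logit_effects(direction, W_U, vocab, k=2)
print([t for t, _ in top])     # tokens the feature pushes up
print([t for t, _ in bottom])  # tokens the feature pushes down
```

Plain lists keep the sketch dependency-free; with a real model the same dot products would be a single matrix-vector product over the full vocabulary.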
Activation density: 0.248%
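Activation density is usually reported as the percentage of sampled tokens on which the feature fires; 0.248% corresponds to roughly one token in 400. A minimal sketch, assuming "fires" means an activation strictly above zero (the threshold and the `activation_density` name are assumptions, not taken from this dashboard):

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds the threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Hypothetical sample: the feature is active on 3 of 1000 tokens.
sample = [0.0] * 997 + [1.2, 0.4, 3.0]
print(activation_density(sample))  # percent of tokens where the feature fired
```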