INDEX
Explanations
phrases implying comparison or alternatives
Negative Logits
other       -0.26
autre       -0.20
others      -0.20
otherwise   -0.20
Other       -0.19
OTHER       -0.18
その他      -0.18
altri       -0.17
autres      -0.17
other       -0.17
POSITIVE LOGITS
besides     0.22
-than       0.20
vier        0.19
bes         0.19
world       0.18
ewise       0.17
niż         0.17
WISE        0.17
/new        0.17
_than       0.16
Activations Density: 0.015%