INDEX
Explanations
The word "else" in various contexts, suggesting a focus on alternatives or comparisons.
Negative Logits
other       -0.29
otherwise   -0.23
autre       -0.22
others      -0.21
other       -0.20
Other       -0.19
autres      -0.19
itself      -0.18
otherwise   -0.18
Other       -0.18
Positive Logits
world       0.20
besides     0.19
-than       0.19
niż         0.18
inois       0.17
WISE        0.17
vier        0.17
bes         0.16
_than       0.16
words       0.16
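The two logit lists above rank vocabulary tokens by how strongly this feature pushes their output logits down or up. As a minimal sketch, assuming (not confirmed by this page) that each value is the direct projection of the feature's decoder direction onto the model's unembedding matrix, ignoring the final layer norm, the lists could be produced as follows. The names `logit_effects` and `top_logit_effects` and the toy matrices are hypothetical, for illustration only.

```python
from typing import List, Tuple


def logit_effects(decoder_dir: List[float], unembed: List[List[float]]) -> List[float]:
    """Project a feature's decoder direction onto the unembedding matrix.

    Assumption: `unembed` is stored as d_model rows by vocab_size columns,
    and the final layer norm is ignored.
    """
    if len(decoder_dir) != len(unembed):
        raise ValueError("decoder_dir length must equal the number of unembedding rows")
    vocab_size = len(unembed[0])
    # Effect on token t = dot product of the decoder direction with column t.
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_logit_effects(
    tokens: List[str], effects: List[float], k: int = 10
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, effect) pairs."""
    if len(tokens) != len(effects):
        raise ValueError("tokens and effects must have the same length")
    pairs = list(zip(tokens, effects))
    # Largest effects first: tokens the feature boosts.
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    # Smallest effects first: tokens the feature suppresses.
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    return positive, negative


# Toy example: a 2-dimensional residual stream and a 4-token vocabulary.
toy_tokens = ["a", "b", "c", "d"]
toy_decoder_dir = [1.0, -0.5]
toy_unembed = [
    [0.2, -0.1, 0.0, 0.3],
    [0.4, 0.2, -0.2, 0.0],
]
toy_effects = logit_effects(toy_decoder_dir, toy_unembed)
toy_positive, toy_negative = top_logit_effects(toy_tokens, toy_effects, k=2)
print(toy_positive)
print(toy_negative)
```

With k=10 and the real vocabulary, the two returned lists would correspond to the Positive and Negative Logits columns, under the stated projection assumption.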
Activation Density: 0.017%
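A density of 0.017% means the feature is active on roughly 17 of every 100,000 tokens, so it fires very sparsely. As a hedged sketch, assuming density is the percentage of sampled tokens whose activation exceeds zero (the exact threshold and sample used by the dashboard are not stated here), it could be computed like this; `activation_density` is a hypothetical helper.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation is strictly above `threshold`.

    Assumption: the dashboard counts a token as active when activation > 0.
    """
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical sample: 100,000 tokens, of which 17 have a positive activation.
sample = [0.0] * 100_000
for idx in range(17):
    sample[idx * 5_000] = 1.0
print(activation_density(sample))
```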