Explanations
words indicating alternatives or comparisons
Negative Logits
кога          -0.61
للاسماء       -0.59
Grá           -0.58
Clik          -0.57
böz           -0.56
fois          -0.56
ovunque       -0.54
Rien          -0.53
๔             -0.53
continually   -0.52
Positive Logits
other         1.04
autre         0.79
other         0.77
otra          0.75
másik         0.75
autres        0.75
다른          0.73
OTHER         0.72
Other         0.70
OTHER         0.69
Activations Density: 0.162%