Explanations
negative qualifiers and phrases indicating exceptions or disagreements in a discussion
Negative Logits
Token           Logit
незавершена     -0.56
UnusedPrivate   -0.54
PerformLayout   -0.53
AndEndTag       -0.53
pleaſure        -0.50
فريبيس          -0.48
arşivlendi      -0.48
----</          -0.44
houſe           -0.43
beſt            -0.42
Positive Logits
Token           Logit
rather          0.81
而不是          0.79
而非            0.77
instead         0.70
rather          0.66
Rather          0.65
Rather          0.64
statt           0.63
plutôt          0.63
anstatt         0.61
Activations Density: 0.432%