INDEX
Explanations
phrases indicating established criteria or benchmarks
Negative Logits
Standard     -0.18
Standards    -0.18
Standard     -0.17
_std         -0.16
standard     -0.16
standard     -0.16
ستان         -0.16
std          -0.16
onder        -0.16
orman        -0.16
Positive Logits
ised         0.51
ization      0.48
ize          0.42
-issue       0.40
izing        0.39
isation      0.38
-setting     0.38
ized         0.37
deviation    0.35
izes         0.35
Activations Density 0.034%