INDEX
Explanations
phrases that express limitations or impossibilities
Negative Logits
ⓧ              -1.02
ніципалі       -0.61
Infór          -0.59
الرياضيه       -0.59
$_['           -0.59
فريبيس         -0.58
harapkan       -0.58
berdayakan     -0.57
antidesliz     -0.57
-0.56
Positive Logits
other          0.53
never          0.39
unimaginable   0.39
nt             0.36
NT             0.36
Mac            0.35
其他人         0.34
Sab            0.33
Impossible     0.33
其他           0.33
Activations Density 0.342%