Explanations
negative expressions or dismissals
Negative Logits
waf             -0.59
ticles          -0.58
ArgumentParser  -0.54
OGND            -0.53
lomb            -0.52
Bender          -0.50
هما             -0.50
Népesség        -0.49
charbon         -0.48
Jacinto         -0.48
Positive Logits
not             1.19
not             0.97
NOT             0.95
Not             0.94
NOT             0.91
不              0.90
Not             0.89
не              0.86
không           0.79
ไม่             0.79
Activations Density 0.151%