INDEX
Explanations
negations and warnings against specific actions
Negative Logits
Token             Logit
allAfrica         -0.53
متحده             -0.52
otomatig          -0.49
matchCondition    -0.47
تانيه             -0.47
насељу            -0.47
tvguidetime       -0.46
Exactos           -0.46
IContainer        -0.45
MLLoader          -0.45
Positive Logits
Token             Logit
Jangan            0.73
不要              0.64
Jangan            0.63
đừng              0.63
jangan            0.60
Dont              0.59
DoNot             0.57
你不要            0.57
Dont              0.57
avoid             0.54
Activations Density 0.209%
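
The page does not say how the density figure is computed. A minimal sketch, assuming it is the fraction of sampled token positions on which the feature's activation is above zero; the function name `activation_density`, the `threshold` parameter, and the toy data are hypothetical and not taken from the source:

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    feature fires at all. The source page does not confirm this definition.
    """
    values = list(activations)
    if not values:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in values if a > threshold)
    return active / len(values)


# Hypothetical toy data: 1000 token positions, 2 of which fire.
toy = [0.0] * 998 + [1.7, 0.4]
density = activation_density(toy)
print(f"{density:.3%}")
```

Under this reading, a density of 0.209% would mean the feature fires on roughly 2 of every 1000 sampled tokens, which fits a feature that responds only to prohibition words such as "Jangan", "不要", or "Dont".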