Explanations
Phrases indicating denial or refusal
Negative Logits
iasi        -0.15
U+0306      -0.15
ddy         -0.15
ذ           -0.14
:CGRect     -0.14
utherford   -0.14
ibbon       -0.13
andro       -0.13
isible      -0.13
CTest       -0.13
Positive Logits
need    0.54
must    0.53
needs   0.50
必须    0.48
need    0.47
gotta   0.47
must    0.46
phải    0.44
Must    0.44
needs   0.44
Activations Density 0.559%