Explanations
phrases indicating refusal or resistance to comply with requests or commands
Negative Logits
.     -1.17
”.    -0.75
".    -0.69
.”    -0.63
).    -0.63
:     -0.63
’.    -0.63
'.    -0.60
–     -0.58
!     -0.56
Positive Logits
الرياضيه    1.27
Мексичка    1.21
kasarigan    1.18
WriteBarrier    1.11
كومونز    1.07
__':    1.06
^(@)    1.05
)*/    1.04
StoryboardSegue    1.01
Efq    1.00
Activations Density 0.429%