INDEX
Explanations
aggressive and confrontational language
angry, aggressive dialogue with profanity and hostile commands directed at someone.
Negative Logits
فريبيس    -0.71
الرياضيه    -0.66
autorytatywna    -0.57
kasarigan    -0.54
AsUp    -0.52
تانيه    -0.50
probable    -0.49
Probable    -0.49
怎麼辦    -0.48
Referències    -0.48
Positive Logits
dare    0.52
insol    0.47
fucking    0.43
Shut    0.42
arrogant    0.40
shut    0.39
impert    0.39
哼    0.38
囂    0.38
dared    0.38
Activations Density 0.191%