Negative Logits
attachments   0.45
pleads        0.45
vows          0.43
rotting       0.43
vomit         0.41
obscene       0.41
attachment    0.40
retails       0.40
مرف           0.40
rotten        0.39
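Positive Logits
accusing      0.68
考验          0.60  (Chinese: "test, ordeal")
accuse        0.60
给我          0.57  (Simplified Chinese: "give me")
என்னை         0.55  (Tamil: "me")
給我          0.54  (Traditional Chinese: "give me")
আমাকে         0.54  (Bengali: "me")
interacting   0.53
把我          0.53  (Chinese: "me", with the object marker 把)
treating      0.52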
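The two lists rank vocabulary tokens by how strongly this feature pushes their logits down or up. A minimal sketch of how such lists are commonly produced, assuming (the page does not say so) that each token's score is the dot product of the feature's decoder direction with that token's column of the unembedding matrix. The names `decoder_dir`, `unembed`, `vocab`, `logit_effects`, and `top_and_bottom` are hypothetical. The sketch keeps the sign, so entries in its negative list come out as negative numbers.

```python
def logit_effects(decoder_dir, unembed):
    """Score every vocabulary token as dot(decoder_dir, column t of unembed).

    decoder_dir: list of d_model floats (hypothetical feature direction).
    unembed: d_model rows, each a list of vocab_size floats (hypothetical W_U).
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(scores, vocab, k):
    """Return (negative, positive): the k lowest and k highest (token, score) pairs."""
    order = sorted(range(len(scores)), key=lambda t: scores[t])  # ascending by score
    negative = [(vocab[t], scores[t]) for t in order[:k]]  # most negative first
    positive = [(vocab[t], scores[t]) for t in reversed(order[-k:])]  # largest first
    return negative, positive


# Toy usage with a 2-dimensional model and a 4-token vocabulary.
scores = logit_effects([1.0, 0.0], [[0.5, -0.2, 0.9, -0.7], [1.0, 1.0, 1.0, 1.0]])
negative, positive = top_and_bottom(scores, ["a", "b", "c", "d"], 2)
print(negative)  # [('d', -0.7), ('b', -0.2)]
print(positive)  # [('c', 0.9), ('a', 0.5)]
```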
Activation density: 0.058%
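Activation density is usually reported as the percentage of tokens on which the feature's activation is above zero. Assuming that definition (the page does not state it), 0.058% means the feature fires on roughly 58 of every 100,000 tokens. A minimal sketch; the function name `activation_density` and its `threshold` parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation is strictly above `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# 58 firing tokens out of 100,000 gives a density of 0.058%.
acts = [0.0] * 99_942 + [1.0] * 58
print(activation_density(acts))
```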