Explanations
abuse, exploitation, harmful acts
Negative Logits
dimiliki  0.39
overrated  0.38
рк  0.37
प्रोसेसिंग  0.36
वाद  0.36
परंतु  0.36
Approach  0.35
ők  0.35
пациента  0.35
skiprows  0.35
Positive Logits
perpetrated  0.88
rampant  0.67
tactics  0.59
tendencies  0.57
envers  0.56
perpetrators  0.55
মূলক  0.54
accusations  0.53
perpetuated  0.53
disguised  0.51
Activations Density 0.131%