INDEX
Explanations
defamation libel slander smearing
Negative Logits
Optim      0.49
оптими     0.46
optim      0.45
索引       0.42
entusi     0.41
optim      0.41
আধ         0.41
༄          0.40
متجه       0.40
Optim      0.40
Positive Logits
slander      1.76
defamatory   1.58
defamation   1.49
smear        1.47
诽           1.44
smears       1.38
defam        1.33
baseless     1.27
libel        1.26
誣           1.23
Activations Density 0.050%