Explanations
phrases related to war crimes
references to war crimes and human rights violations
Negative Logits
Token     Logit
BIT       -0.79
ŃĶ        -0.78
ãĤ¨ãĥ«    -0.73
ãĥ¥       -0.73
rez       -0.70
upper     -0.70
ivas      -0.69
onom      -0.68
ãĥĥ       -0.68
Nav       -0.68
Positive Logits
Token           Logit
bikini          0.69
ioxide          0.69
imitation       0.67
analogy         0.62
Tuls            0.60
extraord        0.60
pamph           0.59
TED             0.58
billionaires    0.58
equivalents     0.58
Activations Density: 0.320%