Explanations
information related to conflict, human rights abuses, and mistreatment documented in reports
Negative Logits
caster      -0.73
username    -0.72
çͰ          -0.70
linger      -0.67
Android     -0.66
attm        -0.66
ingred      -0.66
lly         -0.66
invoke      -0.66
(@          -0.65
Positive Logits
accordance  1.32
captivity   1.18
lieu        1.13
spite       1.09
humane      1.06
relation    0.97
animate     0.97
shelters    0.97
situ        0.95
vitro       0.95
Activations Density 0.290%