INDEX
Explanations
personal names, possibly associated with a position or title
mentions of people and their roles or contributions
Negative Logits
deport        -0.77
punishments   -0.72
deportation   -0.71
detainees     -0.71
politics      -0.70
symbolic      -0.70
executions    -0.70
judgments     -0.70
retaliation   -0.69
killings      -0.68
Positive Logits
Shap          0.87
PhD           0.82
QC            0.79
Found         0.76
Minion        0.75
Architects    0.73
Argon         0.71
âķIJ           0.70
booth         0.70
LL            0.70
Activations Density 0.797%