Explanations
terms related to prisoners and their situations
Negative Logits
Token           Logit
afd             -0.15
Lâm             -0.14
neighbourhood   -0.14
ndef            -0.14
Attack          -0.13
Ludwig          -0.13
æ¹              -0.13
olis            -0.13
ган             -0.13
slaught         -0.13
Positive Logits
Token           Logit
detained         0.31
detainees        0.30
detain           0.29
prisoner         0.28
detention        0.27
imprisoned       0.27
Britt            0.26
diplomat         0.26
diplomatic       0.26
diplomats        0.24
Activation Density: 0.013%