INDEX
Explanations
phrases related to being imprisoned or held captive
references to prisoners of war
Negative Logits
Boll          -0.78
orp           -0.73
OPA           -0.71
Remastered    -0.71
orie          -0.68
lag           -0.67
Ples          -0.65
wig           -0.65
amera         -0.64
Blizz         -0.64
Positive Logits
prisoners       1.01
prisoner        0.89
inmates         0.89
captives        0.89
sentenced       0.87
detainees       0.81
incarcerated    0.80
prison          0.78
jailed          0.76
inmate          0.76
Activations Density: 0.032%