INDEX
    Explanations

    terms related to prisoners and their situations

    Negative Logits
    Token               Logit
    "afd"               -0.15
    " Lâm"              -0.14
    " neighbourhood"    -0.14
    "ndef"              -0.14
    " Attack"           -0.13
    " Ludwig"           -0.13
    "æ¹"                -0.13
    "olis"              -0.13
    "ган"               -0.13
    "slaught"           -0.13
    Positive Logits
    Token               Logit
    " detained"         0.31
    " detainees"        0.30
    " detain"           0.29
    " prisoner"         0.28
    " detention"        0.27
    " imprisoned"       0.27
    " Britt"            0.26
    " diplomat"         0.26
    " diplomatic"       0.26
    " diplomats"        0.24
    Activation Density: 0.013%

    No Known Activations