INDEX
    Explanations

    mentions of people's names

    Negative Logits
    uristic    -0.66
    afety      -0.63
    teness     -0.59
    ocaust     -0.58
    ":"/       -0.58
    humane     -0.57
    bara       -0.57
    ventory    -0.57
    rina       -0.57
    =/         -0.57
    Positive Logits
     respectively    2.27
     jointly         1.45
     alike           1.44
     together        1.38
     respective      1.32
     combined        1.29
     mutually        1.25
    together         1.19
     Together        1.17
     separately      1.14
    Activation Density: 6.197%

    No Known Activations