INDEX
    Explanations

    references to moral dilemmas and judgments about survival

    Negative Logits
    (/[             -0.47
                    -0.46
     Зачем          -0.45
    edon            -0.45
    articolo        -0.45
    Читати          -0.45
     pax            -0.45
    Cama            -0.45
     aad            -0.45
     oligo          -0.45
    Positive Logits
    //              0.68
    SharedCtor      0.64
    يكب             0.62
    InitVars        0.61
    发表于          0.61
    >{@             0.60
     发表于         0.59
    Alike           0.59
    Климат          0.59
     belangrij      0.59
    Activation Density: 0.049%

    No Known Activations